Information Theory by Analytic Methods: The Precise Minimax Redundancy

Wojciech Szpankowski

Department of Computer Science, Purdue University (USA)

Algorithms Seminar

March 5, 2001

[summary by Thomas Klausner]

A properly typeset version of this document is available in postscript and in pdf.

If some fonts do not look right on your screen, this might be fixed by configuring your browser (see the documentation here).

1  Introduction

The redundancy-rate problem of universal coding is concerned with determining by how much the actual code length (representation of a word in a code) exceeds the optimal code length. Revisiting the theme of his last year's seminar talk [1], Szpankowski went into more detail explaining different models for redundancy, and introduced the generalized Shannon code in order to solve the minimax redundancy problem for a single memoryless source.

A code is defined as follows:
Definition 1   A code Cn is a mapping from the set An of all sequences of length n over the alphabet A to the set {0,1}* of binary sequences.

Most of the time we use source models which specify probabilities for specific messages. For these, P(x1n) is the probability of the message x1n, the code length of a message x1n=x1... xn, with xi A, in the code Cn will be denoted by L(Cn, x1n), and Hn(P)=-x1n P(x1n)logP(x1n) is the entropy of the probability distribution, where log is taken to base 2.

2  Basic Results

A prefix code or instantaneous code is a code in which no codeword is a prefix for another codeword; in other words, if you present the codewords as a binary trie, the valid codewords are only in the leaves (not in the internal nodes).

For prefix codes the following inequality holds:
Lemma 1  [Kraft's inequality]   For any prefix code (over a binary alphabet), the codeword lengths l1, l2, ..., lm satisfy the inequality
m
i=1
2-li 1.

A related problem is to find out how many tuples l1, ..., lm exist where equality holds. This has been tackled and solved by Flajolet and Prodinger [2]. Asymptotically, it grows as a fm, where a 0.254 and f 1.794.

Another important result is Shannon's classic lower bound on the average code length (see [3]):
Lemma 2  [Shannon]   For any code, the average code length E [L(Cn,X1n)] cannot be smaller than the entropy of the source Hn(P):
E [ L(Cn,X1n) ] Hn(P)

Trivially, one can see that there must exist at least one x1~n with
L(
~
x
1
n) -logP (
~
x
1
n).

A lemma by Barron deals with the individual lengths of the code words:
Lemma 3  [Barron]   Let L(X1n) be the length of a codeword in a code satisfying Kraft's inequality, where X1n is generated by a stationary ergodic source. For any sequence of positive constants an satisfying 2-an < , the following holds:
P { L(X1n) -logP(X1n)-an } 2-an.
From this we immediately get
L(X1n) -logP(X1n)-an   (almost surely).

3  Redundancy

Redundancy measures the distance to the optimal code state, reaching the lower bound given by the entropy. Since there are different ways to define the ``worst case,'' we define three types of redundancy: pointwise Rn(Cn, P; x1n), average Rn_(Cn, P) and maximal R*(Cn, P):
Rn(Cn,P; x1n) = L(Cn,x1n)+logP(x1n)    ( -an (a.s.)),                  
_
R
n
(Cn, P)
= EX1n [ Rn(Cn, P; X1n) ]
                 
 
= E [ L(Cn,X1n) ] - Hn(P),
                 
R*(Cn, P)
=
 
max
x1n
[ Rn(Cn, P; x1n) ] .
                 

The redundancy-rate problem consists in finding the rate of growth of the corresponding minimax quantities
_
R
n
(S)
=
 
min
Cn
 
sup
P S
E [ Rn(Cn,P; x1n) ] ,
                 
Rn*(S)
=
 
min
Cn
 
sup
P S
 
max
x1n
[ Rn(Cn,P; x1n) ] ,
                 
as n for a class S of source models.

There are also other measures of optimality, e.g. for coding, gambling, or predictions. For these, the following functions, called minimax regret functions, are used:
_
r
n
=
 
min
Cn
 
sup
P S
 
x1n
P(x1n)


Li + log
 
sup
P
P(x1n)


,
                 
rn*
=
 
min
Cn
 
max
x1n



Li + log
 
sup
P
P(x1n)


.
                 
Note that rn* = Rn*. Sometimes, the maximin regret is of interest:
~
r
n
=
 
sup
P S
 
min
Cn
 
x1n
P(x1n)


Li + log
 
sup
P
P(x1n)


.

These functions are sometimes called the average minimax regret (rn_), the maximal minimax regret (rn*), and the average maxmin regret (rn~). One can interpret these functions as target functions for the game theoretical problem of choosing L so that for all x1n, the value of the function gets as good as possible, that is, -logsupP(x1n).

In the following, we will only look at the redundancy functions.

4  Precise Maximal Redundancy

In 1978, Shtarkov proved the following bounds for the minimax redundancy:
log


 
x1n
 
sup
P S
P(x1n)


Rn*(S) log


 
x1n
 
sup
P S
P(x1n)


+1.

We want to find a precise result for Rn*(S). We start with the easier problem of finding the optimal code for maximal redundancy for a known source P
Rn*(P )=
 
min
Cn C
R*(Cn, P).
We already know that for the average redundancy of one known source
_
R
n
(P ) =
 
min
Cn C
Ex1n [ Rn(Cn, P; x1n) ] ,
the Huffmann code is optimal---indeed, it is designed so as to solve this optimization problem. For the maximal redundancy problem we introduce a new code, the generalized Shannon code.

In the ordinary Shannon code, the length of its symbol in the code for a given P is 1/P(x1n) . In the generalized Shannon code, on the other hand, we set the length to be 1/P(x1n) for some symbols x1n L and 1/P(x1n) for the others in such a way that Kraft's inequality holds. For non-dyadic codes (dyadic ones fulfill Rn*(P) = 0), we sort the probabilities P(x1n):
0 < -logp1 > < -logp2 > ... < -logp|A|n > 1    (where < x > = x - x )
and choose j0 to be the maximal j such that Kraft's inequality still holds:
j-1
i=0
pi 2
< -logpi >
 
+
|A|n
i=j
pi 2
< -logpi > -1
 
1.
Then Rn*(P) = 1 - < -logpj0 > and the generalized Shannon code with L = {1, ..., j0 } is optimal.

Now we generalize to systems of probability distributions S. Let
Q*(x1n)=
 
sup
PS
P(x1n)
 
y1n An
 
sup
PS
P(y1n)
.
Then
Rn*(S) = Rn*(Q*) + log


 
x1nAn
 
sup
PS
P(x1n)


,
with
Rn*(Q*) = 1- < -logqj0 >
as above.

If we now take the generalized Shannon code that minimizes the maximal redundancy, we get for a sequence generated by a single memoryless source, for n , and a=log1-p/p irrational:
Rn*(Pp) = -
loglog2
log2
+ o(1) = 0.5287 + o(1).

5  Average Minimax Redundancy

In the simple case where S consists of one distribution P, the computation of Rn_H is the Huffman problem:
_
R
n
H (P ) =
 
min
Cn C
 
x1n
P(x1n) Rn(Cn, P; x1n).

From known results (where we have Rn_H Rn*), we conjecture:
Conjecture 1   Under certain additional conditions, we have, as n,
_
R
n
= Rn*+Q(1)=log


 
x1n An
 
sup
PS
P(x1n)


+ Q(1).

6  Average Redundancy for Particular Codes

For single memoryless sources, we have explicit results for n for some codes. In particular, we have for the Huffman code
_
R
n
=






3
2
-
1
ln2
if a irrational,
3
2
-
1
M



< Mnb > -
1
2



- ( M(1-2-1/M) ) -12
- < Mnb > /M
 
if a =
N
M
,
for the Shannon code
_
R
n
=






1
2
if a irrational,
1
2
-
1
M



< Mnb > -
1
2



if a =
N
M
,
and for the generalized Shannon code
_
R
n
=
3
2
- 2 ln2 + o(1) 0.113705639.

For more basics and in-depth knowledge regarding analytic information theory, the interested reader is referred to Szpankowski's book [4].

References

[1]
Flajolet (Philippe). -- Analytic information theory and the redundancy rate problem [ summary of a talk by Wojciech Szpankowski ]. In Chyzak (Frdric) (editor), Algorithms Seminar, 1999--2000, pp. 133--136. -- Institut National de Recherche en Informatique et en Automatique, November 2000. Research Report n°4056.

[2]
Flajolet (Philippe) and Prodinger (Helmut). -- Level number sequences for trees. Discrete Mathematics, vol. 65, n°2, 1987, pp. 149--156.

[3]
Shannon (C. E.). -- A mathematical theory of communication. Bell System Technical Journal, vol. 27, 1948, pp. 379--423 and 623--656.

[4]
Szpankowski (Wojciech). -- Average-case analysis of algorithms on sequences. -- John Wiley & Sons, Chichester, New York, March 2001, Wiley-Interscience Series in Discrete Mathematics.

This document was translated from LATEX by HEVEA.