Information Theory by Analytic Methods: The Precise Minimax Redundancy

Wojciech Szpankowski

Department of Computer Science, Purdue University (USA)

Algorithms Seminar

March 5, 2001

[summary by Thomas Klausner]

1 Introduction

The redundancy-rate problem of universal coding asks by how much the actual code length (the length of the codeword that represents a message) exceeds the optimal code length. Revisiting the theme of his seminar talk of the previous year [1], Szpankowski explained different models of redundancy in more detail and introduced the generalized Shannon code in order to solve the minimax redundancy problem for a single memoryless source.

A code is defined as follows:

Definition 1 A code C_n is a mapping from the set Aⁿ of all sequences of length n over the alphabet A to the set {0,1}^* of binary sequences.

Most of the time we use source models that specify probabilities for specific messages. For these, $P(x_1^n)$ is the probability of the message $x_1^n = x_1 \ldots x_n$ with $x_i \in A$, the code length of $x_1^n$ in the code $C_n$ is denoted by $L(C_n, x_1^n)$, and $H_n(P) = -\sum_{x_1^n} P(x_1^n) \log P(x_1^n)$ is the entropy of the probability distribution, where $\log$ is taken to base 2.

2 Basic Results

A prefix code or instantaneous code is a code in which no codeword is a prefix of another codeword; in other words, if the codewords are represented in a binary trie, they occupy only leaves, never internal nodes.

For prefix codes the following inequality holds:

Lemma 1 [Kraft's inequality] For any prefix code (over a binary alphabet), the codeword lengths $l_1, l_2, \ldots, l_m$ satisfy the inequality

$$\sum_{i=1}^{m} 2^{-l_i} \le 1.$$
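
As a small illustration (a sketch that is not part of the talk), the following Python snippet checks prefix-freeness and evaluates the Kraft sum exactly with rational arithmetic. The function names `is_prefix_free` and `kraft_sum` and the sample codewords are chosen here for illustration only.

```python
from fractions import Fraction

def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another codeword in the list."""
    words = sorted(codewords)
    # After lexicographic sorting, a prefix is immediately followed by one of its extensions.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def kraft_sum(lengths):
    """Exact value of the sum of 2^(-l) over the given codeword lengths."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

code = ["0", "10", "110", "111"]           # leaves of a complete binary trie
code_kraft = kraft_sum(len(w) for w in code)
```

For this complete code the Kraft sum equals 1, while lengths such as (1, 1, 2) give a sum above 1 and therefore cannot belong to any prefix code.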

A related problem is to count the tuples $l_1, \ldots, l_m$ for which equality holds. This has been tackled and solved by Flajolet and Prodinger [2]. Asymptotically, the number of such tuples grows as $\alpha \varphi^m$, where $\alpha \approx 0.254$ and $\varphi \approx 1.794$.
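
For small m the count can be reproduced with a short recursion. This sketch assumes that tuples are counted up to reordering, that is, as level profiles of binary trees whose m leaves sit at depths l_1, ..., l_m; this interpretation and the name `count_profiles` are assumptions made here, not statements from the talk.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_profiles(leaves, open_nodes=1):
    """Ways to place `leaves` leaves below `open_nodes` nodes of the current level."""
    if open_nodes > leaves:        # every open node needs at least one leaf below it
        return 0
    if open_nodes == 0:
        return 1 if leaves == 0 else 0
    # k of the open nodes become leaves; the others are internal with two children each.
    return sum(count_profiles(leaves - k, 2 * (open_nodes - k))
               for k in range(open_nodes + 1))

counts = [count_profiles(m) for m in range(1, 8)]   # [1, 1, 1, 2, 3, 5, 9]
```

For m = 4, for instance, the two solutions are (2, 2, 2, 2) and (1, 2, 3, 3).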

Another important result is Shannon's classic lower bound on the average code length (see [3]):

Lemma 2 [Shannon] For any prefix code, the average code length $\mathbf{E}[L(C_n, X_1^n)]$ cannot be smaller than the entropy of the source $H_n(P)$:

$$\mathbf{E}[L(C_n, X_1^n)] \ge H_n(P).$$

Trivially, one can see that there must exist at least one $\tilde{x}_1^n$ with

$$L(\tilde{x}_1^n) \ge -\log P(\tilde{x}_1^n).$$

A lemma by Barron deals with the individual lengths of the code words:

Lemma 3 [Barron] Let $L(X_1^n)$ be the length of a codeword in a code satisfying Kraft's inequality, where $X_1^n$ is generated by a stationary ergodic source. For any sequence of positive constants $a_n$ satisfying $\sum_n 2^{-a_n} < \infty$, the following holds:

$$\Pr\{L(X_1^n) \le -\log P(X_1^n) - a_n\} \le 2^{-a_n}.$$

From this we immediately get

L(X₁ⁿ) ³ -logP(X₁ⁿ)-a_n (almost surely).
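
Both statements can be justified in a few lines. The following is a sketch of the standard argument (Kraft's inequality followed by the Borel-Cantelli lemma); it is not necessarily the proof presented in the talk.

```latex
\begin{align*}
\Pr\{L(X_1^n) \le -\log P(X_1^n) - a_n\}
  &= \sum_{x_1^n :\, P(x_1^n) \le 2^{-L(x_1^n) - a_n}} P(x_1^n) \\
  &\le 2^{-a_n} \sum_{x_1^n} 2^{-L(x_1^n)}
   \le 2^{-a_n}.
\end{align*}
% Since \sum_n 2^{-a_n} < \infty, the Borel-Cantelli lemma shows that the event
% \{L(X_1^n) \le -\log P(X_1^n) - a_n\} occurs only finitely often almost surely,
% hence L(X_1^n) \ge -\log P(X_1^n) - a_n for all sufficiently large n (a.s.).
```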

3 Redundancy

Redundancy measures how far a code is from the optimum, that is, by how much its code lengths exceed the lower bound given by the entropy. Since there are different ways to define the "worst case," we define three types of redundancy: pointwise $R_n(C_n, P; x_1^n)$, average $\bar{R}_n(C_n, P)$, and maximal $R^*(C_n, P)$:

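$$\begin{aligned}
R_n(C_n, P; x_1^n) &= L(C_n, x_1^n) + \log P(x_1^n) \quad (\ge -a_n \text{ (a.s.)}), \\
\bar{R}_n(C_n, P) &= \mathbf{E}_{X_1^n}\left[R_n(C_n, P; X_1^n)\right] = \mathbf{E}\left[L(C_n, X_1^n)\right] - H_n(P), \\
R^*(C_n, P) &= \max_{x_1^n} \left[R_n(C_n, P; x_1^n)\right].
\end{aligned}$$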
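
To make the three definitions concrete, here is a minimal Python sketch that evaluates them for the ordinary Shannon code (lengths equal to the ceiling of -log P) on a binary memoryless source. The source parameters (n = 3, p = 0.3) and all function names are illustrative assumptions.

```python
import itertools
import math

def sequence_probs(n, p):
    """P(x) for every binary sequence x of length n; each symbol is 1 with probability p."""
    return {x: p ** sum(x) * (1 - p) ** (n - sum(x))
            for x in itertools.product((0, 1), repeat=n)}

def redundancies(lengths, probs):
    """Pointwise, average, and maximal redundancy of a code with the given lengths."""
    pointwise = {x: lengths[x] + math.log2(probs[x]) for x in probs}
    average = sum(probs[x] * r for x, r in pointwise.items())
    maximal = max(pointwise.values())
    return pointwise, average, maximal

probs = sequence_probs(3, 0.3)
shannon = {x: math.ceil(-math.log2(q)) for x, q in probs.items()}
pointwise, average, maximal = redundancies(shannon, probs)

# The average redundancy equals the mean code length minus the entropy H_n(P).
entropy = -sum(q * math.log2(q) for q in probs.values())
mean_length = sum(probs[x] * shannon[x] for x in probs)
```

For the Shannon code every pointwise redundancy lies in [0, 1), so the average and the maximal redundancy do as well.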

The redundancy-rate problem consists in finding the rate of growth of the corresponding minimax quantities

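$$\begin{aligned}
\bar{R}_n(S) &= \min_{C_n} \sup_{P \in S} \mathbf{E}_{X_1^n}\left[R_n(C_n, P; X_1^n)\right], \\
R_n^*(S) &= \min_{C_n} \sup_{P \in S} \max_{x_1^n} \left[R_n(C_n, P; x_1^n)\right]
\end{aligned}$$

as $n \to \infty$ for a class $S$ of source models.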

There are also other measures of optimality, e.g. for coding, gambling, or predictions. For these, the following functions, called minimax regret functions, are used:

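$$\begin{aligned}
\bar{r}_n &= \min_{C_n} \sup_{P \in S} \sum_{x_1^n} P(x_1^n) \left[ L_i + \log \sup_{P \in S} P(x_1^n) \right], \\
r_n^* &= \min_{C_n} \max_{x_1^n} \left[ L_i + \log \sup_{P \in S} P(x_1^n) \right].
\end{aligned}$$

Here $L_i$ stands for the code length $L(C_n, x_1^n)$ of the sequence under consideration.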

Note that r_n^* = R_n^*. Sometimes, the maximin regret is of interest:

$$\tilde{r}_n = \sup_{P \in S} \min_{C_n} \sum_{x_1^n} P(x_1^n) \left[ L_i + \log \sup_{P \in S} P(x_1^n) \right].$$

These functions are sometimes called the average minimax regret ($\bar{r}_n$), the maximal minimax regret ($r_n^*$), and the average maximin regret ($\tilde{r}_n$). One can interpret them as objective functions of the game-theoretic problem of choosing the code lengths $L$ so that, for all $x_1^n$, the length comes as close as possible to the ideal value $-\log \sup_{P \in S} P(x_1^n)$.
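
The identity between the maximal minimax regret and the maximal minimax redundancy noted above can be verified directly. The following short derivation is a sketch; it writes the code length as $L(C_n, x_1^n)$ and uses only that the supremum over $P$ and the maximum over $x_1^n$ can be exchanged and that $\log$ is increasing.

```latex
\begin{align*}
R_n^*(S)
  &= \min_{C_n} \sup_{P \in S} \max_{x_1^n}
     \left[ L(C_n, x_1^n) + \log P(x_1^n) \right] \\
  &= \min_{C_n} \max_{x_1^n}
     \left[ L(C_n, x_1^n) + \sup_{P \in S} \log P(x_1^n) \right] \\
  &= \min_{C_n} \max_{x_1^n}
     \left[ L(C_n, x_1^n) + \log \sup_{P \in S} P(x_1^n) \right]
   = r_n^*.
\end{align*}
```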

In the following, we will only look at the redundancy functions.

4 Precise Maximal Redundancy

In 1978, Shtarkov proved the following bounds for the minimax redundancy:

$$\log\left(\sum_{x_1^n} \sup_{P \in S} P(x_1^n)\right) \le R_n^*(S) \le \log\left(\sum_{x_1^n} \sup_{P \in S} P(x_1^n)\right) + 1.$$
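
As an illustration, the sum inside the logarithm can be evaluated numerically when S is taken to be the class of binary memoryless sources. This choice of S is an assumption made for the example; in that case the supremum for a sequence with k ones is attained at p = k/n. The names `shtarkov_sum` and `shtarkov_bounds` are hypothetical.

```python
import math

def shtarkov_sum(n):
    """Sum over all binary sequences x of length n of sup_p P_p(x)."""
    total = 0.0
    for k in range(n + 1):
        # Maximum-likelihood probability of one sequence with k ones (p = k/n).
        # Python evaluates 0.0 ** 0 as 1.0, which covers k = 0 and k = n.
        ml = (k / n) ** k * ((n - k) / n) ** (n - k)
        total += math.comb(n, k) * ml
    return total

def shtarkov_bounds(n):
    """Lower and upper bound on the maximal minimax redundancy (in bits)."""
    lower = math.log2(shtarkov_sum(n))
    return lower, lower + 1

bounds_n2 = shtarkov_bounds(2)   # the sum is 1 + 2 * 0.25 + 1 = 2.5
```

The float arithmetic limits this sketch to moderate n (up to roughly 1000); larger n would need logarithmic or exact arithmetic.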

We want to find a precise result for $R_n^*(S)$. We start with the easier problem of finding the optimal code for maximal redundancy for a known source $P$:

$$R_n^*(P) = \min_{C_n \in \mathcal{C}} R^*(C_n, P).$$

We already know that for the average redundancy of one known source,

$$\bar{R}_n(P) = \min_{C_n \in \mathcal{C}} \mathbf{E}_{X_1^n}\left[R_n(C_n, P; X_1^n)\right],$$

the Huffman code is optimal; indeed, it is designed to solve exactly this optimization problem. For the maximal redundancy problem we introduce a new code, the generalized Shannon code.

In the ordinary Shannon code, the codeword assigned to $x_1^n$ for a given $P$ has length $\lceil -\log P(x_1^n) \rceil$. In the generalized Shannon code, on the other hand, we set the length to $\lfloor -\log P(x_1^n) \rfloor$ for the sequences $x_1^n \in \mathcal{L}$ and to $\lceil -\log P(x_1^n) \rceil$ for the others, in such a way that Kraft's inequality holds. For non-dyadic sources (dyadic ones fulfill $R_n^*(P) = 0$), we sort the probabilities $P(x_1^n)$, written $p_1, \ldots, p_{|A|^n}$, so that

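$$0 \le \langle -\log p_1 \rangle \le \langle -\log p_2 \rangle \le \cdots \le \langle -\log p_{|A|^n} \rangle \le 1 \quad (\text{where } \langle x \rangle = x - \lfloor x \rfloor)$$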

and choose $j_0$ to be the maximal $j$ such that Kraft's inequality still holds:

$$\sum_{i=1}^{j-1} p_i 2^{\langle -\log p_i \rangle} + \sum_{i=j}^{|A|^n} p_i 2^{\langle -\log p_i \rangle - 1} \le 1.$$

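Then $R_n^*(P) = 1 - \langle -\log p_{j_0} \rangle$, and the generalized Shannon code with $\mathcal{L} = \{1, \ldots, j_0 - 1\}$ (the indices that receive the shorter length $\lfloor -\log p_i \rfloor$) is optimal.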
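
The construction can be sketched in a few lines of Python. This is a minimal illustration under the assumption that no probability is a power of 1/2 (the non-dyadic case); the function name `generalized_shannon` and the sample distribution are hypothetical.

```python
import math

def generalized_shannon(probs):
    """Return (lengths, maximal redundancy) of a generalized Shannon code for `probs`."""
    ideal = [-math.log2(p) for p in probs]                 # ideal lengths -log2 p_i
    # Visit the symbols by increasing fractional part of -log2 p_i.
    order = sorted(range(len(probs)), key=lambda i: ideal[i] - math.floor(ideal[i]))
    lengths = [math.ceil(v) for v in ideal]                # start from the ordinary Shannon code
    for i in order:
        trial = lengths.copy()
        trial[i] = math.floor(ideal[i])                    # try the shorter length
        if sum(2.0 ** -l for l in trial) > 1.0:            # Kraft's inequality would fail
            break
        lengths = trial
    max_redundancy = max(l - v for l, v in zip(lengths, ideal))
    return lengths, max_redundancy

lengths, max_red = generalized_shannon([0.4, 0.3, 0.3])
```

For the distribution (0.4, 0.3, 0.3) only the first symbol can take the floor length, giving lengths (1, 2, 2) and a maximal redundancy of 2 + log2(0.3), about 0.263, in agreement with the closed form 1 minus the fractional part of -log2(0.3).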

Now we generalize to classes of probability distributions $S$. Let

$$Q^*(x_1^n) = \frac{\sup_{P \in S} P(x_1^n)}{\sum_{y_1^n \in A^n} \sup_{P \in S} P(y_1^n)}.$$

Then

$$R_n^*(S) = R_n^*(Q^*) + \log\left(\sum_{x_1^n \in A^n} \sup_{P \in S} P(x_1^n)\right)$$

with

$$R_n^*(Q^*) = 1 - \langle -\log q_{j_0} \rangle$$

as above.
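
The decomposition follows from the normalization that defines $Q^*$. The derivation below is a sketch; the abbreviation $D_n$ for the normalizing sum is introduced here for convenience and does not appear in the talk.

```latex
\begin{align*}
D_n &:= \sum_{y_1^n \in A^n} \sup_{P \in S} P(y_1^n),
  \qquad \sup_{P \in S} P(x_1^n) = D_n \, Q^*(x_1^n), \\
R_n^*(S)
  &= \min_{C_n} \max_{x_1^n}
     \left[ L(C_n, x_1^n) + \log \sup_{P \in S} P(x_1^n) \right] \\
  &= \min_{C_n} \max_{x_1^n}
     \left[ L(C_n, x_1^n) + \log Q^*(x_1^n) \right] + \log D_n
   = R_n^*(Q^*) + \log D_n .
\end{align*}
% Since 0 \le R_n^*(Q^*) \le 1, Shtarkov's bounds are recovered as a special case.
```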

If we now take the generalized Shannon code that minimizes the maximal redundancy, we get, for a sequence generated by a single memoryless source with $\alpha = \log\frac{1-p}{p}$ irrational, as $n \to \infty$:

$$R_n^*(P_p) = -\frac{\ln \ln 2}{\ln 2} + o(1) = 0.5287 + o(1),$$

where $\ln$ denotes the natural logarithm.
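
A numerical experiment along these lines can be sketched as follows, assuming a binary memoryless source in which each symbol equals 1 with probability p. Sequences are grouped by their number of ones k, since all such sequences share the probability p^k (1-p)^(n-k). The function name and the parameter values are illustrative; no convergence rate is claimed, and the float arithmetic restricts n to roughly 1000.

```python
import math

def max_redundancy_memoryless(n, p):
    """Maximal redundancy of the generalized Shannon code for a binary memoryless source."""
    alpha = math.log2((1 - p) / p)
    beta = -math.log2(1 - p)
    types = []
    for k in range(n + 1):
        ideal = n * beta + k * alpha                  # -log2 P(x) for any x with k ones
        frac = ideal - math.floor(ideal)
        types.append((frac, math.ceil(ideal), math.comb(n, k)))
    # Kraft slack that remains when every sequence receives the ceiling length.
    slack = 1.0 - sum(count * 2.0 ** -length for _, length, count in types)
    for frac, length, count in sorted(types):
        cost = count * 2.0 ** -length                 # extra Kraft mass if this type gets the floor
        if cost > slack:
            return 1.0 - frac                         # first type that must keep the ceiling
        slack -= cost
    return 0.0                                        # not reached for non-dyadic probabilities

LIMIT = -math.log(math.log(2)) / math.log(2)          # the limiting constant, about 0.5287
value = max_redundancy_memoryless(200, 0.3)
```

For n = 1 and p = 0.3 the function returns 1 + log2(0.7), the redundancy of the only sensible code with two codewords of length 1.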

5 Average Minimax Redundancy

In the simple case where $S$ consists of one distribution $P$, the computation of $\bar{R}_n^H$ is the Huffman problem:

$$\bar{R}_n^H(P) = \min_{C_n \in \mathcal{C}} \sum_{x_1^n} P(x_1^n) R_n(C_n, P; x_1^n).$$
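
For a small alphabet the Huffman problem can be solved directly. The sketch below builds Huffman code lengths with a binary heap and evaluates the average redundancy; the helper names and the sample distribution are illustrative assumptions.

```python
import heapq
import math

def huffman_lengths(probs):
    """Codeword lengths of a Huffman code for the given probabilities."""
    # Heap entries: (probability, tie-breaker, indices of the symbols in the subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tiebreak = len(probs)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for i in group1 + group2:          # merged symbols move one level deeper
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, tiebreak, group1 + group2))
        tiebreak += 1
    return lengths

def average_redundancy(lengths, probs):
    """Expected value of L + log2 P, that is, mean code length minus entropy."""
    return sum(p * (l + math.log2(p)) for p, l in zip(probs, lengths))

probs = [0.4, 0.3, 0.3]
huffman = huffman_lengths(probs)                       # [1, 2, 2]
shannon = [math.ceil(-math.log2(p)) for p in probs]    # [2, 2, 2]
```

On this distribution the Huffman code has a smaller average redundancy than the ordinary Shannon code, as expected from its optimality.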

From known results (where we have $\bar{R}_n^H \approx R_n^*$), we conjecture:

Conjecture 1 Under certain additional conditions, we have, as $n \to \infty$,

$$\bar{R}_n(S) = R_n^*(S) + \Theta(1) = \log\left(\sum_{x_1^n \in A^n} \sup_{P \in S} P(x_1^n)\right) + \Theta(1).$$

6 Average Redundancy for Particular Codes

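For single memoryless sources, we have explicit results as $n \to \infty$ for some codes. In particular, with $\alpha = \log\frac{1-p}{p}$ and $\beta = -\log(1-p)$, we have for the Huffman code

$$\bar{R}_n^H = \begin{cases}
\dfrac{3}{2} - \dfrac{1}{\ln 2} + o(1) & \text{if } \alpha \text{ irrational,} \\[1ex]
\dfrac{3}{2} - \dfrac{1}{M}\left(\langle Mn\beta \rangle - \dfrac{1}{2}\right) - \left(M(1 - 2^{-1/M})\right)^{-1} 2^{-\langle Mn\beta \rangle / M} + o(1) & \text{if } \alpha = \dfrac{N}{M},
\end{cases}$$

for the Shannon code

$$\bar{R}_n^S = \begin{cases}
\dfrac{1}{2} + o(1) & \text{if } \alpha \text{ irrational,} \\[1ex]
\dfrac{1}{2} - \dfrac{1}{M}\left(\langle Mn\beta \rangle - \dfrac{1}{2}\right) + o(1) & \text{if } \alpha = \dfrac{N}{M},
\end{cases}$$

and for the generalized Shannon code

$$\bar{R}_n^{GS} = \frac{3}{2} - 2\ln 2 + o(1) \approx 0.113705639.$$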
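
The average redundancy of the ordinary Shannon code can be evaluated exactly for moderate n, again assuming a binary memoryless source with symbol probability p and grouping sequences by their number of ones. The sketch below is an illustration with hypothetical names and parameters. Because the pointwise redundancy of the Shannon code is one minus the fractional part of -log P, its average is expected to be close to 1/2 when these fractional parts are spread evenly over the unit interval.

```python
import math

def shannon_average_redundancy(n, p):
    """Average redundancy of the Shannon code (lengths ceil(-log2 P)) for block length n."""
    alpha = math.log2((1 - p) / p)
    beta = -math.log2(1 - p)
    total = 0.0
    for k in range(n + 1):
        ideal = n * beta + k * alpha                       # -log2 P(x) for any x with k ones
        weight = math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        total += weight * (math.ceil(ideal) - ideal)       # pointwise redundancy of this type
    return total

value = shannon_average_redundancy(200, 0.3)
```

Values of p for which the ratio log((1-p)/p) is rational, or close to a fraction with a small denominator, can show visible oscillations around 1/2 as n varies.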

For more basics and in-depth knowledge regarding analytic information theory, the interested reader is referred to Szpankowski's book [4].

References

[1]: Flajolet (Philippe). -- Analytic information theory and the redundancy rate problem [ summary of a talk by Wojciech Szpankowski ]. In Chyzak (Frédéric) (editor), Algorithms Seminar, 1999--2000, pp. 133--136. -- Institut National de Recherche en Informatique et en Automatique, November 2000. Research Report n°4056.
[2]: Flajolet (Philippe) and Prodinger (Helmut). -- Level number sequences for trees. Discrete Mathematics, vol. 65, n°2, 1987, pp. 149--156.
[3]: Shannon (C. E.). -- A mathematical theory of communication. Bell System Technical Journal, vol. 27, 1948, pp. 379--423 and 623--656.
[4]: Szpankowski (Wojciech). -- Average-case analysis of algorithms on sequences. -- John Wiley & Sons, Chichester, New York, March 2001, Wiley-Interscience Series in Discrete Mathematics.
