A Combinatorial Approach to Golomb Trees

Mordecai J. Golin

Hong Kong University

Algorithms Seminar

September 22, 1997

[summary by Philippe Dumas and Michèle Soria]

A properly typeset version of this document is available in postscript and in pdf.

If some fonts do not look right on your screen, this might be fixed by configuring your browser (see the documentation here).

Abstract
Given a set of weights, the problem of finding the binary-tree with minimum weighted external path length is very well understood. It can be solved using Huffman encoding. The problem of finding such an (infinite) tree, with minimal path-length for an infinite set of weights, is not nearly as well studied. Twenty years ago Gallager and Van Voorhis described such trees for the case in which the infinite set of weights is a geometric series. These trees are now known as Golomb trees. Here, the problem is handled with a combinatorial approach.



Let F be an alphabet equipped with a probability distribution. The problem is to encode the alphabet into a language on the binary alphabet {0,1}, in such a way that the codeword length mean value is minimal. Such a code is said to be optimal. For a finite alphabet, the problem is known to be solved by Huffman encoding [3]: a tree is built in which each leaf is associated to a (prefix-free) codeword. Hence the path-length of the tree is the codeword length.

When the alphabet is infinite, the problem is solved only for the geometric case, that is the case when the set F is an infinite sequence a0,a1,... and the probability of letter ai occurring in a message is (1-p)pi with 0<p<1. For example, suppose that we have a string of x's and y's in which each character occurs independently of every other one, x's occurring with probability p, and y's occurring with probability 1-p. Every infinite message can be uniquely written as the concatenation of words ai=xiy, each ai occurring with probability (1-p)pi. The geometric case was studied by Gallager and Van Voorhis [1], who exhibited an optimal tree. Their technique is to construct the Huffman tree for each finite case {a0, a1, ...,an}, and take the limit in some sense when n goes to infinity. They show that the infinite limit tree is an optimal tree.

Golin's approach is based on combinatorial transformations of trees, which preserve optimality. For his purpose, the important combinatorial feature of the tree is not the whole topological structure, but only its profile, that is the number of internal nodes at each level. He extends the problem to d-ary trees. Considering the number pÎ]0,1[ and the integer d³ 2, there is a unique positive integer m which satisfies
pm+pm+1£ 1< pm+pm-1.
Define ak to be the unique positive root of equation
1-a =a k(d-1)(1-a d),
with the particular case a0=0. Using this notation, Golin's result can be stated in the following way.

Theorem 1   If am-1<p<am, then there is a unique optimal tree profile: the first levels from the root are 1,d,d2,... as long as the powers of d are smaller than m, and all the next levels are equal to m.

If
p=am, then there is an infinite set of optimal tree profiles. They all begin as in the previous case, but after the transition each level is either m or m+1.

Notice that this result extends the work of Gallager and Van Voorhis, who did not study the uniqueness of the solution.

The key point is that the geometric character of the distribution entails that the width of an optimal tree at each level is bounded. The proof is valid only for the geometric distribution, since it strongly uses the fact that the shift from a level to the next one translates into a multiplication of the weights pi's. It does not extend to other types of distributions.

References

[1]
Gallager (Robert G.) and Van Voorhis (David C.). -- Optimal Source Codes for Geometrically Distributed Integer Alphabets. IEEE Transactions on Information Theory, March 1975, pp. 228--230.

[2]
Golin (Mordecai J.). -- A Combinatorial Approach to Golomb Forests. -- September 1997. Preprint.

[3]
Huffman (D. A.). -- A Method for the Construction of Minimum-Redundancy Codes. Proceedings of the IRE, n°40, September 1952, pp. 1098--1101.

This document was translated from LATEX by HEVEA.