ANALYSIS of ALGORITHMS [RN 10-97]

## Research News, October 1997

Previous News Next News

These pages are edited by Philippe.Flajolet@inria.fr.

Issue of October 1997.

[28/10/97] Large deviations and Quicksort. Given a parameter like the cost function of an algorithm, mean and variance estimates provide valuable informations on the distribution. For instance, this is the basis of the second moment method: if a parameter has a standard deviation much smaller than the mean, then the distribution is concentrated around the mean. The book by Motwani and Raghavan, Randomized Algorithms provides several illsutrations of this basic fact.

Often tail results stronger than what moment methods provide are available. The probability of being far away from the mean may decay exponentially or even superexponentially fast (in units given by the standard deviation).

In a paper entitled Large Deviations for Quicksort that appeared in Journal of Algorithms, 21 (1996), pp. 476-507, McDiarmid and Hayward provide such bounds for the Quicksort algorithm. Their main theorem provides very precise quantitative information and they extend it to the classical median of 3, 5, etc, improvements of Quicksort.

There's a recent preprint entitled "Quicksort algorithm again revisited" by Knessl and Szpankowski that deals with a very detailed study of these bounds and is available on the web. What Knessl and Szpankowski do is to start from the basic recurrence over probability generating functions and proceed via nonlinear integral equations that approximate them. The method uses matched asymptotic expansions and the WKB method from applied mathematics. It is found that the limit distribution (known to exist by works of Regnier, Hennequin, and Roesler, 1989-1992) has a left tail that is much "thinner", i.e. doubly exponential than the right tail that is only simply exponential.

Remember that it is an outstanding open problem to determine whether the limit distribution (see the icon below for the exact distributions at n=20,25,30) can be characterized in terms of classical functions of analysis. At least, we know that the moments of the limit distribution are expressible in terms of values of the zeta function at the integers.

[28/10/97] Random generation of combinatorial structures. Frank Ruskey has joined these pages. Frank has done extensive research on random generation and ranking-unranking. You want to test a conjecture, need to simulate an algorithms on certain inputs of some sort. How to generate a random instance or list all instances of a fixed size? Examples may be permutations of a multiset, graphs, rooted trees, balanced trees, etc. The Combinatorial Object Server provides access to a rich collection of algorithms interactively on the web.

For combinatorial objects that are nicely decomposable, the Combstructpackage is available under Maple starting with version 5.4. For instance, you can draw at random an unlabelled binary tree of size 50 by the command

draw([B,{B=Union(Z,Prod(B,B))},unlabelled],size=50);
and all legal "grammars" are supported.

An excellent introduction to these questions is still the classic book Combinatorial Algorithms by Nijenhuis and Wilf (second edition, Academic Press, 1978). By the way, a related book called Algorithms and Complexity by Herb Wilf is now freely available on the web.

[29/10/97] Rooted trees. Frank Ruskey's pages also contain a lot of information on classical combinatorial problems. Above is a display of all rooted trees of size 5 found on his pages.

The counting sequence under combstruct can also be obtained automatically by the simple sequence of commands (Maple version 5.4):

with(combstruct);
RootedTrees:=[T,{T=Prod(Z,Set(T))},unlabelled];
seq(count(RootedTrees,size=j),j=1..25);

1, 1, 2, 4, 9, 20, 48, 115, 286, 719, 1842, 4766, 12486,
32973, 87811, 235381, 634847, 1721159, 4688676,
12826228, 35221832, 97055181, 268282855, 743724984, 2067174645

This just says that we have combinatorial objects (T) defined recursively as an "atomic" node (Z, by convention) attached to a multiset of similar objects (of type T).

Given an integer sequence like this, it is always a good idea to check the Encyclopedia of Integer Sequences [EIS] by Sloane and Plouffe (Academic Press, 1995. ISBN: 012-558630-2). Sure enough, the sequence appears there as M1180. One can even access the list (and many more sequences) on line! There, we'll find automatically for the sequence 1, 1, 2, 4, 9, etc:

%N Rooted trees with n nodes.
%F G.f. satisfies: Sum a(n) x^n = Product(1-x^{n+1})^{-a(n)}.

and much more!

See also the Resouces pages of AofA for many other interesting sites.

[29/10/97] New books. The new edition of Knuth's the Art of Computer Proramming is available, as publicized in the July issue of Research News.

I have also received the latest edition (the third) of Sedgewick's Algorithms in C (Addison-Wesley, 1997; ISBN 0-201-31452-5). What was the earlier book is now split into two volumes (!). The material of the current volume concerns fondamentals, data structures, sorting, and searching, and it has been considerably expanded. I especially enjoy the visual traces corresponding to various algorithms and various data sets. Very clever! New data structures have also found their ways into the book, like skip lists, ternary search tries, randomized binary search trees, and so on. All, as usual, carefully and simply implemented. The back cover suggests that the distribution of the previous editions was around 250,000. I guess that if the new edition is equally successful, Bob will have no trouble buying a beer to all members of this list. (Subscribe!) :-) :-)