ANALYSIS of ALGORITHMS [RN 11-97]

Research News, November 1997

Previous News Next News

These pages are edited by Philippe.Flajolet@inria.fr.

Issue of November 1997.

[02/11/97] Knuth's first analysis of an algorithm. Thanks to the web, everybody can now have easy access to an original of Don Knuth's NOTES ON "OPEN" ADDRESSING . The note has 6 pages. It provided for the first time a precise analysis of hashing with linear probing. As Knuth repeatedly indicates, this problem has had a strong influence on his carreer.

The notes are dated dated July 27, 1963, with a handwritten mention
"My first analysis of an algorithm, originally done during Summer 1962 in Madison."

There, you'll find for the first time the connection between the analysis of full tables and what was to become later known as the Knuth-Ramanujan function Q(n). Of course, the analysis of sparse tables is also there.

==============

[02/11/97] The Ramanujan-Knuth Q-function. Knuth's note of 1963 concludes with a conjecture about the asymptotics of the Knuth-Ramanujan function (which Knuth proved in 1965). We now know that this function together with a statement of its asymptotic behaviour was in fact already appearing in Ramanujan's first letter to Hardy dated Januray 16, 1913. Here is a version of a paper that provides a recent asymptotic analysis of this function (P. Flajolet, P. Grabner, P. Kirschenhofer, H. Prodinger. In J. Computational and Applied Mathematics, vol. 58 (1), 1995, pp. 103-116). Bruce Berndt has devoted a considerable part of his life to editing Ramanujan's works. See his home page for more about this fascinating adventure.

As we see from Knuth's note, Ramanujan's analysis entails that the total displacement of elements, i.e., the construction cost, for a linear probing hashing table that is full is about N^(3/2).

[02/11/97] Cost distribution in linear probing hashing. In the July issue of Research Notes, we saw that Poblete-Viola and Knuth had independently solved the problem of the determining the variance of the construction cost of linear probing hashing. Knuth's analysis from July 1997 is available on the web here. In July 1997, Poblete and Viola merged forces with Flajolet. As a result, the variance analysis can be in fact extended to a full distributional analysis of linear probing hashing. The report is available on the web.

Here's what goes on. For sparse tables, that is to say tables whose filling ratio a is less than 1, then there are a large number of clusters that tend to be small. Accordingly, the distribution of costs is Gaussian in the asymptotic limit. For full tables, the distribution has a standard deviation that is of the same order as the mean and thus it is rather spread out. In the limit, what we have for this case of full tables is a so-called Airy distribution that is related to our pet icon and involves Airy functions.

For reasons explained in the above papers, the Airy distribution appears in a diversity of combinatorial problems like: (i) path length in trees, (ii) area below random walks and excursions, (iii) inversions in trees, (iv) connectivity in random graphs.

Here's the way clusters are formed in a linear probing table as it fills up (from top to bottom). See Sedgewick's new book for more of such displays (and also better ones!!)

[10/11/97] Knuth's original note in TeX. Here is the TeX version of Knuth's "Notes on `Open' Addressing". Helmut Prodinger has very kindly retyped them into TeX for everyone to see and enjoy.
Versions: [TeX | dvi | postscript | ps.gz]

[30/11/97] Odlyzko's monograph on "Asymptotic enumeration methods". In 1995, Elsevier published the Handbook of Combinatorics edited by R. L. Graham, M. Groetschel, and L. Lovasz. Andrew Odlyzko has there a survey of asymptotic enumeration methods (Volume 2 Chapter 22, pp. 1063-1229) that is really a thorough monograph of some healthy 166 pages! A preprint version of this inspiring text is on the web, as well as many other goodies that Andrew is offering on his Home Pages. I cannot resist listing the table of contents from this document that is a must for anyone interested in combinatorial asymptotics.

• Identities, indefinite summations. Basic estimates of factorials and binomial coefficients. Estimates of sums (alternating, Euler-Maclaurin and Poisson summation, bootstrapping, integrals). Generating functions (GF's). Formal power series. Elementary estimates for convergent GF's. Recurrences. Analytic generating functions. Small singularity of analytic functions (singularity analysis). Large singularities of analytic functions (saddle points). Multivariate GF's. Mellin and other integral transforms. Functional equations. Algorithmic and automated asymptotics.

It is amusing to contrast a quotation of Odlyzko (1995)

• "Analytic methods are extremely powerful and when they apply, they often yield estimates of unparalleled precision."
and a quotation of John Riordan (1968):
• "Combinatorialists use recurrence, generating functions, and such transformations as the Vandermonde convolution; others to my horror, use contour integrals, differential equations, and other resources of mathematical analysis."
This was 29 years ago...

Under Odlyzko's page, you'll find a lot of relevant papers from the guru of analytic methods. :-) Here are some that are relative to general methods and are offered electronically:

• Analytic methods in asymptotic enumeration, A. M. Odlyzko, Discrete Math. 153 (1996), pp. 229-238.
• Asymptotic enumeration methods, A. M. Odlyzko, in Handbook of Combinatorics, vol. 2, R. L. Graham, M. Groetschel, and L. Lovasz, eds., Elsevier, 1995, pp. 1063-1229.
• Singularity analysis of generating functions, P. Flajolet and A. M. Odlyzko, SIAM J. Discrete Math., 3 (1990) pp. 216-240.
• Asymptotic expansions for the coefficients of analytic generating function, A. M. Odlyzko and L. B. Richmond, Aequationes Math., 28 (1985), pp. 50-63.
• Explicit Tauberian estimates for functions with positive coefficients, A. M. Odlyzko, J. Computational Appl. Math., 41 (1992), pp. 187-197.