Previous News
Next News
Issue last updated August 20, 2000.
The Sixth Seminar on Analysis of Algorithms was held in Krinica Morska, an hour away from Gdansk, in a nice see resort. Attendees had the whole hotel all by themselves. The organization was superb, due to the efforts of Chairman Szpankowski and the constant tactful monitoring of the local organizers, especially Ryszard Sobczak and Andrzej Kusiuk. Nearly 60 participants came from four continents, with a bulk of the participants from Austria, France, and North America. The Piwo [beer] and Zubrówka [Polish vodka with the herb on which the wild buffalo feeds] were flowing as participants needed some ``magic potion'' to absorb the contents of some 43 lectures and communications. A nice feature this year was the possibility of having more of the younger gang, many of which talked on their Ph.D. thesis topic and gave excellent presentations.
There were five main lectures exploring probabilistic and analytic aspects of analysis of algorithms as we perform it to-day. (An account of other lectures will be given in a later issue of this bulletin).
![]() |
(1) |
![]() |
(2) |
There are several interesting points in Aldous lecture. First, the general approach of the probabilistic methods consists in designing an infinite (continuous) model in which the finite scale models are immersed; see Aldous' continuum random tree. This is dual to analytic-combinatorial methods that aim at an exact modelling by generating function complemented by subsequent asymptotic analysis. (``First approximate, then analyse!" versus ``First analyse then approximate!'') Second, Aldous spent quite some time advocating ``pure thought'' proofs: this is the way he envisions the probabilistic approach. This made me wonder, however, as to the amount of technology that is needed. My impression is that everything is in the eye of the beholder, and perhaps what is ``pure thought'' for some is hard work for others? Conversely, perhaps, we, analysts, should devote more time structuring our proofs by taking the ``pure thought''motto as an inspiration?
A last fact regarding this motivating lecture. One may consider the
analogous problem of the cost of a minimal spanning tree of
Kn with edge weights that are uniform
(0,1). Then Frieze showed that
the expected cost tends to
as
. I am not aware
of a finite n version of Frieze's result.
Since 1986, a number of researchers have tried to quantify the
distribution of height. This includes Devroye, Reed, Drmota, and
Robson. There is in fact a fierce competition between probabilistic
and analytic approaches, with Drmota's approach illustrating the
latter. Perhaps
there is a whole family of discrete limit distributions depending
on the values of n.
Drmota shows that the variance of tree height is O(1),
a result obtained independently and simultaneously by Reed using
probabilistic arguments.
Such properties are also related to second order
asymptotics of mean height.
Drmota's approach is based on the following nonlinear
differential recurrence for the GF of trees of height at most h:
| |
(3) |
| |
(4) |
| |
(5) |
![]()
With almost complete certainty, Drmota's analysis captures precisely the truth . This leaves however as an important open problem to establish (4): see the open problem section.
![]() |
(6) |
Many limit distributions of analytic combinatorics are known to be attainable though perturbation of a singularity analysis or a saddle point analysis. The approximations are of an exponential quadratic form, e-x2, which usually leads to Gaussian laws. However, when there is some confleunce of singularities or some ``coalescence'' of saddle points, approximations of a more complicated form should be sought. Precisely, coalescence of two saddle points is known in applied mathematics to lead to expressions involving the Airy function.
The talk then focusses on the case of random maps. Recall that a map
is a connected planar graph given together with a rigid embedding on
the plane or the Riemann sphere. Consider now the
core which is the largest 2-connected
component of a map (this is in essence the largest submap
obtained by breaking the original map at its articulation points).
Then core size admits a limiting distribution that has several
surprising features: the tails are highly dissymmetric,
decaying like x-5/2 on the left and like e-x3 on the right.
We proposed to call the distribution
the ``map-Airy'' distribution: it arises
precisely from a confluence of two saddle points (as seen via
Lagrange inversion) or, equivalently, from a certain type of confluence of
singularities (in the realm of the original generating functions)
and it involves the Airy function--whence the name given to the
distribution.
Indeed, the map-Airy distribution is found to have density
| |
(7) |
The talk by Soria replaces these results into the more general framework of composition of singularity schemes. The talk by Schaeffer makes explicit the generality of the approach. In fact a dozen or so types of maps exhibit the distribution (7) and this has implication in the fast random generation of maps with higher connectivity indices (till 4, as of now). Finally, readers of these pages have already heard about the Airy function, e.g., in the context of linear probing hashing. As a matter of fact, there is good hope to attack the evolution of the random graph Gn,m (n vertices and m edges) and of linear probing hashed tables by means of coalescing saddle points. See the AofA logo.
The first method builds upon a direct description
of Dmn in terms of a simple functional of the occupancy profile
of the table (that is, how many elements Xi adopt cell i as their
fisrt choice). In the coalescence regime R3,
this quantity becomes a functional of the Brownian bridge
in the asymptotic limit, namely,
| |
(8) |
The second method is purely analytic. It makes use of the exact combinatorial correspondance between connected graphs and linear probing hashing (Knuth, Algorithmica, 1998). A careful analysis of generating functions in the coalescence region gives access to the moments that in turn determine the law Wa.
The third method deals with the intermediate range R2. Then, there is a fairly large number of blocks in the table, each with enough variability, so that a normal law is expected. Here, this is established by a general lemma giving the distribution of sums of 2-dimensional random vectors conditioned by one of their coordinates. The approach is technically similar to earlier saddle point approaches but phrased in more probabilistic terms.
The fourth method deals with the very sparse regime and is much simpler than the other ones. Roughly, in the region R1, collisions are rare and to first order asymptotics only blocks of size 2 tend to be created. For such rare events, Poisson approximations are found by well established methods (moments or Stein's method) of the theory of rare events.
![]() |
(9) |
As exemplified by Equation (9),
many alternating sums of interest to us involve binomial
coefficients. The talk offered a perspective
on both classical and newer instances.
The starting point is known amongst computer scientists as ``Rice's
formula'', following exercises in Knuth's books and the contributions
of S.O. Rice who made use of it in several discrete models.
In fact, the formula is developed already by Nörlund in his famous
Vorlesungen über Differenzenrechnung
(Kopenhagen, 1923). This formula reads
![]() |
(10) |
A new application of Rice's integrals is describes in the talk. It is relative to distributed maxima finding and sorting. There, n persons communicate over a shared access channel (if two or more talk at the same time, the communication is scrambled). The three algorithms studied in this context are: maxima finding; selection sort; quicksort. The point is that, in such an environment, finding the maximum or the pivot of quicksort now requires a variable number of communications over the channel. What is employed for resolving conflicts over the channel is in essence the leader-election algorithm of Prodinger (``How to select a loser'', Discrete Mathematics 1993), itself a simplified version of the tree protocol of Capetanakis-Tsybakov-Mikhailov. Since the protocols are closely related to digital trees, the analyses involve a mixture of recurrences. The paper pointed at below shows that, indeed, Rice integrals and allied techniques are instrumental and provide very precise information on the behaviour of the three algorithms. Technically, the integral representations involve the Riemann zeta function that lifts the Bernoulli numbers present in the analysis. (This is also related to what Knuth calls ``the hardest asymptotic nut that he ever had to crack''.)
Naturally, there is a general philosophy behind Rice's formula: In
order to cope
with a finite sum, extrapolate analytically the coefficients as
well as the weights
so as to obtain an equivalent complex integral representation
by virtue of the residue theorem. Prodinger showed in his lecture how
this could be extended to q-binomial alternating sums
using functions akin to the q-Gamma function that itself lifts the
q-factorials
![]()
To be continued!