Previous News
Next News
Issue last updated July 30,
1998.
[30/07/98] The Princeton Meeting on Analysis of Algorithms. Here is a first account. I will provide further summaries of results in the forthcoming Research News bulletins.
Please send to me or to the Bulletin Board pointers to the paper that matches best your presentation at the meeting. If you don't have personal web pages, I can also store copies here. This will be useful to me in preparing this series of accounts and will enable others to have access to your text. Thanks!
Analysis of algorithms is almost as old as algorithms themselves. An example that I am particularly fond of is the design by Babbage around 1837of his analytical engine that was truly a universal calculator. When considering two different ways of performing multiplication, Babbage compares them according to their natural cost, namely the number of turns of the handle! A similar remark applies to Von Neumann in the 1940's when he discussed binary versus decimal designs for digital computers and came up with an early analysis of carry propagation in adders!
Algorithms are flourishing all over computer science and at the same time there takes place a tangible shift towards more effective, more algorithmic, and more computational mathematics. It is thus fitting to have a series of conferences specifically devoted to the analysis of algorithms. The fourth such meeting was just held in Princeton, during the week of July 20-25, 1998. About 50 people registered with an attendance usually close to 35 or 40. An earlier thread describes the context and programme of the meeting. I will attempt in this and the next column a synthesis of the many interesting communications.
Lectures at the meeting largely focussed on analytic methods. These are based on exact descriptions of combinatorial configurations and costs by means of generating functions (GF's). There, techniques from modern combinatorial analysis have proven their efficiency. Indeed, the meeting largely confirmed the drift from elementary attacks based on recurrences (i.e., what happens when size increases from n to n+1?) to more ``global'' methods that are based on combinatorial decompositions and on the symbolic translation into generating functions. In this way, problems that would have seem incredibly intricate a few decades ago often receive nowadays a direct and transparent treatment. As could be seen at the meeting, the symbolic approaches make progress both in the variety of objects and in the complexity of parameters considered on strings, trees, special graphs, permutations, etc. At the same time new directions emerge like the highly interesting use of operator methods from dynamical systems theory.
Asymptotic methods continue to play an important rôle. There again, the drift is from real analysis (e.g., approximations of sums by integrals) to complex variables methods. Singularities and saddle points are then essential. Of course, a benefit of these approaches is that we no longer need to relie on explicit forms of counting sequences but can deal instead with a great variety of generating functions that are only implicitly defined by functional equations. The newly emerging area of analytic information theory is deeply rooted in these principles.
Researchers in analysis of algorithms nowadays tend to ask more demanding questions. Twenty-five to thirty years ago, as we see it when reading The Art of Computer Programming, average-case estimates were the ultimate goal. Much more attention is devoted nowadays to probability distributions. It is a recognized fact (obvious from simulations) that almost any ``reasonable'' algorithm has its cost function governed by a limit probability distribution. Variance analyses and large deviation inequalities usually result from these studies. They enable us to quantify the risk of an exceedingly long running time; put otherwise, they provide pretty valuable indications on the relevance of our average case analyses.
Limit probability distributions thus abound in random combinatorial structures and in companion algorithms. This was a recurrent theme of the meeting. A priori, two different types of approaches are possible.

The algorithmic problems considered at Princeton can be accommodated broadly into the following categories:
I propose to elaborate on results of the Princeton meeting in this and the following chronicles of the AofA pages. Let's start with trees, a subject with which I am a bit more familiar.
Search trees are well-known to be a way of organizing searches that provides logarithmic access costs while accommodating data that change dynamically under the effect of insertions and deletions. They are useful if data are in sufficiently random order or, better, if some randomization scheme is used (for instance the scheme of Martinez and Roura ). In addition, a well known equivalence principle asserts the equivalence between random binary search trees (BST's) and Quicksort applied to random data.
![]()
![]()
There Y is a symbol for a random variable (RV) whose distribution is the
desired limit, Y', Y'' are independent copies
and U is a uniform RV.
This reads of quite easily: ``After normalisation with respect to the
mean (C(x) takes care of this), the cost of quicksort is that of its
two recursive calls
(Y',Y'') but applied to data of size a fraction U and (1-U) of
the original size.''
In particular, the k-th moment of this magic limit law is a polynomial
in the zeta values
, a fact first observed by
Hennequin that becomes especially easy to prove from the recursive
equation above.
In a similar vein, Fill considered path length of so-called recursive trees (the unfortunate term is due to Meir and Moon). These are labelled non-plane trees whose labels go in increasing order away from the root. Their number is plainly (n-1)! as can be seen from the fact that their exponential generating function satisfies
![]()
Golin has examined the depth of random circuits, a problem motivated
by the study of boolean models of computation. Start with a small
collection of source gates (nodes) and successively create new gates
(nodes) that are attached to f (the ``fan-in'' parameter) previously
existing nodes. For instance, with a single source and
with a fan-in of
f=1, the number of possibilities is plainly n! and we
rediscover slightly
disguised the recursive
trees considered by Fill.
A tough problem is that of the expected height
(longest branch) of these beasts, which
corresponds to a parallel computation time of the circuit. At ESA'96
Tsukiji and Xhafa presented an extended abstract that claims that
the expected height is
, with e=2.71828...
Golin showed methods reminiscent of those first used by Robson in 1979
in the case of search trees in order to establish rigorous upper and
lower bounds.
It is somewhat of a mystery to me that the constant e appears
similarly in the height of union-find trees (Devroye 1987), where
elements gradually merge into classes and the usual union-find
algorithm based on trees is employed.
(I believe that Devroye's result
also provides the analysis of the one-sided height of
standard bst's, hence, the height of recursive trees as considered by
Fill.) At any rate, this suggests
that there might be a rich analytic-combinatorial
structure behind the elegant model of random circuits.
Many efficient algorithms are based on some sort of ``switching'' when small subfiles are encountered. For instance, most practical quicksort implementations will switch to a simpler nonrecursive algorithm, like insertion sort, below a size threshold of maybe 10 or 20. Hwang's contribution addresses the analysis of such switching algorithms. What happens when using median-of-three on large files (in fact for k levels of recursion) but only standard quicksort afterwards. Hwang's result has the remarkable feature to provide uniform asymptotics for the whole range of values of k. Though there is no clear probabilistic reason, the gaussian error function pops up in the analysis. A copy of Hwang's paper has been snatched and put here in postscript.
Median variants of quicksort are also equivalent to locally balanced binary search trees, where only partial restructurings are performed (on fringe subtrees of size 3, or 5, or 7, etc). Prodinger considers the case of 3-rebalancing in a neat paper that is available on-line. The problem is to estimate the number of rotations, but not only on average (that has been known for some time) but also in distribution. The differential equation of the bivariate gf tells the story
![]()
Last in this chronicle there comes the talk by Martinez.
The random binary search tree model also applies to
multidimensional search and the k-dimensional (``k-d'') tree
of Bentley; see the figure above. Flajolet and Puech had analysed the
expected cost of
partial matches where s out of k attributes (coordinates)
are specified. (This took place in prehistorical pre-web times.) For
k-dimensional data, one has to study singularities of a
k-dimensional linear system that reflects the successive choices of
discriminating attributes in the cyclic order
.Martinez (with Panholzer and Prodinger, see the
rich paper
under Prodinger's home page) proposes instead to randomize on
which attribute is taken as
a discriminant at each node in the tree. Wooops! All of a sudden, all
one has to cope with is a second-order linear differential equation
that even admits of
a hypergeometric solution. It's good and everything becomes then much more
explicit. Sure enough, you lose a little bit in the exponent
when compared to the standard algorithm that has, e.g., for
(s,k)=(1,2),
an expected cost of
![]()