Research News, August 2001
Previous News
Next News
Return to Research News Index
These pages are edited by
Philippe.Flajolet@inria.fr.

Tatihou poster: original design by Mariola
Szpankowska. Brigitte Vallée and others at Tatihou.
The AofA Metting in Tatihou, July 1-7, 2001
The Seventh Seminar on Analysis of Algorithms was held
in Tatihou, a truly amazing place. Tatihou is a small island about half a
mile off the coast of Normandy in the (English?) Channel.
The isle is small but it contains a fort controlling the
bay built in the seventeenth century by Vauban
(a famous French military architect). There is also a bird
reserve, a maritime museum, and, of course, a sensational
conference center operated by the local county council.
Brigitte with her team, Ali Akhavi, Jérémie Bourdon,
Julien Clément, was responsible for making this venture possible.
Centre's portal; Abelard [left] with Julien
Clément, Brigitte Vallée, Ali Akhavi,
Julie Vallé, and Benoit Daireaux.
One of the buildings; the Vauban Tower.
There were eight long lectures, of an hour and a quarter each, which
provided ample time for speakers to develop their ideas.
These were given at the start of each half-day by Hwang, Chauvin,
Viola, Szpankowski, Devroye, Pittel, Flajolet, and Schaeffer.
Here's a brief summary.
- Hsien Kuei Hwang investigated the method of moments in relation
to the analysis of recursive algorithms of the divide-and-conquer
type. The method of moments is familiar from probability theory:
for a
distribution
characterized by its moments (this is the case for most
usual distributions, including the Gaussian law), convergence of
the moments of random variables
to those of
implies
convergence in distribution of
to
, namely,
.
In analysis of algorithms, the
are a normalized form
(usually centred around the mean, possibly rescaled by a standard
deviation) of the cost of an algorithm over inputs of size
.
Hsien Kuei examines various examples related to Quicksort,
Quickselect, Mergesort, Digital Trees, Leader Election (in a
distributed environment) and so on. In all these cases, there is a
recurrence satisfied by the probability generating function
(or equivalently, the moment generating function
of the
. One gets the first two moments by the usual devices. Then, it
is often possible to guess--then prove--a form of the limit distribution as it
should satisfy approximately the basic recurrence on probability (or
moment) generating functions. Hsien Kuei gives many examples of this
approach. The speed of convergence then derives from the Berry-Esseen
inequalities (reminder: Berry-Esseen translates distances between
characteristic functions into distances between distribution
functions). There is here a method of considerable generality which
in a sense parallels the recursive method developed by Rösler,
Rüschendorf, and Neininger in similar contexts.
Many papers by Hsien
Kuei nicely illustrating this method are to be found at
his web site: see esp. his
publications in years
.
As related reading, I also recommend the study by Boris Pittel,
``Normal convergence problem? Two moments and a recurrence may
be the clues'', Ann. Appl. Probab. 9 (1999),
1260-1302, corresponding to his talk at the AofA'98 in Princeton.
- Brigitte Chauvin helped many of us discover the power of
martingale arguments in the analysis of algorithms and data
structures. Brigitte
started her by a comprehensive review
tuned to the needs of analysis of algorithms.
She then proceeded to develop a
martingale argument due to herself and her student Jean Jabbour
which gives truly surprising results regarding the
probable shape of binary search trees. Take a random binary search
tree and assocaite to it its level polynomial
defined by teh
fact that
is the number of external nodens at
level
. This is a random polynomial whose expectation is well known
to equal
. The big news is that
the normalized sequence of (random) polynomials
form a martingale. The consequences are cornucopious: this gives
quantitative properties that hold almost surely when a binary search tree
evolves under successive insertions from the empty tree till the end of
time--a point of view that differs somewhat from what is done usually
in AofA. It also shows that the profile of the tree is in probability
almost Gaussian. This property was known in the expected value sense since
the 1980's (Louchard). Here, the strong form of this property says the
following: simulate a random BST of size (say)
and build the
histogram of depths of extrenal nodes; you will only find bell-shaped
histograms. This property had been conjectured by Sedgewick and myself
when we did some empirical simulations, and much to our astronishment
invariably found bell-shaped images like this one.
Another surprising consequence is that the width of a random BST is in
probability
. It's hard to imagine such a
result to be accessible by analytic methods alone!
See the tutorial by Brigitte Chauvin
``Discrete martingales and applications to analysis of algorithms''
[for those who read French].
See the
joint paper
with Drmota and Jabbour for martingales associated to BSTs.
- Alfredo Viola examined Rabin's irreducibility test for polynomials.
We are talking here of algorithms that operate on polynomials, which are
basic objects of symbolic manipulation systems, but also of
cryptography and of the theory of error-correcting codes.
Following a tradition started by Berlekamp, Knuth (in
volume 2 of TAOCP) showed that we can subject these algorithms
to the usual questions: What is the expected complexity?
What is the distribution of costs? The subject is currently pursued by
Daniel Panario (Carleton University, Ottawa) and Alfredo Viola
(Montevideo) and we start knowing quite a bit on these algorithms.
For instance, the usual polynomial factorization chain
(based on elimination of repeated factors, distinct degree
factorization, and Cantor-Zassenhaus equal-degree factorization) has
been completely analysed in Journal of Algorithms 40(1), 2001,
pp. 37-81. The interest of these studies is enhanced by the fact that
the algorithms are often probabilistic in nature, so that the
randomness assumptions apply to perfection. In his talk, Alfredo
reviewed Rabin's irreducibility test which answers the question:
When is a polynomial irreducible (i.e., not factorable, a sort of ``prime'')?
He presented results from the paper by
Daniel Panario, Boris Pittel, Bruce Richmond, Alfredo Viola,
``Analysis of Rabin's irreducibility test for polynomials over finite
fields'', Random
Structures and Algorithms 19(3-4): 525-551 (2001)
which improves on the earlier LATIN-1998 conference version.
Alfredo, where's your home page with your publications?
- Wojtek Szpankowski
presented a survey talk on
New and Old Problems in Pattern Occurrences.
Let
and
be sequences generated ovr
er a finite alphabet. We call
the pattern and
the text. The basic questions are
How many times does
occurs in
?
how long one has to wait until
occurs in
?
In the string matching problem the pattern is considered to be a
string of consecutive symbols (a ``factor''), while in
the subsequence matching problem the pattern is a subsequence.
We may ask that the pattern occurs either exactly or approximately.
In all cases, we assume that the
pattern is given and
the text is random and generated among all
possible strings
by some source model of sorts, the simplest cases being the uniform model,
the Bernoulli, and the Markov sources. Wojtek
showed us how to compute moments, limiting distributions and large
deviations for the number
of pattern occurrences and the waiting time.
Throughout he uses combinatorial (e.g., symbolic) and analytic
methods, based on
generating functions (GFs) and singularity analysis.
For this dense survey, we refer to Wojtek's recent book and to his web
page. I note that the limiting distributions are usually normal, a
fact that may arise from moment methods (based on GFs) or from
Quasi-Powers approximations associated to multivariate generating
functions that are rational functions. This is an area started by
Guibas and Odlyzko (who developed the correlation polynomial) in the
early 1980's, then pursued by Mireille Régnier, Wojetk,
Hosam Mahmoud, and
several others in the
ensuing decade. A recent
unification of seemingly very different source models is due to
Brigitte Vallée, our gracious hostess at Tatihou;
see her model of ``dynamical sources''.
A zillion
publications
are available from Wojtek's web pages.
Go get Wojtek's book
Average Case Analysis of Algorithms on Sequences, Wiley, 2001.
Brigitte's
Research Pages including papers on dynmaical sources start from here.
- Luc Devroye surveyed the ubiquitous digital tree also known
as trie. This wonderful structure can be approached by exact
analytic models (ordinary, exponential, or Poisson generating
functions), Mellin transforms, and saddle point methods,
including Jacquet and Szpankowski's analytic depoissonization,
whenever the source model is simple enough (typically, Bernoulli or Markov).
It can also be approached by probabilistic methods and Luc has been a
pioneer in this area. Luc started by recalling a few basic facts
for a binary alphabet and a trie built on i.i.d. equally probable
letter, as regards depth of a random node, height, and fill-up level:
These are results ``in probability'' (hence the
symbols).
The fill up level is the maximal level at which no external node
appears. The estimate on fill-up level is of particular interest given
the structure of Level Compressed trie [LC trie] invented a few
years ago by Nilsson and Andersson: these two authors created a
surprise by showing the efficiency of this compressed structure for
managing tables of IP addresses in routers. The idea is to extract the
maximum perfect tree present at the root of a regular trie and encode
it by an array; then proceed recursively on subtries. When you do
this, the resulting structure is such that the depth
(= the length of the access path) is with high probability close to
, where
is the number of times the (binary)
logarithm must be applied to
in order to obtain a number
,
e.g.,
.
The height is
for a Patricia variety
[one-way branching is avoided] of LC tries against
for a standard LC-trie, due to a few long filaments at the bottom of
regular tries.
For a source of
i.i.d. symbols with leter probabilities
, the depth goes
up to
. On the other hand, and this is a famous result
of Luc in the late 1980's, the depth remains close to
for tries
built on binary representations of random variables
over (0,1) as soon as these admit a
probability density.
Luc then went on to discuss various other models of data on which tries may
be built: this includes continued fraction digits, a subject
researched extensively by Brigitte Vallée and Julien Clément
in recent years. I have unfortunately scanty notes on the rest:
Azuma's and Talagrand's inequalities were flying around; Luc also
mentioned Werner Schachinger's recent results on Gaussian-ness of
trie parameters, and discussed briefly problems
arising from partial match queries.
Download a version of "An analysis of random LC tries", which appeared in Random Structures and Algorithms, vol.
15, pp. 359-375, 2001.
- Boris Pittel thoroughly discussed the integer partitioning problem.
This is of great value since the problem is well known to be
-complete. In Boris' formulation: you are given
randomly chosen numbers in the interval between 1 and
and want
to partition them into two subsets, so that the ``discrepancy'' (the
absolute value of the difference of their sums) is minimized. So far,
very few such hard combinatorial problems have succumbed to analytic
approaches, most of the existing results being based on the probabilistic
method.
Say that a partition is perfect if the discrepancy is zero. Boris
shows that everything depends on the ratio
and a sharp
``phase transition'' is detected: if
, there are almost
surely many perfect partitions while if
almost surely none
exists. Boris also shows much finer results describing the transition
which takes place in a
-region region of width about
near 1. It is interesting to note that this study started from
partly heuristic work by Stephan Mertens (based on the replica method
of statistical physics and on Derrida's random energy model) well
supported by extensive simulations. I cannot do justice to the
richness of the methods here. Let me just say that it starts from
a representation by oscillating integrals based on orthogonality on the unit
circle; for instance, the number of perfect partitions given a
sequence
is
Then, one of the questions is to determine under which conditions the
saddle point method (or Laplace's method under the real-variables
formulation) is, with high probability, applicable. Jump at the
references to see how to carry out this beautiful
programme. A column in the American Scientist serves as a good
introduction to the problem.
The STOC-2001 paper by Chayes, Borgs, and Pittel.
Related publication in Random Structures and
Algorithms 19(3-4): 247-288 (2001).
The Mathematics
ArXiv has a preliminary version of the related paper
``Phase Diagram for the Constrained Integer Partitioning Problem'',
by C. Borgs, J. T. Chayes, S. Mertens, B. Pittel. 62
pages.
- Philippe Flajolet: I talked about ``algebraic analytic asymptotics''.
Gee! Many
problems lead to generating functions that are algebraic
functions,
that is roots of polynomial equations
. First and
foremost you have objects described by context-free grammars, which
lead to systems of polynomial equations, hence algebraic generating
functions (by elimination).
This covers all sorts of combinatorial families of trees, e.g., the
ones defined by restricting the node degrees to a fixed finite set
of allowed values, but also lattice path that nicely decompose,
certain geometric diagrams, parenthesis systems and related formal languages.
Next, the ``kernel method'' invented by Fayolle, Malyshev, and
Iasnogorodski (in the 1970s) and the related ``quadratic method'' of
Tutte (in the 1960s) show that
certain functional equations may have (highly nontrivial) algebraic
solutions. Then, the following question arises: Given a generating
function that is algebraic, can we determine automatically an
asymptotic expansion of its coefficients? No suspense: the answer is
Yes!.
Simpler systems have a vectorial form
where
is a
vector of GFs and
is a vector of polynomials with nonnegative
coefficients. In this case,
we have at our disposal a classical theorem, which I like to call
the
Drmota-Lalley-Woods Theorem (in fact the three authors worked
independently in the 1990s): it asserts that, assuming total dependence [strong
connectivity of the dependency
graph],
any component generating function must have a dominant singularity
of the square-root type.
Thus coefficients are composed of asymptotic elements of the form
, with
and
algebraic
numbers. In general however, a system may well not satisfy these
positivity/dependency conditions.
Here I show (based on joint work with Bruno Salvy and
Cyril Chabaud) that there exist decision procedures for finding
the dominant singularities and determine the corresponding
singular expansions. (The latter are known to be of the Newton-Puiseux
type, that is, they involve only fractional exponents.) The difficulty
is one of choosing amongst the various branches that an algebraic
function may have--this is
sometimes called a ``connection problem''.
The net result is an explicit algorithm by which
one can find the right expansions for a branch at the origin of an
algebraic function specified by enough initial conditions:
coefficients of any algebraic functions are asymptotically linear combinations
of elements of the form
, for
computable numbers
(rational) and
(algebraic).
See
Functional Equations, Rational and Algebraic Functions,
by Flajolet & Sedgewick, Chapter 8 of
Analytic Combinatorics, 2001, 98p.
(INRIA Res. Rep. #4103).
Slides for this lecture.
- Gilles Schaeffer talked on the enumeration of planar maps.
These are connected planar graphs with an embedding in the plane (or on the
sphere) ``frozen''. Tutte in the 1960s, in an attempt to disprove
what was at the time the 4-colour conjecture launched a
far-reaching programme destined to enumerate all sorts of planar maps.
(Tutte was hoping to enumerate 4-colourable maps and show that they
are in smaller number than all maps.)
Surgery on maps leads to functional equations for bivariate GFs
enumerating maps according to size (i.e., number of edges, vertices, or
faces depending the on situation) as
well as an auxilairy parameter. For instance, take a rooted map
(rooted along an oriented edge) and delete the root: you get either
two maps not connected at all to one another or a map with a new face that
combines two other faces. With
recording the number of edges and
the degree of the face containing the root, a moment's
reflection shows that the bivariate GF of maps satisfies the
functional equation
Tutte's quadratic method will solve this: the answer is plainly a
quadratic algebraic function from which there derives the closed form
for the number of maps with
edges.
Starting from this elementary example, Gilles discussed similar functional
equations that admit of algebraic function solutions (part of this
is joint work with Mireille Bousquet-Mélou in Bordeaux). He then went on
to describe bijective approaches which have the further advantage of
leading to fast random generation algorithms. Finally he went on to
describe a conjecture on the height of ``blossom trees'' and related
properties of the diameter. (Some of these have since received a
positive answer thanks to works of Gilles with Philippe
Chassaing in Nancy). All in all, this implies that maps
must be added to the
collection of combinatorial objects resorting to analytic
combinatorics. I'd expect in the near future some analysis of
combinatorial algorithms coming out of such beautiful enumerative
results.
Gilles Schaeffer's
old Home Page at LORIA (he's now at the Polytechnique near Paris).
Mireille Bousquet-Mélou's Publication pages.
To be continued!