Research News, August 2001


Previous News Next News

Return to Research News Index


These pages are edited by Philippe.Flajolet@inria.fr.

   
Tatihou poster: original design by Mariola Szpankowska. Brigitte Vallée and others at Tatihou.

The AofA Metting in Tatihou, July 1-7, 2001

The Seventh Seminar on Analysis of Algorithms was held in Tatihou, a truly amazing place. Tatihou is a small island about half a mile off the coast of Normandy in the (English?) Channel. The isle is small but it contains a fort controlling the bay built in the seventeenth century by Vauban (a famous French military architect). There is also a bird reserve, a maritime museum, and, of course, a sensational conference center operated by the local county council. Brigitte with her team, Ali Akhavi, Jérémie Bourdon, Julien Clément, was responsible for making this venture possible.

    
    
Centre's portal; Abelard [left] with Julien Clément, Brigitte Vallée, Ali Akhavi, Julie Vallé, and Benoit Daireaux.
One of the buildings; the Vauban Tower.


There were eight long lectures, of an hour and a quarter each, which provided ample time for speakers to develop their ideas. These were given at the start of each half-day by Hwang, Chauvin, Viola, Szpankowski, Devroye, Pittel, Flajolet, and Schaeffer. Here's a brief summary.

       
  1. Hsien Kuei Hwang investigated the method of moments in relation to the analysis of recursive algorithms of the divide-and-conquer type. The method of moments is familiar from probability theory: for a distribution $ Y$ characterized by its moments (this is the case for most usual distributions, including the Gaussian law), convergence of the moments of random variables $ X_n$ to those of $ Y$ implies convergence in distribution of $ X_n$ to $ Y$, namely, $ X_n\mathop{\rightarrow}^{\mathcal L} Y$. In analysis of algorithms, the $ X_n$ are a normalized form (usually centred around the mean, possibly rescaled by a standard deviation) of the cost of an algorithm over inputs of size $ n$. Hsien Kuei examines various examples related to Quicksort, Quickselect, Mergesort, Digital Trees, Leader Election (in a distributed environment) and so on. In all these cases, there is a recurrence satisfied by the probability generating function $ \operatorname{E}[u^{X_n}]$ (or equivalently, the moment generating function $ \operatorname{E}[e^{tX_n}]$ of the $ X_n$. One gets the first two moments by the usual devices. Then, it is often possible to guess--then prove--a form of the limit distribution as it should satisfy approximately the basic recurrence on probability (or moment) generating functions. Hsien Kuei gives many examples of this approach. The speed of convergence then derives from the Berry-Esseen inequalities (reminder: Berry-Esseen translates distances between characteristic functions into distances between distribution functions). There is here a method of considerable generality which in a sense parallels the recursive method developed by Rösler, Rüschendorf, and Neininger in similar contexts.

    Many papers by Hsien Kuei nicely illustrating this method are to be found at his web site: see esp. his publications in years $ \ge2000$.
    As related reading, I also recommend the study by Boris Pittel, ``Normal convergence problem? Two moments and a recurrence may be the clues'', Ann. Appl. Probab. 9 (1999), 1260-1302, corresponding to his talk at the AofA'98 in Princeton.
  2.    
  3. Brigitte Chauvin helped many of us discover the power of martingale arguments in the analysis of algorithms and data structures. Brigitte started her by a comprehensive review tuned to the needs of analysis of algorithms. She then proceeded to develop a martingale argument due to herself and her student Jean Jabbour which gives truly surprising results regarding the probable shape of binary search trees. Take a random binary search tree and assocaite to it its level polynomial $ W_n(x)$ defined by teh fact that $ [z^k]W_n(z)$ is the number of external nodens at level $ k$. This is a random polynomial whose expectation is well known to equal $ w_n(x)=\prod_{j=0}^{n-1}(j+2z)/(j+1)$. The big news is that the normalized sequence of (random) polynomials $ W_n(x)/w_n(x)$ form a martingale. The consequences are cornucopious: this gives quantitative properties that hold almost surely when a binary search tree evolves under successive insertions from the empty tree till the end of time--a point of view that differs somewhat from what is done usually in AofA. It also shows that the profile of the tree is in probability almost Gaussian. This property was known in the expected value sense since the 1980's (Louchard). Here, the strong form of this property says the following: simulate a random BST of size (say) $ 10^4$ and build the histogram of depths of extrenal nodes; you will only find bell-shaped histograms. This property had been conjectured by Sedgewick and myself when we did some empirical simulations, and much to our astronishment invariably found bell-shaped images like this one.

    Another surprising consequence is that the width of a random BST is in probability $ \sim n/\sqrt{4\pi\log n}$. It's hard to imagine such a result to be accessible by analytic methods alone!

    See the tutorial by Brigitte Chauvin ``Discrete martingales and applications to analysis of algorithms'' [for those who read French].
    See the joint paper with Drmota and Jabbour for martingales associated to BSTs.
  4.    
  5. Alfredo Viola examined Rabin's irreducibility test for polynomials. We are talking here of algorithms that operate on polynomials, which are basic objects of symbolic manipulation systems, but also of cryptography and of the theory of error-correcting codes. Following a tradition started by Berlekamp, Knuth (in volume 2 of TAOCP) showed that we can subject these algorithms to the usual questions: What is the expected complexity? What is the distribution of costs? The subject is currently pursued by Daniel Panario (Carleton University, Ottawa) and Alfredo Viola (Montevideo) and we start knowing quite a bit on these algorithms. For instance, the usual polynomial factorization chain (based on elimination of repeated factors, distinct degree factorization, and Cantor-Zassenhaus equal-degree factorization) has been completely analysed in Journal of Algorithms 40(1), 2001, pp. 37-81. The interest of these studies is enhanced by the fact that the algorithms are often probabilistic in nature, so that the randomness assumptions apply to perfection. In his talk, Alfredo reviewed Rabin's irreducibility test which answers the question: When is a polynomial irreducible (i.e., not factorable, a sort of ``prime'')? He presented results from the paper by Daniel Panario, Boris Pittel, Bruce Richmond, Alfredo Viola, ``Analysis of Rabin's irreducibility test for polynomials over finite fields'', Random Structures and Algorithms 19(3-4): 525-551 (2001) which improves on the earlier LATIN-1998 conference version.
    Alfredo, where's your home page with your publications?
  6.    
  7. Wojtek Szpankowski presented a survey talk on New and Old Problems in Pattern Occurrences. Let $ H$ and $ T$ be sequences generated ovr er a finite alphabet. We call $ H$ the pattern and $ T$ the text. The basic questions are How many times does $ H$ occurs in $ T$? how long one has to wait until $ H$ occurs in $ T$? In the string matching problem the pattern is considered to be a string of consecutive symbols (a ``factor''), while in the subsequence matching problem the pattern is a subsequence. We may ask that the pattern occurs either exactly or approximately. In all cases, we assume that the pattern is given and the text is random and generated among all possible strings by some source model of sorts, the simplest cases being the uniform model, the Bernoulli, and the Markov sources. Wojtek showed us how to compute moments, limiting distributions and large deviations for the number of pattern occurrences and the waiting time. Throughout he uses combinatorial (e.g., symbolic) and analytic methods, based on generating functions (GFs) and singularity analysis. For this dense survey, we refer to Wojtek's recent book and to his web page. I note that the limiting distributions are usually normal, a fact that may arise from moment methods (based on GFs) or from Quasi-Powers approximations associated to multivariate generating functions that are rational functions. This is an area started by Guibas and Odlyzko (who developed the correlation polynomial) in the early 1980's, then pursued by Mireille Régnier, Wojetk, Hosam Mahmoud, and several others in the ensuing decade. A recent unification of seemingly very different source models is due to Brigitte Vallée, our gracious hostess at Tatihou; see her model of ``dynamical sources''.
    A zillion publications are available from Wojtek's web pages.
    Go get Wojtek's book Average Case Analysis of Algorithms on Sequences, Wiley, 2001.
    Brigitte's Research Pages including papers on dynmaical sources start from here.
  8.    
  9. Luc Devroye surveyed the ubiquitous digital tree also known as trie. This wonderful structure can be approached by exact analytic models (ordinary, exponential, or Poisson generating functions), Mellin transforms, and saddle point methods, including Jacquet and Szpankowski's analytic depoissonization, whenever the source model is simple enough (typically, Bernoulli or Markov). It can also be approached by probabilistic methods and Luc has been a pioneer in this area. Luc started by recalling a few basic facts for a binary alphabet and a trie built on i.i.d. equally probable letter, as regards depth of a random node, height, and fill-up level:

    $\displaystyle \left\{\begin{array}{lllll}
\hbox{Depth:}&& D_n&=&\log n+O_P(1)\\...
...\\
\hbox{Fill-up:}&& F_n&=&\log_2n-(1+o_P(1))\log_2\log n.
\end{array}\right.
$

    These are results ``in probability'' (hence the $ O_P,o_P$ symbols). The fill up level is the maximal level at which no external node appears. The estimate on fill-up level is of particular interest given the structure of Level Compressed trie [LC trie] invented a few years ago by Nilsson and Andersson: these two authors created a surprise by showing the efficiency of this compressed structure for managing tables of IP addresses in routers. The idea is to extract the maximum perfect tree present at the root of a regular trie and encode it by an array; then proceed recursively on subtries. When you do this, the resulting structure is such that the depth $ D_n^{LC}$ (= the length of the access path) is with high probability close to $ \log^\star n$, where $ \log^\star n$ is the number of times the (binary) logarithm must be applied to $ n$ in order to obtain a number $ \le1$, e.g., $ \log^\star 2^{2^{2^{2^2}}}\approx \log^\star 2\cdot 10^{19730}=5$. The height is $ H_n^{LC-Pat}\sim \sqrt{2\log_2 n}$ for a Patricia variety [one-way branching is avoided] of LC tries against $ H_n^{LC}\sim\log_2n$ for a standard LC-trie, due to a few long filaments at the bottom of regular tries. For a source of i.i.d. symbols with leter probabilities $ p\not=q$, the depth goes up to $ D_n=O_P(\log\log n)$. On the other hand, and this is a famous result of Luc in the late 1980's, the depth remains close to $ \log_2n$ for tries built on binary representations of random variables over (0,1) as soon as these admit a probability density.

    Luc then went on to discuss various other models of data on which tries may be built: this includes continued fraction digits, a subject researched extensively by Brigitte Vallée and Julien Clément in recent years. I have unfortunately scanty notes on the rest: Azuma's and Talagrand's inequalities were flying around; Luc also mentioned Werner Schachinger's recent results on Gaussian-ness of trie parameters, and discussed briefly problems arising from partial match queries.

    Download a version of "An analysis of random LC tries", which appeared in Random Structures and Algorithms, vol. 15, pp. 359-375, 2001.
  10.    
  11. Boris Pittel thoroughly discussed the integer partitioning problem. This is of great value since the problem is well known to be $ NP$-complete. In Boris' formulation: you are given $ n$ randomly chosen numbers in the interval between 1 and $ 2^m$ and want to partition them into two subsets, so that the ``discrepancy'' (the absolute value of the difference of their sums) is minimized. So far, very few such hard combinatorial problems have succumbed to analytic approaches, most of the existing results being based on the probabilistic method.

    Say that a partition is perfect if the discrepancy is zero. Boris shows that everything depends on the ratio $ k=n/m$ and a sharp ``phase transition'' is detected: if $ k<1$, there are almost surely many perfect partitions while if $ k>1$ almost surely none exists. Boris also shows much finer results describing the transition which takes place in a $ k$-region region of width about $ (\log_2n)/n$ near 1. It is interesting to note that this study started from partly heuristic work by Stephan Mertens (based on the replica method of statistical physics and on Derrida's random energy model) well supported by extensive simulations. I cannot do justice to the richness of the methods here. Let me just say that it starts from a representation by oscillating integrals based on orthogonality on the unit circle; for instance, the number of perfect partitions given a sequence $ (X_1,\ldots,X_n)$ is

    $\displaystyle P_n=[z^0]\prod_{j=1}^n \left(z^{X_j}+z^{-X_j}\right)
=\frac{2^n}{2\pi}\int_{-\pi}^{+\pi}
\prod_{j=1}^n cos(xX_j)\, dx.
$

    Then, one of the questions is to determine under which conditions the saddle point method (or Laplace's method under the real-variables formulation) is, with high probability, applicable. Jump at the references to see how to carry out this beautiful programme.
    A column in the American Scientist serves as a good introduction to the problem.
    The STOC-2001 paper by Chayes, Borgs, and Pittel. Related publication in Random Structures and Algorithms 19(3-4): 247-288 (2001).
    The Mathematics ArXiv has a preliminary version of the related paper ``Phase Diagram for the Constrained Integer Partitioning Problem'', by C. Borgs, J. T. Chayes, S. Mertens, B. Pittel. 62 pages.
  12.    
  13. Philippe Flajolet: I talked about ``algebraic analytic asymptotics''. Gee! Many problems lead to generating functions that are algebraic functions, that is roots of polynomial equations $ P(z,y)=0$. First and foremost you have objects described by context-free grammars, which lead to systems of polynomial equations, hence algebraic generating functions (by elimination). This covers all sorts of combinatorial families of trees, e.g., the ones defined by restricting the node degrees to a fixed finite set of allowed values, but also lattice path that nicely decompose, certain geometric diagrams, parenthesis systems and related formal languages. Next, the ``kernel method'' invented by Fayolle, Malyshev, and Iasnogorodski (in the 1970s) and the related ``quadratic method'' of Tutte (in the 1960s) show that certain functional equations may have (highly nontrivial) algebraic solutions. Then, the following question arises: Given a generating function that is algebraic, can we determine automatically an asymptotic expansion of its coefficients? No suspense: the answer is Yes!.

    Simpler systems have a vectorial form $ Y=\Phi(z,Y)$ where $ Y$ is a vector of GFs and $ \Phi$ is a vector of polynomials with nonnegative coefficients. In this case, we have at our disposal a classical theorem, which I like to call the Drmota-Lalley-Woods Theorem (in fact the three authors worked independently in the 1990s): it asserts that, assuming total dependence [strong connectivity of the dependency graph], any component generating function must have a dominant singularity of the square-root type. Thus coefficients are composed of asymptotic elements of the form $ C\pi^{-1/2}\omega^n n^{-3/2}$, with $ C$ and $ \omega$ algebraic numbers. In general however, a system may well not satisfy these positivity/dependency conditions. Here I show (based on joint work with Bruno Salvy and Cyril Chabaud) that there exist decision procedures for finding the dominant singularities and determine the corresponding singular expansions. (The latter are known to be of the Newton-Puiseux type, that is, they involve only fractional exponents.) The difficulty is one of choosing amongst the various branches that an algebraic function may have--this is sometimes called a ``connection problem''. The net result is an explicit algorithm by which one can find the right expansions for a branch at the origin of an algebraic function specified by enough initial conditions: coefficients of any algebraic functions are asymptotically linear combinations of elements of the form $ C/\Gamma(p/q)\omega^n n^{p/q-1}$, for computable numbers $ p/q$ (rational) and $ C,\omega$ (algebraic).

    See Functional Equations, Rational and Algebraic Functions, by Flajolet & Sedgewick, Chapter 8 of Analytic Combinatorics, 2001, 98p. (INRIA Res. Rep. #4103).
    Slides for this lecture.

  14.    
  15. Gilles Schaeffer talked on the enumeration of planar maps. These are connected planar graphs with an embedding in the plane (or on the sphere) ``frozen''. Tutte in the 1960s, in an attempt to disprove what was at the time the 4-colour conjecture launched a far-reaching programme destined to enumerate all sorts of planar maps. (Tutte was hoping to enumerate 4-colourable maps and show that they are in smaller number than all maps.) Surgery on maps leads to functional equations for bivariate GFs enumerating maps according to size (i.e., number of edges, vertices, or faces depending the on situation) as well as an auxilairy parameter. For instance, take a rooted map (rooted along an oriented edge) and delete the root: you get either two maps not connected at all to one another or a map with a new face that combines two other faces. With $ z$ recording the number of edges and $ u$ the degree of the face containing the root, a moment's reflection shows that the bivariate GF of maps satisfies the functional equation

    $\displaystyle M(z,u)=1=u^2zM(z,u)^2+uz\frac{M(z,1)-uM(z,u)}{1-u}.
$

    Tutte's quadratic method will solve this: the answer is plainly a quadratic algebraic function from which there derives the closed form $ M_n=2
\frac{(2n)!3^n}{n!(n+2)!}$ for the number of maps with $ n$ edges. Starting from this elementary example, Gilles discussed similar functional equations that admit of algebraic function solutions (part of this is joint work with Mireille Bousquet-Mélou in Bordeaux). He then went on to describe bijective approaches which have the further advantage of leading to fast random generation algorithms. Finally he went on to describe a conjecture on the height of ``blossom trees'' and related properties of the diameter. (Some of these have since received a positive answer thanks to works of Gilles with Philippe Chassaing in Nancy). All in all, this implies that maps must be added to the collection of combinatorial objects resorting to analytic combinatorics. I'd expect in the near future some analysis of combinatorial algorithms coming out of such beautiful enumerative results.
    Gilles Schaeffer's old Home Page at LORIA (he's now at the Polytechnique near Paris).
    Mireille Bousquet-Mélou's Publication pages.

To be continued!