Previous News
Next News
Issue last updated May 29, 1999.
1998.
In June, Barcelona hosts the Fifth Seminar on Analysis of
Algorithms. The programme features some 35 talks or so.
Informations about the place, the
schedule, and the abstracts of talks are given under the
web site nicely
maintained by Conrado Martinez who is the chief organizer for 1999.
In 1999, a number of original results should be presented. They deal to a large extent with some of the most basic algorithms--the ones that do deserve deep investigation. Examples are hashing, sorting, retrieval of simple or multidimensional information, algorithms from computational number theory, random generation, data compression, circuits, to name a few. Naturally, these algorithms often lead to what one might term ``combinatorial processes'' that are close in spirit to stochastic processes, safe that they have a strong algorithmic and combinatorial content. Sometimes the new combinatorial processes that arise (think of digital tries or binary search trees) are of considerable interest for their own sake; they are often found to intersect other models from classical probability theory (e.g., the connection between simple families of trees and branching processes) and combinatorics (e.g., binary search trees, heap-ordered trees, recursive trees against permutations); some of them even show up somewhat unexpectedly in areas like statistical physics, dynamical systems theory, or other exotic fields.
Quicksort with equal keys also belongs to a circle of ideas that lead to the Ternary Search Trie (TST) structure proposed by Bentley and Sedgewick at the SODA Conference in 1997. It is striking that TST's overperform hashing in OCR dictionary applications typically by a factor of 5. They have also been tested by Julien Clément on large text corpuses, like the novel Moby Dick (!!), where they gain quite a bit in terms of either time or storage requirements and Julien has designed an efficient spell checker based on them. The paper by Clément, Flajolet, and Vallée will present some of the corresponding analyses cast into a framework, based on fundamental intervals and modelling by transfer operators, that gives access to a number of parameters of trie structures and to a number of source models that include Bernoulli and Markov models as well as sources with unbounded correlations like continued frcations. Through this an earlier studies, Quicksort is once more confirmed as a method of choice for general purpose sorting.
For files that are not too large, common wisdom is that a good easy-to-programme choice is shellsort. Knuth's volume 3 of TAOCP discusses shellsort based on various ``increment sequences''. The worst case and the average case each pose challenging problems. The average-case analysis of (h,k,1)-shellsort (three increments) was done by Yao in a paper of 19?? that still deserves to be fully exploited I think. Now, what's under attack is the distribution. Bob Smythe will precisely present a distributional analysis of (h,1)-shellsort, or what amounts to the same thing, of the number of inversions in h-sorted permutations. Furthermore, there are interesting connections with area parameters of Brownian motion that surface again below in seemingly unrelated contexts.
The height of a randomly built BST is known to be asymptotic to sqrt(n). The variance is O(1) as has been very recently announced in Spring 1999 by Devroye & Reed. Tsukiji had shown earlier that Devroye's original methods of 1986 designed to analyse height of a BST extend to random circuits. This year, he will present refined analyses of the depth of nodes and the number of output gates in such circuits.
Various specialized models are also of interest. Mahmoud and Pittel initiated the study of d-ary search trees. Chen will consider recurrences that arise from balanced versions of d-ary heaps and prove that the cost of a construction by means of Floyd's algorithm is asymptotically normal. Trees built with local reorganization rules are also of great interest as one easily approaches perfect balancing. The discussion is then related to median quicksort and sample-sort. Hofri will show how to come up with a very fast approximate median finding algorithm while Rösler will provide an overview of FIND methods and the new light that the contraction method on function spaces brings to selction algorithms. In this range of problems, expectations lead to probabilistic divide-and-conquer recurrences that are full-history recurrences; ``master theorems'' to be exposed by Martinez then provide a direct and simple access to first order asymptotics. This approach constitutes the average-case analysis counterpart of the (worst-case) divide-and-conquer paradigm that we all teach to our students.

A drawback of linear probing hashing (LPH) is that it degenerates into sequential search when the table fills up. Chassaing and Janson appeal to recent probabilistic tools in order to find the cut-off point where LPH becomes ``unreasonable''. This is when n becomes about m-sqrt(m), with n the number of data items and m the number of cells. The phase transition region, as physicists would call it, can be in fact very precisely quantified. Poblete will consider another type of dynamics and study the effect of long sequences of insertions and deletions: the behaviours corresponding to different deletion strategies can be indeed quite varied!
An indirect consequence of the recent surge of interest in hashing is Marckert's solution of a conjecture of Odlyzko and Wilf going back to 1987. In a nutshell, LPH (full) tables are labelled trees in disguise and the probabilistic tools that connect them to Brownian excursion can be put to use to analyse the width of trees. The net result is that the width of a random labelled tree is O(sqrt(n), with precise constants and full characterization of the distribution being now available.
Sometimes uniform tree models apply directly, as a sort of ``default model''; such is the case in computer algebra and symbolic manipulation where we can start by considering all expression trees as equally likely. Often, they are useful as a first approach to the analysis of more complex data models (like BST's and digital trees) since they replace difference or integro-differential relations by plain algebraic ones.
Drmota will discuss the general singularity structure of uniform tree models. Given a finite set of operands, we're talking here about systems of polynomial equations, algebraic functions, algebraic singularities, and the like. Normal laws are then ubiquitous by singularity perturbation techniques. Particular cases of this general situation are the uniform (aka ``static'') model of tries reviewed by Nebel and for which height estimates are derived, and also leftist trees where Kemp, who had solved earlier Knuth's long-standing open problem of enumerating shapes, will present new statistics on leaves.
Szpankowski and Jacquet coined the term of ``analytic information theory''. By this is meant the application of analytic combinatorics methods to information theory questions. The new field has had for instance spectacular success with a very complete characterization of the behaviour of data compression algorithms of the Lempel-Ziv variety. Szpankowski will present here new applications to very precise redundancy estimates for a large class of sources.
Arratia will discuss probabilistic issues arising from the problem of reconstructing a long sequence given a large number of its small fragments--a key problem in DNA sequencing. Methods of analytic combinatorics that were, as we know, largely triggered by the demands from analysis of algorithms, are put to use by Hwang in the study of consecutive records, a problem closely related to classical statistics. Karonski proposes a matching algorithm that is particularly efficient in a distributed environment context. Lugosi develops new concentration inequalities that have a lot of applications in random combinatorics including entropy estimates. Urn models pose challenging problems and a complete solution to the 2x2 case will be presented by Mahmoud. Analytic methods apply to a number of ``seminumerical'' algorithms in the sense of Vol. 2 of TAOCP. We've already seen the case of GCD algorithms; other examples are the structure of random polynomlials and polynomial factorization algorithms as well as a number of related algorithms from computational number theory. This provides the motivation of Panario's lecture who offers a general discussion of smallest and largest components under a combinatorial schema of broad applicability. Prodinger has devised a complete analysis of runs, records, inversions, etc, in sequences of geometrically distributed random variables. This is closely related to probabilistic counting and to skip lists, themselves a highly practical alternative to balanced trees. A branching tree model for the analysis of effective speedup from pipelining is the subject of Shachnai's lecture.
Last, this year will benefit from the presence of Herb Wilf, the pioneer and founding father of random generation in combinatorics. See the book by Nijenhuis and Wilf from 1975 and the follow-ups by Wilf along the years. Wilf will present a panorama of methods and a set of corresponding open problems in this area. Liebehenschel will focus on the ranking problem for general parenthesis systems while Krattenthaler will discuss the tough problem of generating plane partitions uniformly at random.
I hope I have made mysellf clear: 1999 should be a great vintage for the AofA meetings!
The Island of Tatihou
[A warning to Wojtek: Tatihou is not
Tahiti! :-) :-)]