Course on Probabilistic Algorithms, ChiangMai, December 2004
By Philippe Flajolet.
This is the outline of a set of 10 lectures for the
Fifteenth Asian School on
Computer Science, December 1115, 2004
following the ASIAN'04 Conference.
Chiang Mai, Thailand. This will be mostly a course on "selected
topics" as no reasonably complete coverage of the vast topic of
probabilistic algorithms can nowadays be expected.
Several of the examples discussed will be related to the area of
Massive Data Sets. The general orientation of the course
will be to focus on analyticcombinatorial methods.
Here is a detailed plan and some relevant
references. Most are freely available
and marked
but this does not apply to two of the books.
Probability, combinatorics, and algorithms
 What is a probability distribution? Distribution function,
moments, probability generating function for discrete laws.
The Bernoulli/binomial, geometric, and Poisson laws.
Analogues for continuous laws. The exponential and Gaussian
distribution. The inversion method for simulating a given distribution.
 What is a combinatorial model? Counting and generating functions.
The ordinary and exponential generating functions. The multivariate
versions. Examples: counting of trees and triangulations.
 Algorithms that bet on the likely shape of data.
Analysis in the averagecase and in probability.
To be contrasted with randomized algorithms which purposely introduce
randomness. The example of QuickSort and Randomized
Quicksort. Binary search trees and their randomized version.
Skip lists.
 Approximate counting: an example of a randomized algorithm whose
analysis involves interesting combinatorialanalytic methods.
More generally: counting regular languages and poles of rational functions.
Reading: Principally the books by FlajoletSedgewick and
MotwaniRaghavan. Unfortunately they are only available
commercially. The VitterFlajolet chapter is freely available on
the web (though without figures, because of the technology of the
time).

An Introduction to the Analysis of Algorithms.
by Robert Sedgewick, Philippe Flajolet.
A book with a strong orientation towards combinatorial models.
 Randomized Algorithms. By
Rajeev Motwani, Prabhakar Raghavan. A best seller with a strong orientation
towards probabilistic methods.

AverageCase Analysis of Algorithms and Data Structures. By J. S. Vitter and Ph. Flajolet. Chapter 9 in Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity (edited by J. van Leeuwen), Elsevier, 1990, 431524.
[still a useful introduction to the combinatorial methodology, a bit advanced]

RESEARCH in the
ANALYSIS of ALGORITHMS PAGES: What it says, but for
aficionados.

Approximate Counting: A Detailed Analysis.
Philippe Flajolet. In BIT vol 25, pp. 113134.

Analytic Combinatorics. By Philippe Flajolet and Robert Sedgewick.
(This is an advanced text; the relevant techniques
for approximate counting and regular expression counting
are mostly in Chapter I.)
 Random Number Generation. By
Luc Devroye. A page by a Master. Look up especially:
L. Devroye, Random variate generation in one line of code, 1996.
 On Randomization in Sequential
and Distributed Algorithms.
By Rajiv Gupta, Scott A. Smolka, Shaji Bhaskar
ACM Computing Surveys (1994).
 Generatingfunctionology.
By Herb Wilf. The entire book is freely available on the web. It
contains a good introduction to combinatorial enumeration,
though no algorithms.
Standard texts on ALGORITHMS (only commercially available):
these are only indirectly needed for the course, but anybody
seriously interested in algorithms and data structures should have
consulted them.
Random allocations
 Elementary uses of exponential generating functions.
Constrained functions from a finite set to a finite set (allocations).
 The birthday paradox and coupon collector problems.
Hash tables that don't waste space or time are (nearly) impossible.
 The Poisson law for table occupancy. Use for paged hash tables:
the expected worstcase of random alphafull allocations.
 The "hit counting" algorithm of Kyu Young Whang et al.
 The loglog load balancing slick trick of Azar et al.
 Bloom filters.
Reading:
Digital trees and applications
 What is a digital tree aka "trie"? Basic data structure for
searching. The probabilistic divideandconquer (here
binomial/Bernoulli) recurrence. Exact solution by means of
exponential generating functions.
 Elementary asymptotic methods for trie recurrences. Soft
introduction to the Mellin technology. Application: Randomized
leader election and the Tree Protocol.
 Straight sampling and adaptive sampling. Use of adaptive sampling
as a cardinality
estimator.
Reading:
Probabilistic counting and LogLog Counting
 Hash values and observables of multisets. The angeldaemon model.
 Basic ideas of Probabilistic Counting (by FlajoletMartin).
 Basic ideas of LogLog Counting (by DurandFlajolet).
Reading:
Frequency moments and stable laws
 What is a stable law? A law that can appear as a limit of sums of
i.i.d random variables. Zipf laws and stable laws. Laws with
heavy tails. The characteristic function of a "standard"
stable law. The normalization for sums of stable laws.
 The frequency moment F[a]: use of stable laws and the
basic algorithm of Indyk. The inference problem: what is the
scaling? Solution: use medians (we cannot use averages).
 Parametric statistics and medians: a little calculation with
Taylor expansions shows that, with medians, we get an accuracy of
order O(1/sqrt(m)) for msamples.
Also need pseudorandom generators for
simulating a stable law.
Reading: