Course on Probabilistic Algorithms, Chiang-Mai, December 2004
By Philippe Flajolet.
This is the outline of a set of 10 lectures for the
Fifteenth Asian School on
Computer Science, December 11-15, 2004
following the ASIAN'04 Conference.
Chiang Mai, Thailand. This will be mostly a course on "selected
topics" as no reasonably complete coverage of the vast topic of
probabilistic algorithms can nowadays be expected.
Several of the examples discussed will be related to the area of
Massive Data Sets. The general orientation of the course
will be to focus on analytic-combinatorial methods.
Here is a detailed plan and some relevant
references. Most are freely available
and marked
but this does not apply to two of the books.
Probability, combinatorics, and algorithms
- What is a probability distribution? Distribution function,
moments, probability generating function for discrete laws.
The Bernoulli/binomial, geometric, and Poisson laws.
Analogues for continuous laws. The exponential and Gaussian
distribution. The inversion method for simulating a given distribution.
- What is a combinatorial model? Counting and generating functions.
The ordinary and exponential generating functions. The multivariate
versions. Examples: counting of trees and triangulations.
- Algorithms that bet on the likely shape of data.
Analysis in the average-case and in probability.
To be contrasted with randomized algorithms which purposely introduce
randomness. The example of QuickSort and Randomized
Quicksort. Binary search trees and their randomized version.
Skip lists.
- Approximate counting: an example of a randomized algorithm whose
analysis involves interesting combinatorial-analytic methods.
More generally: counting regular languages and poles of rational functions.
Reading: Principally the books by Flajolet-Sedgewick and
Motwani-Raghavan. Unfortunately they are only available
commercially. The Vitter-Flajolet chapter is freely available on
the web (though without figures, because of the technology of the
time).
-
An Introduction to the Analysis of Algorithms.
by Robert Sedgewick, Philippe Flajolet.
A book with a strong orientation towards combinatorial models.
- Randomized Algorithms. By
Rajeev Motwani, Prabhakar Raghavan. A best seller with a strong orientation
towards probabilistic methods.
-
Average-Case Analysis of Algorithms and Data Structures. By J. S. Vitter and Ph. Flajolet. Chapter 9 in Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity (edited by J. van Leeuwen), Elsevier, 1990, 431-524.
[still a useful introduction to the combinatorial methodology, a bit advanced]
-
RESEARCH in the
ANALYSIS of ALGORITHMS PAGES: What it says, but for
aficionados.
-
Approximate Counting: A Detailed Analysis.
Philippe Flajolet. In BIT vol 25, pp. 113-134.
-
Analytic Combinatorics. By Philippe Flajolet and Robert Sedgewick.
(This is an advanced text; the relevant techniques
for approximate counting and regular expression counting
are mostly in Chapter I.)
- Random Number Generation. By
Luc Devroye. A page by a Master. Look up especially:
L. Devroye, Random variate generation in one line of code, 1996.
- On Randomization in Sequential
and Distributed Algorithms.
By Rajiv Gupta, Scott A. Smolka, Shaji Bhaskar
ACM Computing Surveys (1994).
- Generatingfunctionology.
By Herb Wilf. The entire book is freely available on the web. It
contains a good introduction to combinatorial enumeration,
though no algorithms.
Standard texts on ALGORITHMS (only commercially available):
these are only indirectly needed for the course, but anybody
seriously interested in algorithms and data structures should have
consulted them.
Random allocations
- Elementary uses of exponential generating functions.
Constrained functions from a finite set to a finite set (allocations).
- The birthday paradox and coupon collector problems.
Hash tables that don't waste space or time are (nearly) impossible.
- The Poisson law for table occupancy. Use for paged hash tables:
the expected worst-case of random alpha-full allocations.
- The "hit counting" algorithm of Kyu Young Whang et al.
- The log-log load balancing slick trick of Azar et al.
- Bloom filters.
Reading:
Digital trees and applications
- What is a digital tree aka "trie"? Basic data structure for
searching. The probabilistic divide-and-conquer (here
binomial/Bernoulli) recurrence. Exact solution by means of
exponential generating functions.
- Elementary asymptotic methods for trie recurrences. Soft
introduction to the Mellin technology. Application: Randomized
leader election and the Tree Protocol.
- Straight sampling and adaptive sampling. Use of adaptive sampling
as a cardinality
estimator.
Reading:
Probabilistic counting and LogLog Counting
- Hash values and observables of multisets. The angel-daemon model.
- Basic ideas of Probabilistic Counting (by Flajolet-Martin).
- Basic ideas of LogLog Counting (by Durand-Flajolet).
Reading:
Frequency moments and stable laws
- What is a stable law? A law that can appear as a limit of sums of
i.i.d random variables. Zipf laws and stable laws. Laws with
heavy tails. The characteristic function of a "standard"
stable law. The normalization for sums of stable laws.
- The frequency moment F[a]: use of stable laws and the
basic algorithm of Indyk. The inference problem: what is the
scaling? Solution: use medians (we cannot use averages).
- Parametric statistics and medians: a little calculation with
Taylor expansions shows that, with medians, we get an accuracy of
order O(1/sqrt(m)) for m-samples.
Also need pseudo-random generators for
simulating a stable law.
Reading: