Patterns in Random Binary Search Trees
Philippe Flajolet
Algorithms Project, INRIA Rocquencourt
Algorithms Seminar
October 7, 1996
[summary by Michèle Soria]
A properly typeset version of this document is available in
postscript and in
pdf.
If some fonts do not look right on your screen, this might be
fixed by configuring your browser (see the documentation here).
Abstract
In a randomly grown binary search tree (BST) of size n, any fixed
pattern occurs with a frequency that is
on average proportional to n.
Deviations from the average case are highly unlikely and
well quantified by a Gaussian law. Trees with forbidden patterns
occur with an exponentially small probability
that is characterized in terms of Bessel functions.
The results obtained
extend to BST's a type of
property otherwise known for strings and combinatorial
tree models. They apply to paged
trees or to Quicksort with halting on short subfiles.
As a consequence, various pointer saving strategies for maintaining
trees obeying the random BST model can be precisely quantified.
The methods used are based on analytic models,
especially bivariate generating functions subjected to
singularity perturbation asymptotics.
The binary search tree (BST) model is a model of randomness that applies to
binary search trees and heapordered trees on random data, Quicksort,
multidimensional trees, randomised search trees and treaps,
as well as to syntax trees
that occur naturally in software engineering. Basic properties of
random BST have been studied in detail, for example it is wellknown
that the average value of depth nodes or height is asymptotically
logarithmic in the size of the tree (see
e.g. [7, 8, 9, 1, 2]).
This talk, based on [4], is devoted to characteristics
of the shape produced by the BST model.
1 Bivariate generating function
The bivariate generating function F(z,y) for the probability f_{n,k}
that a random BST of size n has k
occurrences of a given pattern u is defined by
F(z,y):= 

f_{n,k}y^{k}x^{n}
= 

l(t) y 

z^{t}, 
where l(t) is the probability that
a given unlabelled tree t is the shape of a random BST of
size t (l(t)=Õ_{v < t}1/v,
where the product is over all subtrees v of t), and
w[t]ºw_{u}[t] denotes the number of occurrences
of pattern u as a subtree of the BST t.
The recurrence on w[t] is obvious:
w[t]=[[t=u]]+w[t_{0}]+w[t_{1}]
,
where t_{0},t_{1} denote the left and right subtrees of t and where the
bracket notation [[P]]
is the indicator of P with
value 1 if the predicate P is true and 0 otherwise.
From this recurrence F(z,y) can be easily shown to satisfy
a linear second order differential equation.
Lemma 1
The bivariate generating function F(z,y) satisfies
the Riccati equation


F(z,y)=
F^{2}(z,y)+(y1)l(u)uz^{u1}, F(0,y)=1
(1) 
Using equation(1), the mean and variance
of the number of occurrences of u
are obtained by successive differentiation of F(z,y)
with respect to y, upon setting y=1.
Theorem 1 [Moments of occurrences]
The number W_{n} of occurrences of a pattern u
in a random BST
of size n has mean and variance which are asymptotically linear
µ_{n}~ c_{1}(u) n,
s_{n}~ c_{2}(u)n,
where c_{1}(u) and c_{2}(u) are effectively expressed in terms of
l(u) and u.
Linearization of the Riccati differential
equation (1) gives, in this case, an explicit solution
in terms of Bessel functions.
Lemma 2
The bivariate generating function of the number of occurrences of
pattern u is
F(z,y)= 

,
w(z,y)=A_{m}(L z^{m+1})zB_{m}(L z^{m+1}),

with w'(z,y)=w'_{z}(z,y),
L=ul(u)(1y), m=u³2, and A_{m},B_{m}
are normalized Bessel functions of orders 1/(m+1) and
1/(m+1) respectively.
2 Asymptotic analysis
The asymptotic behaviour of the bivariate generating function
F(z,y) is dictated by the
singularities in the main variable z (which are poles since F(z,y) is the
quotient of analytic functions),
with the auxiliary variable y
entering as a parameter. Different regions of values of the auxiliary variable
provide different types of information: probability for excluded
patterns (y=0) or rare occurrences (y» 0), limiting
distribution for the number of patterns, central (y» 1)
as well as local (y=1). The techniques of proof belong to the class
of singularity analysis methods: it is wellknown, after Cauchy's inversion
formula that
the location of polar singularities of a function f(z)=å f_{n}z^{n}
drives the asymptotic form of
its coefficients. For bivariate generating function F(z,y),
we first perform one level of inversion, resulting in
estimates of
f_{n}(y)=[z^{n}] F(z,y), with an extra property of uniform error bounds
for some values of y. One more inversion is required in order to
recover the
individual probabilities f_{n,k}.
The most direct approach consists in applying Levy's continuity
theorem for characteristic functions; this implies estimating f_{n}(y)
for y=e^{iq}, but only q near 0
is required because of scaling.
Thus, we have a perturbation of the univariate problem at
y=1.
When f_{n}(y) is a ``quasipower'', meaning that it
behaves very nearly like the powers of a fixed function, a Central
Limit Theorem can be derived. A local limit law of the Gaussian type
also holds when one can estimate
globally f_{n}(y) by quasipowers in larger
regions, like y=1, and not merely locally near y=1.
In that case, the recovery of f_{n,k} from
f_{n}(y) is achieved by using a saddlepoint method.
2.1 Trees with excluded patterns
Asymptotic analysis of univariate and bivariate generating functions
derived from F(z,y) depends on locating the zeros of
w(z,y) where y is a parameter.
We thus define the function a_{u}(y) to be
the root of smallest modulus of the Bessel type equation
A_{m}(
La^{m+1})
zB_{m}(
La^{m+1})=0,
L=(1
y)
u
l(
u).
(2)
This definition specifies a_{u}(y) unambiguously,
for 1y not too large (1y<5/2
as is shown in lemma 3 of [4]).
The probability that a random BST of size n does not contain
the pattern u, is by definition [z^{n}]F(z,0). The following result
comes from singularity analysis of meromorphic
functions.
Theorem 2 [Excluded patterns]
The probability e_{u,n} that a random BST of size n does not contain
the pattern u satisfies
e_{u,n}=a_{u}(0)^{n1}(1+O(K^{n}))
where K=K(u) is a constant strictly larger than 1, and a_{u}(0)
is the smallest positive root of Equation (2)
with y=0.
The same argument proves that, for small enough y,
[z^{n}] F(z,y)~ a_{u}(y)^{n1},
with a uniform exponentially small error term.
The probability that a random BST of size n has k occurrences
of a pattern u is obtained by differentiating k times this
asymptotic expansion.
Theorem 3 [Poisson law for rare occurrences]
Given a fixed pattern u, for each fixed
k,
one has as n®¥:
Pr 
{ 
w_{u}[t]=kt=n 
} 
=a_{u}(0)^{n1}· 


æ ç ç è 
1+O( 

) 
ö ÷ ÷ ø 
,
µ= 

.

2.2 Gaussian limit laws
The application of Levy's continuity theorem that leads to a Central
Limit Theorem, relies on evaluating f_{n}(y) when y=e^{i
t/(n)}^{1/2}.
By the
implicit function theorem
and the preparation theorem of Weierstrass
(see e.g. [6]), there is
a small complex neighbourhood of 1 such that the function
a_{u}(y) is analytic and by the analysis of meromorphic
functions, f_{n}(y) is closely approximated
by a large power of a fixed function:
f_{n}(y)=a_{u}(y)^{n1}(1+O(K^{n})), a situation that
conduces to normal laws. The speed of convergence is bounded
by means of the BerryEsseen
inequality (see e.g. [3])
that relates the distance between
distribution functions and characteristic functions.
Theorem 4 [Central law for occurrences]
Given a pattern u, the number of occurrences
W_{n} of u in a random BST of size n obeys a central
limit law with linear mean µ_{n} and variance s_{n}^{2}
(see theorem 1),
and the speed of
convergence is O(1/(n)^{1/2}):


½ ½ ½ ½ ½ ½ 
Pr 
ì í î 

£ x 
ü ý þ 
 


ó õ 

e 

dw

½ ½ ½ ½ ½ ½ 
< 

.

The obtention of a local limit law starts again with the quasipowers
form of f_{n}(y) and uses a saddlepoint derivation.
Theorem 5 [Local law for occurrences]
Given a pattern u, if a_{u}(y)¹1 for all y with y=1,
y¹1, then the random variable W_{n} satisfies a local limit law:

½ ½ ½ ½ 
s_{n} Pr{W_{n}=ëµ_{n}+xs_{n}û}
 

e 

½ ½ ½ ½ 
® 0, 
for x in any fixed compact
subset of ]¥,+¥[.
3 Other applications
Suitable adaptations of the technique also lead to
a distributional analysis of paging and factored representation of
BST as a DAG.
3.1 Paged trees
Given a tree t,
its bindex is a tree that is constructed
by retaining only those internal nodes of t
which correspond to subtrees of size > b.
Such an index is wellsuited to ``paging'',
where one has a twolevel hierarchical memory
structure: the index resides in main memory
and the rest of the tree is kept in pages
of capacity b on peripheral storage. Let
i[t]=i_{b}[t] denote the size of the bindex of t. The analysis
is then clearly equivalent to determining
the total number of occurrences of all patterns of size £ b.
The bivariate generating function
G(z,y):=å_{t} l(t)y^{i[t]}z^{t}
satisfies a Riccati equation

G(z,y)
=yG^{2}(z,y)+(1y) 

æ ç ç è 


ö ÷ ÷ ø 
, 
which is transformed by linearization. This equation has no explicit
solution, but the use of Weierstrass Preparation theorem shows that
G(z,y) has a unique dominant simple pole for y in a small
neighbourhood of 1. Thus [z^{n}]G(z,y) reduces to quasipowers, and
a central limit law follows:
Theorem 6 [Paging distribution]
For fixed b³2, the size I_{n}
of the bindex constructed on
a random BST of size n has mean µ_{n}
and variance s_{n}^{2} that satisfy
µ_{n}= 

1,
s_{n}^{2}= 


(n+1).

The random variable I_{n} obeys a central limit law
with speed of convergence O(1/(n)^{1/2}).
3.2 Factored representations of trees
We consider finally a global parameter k[t]
of trees that represents the number of structurally different subtrees
(i.e., number of different subtree shapes)
that occur in t. This parameter
is of intrinsic interest
as an indicator of
the structural ``richness'' of t.
It also measures the optimal storage complexity of tree t
when all common subtrees are factored and represented only once.
Then, k[t] measures the number of nodes
of the maximally factored DAG (directed acyclic graph)
corresponding to t,
a quantity that intervenes in parsing and data compression
applications [5].
The quantity that is actually analysed is the size
of a DAG representation which is partly redundant
(all trees of size less than b
are represented once, irrespective of their possible nonoccurrence
in t) and partially factored (nodes commanding subtrees of size ³ b are
each represented irrespective of the fact that they may be associated
to repeated subtrees).
Theorem 7
The average value of the DAG size of a random BST
of size n satisfies the upper bound,
K_{n} £4log 2 

+O 
æ ç ç è 

ö ÷ ÷ ø 
. 
This upper bound on K_{n} is of the right order, since L. Devroye has
shown a lower bound in O(n/log n) (unpublished
paper, May 97); but the constant is still unknown.
References
 [1]

Devroye (Luc). 
A note on the expected height of binary search trees. Journal of
the ACM, vol. 33, n°3, 1986, pp. 489498.
 [2]

Devroye (Luc). 
Limit laws for local counters in random binary search trees. Random Structures and Algorithms, vol. 2, n°3, 1991,
pp. 303315.
 [3]

Feller (William). 
An Introduction to Probability Theory and Its Applications. 
John Wiley & Sons, New York, 1971, 2nd edition,
vol. II.
 [4]

Flajolet (Philippe), Gourdon (Xavier), and Martínez (Conrado). 
Patterns in random binary search trees. 
Research Report n°2997, Institut National de Recherche en
Informatique et en Automatique, October 1996. 23 pages. To
appear in Random Structures & Algorithms.
 [5]

Flajolet (Philippe), Sipala (Paolo), and Steyaert (JeanMarc). 
Analytic variations on the common subexpression problem. In Paterson
(M. S.) (editor), Automata, Languages and Programming, Lecture
Notes in Computer Science, vol. 443, pp. 220234. 
1990. Proceedings of the 17th ICALP Conference,
Coventry, July 1990.
 [6]

Hille (Einar). 
Analytic function theory. Vol. 1. 
Ginn and Company, Boston, 1959, xi+308p. Introduction
to Higher Mathematics.
 [7]

Knuth (Donald E.). 
The Art of Computer Programming. 
AddisonWesley, 1973, vol. 3: Sorting and Searching.
 [8]

Mahmoud (Hosam M.). 
Evolution of Random Search Trees. 
John Wiley & Sons Inc., New York, 1992, WileyInterscience Series in Discrete Mathematics and Optimization,
xii+324p.
 [9]

Sedgewick (Robert) and Flajolet (Philippe). 
An Introduction to the Analysis of Algorithms. 
AddisonWesley Publishing Company, 1996, 512p.
This document was translated from L^{A}T_{E}X by
H^{E}V^{E}A.