A Top-Down Analysis of Fringe-Balanced Binary Search Trees

Helmut Prodinger

Technical University of Vienna

Algorithms Seminar

May 25, 1998

[summary by Michèle Soria]

A properly typeset version of this document is available in postscript and in pdf.

If some fonts do not look right on your screen, this might be fixed by configuring your browser (see the documentation here).

Abstract
Fringe-balanced binary search trees are obtained by performing rotations only on subtrees of size three. The parameter ``number of rotations'' has recently been studied by Mahmoud [3], using a Pólya urn model. This talk, based on [5] proposes a top-down approach of the problem, that leads to a differential equation. The solution is related to the Weierstrass' Ã-function. This fact allows to derive the asymptotic normality of the parameter by means of Hwang's quasi-power theorem [2]. An alternative way of obtaining the exact expectation and variance, which relies on operator calculus, is also presented.



It is well-known that in a random binary search tree constructed by insertion at the leaves, the average depth of a node is logarithmic in the size of the tree, so that retrieval of the data stored in the nodes can be done efficiently. One simple way to improve the speed of retrieval even more is to compress the subtrees near the leaves by doing a fringe-balanced rotation: whenever a son is appended to a node that itself is a single son (its ``brother'' is an external node), a rotation of the three nodes is performed to place the median of the three elements as the root of the subtree and the other two elements as sons. Therefore all subtrees of size 3 in the tree are complete.

The distribution of the number of rotations that are made when constructing such a fringe-balanced binary tree under the random permutation model was recently analyzed by Mahmoud [3], using a Pólya urn model, and a central limit theorem for urn models by Smythe [6]. Here is presented an alternative way, based on analytic methods, of proving the Gaussian limiting distribution. This presentation follows [5].

1   Top-Down Approach

The recursive top-down analysis (see [4] for various uses of this approach) begins with a recurrence relation based on splitting probabilities. When constructing a fringe-balanced tree from a random permutation, the first three elements of the permutation determine the root of the final tree, as well as whether or not there is a rotation at the root (a rotation occurs in four cases out of six).

Hence the splitting probability pn,k, which is the probability that in a tree of size n the root is the node k, is given by
pn,k =
(k-1) (n-k)
æ
è
n
3
ö
ø
for n ³ 3 and 1 £ k £ n, p2,1 = p2,2 =1/2 and p1,1=1. And we also get a recurrence relation for the probability Fn,m, that the number of rotations is m, when generating a fringe balanced tree of size n, starting with an empty tree (the number of rotations to construct the root of the tree is 1 with probability 2/3, and 0 with probability 1/3). Thus for n ³ 3 and 1 £ m £ n:
Fn,m =
1
3
n
å
k=1
pn,k
m
å
l=0
Fk-1,l Fn-k,m-l +
2
3
n
å
k=1
pn,k
m-1
å
l=0
Fk-1,l Fn-k,m-1-l     (1)
with initial values F0,0=1, F1,0=1, F2,0=1 and Fn,m=0 otherwise.

Introducing the probability generating function F(z,v) = ån,m ³ 0 Fn,m zn vm this recurrence leads to the differential equation 1/6 3/ z3F(z,v) = (1/3+2/3v) (/ z F(z,v))2, with initial conditions F(0,v)=1, ./ zF(z,v)|z=0 = 1 and .2/ z2F(z,v)|z=0 = 2.

Substituting G(z,v) = / z F(z,v) this differential equation can be rewritten as
2
z2
G(z,v) = (2+4v) G(z,v)2     (2)
with G(0,v)=1 and ./ zG(z,v)|z=0 = 2.

1.1   Moments

The moments of the distribution are obtained by differentiating equation (2) with respect to v, and evaluating at v=1. Let
M1(z) =
v
G(z,v) ½
½
½
½
 



v=1
=
 
å
n ³ 0
n Mn(1) zn-1,    and     M2(z) =
2
v2
G(z,v) ½
½
½
½
 



v=1
=
 
å
n ³ 0
n Mn(2) zn-1,
where Mn(1) and Mn(2) denote the first and second factorial moments of the number of rotations.

M1(z) and M2(z) both satisfy an Euler differential equation. Extracting the coefficients of the solutions M1(z) and M2(z) leads to exact values for the expectation Mn = Mn(1) and the variance Varn = Mn(2) + Mn (1) - (Mn (1))2:
Theorem 1   The expectation and the variance of the number of rotations when generating a fringe balanced binary search tree of size n are given by
Mn=
2
7
n -
8
21
  (n ³ 6),     Varn =
66
637
n -
680
5733
  (n ³ 12).

1.2   Limiting Distribution

Equation (2) transforms into
4
3
(1+2v) G3(z,v) +
8
3
(1-v) = æ
ç
ç
è
z
G(z,v) ö
÷
÷
ø
2



 
,
from which we get the implicitly given solution of G(z,v):
z =
G(z,v)
ó
õ
1
dx/
(
4
3
(1+2v) x3 +
8
3
(1-v))1/2
.
This form shows a close relation between G(z,v) and the Weierstrass' Ã-function, which can be characterized, within a simply connected domain of C¥, which contains no zeros of the denominator, by the integral z = òÃ(z)¥ dx/(4x3-g2x-g3)1/2. Constants g2 and g3 are called the invariants of Ã.

Indeed, making the substitution
t=(
1+2v
3
)1/2 æ
ç
ç
ç
ç
ç
è
z-
¥
ó
õ
1
dx
(
4
3
(1+2v)x3 +
8
3
(1-v))1/2
ö
÷
÷
÷
÷
÷
ø
º(
1+2v
3
)1/2(z-s(v)),
it is shown in [5] that W(t,v)=G(z(t),v) is a Weierstrass' Ã-function with invariants g2 = 0 and g3 = -8(1-v)/(1+2v).

Since Ã(z) has a double pole at z =0, it follows that G(z,v) has a double pole at z=s(v), where it admits the local expansion G(z,v)=3/1+2v (z-s(v))-2 + O (z-s(v))4 .

Integrating term by term, we obtain the expansion of F(z,v):
F(z,v)=
1
1+2v
3
s(v) æ
ç
ç
è
1-
z
s(v)
ö
÷
÷
ø
+ O ( z-s(v) )
5
 
 
.
Singularity analysis leads immediately to the expansion of the coefficients:
[zn] F(z,v) =
1
1+2v
3
s(v)
s(v)-n + æ
ç
ç
è
1 + O æ
ç
ç
è
1
n4
ö
÷
÷
ø
ö
÷
÷
ø
    (3)
Performing a series expansion of the integrand, the integral s(v) can be expressed in terms of a hypergeometric function:
(
1+2v
3
)1/2s(v) = 2F1 æ
ç
ç
ç
ç
ç
è
1
2
,
1
6
7
6
½
½
½
½
½
½
½
-
2(1-v)
1+2v
ö
÷
÷
÷
÷
÷
ø
.
In (3), the probability generating function of the number of rotations is given in a form that satisfies the hypothesis of Hwang's quasi-power theorem [2], so that we finally get the central limit theorem:
Theorem 2   The distribution Xn of the number of rotations, when generating a fringe balanced binary search tree is asymptotically Gaussian:
Pr ì
ï
ï
í
ï
ï
î
Xn -
2
7
n
(
66
637
n)1/2
£ x ü
ï
ï
ý
ï
ï
þ
= F(x) + O æ
ç
ç
è
1
(n)1/2
ö
÷
÷
ø
.

2   Urn Model and Operator Calculus

Insertions in fringe-balanced binary search trees can be translated into an urn model of Pólya. The urn contains balls of three different colors, corresponding to the leaves of the tree: binary subtrees of size 3 (which are always complete) have four leaves of color 1; in subtrees of size 2, the two deepest leaves are colored by 3, and the third one is colored by 2. We start with an urn containing two balls of color 1, corresponding to a starting tree with one internal node. Inserting at a leaf of color 1, i.e. picking a ball of color 1, results in replacing two leaves of color 1 by one leaf of color 2 and two leaves of color 3. Inserting at a leaf of color 2 results in replacing this leaf and its two associated leaves of color 3 by four leaves of color 1. In the same manner, inserting at a leaf of color 3 results in replacing two leaves of color 3 and one leaf of color 2 by four leaves of color 1. Thus the process of insertion translates into the ball addition matrix A, whose (i,j) entry is the number of balls of type j to be added when a ball of color i is picked:
A =
æ
ç
ç
è
-2 1 2
4 -1 -2
4 -1 -2
ö
÷
÷
ø
.
Working on the addition matrix, Mahmoud [3] obtains the exact averages and covariances for the number of balls in the urn after n picks, as well as the exact average and variance of the number of rotations after n random insertions in an empty fringe balanced tree.

These results can also be obtained by operator calculus [1]. Define the random variables Xn(k) to denote the number of balls of color k in the urn, when n+1 elements are in the urn altogether (after n-1 random picks). Only the values E{Xn(2)} and V{Xn(2)} must be calculated, since all other values follow by the relations Xn(3) = 2 Xn(2) and Xn(1) + Xn(2) + Xn(3) = n+1. We denote by pn,k the probability P{Xn(2) = k} and by Pn(u) the generating function åk ³ 0 pn,k uk. The urn picking process leads to the recursion
Pn+1(u) =
 
å
k ³ 0
pn,k é
ê
ê
ë
3k
n+1
uk-1 + æ
ç
ç
è
1-
3k
n+1
ö
÷
÷
ø
uk+1 ù
ú
ú
û
=
 
å
k ³ 0
æ
ç
ç
è
u +
3
n+1
(1-u2) D ö
÷
÷
ø
pn,k uk
where D denotes the differential operator d/du. If evaluation at u=1 is denoted by operator U, we get then for n ³ 6:
E{Xn+1(2)} = U D Pn+1(u) = æ
ç
ç
è
U + æ
ç
ç
è
1-
6
n+1
ö
÷
÷
ø
U D ö
÷
÷
ø
Pn(u) = 1 +
n-5
n+1
E{Xn(2)}
with E{X6(2)}=1. This linear recursion is easily solved and we get E{Xn(2)} = 1/7 (n+1) for n ³ 6. A similar computation holds for the second factorial moment E{Xn+1(2) (Xn+1(2)-1)} = U D2 Pn+1(u), from which the variance is obtained.

The average and variance of the number of rotations can also be treated in this way. Let the random variable Rn denote the number of rotations made when constructing a fringe balanced tree with n elements (or, in the urn model, the number of picks of elements with color 3 after n-1 random picks). Introduce the bivariate generating functions Rn(u,v) = åk,l pn,k,l uk vl, where pn,k,l denotes the probability, that after n-1 random picks k balls in the urn are of color 2 and l times a ball of color 3 was chosen. Following the picking process, the recursion for Rn is
Rn+1(u,v) =
 
å
k,l
pn,k,l æ
ç
ç
è
k
n+1
uk-1vl +
2k
n+1
uk-1vl+1 + æ
ç
ç
è
1-
3k
n+1
ö
÷
÷
ø
uk+1vl ö
÷
÷
ø
.
Denoting by Du and Dv the differential operator w.r.t. u resp. v and by Uu and Uv the evaluations at u=1 resp. v=1, we get for the expectation E{Rn+1(u,v)} = Uu Uv Dv Rn+1(u,v)
E{Rn+1(u,v)} = æ
ç
ç
è
2
n+1
UuUvDu + UuUvDv ö
÷
÷
ø
Rn(u,v) = E{Rn(u,v)} +
2
n+1
E{Xn(2)}.
The average value of Rn is obtained by solving this recursion, and similar computations lead to the variance.

References

[1]
Greene (D. H.) and Knuth (D. E.). -- Mathematics for the analysis of algorithms. -- Birkhäuser, Boston, 1982, second edition.

[2]
Hwang (H.-K.). -- Théorèmes limites pour les structures combinatoires et les fonctions arithmetiques. -- PhD thesis, École Polytechnique, December 1994.

[3]
Mahmoud (H.). -- On rotations in fringe-balanced binary search trees. Information Processing Letters, vol. 65, 1998, pp. 41--46.

[4]
Martinez (C.), Panholzer (A.), and Prodinger (H.). -- On the number of descendants and ascendants in random search trees. Electronic Journal of Combinatorics, vol. 5(1):#R20, 1998.

[5]
Panholzer (A.) and Prodinger (H.). -- An analytic approach for the analysis of rotations in fringe-balanced binary search trees. -- To appear in Annals of Combinatorics, 1998.

[6]
Smythe (R.). -- Central limit theorems for urn models. Stochastic Processes and their Applications, vol. 65, 1996, pp. 115--137.

This document was translated from LATEX by HEVEA.