Pólya Urn Models in Random Trees
Hosam M. Mahmoud
The George Washington University
Algorithms Seminar
October 20, 1997
[summary by Marko R. Riedel]
1 Examples and previous results
Consider an urn that contains balls of k different colors
1,2,...,k.
There is a set of evolution rules: (i)
a ball is chosen at random from the urn, where all balls are
equally likely; (ii) that ball's color or type is noted, and
the ball is returned to
the urn; (iii) if the ball had color i, a_{ij} balls
of color j are
added to the urn.
Question of interest. What is the composition of the urn after
n draws?
The model is encoded in the addition matrix A = [a_{ij}],
1 ≤ i, j ≤ k.
The a_{ij} may themselves be random, but this talk is concerned
exclusively with deterministic a_{ij}, i.e., the case of a fixed
addition matrix A.
Pólya and Eggenberger (1923) investigated the two-color problem,
with A = sI and s a positive integer.
Suppose the two colors are red and blue, and
let R_{n} and B_{n} be the number of red and blue balls after n picks.
Example 1
Set s=2 and start with two red balls and one blue ball. One of eight
possible length-3 runs is:
Pick blue (probability 1/3),
the composition of the urn is now R_{1}=2 and B_{1}=3;
Pick blue (probability 3/5),
R_{2}=2 and B_{2}=5;
Pick red (probability 2/7),
R_{3}=4 and B_{3}=7.
What is the typical behavior of R_{n} and B_{n}?
Bernard Friedman (1949) studied a more general urn, the addition
matrix now being

A =
[ a  b ]
[ b  a ],

i.e., each drawn ball is replaced together with a balls of its own
color and b balls of the opposite color.
Freedman (1965) showed that

R_{n}^{*} →^{D} N(0, 1),  B_{n}^{*} →^{D} N(0, 1),

where

R_{n}^{*} = (R_{n} - E[R_{n}]) / (V[R_{n}])^{1/2}

and B_{n}^{*} is defined similarly.
2 The connection to random trees
Recall the random permutation model for binary trees, where n keys
are inserted into a binary tree such that the root of any subtree is
larger than all left and less than all right descendants. We have a
uniform distribution on the n! possible key orderings and wish to
compute tree statistics associated to this model.
The Poblete–Munro (1985) heuristic suggests that we can obtain a more
balanced tree with little extra work: we require that all subtrees of
size at most three on the fringe be balanced. This means that we
rebalance size-3 subtrees on the fringe, if necessary.
This process yields shorter trees; in fact E[D_{n}] = (12/7) ln n
(compare with 2 ln n for standard RBSTs).
2.1 Balanced trees
Work by Yao (1978) on 2-3 trees and by Baeza-Yates, Gonnet and Ziviani
on other tree statistics S_{n} shows that we can study E[S_{n}] by
studying fringe configurations of RBSTs, to obtain bounds of the
type f_{1}(n) ≤ E[S_{n}] ≤ f_{2}(n).
Fringe analysis is based on exact counting of all (sub)trees less than
or equal to a given height.
The results improve in accuracy as the height is increased.
Mahmoud (1998) has used Pólya urn models to study the Poblete–Munro
heuristic. We map fringe configurations to colors. The growth of the
tree is modeled by a 3 × 3 urn.
Suppose an incoming node is placed on one of the four leaves of a
balanced subtree on three nodes. It is inserted without rebalancing
the tree. Its sibling is a leaf. Suppose the next node is placed at
that leaf. No rebalancing is required. Finally suppose that the next
node is not placed at the sibling leaf, but rather at a leaf of the
previous node. The tree must be rebalanced.
We distinguish these three configurations by assigning different
colors to the leaves concerned: color 1 to the leaves of any
terminal node whose sibling is not a leaf, color 3 to the leaves of
any terminal node whose sibling is a leaf, and color 2 to those
sibling leaves themselves. The leaves correspond to balls in a Pólya urn.
The complexity measure of an insertion is the number of rotations,
call it R_{n}.
The addition matrix of the Pólya urn becomes

A =
[ -2   1   2 ]
[  4  -1  -2 ]
[  4  -1  -2 ].

E.g., if we replace a leaf of color 1, we lose that leaf and recolor
its sibling with color 2; the two new leaves have color 3. This gives
the first row.
R_{n} is therefore the number of picks of color 3.
The row sums of the addition matrix A form the vector S =[1,1,1]^{T},
which reflects the fact that every BST on n nodes has n+1
leaves.
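The transitions just described can be exercised numerically. In the sketch below (our own; the matrix rows are reconstructed from the transitions described in the text, so treat them as an assumption), we start from a balanced subtree on three nodes, whose four leaves all carry color 1, count a rotation for each pick of color 3, and check the invariant that a tree on n nodes has n+1 leaves.

```python
import random

# Rows reconstructed from the transitions described in the text:
# color 1 -> (-2, +1, +2); colors 2 and 3 both leave behind a
# balanced 3-node fringe subtree -> (+4, -1, -2).
A = [[-2,  1,  2],
     [ 4, -1, -2],
     [ 4, -1, -2]]

def insert_n(n, rng):
    """Insert n keys, returning the number of rotations performed."""
    counts = [4, 0, 0]    # balanced subtree on 3 nodes: 4 leaves of color 1
    nodes, rotations = 3, 0
    for _ in range(n):
        r = rng.randrange(sum(counts))
        i = 0
        while r >= counts[i]:
            r -= counts[i]
            i += 1
        if i == 2:        # a pick of color 3 forces a rotation
            rotations += 1
        counts = [c + a for c, a in zip(counts, A[i])]
        nodes += 1
        assert sum(counts) == nodes + 1   # n nodes -> n+1 leaves
    return rotations
```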
If an addition matrix A has the property that there exists an m
such that all the entries of A^{m} are positive, we say that A is
regular.
In this particular example, A is not regular; nonetheless
Mahmoud (1998) shows that R_{n}, suitably normalized, is
asymptotically normal.
2.2 m-ary search trees
Under this model m-1 keys k_{1}, k_{2}, ..., k_{m-1} are placed at
the root of the tree. These keys partition the remaining keys into
m intervals (-∞, k_{1}), (k_{1}, k_{2}), ..., (k_{m-1}, +∞),
i.e., subtrees.
The construction is recursive and the branch factor is m.
Example 2
Let m=3 and consider the keys 9, 16, 4, 23, 11, 10,...
The first two keys are placed at the root, the key 4 is placed to
the left of 9 and starts a new subtree, 23 is placed to the right
of 16, also in a new subtree, and 11 and 10 fall between 9 and
16, starting a new subtree with root intervals (9, 10), (10, 11)
and (11, 16).
There are three types of nodes (or blocks in a hardware-oriented
setting): leaves, nodes that contain a single key, and nodes that
contain two keys.
More generally, we ask about S_{n}, the number of nodes after n
insertions, where S_{n} = ∑_{j} X_{n}^{j} and X_{n}^{j} counts the number
of nodes that contain j keys.
We construct an urn model by mapping gaps between keys at a node to
balls whose color indicates the number of gaps at that node. We can
recover the number of nodes of each type from the number of gaps of
the corresponding color.
For instance,
consider a leaf that contains i keys and hence i+1 gaps. We map
these gaps to balls of color i+1. Now suppose that i<m1 and we
insert a key at this leaf. We lose i+1 gaps of color i+1 and gain
i+2 gaps of color i+2.
The addition matrix associated to this model has the following shape:

A =
[ -1    2                       ]
[      -2    3                  ]
[           ···   ···           ]
[                -(m-2)   m-1   ]
[  m                    -(m-1)  ],

where row r carries -r on the diagonal and r+1 on the superdiagonal,
and the bottom-left entry m records that a node receiving its
(m-1)-st key spawns m gaps of color 1.
The eigenvalues of the addition matrix A are the roots of the
characteristic equation ∏_{i=1}^{m-1} (λ + i) = m!.
Order the eigenvalues according to their real part, letting
λ_{1} be the eigenvalue whose real part is the largest; here λ_{1} = 1.
Athreya and Ney (1972) showed that if the real part of λ_{2} is
less than half the real part of λ_{1}, a condition guaranteed if
A is regular, then the suitably normalized color counts are
asymptotically normal.
In the urn model associated to m-ary trees, this property holds for
m < 27, even though the associated urn is not regular. (This suggests
that regularity is too strong a precondition for the results of
Athreya and Ney.) When m = 27, there is a pair of conjugate eigenvalues
whose real part is larger than half the real part of λ_{1}.
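This threshold is easy to check numerically. The sketch below is our own (it assumes numpy is available, and it builds the matrix from the gap transitions described in the text: -r on the diagonal, r+1 on the superdiagonal, and m in the lower-left corner); it computes Re λ_{2} directly from the eigenvalues of A.

```python
import numpy as np

def mary_urn_matrix(m):
    """Addition matrix of the m-ary search tree urn: picking a gap of
    color r destroys the r gaps at that node and creates r+1 gaps of
    color r+1; a full node (color m-1) spawns m gaps of color 1."""
    A = np.zeros((m - 1, m - 1))
    for r in range(1, m):
        A[r - 1, r - 1] = -r
        if r < m - 1:
            A[r - 1, r] = r + 1
    A[m - 2, 0] += m
    return A

def re_lambda2(m):
    """Real part of the second-largest eigenvalue (the largest is 1)."""
    ev = sorted(np.linalg.eigvals(mary_urn_matrix(m)),
                key=lambda z: z.real, reverse=True)
    return ev[1].real
```

Every row of `mary_urn_matrix(m)` sums to 1, matching the fact that each insertion adds one gap, and the phase transition shows up as `re_lambda2(26) < 1/2 < re_lambda2(27)`.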
More precisely, Lew and Mahmoud (1994) showed that for any sequence
c_{1}, c_{2}, ..., c_{k}, where c_{j} is the cost of
a node that contains j keys, the vector of random variables
X_{n} = [X_{n}^{1}, X_{n}^{2}, ..., X_{n}^{k}]^{T}
converges to a multivariate normal, i.e.,

(X_{n} - E[X_{n}]) / n^{1/2} →^{D} MVN(0, Λ)

for m = 2, 3, ..., 26.
2.3 Paged binary trees
In this model every external node stores at most b keys, while
internal nodes store a single key. Overflow on external nodes is
processed by splitting the node according to some splitting rule, say
by selecting the median and adding two subtrees whose roots store
b/2 keys. The corresponding counting problem leads to a differential
equation in F(x, y), the super exponential generating function of
paged binary trees:

∂^{b-1} F(x, y) / ∂ x^{b-1} = ( ··· ).
PBSTs have been considered by Flajolet, Mahmoud and Martínez
Parra. Results that are based on Pólya urn models indicate that
there is a phase transition at b=118, when the real part of the
second eigenvalue of the addition matrix becomes larger than half the
real part of the first eigenvalue. Work in progress by Flajolet et al. seeks to construct an interpretation of this fact in
the context of generating functions.
3 Case study: plane-oriented recursive trees
This type of tree models a recruiting process where the recruiting
probability of a recruiting officer increases with the number of
recruits attracted so far, or more generally, where the probability
that a node receives a new node is proportional to its degree, a scenario
that Mahmoud (1991) calls ``success breeds success.'' We use
plane-oriented recursive trees as the underlying model^{1}, as proposed
by Bergeron, Flajolet and Salvy (1992). Every node has outdegree
2k+1, for some k ≥ 0: k of its children are plane-embedded
nodes, and k+1 leaves occupy the gaps around and between them.
Consider a chain letter scheme where the acquisition price of a letter
is 100F, and copies of the letter are sold at 40F. Given that there
are n participants in the scheme, we ask how many of them have just
broken even, i.e., sold three letters. Let blue represent insertion
slots at nodes that have bought, but not sold a single letter; red,
nodes that bought and sold one copy of the letter, green, two copies,
and white, three, i.e., broken even, and let B_{n}, R_{n}, G_{n} and
W_{n} be the corresponding RVs. (We start with a single participant,
i.e., B_{0}=1, R_{0}=G_{0}=W_{0}=0.) Finally, assume that the success
probability of a participant is proportional to the number of letters
sold (other models are possible and even reasonable). The addition
matrix is easily seen to be
A =
[ 0   2   0  0 ]
[ 1  -2   3  0 ]
[ 1   0  -3  4 ]
[ 1   0   0  1 ].
E.g., if a participant has sold two letters and sells another, the
three green insertion slots at the corresponding node are replaced by
four white ones, and a new participant who has not sold any copy of
his letter yet must be accounted for. Note that A is not regular.
It can be shown that B_{n} is the number of leaves in a random tree of
size n+1.
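The chain-letter urn is easy to simulate. The sketch below is our own (the addition matrix is the one given in the text, with the minus signs restored from the transition rules); it checks the deterministic invariant that the urn holds 2n+1 slots after n sales.

```python
import random

# Rows: picked color = blue, red, green, white (0, 1, 2, 3+ sales).
# A sale advances the seller one color (s slots become s+1 slots of
# the next color) and adds one blue slot for the new recruit; a white
# node simply gains one more white slot.
A = [[0,  2,  0, 0],
     [1, -2,  3, 0],
     [1,  0, -3, 4],
     [1,  0,  0, 1]]

def chain_letter(n, rng):
    """Slot counts (B_n, R_n, G_n, W_n) after n sales, starting from
    a single participant (B_0 = 1)."""
    counts = [1, 0, 0, 0]
    for _ in range(n):
        r = rng.randrange(sum(counts))
        i = 0
        while r >= counts[i]:
            r -= counts[i]
            i += 1
        counts = [c + a for c, a in zip(counts, A[i])]
    return counts
```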
Mahmoud, Smythe and Szymanski (1993) show that

E[B_{n}] ~ (2n+1)/3,  E[R_{n}] ~ (2n+1)/6,  E[G_{n}] ~ (2n+1)/10,  E[W_{n}] ~ (2/5)(2n+1),

and that the covariance matrix is

Cov(B_{n}, R_{n}, G_{n}) ~
[  1/9     -8/45    -1/15   ]
[ -8/45    23/45   -11/105  ]
[ -1/15   -11/105  179/350  ] · n.
We sketch the proof of this result. Introduce the indicator
variables I_{n}^{(B)}, I_{n}^{(R)}, I_{n}^{(G)}, I_{n}^{(W)}, so that

I_{n}^{(B)} + I_{n}^{(R)} + I_{n}^{(G)} + I_{n}^{(W)} = 1.

We now have, e.g., R_{n} = R_{n-1} + 2 I_{n}^{(B)} - 2 I_{n}^{(R)},
and hence

E[R_{n}] = E[R_{n-1}] + 2 E[I_{n}^{(B)}] - 2 E[I_{n}^{(R)}].

The expectations of the indicator variables are obtained by
conditioning on the n-1 picks that lead to a particular urn (call
this σ-field T_{n-1}), so that

E[I_{n}^{(B)} | T_{n-1}] = B_{n-1} / (2n-1)

and

E[I_{n}^{(B)}] = E[B_{n-1}] / (2n-1).
Substitute to get

E[R_{n}] = E[R_{n-1}] + 2 E[B_{n-1}]/(2n-1) - 2 E[R_{n-1}]/(2n-1).
Similar computations for E[B_{n}], E[G_{n}] and E[W_{n}] yield a
system of recurrences of the form

E[B_{n}, R_{n}, G_{n}, W_{n}]^{T} = F(n) · E[B_{n-1}, R_{n-1}, G_{n-1}, W_{n-1}]^{T},

where F(n) is a matrix that depends on n.
This system may be solved asymptotically.
(Note that we have made critical use of the fact that the total number
of balls in the urn is a deterministic function of n, namely 2n+1.)
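Since the urn holds exactly 2k-1 slots before the k-th pick, the recurrence takes the explicit form E[X_{k}] = (I + A^{T}/(2k-1)) E[X_{k-1}], which can be iterated in exact rational arithmetic. The sketch below is our own (the matrix again carries the reconstructed signs); for n ≥ 1 it reproduces E[B_{n}] = (2n+1)/3 exactly.

```python
from fractions import Fraction

A = [[0,  2,  0, 0],
     [1, -2,  3, 0],
     [1,  0, -3, 4],
     [1,  0,  0, 1]]

def expected_counts(n):
    """E[(B_n, R_n, G_n, W_n)] via E[X_k] = (I + A^T/(2k-1)) E[X_{k-1}]."""
    x = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]  # X_0
    for k in range(1, n + 1):
        total = 2 * k - 1      # slots in the urn before the k-th pick
        inc = [sum(x[i] * A[i][j] for i in range(4)) / total
               for j in range(4)]
        x = [a + b for a, b in zip(x, inc)]
    return x
```

For example, `expected_counts(50)[0]` equals `Fraction(101, 3)`, i.e. (2·50+1)/3, and the four expectations always sum to 2n+1.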
The computation of the covariance is more involved. Start from

R_{n} = R_{n-1} + 2 I_{n}^{(B)} - 2 I_{n}^{(R)}

as before, square both sides and use simple properties of mutually
exclusive indicator variables to get

R_{n}^{2} = R_{n-1}^{2} + 4 I_{n}^{(B)} R_{n-1} - 4 I_{n}^{(R)} R_{n-1}
+ 4 I_{n}^{(B)} + 4 I_{n}^{(R)}.
Binary cross products of the four RVs appear on taking expectations,
i.e., we develop recurrences for E[R_{n}^{2}] = E[R_{n} R_{n}],
and these recurrences involve terms like

E[I_{n}^{(B)} R_{n-1}]
= E[ E[I_{n}^{(B)} R_{n-1} | T_{n-1}] ]
= E[ R_{n-1} E[I_{n}^{(B)} | T_{n-1}] ]
= E[ B_{n-1} R_{n-1} ] / (2n-1).
The result is a system of recurrences in all binary cross products
that yields the desired asymptotics.
Next consider the vector X_{i} of centered RVs:

X_{i} = ( B_{i}^{*}, R_{i}^{*}, G_{i}^{*} )^{T}
      = ( B_{i}, R_{i}, G_{i} )^{T} - (2i+1) ( 1/3, 1/6, 1/10 )^{T}.
Mahmoud, Smythe and Szymanski (1993) show that

X_{n} / n^{1/2} →^{D} MVN( 0,
[  1/9     -8/45    -1/15   ]
[ -8/45    23/45   -11/105  ]
[ -1/15   -11/105  179/350  ] ).
The proof uses martingale techniques. Recall that a martingale is a
sequence Y_{1}, Y_{2}, Y_{3}, ... of random variables such that

E[Y_{n} | T_{n-1}] = Y_{n-1}.

E.g., consider a fair game (``win all or lose all with equal
probability, i.e., 1/2''), which gives

E[Y_{n} | T_{n-1}] = 0 · (1/2) + 2 Y_{n-1} · (1/2) = Y_{n-1}.

Note that E[Y_{n}] = E[Y_{n-1}] = ··· = E[Y_{1}] in this example.
If Y_{1}, Y_{2}, Y_{3}, ... is a martingale, then

E[Y_{n} - Y_{n-1} | T_{n-1}] = 0;

this is known as the martingale difference property.
We can reconstruct the martingale from the sequence of first
differences, i.e., via ∑_{k=1}^{n} Δ Y_{k} = Y_{n}, where
Δ Y_{k} = Y_{k} - Y_{k-1} and Y_{0} = 0.
More generally, we can construct a martingale from any sequence of
random variables that has the martingale difference property; this was
done, e.g., by Régnier (1989) in the context of algorithms, showing
that the cost of Quicksort has a limit distribution.
If E[Δ Z_{i} | T_{i-1}] = 0, then A_{n} = ∑_{i=1}^{n} Δ Z_{i}
is a martingale, because

E[A_{n} | T_{n-1}]
= E[ ∑_{i=1}^{n} Δ Z_{i} | T_{n-1} ]
= E[ Δ Z_{n} + ∑_{i=1}^{n-1} Δ Z_{i} | T_{n-1} ]
= ∑_{i=1}^{n-1} Δ Z_{i} = A_{n-1}.
By linearity, E[Δ Z_{i} | T_{i-1}] = 0 implies
E[b_{i} Δ Z_{i} | T_{i-1}] = 0
for any sequence of constants {b_{i}}, and hence
∑_{i=1}^{n} b_{i} Δ Z_{i}
is a martingale.
We return to the four-color urn of the chain letter scheme; consider
the color blue. We have

E[B_{i} | T_{i-1}]
= E[ B_{i-1} + (1 - I_{i}^{(B)}) | T_{i-1} ]
= B_{i-1} + 1 - E[I_{i}^{(B)} | T_{i-1}]
= B_{i-1} + 1 - B_{i-1}/(2i-1).
Further manipulation yields

E[ B_{i} - (1/3)(2i+1) | T_{i-1} ]
= ( B_{i-1} - (1/3)(2i-1) ) + (1/3)(2i-1) - (1/3)(2i+1) + 1 - B_{i-1}/(2i-1)
= ( B_{i-1} - (1/3)(2i-1) ) - ( B_{i-1} - (1/3)(2i-1) )/(2i-1),

and hence

E[B_{i}^{*} | T_{i-1}] = B_{i-1}^{*} - B_{i-1}^{*}/(2i-1).
B_{i}^{*} is not a martingale, but

Δ M_{i}^{B} = B_{i}^{*} - B_{i-1}^{*} + B_{i-1}^{*}/(2i-1)

is a martingale difference, because E[Δ M_{i}^{B} | T_{i-1}] = 0.
We can construct a martingale from the Δ M_{i}^{B}, since
∑_{i=1}^{n} b_{i} Δ M_{i}^{B}
is a martingale for any sequence {b_{i}}.
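The identity E[Δ M_{i}^{B} | T_{i-1}] = 0 can be verified mechanically: condition on an arbitrary urn composition, average B_{i}^{*} over the four possible picks, and subtract. The sketch below is our own check in exact arithmetic (the increments of B under each pick, (0, 1, 1, 1), follow from the addition matrix of the text).

```python
from fractions import Fraction as F

dB = [0, 1, 1, 1]   # change in B when picking blue, red, green, white

def dM_B(i, state):
    """E[dM_i^B | T_{i-1}] for the slot composition `state` after i-1 picks."""
    B = state[0]
    total = sum(state)
    assert total == 2 * i - 1                      # urn size after i-1 picks
    center = lambda b, k: F(b) - F(2 * k + 1, 3)   # B_k^* = B_k - (2k+1)/3
    # E[B_i^* | T_{i-1}]: average over the four possible picks
    e_star = sum(F(c, total) * center(B + d, i) for c, d in zip(state, dB))
    bstar = center(B, i - 1)
    return e_star - bstar + bstar / F(2 * i - 1)

# dM_B(i, state) vanishes identically, e.g. dM_B(3, (2, 2, 0, 1)) == 0.
```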
The Cramér–Wold device can be used to prove convergence to a
multivariate normal. Suppose we seek

[ X_{n}^{(1)}, X_{n}^{(2)}, X_{n}^{(3)}, ..., X_{n}^{(j)} ]^{T} →^{D} MVN(0, Λ).

It suffices to prove that any linear combination of the
X_{n}^{(·)} converges to a normal distribution, i.e.,

a_{1} X_{n}^{(1)} + a_{2} X_{n}^{(2)} + ··· + a_{j} X_{n}^{(j)} →^{D} N(0, σ^{2}),

with a_{1}, a_{2}, ..., a_{j} arbitrary.
Here j = 3 and we study the linear combination
U_{n} = a_{1} B_{n}^{*} + a_{2} R_{n}^{*} + a_{3} G_{n}^{*}.
Centering the remaining two variables, we have

E[ B_{i}^{*} - B_{i-1}^{*} | T_{i-1} ] = - B_{i-1}^{*}/(2i-1),
E[ R_{i}^{*} - R_{i-1}^{*} | T_{i-1} ] = + (2/(2i-1)) (B_{i-1}^{*} - R_{i-1}^{*}),
E[ G_{i}^{*} - G_{i-1}^{*} | T_{i-1} ] = + (3/(2i-1)) (R_{i-1}^{*} - G_{i-1}^{*}).
Introduce the martingale differences

Δ M_{i}^{B} = B_{i}^{*} - B_{i-1}^{*} + B_{i-1}^{*}/(2i-1),
Δ M_{i}^{R} = R_{i}^{*} - R_{i-1}^{*} - (2/(2i-1)) (B_{i-1}^{*} - R_{i-1}^{*}),
Δ M_{i}^{G} = G_{i}^{*} - G_{i-1}^{*} - (3/(2i-1)) (R_{i-1}^{*} - G_{i-1}^{*}),
and set

V_{nk} = ∑_{i=1}^{k} ( b_{in} Δ M_{i}^{B} + c_{in} Δ M_{i}^{R} + d_{in} Δ M_{i}^{G} )

for arbitrary {b_{in}}, {c_{in}}, {d_{in}}.
It remains to choose {b_{in}}, {c_{in}} and {d_{in}}.
Expand V_{nn} to get

V_{nn} = b_{nn} ( B_{n}^{*} - B_{n-1}^{*} + B_{n-1}^{*}/(2n-1) )
       + b_{n-1,n} ( B_{n-1}^{*} - B_{n-2}^{*} + B_{n-2}^{*}/(2n-3) ) + ···

To obtain the particular linear combination
a_{1} B_{n}^{*} + a_{2} R_{n}^{*} + a_{3} G_{n}^{*}
we set b_{nn} = a_{1}, so that the term in B_{n}^{*} is preserved,
and choose the remaining b_{in} to cancel B_{n-1}^{*},
B_{n-2}^{*}, etc.
This technique can be used to show that, given a_{1}, a_{2} and
a_{3}, we may choose constants {b_{in}},
{c_{in}} and {d_{in}}, so that

V_{nn} = ( a_{1} B_{n}^{*} + a_{2} R_{n}^{*} + a_{3} G_{n}^{*} ) - (3/2) c_{1n} + (10/3) d_{1n},
which by a martingale central limit theorem yields

( a_{1} B_{n}^{*} + a_{2} R_{n}^{*} + a_{3} G_{n}^{*} ) / n^{1/2} →^{D} N(0, σ^{2}),

hence

( B_{n}^{*}, R_{n}^{*}, G_{n}^{*} )^{T} / n^{1/2} →^{D} MVN(0, Λ).
Concluding remark
It should be obvious from the highly restricted
class of addition matrices that have been considered that an abundance
of combinatorial problems and possible addition matrices remain to be
analyzed; e.g., apparently simple instances such
as []
and []
have so far resisted attack.
1. If we were using the terminology of combinatorial analysis, we would
refer to these trees as increasing trees; more precisely, as R-enriched
increasing trees, where R is the list structure.