Pólya Urn Models in Random Trees

Hosam M. Mahmoud

The George Washington University

Algorithms Seminar

October 20, 1997

[summary by Marko R. Riedel]

A properly typeset version of this document is available in postscript and in pdf.

If some fonts do not look right on your screen, this might be fixed by configuring your browser (see the documentation here).

1   Examples and previous results

Consider an urn that contains balls of k different colors 1,2,...,k. There is a set of evolution rules: (i) a ball is chosen at random from the urn, where all balls are equally likely; (ii) that ball's color or type is noted, and the ball is returned to the urn; (iii) if the ball had color i, aij balls of color j are added to the urn.

Question of interest. What is the composition of the urn after n draws?
The model is encoded in the addition matrix A=[aij], 1£ i,j£ k. The aij may themselves be random, but this talk is concerned exclusively with deterministic aij, i.e., the case of a fixed addition matrix A.

Pólya and Eggenberger (1923) investigated the two-color problem, with A=sI and s a positive integer. Suppose the two colors are red and blue, and let Rn and Bn be the number of red and blue balls after n picks.
Example 1   Set s=2 and start with two red balls and one blue ball. One of eight possible length-3 runs is: Pick blue (probability 1/3), the composition of the urn is now R1=2 and B1=3; Pick blue (probability 3/5), R2=2 and B2=5; Pick red (probability 2/7), R3=4 and B3=7.

What is the typical behavior of Rn and Bn? Bernard Friedman (1949) studied a more general urn, the addition matrix now being A=[
a b
b a
]. Freedman (1965) showed that
Rn*
D
®
 
N ( 0, 1 ) ,   Bn*
D
®
 
N ( 0, 1 ) ,    where     Rn* =
Rn - E[Rn]
(V[Rn])1/2
and Bn* is defined similarly.

2   The connection to random trees

Recall the random permutation model for binary trees, where n keys are inserted into a binary tree such that the root of any subtree is larger than all left and less than all right descendants. We have a uniform distribution on the n! possible key orderings and wish to compute tree statistics associated to this model.

The Poblete-Munro (1985) heuristic suggests that we can obtain a more balanced tree with little extra work: we require that all subtrees on the fringe and of size at most three be balanced. This means that we rebalance size 3 subtrees on the fringe, if necessary. This process yields shorter trees; in fact E[Dn] =(12/7) ln n (compare with 2ln n for standard RBSTs).

2.1   Balanced trees

Work by Yao (1978) on 2-3 trees, Baeza-Yates, Gonnet and Ziviani on other tree statistics Sn shows that we can study E[Sn] by studying fringe configurations of RBSTs, to obtain bounds of the type f1(n) £ E[Sn] £ f2(n). Fringe analysis is based on exact counting of all (sub)trees less than or equal to a given height. The results improve in accuracy as the height is increased.

Mahmoud (1998) has used Pólya urn models to study the Poblete-Munro heuristic. We map fringe configurations to colors. The growth of the tree is modeled by a 3× 3 urn. Suppose an incoming node is placed on one of the four leaves of a balanced subtree on three nodes. It is inserted without rebalancing the tree. Its sibling is a leaf. Suppose the next node is placed at that leaf. No rebalancing is required. Finally suppose that the next node is not placed at the sibling leaf, but rather at a leaf of the previous node. The tree must be rebalanced. We distinguish these three configurations by assigning different colors to the leaves concerned: color 1 to the leaves of any terminal node whose sibling is not a leaf, color 3 to the leaves of any terminal node whose sibling is a leaf, and color 2 to all such leaves. The leaves correspond to balls in a Pólya urn. The complexity measure of an insertion is the number of rotations, call it Rn. The addition matrix of the Pólya urn becomes
A =
é
ê
ê
ë
-2 1 2
4 -1 -2
4 -1 -2
ù
ú
ú
û
.
E.g., if we replace a leaf of color 1, we lose that leaf and recolor its sibling with color 2. The new leaves have color 3. Rn is therefore the number of picks of color 3. The row sums of the addition matrix A form the vector S =[1,1,1]T, which reflects the fact that every BST on n nodes has n+1 leaves.

If an addition matrix A has the property that there exists an m such that all the entries of Am are positive, we say that A is regular. In this particular example, A is not regular; nonetheless Mahmoud (1998) shows that
Rn - 2/7 n
(n)1/2
D
®
 
N ( 0, 66/637 ) .

2.2   m-ary search trees.

Under this model m-1 keys k1, k2, ..., km-1 are placed at the root of the tree. These keys partition the remaining keys into m intervals, (-¥, k1), (k1, k2), ... ,(km-1, +¥), i.e., subtrees. The construction is recursive and the branch factor is m.
Example 2   Let m=3 and consider the keys 9, 16, 4, 23, 11, 10,... The first two keys are placed at the root, the key 4 is placed to the left of 9 and starts a new subtree, 23 is placed to the right of 16, also in a new subtree, and 11 and 10 fall between 9 and 16, starting a new subtree with root intervals (9, 10), (10, 11) and (11, 16). There are three types of nodes (or blocks in a hardware-oriented setting): leaves, nodes that contain a single key, and nodes that contain two keys.
More generally, we ask about Sn, the number of nodes after n insertions, where Sn = åj Xnj and Xnj counts the number of nodes that contain j keys. We construct an urn model by mapping gaps between keys at a node to balls whose color indicates the number of gaps at that node. We can recover the number of nodes of each type from the number of gaps of the corresponding color. For instance, consider a leaf that contains i keys and hence i+1 gaps. We map these gaps to balls of color i+1. Now suppose that i<m-1 and we insert a key at this leaf. We lose i+1 gaps of color i+1 and gain i+2 gaps of color i+2. The addition matrix associated to this model has the following shape:
A =
é
ê
ê
ê
ê
ê
ê
ê
ê
ë
  ·  
 · 
  ·
         
  ··· -r r+1      
  ··· ··· -(r+1) r+2    
          ·  
 · 
  ·
 
m           -1
ù
ú
ú
ú
ú
ú
ú
ú
ú
û
.

The eigenvalues of the addition matrix A
Order the eigenvalues according to their real part, letting l1 be the eigenvalue whose real part is the largest. Athreya and Nay (1972) showed that if the real part of l2 is less than half the real part of l1, a condition guaranteed if A is regular, then the colors have a normal N(0, 1) distribution.

In the urn model associated to m-ary trees, this property holds for m<27, even though the associated urn is not regular. (This suggests that regularity is too strong a precondition for the results of Athreya and Nay.) When m=27, there are two conjugate eigenvalues whose real part is larger than half the real part of l1. More precisely, Lew and Mahmoud (1994) showed that for any sequence c1, c2, ..., ck, where cj is the cost of a node that contains j keys, the vector of random variables Xn = [Xn1, Xn2, ... ,Xnk]T converges to a multivariate normal, i.e.,
Xn - E[Xn]
(n)1/2
D
®
 
MVN(0, L)     for     m=2, 3, ..., 26.

2.3   Paged binary trees

In this model every external node stores at most b keys, while internal nodes store a single key. Overflow on external nodes is processed by splitting the node according to some splitting rule, say by selecting the median and adding two subtrees whose roots store b/2 keys. The corresponding counting problem leads to a differential equation in F(x, y), the super exponential generating function of paged binary trees:
b-1 F(x, y)
xb-1
= æ
ç
ç
è
F(x, y)
x
ö
÷
÷
ø
2



 
.

PBSTs have been considered by Flajolet, Mahmoud and Martínez Parra. Results that are based on Pólya urn models indicate that there is a phase transition at b=118, when the real part of the second eigenvalue of the addition matrix becomes larger than half the real part of the first eigenvalue. Work in progress by Flajolet et al. seeks to construct an interpretation of this fact in the context of generating functions.

3   Case study: plane-oriented recursive trees

This type of tree models a recruiting process where the recruiting probability of a recruiting officer increases with the number of recruits attracted so far, or more generally, where the probability of a node to receive a new node is proportional to its degree, a scenario that Mahmoud (1991) calls ``success breeds success.'' We use plane-oriented recursive trees as the underlying model1, as proposed by Bergeron, Flajolet and Salvy (1992). Every node has outdegree 2k+1, for some k³ 0; k of its children are plane-embedded nodes, and k+1 leaves are placed in the gaps between adjacent nodes.

Consider a chain letter scheme where the acquisition price of a letter is 100F, and copies of the letter are sold at 40F. Given that there are n participants in the scheme, we ask how many of them have just broken even, i.e., sold three letters. Let blue represent insertion slots at nodes that have bought, but not sold a single letter; red, nodes that bought and sold one copy of the letter, green, two copies, and white, three, i.e., broken even, and let Bn, Rn, Gn and Wn be the corresponding RVs. (We start with a single participant, i.e., B0=1, R0=G0=W0=0.) Finally, assume that the success probability of a participant is proportional to the number of letters sold (other models are possible and even reasonable). The addition matrix is easily seen to be
A =
é
ê
ê
ê
ë
0 2 0 0
1 -2 3 0
1 0 -3 4
1 0 0 1
ù
ú
ú
ú
û
.
E.g., if a participant has sold two letters and sells another, the three green insertion slots at the corresponding node are replaced by four white ones, and a new participant who has not sold any copy of his letter yet must be accounted for. Note that A is not regular. It can be shown that Bn is the number of leaves in a random tree of size n+1.

Mahmoud, Smythe and Szymanski (1993) show that
E
æ
ç
ç
è
Bn
Rn
Gn
ö
÷
÷
ø
~
æ
ç
ç
è
1/3
1/6
1/10
ö
÷
÷
ø
(2n+1),
and that the covariance matrix is
Cov(Bn, Rn, Gn) ~
æ
ç
ç
è
1/9 - 8/45 - 1/15
-8/45 23/45 - 11/105
-1/15 -11/105 -179/350
ö
÷
÷
ø
n .
We sketch the proof of this result. Introduce the indicator variables In(B), In(R), In(G), In(W) so that In(B) + In(R)+ In(G)+ In(W) = 1. We now have e.g., Rn = Rn-1 + 2 In(B) - 2 In(R), and hence
E[Rn] = E[Rn-1] + 2 E[In(B)] - 2 E[In(R)].
The expectations of the indicator variables are obtained by conditioning on the n-1 picks that lead to a particular urn (call this s-field Tn-1), so that
E[In(B) | Tn-1] =
Bn-1(Tn-1)
2n-1
  and   E[In(B)] =
E[Bn]
2n-1
.
Substitute to get
E[Rn] = E[Rn-1] + 2
E[Bn]
2n-1
- 2
E[Rn]
2n-1
.

Similar computations for E[Bn], E[Gn] and E[Wn] yield a system of recurrences of the form
[ Bn, Rn ,Gn , Wn]T = F(n) [ Bn-1, Rn-1, Gn-1, Wn-1]T
where F(n) is a matrix that depends on n. This system may be solved asymptotically. (Note that we have made critical use of the fact that the total number of balls in the urn is a function of n, namely 2n+1.)

The computation of the covariance is more involved. Start from Rn = Rn-1 + 2 In(B) - 2 In(R) as before, square both sides and use simple properties of mutually exclusive indicator variables to get
Rn2 = Rn-12 + 4 In(B) Rn-1 - 4 In(R) Rn-1 + 4 In(B) + 4 In(R).
Binary cross products of the four RVs appear on taking expectations, i.e., we develop recurrences for E[Rn2] = E[Rn Rn] and these recurrences involve terms like
E[In(B) Rn-1]= E [ E[ In(B) Rn-1 | Tn-1 ] ]
= E [ Rn-1 E[ In(B) | Tn-1 ] ] = E é
ê
ê
ë
Rn-1
Bn-1
2n-1
ù
ú
ú
û
=
E[Rn-1 Bn-1]
2n-1
The result is a system of recurrences in all binary cross products that yields the desired asymptotics.

Next consider the vector Xi of centered RVs;
Xi =
æ
ç
ç
è
Bi*
Ri*
Gi*
ö
÷
÷
ø
=
æ
ç
ç
è
Bi
Ri
Gi
ö
÷
÷
ø
- (2i+1)
æ
ç
ç
è
1/3
1/6
1/10
ö
÷
÷
ø
Mahmoud, Smythe and Szymanski (1993) show that
Xi
(n)1/2
D
®
 
MVN æ
ç
ç
è
0,
æ
ç
ç
è
1/9 - 8/45 - 1/15
-8/45 23/45 - 11/105
-1/15 -11/105 -179/350
ö
÷
÷
ø
ö
÷
÷
ø
.
The proof uses martingale techniques. Recall that a martingale is a sequence Y1, Y2, Y3, ... of random variables such that E[Yn | Tn-1] = Yn-1. E.g., consider a fair game (``win all or lose all with equal probability, i.e., 1/2''), which gives
E[ Yn | Tn-1] = 0 · Yn-1
1
2
+ 2 Yn-1
1
2
= Yn-1.
Note that E[Yn] = E[Yn-1] = ... = E[Y1] = 0 in this example; this is known as the martingale difference property, because if Y1, Y2, Y3, ... is a martingale, then E[Yn - Yn-1 | Tn-1] = 0. We can reconstruct the martingale from the sequence of first differences, i.e., via åk=1n D Yk = Yn. More generally, we can construct a martingale from any sequence of random variables that has the martingale difference property; this was done e.g., by Régnier (1989) in the context of algorithms, who showed that the cost of Quicksort has a limit distribution.

If E[ D Zi | Ti-1] = 0, then An = åi=1n D Zi is a martingale, because
E[ An| Tn-1]= E é
ê
ê
ë
n
å
i=1
D Zi | Tn-1 ù
ú
ú
û
= E é
ê
ê
ë
D Zn +
n-1
å
i=1
D Zi | Tn-1 ù
ú
ú
û
=
n-1
å
i=1
D Zi = An-1.
By linearity, E[ D Zi | Ti-1] = 0 implies E[ bi D Zi | Ti-1] = 0 for any sequence of constants {bi}, and hence åi=1n bi D Zi is a martingale.

We return to the four-color urn of the chain letter scheme; consider the color blue.
E[ Bi | Ti-1] = E [ Bi-1 + ( 1 - In(B) ) | Ti-1 ] = Bi-1 + 1 - E[In(B) | Ti-1] = Bi-1 + 1 -
1
2i-1
Bi-1
Further manipulation yields
E[ Bi - 1/3 (2i+1) | Ti-1] = ( Bi-1 - 1/3 (2i-1) ) + 1/3 (2i-1) - 1/3 (2i+1)
+ 1 -
1
2i-1
( Bi-1 - 1/3 (2i-1) ) - 1/3
and hence E[ Bi* | Ti-1] = Bi-1* -Bi-1*/(2i-1). Bi* is not a martingale but
Bi* - Bi-1* +
1
2i-1
Bi-1*
is a martingale difference, because E[ D MiB | Tn-1 ] = 0, where we have set
D MiB = Bi* - Bi-1* +
1
2i-1
Bi-1*.
We can construct a martingale from D MiB, sinceåi=1n bi D MiB is a martingale for any sequence {bi}.

The Cramér-Wold device can be used to prove convergence to a multivariate normal. Suppose we seek [ Xn(1),Xn(2),Xn(3),...,Xn(j)]T ®D MVN(0, L). It suffices to prove that any linear combination of the Xn(·) converges to a normal distribution, i.e.,
a1 Xn(1) + a2 Xn(2) + ... + aj Xn(j)
D
®
 
N æ
è
0, s
 
a1, ..., aj
ö
ø
,
a1, a2, ..., aj arbitrary. Here j=3 and we study Wn = a1 Bn* + a2 Rn* + a3 Gn*. Centering the remaining two variables, we have
E[ Bi* - Bi-1* | Ti-1]
= -
1
2i-1
Bi-1*
E[ Ri* - Ri-1* | Ti-1]
= +
2
2i-1
(Bi-1* - Ri-1*)
E[ Gi* - Gi-1* | Ti-1]
= +
3
2i-1
(Ri-1* - Gi-1*).
Introduce the martingale differences
D MiB
= Bi* - Bi-1* +
1
2i-1
Bi-1*
D MiR
= Ri* - Ri-1* -
2
2i-1
(Bi-1* - Ri-1*)
D MiG
= Gi* - Gi-1* -
3
2i-1
(Ri-1* - Gi-1*)
and set Vnk = åi=1k (bin D MiB + cin D MiR + din D MiG ) for arbitrary {bin}, {cin}, {din}. It remains to choose {bin}, {cin} and {din}. Expand Vnn to get
Vnn = bnn æ
ç
ç
è
Bn* - Bn-1* +
1
2n-1
Bn-1* ö
÷
÷
ø
+ ... + æ
ç
ç
è
Bn-1* - Bn-2* +
1
2n-3
Bn-3* ö
÷
÷
ø
+ ···
To obtain the particular linear combination a1 Bn* + a2 Rn* + a3 Gn* we set bnn = a1, so that the term in Bn* is preserved, and choose the remaining bn, i to cancel Bn-1*, Bn-2* etc. This technique can be used to show that given a1, a2 and a3, we may choose constants {bin}, {cin} and {din}, so that
Vnn = ( a1 Bn* + a2 Rn* + a3 Gn* ) - 3/2 c1n + 10/3 d1n,
which by a martingale central limit theorem yields
a1 Bn* + a2 Rn* + a3 Gn*
(n)1/2
~
Vnn
(n)1/2
,     hence    
Wn
(n)1/2
D
®
 
N æ
è
0, s
 
a1, ..., aj
ö
ø
,  
æ
ç
ç
è
Bn*
Rn*
Gn*
ö
÷
÷
ø
D
®
 
MVN(0, L).

Concluding remark

It should be obvious from the highly restricted class of addition matrices that have been considered that an abundance of combinatorial problems and possible addition matrices remain to be analyzed; e.g., apparently simple instances such as [
2 0
3 4
] and [
1 0
1 1
] have so far resisted attack.
1
If we were using the terminology of combinatorial analysis, we would refer to these trees as increasing trees; more precisely, as R-enriched increasing trees, where R is the list structure.

This document was translated from LATEX by HEVEA.