Pólya Urn Models in Random Trees

Pólya Urn Models in Random Trees

Hosam M. Mahmoud

The George Washington University

Algorithms Seminar

October 20, 1997

[summary by Marko R. Riedel]

A properly typeset version of this document is available in postscript and in pdf.

If some fonts do not look right on your screen, this might be fixed by configuring your browser (see the documentation here).

1 Examples and previous results

Consider an urn that contains balls of k different colors 1,2,...,k. There is a set of evolution rules: (i) a ball is chosen at random from the urn, where all balls are equally likely; (ii) that ball's color or type is noted, and the ball is returned to the urn; (iii) if the ball had color i, a_ij balls of color j are added to the urn.

Question of interest. What is the composition of the urn after n draws?

The model is encoded in the addition matrix A=[a_ij], 1£ i,j£ k. The a_ij may themselves be random, but this talk is concerned exclusively with deterministic a_ij, i.e., the case of a fixed addition matrix A.

Pólya and Eggenberger (1923) investigated the two-color problem, with A=sI and s a positive integer. Suppose the two colors are red and blue, and let R_n and B_n be the number of red and blue balls after n picks.

Example 1 Set s=2 and start with two red balls and one blue ball. One of eight possible length-3 runs is: Pick blue (probability 1/3), the composition of the urn is now R₁=2 and B₁=3; Pick blue (probability 3/5), R₂=2 and B₂=5; Pick red (probability 2/7), R₃=4 and B₃=7.

What is the typical behavior of R_n and B_n? Bernard Friedman (1949) studied a more general urn, the addition matrix now being A=[

a	b
b	a

]. Freedman (1965) showed that

R_n^*

(

0, 1

)

, B_n^*

(

0, 1

)

, where R_n^* =

R_n - E[R_n]

(V[R_n])^1/2

and B_n^* is defined similarly.

2 The connection to random trees

Recall the random permutation model for binary trees, where n keys are inserted into a binary tree such that the root of any subtree is larger than all left and less than all right descendants. We have a uniform distribution on the n! possible key orderings and wish to compute tree statistics associated to this model.

The Poblete-Munro (1985) heuristic suggests that we can obtain a more balanced tree with little extra work: we require that all subtrees on the fringe and of size at most three be balanced. This means that we rebalance size 3 subtrees on the fringe, if necessary. This process yields shorter trees; in fact E[D_n] =(12/7) ln n (compare with 2ln n for standard RBSTs).

2.1 Balanced trees

Work by Yao (1978) on 2-3 trees, Baeza-Yates, Gonnet and Ziviani on other tree statistics S_n shows that we can study E[S_n] by studying fringe configurations of RBSTs, to obtain bounds of the type f₁(n) £ E[S_n] £ f₂(n). Fringe analysis is based on exact counting of all (sub)trees less than or equal to a given height. The results improve in accuracy as the height is increased.

Mahmoud (1998) has used Pólya urn models to study the Poblete-Munro heuristic. We map fringe configurations to colors. The growth of the tree is modeled by a 3× 3 urn. Suppose an incoming node is placed on one of the four leaves of a balanced subtree on three nodes. It is inserted without rebalancing the tree. Its sibling is a leaf. Suppose the next node is placed at that leaf. No rebalancing is required. Finally suppose that the next node is not placed at the sibling leaf, but rather at a leaf of the previous node. The tree must be rebalanced. We distinguish these three configurations by assigning different colors to the leaves concerned: color 1 to the leaves of any terminal node whose sibling is not a leaf, color 3 to the leaves of any terminal node whose sibling is a leaf, and color 2 to all such leaves. The leaves correspond to balls in a Pólya urn. The complexity measure of an insertion is the number of rotations, call it R_n. The addition matrix of the Pólya urn becomes

A =

é
ê
ê
ë

-2	1	2
4	-1	-2
4	-1	-2

ù
ú
ú
û

E.g., if we replace a leaf of color 1, we lose that leaf and recolor its sibling with color 2. The new leaves have color 3. R_n is therefore the number of picks of color 3. The row sums of the addition matrix A form the vector S =[1,1,1]^T, which reflects the fact that every BST on n nodes has n+1 leaves.

If an addition matrix A has the property that there exists an m such that all the entries of A^m are positive, we say that A is regular. In this particular example, A is not regular; nonetheless Mahmoud (1998) shows that

R_n - 2/7 n

(n)^1/2

(

0, 66/637

)

2.2 m-ary search trees.

Under this model m-1 keys k₁, k₂, ..., k_m-1 are placed at the root of the tree. These keys partition the remaining keys into m intervals, (-¥, k₁), (k₁, k₂), ... ,(k_m-1, +¥), i.e., subtrees. The construction is recursive and the branch factor is m.

Example 2 Let m=3 and consider the keys 9, 16, 4, 23, 11, 10,... The first two keys are placed at the root, the key 4 is placed to the left of 9 and starts a new subtree, 23 is placed to the right of 16, also in a new subtree, and 11 and 10 fall between 9 and 16, starting a new subtree with root intervals (9, 10), (10, 11) and (11, 16). There are three types of nodes (or blocks in a hardware-oriented setting): leaves, nodes that contain a single key, and nodes that contain two keys.

More generally, we ask about S_n, the number of nodes after n insertions, where S_n = å_j X_n^j and X_n^j counts the number of nodes that contain j keys. We construct an urn model by mapping gaps between keys at a node to balls whose color indicates the number of gaps at that node. We can recover the number of nodes of each type from the number of gaps of the corresponding color. For instance, consider a leaf that contains i keys and hence i+1 gaps. We map these gaps to balls of color i+1. Now suppose that i<m-1 and we insert a key at this leaf. We lose i+1 gaps of color i+1 and gain i+2 gaps of color i+2. The addition matrix associated to this model has the following shape:

A =

é
ê
ê
ê
ê
ê
ê
ê
ê
ë

	· · ·
	···	-r	r+1
	···	···	-(r+1)	r+2
					· · ·
m						-1

ù
ú
ú
ú
ú
ú
ú
ú
ú
û

The eigenvalues of the addition matrix A

Order the eigenvalues according to their real part, letting l₁ be the eigenvalue whose real part is the largest. Athreya and Nay (1972) showed that if the real part of l₂ is less than half the real part of l₁, a condition guaranteed if A is regular, then the colors have a normal N(0, 1) distribution.

In the urn model associated to m-ary trees, this property holds for m<27, even though the associated urn is not regular. (This suggests that regularity is too strong a precondition for the results of Athreya and Nay.) When m=27, there are two conjugate eigenvalues whose real part is larger than half the real part of l₁. More precisely, Lew and Mahmoud (1994) showed that for any sequence c₁, c₂, ..., c_k, where c_j is the cost of a node that contains j keys, the vector of random variables X_n = [X_n¹, X_n², ... ,X_n^k]^T converges to a multivariate normal, i.e.,

X_n - E[X_n]

(n)^1/2

MVN(0, L) for m=2, 3, ..., 26.

2.3 Paged binary trees

In this model every external node stores at most b keys, while internal nodes store a single key. Overflow on external nodes is processed by splitting the node according to some splitting rule, say by selecting the median and adding two subtrees whose roots store b/2 keys. The corresponding counting problem leads to a differential equation in F(x, y), the super exponential generating function of paged binary trees:

¶^b-1 F(x, y)

¶ x^b-1

æ
ç
ç
è

¶ F(x, y)

¶ x

ö
÷
÷
ø

PBSTs have been considered by Flajolet, Mahmoud and Martínez Parra. Results that are based on Pólya urn models indicate that there is a phase transition at b=118, when the real part of the second eigenvalue of the addition matrix becomes larger than half the real part of the first eigenvalue. Work in progress by Flajolet et al. seeks to construct an interpretation of this fact in the context of generating functions.

3 Case study: plane-oriented recursive trees

This type of tree models a recruiting process where the recruiting probability of a recruiting officer increases with the number of recruits attracted so far, or more generally, where the probability of a node to receive a new node is proportional to its degree, a scenario that Mahmoud (1991) calls ``success breeds success.'' We use plane-oriented recursive trees as the underlying model¹, as proposed by Bergeron, Flajolet and Salvy (1992). Every node has outdegree 2k+1, for some k³ 0; k of its children are plane-embedded nodes, and k+1 leaves are placed in the gaps between adjacent nodes.

Consider a chain letter scheme where the acquisition price of a letter is 100F, and copies of the letter are sold at 40F. Given that there are n participants in the scheme, we ask how many of them have just broken even, i.e., sold three letters. Let blue represent insertion slots at nodes that have bought, but not sold a single letter; red, nodes that bought and sold one copy of the letter, green, two copies, and white, three, i.e., broken even, and let B_n, R_n, G_n and W_n be the corresponding RVs. (We start with a single participant, i.e., B₀=1, R₀=G₀=W₀=0.) Finally, assume that the success probability of a participant is proportional to the number of letters sold (other models are possible and even reasonable). The addition matrix is easily seen to be

A =

é
ê
ê
ê
ë

0	2	0	0
1	-2	3	0
1	0	-3	4
1	0	0	1

ù
ú
ú
ú
û

E.g., if a participant has sold two letters and sells another, the three green insertion slots at the corresponding node are replaced by four white ones, and a new participant who has not sold any copy of his letter yet must be accounted for. Note that A is not regular. It can be shown that B_n is the number of leaves in a random tree of size n+1.

Mahmoud, Smythe and Szymanski (1993) show that

æ
ç
ç
è

B_n

R_n

G_n

ö
÷
÷
ø

æ
ç
ç
è

1/3

1/6

1/10

ö
÷
÷
ø

(2n+1),

and that the covariance matrix is

Cov(B_n, R_n, G_n) ~

æ
ç
ç
è

1/9	- 8/45	- 1/15
-8/45	23/45	- 11/105
-1/15	-11/105	-179/350

ö
÷
÷
ø

n .

We sketch the proof of this result. Introduce the indicator variables I_n^(B), I_n^(R), I_n^(G), I_n^(W) so that I_n^(B) + I_n^(R)+ I_n^(G)+ I_n^(W) = 1. We now have e.g., R_n = R_n-1 + 2 I_n^(B) - 2 I_n^(R), and hence

E[R_n] = E[R_n-1] + 2 E[I_n^(B)] - 2 E[I_n^(R)].

The expectations of the indicator variables are obtained by conditioning on the n-1 picks that lead to a particular urn (call this s-field T_n-1), so that

E[I_n^(B) | T_n-1] =

B_n-1(T_n-1)

2n-1

and E[I_n^(B)] =

E[B_n]

2n-1

Substitute to get

E[R_n] = E[R_n-1] + 2

E[B_n]

2n-1

- 2

E[R_n]

2n-1

Similar computations for E[B_n], E[G_n] and E[W_n] yield a system of recurrences of the form

[ B_n, R_n ,G_n , W_n]^T = F(n) [ B_n-1, R_n-1, G_n-1, W_n-1]^T

where F(n) is a matrix that depends on n. This system may be solved asymptotically. (Note that we have made critical use of the fact that the total number of balls in the urn is a function of n, namely 2n+1.)

The computation of the covariance is more involved. Start from R_n = R_n-1 + 2 I_n^(B) - 2 I_n^(R) as before, square both sides and use simple properties of mutually exclusive indicator variables to get

R_n² = R_n-1² + 4 I_n^(B) R_n-1 - 4 I_n^(R) R_n-1 + 4 I_n^(B) + 4 I_n^(R).

Binary cross products of the four RVs appear on taking expectations, i.e., we develop recurrences for E[R_n²] = E[R_n R_n] and these recurrences involve terms like

E[I_n^(B) R_n-1]= E

[

E[ I_n^(B) R_n-1 | T_n-1 ]

]

= E

[

R_n-1 E[ I_n^(B) | T_n-1 ]

]

= E

é
ê
ê
ë

R_n-1

B_n-1

2n-1

ù
ú
ú
û

E[R_n-1 B_n-1]

2n-1

The result is a system of recurrences in all binary cross products that yields the desired asymptotics.

Next consider the vector X_i of centered RVs;

X_i =

æ
ç
ç
è

B_i^*

R_i^*

G_i^*

ö
÷
÷
ø

æ
ç
ç
è

B_i

R_i

G_i

ö
÷
÷
ø

- (2i+1)

æ
ç
ç
è

1/3

1/6

1/10

ö
÷
÷
ø

Mahmoud, Smythe and Szymanski (1993) show that

X_i

(n)^1/2

MVN

æ
ç
ç
è

1/9	- 8/45	- 1/15
-8/45	23/45	- 11/105
-1/15	-11/105	-179/350

ö
÷
÷
ø

The proof uses martingale techniques. Recall that a martingale is a sequence Y₁, Y₂, Y₃, ... of random variables such that E[Y_n | T_n-1] = Y_n-1. E.g., consider a fair game (``win all or lose all with equal probability, i.e., 1/2''), which gives

E[ Y_n | T_n-1] = 0 · Y_n-1

+ 2 Y_n-1

= Y_n-1.

Note that E[Y_n] = E[Y_n-1] = ... = E[Y₁] = 0 in this example; this is known as the martingale difference property, because if Y₁, Y₂, Y₃, ... is a martingale, then E[Y_n - Y_n-1 | T_n-1] = 0. We can reconstruct the martingale from the sequence of first differences, i.e., via å_k=1ⁿ D Y_k = Y_n. More generally, we can construct a martingale from any sequence of random variables that has the martingale difference property; this was done e.g., by Régnier (1989) in the context of algorithms, who showed that the cost of Quicksort has a limit distribution.

If E[ D Z_i | T_i-1] = 0, then A_n = å_i=1ⁿ D Z_i is a martingale, because

E[ A_n| T_n-1]= E

é
ê
ê
ë

i=1

D Z_i | T_n-1

ù
ú
ú
û

= E

é
ê
ê
ë

D Z_n +

n-1

i=1

D Z_i | T_n-1

ù
ú
ú
û

n-1

i=1

D Z_i = A_n-1.

By linearity, E[ D Z_i | T_i-1] = 0 implies E[ b_i D Z_i | T_i-1] = 0 for any sequence of constants {b_i}, and hence å_i=1ⁿ b_i D Z_i is a martingale.

We return to the four-color urn of the chain letter scheme; consider the color blue.

E[ B_i | T_i-1] = E

[

B_i-1 +

(

1 - I_n^(B)

)

| T_i-1

]

= B_i-1 + 1 - E[I_n^(B) | T_i-1] = B_i-1 + 1 -

2i-1

B_i-1

Further manipulation yields

E[ B_i - 1/3 (2i+1) | T_i-1] =

(

B_i-1 - 1/3 (2i-1)

)

+ 1/3 (2i-1) - 1/3 (2i+1)

+ 1 -

2i-1

(

B_i-1 - 1/3 (2i-1)

)

- 1/3

and hence E[ B_i^* | T_i-1] = B_i-1^* -B_i-1^*/(2i-1). B_i^* is not a martingale but

B_i^* - B_i-1^* +

2i-1

B_i-1^*

is a martingale difference, because E[ D M_i^B | T_n-1 ] = 0, where we have set

D M_i^B = B_i^* - B_i-1^* +

2i-1

B_i-1^*.

We can construct a martingale from D M_i^B, sinceå_i=1ⁿ b_i D M_i^B is a martingale for any sequence {b_i}.

The Cramér-Wold device can be used to prove convergence to a multivariate normal. Suppose we seek [ X_n⁽¹⁾,X_n⁽²⁾,X_n⁽³⁾,...,X_n^(j)]^T ®^D MVN(0, L). It suffices to prove that any linear combination of the X_n⁽^·⁾ converges to a normal distribution, i.e.,

a₁ X_n⁽¹⁾ + a₂ X_n⁽²⁾ + ... + a_j X_n^(j)

æ
è

0, s

a₁, ..., a_j

ö
ø

a₁, a₂, ..., a_j arbitrary. Here j=3 and we study W_n = a₁ B_n^* + a₂ R_n^* + a₃ G_n^*. Centering the remaining two variables, we have

E[ B_i^* - B_i-1^* | T_i-1]

= -

2i-1

B_i-1^*

E[ R_i^* - R_i-1^* | T_i-1]

= +

2i-1

(B_i-1^* - R_i-1^*)

E[ G_i^* - G_i-1^* | T_i-1]

= +

2i-1

(R_i-1^* - G_i-1^*).

Introduce the martingale differences

D M_i^B

= B_i^* - B_i-1^* +

2i-1

B_i-1^*

D M_i^R

= R_i^* - R_i-1^* -

2i-1

(B_i-1^* - R_i-1^*)

D M_i^G

= G_i^* - G_i-1^* -

2i-1

(R_i-1^* - G_i-1^*)

and set V_nk = å_i=1^k (b_in D M_i^B + c_in D M_i^R + d_in D M_i^G ) for arbitrary {b_in}, {c_in}, {d_in}. It remains to choose {b_in}, {c_in} and {d_in}. Expand V_nn to get

V_nn = b_nn

æ
ç
ç
è

B_n^* - B_n-1^* +

2n-1

B_n-1^*

ö
÷
÷
ø

+ ... +

æ
ç
ç
è

B_n-1^* - B_n-2^* +

2n-3

B_n-3^*

ö
÷
÷
ø

+ ···

To obtain the particular linear combination a₁ B_n^* + a₂ R_n^* + a₃ G_n^* we set b_nn = a₁, so that the term in B_n^* is preserved, and choose the remaining b_{n, i} to cancel B_n-1^*, B_n-2^* etc. This technique can be used to show that given a₁, a₂ and a₃, we may choose constants {b_in}, {c_in} and {d_in}, so that

V_nn =

(

a₁ B_n^* + a₂ R_n^* + a₃ G_n^*

)

- 3/2 c_1n + 10/3 d_1n,

which by a martingale central limit theorem yields

a₁ B_n^* + a₂ R_n^* + a₃ G_n^*

(n)^1/2

V_nn

(n)^1/2

, hence

W_n

(n)^1/2

æ
è

0, s

a₁, ..., a_j

ö
ø

æ
ç
ç
è

B_n^*

R_n^*

G_n^*

ö
÷
÷
ø

MVN(0, L).

Concluding remark

It should be obvious from the highly restricted class of addition matrices that have been considered that an abundance of combinatorial problems and possible addition matrices remain to be analyzed; e.g., apparently simple instances such as [

2	0
3	4

] and [

1	0
1	1

] have so far resisted attack.

1: If we were using the terminology of combinatorial analysis, we would refer to these trees as increasing trees; more precisely, as R-enriched increasing trees, where R is the list structure.

This document was translated from L^AT_EX by H^EV^EA.