Pólya Urn Models in Random Trees
Hosam M. Mahmoud
The George Washington University
Algorithms Seminar
October 20, 1997
[summary by Marko R. Riedel]
A properly typeset version of this document is available in
postscript and in
pdf.
If some fonts do not look right on your screen, this might be
fixed by configuring your browser (see the documentation here).
1 Examples and previous results
Consider an urn that contains balls of k different colors
1,2,...,k.
There is a set of evolution rules: (i)
a ball is chosen at random from the urn, where all balls are
equally likely; (ii) that ball's color or type is noted, and
the ball is returned to
the urn; (iii) if the ball had color i, aij balls
of color j are
added to the urn.
Question of interest. What is the composition of the urn after
n draws?
The model is encoded in the addition matrix A=[aij],
1£ i,j£ k.
The aij may themselves be random, but this talk is concerned
exclusively with deterministic aij, i.e., the case of a fixed
addition matrix A.
Pólya and Eggenberger (1923) investigated the two-color problem,
with A=sI and s a positive integer.
Suppose the two colors are red and blue, and
let Rn and Bn be the number of red and blue balls after n picks.
Example 1
Set s=2 and start with two red balls and one blue ball. One of eight
possible length-3 runs is:
Pick blue (probability 1/3),
the composition of the urn is now R1=2 and B1=3;
Pick blue (probability 3/5),
R2=2 and B2=5;
Pick red (probability 2/7),
R3=4 and B3=7.
What is the typical behavior of Rn and Bn?
Bernard Friedman (1949) studied a more general urn, the addition
matrix now being A=[].
Freedman (1965) showed that
Rn* |
|
N |
( |
0, 1 |
) |
,
Bn* |
|
N |
( |
0, 1 |
) |
, where
Rn* = |
|
and Bn* is defined similarly.
2 The connection to random trees
Recall the random permutation model for binary trees, where n keys
are inserted into a binary tree such that the root of any subtree is
larger than all left and less than all right descendants. We have a
uniform distribution on the n! possible key orderings and wish to
compute tree statistics associated to this model.
The Poblete-Munro (1985) heuristic suggests that we can obtain a more
balanced tree with little extra work: we require that all subtrees on
the fringe and of size at most three be balanced. This means that we
rebalance size 3 subtrees on the fringe, if necessary.
This process yields shorter trees; in fact E[Dn] =(12/7) ln n
(compare with 2ln n for standard RBSTs).
2.1 Balanced trees
Work by Yao (1978) on 2-3 trees, Baeza-Yates, Gonnet and Ziviani on
other tree statistics Sn shows that we can study E[Sn] by
studying fringe configurations of RBSTs, to obtain bounds of the
type f1(n) £ E[Sn] £ f2(n).
Fringe analysis is based on exact counting of all (sub)trees less than
or equal to a given height.
The results improve in accuracy as the height is increased.
Mahmoud (1998) has used Pólya urn models to study the Poblete-Munro
heuristic. We map fringe configurations to colors. The growth of the
tree is modeled by a 3× 3 urn.
Suppose an incoming node is placed on one of the four leaves of a
balanced subtree on three nodes. It is inserted without rebalancing
the tree. Its sibling is a leaf. Suppose the next node is placed at
that leaf. No rebalancing is required. Finally suppose that the next
node is not placed at the sibling leaf, but rather at a leaf of the
previous node. The tree must be rebalanced.
We distinguish these three configurations by assigning different
colors to the leaves concerned: color 1 to the leaves of any
terminal node whose sibling is not a leaf, color 3 to the leaves of
any terminal node whose sibling is a leaf, and color 2 to all such
leaves. The leaves correspond to balls in a Pólya urn.
The complexity measure of an insertion is the number of rotations,
call it Rn.
The addition matrix of the Pólya urn becomes
E.g., if we replace a leaf of color 1, we lose that leaf and recolor
its sibling with color 2. The new leaves have color 3.
Rn is therefore the number of picks of color 3.
The row sums of the addition matrix A form the vector S =[1,1,1]T,
which reflects the fact that every BST on n nodes has n+1
leaves.
If an addition matrix A has the property that there exists an m
such that all the entries of Am are positive, we say that A is
regular.
In this particular example, A is not regular; nonetheless
Mahmoud (1998) shows that
2.2 m-ary search trees.
Under this model m-1 keys k1, k2, ..., km-1 are placed at
the root of the tree. These keys partition the remaining keys into
m intervals, (-¥, k1), (k1, k2), ... ,(km-1, +¥),
i.e., subtrees.
The construction is recursive and the branch factor is m.
Example 2
Let m=3 and consider the keys 9, 16, 4, 23, 11, 10,...
The first two keys are placed at the root, the key 4 is placed to
the left of 9 and starts a new subtree, 23 is placed to the right
of 16, also in a new subtree, and 11 and 10 fall between 9 and
16, starting a new subtree with root intervals (9, 10), (10, 11)
and (11, 16).
There are three types of nodes (or blocks in a hardware-oriented
setting): leaves, nodes that contain a single key, and nodes that
contain two keys.
More generally, we ask about Sn, the number of nodes after n
insertions, where Sn = åj Xnj and Xnj counts the number
of nodes that contain j keys.
We construct an urn model by mapping gaps between keys at a node to
balls whose color indicates the number of gaps at that node. We can
recover the number of nodes of each type from the number of gaps of
the corresponding color.
For instance,
consider a leaf that contains i keys and hence i+1 gaps. We map
these gaps to balls of color i+1. Now suppose that i<m-1 and we
insert a key at this leaf. We lose i+1 gaps of color i+1 and gain
i+2 gaps of color i+2.
The addition matrix associated to this model has the following shape:
A =
|
é ê ê ê ê ê ê ê ê ë |
|
· · · |
|
|
|
|
|
|
··· |
-r |
r+1 |
|
|
|
|
··· |
··· |
-(r+1) |
r+2 |
|
|
|
|
|
|
|
· · · |
|
m |
|
|
|
|
|
-1 |
|
ù ú ú ú ú ú ú ú ú û |
|
. |
The eigenvalues of the addition matrix A
Order the eigenvalues according to their real part, letting
l1 be the eigenvalue whose real part is the largest.
Athreya and Nay (1972) showed that if the real part of l2 is
less than half the real part of l1, a condition guaranteed if
A is regular, then the colors have a normal N(0, 1) distribution.
In the urn model associated to m-ary trees, this property holds for
m<27, even though the associated urn is not regular. (This suggests
that regularity is too strong a precondition for the results of
Athreya and Nay.) When m=27, there are two conjugate eigenvalues
whose real part is larger than half the real part of l1.
More precisely, Lew and Mahmoud (1994) showed that for any sequence
c1, c2, ..., ck, where cj is the cost of
a node that contains j keys, the vector of random variables
Xn = [Xn1, Xn2, ... ,Xnk]T
converges to a multivariate normal, i.e.,
|
|
|
MVN(0, L)
for m=2, 3, ..., 26. |
2.3 Paged binary trees
In this model every external node stores at most b keys, while
internal nodes store a single key. Overflow on external nodes is
processed by splitting the node according to some splitting rule, say
by selecting the median and adding two subtrees whose roots store
b/2 keys. The corresponding counting problem leads to a differential
equation in F(x, y), the super exponential generating function of
paged binary trees:
PBSTs have been considered by Flajolet, Mahmoud and Martínez
Parra. Results that are based on Pólya urn models indicate that
there is a phase transition at b=118, when the real part of the
second eigenvalue of the addition matrix becomes larger than half the
real part of the first eigenvalue. Work in progress by Flajolet et al. seeks to construct an interpretation of this fact in
the context of generating functions.
3 Case study: plane-oriented recursive trees
This type of tree models a recruiting process where the recruiting
probability of a recruiting officer increases with the number of
recruits attracted so far, or more generally, where the probability of
a node to receive a new node is proportional to its degree, a scenario
that Mahmoud (1991) calls ``success breeds success.'' We use
plane-oriented recursive trees as the underlying model1, as proposed
by Bergeron, Flajolet and Salvy (1992). Every node has outdegree
2k+1, for some k³ 0; k of its children are plane-embedded
nodes, and k+1 leaves are placed in the gaps between adjacent nodes.
Consider a chain letter scheme where the acquisition price of a letter
is 100F, and copies of the letter are sold at 40F. Given that there
are n participants in the scheme, we ask how many of them have just
broken even, i.e., sold three letters. Let blue represent insertion
slots at nodes that have bought, but not sold a single letter; red,
nodes that bought and sold one copy of the letter, green, two copies,
and white, three, i.e., broken even, and let Bn, Rn, Gn and
Wn be the corresponding RVs. (We start with a single participant,
i.e., B0=1, R0=G0=W0=0.) Finally, assume that the success
probability of a participant is proportional to the number of letters
sold (other models are possible and even reasonable). The addition
matrix is easily seen to be
A = |
é ê ê ê ë |
0 |
2 |
0 |
0 |
1 |
-2 |
3 |
0 |
1 |
0 |
-3 |
4 |
1 |
0 |
0 |
1 |
|
ù ú ú ú û |
|
. |
E.g., if a participant has sold two letters and sells another, the
three green insertion slots at the corresponding node are replaced by
four white ones, and a new participant who has not sold any copy of
his letter yet must be accounted for. Note that A is not regular.
It can be shown that Bn is the number of leaves in a random tree of
size n+1.
Mahmoud, Smythe and Szymanski (1993) show that
and that the covariance matrix is
Cov(Bn, Rn, Gn)
~
|
æ ç ç è |
1/9 |
- 8/45 |
- 1/15 |
-8/45 |
23/45 |
- 11/105 |
-1/15 |
-11/105 |
-179/350 |
|
ö ÷ ÷ ø |
|
n
. |
We sketch the proof of this result. Introduce the indicator
variables In(B), In(R), In(G), In(W) so that
In(B) + In(R)+ In(G)+ In(W) = 1.
We now have e.g., Rn = Rn-1 + 2 In(B) - 2 In(R),
and hence
E[Rn] = E[Rn-1] + 2 E[In(B)] - 2 E[In(R)].
The expectations of the indicator variables are obtained by
conditioning on the n-1 picks that lead to a particular urn (call
this s-field Tn-1), so that
E[In(B) | Tn-1] =
|
|
and
E[In(B)] = |
|
. |
Substitute to get
E[Rn] =
E[Rn-1] + 2 |
|
- 2 |
|
. |
Similar computations for E[Bn], E[Gn] and E[Wn] yield a
system of recurrences of the form
[ Bn, Rn ,Gn , Wn]T
= F(n) [ Bn-1, Rn-1, Gn-1, Wn-1]T
where F(n) is a matrix that depends on n.
This system may be solved asymptotically.
(Note that we have made critical use of the fact that the total number
of balls in the urn is a function of n, namely 2n+1.)
The computation of the covariance is more involved. Start from
Rn = Rn-1 + 2 In(B) - 2 In(R)
as before, square both sides and use simple properties of mutually
exclusive indicator variables to get
Rn2 = Rn-12 + 4 In(B) Rn-1 - 4 In(R) Rn-1
+ 4 In(B) + 4 In(R).
Binary cross products of the four RVs appear on taking expectations,
i.e., we develop recurrences for E[Rn2] = E[Rn Rn]
and these recurrences involve terms like
E[In(B) Rn-1]=
E |
[ |
E[
In(B) Rn-1 | Tn-1
]
|
] |
|
=
E |
[ |
Rn-1
E[
In(B) | Tn-1
]
|
] |
=
E |
é ê ê ë |
Rn-1
|
|
|
ù ú ú û |
=
|
|
|
The result is a system of recurrences in all binary cross products
that yields the desired asymptotics.
Next consider the vector Xi of centered RVs;
Mahmoud, Smythe and Szymanski (1993) show that
|
|
|
|
MVN |
æ ç ç è |
0,
|
æ ç ç è |
1/9 |
- 8/45 |
- 1/15 |
-8/45 |
23/45 |
- 11/105 |
-1/15 |
-11/105 |
-179/350 |
|
ö ÷ ÷ ø |
|
ö ÷ ÷ ø |
. |
The proof uses martingale techniques. Recall that a martingale is a
sequence Y1, Y2, Y3, ... of random variables such that
E[Yn | Tn-1] = Yn-1.
E.g., consider a fair game (``win all or lose all with equal
probability, i.e., 1/2''), which gives
E[ Yn | Tn-1] = 0 · Yn-1 |
|
+ 2 Yn-1 |
|
= Yn-1. |
Note that E[Yn] = E[Yn-1] = ... = E[Y1] = 0
in this example; this is known as the martingale difference
property, because if Y1, Y2, Y3, ... is a martingale, then
E[Yn - Yn-1 | Tn-1] = 0.
We can reconstruct the martingale from the sequence of first
differences, i.e., via åk=1n D Yk = Yn.
More generally, we can construct a martingale from any sequence of
random variables that has the martingale difference property; this was
done e.g., by Régnier (1989) in the context of algorithms, who showed
that the cost of Quicksort has a limit distribution.
If E[ D Zi | Ti-1] = 0, then An = åi=1n
D Zi is a martingale, because
E[ An| Tn-1]=
E |
é ê ê ë |
|
|
D Zi | Tn-1 |
ù ú ú û |
=
E |
é ê ê ë |
D Zn + |
|
D Zi
| Tn-1 |
ù ú ú û |
= |
|
D Zi
= An-1.
|
By linearity, E[ D Zi | Ti-1] = 0 implies
E[ bi D Zi | Ti-1] = 0
for any sequence of constants {bi}, and hence
åi=1n bi D Zi
is a martingale.
We return to the four-color urn of the chain letter scheme; consider
the color blue.
E[ Bi | Ti-1] =
E |
[ |
Bi-1 + |
( |
1 - In(B) |
) |
| Ti-1 |
] |
= Bi-1 + 1 - E[In(B) | Ti-1]
= Bi-1 + 1 - |
|
Bi-1 |
|
Further manipulation yields
E[ Bi - 1/3 (2i+1) | Ti-1] =
|
( |
Bi-1 - 1/3 (2i-1) |
) |
+ 1/3 (2i-1)
- 1/3 (2i+1) |
|
+ 1
- |
|
|
( |
Bi-1 - 1/3 (2i-1) |
) |
- 1/3 |
|
and hence
E[ Bi* | Ti-1] = Bi-1* -Bi-1*/(2i-1).
Bi* is not a martingale but
is a martingale difference, because
E[ D MiB | Tn-1 ] = 0,
where we have set
D MiB = Bi* - Bi-1* + |
|
Bi-1*. |
We can construct a martingale from D MiB, sinceåi=1n bi D MiB
is a martingale for any sequence {bi}.
The Cramér-Wold device can be used to prove convergence to a
multivariate normal. Suppose we seek
[ Xn(1),Xn(2),Xn(3),...,Xn(j)]T
®D MVN(0, L).
It suffices to prove that any linear combination of the
Xn(·) converges to a normal distribution, i.e.,
a1 Xn(1) + a2 Xn(2) + ... +
aj Xn(j)
|
|
N |
æ è |
0, s |
|
ö ø |
, |
a1, a2, ..., aj arbitrary.
Here j=3 and we study Wn = a1 Bn* + a2 Rn* +
a3 Gn*. Centering the remaining two variables, we have
E[ Bi* - Bi-1* | Ti-1] |
|
E[ Ri* - Ri-1* | Ti-1] |
|
E[ Gi* - Gi-1* | Ti-1] |
|
Introduce the martingale differences
D MiB |
|
D MiR |
=
Ri* - Ri-1* - |
|
(Bi-1* - Ri-1*) |
|
D MiG |
=
Gi* - Gi-1* - |
|
(Ri-1* - Gi-1*) |
|
and set Vnk =
åi=1k (bin D MiB + cin D MiR +
din D MiG )
for arbitrary {bin}, {cin},
{din}.
It remains to choose {bin}, {cin}
and {din}.
Expand Vnn to get
Vnn = bnn
|
æ ç ç è |
Bn* - Bn-1* + |
|
Bn-1* |
ö ÷ ÷ ø |
+ ... +
|
æ ç ç è |
Bn-1* - Bn-2* + |
|
Bn-3* |
ö ÷ ÷ ø |
+ ··· |
To obtain the particular linear combination
a1 Bn* + a2 Rn* + a3 Gn*
we set bnn = a1, so that the term in Bn* is preserved,
and choose the remaining bn, i to cancel Bn-1*,
Bn-2* etc.
This technique can be used to show that given a1, a2 and
a3, we may choose constants {bin},
{cin} and {din}, so that
Vnn =
|
( |
a1 Bn* + a2 Rn* + a3 Gn* |
) |
- 3/2 c1n + 10/3 d1n, |
which by a martingale central limit theorem yields
|
a1 Bn* + a2 Rn* + a3 Gn* |
|
(n)1/2 |
|
~ |
|
,
hence |
|
|
|
N |
æ è |
0, s |
|
ö ø |
,
|
|
|
|
MVN(0, L).
|
Concluding remark
It should be obvious from the highly restricted
class of addition matrices that have been considered that an abundance
of combinatorial problems and possible addition matrices remain to be
analyzed; e.g., apparently simple instances such
as []
and []
have so far resisted attack.
- 1
- If we
were using the terminology of combinatorial analysis, we would refer
to these trees as increasing trees; more precisely, as R-enriched
increasing trees, where R is the list structure.
This document was translated from LATEX by
HEVEA.