Hyperharmonic Numbers and the Phratry of the Coupon Collector
Dominique Foata
Département de mathématique, Université Louis Pasteur (France)
Algorithms Seminar
May 21, 2001
[summary by Cyril Banderier]
A properly typeset version of this document is available in
postscript and in
pdf.
If some fonts do not look right on your screen, this might be
fixed by configuring your browser (see the documentation here).
Abstract
The classical coupon-collector problem is here extended to the case
where the collector shares his harvest with other members of his
phratry. She (!) remains the single buyer, but she gives to his
brothers all the pictures that she got in double. When her album is
filled, her brothers' albums have some empty places. How many in
average? Dominique Foata (in a joint work with Guoniu Han and Bodo
Lass) answers this question via an expression for the multivariate
generating function. The problem is related to hyperharmonic numbers,
that are studied here as solutions of finite differences equations.
1 Coupons Collector
A cleaver firm sells chocolate, with a picture (or ``coupon'') of a
famous cricket player in each bar. In total, there are m different
pictures to collect and each picture appears with probability 1/m.
Mr. and Mrs. Brown have r sons and one daughter, chocolate and
cricket addicts. The girl (she's the oldest) is the only one to buy
chocolate. She tries to complete her collection. When she gets a new
picture, she puts it in her album, and when she gets a double, she
gives it to her oldest brother, and when this one gets a double, he
gives it to the remaining oldest brother, and so on. After having
bought T bars of chocolate, the girl has completed her album, and it
remains MT(i) empty places in the album of the ith brother
(i=1,...,r). Let Xn(k) be the number of coupons which
appeared exactly k times until time n. As
MT(k)=XT(1)+...+XT(k), the distribution of
(T,MT(1),...,MT(r)) is then totally determined by the
distribution of (T,XT(1),...,XT(r)).
The question is to find formulae and asymptotics for MT(i), or
equivalently for XT(k).
There is a lot of ways to solve coupon-collector-like problems.
One can distinguish three main approaches:
-
formal approach: combinatoricians indeed used
a language-theory approach (shuffle products and Laplace transforms [2]
or manipulation of regular expressions [6]);
- probabilistic approach: a lot of folklore results are established via basic probabilistic considerations,
and more sophisticated tools such as martingales theory were also
useful [5];
- matricial approach: this is in fact a mixture of the two
precedent approaches, which is exploited to solve a Markovian
generalization of the coupon-collector problem in [1] (with
Perron--Frobenius theory and approximation of integrals).
This is with a combinatorial approach (enumeration of surjections and
formal Laplace transform) that Foata et al. obtain
in [4] the multivariate generating function of the
coupon-collector problem, from which they derive the formulae for the
expectations of E[T] and E[XT(k)].
2 The Multivariate Generating Function
Theorem 1
The generating function of the random variables
T, XT(1), ..., XT(r) is
|
|
P{T=l,XT(1)=n1,...,XT(r)=nr}
tl u1n1... urnr= |
u1t å |
|
(-1)b |
æ
ç
ç
è |
|
æ
ç
ç
è |
|
ö
÷
÷
ø |
ck |
ö
÷
÷
ø |
|
|
æ
ç
ç
è |
|
kck |
ö
÷
÷
ø |
!.
|
The proof follows from setting ui=1 (for i ³ r+1), expanding
(with Newton multinomial formula), and applying the Laplace transform to
|
|
|
|
|
p(s)
=mu1 |
æ
ç
ç
è |
|
ui |
|
ö
÷
÷
ø |
m-1. |
The formal Laplace transform is defined as a linear map such that
L (gn tn/n!)=gn tn. This implies
L (exp(a t) tn)= n! tn/(1-a t)n+1.
The set S(l,m;n) is defined as a subset of the surjections from [ 1,...,l ] to [ 1,...,m ]
for which sÎ S(l,m;n) implies i is reached ni times and
the restriction of s to [ 1,...,l-1 ] is still a surjection from [ 1,...,l-1 ] to [ 1,...,m ]\{s(l)}.
The weight p of a surjection SÎ S(l,m;n) is defined by p(s)=Õuini.
3 Hyperharmonic Numbers
In order to complete one collection, it is well known that the average
number of needed bars is
For example, when there are m=50 different pictures, E[T]=50
H50»50×4.5»225 and thus the daughter has a lot of
doubles and we can expect that the oldest brother has almost completed
his album with the 175 remaining pictures.
Pintacuda [5] proved with martingale theory that
E[MT(1)]=Hm.
Foata et al. prove
Theorem 2
For k³ 2, the average number of empty places in the kth brother's album is
E |
[MT(k)]=1+ |
|
Km(i)
where |
Km(k)= |
|
|
|
, (k³ 1, m³ 3) |
with the following initial conditions
K2(k)=1/2k (for k³ 0) and Km(0)=1 (for m³ 2).
A first derivation of this result follows of Theorem 1.
Another proof is in two steps: first get the generating function for
the Km(k) (end of this section) and then prove that this
generating function is also the one of the coupon-collector problem
(next section).
Consider the rising factorial defined by (a)n= a (a+1)... (a+n-1)
if n³ 1 and (a)0=1. An hypergeometric function with respect to
two lists (a1,...,ar) and (b1,...,bs) is defined as the
function given by the series
rFs |
æ
è |
|
;x |
ö
ø |
:= |
|
|
(a1)n...(ar)n |
|
(b1)n...(bs)n |
|
|
. |
The authors prove that the numbers Km(k) (that they call
``hyperharmonic numbers'') satisfy
Km(k)= |
|
k+2Fk+1 |
æ
è |
|
|1 |
ö
ø |
. |
Comparing the recurrences satisfied by both sides
and then summing gives the generating function
|
|
|
Km(k)tk= |
|
|
|
|
|
|
= |
1 |
|
(1-t/2)(1-t/3)...(1-t/m) |
|
(1) |
(the last equality following from a partial fraction decomposition).
Thus Km(k)=hk(1/2,...,1/k), the symmetric
homogeneous polynomial of (total) degree k in m-1 variables.
Reexpressing hk in the basis of the power symmetric functions
pk:=åxik gives
One also has explicit asymptotics (for fixed k), e.g.,
Km(3)» 1.1666 ln3 m - 0.2113 ln2 m +0.4118 lnm -0.0815.
4 Martingales Rescue the Phratry
Let Xn(0) be the number of empty places in the daughter's album.
Now, define the process X as
Xn=(Xn(0),Xn(1),...,Xn(r)). For any
function f, the average increase of f(X) (knowing all the
previously drawn coupons) is easy to get:
E |
[ |
f(Xn+1)-f(Xn) | Y0,...,Yn |
] |
= |
|
|
|
|
( |
f(Xn(0),...,Xn(k)-1,Xn(k+1)+1,...,Xn(r))-f(Xn) |
) |
;
|
this simply reflects the different possible updates (Xn(k)/m is the probability
to get a new coupon which was already in k-tuple).
If f is such that the sum is 0,
one has also Wn+1-Wn=0 and thus W is a martingale,
where W is the process f(X) stopped at T, that is
Wn:=f(Xn) (for n<T) and Wn:=f(XT) (for n³ T).
More generally, suppose that for r functions
f(1), ..., f(r) from Nk+1 to R one has:
-
åi=0k xi(f(x0,...,xi-1,xi+1+1,...,xk) -f(k)(x0,...,xk))=0 for x0³1;
- f(k)(0,x1,...,xk)=xk.
Then E[XT(k)]=f(k)(m,0,...,0).
Proof.
2. implies that XT(k)=f(XT)=WT; 1. gives a martingale
property for W, Doob's theorem for stopping time of martingales
gives E[WT]=E[W0]=W0, and 2. implies that
W0=f(k)(m,0,...,0).
Pintacuda [5] used this result with k=1 and found
f(1)(x0,x1)=Hx0+x1/1+x0.
Foata et al. guessed the general formula:
Proposition 1
For k³ 2, the function f(k) defined as
f(k)(x0,x1,...,xk):=Kx0(k)+ |
x1Kx0+1(k-1)+···+xk-1Kx0+1(1)+xk |
|
x0+1 |
|
is the only solution of 1. and 2.
One also has f(k)(x0,0,...,0)=Kx0(k).
They give two proofs in [4], but I prefer to explain
what I heard in the meeting Random Structure and Algorithms (Poznan, August 2001),
where Doron Zeilberger explained how to use a language theory argument
to get a shorter proof.
A complete collection of coupons can be written
11*2{1,2}*3{1,2,3}*4 ... {1,2,...,
m-1}*m. Let W be this set of words.
This leads to the generating function
Recall that E[XT(k)] is the expected number of kinds of
coupons in k-tuple (at time T, that is when the daughter has
completed her album). Thus,
|
E |
[ |
XT(k) |
] |
tk
= |
|
|
|
P |
(w) t|w|k
= |
|
|
|
|
æ
ç
ç
è |
|
ö
÷
÷
ø |
|w| t|w|k |
= m! (f(t,0,...,0)+f(0,t,0,...,0)+...+f(0,...,0,t))
As words of W are ordered (whereas it is in fact irrelevant for
the coupon collector), there is a factor m! at the second line takes
into account all the permutations. The generating function obtained
at the last line shows that the hyperharmonic numbers generated by
Equation (1) indeed gives the average value of
Theorem 2.
5 Conclusion
The coupon-collector problem (like the ménage problem, the birthday
paradox) belongs to the large class of problems that can be modeled by
simple urns models. It is very likely that, during the next years,
the symbolic method will be applied with success to all these urns
problems, and analytic combinatorics will then provide enumeration,
complete asymptotics expansions and limit laws. The ``classical''
coupon-collector problem waits for his next revisitor!
This summary is related to Foata's article [4] (the more
recent preprint [3] is also relevant). These articles are
accessible at http://www-irma.u-strasbg.fr/~foata/paper
.
References
- [1]
-
Banderier (Cyril) and Dobrow (Robert P.). --
A generalized cover time for random walks on graphs. In Formal
power series and algebraic combinatorics (Moscow, 2000), pp. 113--124. --
Springer, Berlin, 2000.
- [2]
-
Flajolet (Philippe), Gardy (Danièle), and Thimonier (Loys). --
Birthday paradox, coupon collectors, caching algorithms and
self-organizing search. Discrete Applied Mathematics, vol. 39, n°3,
1992, pp. 207--229.
- [3]
-
Foata (D.) and Zeilberger (D.). --
The collector's brotherhood problem using the Newman-Shepp
symbolic method. --
To appear in Algebra Universalis. Special issue dedicated to
the memory of Gian-Carlo Rota.
- [4]
-
Foata (Dominique), Han (Guo-Niu), and Lass (Bodo). --
Les nombres hyperharmoniques et la fratrie du collectionneur de
vignettes. Séminaire Lotharingien de Combinatoire, vol. 47,
n°B47a, 2001. --
20 pages. Available from
http://www.mat.univie.ac.at/~slc/
.
- [5]
-
Pintacuda (N.). --
Coupons collectors via the martingales. Bollettino dell'Unione
Matematica Italiana. A. Serie V, vol. 17, n°1, 1980,
pp. 174--177.
- [6]
-
Zeilberger (Doron). --
How many singles, doubles, triples, etc., should the coupon collector
expect? Personal Journal of Ekhad and Zeilberger,
2001. --
1 page. Available from
http://www.math.rutgers.edu/~zeilberg/
.
This document was translated from LATEX by
HEVEA.