Hyperharmonic Numbers and the Phratry of the Coupon Collector

Dominique Foata

Dpartement de mathmatique, Universit Louis Pasteur (France)

Algorithms Seminar

May 21, 2001

[summary by Cyril Banderier]

A properly typeset version of this document is available in postscript and in pdf.

If some fonts do not look right on your screen, this might be fixed by configuring your browser (see the documentation here).
Abstract
The classical coupon-collector problem is here extended to the case where the collector shares his harvest with other members of his phratry. She (!) remains the single buyer, but she gives to his brothers all the pictures that she got in double. When her album is filled, her brothers' albums have some empty places. How many in average? Dominique Foata (in a joint work with Guoniu Han and Bodo Lass) answers this question via an expression for the multivariate generating function. The problem is related to hyperharmonic numbers, that are studied here as solutions of finite differences equations.



1  Coupons Collector

A cleaver firm sells chocolate, with a picture (or ``coupon'') of a famous cricket player in each bar. In total, there are m different pictures to collect and each picture appears with probability 1/m. Mr. and Mrs. Brown have r sons and one daughter, chocolate and cricket addicts. The girl (she's the oldest) is the only one to buy chocolate. She tries to complete her collection. When she gets a new picture, she puts it in her album, and when she gets a double, she gives it to her oldest brother, and when this one gets a double, he gives it to the remaining oldest brother, and so on. After having bought T bars of chocolate, the girl has completed her album, and it remains MT(i) empty places in the album of the ith brother (i=1,...,r). Let Xn(k) be the number of coupons which appeared exactly k times until time n. As MT(k)=XT(1)+...+XT(k), the distribution of (T,MT(1),...,MT(r)) is then totally determined by the distribution of (T,XT(1),...,XT(r)).

The question is to find formulae and asymptotics for MT(i), or equivalently for XT(k).

There is a lot of ways to solve coupon-collector-like problems. One can distinguish three main approaches: This is with a combinatorial approach (enumeration of surjections and formal Laplace transform) that Foata et al. obtain in [4] the multivariate generating function of the coupon-collector problem, from which they derive the formulae for the expectations of E[T] and E[XT(k)].

2  The Multivariate Generating Function


Theorem 1   The generating function of the random variables T, XT(1), ..., XT(r) is
 
l m,n
P{T=l,XT(1)=n1,...,XT(r)=nr} tl u1n1... urnr=
u1t

m-1
a,b,c1,...,cr

(-1)b


r
k=1



uk-1
k!



ck


(t/m)
 
k
kck
 
(1-at/m)
1+
 
k
kck
 



 
k
kck


!.

The proof follows from setting ui=1 (for i r+1), expanding (with Newton multinomial formula), and applying the Laplace transform to
 
l m
tl-1
(l-1)!
 
n
 
s S(l,m;n)
p(s) =mu1


 
i1
ui
ti
i!



m-1.
The formal Laplace transform is defined as a linear map such that L (gn tn/n!)=gn tn. This implies L (exp(a t) tn)= n! tn/(1-a t)n+1. The set S(l,m;n) is defined as a subset of the surjections from [ 1,...,l ] to [ 1,...,m ] for which s S(l,m;n) implies i is reached ni times and the restriction of s to [ 1,...,l-1 ] is still a surjection from [ 1,...,l-1 ] to [ 1,...,m ]\{s(l)}. The weight p of a surjection S S(l,m;n) is defined by p(s)=uini.

3  Hyperharmonic Numbers

In order to complete one collection, it is well known that the average number of needed bars is
E[T]=m Hm     where      Hm=
m
k=1
1
k
.
For example, when there are m=50 different pictures, E[T]=50 H50504.5225 and thus the daughter has a lot of doubles and we can expect that the oldest brother has almost completed his album with the 175 remaining pictures.

Pintacuda [5] proved with martingale theory that E[MT(1)]=Hm. Foata et al. prove
Theorem 2   For k 2, the average number of empty places in the kth brother's album is
E [MT(k)]=1+
k
i=1
Km(i)   where    Km(k)=
m
i=2
Ki(k-1)
i
,  (k 1, m 3)
with the following initial conditions K2(k)=1/2k (for k 0) and Km(0)=1 (for m 2).

A first derivation of this result follows of Theorem 1. Another proof is in two steps: first get the generating function for the Km(k) (end of this section) and then prove that this generating function is also the one of the coupon-collector problem (next section).

Consider the rising factorial defined by (a)n= a (a+1)... (a+n-1) if n 1 and (a)0=1. An hypergeometric function with respect to two lists (a1,...,ar) and (b1,...,bs) is defined as the function given by the series
rFs
a1,...,ar
b1,...,bs
;x
:=
 
n 0
(a1)n...(ar)n
(b1)n...(bs)n
xn
n!
.

The authors prove that the numbers Km(k) (that they call ``hyperharmonic numbers'') satisfy
Km(k)=
m (m-1)
2k+1
k+2Fk+1
-m+2,2,...,2
3,...,3
|1
.
Comparing the recurrences satisfied by both sides and then summing gives the generating function
 
k 0
Km(k)tk=
m
n=2
(-m)n
(n-2)!
1
n
1
1-t/n
=
1
(1-t/2)(1-t/3)...(1-t/m)
    (1)
(the last equality following from a partial fraction decomposition).

Thus Km(k)=hk(1/2,...,1/k), the symmetric homogeneous polynomial of (total) degree k in m-1 variables. Reexpressing hk in the basis of the power symmetric functions pk:=xik gives
Km(k)~
p1k
k!
~
(lnm)k
k!
One also has explicit asymptotics (for fixed k), e.g.,
Km(3) 1.1666 ln3 m - 0.2113 ln2 m +0.4118 lnm -0.0815.

4  Martingales Rescue the Phratry

Let Xn(0) be the number of empty places in the daughter's album. Now, define the process X as Xn=(Xn(0),Xn(1),...,Xn(r)). For any function f, the average increase of f(X) (knowing all the previously drawn coupons) is easy to get:
E [ f(Xn+1)-f(Xn) | Y0,...,Yn ] =
r
k=0
Xn(k)
m
( f(Xn(0),...,Xn(k)-1,Xn(k+1)+1,...,Xn(r))-f(Xn) ) ;
this simply reflects the different possible updates (Xn(k)/m is the probability to get a new coupon which was already in k-tuple).

If f is such that the sum is 0, one has also Wn+1-Wn=0 and thus W is a martingale, where W is the process f(X) stopped at T, that is Wn:=f(Xn) (for n<T) and Wn:=f(XT) (for n T).

More generally, suppose that for r functions f(1), ..., f(r) from Nk+1 to R one has:
  1. i=0k xi(f(x0,...,xi-1,xi+1+1,...,xk) -f(k)(x0,...,xk))=0 for x01;
  2. f(k)(0,x1,...,xk)=xk.
Then E[XT(k)]=f(k)(m,0,...,0).

Proof. 2. implies that XT(k)=f(XT)=WT; 1. gives a martingale property for W, Doob's theorem for stopping time of martingales gives E[WT]=E[W0]=W0, and 2. implies that W0=f(k)(m,0,...,0).




Pintacuda [5] used this result with k=1 and found f(1)(x0,x1)=Hx0+x1/1+x0. Foata et al. guessed the general formula:
Proposition 1   For k 2, the function f(k) defined as
f(k)(x0,x1,...,xk):=Kx0(k)+
x1Kx0+1(k-1)++xk-1Kx0+1(1)+xk
x0+1
is the only solution of 1. and 2. One also has f(k)(x0,0,...,0)=Kx0(k).

They give two proofs in [4], but I prefer to explain what I heard in the meeting Random Structure and Algorithms (Poznan, August 2001), where Doron Zeilberger explained how to use a language theory argument to get a shorter proof. A complete collection of coupons can be written 11*2{1,2}*3{1,2,3}*4 ... {1,2,..., m-1}*m. Let W be this set of words. This leads to the generating function
f(x1,...,xm):=
x1
m
1
1-
x1
m
...
x2
m
1
1-
x1+x2
m
xm-1
m
1
1-
x1+...+xm-1
m
xm.

Recall that E[XT(k)] is the expected number of kinds of coupons in k-tuple (at time T, that is when the daughter has completed her album). Thus,
k=1
E [ XT(k) ]  tk =
 
w W
m
k=1
P (w) t|w|k =
m
k=1
 
w W



1
m



|w| t|w|k
= m! (f(t,0,...,0)+f(0,t,0,...,0)+...+f(0,...,0,t))
=t+t
m-1
k=1
k!
(2-t)(3-4)...(k+1-t)
=t-1+
m!
m
j=2
(j-t)
.

As words of W are ordered (whereas it is in fact irrelevant for the coupon collector), there is a factor m! at the second line takes into account all the permutations. The generating function obtained at the last line shows that the hyperharmonic numbers generated by Equation (1) indeed gives the average value of Theorem 2.

5  Conclusion

The coupon-collector problem (like the mnage problem, the birthday paradox) belongs to the large class of problems that can be modeled by simple urns models. It is very likely that, during the next years, the symbolic method will be applied with success to all these urns problems, and analytic combinatorics will then provide enumeration, complete asymptotics expansions and limit laws. The ``classical'' coupon-collector problem waits for his next revisitor!

This summary is related to Foata's article [4] (the more recent preprint [3] is also relevant). These articles are accessible at http://www-irma.u-strasbg.fr/~foata/paper.

References

[1]
Banderier (Cyril) and Dobrow (Robert P.). -- A generalized cover time for random walks on graphs. In Formal power series and algebraic combinatorics (Moscow, 2000), pp. 113--124. -- Springer, Berlin, 2000.

[2]
Flajolet (Philippe), Gardy (Danile), and Thimonier (Loys). -- Birthday paradox, coupon collectors, caching algorithms and self-organizing search. Discrete Applied Mathematics, vol. 39, n°3, 1992, pp. 207--229.

[3]
Foata (D.) and Zeilberger (D.). -- The collector's brotherhood problem using the Newman-Shepp symbolic method. -- To appear in Algebra Universalis. Special issue dedicated to the memory of Gian-Carlo Rota.

[4]
Foata (Dominique), Han (Guo-Niu), and Lass (Bodo). -- Les nombres hyperharmoniques et la fratrie du collectionneur de vignettes. Sminaire Lotharingien de Combinatoire, vol. 47, n°B47a, 2001. -- 20 pages. Available from http://www.mat.univie.ac.at/~slc/.

[5]
Pintacuda (N.). -- Coupons collectors via the martingales. Bollettino dell'Unione Matematica Italiana. A. Serie V, vol. 17, n°1, 1980, pp. 174--177.

[6]
Zeilberger (Doron). -- How many singles, doubles, triples, etc., should the coupon collector expect? Personal Journal of Ekhad and Zeilberger, 2001. -- 1 page. Available from http://www.math.rutgers.edu/~zeilberg/.

This document was translated from LATEX by HEVEA.