Hyperharmonic Numbers and the Phratry of the Coupon Collector

Dominique Foata

Département de mathématique, Université Louis Pasteur (France)

Algorithms Seminar

May 21, 2001

[summary by Cyril Banderier]

A properly typeset version of this document is available in postscript and in pdf.

If some fonts do not look right on your screen, this might be fixed by configuring your browser (see the documentation here).
Abstract
The classical coupon-collector problem is here extended to the case where the collector shares his harvest with other members of his phratry. She (!) remains the single buyer, but she gives to his brothers all the pictures that she got in double. When her album is filled, her brothers' albums have some empty places. How many in average? Dominique Foata (in a joint work with Guoniu Han and Bodo Lass) answers this question via an expression for the multivariate generating function. The problem is related to hyperharmonic numbers, that are studied here as solutions of finite differences equations.

## 1  Coupons Collector

A cleaver firm sells chocolate, with a picture (or ``coupon'') of a famous cricket player in each bar. In total, there are m different pictures to collect and each picture appears with probability 1/m. Mr. and Mrs. Brown have r sons and one daughter, chocolate and cricket addicts. The girl (she's the oldest) is the only one to buy chocolate. She tries to complete her collection. When she gets a new picture, she puts it in her album, and when she gets a double, she gives it to her oldest brother, and when this one gets a double, he gives it to the remaining oldest brother, and so on. After having bought T bars of chocolate, the girl has completed her album, and it remains MT(i) empty places in the album of the ith brother (i=1,...,r). Let Xn(k) be the number of coupons which appeared exactly k times until time n. As MT(k)=XT(1)+...+XT(k), the distribution of (T,MT(1),...,MT(r)) is then totally determined by the distribution of (T,XT(1),...,XT(r)).

The question is to find formulae and asymptotics for MT(i), or equivalently for XT(k).

There is a lot of ways to solve coupon-collector-like problems. One can distinguish three main approaches:
• formal approach: combinatoricians indeed used a language-theory approach (shuffle products and Laplace transforms  or manipulation of regular expressions );
• probabilistic approach: a lot of folklore results are established via basic probabilistic considerations, and more sophisticated tools such as martingales theory were also useful ;
• matricial approach: this is in fact a mixture of the two precedent approaches, which is exploited to solve a Markovian generalization of the coupon-collector problem in  (with Perron--Frobenius theory and approximation of integrals).
This is with a combinatorial approach (enumeration of surjections and formal Laplace transform) that Foata et al. obtain in  the multivariate generating function of the coupon-collector problem, from which they derive the formulae for the expectations of E[T] and E[XT(k)].

## 2  The Multivariate Generating Function

Theorem 1   The generating function of the random variables T, XT(1), ..., XT(r) is
 å l³ m,n
P{T=l,XT(1)=n1,...,XT(r)=nr} tl u1n1... urnr=
u1t å
æ
è
 m-1 a,b,c1,...,cr
ö
ø
(-1)b æ
ç
ç
è
 r Õ k=1
æ
ç
ç
è
uk-1
k!
ö
÷
÷
ø
ck ö
÷
÷
ø
(t/m)
 å k
kck

(1-at/m)
1+
 å k
kck

æ
ç
ç
è
 å k
kck ö
÷
÷
ø
!.

The proof follows from setting ui=1 (for i ³ r+1), expanding (with Newton multinomial formula), and applying the Laplace transform to
 å l³ m
tl-1
(l-1)!
 å n
 å sÎ S(l,m;n)
p(s) =mu1 æ
ç
ç
è
 å i³1
ui
ti
i!
ö
÷
÷
ø
m-1.
The formal Laplace transform is defined as a linear map such that L (gn tn/n!)=gn tn. This implies L (exp(a t) tn)= n! tn/(1-a t)n+1. The set S(l,m;n) is defined as a subset of the surjections from [ 1,...,l ] to [ 1,...,m ] for which sÎ S(l,m;n) implies i is reached ni times and the restriction of s to [ 1,...,l-1 ] is still a surjection from [ 1,...,l-1 ] to [ 1,...,m ]\{s(l)}. The weight p of a surjection SÎ S(l,m;n) is defined by p(s)=Õuini.

## 3  Hyperharmonic Numbers

In order to complete one collection, it is well known that the average number of needed bars is
E[T]=m Hm     where      Hm=
 m å k=1
1
k
.
For example, when there are m=50 different pictures, E[T]=50 H50»50×4.5»225 and thus the daughter has a lot of doubles and we can expect that the oldest brother has almost completed his album with the 175 remaining pictures.

Pintacuda  proved with martingale theory that E[MT(1)]=Hm. Foata et al. prove
Theorem 2   For k³ 2, the average number of empty places in the kth brother's album is
E [MT(k)]=1+
 k å i=1
Km(i)   where    Km(k)=
 m å i=2
Ki(k-1)
i
,  (k³ 1, m³ 3)
with the following initial conditions K2(k)=1/2k (for k³ 0) and Km(0)=1 (for m³ 2).

A first derivation of this result follows of Theorem 1. Another proof is in two steps: first get the generating function for the Km(k) (end of this section) and then prove that this generating function is also the one of the coupon-collector problem (next section).

Consider the rising factorial defined by (a)n= a (a+1)... (a+n-1) if n³ 1 and (a)0=1. An hypergeometric function with respect to two lists (a1,...,ar) and (b1,...,bs) is defined as the function given by the series
rFs æ
è
 a1,...,ar b1,...,bs
;x ö
ø
:=
 å n³ 0
(a1)n...(ar)n
(b1)n...(bs)n
xn
n!
.

The authors prove that the numbers Km(k) (that they call ``hyperharmonic numbers'') satisfy
Km(k)=
m (m-1)
2k+1
k+2Fk+1 æ
è
 -m+2,2,...,2 3,...,3
|1 ö
ø
.
Comparing the recurrences satisfied by both sides and then summing gives the generating function
 å k³ 0
Km(k)tk=
 m å n=2
(-m)n
(n-2)!
1
n
1
1-t/n
=
1
(1-t/2)(1-t/3)...(1-t/m)
(1)
(the last equality following from a partial fraction decomposition).

Thus Km(k)=hk(1/2,...,1/k), the symmetric homogeneous polynomial of (total) degree k in m-1 variables. Reexpressing hk in the basis of the power symmetric functions pk:=åxik gives
Km(k)~
p1k
k!
~
(lnm)k
k!
One also has explicit asymptotics (for fixed k), e.g.,
Km(3)» 1.1666 ln3 m - 0.2113 ln2 m +0.4118 lnm -0.0815.

## 4  Martingales Rescue the Phratry

Let Xn(0) be the number of empty places in the daughter's album. Now, define the process X as Xn=(Xn(0),Xn(1),...,Xn(r)). For any function f, the average increase of f(X) (knowing all the previously drawn coupons) is easy to get:
E [ f(Xn+1)-f(Xn) | Y0,...,Yn ] =
 r å k=0
Xn(k)
m
( f(Xn(0),...,Xn(k)-1,Xn(k+1)+1,...,Xn(r))-f(Xn) ) ;
this simply reflects the different possible updates (Xn(k)/m is the probability to get a new coupon which was already in k-tuple).

If f is such that the sum is 0, one has also Wn+1-Wn=0 and thus W is a martingale, where W is the process f(X) stopped at T, that is Wn:=f(Xn) (for n<T) and Wn:=f(XT) (for n³ T).

More generally, suppose that for r functions f(1), ..., f(r) from Nk+1 to R one has:
1. åi=0k xi(f(x0,...,xi-1,xi+1+1,...,xk) -f(k)(x0,...,xk))=0 for x0³1;
2. f(k)(0,x1,...,xk)=xk.
Then E[XT(k)]=f(k)(m,0,...,0).

Proof. 2. implies that XT(k)=f(XT)=WT; 1. gives a martingale property for W, Doob's theorem for stopping time of martingales gives E[WT]=E[W0]=W0, and 2. implies that W0=f(k)(m,0,...,0).

Pintacuda  used this result with k=1 and found f(1)(x0,x1)=Hx0+x1/1+x0. Foata et al. guessed the general formula:
Proposition 1   For k³ 2, the function f(k) defined as
f(k)(x0,x1,...,xk):=Kx0(k)+
x1Kx0+1(k-1)+···+xk-1Kx0+1(1)+xk
x0+1
is the only solution of 1. and 2. One also has f(k)(x0,0,...,0)=Kx0(k).

They give two proofs in , but I prefer to explain what I heard in the meeting Random Structure and Algorithms (Poznan, August 2001), where Doron Zeilberger explained how to use a language theory argument to get a shorter proof. A complete collection of coupons can be written 11*2{1,2}*3{1,2,3}*4 ... {1,2,..., m-1}*m. Let W be this set of words. This leads to the generating function
f(x1,...,xm):=
x1
m
1
1-
x1
m
...
x2
m
1
1-
x1+x2
m
xm-1
m
1
1-
x1+...+xm-1
m
xm.

Recall that E[XT(k)] is the expected number of kinds of coupons in k-tuple (at time T, that is when the daughter has completed her album). Thus,
 ¥ å k=1
E [ XT(k) ]  tk =
 å wÎ W
 m å k=1
P (w) t|w|k =
 m å k=1
 å wÎ W
æ
ç
ç
è
1
m
ö
÷
÷
ø
|w| t|w|k
= m! (f(t,0,...,0)+f(0,t,0,...,0)+...+f(0,...,0,t))
=t+t
 m-1 å k=1
k!
(2-t)(3-4)...(k+1-t)
=t-1+
m!
 m Õ j=2
(j-t)
.

As words of W are ordered (whereas it is in fact irrelevant for the coupon collector), there is a factor m! at the second line takes into account all the permutations. The generating function obtained at the last line shows that the hyperharmonic numbers generated by Equation (1) indeed gives the average value of Theorem 2.

## 5  Conclusion

The coupon-collector problem (like the ménage problem, the birthday paradox) belongs to the large class of problems that can be modeled by simple urns models. It is very likely that, during the next years, the symbolic method will be applied with success to all these urns problems, and analytic combinatorics will then provide enumeration, complete asymptotics expansions and limit laws. The ``classical'' coupon-collector problem waits for his next revisitor!

This summary is related to Foata's article  (the more recent preprint  is also relevant). These articles are accessible at `http://www-irma.u-strasbg.fr/~foata/paper`.

## References


Banderier (Cyril) and Dobrow (Robert P.). -- A generalized cover time for random walks on graphs. In Formal power series and algebraic combinatorics (Moscow, 2000), pp. 113--124. -- Springer, Berlin, 2000.


Flajolet (Philippe), Gardy (Danièle), and Thimonier (Loys). -- Birthday paradox, coupon collectors, caching algorithms and self-organizing search. Discrete Applied Mathematics, vol. 39, n°3, 1992, pp. 207--229.


Foata (D.) and Zeilberger (D.). -- The collector's brotherhood problem using the Newman-Shepp symbolic method. -- To appear in Algebra Universalis. Special issue dedicated to the memory of Gian-Carlo Rota.


Foata (Dominique), Han (Guo-Niu), and Lass (Bodo). -- Les nombres hyperharmoniques et la fratrie du collectionneur de vignettes. Séminaire Lotharingien de Combinatoire, vol. 47, n°B47a, 2001. -- 20 pages. Available from `http://www.mat.univie.ac.at/~slc/`.


Pintacuda (N.). -- Coupons collectors via the martingales. Bollettino dell'Unione Matematica Italiana. A. Serie V, vol. 17, n°1, 1980, pp. 174--177.


Zeilberger (Doron). -- How many singles, doubles, triples, etc., should the coupon collector expect? Personal Journal of Ekhad and Zeilberger, 2001. -- 1 page. Available from `http://www.math.rutgers.edu/~zeilberg/`.

This document was translated from LATEX by HEVEA.