Hyperharmonic Numbers and the Phratry of the Coupon Collector
Dominique Foata
Département de mathématique, Université Louis Pasteur (France)
Algorithms Seminar
May 21, 2001
[summary by Cyril Banderier]
A properly typeset version of this document is available in
postscript and in
pdf.
If some fonts do not look right on your screen, this might be
fixed by configuring your browser (see the documentation here).
Abstract
The classical couponcollector problem is here extended to the case
where the collector shares his harvest with other members of his
phratry. She (!) remains the single buyer, but she gives to his
brothers all the pictures that she got in double. When her album is
filled, her brothers' albums have some empty places. How many in
average? Dominique Foata (in a joint work with Guoniu Han and Bodo
Lass) answers this question via an expression for the multivariate
generating function. The problem is related to hyperharmonic numbers,
that are studied here as solutions of finite differences equations.
1 Coupons Collector
A cleaver firm sells chocolate, with a picture (or ``coupon'') of a
famous cricket player in each bar. In total, there are m different
pictures to collect and each picture appears with probability 1/m.
Mr. and Mrs. Brown have r sons and one daughter, chocolate and
cricket addicts. The girl (she's the oldest) is the only one to buy
chocolate. She tries to complete her collection. When she gets a new
picture, she puts it in her album, and when she gets a double, she
gives it to her oldest brother, and when this one gets a double, he
gives it to the remaining oldest brother, and so on. After having
bought T bars of chocolate, the girl has completed her album, and it
remains M_{T}^{(i)} empty places in the album of the ith brother
(i=1,...,r). Let X_{n}^{(k)} be the number of coupons which
appeared exactly k times until time n. As
M_{T}^{(k)}=X_{T}^{(1)}+...+X_{T}^{(k)}, the distribution of
(T,M_{T}^{(1)},...,M_{T}^{(r)}) is then totally determined by the
distribution of (T,X_{T}^{(1)},...,X_{T}^{(r)}).
The question is to find formulae and asymptotics for M_{T}^{(i)}, or
equivalently for X_{T}^{(k)}.
There is a lot of ways to solve couponcollectorlike problems.
One can distinguish three main approaches:

formal approach: combinatoricians indeed used
a languagetheory approach (shuffle products and Laplace transforms [2]
or manipulation of regular expressions [6]);
 probabilistic approach: a lot of folklore results are established via basic probabilistic considerations,
and more sophisticated tools such as martingales theory were also
useful [5];
 matricial approach: this is in fact a mixture of the two
precedent approaches, which is exploited to solve a Markovian
generalization of the couponcollector problem in [1] (with
PerronFrobenius theory and approximation of integrals).
This is with a combinatorial approach (enumeration of surjections and
formal Laplace transform) that Foata et al. obtain
in [4] the multivariate generating function of the
couponcollector problem, from which they derive the formulae for the
expectations of E[T] and E[X_{T}^{(k)}].
2 The Multivariate Generating Function
Theorem 1
The generating function of the random variables
T, X_{T}^{(1)}, ..., X_{T}^{(r)} is


P{T=l,X_{T}^{(1)}=n_{1},...,X_{T}^{(r)}=n_{r}}
t^{l} u_{1}^{n1}... u_{r}^{nr}= 
u_{1}t å 

(1)^{b} 
æ
ç
ç
è 

æ
ç
ç
è 

ö
÷
÷
ø 
^{ck} 
ö
÷
÷
ø 


æ
ç
ç
è 

kc_{k} 
ö
÷
÷
ø 
!.

The proof follows from setting u_{i}=1 (for i ³ r+1), expanding
(with Newton multinomial formula), and applying the Laplace transform to





p(s)
=mu_{1} 
æ
ç
ç
è 

u_{i} 

ö
÷
÷
ø 
^{m1}. 
The formal Laplace transform is defined as a linear map such that
L (g_{n} t^{n}/n!)=g_{n} t^{n}. This implies
L (exp(a t) t^{n})= n! t^{n}/(1a t)^{n+1}.
The set S(l,m;n) is defined as a subset of the surjections from [ 1,...,l ] to [ 1,...,m ]
for which sÎ S(l,m;n) implies i is reached n_{i} times and
the restriction of s to [ 1,...,l1 ] is still a surjection from [ 1,...,l1 ] to [ 1,...,m ]\{s(l)}.
The weight p of a surjection SÎ S(l,m;n) is defined by p(s)=Õu_{i}^{ni}.
3 Hyperharmonic Numbers
In order to complete one collection, it is well known that the average
number of needed bars is
E[T]=m H_{m} where 
H_{m}= 



. 
For example, when there are m=50 different pictures, E[T]=50
H_{50}»50×4.5»225 and thus the daughter has a lot of
doubles and we can expect that the oldest brother has almost completed
his album with the 175 remaining pictures.
Pintacuda [5] proved with martingale theory that
E[M_{T}^{(1)}]=H_{m}.
Foata et al. prove
Theorem 2
For k³ 2, the average number of empty places in the kth brother's album is
E 
[M_{T}^{(k)}]=1+ 

K_{m}^{(i)}
where 
K_{m}^{(k)}= 



, (k³ 1, m³ 3) 
with the following initial conditions
K_{2}^{(k)}=1/2^{k} (for k³ 0) and K_{m}^{(0)}=1 (for m³ 2).
A first derivation of this result follows of Theorem 1.
Another proof is in two steps: first get the generating function for
the K_{m}^{(k)} (end of this section) and then prove that this
generating function is also the one of the couponcollector problem
(next section).
Consider the rising factorial defined by (a)_{n}= a (a+1)... (a+n1)
if n³ 1 and (a)_{0}=1. An hypergeometric function with respect to
two lists (a_{1},...,a_{r}) and (b_{1},...,b_{s}) is defined as the
function given by the series
_{r}F_{s} 
æ
è 
a_{1},...,a_{r} 
b_{1},...,b_{s} 

;x 
ö
ø 
:= 


(a_{1})_{n}...(a_{r})_{n} 

(b_{1})_{n}...(b_{s})_{n} 


. 
The authors prove that the numbers K_{m}^{(k)} (that they call
``hyperharmonic numbers'') satisfy
K_{m}^{(k)}= 

_{k+2}F_{k+1} 
æ
è 

1 
ö
ø 
. 
Comparing the recurrences satisfied by both sides
and then summing gives the generating function



K_{m}^{(k)}t^{k}= 






= 
1 

(1t/2)(1t/3)...(1t/m) 

(1) 
(the last equality following from a partial fraction decomposition).
Thus K_{m}^{(k)}=h_{k}(1/2,...,1/k), the symmetric
homogeneous polynomial of (total) degree k in m1 variables.
Reexpressing h_{k} in the basis of the power symmetric functions
p_{k}:=åx_{i}^{k} gives
One also has explicit asymptotics (for fixed k), e.g.,
K_{m}^{(3)}» 1.1666 ln^{3} m  0.2113 ln^{2} m +0.4118 lnm 0.0815.
4 Martingales Rescue the Phratry
Let X_{n}^{(0)} be the number of empty places in the daughter's album.
Now, define the process X as
X_{n}=(X_{n}^{(0)},X_{n}^{(1)},...,X_{n}^{(r)}). For any
function f, the average increase of f(X) (knowing all the
previously drawn coupons) is easy to get:
E 
[ 
f(X_{n+1})f(X_{n})  Y_{0},...,Y_{n} 
] 
= 




( 
f(X_{n}^{(0)},...,X_{n}^{(k)}1,X_{n}^{(k+1)}+1,...,X_{n}^{(r)})f(X_{n}) 
) 
;

this simply reflects the different possible updates (X_{n}^{(k)}/m is the probability
to get a new coupon which was already in ktuple).
If f is such that the sum is 0,
one has also W_{n+1}W_{n}=0 and thus W is a martingale,
where W is the process f(X) stopped at T, that is
W_{n}:=f(X_{n}) (for n<T) and W_{n}:=f(X_{T}) (for n³ T).
More generally, suppose that for r functions
f^{(1)}, ..., f^{(r)} from N^{k+1} to R one has:

å_{i=0}^{k} x_{i}(f(x_{0},...,x_{i}1,x_{i+1}+1,...,x_{k}) f^{(k)}(x_{0},...,x_{k}))=0 for x_{0}³1;
 f^{(k)}(0,x_{1},...,x_{k})=x_{k}.
Then E[X_{T}^{(k)}]=f^{(k)}(m,0,...,0).
Proof.
2. implies that X_{T}^{(k)}=f(X_{T})=W_{T}; 1. gives a martingale
property for W, Doob's theorem for stopping time of martingales
gives E[W_{T}]=E[W_{0}]=W_{0}, and 2. implies that
W_{0}=f^{(k)}(m,0,...,0).
Pintacuda [5] used this result with k=1 and found
f^{(1)}(x_{0},x_{1})=H_{x0}+x_{1}/1+x_{0}.
Foata et al. guessed the general formula:
Proposition 1
For k³ 2, the function f^{(k)} defined as
f^{(k)}(x_{0},x_{1},...,x_{k}):=K_{x0}^{(k)}+ 
x_{1}K_{x0+1}^{(k1)}+···+x_{k1}K_{x0+1}^{(1)}+x_{k} 

x_{0}+1 

is the only solution of 1. and 2.
One also has f^{(k)}(x_{0},0,...,0)=K_{x0}^{(k)}.
They give two proofs in [4], but I prefer to explain
what I heard in the meeting Random Structure and Algorithms (Poznan, August 2001),
where Doron Zeilberger explained how to use a language theory argument
to get a shorter proof.
A complete collection of coupons can be written
11^{*}2{1,2}^{*}3{1,2,3}^{*}4 ... {1,2,...,
m1}^{*}m. Let W be this set of words.
This leads to the generating function
f(x_{1},...,x_{m}):= 



...






x_{m}. 
Recall that E[X_{T}^{(k)}] is the expected number of kinds of
coupons in ktuple (at time T, that is when the daughter has
completed her album). Thus,

E 
[ 
X_{T}^{(k)} 
] 
t^{k}
= 



P 
(w) t^{wk}
= 




æ
ç
ç
è 

ö
÷
÷
ø 
^{w} t^{wk} 
= m! (f(t,0,...,0)+f(0,t,0,...,0)+...+f(0,...,0,t))
As words of W are ordered (whereas it is in fact irrelevant for
the coupon collector), there is a factor m! at the second line takes
into account all the permutations. The generating function obtained
at the last line shows that the hyperharmonic numbers generated by
Equation (1) indeed gives the average value of
Theorem 2.
5 Conclusion
The couponcollector problem (like the ménage problem, the birthday
paradox) belongs to the large class of problems that can be modeled by
simple urns models. It is very likely that, during the next years,
the symbolic method will be applied with success to all these urns
problems, and analytic combinatorics will then provide enumeration,
complete asymptotics expansions and limit laws. The ``classical''
couponcollector problem waits for his next revisitor!
This summary is related to Foata's article [4] (the more
recent preprint [3] is also relevant). These articles are
accessible at http://wwwirma.ustrasbg.fr/~foata/paper
.
References
 [1]

Banderier (Cyril) and Dobrow (Robert P.). 
A generalized cover time for random walks on graphs. In Formal
power series and algebraic combinatorics (Moscow, 2000), pp. 113124. 
Springer, Berlin, 2000.
 [2]

Flajolet (Philippe), Gardy (Danièle), and Thimonier (Loys). 
Birthday paradox, coupon collectors, caching algorithms and
selforganizing search. Discrete Applied Mathematics, vol. 39, n°3,
1992, pp. 207229.
 [3]

Foata (D.) and Zeilberger (D.). 
The collector's brotherhood problem using the NewmanShepp
symbolic method. 
To appear in Algebra Universalis. Special issue dedicated to
the memory of GianCarlo Rota.
 [4]

Foata (Dominique), Han (GuoNiu), and Lass (Bodo). 
Les nombres hyperharmoniques et la fratrie du collectionneur de
vignettes. Séminaire Lotharingien de Combinatoire, vol. 47,
n°B47a, 2001. 
20 pages. Available from
http://www.mat.univie.ac.at/~slc/
.
 [5]

Pintacuda (N.). 
Coupons collectors via the martingales. Bollettino dell'Unione
Matematica Italiana. A. Serie V, vol. 17, n°1, 1980,
pp. 174177.
 [6]

Zeilberger (Doron). 
How many singles, doubles, triples, etc., should the coupon collector
expect? Personal Journal of Ekhad and Zeilberger,
2001. 
1 page. Available from
http://www.math.rutgers.edu/~zeilberg/
.
This document was translated from L^{A}T_{E}X by
H^{E}V^{E}A.