Collecting Coupons


Coupons in cereal boxes are numbered \(1\) to \(5\), and a set of one of each is required for a prize. With one coupon per box, how many boxes on the average are required to make a complete set?


Let \(X_{1}\), \(X_{2}\), …, \(X_{5}\) be random variables representing the number of boxes opened until receiving the \(1\)st, \(2\)nd, …, and \(5\)th unique coupon, respectively, where the number of boxes is counted since the last box containing a unique coupon was opened. For example, \(X_{3}\) represents the number of boxes opened after the \(2\)nd unique coupon was found up until the \(3\)3rd unique coupon was found.

Each \(X_{i}\) is a geometric random variable associated with a different probability of success. For example, \(X_{1}\) is associated with \(p_{1} = 1\) because the \(1\)st box will obviously contain a unique coupon. \(X_{2}\) is associated with \(p_{2} = \frac{4}{5}\) because only \(4\) of the \(5\) coupons are unique at that point. Probabilities for the remaining \(X_{i}\) can be calculated in a similar fashion. For geometric random variables, we also know that \(E[X_{i}] = \frac{1}{p_{i}}\).

To solve the problem, we can apply the Linearity of Expectation property as follows:

\[\begin{split} E \left[\sum_{i=1}^5 X_{i} \right] &= \sum_{i=1}^5 E[X_{i}] \\ &= \sum_{i=1}^5 \frac{1}{p_{i}} \\ &= 5 \left( \frac{1}{5} + \frac{1}{4} + \frac{1}{3} + \frac{1}{2} + \frac{1}{1} \right) \\ &= \frac{137}{12} \approx 11.42 \end{split}\]


When generalized to \(n\) coupons, the solution to this problem can be written as:

\[C(n) = n \sum_{i=1}^n \frac{1}{i} = nH_{n}\]

In the above, \(H_{n}\) represents the \(n\)th Harmonic Number, which is the sum of the reciprocals of integers from \(1\) to \(n\).