Question
Let $X_1, X_2, \dots$ be i.i.d. random variables, each following the uniform distribution on $\{1, 2, \dots, n\}$, and define $T_n = \min\{t \ge 1 : \{X_1, \dots, X_t\} = \{1, 2, \dots, n\}\}$, the number of trials needed until every value has appeared.
Prove that $T_n / (n \ln n)$ converges in probability to $1$.
(20 points)
Step-by-step solution
Step 1. Denote by $\tau_i$ the number of trials needed to collect the $i$-th distinct element after $i-1$ distinct elements have already been collected, where $i = 1, \dots, n$. At that stage, the probability of obtaining a new element on each trial is $p_i = (n-i+1)/n$. The random variable $\tau_i$ follows a geometric distribution with probability mass function $P(\tau_i = k) = (1-p_i)^{k-1} p_i$, $k = 1, 2, \dots$, and $\tau_1, \dots, \tau_n$ are mutually independent. The total number of trials can be expressed as $T_n = \sum_{i=1}^{n} \tau_i$.
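As a numerical sanity check (not part of the proof), the sketch below simulates the total count both directly and via the stage-wise geometric decomposition of Step 1; the function names are illustrative, and the two empirical means should agree with $n H_n$.

```python
import random

def collect_direct(n, rng):
    """Draw uniformly from {1, ..., n} until every value has appeared;
    return the total number of draws T_n."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randint(1, n))
        draws += 1
    return draws

def collect_by_stages(n, rng):
    """Same quantity via the decomposition T_n = tau_1 + ... + tau_n:
    stage i waits a Geometric((n - i + 1)/n) time for the next new value."""
    total = 0
    for i in range(1, n + 1):
        p = (n - i + 1) / n
        waits = 1
        while rng.random() >= p:  # sample the geometric waiting time
            waits += 1
        total += waits
    return total

rng = random.Random(0)
n, reps = 50, 2000
m_direct = sum(collect_direct(n, rng) for _ in range(reps)) / reps
m_stages = sum(collect_by_stages(n, rng) for _ in range(reps)) / reps
exact = n * sum(1 / k for k in range(1, n + 1))  # E[T_n] = n * H_n
print(m_direct, m_stages, exact)
```

Both simulations target the same distribution, so their sample means should sit within a few units of the exact expectation $n H_n \approx 224.96$ for $n = 50$.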
Step 2. The expectation of the geometric random variable $\tau_i$ is $E[\tau_i] = 1/p_i = n/(n-i+1)$, and its variance is $\mathrm{Var}(\tau_i) = (1-p_i)/p_i^2$. By linearity of expectation, the expectation of $T_n$ is: $E[T_n] = \sum_{i=1}^{n} \frac{n}{n-i+1} = n \sum_{k=1}^{n} \frac{1}{k} = n H_n$. Using the asymptotic expansion of the harmonic series $H_n = \ln n + \gamma + o(1)$ (where $\gamma$ is the Euler--Mascheroni constant), we obtain $E[T_n] = n \ln n + O(n)$, and hence $E[T_n]/(n \ln n) \to 1$.
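The expectation computation in Step 2 can be checked numerically. The short sketch below (the helper name is illustrative) evaluates the exact $E[T_n] = n H_n$ and compares it with $n \ln n$; the ratio approaches $1$, though slowly, since the error is of order $\gamma / \ln n$.

```python
import math

def expected_T(n):
    """Exact expectation E[T_n] = n * H_n, from E[tau_i] = n / (n - i + 1)."""
    return n * sum(1.0 / k for k in range(1, n + 1))

# The ratio E[T_n] / (n ln n) = H_n / ln n should decrease toward 1.
for n in (10, 100, 10_000, 1_000_000):
    print(n, expected_T(n) / (n * math.log(n)))
```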
Step 3. By independence, the variance of $T_n$ is: $\mathrm{Var}(T_n) = \sum_{i=1}^{n} \mathrm{Var}(\tau_i) = \sum_{i=1}^{n} \frac{1-p_i}{p_i^2} \le \sum_{k=1}^{n} \frac{n^2}{k^2}$. Since the series $\sum_{k=1}^{\infty} 1/k^2$ converges to $\pi^2/6$, there exists a constant $C = \pi^2/6$ such that $\mathrm{Var}(T_n) \le C n^2$.
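The variance bound of Step 3 can likewise be verified numerically; this sketch (function name illustrative) sums the exact stage variances $(1-p_i)/p_i^2$ and confirms that $\mathrm{Var}(T_n)/n^2$ stays below the constant $C = \pi^2/6$.

```python
import math

def var_T(n):
    """Exact Var(T_n) = sum_i (1 - p_i) / p_i^2 with p_i = (n - i + 1) / n."""
    return sum((1 - p) / p**2
               for p in ((n - i + 1) / n for i in range(1, n + 1)))

# Var(T_n) / n^2 is bounded by C = pi^2 / 6, and approaches it as n grows.
for n in (10, 100, 1000):
    print(n, var_T(n) / n**2, math.pi**2 / 6)
```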
Step 4. Applying Chebyshev's inequality, for any $\varepsilon > 0$: $P\!\left(\left|\frac{T_n - E[T_n]}{n \ln n}\right| \ge \varepsilon\right) \le \frac{\mathrm{Var}(T_n)}{\varepsilon^2 n^2 \ln^2 n}$. Substituting the variance bound: $P\!\left(\left|\frac{T_n - E[T_n]}{n \ln n}\right| \ge \varepsilon\right) \le \frac{C}{\varepsilon^2 \ln^2 n}$. As $n \to \infty$, the right-hand side tends to $0$, so $\frac{T_n - E[T_n]}{n \ln n} \xrightarrow{P} 0$. Combined with the conclusion from Step 2 that $E[T_n]/(n \ln n) \to 1$, we obtain $\frac{T_n}{n \ln n} \xrightarrow{P} 1$.
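As a final Monte Carlo sanity check (illustrative only), the sketch below estimates the deviation probability bounded in Step 4 and compares it against the Chebyshev bound $C/(\varepsilon^2 \ln^2 n)$; the empirical frequency should sit safely below the bound.

```python
import math
import random

def sample_T(n, rng):
    """One realization of T_n: draw uniformly from {1, ..., n} until all appear."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randint(1, n))
        draws += 1
    return draws

rng = random.Random(42)
n, eps, trials = 1000, 0.2, 400
e_T = n * sum(1.0 / k for k in range(1, n + 1))        # E[T_n] = n * H_n
cheb = (math.pi**2 / 6) / (eps**2 * math.log(n)**2)    # Chebyshev bound
far = sum(abs(sample_T(n, rng) - e_T) >= eps * n * math.log(n)
          for _ in range(trials)) / trials
print(far, cheb)  # empirical deviation frequency vs. the bound
```

Note that the bound $C/(\varepsilon^2 \ln^2 n)$ decays only like $1/\ln^2 n$, which is why the simulated deviation frequency is small but not tiny at moderate $n$.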
Final answer
QED.
Marking scheme
The following is the detailed marking rubric based on the official solution:
1. Checkpoints (Total: 7 points)
Score exactly one chain; take the maximum subtotal among chains; do not add points across chains.
Chain A: Geometric distribution decomposition and Chebyshev's inequality (official solution)
- Geometric distribution modeling and decomposition (2 points) [additive]
- Decompose the total count into a sum of stage-wise waiting times $T_n = \sum_{i=1}^{n} \tau_i$ (or equivalent notation), and explicitly state that the $\tau_i$ are mutually independent and geometrically distributed. (1 point)
- Correctly specify the parameter: state that the success probability in the $i$-th stage (collecting the $i$-th distinct element) is $p_i = (n-i+1)/n$ or an equivalent form. (1 point)
- Computation and asymptotics of the expectation (2 points) [additive]
- Using linearity of expectation, compute $E[T_n] = n \sum_{k=1}^{n} \frac{1}{k} = n H_n$. (1 point)
- Using the harmonic series property $H_n = \ln n + \gamma + o(1)$, explicitly conclude that $E[T_n]/(n \ln n) \to 1$ (or $E[T_n] \sim n \ln n$). (1 point)
- Order-of-magnitude estimate for the variance (2 points) [additive]
- Using independence, write the variance-sum expression: $\mathrm{Var}(T_n) = \sum_{i=1}^{n} \mathrm{Var}(\tau_i)$ (or $\sum_{i=1}^{n} (1-p_i)/p_i^2$). (1 point)
- Using the convergence of $\sum_{k=1}^{\infty} 1/k^2$, show that $\mathrm{Var}(T_n)$ is of order at most $n^2$ (i.e., $\mathrm{Var}(T_n) \le C n^2$ or $\mathrm{Var}(T_n) = O(n^2)$). (1 point)
- *(Note: If the student does not perform the bounding step but correctly computes the exact expression involving $n^2 \sum_{k=1}^{n} 1/k^2$ and identifies its leading term as $\pi^2 n^2 / 6$, full credit is awarded.)*
- Proof of convergence in probability (1 point) [additive]
- Apply Chebyshev's inequality, substitute the variance bound obtained above, and prove that $P\!\left(\left|\frac{T_n}{n \ln n} - 1\right| > \varepsilon\right) \to 0$ for every $\varepsilon > 0$. (1 point)
- *(Note: If the student has shown that the centered quantity $(T_n - E[T_n])/(n \ln n)$ converges in probability to 0, and in the expectation step has shown that the mean ratio $E[T_n]/(n \ln n)$ converges to 1, this point is awarded directly; the conclusion must be based on the preceding order-of-magnitude estimates.)*
Total (max 7)
2. Zero-credit items
- Merely copying the problem statement, definitions, or formulas without performing any specific substitution or derivation for this problem.
- Asserting $T_n \approx n \ln n$ based solely on intuition, without geometric distribution decomposition or series summation as justification.
- Stating Chebyshev's inequality without computing $E[T_n]$ or $\mathrm{Var}(T_n)$, rendering the right-hand side of the inequality unanalyzable in the limit.
- Citing the conclusion of the "coupon collector problem" without any derivation.
3. Deductions
- Missing independence (-1): When computing $\mathrm{Var}(T_n)$, failing to mention or implicitly use the independence of the $\tau_i$.
- Parameter logic error (-1): The geometric distribution parameter $p_i$ is set seriously incorrectly (e.g., set as a constant or with the wrong monotonicity in $i$), preventing subsequent series analysis.
- Confusing convergence concepts (-1): Proving "convergence in expectation" instead of "convergence in probability" (e.g., only showing $E[T_n]/(n \ln n) \to 1$ without discussing variance or probability concentration).
- Serious series estimation error (cap at 4/7): Claiming that the harmonic series converges (i.e., concluding $E[T_n] = O(n)$), which invalidates the entire asymptotic analysis.
- Logical gap (-1): In the final step, directly asserting non-convergence from $\mathrm{Var}(T_n) \to \infty$ (which is correct but unnormalized), or failing to divide the variance by $(n \ln n)^2$ for comparison.