Question
Let $X_1, X_2, \ldots$ be independent and uniformly distributed on $\{1, 2, \ldots, n\}$, and define $T_n = \min\{t \ge 1 : \{X_1, \ldots, X_t\} = \{1, 2, \ldots, n\}\}$, the number of draws needed until every element of $\{1, \ldots, n\}$ has appeared.
Prove that $T_n/(n \log n)$ converges in probability to $1$.
(20 points)
Step-by-step solution
Step 1. Let $\tau_i$ denote the number of trials needed to collect the $(i+1)$-th new element after $i$ distinct elements have already been collected, for $i = 0, 1, \ldots, n-1$. At that stage, the probability of obtaining a new element on each trial is $p_i = (n-i)/n$. The random variable $\tau_i$ follows a geometric distribution with probability mass function $P(\tau_i = k) = (1-p_i)^{k-1} p_i$, $k = 1, 2, \ldots$, and $\tau_0, \tau_1, \ldots, \tau_{n-1}$ are mutually independent. The total number of trials can be expressed as $T_n = \sum_{i=0}^{n-1} \tau_i$.
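The stage decomposition can be sanity-checked numerically. Below is a minimal illustrative sketch (not part of the proof; the function names are our own): a direct simulation of $T_n$ and a simulation built from the geometric stage times should have the same mean, namely $nH_n$.

```python
import random

def coupon_collector_direct(n, rng):
    """Draw uniformly from {0, ..., n-1} until all n values have appeared."""
    seen, trials = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        trials += 1
    return trials

def coupon_collector_stages(n, rng):
    """Sum n independent geometric stage times; stage i has success prob (n - i)/n."""
    total = 0
    for i in range(n):
        p = (n - i) / n
        t = 1
        while rng.random() >= p:  # repeat trials until the first success
            t += 1
        total += t
    return total

rng = random.Random(0)
n, reps = 50, 2000
mean_direct = sum(coupon_collector_direct(n, rng) for _ in range(reps)) / reps
mean_stages = sum(coupon_collector_stages(n, rng) for _ in range(reps)) / reps
exact = n * sum(1 / k for k in range(1, n + 1))  # E[T_n] = n * H_n
print(mean_direct, mean_stages, exact)  # both empirical means close to ~225
```

Both estimators agree with the exact mean up to Monte Carlo noise, which supports the identification of $T_n$ with the sum of independent geometric stages.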
Step 2. The expectation of the geometric random variable $\tau_i$ is $E[\tau_i] = 1/p_i = n/(n-i)$, and its variance is $\mathrm{Var}(\tau_i) = (1-p_i)/p_i^2$. By linearity of expectation, the expectation of $T_n$ is: $E[T_n] = \sum_{i=0}^{n-1} \frac{n}{n-i} = n \sum_{k=1}^{n} \frac{1}{k} = n H_n$. Using the asymptotic expansion of the harmonic series $H_n = \log n + \gamma + o(1)$ (where $\gamma$ is the Euler--Mascheroni constant), we obtain $E[T_n] = n \log n + O(n)$, and hence $E[T_n]/(n \log n) \to 1$.
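The identity $E[T_n] = n H_n$ and the $n \log n$ asymptotics can be checked numerically; an illustrative sketch (`expected_T` is our own helper, not from the solution):

```python
import math

def expected_T(n):
    """Exact coupon-collector mean E[T_n] = n * H_n."""
    return n * sum(1 / k for k in range(1, n + 1))

for n in (10**2, 10**4, 10**6):
    ratio = expected_T(n) / (n * math.log(n))
    print(n, ratio)  # ratios decrease toward 1 as n grows
```

The ratio exceeds $1$ for every finite $n$ (the $\gamma n + o(n)$ correction is positive) but tends to $1$, exactly as the $E[T_n] = n\log n + O(n)$ expansion predicts.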
Step 3. By independence, the variance of $T_n$ is: $\mathrm{Var}(T_n) = \sum_{i=0}^{n-1} \mathrm{Var}(\tau_i) = \sum_{i=0}^{n-1} \frac{1-p_i}{p_i^2} \le \sum_{i=0}^{n-1} \frac{n^2}{(n-i)^2} = n^2 \sum_{k=1}^{n} \frac{1}{k^2}$. Since the series $\sum_{k=1}^{\infty} 1/k^2$ converges to $\pi^2/6$, there exists a constant $C$ (e.g. $C = \pi^2/6$) such that $\mathrm{Var}(T_n) \le C n^2$.
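The bound $\mathrm{Var}(T_n) \le (\pi^2/6)\,n^2$ can likewise be verified numerically; a small illustrative sketch (`var_T` is our own helper):

```python
import math

def var_T(n):
    """Exact Var(T_n): sum over stages of (1 - p_i) / p_i^2 with p_i = (n - i)/n."""
    return sum((1 - (n - i) / n) / ((n - i) / n) ** 2 for i in range(n))

C = math.pi ** 2 / 6
for n in (10, 100, 1000):
    print(n, var_T(n) / n**2, C)  # normalized variance stays below pi^2/6
```

The normalized variance $\mathrm{Var}(T_n)/n^2$ increases toward $\pi^2/6$ without ever exceeding it, confirming that $C = \pi^2/6$ works in the bound.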
Step 4. Applying Chebyshev's inequality, for any $\varepsilon > 0$: $P\left( \left| \frac{T_n - E[T_n]}{n \log n} \right| \ge \varepsilon \right) \le \frac{\mathrm{Var}(T_n)}{\varepsilon^2 n^2 \log^2 n}$. Substituting the variance bound: $\le \frac{C}{\varepsilon^2 \log^2 n}$. As $n \to \infty$, this tends to $0$, so $\frac{T_n - E[T_n]}{n \log n} \xrightarrow{P} 0$. Combined with the conclusion from Step 2 that $E[T_n]/(n \log n) \to 1$, we obtain $T_n/(n \log n) \xrightarrow{P} 1$.
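As an empirical sanity check of the concentration (not part of the proof; all names below are our own), one can estimate $P(|T_n/(n \log n) - 1| \ge \varepsilon)$ by Monte Carlo and watch it shrink as $n$ grows, using the stage decomposition from Step 1 for fast sampling:

```python
import math
import random

def sample_T(n, rng):
    """Sample T_n as a sum of independent geometric stage times, p_i = (n - i)/n."""
    total = 0
    for i in range(n):
        p = (n - i) / n
        if p == 1.0:
            total += 1  # the first stage always succeeds immediately
        else:
            u = 1.0 - rng.random()  # uniform on (0, 1]
            # inverse-transform sample of a geometric(p) variable on {1, 2, ...}
            total += max(1, math.ceil(math.log(u) / math.log(1.0 - p)))
    return total

rng = random.Random(1)
eps, reps = 0.3, 400
freqs = []
for n in (50, 500, 5000):
    hits = sum(abs(sample_T(n, rng) / (n * math.log(n)) - 1) >= eps for _ in range(reps))
    freqs.append(hits / reps)
print(freqs)  # decreasing: the deviation probability shrinks as n grows
```

The empirical deviation frequencies decay with $n$, consistent with the $C/(\varepsilon^2 \log^2 n)$ Chebyshev bound (the decay in $n$ is slow because the bound is only logarithmic in $n$).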
Final answer
Therefore $T_n/(n \log n) \xrightarrow{P} 1$. QED.
Marking scheme
The following is the marking rubric based on the official solution:
1. Checkpoints (total: 7 pts)
Score exactly one chain: take the maximum subtotal among chains; do not add points across chains.
Chain A: Geometric Distribution Decomposition and Chebyshev's Inequality (official solution)
- Geometric distribution modeling and decomposition (2 pts) [additive]
- Decompose the total time as a sum of stage-wise waiting times $T_n = \sum_{i=0}^{n-1} \tau_i$ (or using other equivalent notation), and explicitly state that the $\tau_i$ are mutually independent and geometrically distributed. (1 pt)
- Correctly write the parameter: state that the success probability in the $i$-th stage (collecting the $(i+1)$-th distinct element) is $p_i = (n-i)/n$ or an equivalent form. (1 pt)
- Expectation computation and asymptotics (2 pts) [additive]
- Use linearity of expectation to compute $E[T_n] = \sum_{i=0}^{n-1} \frac{n}{n-i} = n H_n$. (1 pt)
- Use the harmonic series property $H_n = \log n + \gamma + o(1)$ to explicitly conclude $E[T_n] \sim n \log n$ (or $E[T_n]/(n \log n) \to 1$). (1 pt)
- Variance order estimate (2 pts) [additive]
- Use independence to write the variance sum expression: $\mathrm{Var}(T_n) = \sum_{i=0}^{n-1} \mathrm{Var}(\tau_i)$ (or $\sum_{i=0}^{n-1} (1-p_i)/p_i^2$). (1 pt)
- Use the convergence of $\sum_{k=1}^{\infty} 1/k^2$ to show that $\mathrm{Var}(T_n)$ is of order at most $n^2$ (i.e., $\mathrm{Var}(T_n) \le C n^2$ or $\mathrm{Var}(T_n) = O(n^2)$). (1 pt)
- *(Note: If the student does not perform the bounding step but correctly computes the exact expression containing $n^2 \sum_{k=1}^{n} 1/k^2$ and identifies its leading term as $O(n^2)$, full credit is also awarded.)*
- Convergence in probability proof (1 pt) [additive]
- Apply Chebyshev's inequality, substitute the above variance bound, and prove $P\big(|T_n/(n \log n) - 1| \ge \varepsilon\big) \to 0$. (1 pt)
- *(Note: If the student has proved the centered quantity $(T_n - E[T_n])/(n \log n)$ converges in probability to $0$, and in the expectation step has shown the mean ratio $E[T_n]/(n \log n)$ converges to $1$, this point is awarded directly; the conclusion must be based on the preceding order estimates.)*
Total (max 7)
2. Zero-credit items
- Merely copying the problem statement, definitions, or formulas without any concrete substitution or derivation specific to this problem.
- Asserting $T_n/(n \log n) \to 1$ based on intuition alone, without the geometric distribution decomposition or series summation as justification.
- Listing the Chebyshev inequality formula but not computing $E[T_n]$ or $\mathrm{Var}(T_n)$, rendering the right-hand side of the inequality unanalyzable.
- Merely citing the "coupon collector problem" result without any derivation.
3. Deductions
- Independence omission (-1): When computing $\mathrm{Var}(T_n)$, failing to mention or implicitly use the independence of the $\tau_i$.
- Parameter logic error (-1): Setting the geometric distribution parameter to be seriously incorrect (e.g., constant across stages, or with the wrong monotonicity in $i$), preventing subsequent series analysis.
- Confusing convergence concepts (-1): Proving "convergence of expectations" instead of "convergence in probability" (e.g., only showing $E[T_n]/(n \log n) \to 1$ without discussing variance or probability concentration).
- Serious series estimation error (cap at 4/7): Claiming the harmonic series converges (yielding $E[T_n] = O(n)$), causing the entire asymptotic analysis to be incorrect.
- Logical gap (-1): In the final step, directly asserting non-convergence from $\mathrm{Var}(T_n) \to \infty$ (which is correct but unnormalized), or failing to divide the variance by the denominator $\varepsilon^2 n^2 \log^2 n$ for comparison.