Chapter 129

Championship Droughts and Secret Santas

The Riddler for December 2, 2016, the month after the Cubs broke their 108-year drought. The Express is a coupon-collector cousin; the Classic is the cycle-following trick in holiday dress.

Riddler Express

In a league of 30 teams, a champion is chosen uniformly at random each year (every team starts with a zero-year drought). Whenever the team that has gone longest without a title finally wins, that record drought makes headlines. On average, how long are these headline-making droughts?

The Riddler, FiveThirtyEight, December 2, 2016(original post)

Solution

Picture the 30 teams listed by how recently they last won, most recent first. The first team on the list won last year. The second-most-recent champion is found by waiting for a title to go to a different team: each year there is a $\tfrac{29}{30}$ chance the winner differs from the most recent one, so the gap back to the second team averages $\tfrac{30}{29}$ years. Stepping down the list, the gap from the team in position $k$ to the one in position $k+1$ averages $\tfrac{30}{30-k}$ years, since $30-k$ of the teams are “new” at that stage. Adding the gaps, the team holding the $k$ th-longest drought has an expected drought of $30\bigl(\tfrac1k + \tfrac{1}{k+1} + \dots + \tfrac{1}{30}\bigr)$ .

A win only makes headlines when it ends the single longest drought, the $k=1$ case, so the expected headline drought is the full harmonic sum $30\left(1 + \tfrac12 + \tfrac13 + \dots + \tfrac{1}{30}\right) = 30\,H_{30} \approx \boxed{119.85 \text{ years}}.$ So a 108-year drought is, if anything, slightly short of what this model expects a record drought to run. This is the coupon-collector’s problem wearing a baseball cap: $30\,H_{30}$ is also the expected time to see all 30 teams win once.

The computation

Encode it directly: evaluate the harmonic sum, and independently simulate a million seasons, recording the drought length every time a win sets a new record.

import random
H30 = sum(1 / k for k in range(1, 31))
print("30 * H_30 =", round(30 * H30, 2))

def simulate(years, seed=0):
    rng = random.Random(seed); last_win = [0] * 30; headlines = []
    for y in range(1, years + 1):
        champ = rng.randrange(30)
        droughts = [y - last_win[t] for t in range(30)]
        if droughts[champ] == max(droughts) and droughts.count(max(droughts)) == 1:
            headlines.append(droughts[champ])      # ended the single longest drought
        last_win[champ] = y
    return sum(headlines) / len(headlines)
print("simulated:", round(simulate(1_000_000), 2))
# 30 * H_30 = 119.85
# simulated: 120.09

Riddler Classic

The 41 FiveThirtyEight staffers run a Secret Santa: each is randomly assigned a different colleague to give to (nobody gives to themselves). To find out who gave to them, each person may ask up to 20 colleagues the single question “Who were you Secret Santa for?” Playing optimally, what is the probability everyone discovers their own Secret Santa?

The Riddler, FiveThirtyEight, December 2, 2016(original post)

Solution

A Secret Santa assignment is a permutation of the 41 staffers with no fixed point, a derangement, and like any permutation it breaks into cycles: A gives to B, B to C, …, eventually back to A. The one question you are allowed reveals a single arrow of that structure, so the strategy writes itself: ask the person you gave to who they gave to, then ask that person, and follow the chain. You are walking around your own cycle, and the person who finally names you is your Secret Santa.

How far must you walk? In a cycle of length $L$ , going round from the person you gave to until someone names you takes $L-1$ questions. So you succeed within your 20-question budget exactly when your cycle has length $L \le 21$ , and everyone succeeds exactly when the derangement has no cycle longer than 21. With 41 people two long cycles cannot coexist (they would need at least 44 people), so the only failure is a single cycle of length 22 or more, and $P(\text{all succeed}) = 1 - \sum_{L=22}^{41} \frac{\binom{41}{L}\,(L-1)!\,D_{41-L}}{D_{41}} \approx \boxed{0.318},$ where $D_m$ is the number of derangements of $m$ items (and $D_0 = 1$ ). About a $31.8\%$ chance, far better than the doomed $1$ -in- $2$ -trillion of asking at random, because following the chain converts the whole office’s fate into one clean question about the longest cycle. It is the hundred-prisoners locker puzzle in a Santa hat.

The computation

Encode both the counting and the experiment. The exact line sums the derangement counts that contain a too-long cycle; the simulation draws random derangements and checks that the longest cycle fits in the budget.

from math import comb, factorial
import random
def derangements(n):
    d = [1, 0]
    for k in range(2, n + 1): d.append((k - 1) * (d[-1] + d[-2]))
    return d
D = derangements(41); n = 41
p_fail = sum(comb(n, L) * factorial(L - 1) * D[n - L] / D[n] for L in range(22, n + 1))
print("exact P(all succeed):", round(1 - p_fail, 4))

def random_derangement(n, rng):
    while True:
        p = list(range(n)); rng.shuffle(p)
        if all(p[i] != i for i in range(n)): return p
def longest_cycle(p):
    seen = [False] * len(p); best = 0
    for i in range(len(p)):
        if not seen[i]:
            L = 0; j = i
            while not seen[j]: seen[j] = True; j = p[j]; L += 1
            best = max(best, L)
    return best
rng = random.Random(1); ok = sum(longest_cycle(random_derangement(41, rng)) <= 21
                                 for _ in range(200_000))
print("simulated:", round(ok / 200_000, 4))
# exact P(all succeed): 0.3183
# simulated: 0.3187