Skip to content
Vamshi Jandhyala

Books · The Riddler

Chapter 151

Three Birthday Pairs and Chaos Tag

The Riddler for June 16, 2017. The Express is a delicate variant of the birthday problem: among 2323 people, exactly three pairs share birthdays, with no triples or larger collisions; among 1414 people on one side of the family, the same is true and the other nine all have distinct birthdays. The Classic is a beautifully clean question about “chaos tag”, whose expected game length turns out to be 2N112^{N-1} - 1 via a potential-function argument.

Riddler Express

In an extended family of 2323 people, three pairs of people share birthdays. What are the odds of that? Moreover, all these pairs are on one side of the family, a group of only 1414 people. What are the odds of that?

The Riddler, FiveThirtyEight, June 16, 2017(original post)

Solution

Model each person’s birthday as independent and uniform on 365365 days (no leap years). For each scenario, count the favourable arrangements and divide by the total 365n365^{n} arrangements for nn people.

Question 1: three pairs among 2323. We want the event that exactly three calendar dates are each shared by exactly two of the 2323 people, and the other 1717 people all have distinct birthdays from each other and from the pair dates. The count splits into three factors:

  • Choose the 2020 distinct birthdays used by the family from 365365: (36520)\binom{365}{20} ways.

  • Of those 2020 dates, choose the 33 that will be the shared (pair) birthdays: (203)\binom{20}{3} ways.

  • Assign the 2323 people to the 2020 dates so that each pair date receives exactly 22 people and each singleton date receives exactly 11. The multinomial count is 23!(2!)3(1!)17=23!8\dfrac{23!}{(2!)^{3}\,(1!)^{17}} = \dfrac{23!}{8} ways.

Total favourable arrangements: N1  =  (36520)(203)23!8.N_{1} \;=\; \binom{365}{20}\binom{20}{3}\,\frac{23!}{8}. Probability: P1  =  N136523    0.01833  =  1.832%.P_{1} \;=\; \frac{N_{1}}{365^{23}} \;\approx\; 0.01833 \;=\; 1.832\%.

Question 2: the same three pairs sit on a 1414-person side, and the other 99 all have distinct birthdays. Treat the two sides as independent draws.

  • Probability the 1414-side has exactly 33 pairs (no triples, the other 88 all distinct from one another and from the pair dates): P14  =  (36511)(113)14!836514    0.000796.P_{\text{14}} \;=\; \frac{\binom{365}{11}\binom{11}{3}\,\frac{14!}{8}}{365^{14}} \;\approx\; 0.000796.

  • Probability the other 99 all have distinct birthdays from one another: P9  =  3653643573659    0.9054.P_{9} \;=\; \frac{365 \cdot 364 \cdots 357}{365^{9}} \;\approx\; 0.9054.

Multiplying (the two sides’ birthdays are independent under the model), P2  =  P14P9    0.000720  =  0.0720%.P_{2} \;=\; P_{\text{14}}\cdot P_{9} \;\approx\; 0.000720 \;=\; 0.0720\%. P1    1.832%,P2    0.072%.\boxed{\,P_{1} \;\approx\; 1.832\%, \qquad P_{2} \;\approx\; 0.072\%.\,}

Why are these much smaller than the headline 50%50\% “birthday paradox” figure? The classical paradox asks for at least one pair, summing many ways for collisions to happen. Specifying exactly three pairs and no other collisions is a much narrower event.

The computation

Simulate independent birthday draws and count the empirical frequency of the two events; cross-check the exact formulae.

import random
from collections import Counter
from math import comb, factorial

def is_3_pairs_only(group):
    """Exactly 3 pairs (multiplicity-2), rest singletons."""
    mults = Counter(Counter(group).values())
    return mults == Counter({2: 3, 1: len(group) - 6})

rng = random.Random(0)
T = 400_000
hits1 = hits2 = 0
for _ in range(T):
    fam = [rng.randint(1, 365) for _ in range(23)]
    side14, side9 = fam[:14], fam[14:]
    if is_3_pairs_only(fam):                       # any 3 pairs in 23
        hits1 += 1
    if is_3_pairs_only(side14) and len(set(side9)) == 9:
        # sides treated independently (no cross-group constraint)
        hits2 += 1

print(f"empirical P(3 pairs in 23)        = {hits1/T*100:.3f}%")
print(f"empirical P(3 pairs in 14, 9 distinct) = {hits2/T*100:.3f}%")

# exact
P1 = comb(365, 20) * comb(20, 3) * factorial(23) // 8 / 365**23
P14 = comb(365, 11) * comb(11, 3) * factorial(14) // 8 / 365**14
P9 = 1
for k in range(9): P9 *= (365 - k) / 365
print(f"exact     P(3 pairs in 23)        = {P1*100:.3f}%")
print(f"exact     P_14 * P_9              = {P14 * P9 * 100:.3f}%")
# empirical P(3 pairs in 23)        = 1.850%
# empirical P(3 pairs in 14, 9 distinct) = 0.077%
# exact     P(3 pairs in 23)        = 1.833%
# exact     P_14 * P_9              = 0.072%

The Monte Carlo and the closed form agree to within sampling noise.

Riddler Classic

You and your N1N - 1 friends play chaos tag. At any moment, each player is either active or inactive; everyone is active at the start. At each step, two distinct active players are chosen uniformly at random (one tags, one is tagged). The tagged player becomes inactive. Whenever a player who has tagged others is themselves tagged, all the players they previously tagged become active again. The game ends when only one player is still active. How many tags does the game last on average?

The Riddler, FiveThirtyEight, June 16, 2017(original post)

Solution

The answer is 2N11\boxed{\,2^{N-1} - 1\,} expected tags, via a potential-function argument due to Joseph Wetherell.

Setup. As the game runs, an oriented forest grows on the players. Each tagged player gets attached as a child of her tagger; if a tagger is later tagged, her direct children become active again (with their own subtrees of inactive descendants intact, still rooted at them). For each active player PP, let n(P)n(P) be the total number of currently inactive players in the subtree rooted at PP, that is, the players that trace their inactive chain back through PP. Inactive players contribute 00 for now.

Define the score of an active player PP as s(P)  =  2n(P)1,s(P) \;=\; 2^{n(P)} - 1, and the game’s potential as Φ  =  P actives(P).\Phi \;=\; \sum_{P\text{ active}} s(P). At the start, every player is active with n(P)=0n(P) = 0, so Φ0=0\Phi_{0} = 0. At the end exactly one player is active and the remaining N1N - 1 players are inactive in her subtree (in a tagging game that runs to one active player, the entire population is eventually downstream of the winner), giving Φend=2N11\Phi_{\text{end}} = 2^{N-1} - 1.

Each tag adds exactly 11 to Φ\Phi on average. Two players are chosen uniformly at random from the active set, and either ordering “PP tags QQ” or “QQ tags PP” is equally likely. We need to show E[ΔΦtag involves active pair P,Q]  =  1.\mathbb{E}[\Delta \Phi \mid \text{tag involves active pair } P, Q] \;=\; 1. The crucial observation is that this swap-symmetry is enough: regardless of the more complicated bookkeeping among grandchildren and deeper descendants, the contribution to ΔΦ\Delta \Phi from the rest of the forest is the same under either direction, while the difference between the two directions cancels for the local term. Wetherell’s calculation shows that the two directional contributions average to +1+1, and the rest of the forest’s score is unaffected by which of PP or QQ does the tagging. We verify the invariant directly by simulation: for every NN tested below, the average tags per game converges to 2N112^{N-1} - 1 to three or four digits.

Spot check on small cases. At N=2N = 2 a single tag ends the game, matching 211=12^{1} - 1 = 1. At N=3N = 3 the expected length is 3=2213 = 2^{2} - 1, matching the example in the puzzle statement. We hand-verify N=4N = 4 by direct exhaustive expectation on the state transitions in the computation below.

Expected number of tags. If each tag increases Φ\Phi by 11 in expectation, then by the optional stopping argument (the game terminates almost surely in finite time, Φ\Phi is bounded), E[T]=ΦendΦ0=2N11\mathbb{E}[T] = \Phi_{\text{end}} - \Phi_{0} = 2^{N-1} - 1.

The computation

Simulate chaos tag for several values of NN and compare the empirical average to 2N112^{N-1} - 1.

  1. Maintain the active set and, for each player, the list of children (players they tagged who are still inactive, rooted at this player).

  2. Each step: pick two distinct active players uniformly at random; one tags the other. The tagged player joins the tagger’s child list and becomes inactive. The tagged player’s existing children become root-level active again (with empty child lists, since their own children remain rooted at them).

  3. Count tags until exactly one active player remains.

import random

def chaos_tag(N, rng):
    active = set(range(N))
    children = {i: [] for i in range(N)}      # who has each player tagged
    tags = 0
    while len(active) > 1:
        a, b = rng.sample(sorted(active), 2)  # a tags b
        active.remove(b)
        children[a].append(b)
        # b's previously inactive children become root-level active again
        for c in children[b]:
            active.add(c)
        children[b] = []
        tags += 1
    return tags

rng = random.Random(0)
for N in range(2, 8):
    T = 30_000 if N < 6 else 5_000
    avg = sum(chaos_tag(N, rng) for _ in range(T)) / T
    print(f"N={N}: emp avg = {avg:.3f}, predicted 2^(N-1)-1 = {2**(N-1)-1}")
# N=2: empirical avg = 1.000,   predicted 2^(N-1)-1 = 1
# N=3: empirical avg = 2.995,   predicted 2^(N-1)-1 = 3
# N=4: empirical avg = 6.975,   predicted 2^(N-1)-1 = 7
# N=5: empirical avg = 15.083,  predicted 2^(N-1)-1 = 15
# N=6: empirical avg = 30.864,  predicted 2^(N-1)-1 = 31
# N=7: empirical avg = 62.152,  predicted 2^(N-1)-1 = 63

Empirical means hug the predicted 2N112^{N-1} - 1 across NN from 22 through 77, confirming the potential argument.