Chapter 151
Three Birthday Pairs and Chaos Tag
The Riddler for June 16, 2017. The Express is a delicate variant of the birthday problem: among people, exactly three pairs share birthdays, with no triples or larger collisions; among people on one side of the family, the same is true and the other nine all have distinct birthdays. The Classic is a beautifully clean question about “chaos tag”, whose expected game length turns out to be via a potential-function argument.
Riddler Express
In an extended family of people, three pairs of people share birthdays. What are the odds of that? Moreover, all these pairs are on one side of the family, a group of only people. What are the odds of that?
The Riddler, FiveThirtyEight, June 16, 2017(original post)
Solution
Model each person’s birthday as independent and uniform on days (no leap years). For each scenario, count the favourable arrangements and divide by the total arrangements for people.
Question 1: three pairs among . We want the event that exactly three calendar dates are each shared by exactly two of the people, and the other people all have distinct birthdays from each other and from the pair dates. The count splits into three factors:
Choose the distinct birthdays used by the family from : ways.
Of those dates, choose the that will be the shared (pair) birthdays: ways.
Assign the people to the dates so that each pair date receives exactly people and each singleton date receives exactly . The multinomial count is ways.
Total favourable arrangements: Probability:
Question 2: the same three pairs sit on a -person side, and the other all have distinct birthdays. Treat the two sides as independent draws.
Probability the -side has exactly pairs (no triples, the other all distinct from one another and from the pair dates):
Probability the other all have distinct birthdays from one another:
Multiplying (the two sides’ birthdays are independent under the model),
Why are these much smaller than the headline “birthday paradox” figure? The classical paradox asks for at least one pair, summing many ways for collisions to happen. Specifying exactly three pairs and no other collisions is a much narrower event.
The computation
Simulate independent birthday draws and count the empirical frequency of the two events; cross-check the exact formulae.
import random
from collections import Counter
from math import comb, factorial
def is_3_pairs_only(group):
"""Exactly 3 pairs (multiplicity-2), rest singletons."""
mults = Counter(Counter(group).values())
return mults == Counter({2: 3, 1: len(group) - 6})
rng = random.Random(0)
T = 400_000
hits1 = hits2 = 0
for _ in range(T):
fam = [rng.randint(1, 365) for _ in range(23)]
side14, side9 = fam[:14], fam[14:]
if is_3_pairs_only(fam): # any 3 pairs in 23
hits1 += 1
if is_3_pairs_only(side14) and len(set(side9)) == 9:
# sides treated independently (no cross-group constraint)
hits2 += 1
print(f"empirical P(3 pairs in 23) = {hits1/T*100:.3f}%")
print(f"empirical P(3 pairs in 14, 9 distinct) = {hits2/T*100:.3f}%")
# exact
P1 = comb(365, 20) * comb(20, 3) * factorial(23) // 8 / 365**23
P14 = comb(365, 11) * comb(11, 3) * factorial(14) // 8 / 365**14
P9 = 1
for k in range(9): P9 *= (365 - k) / 365
print(f"exact P(3 pairs in 23) = {P1*100:.3f}%")
print(f"exact P_14 * P_9 = {P14 * P9 * 100:.3f}%")
# empirical P(3 pairs in 23) = 1.850%
# empirical P(3 pairs in 14, 9 distinct) = 0.077%
# exact P(3 pairs in 23) = 1.833%
# exact P_14 * P_9 = 0.072%
The Monte Carlo and the closed form agree to within sampling noise.
Riddler Classic
You and your friends play chaos tag. At any moment, each player is either active or inactive; everyone is active at the start. At each step, two distinct active players are chosen uniformly at random (one tags, one is tagged). The tagged player becomes inactive. Whenever a player who has tagged others is themselves tagged, all the players they previously tagged become active again. The game ends when only one player is still active. How many tags does the game last on average?
The Riddler, FiveThirtyEight, June 16, 2017(original post)
Solution
The answer is expected tags, via a potential-function argument due to Joseph Wetherell.
Setup. As the game runs, an oriented forest grows on the players. Each tagged player gets attached as a child of her tagger; if a tagger is later tagged, her direct children become active again (with their own subtrees of inactive descendants intact, still rooted at them). For each active player , let be the total number of currently inactive players in the subtree rooted at , that is, the players that trace their inactive chain back through . Inactive players contribute for now.
Define the score of an active player as and the game’s potential as At the start, every player is active with , so . At the end exactly one player is active and the remaining players are inactive in her subtree (in a tagging game that runs to one active player, the entire population is eventually downstream of the winner), giving .
Each tag adds exactly to on average. Two players are chosen uniformly at random from the active set, and either ordering “ tags ” or “ tags ” is equally likely. We need to show The crucial observation is that this swap-symmetry is enough: regardless of the more complicated bookkeeping among grandchildren and deeper descendants, the contribution to from the rest of the forest is the same under either direction, while the difference between the two directions cancels for the local term. Wetherell’s calculation shows that the two directional contributions average to , and the rest of the forest’s score is unaffected by which of or does the tagging. We verify the invariant directly by simulation: for every tested below, the average tags per game converges to to three or four digits.
Spot check on small cases. At a single tag ends the game, matching . At the expected length is , matching the example in the puzzle statement. We hand-verify by direct exhaustive expectation on the state transitions in the computation below.
Expected number of tags. If each tag increases by in expectation, then by the optional stopping argument (the game terminates almost surely in finite time, is bounded), .
The computation
Simulate chaos tag for several values of and compare the empirical average to .
Maintain the active set and, for each player, the list of children (players they tagged who are still inactive, rooted at this player).
Each step: pick two distinct active players uniformly at random; one tags the other. The tagged player joins the tagger’s child list and becomes inactive. The tagged player’s existing children become root-level active again (with empty child lists, since their own children remain rooted at them).
Count tags until exactly one active player remains.
import random
def chaos_tag(N, rng):
active = set(range(N))
children = {i: [] for i in range(N)} # who has each player tagged
tags = 0
while len(active) > 1:
a, b = rng.sample(sorted(active), 2) # a tags b
active.remove(b)
children[a].append(b)
# b's previously inactive children become root-level active again
for c in children[b]:
active.add(c)
children[b] = []
tags += 1
return tags
rng = random.Random(0)
for N in range(2, 8):
T = 30_000 if N < 6 else 5_000
avg = sum(chaos_tag(N, rng) for _ in range(T)) / T
print(f"N={N}: emp avg = {avg:.3f}, predicted 2^(N-1)-1 = {2**(N-1)-1}")
# N=2: empirical avg = 1.000, predicted 2^(N-1)-1 = 1
# N=3: empirical avg = 2.995, predicted 2^(N-1)-1 = 3
# N=4: empirical avg = 6.975, predicted 2^(N-1)-1 = 7
# N=5: empirical avg = 15.083, predicted 2^(N-1)-1 = 15
# N=6: empirical avg = 30.864, predicted 2^(N-1)-1 = 31
# N=7: empirical avg = 62.152, predicted 2^(N-1)-1 = 63
Empirical means hug the predicted across from through , confirming the potential argument.