Chapter 188
What Are The Odds World Cup Teams Play Each Other Twice?
Riddler Express
The -team World Cup splits into eight groups of four. The top two from each group advance to a single-elimination knockout bracket: round of , quarter-finals, semi-finals, final, and a third-place game between the two semi-final losers. Assume all teams are equally strong, so every match is a coin flip. What is the chance that some pair of teams meets twice during the tournament, that is, in the group stage and then again in the final or the third-place game?
The Riddler, FiveThirtyEight, June 29, 2018(original post)
Solution
Two teams from the same group meet again only if both reach the final, or both reach the third-place game. Pick one group, label its two advancing teams and . The bracket sends to one half of the knockout draw and to the other; symmetry across the eight groups means every group is in the same situation.
In each half there are eight teams, identical by assumption, and three rounds of knockout reduce them to one finalist. By symmetry, each of the eight has probability of being the finalist from its half. The two halves are independent, so The third-place game pairs the two semi-final losers. For to be one of them it must reach the semi-final ( of teams do, so probability ) and lose it (probability ); same for on the other side. Hence A team cannot do both, so these events are disjoint, and the probability that group ’s pair meets twice is .
Across the eight groups the meet-in-final events are mutually exclusive (only two finalists), and so are the meet-in-third-place events (only two SF losers). So These two events can overlap, when a different group accounts for each. To get the overlap, fix a group whose pair meets in the final ( and are the two finalists) and ask for the probability that some other group’s pair meets in the third-place game. Conditional on and being the finalists, the third-place game pits two of the remaining six SF participants, one from each half; the side came from contributes one SF loser at random from three candidates, same for ’s side, and the chance these two SF losers are from the same group is . A cleaner way: by the inclusion-exclusion argument given in the official write-up, So
Extra credit. Given FiveThirtyEight’s pre-tournament Elo-style projections (in which teams differ in strength) the probability rises slightly, to roughly . Strong teams are more likely than to reach the final from their half, which raises the chance that two pre-favourites from the same group both go deep. We do not have the exact pre-tournament Elo vector, so this number is reproduced as reported in the official write-up.
The computation
Encode the actual FIFA-2018 bracket pairings (round of matchups are dictated by group letters), simulate the entire tournament with coin-flip outcomes, and measure how often some pair of teams meets twice.
Each group yields two advancing teams labelled and .
FIFA-2018 round-of- pairings, half : Half :
Quarter-finals pair winners with and winners with within each half; semi-finals merge them; the finalist is one team per half.
Each match is a fair coin flip. Track whether the two finalists or the two SF losers share a group letter.
import random
def play(half):
"""half = list of four (a,b) Round-of-16 pairings."""
r16 = [random.choice(p) for p in half]
qf = [random.choice([r16[0], r16[1]]),
random.choice([r16[2], r16[3]])]
finalist = random.choice(qf)
sf_loser = qf[0] if finalist == qf[1] else qf[1]
return finalist, sf_loser
half1 = [((0,1),(1,2)), ((2,1),(3,2)), ((4,1),(5,2)), ((6,1),(7,2))]
half2 = [((1,1),(0,2)), ((3,1),(2,2)), ((5,1),(4,2)), ((7,1),(6,2))]
random.seed(0)
N = 2_000_000
hits = 0
for _ in range(N):
f1, s1 = play(half1)
f2, s2 = play(half2)
if f1[0] == f2[0] or s1[0] == s2[0]:
hits += 1
print(f"MC estimate: {hits/N:.5f} exact 7/32 = {7/32:.5f}")
The Monte Carlo prints , agreeing with to four decimal places.
Riddler Classic
A gate’s keypad has buttons (the digits through ) and accepts a -digit passcode. The keypad never resets: the gate opens as soon as the last four buttons pressed match the code. Pressing tries both and . What is the smallest number of presses that guarantees you try every possible -digit code? How does the answer change if you do not know the length?
The Riddler, FiveThirtyEight, June 29, 2018(original post)
Solution
There are possible codes. Every code you try costs one button press except for the very first code, which costs four. After that, each new press tacks one more length- window onto the sequence, so a sequence of length contains overlapping windows. To cover all codes we need , that is . So is a lower bound.
We achieve the bound with a de Bruijn sequence. A de Bruijn sequence on a -letter alphabet is a cyclic string of length in which every length- word appears exactly once as a window. For , this is a cyclic string of length . Open the cycle by repeating its first three symbols at the end, and you have a linear string of length that contains every -digit code exactly once as a window. Hence
Why does such a cyclic sequence always exist? Build the directed graph whose vertices are -digit strings and whose edges are -digit strings: an edge labelled runs from to . Each vertex has outgoing edges (one per choice of last digit) and incoming edges (one per choice of first digit), so the graph is Eulerian. An Eulerian circuit traverses every edge exactly once: reading off the last digit of each edge gives a cyclic sequence of length in which every length- window is a distinct edge. The construction generalises to any and .
Extra credit. If the passcode length is unknown but bounded by , the same de Bruijn argument gives presses: a de Bruijn sequence contains every code of length , and overlapping windows of lengths are subsumed because, for instance, any pair of consecutive digits within a long sequence is automatically tried. So
The computation
Build a de Bruijn sequence explicitly with a Lyndon-words construction, append the first three digits, and verify that the resulting length- string contains all codes as length- windows. This is an exhaustive check of the construction.
def de_bruijn(k, n):
a = [0] * (k * n)
seq = []
def db(t, p):
if t > n:
if n % p == 0:
seq.extend(a[1:p+1])
else:
a[t] = a[t-p]
db(t + 1, p)
for j in range(a[t-p] + 1, k):
a[t] = j
db(t + 1, t)
db(1, 1)
return seq
cyc = de_bruijn(10, 4)
linear = cyc + cyc[:3]
print("linear length:", len(linear))
windows = {tuple(linear[i:i+4]) for i in range(len(linear) - 3)}
print("distinct 4-windows:", len(windows))
The script prints linear length: 10003 and distinct 4-windows: 10000, confirming both that presses suffice and that the bound is met by an explicit sequence.