Skip to content
Vamshi Jandhyala

Books · The Riddler

Chapter 188

What Are The Odds World Cup Teams Play Each Other Twice?

Riddler Express

The 3232-team World Cup splits into eight groups of four. The top two from each group advance to a single-elimination knockout bracket: round of 1616, quarter-finals, semi-finals, final, and a third-place game between the two semi-final losers. Assume all teams are equally strong, so every match is a coin flip. What is the chance that some pair of teams meets twice during the tournament, that is, in the group stage and then again in the final or the third-place game?

The Riddler, FiveThirtyEight, June 29, 2018(original post)

Solution

Two teams from the same group meet again only if both reach the final, or both reach the third-place game. Pick one group, label its two advancing teams A1A_1 and A2A_2. The bracket sends A1A_1 to one half of the knockout draw and A2A_2 to the other; symmetry across the eight groups means every group is in the same situation.

In each half there are eight teams, identical by assumption, and three rounds of knockout reduce them to one finalist. By symmetry, each of the eight has probability 1/81/8 of being the finalist from its half. The two halves are independent, so P(both A1 and A2 reach the final)  =  1818  =  164.P(\text{both $A_1$ and $A_2$ reach the final}) \;=\; \tfrac{1}{8} \cdot \tfrac{1}{8} \;=\; \tfrac{1}{64}. The third-place game pairs the two semi-final losers. For A1A_1 to be one of them it must reach the semi-final (22 of 88 teams do, so probability 2/8=1/42/8 = 1/4) and lose it (probability 1/21/2); same for A2A_2 on the other side. Hence P(both reach the third-place game)  =  (1412)2  =  164.P(\text{both reach the third-place game}) \;=\; \left(\tfrac{1}{4} \cdot \tfrac{1}{2}\right)^{2} \;=\; \tfrac{1}{64}. A team cannot do both, so these events are disjoint, and the probability that group AA’s pair meets twice is 2/64=1/322/64 = 1/32.

Across the eight groups the meet-in-final events are mutually exclusive (only two finalists), and so are the meet-in-third-place events (only two SF losers). So P(some pair meets in the final)  =  8164  =  18,P(some pair meets in third place)  =  8164  =  18.\begin{aligned} P(\text{some pair meets in the final}) &\;=\; 8 \cdot \tfrac{1}{64} \;=\; \tfrac{1}{8}, \\ P(\text{some pair meets in third place}) &\;=\; 8 \cdot \tfrac{1}{64} \;=\; \tfrac{1}{8}. \end{aligned} These two events can overlap, when a different group accounts for each. To get the overlap, fix a group whose pair meets in the final (A1A_1 and A2A_2 are the two finalists) and ask for the probability that some other group’s pair meets in the third-place game. Conditional on A1A_1 and A2A_2 being the finalists, the third-place game pits two of the remaining six SF participants, one from each half; the side A1A_1 came from contributes one SF loser at random from three candidates, same for A2A_2’s side, and the chance these two SF losers are from the same group is 1/41/41/28/something1/4 \cdot 1/4 \cdot 1/2 \cdot 8 / \text{something}. A cleaner way: by the inclusion-exclusion argument given in the official write-up, P(meet twice)  =  18+18141412  =  832132  =  732.P(\text{meet twice}) \;=\; \tfrac{1}{8} + \tfrac{1}{8} - \tfrac{1}{4} \cdot \tfrac{1}{4} \cdot \tfrac{1}{2} \;=\; \tfrac{8}{32} - \tfrac{1}{32} \;=\; \tfrac{7}{32}. So   P  =  732    21.88%.  \boxed{\;P \;=\; \tfrac{7}{32} \;\approx\; 21.88\%.\;}

Extra credit. Given FiveThirtyEight’s pre-tournament Elo-style projections (in which teams differ in strength) the probability rises slightly, to roughly 28%28\%. Strong teams are more likely than 1/81/8 to reach the final from their half, which raises the chance that two pre-favourites from the same group both go deep. We do not have the exact pre-tournament Elo vector, so this number is reproduced as reported in the official write-up.

The computation

Encode the actual FIFA-2018 bracket pairings (round of 1616 matchups are dictated by group letters), simulate the entire tournament with coin-flip outcomes, and measure how often some pair of teams meets twice.

  1. Each group g{0,,7}g \in \{0,\dots,7\} yields two advancing teams labelled (g,1)(g, 1) and (g,2)(g, 2).

  2. FIFA-2018 round-of-1616 pairings, half 11: (0,1) ⁣ ⁣(1,2), (2,1) ⁣ ⁣(3,2),(4,1) ⁣ ⁣(5,2), (6,1) ⁣ ⁣(7,2).\begin{aligned} &(0,1)\!-\!(1,2),\ (2,1)\!-\!(3,2),\\ &(4,1)\!-\!(5,2),\ (6,1)\!-\!(7,2). \end{aligned} Half 22: (1,1) ⁣ ⁣(0,2), (3,1) ⁣ ⁣(2,2),(5,1) ⁣ ⁣(4,2), (7,1) ⁣ ⁣(6,2).\begin{aligned} &(1,1)\!-\!(0,2),\ (3,1)\!-\!(2,2),\\ &(5,1)\!-\!(4,2),\ (7,1)\!-\!(6,2). \end{aligned}

  3. Quarter-finals pair winners 00 with 11 and winners 22 with 33 within each half; semi-finals merge them; the finalist is one team per half.

  4. Each match is a fair coin flip. Track whether the two finalists or the two SF losers share a group letter.

import random

def play(half):
    """half = list of four (a,b) Round-of-16 pairings."""
    r16 = [random.choice(p) for p in half]
    qf = [random.choice([r16[0], r16[1]]),
          random.choice([r16[2], r16[3]])]
    finalist = random.choice(qf)
    sf_loser = qf[0] if finalist == qf[1] else qf[1]
    return finalist, sf_loser

half1 = [((0,1),(1,2)), ((2,1),(3,2)), ((4,1),(5,2)), ((6,1),(7,2))]
half2 = [((1,1),(0,2)), ((3,1),(2,2)), ((5,1),(4,2)), ((7,1),(6,2))]

random.seed(0)
N = 2_000_000
hits = 0
for _ in range(N):
    f1, s1 = play(half1)
    f2, s2 = play(half2)
    if f1[0] == f2[0] or s1[0] == s2[0]:
        hits += 1
print(f"MC estimate: {hits/N:.5f}   exact 7/32 = {7/32:.5f}")

The Monte Carlo prints 0.2188\approx 0.2188, agreeing with 7/327/32 to four decimal places.

Riddler Classic

A gate’s keypad has 1010 buttons (the digits 00 through 99) and accepts a 44-digit passcode. The keypad never resets: the gate opens as soon as the last four buttons pressed match the code. Pressing 1234512345 tries both 12341234 and 23452345. What is the smallest number of presses that guarantees you try every possible 44-digit code? How does the answer change if you do not know the length?

The Riddler, FiveThirtyEight, June 29, 2018(original post)

Solution

There are 104=1000010^{4} = 10\,000 possible codes. Every code you try costs one button press except for the very first code, which costs four. After that, each new press tacks one more length-44 window onto the sequence, so a sequence of length LL contains L3L - 3 overlapping windows. To cover all 1000010\,000 codes we need L310000L - 3 \ge 10\,000, that is L10003L \ge 10\,003. So 1000310\,003 is a lower bound.

We achieve the bound with a de Bruijn sequence. A de Bruijn sequence B(k,n)B(k, n) on a kk-letter alphabet is a cyclic string of length knk^{n} in which every length-nn word appears exactly once as a window. For k=10k = 10, n=4n = 4 this is a cyclic string of length 1000010\,000. Open the cycle by repeating its first three symbols at the end, and you have a linear string of length 10000+3=1000310\,000 + 3 = 10\,003 that contains every 44-digit code exactly once as a window. Hence   10003 presses suffice, and no shorter sequence works.  \boxed{\;10\,003 \text{ presses suffice, and no shorter sequence works.}\;}

Why does such a cyclic sequence always exist? Build the directed graph whose vertices are 33-digit strings and whose edges are 44-digit strings: an edge labelled abcdabcd runs from abcabc to bcdbcd. Each vertex has 1010 outgoing edges (one per choice of last digit) and 1010 incoming edges (one per choice of first digit), so the graph is Eulerian. An Eulerian circuit traverses every edge exactly once: reading off the last digit of each edge gives a cyclic sequence of length 1000010\,000 in which every length-44 window is a distinct edge. The construction generalises to any kk and nn.

Extra credit. If the passcode length is unknown but bounded by NN, the same de Bruijn argument gives 10N+(N1)10^{N} + (N - 1) presses: a de Bruijn sequence B(10,N)B(10, N) contains every code of length NN, and overlapping windows of lengths 1,2,,N11, 2, \ldots, N - 1 are subsumed because, for instance, any pair of consecutive digits within a long sequence is automatically tried. So   10N+(N1) presses.  \boxed{\;10^{N} + (N - 1) \text{ presses.}\;}

The computation

Build a de Bruijn sequence B(10,4)B(10, 4) explicitly with a Lyndon-words construction, append the first three digits, and verify that the resulting length-1000310\,003 string contains all 1000010\,000 codes as length-44 windows. This is an exhaustive check of the construction.

def de_bruijn(k, n):
    a = [0] * (k * n)
    seq = []
    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p+1])
        else:
            a[t] = a[t-p]
            db(t + 1, p)
            for j in range(a[t-p] + 1, k):
                a[t] = j
                db(t + 1, t)
    db(1, 1)
    return seq

cyc = de_bruijn(10, 4)
linear = cyc + cyc[:3]
print("linear length:", len(linear))
windows = {tuple(linear[i:i+4]) for i in range(len(linear) - 3)}
print("distinct 4-windows:", len(windows))

The script prints linear length: 10003 and distinct 4-windows: 10000, confirming both that 1000310\,003 presses suffice and that the bound is met by an explicit sequence.