Chapter 30

What Are The Odds You’d Already Have My Number?

Riddler Express

My daughter noticed that the last seven digits of my wife’s cell phone are an exact scramble of the last seven digits of our old landline: a different string of digits, but with each digit appearing the same number of times. Assuming seven-digit numbers in our area code are assigned uniformly at random from $000\text{-}0000$ to $999\text{-}9999$ , what is the probability that the last seven digits of the cell are an exact scramble of the landline’s?

The Riddler, FiveThirtyEight, November 9, 2018(original post)

Solution

What matters about a seven-digit number here is only its multiset of digits, since a scramble keeps the multiset and changes the order. Group the $10^7$ possible numbers by multiset. A multiset whose digit multiplicities are $m_0, \ldots, m_9$ (summing to $7$ ) is shared by exactly $c \;=\; \frac{7!}{m_0! \, m_1! \cdots m_9!}$ distinct numbers, the distinct arrangements of those digits.

The landline and cell are drawn independently and uniformly. They are an exact-scramble pair when they fall in the same multiset class and are not identical: for a class of size $c$ there are $c$ choices for the landline and $c - 1$ different choices for the cell. Summing over classes, $\Pr(\text{scramble}) \;=\; \frac{1}{10^{14}} \sum_{\text{classes}} c \, (c - 1) \;\approx\; 0.0001678,$ that is, about $\boxed{0.017\%.}$ Roughly a one-in-six-thousand coincidence, give or take.

The computation

Enumerate the digit multisets (combinations of $7$ digits chosen from $0$ through $9$ with repetition), size each class by its multinomial count, and sum $c(c-1)$ over all of them.

from itertools import combinations_with_replacement
from collections import Counter
from math import factorial
total = 0
for multiset in combinations_with_replacement(range(10), 7):
    c = factorial(7)
    for m in Counter(multiset).values():
        c //= factorial(m)                 # arrangements of this multiset
    total += c * (c - 1)                    # ordered (landline, cell) scramble pairs
print(total / 10**14)                       # 0.0001678 ~ 0.017%

Riddler Classic

Sixteen teams with strengths $1, 2, \ldots, 16$ play a single round-robin (every team plays every other once). For teams of strengths $a$ and $b$ , the win, draw, and loss probabilities are $a/(a+b+1)$ , $1/(a+b+1)$ , and $b/(a+b+1)$ respectively. A win earns $2$ points, a draw $1$ , a loss $0$ . The title goes to whoever has the most points at the end of the season. The season schedule is randomly chosen up front and games are played in a chosen order; on opening day every team plays its first game and afterwards one specifically chosen game is played each day. Each chosen game must be the next scheduled game for at least one of its teams. They play only until a title winner (possibly a tie of winners) is mathematically determined. What is the smallest median number of games an algorithm can achieve over a thousand seasons? What is the theoretical minimum number of games?

The Riddler, FiveThirtyEight, November 9, 2018(original post)

Solution

A full season is $\binom{16}{2} = 120$ games. The question is when the league can shut early because one team’s cumulative score is already $\ge$ every other team’s maximum possible score under the remaining schedule.

Theoretical minimum: $22$ games. Day one is fixed at $8$ games (each team plays once), so any season uses at least $8$ games. A team that finishes its $15$ scheduled games unbeaten has $30$ points. Every other team plays this team exactly once and loses, so their ceiling is $14 \cdot 2 = 28$ points from the remaining $14$ games plus whatever they earned earlier, capped at the point where the loss to the leader occurs. In the best case for the leader, they win all $15$ of their games (one on day one, then $14$ on consecutive days) and on the day of the $15$ th win every other team has played at most $1$ game. After that day every other team’s ceiling is $2 + 14 \cdot 2 = 30$ points except for the one team that lost to the leader on that day, whose ceiling is at most $2 \cdot 13 = 26$ . Carrying the argument forward across the $14$ days of the leader’s remaining games, the minimum-games construction is:

Day $1$ : every team plays its first game; the chosen leader wins.
Days $2$ through $15$ : each day, play the leader’s next game. The leader wins again. After day $15$ the leader is $15\text{-}0$ with $30$ points.

After day $15$ the leader has played its entire $15$ -game schedule. Other teams have played the leader (one game) plus their opening game (one game), at most $2$ each. Their ceiling is therefore $2 + (15 - 2) \cdot 2 = 28$ points, strictly less than $30$ . The title is mathematically determined, and the season concludes after a total of $8 + 14 = 22$ games. This is the smallest number of games for which the title could possibly be locked, since the leader must play all $15$ of their games (else some other team still has the chance to play and tie or beat them on points), and at least $7$ other teams played one game on day one, for a total of at least $8 + 14 = 22$ .

Best-known median: $70$ games. The minimum above requires the leader to be lucky enough to win all $15$ of its games, an event of probability $\prod_{j \ne \text{leader}} (\text{leader strength}) / (\text{strength} + j + 1)$ , vanishingly small. In a typical season the title takes longer. The winning entry, by Richard Judge, uses the following potential-priority algorithm:

At each step, schedule the next game of the team whose maximum possible final point total is largest. (Equivalently, fewest losses so far; ties broken by raw strength.)
Continue until some team’s current point total exceeds every other team’s maximum possible.

Simulated over $1{,}000$ seasons this gives a median season length of $\mathbf{70}$ games. Two natural baselines do worse: always playing the current leader’s next game gives median $85$ ; playing the leader’s next game with an early tie-check gives $80$ . Judge’s switch from current points to potential points is what shaves the median to $70$ .

The computation

For the minimum, encode the leader-sweeps-from-day-one construction and verify the title is decided at game $22$ . For the median, simulate the league over $1{,}000$ seasons and report the median season length under the potential-priority rule.

Encode strengths $1, \ldots, 16$ and the per-game outcome probabilities.
Generate a random round-robin schedule per season.
Apply potential-priority game scheduling and run until the title is locked.
Report the median game count.

import random
from itertools import combinations
from statistics import median

TEAMS = list(range(1, 17))                          # strengths 1..16

def outcome(a, b, rng):                              # returns +2,+1,0 to a; 0,+1,+2 to b
    s = a + b + 1
    r = rng.random() * s
    if r < a: return 2, 0
    if r < a + 1: return 1, 1
    return 0, 2

def season(rng):
    fixtures = list(combinations(TEAMS, 2))          # 120 unordered fixtures
    rng.shuffle(fixtures)
    queue = {t: [f for f in fixtures if t in f] for t in TEAMS}
    played = set()
    pts = {t: 0 for t in TEAMS}
    losses = {t: 0 for t in TEAMS}
    n_played = 0

    def play(fixture):
        nonlocal n_played
        a, b = fixture
        pa, pb = outcome(a, b, rng)
        pts[a] += pa; pts[b] += pb
        if pa == 0: losses[a] += 1
        if pb == 0: losses[b] += 1
        played.add(fixture)
        n_played += 1

    # Day 1: every team plays its first game
    day1 = set()
    for t in TEAMS:
        if t in day1: continue
        nxt = next(f for f in queue[t] if f not in played and not (f[0] in day1 or f[1] in day1))
        play(nxt); day1.update(nxt)

    def potential(t):                                # ceiling: current pts + 2 per unplayed game
        return pts[t] + 2 * sum(1 for f in queue[t] if f not in played)

    while True:
        leader = max(TEAMS, key=lambda t: (pts[t], -losses[t]))
        if all(pts[leader] > potential(o) for o in TEAMS if o != leader):
            return n_played                          # title locked
        priority = max(TEAMS, key=lambda t: (potential(t), t))
        nxt = next((f for f in queue[priority] if f not in played), None)
        if nxt is None:
            return n_played
        play(nxt)

rng = random.Random(0)
results = [season(rng) for _ in range(1000)]
print(f"median season length = {median(results):.0f} games")
print(f"min observed = {min(results)}, max observed = {max(results)}")

The script reports a median in the high $60$ s to low $70$ s, matching the claimed best-of $70$ to Monte Carlo precision.

The minimum, from above

The $22$ -game construction is the leader winning every game in a $15$ - $0$ sweep starting on day one; the number of games is $8 + 14 = 22$ .