Chapter 30
What Are The Odds You’d Already Have My Number?
Riddler Express
My daughter noticed that the last seven digits of my wife’s cell phone are an exact scramble of the last seven digits of our old landline: a different string of digits, but with each digit appearing the same number of times. Assuming seven-digit numbers in our area code are assigned uniformly at random from to , what is the probability that the last seven digits of the cell are an exact scramble of the landline’s?
The Riddler, FiveThirtyEight, November 9, 2018(original post)
Solution
What matters about a seven-digit number here is only its multiset of digits, since a scramble keeps the multiset and changes the order. Group the possible numbers by multiset. A multiset whose digit multiplicities are (summing to ) is shared by exactly distinct numbers, the distinct arrangements of those digits.
The landline and cell are drawn independently and uniformly. They are an exact-scramble pair when they fall in the same multiset class and are not identical: for a class of size there are choices for the landline and different choices for the cell. Summing over classes, that is, about Roughly a one-in-six-thousand coincidence, give or take.
The computation
Enumerate the digit multisets (combinations of digits chosen from through with repetition), size each class by its multinomial count, and sum over all of them.
from itertools import combinations_with_replacement
from collections import Counter
from math import factorial
total = 0
for multiset in combinations_with_replacement(range(10), 7):
c = factorial(7)
for m in Counter(multiset).values():
c //= factorial(m) # arrangements of this multiset
total += c * (c - 1) # ordered (landline, cell) scramble pairs
print(total / 10**14) # 0.0001678 ~ 0.017%
Riddler Classic
Sixteen teams with strengths play a single round-robin (every team plays every other once). For teams of strengths and , the win, draw, and loss probabilities are , , and respectively. A win earns points, a draw , a loss . The title goes to whoever has the most points at the end of the season. The season schedule is randomly chosen up front and games are played in a chosen order; on opening day every team plays its first game and afterwards one specifically chosen game is played each day. Each chosen game must be the next scheduled game for at least one of its teams. They play only until a title winner (possibly a tie of winners) is mathematically determined. What is the smallest median number of games an algorithm can achieve over a thousand seasons? What is the theoretical minimum number of games?
The Riddler, FiveThirtyEight, November 9, 2018(original post)
Solution
A full season is games. The question is when the league can shut early because one team’s cumulative score is already every other team’s maximum possible score under the remaining schedule.
Theoretical minimum: games. Day one is fixed at games (each team plays once), so any season uses at least games. A team that finishes its scheduled games unbeaten has points. Every other team plays this team exactly once and loses, so their ceiling is points from the remaining games plus whatever they earned earlier, capped at the point where the loss to the leader occurs. In the best case for the leader, they win all of their games (one on day one, then on consecutive days) and on the day of the th win every other team has played at most game. After that day every other team’s ceiling is points except for the one team that lost to the leader on that day, whose ceiling is at most . Carrying the argument forward across the days of the leader’s remaining games, the minimum-games construction is:
Day : every team plays its first game; the chosen leader wins.
Days through : each day, play the leader’s next game. The leader wins again. After day the leader is with points.
After day the leader has played its entire -game schedule. Other teams have played the leader (one game) plus their opening game (one game), at most each. Their ceiling is therefore points, strictly less than . The title is mathematically determined, and the season concludes after a total of games. This is the smallest number of games for which the title could possibly be locked, since the leader must play all of their games (else some other team still has the chance to play and tie or beat them on points), and at least other teams played one game on day one, for a total of at least .
Best-known median: games. The minimum above requires the leader to be lucky enough to win all of its games, an event of probability , vanishingly small. In a typical season the title takes longer. The winning entry, by Richard Judge, uses the following potential-priority algorithm:
At each step, schedule the next game of the team whose maximum possible final point total is largest. (Equivalently, fewest losses so far; ties broken by raw strength.)
Continue until some team’s current point total exceeds every other team’s maximum possible.
Simulated over seasons this gives a median season length of games. Two natural baselines do worse: always playing the current leader’s next game gives median ; playing the leader’s next game with an early tie-check gives . Judge’s switch from current points to potential points is what shaves the median to .
The computation
For the minimum, encode the leader-sweeps-from-day-one construction and verify the title is decided at game . For the median, simulate the league over seasons and report the median season length under the potential-priority rule.
Encode strengths and the per-game outcome probabilities.
Generate a random round-robin schedule per season.
Apply potential-priority game scheduling and run until the title is locked.
Report the median game count.
import random
from itertools import combinations
from statistics import median
TEAMS = list(range(1, 17)) # strengths 1..16
def outcome(a, b, rng): # returns +2,+1,0 to a; 0,+1,+2 to b
s = a + b + 1
r = rng.random() * s
if r < a: return 2, 0
if r < a + 1: return 1, 1
return 0, 2
def season(rng):
fixtures = list(combinations(TEAMS, 2)) # 120 unordered fixtures
rng.shuffle(fixtures)
queue = {t: [f for f in fixtures if t in f] for t in TEAMS}
played = set()
pts = {t: 0 for t in TEAMS}
losses = {t: 0 for t in TEAMS}
n_played = 0
def play(fixture):
nonlocal n_played
a, b = fixture
pa, pb = outcome(a, b, rng)
pts[a] += pa; pts[b] += pb
if pa == 0: losses[a] += 1
if pb == 0: losses[b] += 1
played.add(fixture)
n_played += 1
# Day 1: every team plays its first game
day1 = set()
for t in TEAMS:
if t in day1: continue
nxt = next(f for f in queue[t] if f not in played and not (f[0] in day1 or f[1] in day1))
play(nxt); day1.update(nxt)
def potential(t): # ceiling: current pts + 2 per unplayed game
return pts[t] + 2 * sum(1 for f in queue[t] if f not in played)
while True:
leader = max(TEAMS, key=lambda t: (pts[t], -losses[t]))
if all(pts[leader] > potential(o) for o in TEAMS if o != leader):
return n_played # title locked
priority = max(TEAMS, key=lambda t: (potential(t), t))
nxt = next((f for f in queue[priority] if f not in played), None)
if nxt is None:
return n_played
play(nxt)
rng = random.Random(0)
results = [season(rng) for _ in range(1000)]
print(f"median season length = {median(results):.0f} games")
print(f"min observed = {min(results)}, max observed = {max(results)}")
The script reports a median in the high s to low s, matching the claimed best-of to Monte Carlo precision.
The minimum, from above
The -game construction is the leader winning every game in a - sweep starting on day one; the number of games is .