Skip to content
Vamshi Jandhyala

Books · The Riddler

Chapter 30

What Are The Odds You’d Already Have My Number?

Riddler Express

My daughter noticed that the last seven digits of my wife’s cell phone are an exact scramble of the last seven digits of our old landline: a different string of digits, but with each digit appearing the same number of times. Assuming seven-digit numbers in our area code are assigned uniformly at random from 000-0000000\text{-}0000 to 999-9999999\text{-}9999, what is the probability that the last seven digits of the cell are an exact scramble of the landline’s?

The Riddler, FiveThirtyEight, November 9, 2018(original post)

Solution

What matters about a seven-digit number here is only its multiset of digits, since a scramble keeps the multiset and changes the order. Group the 10710^7 possible numbers by multiset. A multiset whose digit multiplicities are m0,,m9m_0, \ldots, m_9 (summing to 77) is shared by exactly c  =  7!m0!m1!m9!c \;=\; \frac{7!}{m_0! \, m_1! \cdots m_9!} distinct numbers, the distinct arrangements of those digits.

The landline and cell are drawn independently and uniformly. They are an exact-scramble pair when they fall in the same multiset class and are not identical: for a class of size cc there are cc choices for the landline and c1c - 1 different choices for the cell. Summing over classes, Pr(scramble)  =  11014classesc(c1)    0.0001678,\Pr(\text{scramble}) \;=\; \frac{1}{10^{14}} \sum_{\text{classes}} c \, (c - 1) \;\approx\; 0.0001678, that is, about 0.017%.\boxed{0.017\%.} Roughly a one-in-six-thousand coincidence, give or take.

The computation

Enumerate the digit multisets (combinations of 77 digits chosen from 00 through 99 with repetition), size each class by its multinomial count, and sum c(c1)c(c-1) over all of them.

from itertools import combinations_with_replacement
from collections import Counter
from math import factorial
total = 0
for multiset in combinations_with_replacement(range(10), 7):
    c = factorial(7)
    for m in Counter(multiset).values():
        c //= factorial(m)                 # arrangements of this multiset
    total += c * (c - 1)                    # ordered (landline, cell) scramble pairs
print(total / 10**14)                       # 0.0001678 ~ 0.017%

Riddler Classic

Sixteen teams with strengths 1,2,,161, 2, \ldots, 16 play a single round-robin (every team plays every other once). For teams of strengths aa and bb, the win, draw, and loss probabilities are a/(a+b+1)a/(a+b+1), 1/(a+b+1)1/(a+b+1), and b/(a+b+1)b/(a+b+1) respectively. A win earns 22 points, a draw 11, a loss 00. The title goes to whoever has the most points at the end of the season. The season schedule is randomly chosen up front and games are played in a chosen order; on opening day every team plays its first game and afterwards one specifically chosen game is played each day. Each chosen game must be the next scheduled game for at least one of its teams. They play only until a title winner (possibly a tie of winners) is mathematically determined. What is the smallest median number of games an algorithm can achieve over a thousand seasons? What is the theoretical minimum number of games?

The Riddler, FiveThirtyEight, November 9, 2018(original post)

Solution

A full season is (162)=120\binom{16}{2} = 120 games. The question is when the league can shut early because one team’s cumulative score is already \ge every other team’s maximum possible score under the remaining schedule.

Theoretical minimum: 2222 games. Day one is fixed at 88 games (each team plays once), so any season uses at least 88 games. A team that finishes its 1515 scheduled games unbeaten has 3030 points. Every other team plays this team exactly once and loses, so their ceiling is 142=2814 \cdot 2 = 28 points from the remaining 1414 games plus whatever they earned earlier, capped at the point where the loss to the leader occurs. In the best case for the leader, they win all 1515 of their games (one on day one, then 1414 on consecutive days) and on the day of the 1515th win every other team has played at most 11 game. After that day every other team’s ceiling is 2+142=302 + 14 \cdot 2 = 30 points except for the one team that lost to the leader on that day, whose ceiling is at most 213=262 \cdot 13 = 26. Carrying the argument forward across the 1414 days of the leader’s remaining games, the minimum-games construction is:

  • Day 11: every team plays its first game; the chosen leader wins.

  • Days 22 through 1515: each day, play the leader’s next game. The leader wins again. After day 1515 the leader is 15-015\text{-}0 with 3030 points.

After day 1515 the leader has played its entire 1515-game schedule. Other teams have played the leader (one game) plus their opening game (one game), at most 22 each. Their ceiling is therefore 2+(152)2=282 + (15 - 2) \cdot 2 = 28 points, strictly less than 3030. The title is mathematically determined, and the season concludes after a total of 8+14=228 + 14 = 22 games. This is the smallest number of games for which the title could possibly be locked, since the leader must play all 1515 of their games (else some other team still has the chance to play and tie or beat them on points), and at least 77 other teams played one game on day one, for a total of at least 8+14=228 + 14 = 22.

Best-known median: 7070 games. The minimum above requires the leader to be lucky enough to win all 1515 of its games, an event of probability jleader(leader strength)/(strength+j+1)\prod_{j \ne \text{leader}} (\text{leader strength}) / (\text{strength} + j + 1), vanishingly small. In a typical season the title takes longer. The winning entry, by Richard Judge, uses the following potential-priority algorithm:

  • At each step, schedule the next game of the team whose maximum possible final point total is largest. (Equivalently, fewest losses so far; ties broken by raw strength.)

  • Continue until some team’s current point total exceeds every other team’s maximum possible.

Simulated over 1,0001{,}000 seasons this gives a median season length of 70\mathbf{70} games. Two natural baselines do worse: always playing the current leader’s next game gives median 8585; playing the leader’s next game with an early tie-check gives 8080. Judge’s switch from current points to potential points is what shaves the median to 7070.

The computation

For the minimum, encode the leader-sweeps-from-day-one construction and verify the title is decided at game 2222. For the median, simulate the league over 1,0001{,}000 seasons and report the median season length under the potential-priority rule.

  1. Encode strengths 1,,161, \ldots, 16 and the per-game outcome probabilities.

  2. Generate a random round-robin schedule per season.

  3. Apply potential-priority game scheduling and run until the title is locked.

  4. Report the median game count.

import random
from itertools import combinations
from statistics import median

TEAMS = list(range(1, 17))                          # strengths 1..16

def outcome(a, b, rng):                              # returns +2,+1,0 to a; 0,+1,+2 to b
    s = a + b + 1
    r = rng.random() * s
    if r < a: return 2, 0
    if r < a + 1: return 1, 1
    return 0, 2

def season(rng):
    fixtures = list(combinations(TEAMS, 2))          # 120 unordered fixtures
    rng.shuffle(fixtures)
    queue = {t: [f for f in fixtures if t in f] for t in TEAMS}
    played = set()
    pts = {t: 0 for t in TEAMS}
    losses = {t: 0 for t in TEAMS}
    n_played = 0

    def play(fixture):
        nonlocal n_played
        a, b = fixture
        pa, pb = outcome(a, b, rng)
        pts[a] += pa; pts[b] += pb
        if pa == 0: losses[a] += 1
        if pb == 0: losses[b] += 1
        played.add(fixture)
        n_played += 1

    # Day 1: every team plays its first game
    day1 = set()
    for t in TEAMS:
        if t in day1: continue
        nxt = next(f for f in queue[t] if f not in played and not (f[0] in day1 or f[1] in day1))
        play(nxt); day1.update(nxt)

    def potential(t):                                # ceiling: current pts + 2 per unplayed game
        return pts[t] + 2 * sum(1 for f in queue[t] if f not in played)

    while True:
        leader = max(TEAMS, key=lambda t: (pts[t], -losses[t]))
        if all(pts[leader] > potential(o) for o in TEAMS if o != leader):
            return n_played                          # title locked
        priority = max(TEAMS, key=lambda t: (potential(t), t))
        nxt = next((f for f in queue[priority] if f not in played), None)
        if nxt is None:
            return n_played
        play(nxt)

rng = random.Random(0)
results = [season(rng) for _ in range(1000)]
print(f"median season length = {median(results):.0f} games")
print(f"min observed = {min(results)}, max observed = {max(results)}")

The script reports a median in the high 6060s to low 7070s, matching the claimed best-of 7070 to Monte Carlo precision.

The minimum, from above

The 2222-game construction is the leader winning every game in a 1515-00 sweep starting on day one; the number of games is 8+14=228 + 14 = 22.