Vamshi Jandhyala


Chapter 226

Can The Riddler Bros. Beat Joe DiMaggio’s Hitting Streak?

Riddler Express

Unriddle this sequence; what number comes next? $2,\ 6,\ 10,\ 3,\ 8,\ 9,\ 4,\ 7,\ ?$

The Riddler, FiveThirtyEight, May 10, 2019 (original post)

Solution

The pattern lives in letters, not arithmetic. Spell each number in English and score it with Scrabble tile values: the next term in the sequence is that score.

Scrabble letter values. Each letter has a fixed Scrabble point value: A, E, I, L, N, O, R, S, T, U each score 1; D, G score 2; B, C, M, P score 3; F, H, V, W, Y score 4; K scores 5; J, X score 8; Q, Z score 10.

The chain.

$$
\begin{aligned}
\mathtt{TWO} &\;\Rightarrow\; 1 + 4 + 1 \;=\; 6, \\
\mathtt{SIX} &\;\Rightarrow\; 1 + 1 + 8 \;=\; 10, \\
\mathtt{TEN} &\;\Rightarrow\; 1 + 1 + 1 \;=\; 3, \\
\mathtt{THREE} &\;\Rightarrow\; 1 + 4 + 1 + 1 + 1 \;=\; 8, \\
\mathtt{EIGHT} &\;\Rightarrow\; 1 + 1 + 2 + 4 + 1 \;=\; 9, \\
\mathtt{NINE} &\;\Rightarrow\; 1 + 1 + 1 + 1 \;=\; 4, \\
\mathtt{FOUR} &\;\Rightarrow\; 4 + 1 + 1 + 1 \;=\; 7, \\
\mathtt{SEVEN} &\;\Rightarrow\; 1 + 1 + 4 + 1 + 1 \;=\; 8.
\end{aligned}
$$

The missing term is therefore 8, the score of SEVEN. The chain enters a cycle: $\mathtt{EIGHT} \to 9 \to \mathtt{NINE} \to 4 \to \mathtt{FOUR} \to 7 \to \mathtt{SEVEN} \to 8$, after which it loops back through $\mathtt{EIGHT}$. The earlier terms $2, 6, 10, 3$ are the lead-in to this cycle.

The computation

Encode the rule directly: a function from a positive integer to its English spelling, then to the Scrabble tile sum. Iterate from 2 and reproduce the published sequence; continue past the question mark to display the cycle.

  1. Build a small spell-out function for integers 1 to 99 (the sequence stays small).

  2. Build the Scrabble value table from the rules above.

  3. Iterate: each next term is the Scrabble score of the spelling of the current term.

  4. Print the first 21 terms (the starting 2 followed by 20 iterations).

ONES = ["", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty",
        "seventy", "eighty", "ninety"]

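def spell(n):
    """Spell an integer from 1 to 99 in English, without spaces or hyphens."""
    if n < 10:  return ONES[n]
    if n < 20:  return TEENS[n - 10]
    t, u = divmod(n, 10)
    return TENS[t] + (ONES[u] if u else "")

# Scrabble tile values, grouped by score.
VALS = {"AEILNORSTU": 1, "DG": 2, "BCMP": 3, "FHVWY": 4,
        "K": 5, "JX": 8, "QZ": 10}
# Expand the groups into a per-letter lookup table.
LETTER = {}
for k, v in VALS.items():
    for ch in k:
        LETTER[ch] = v

def scrabble(word):
    """Sum the Scrabble tile values of the letters in word."""
    return sum(LETTER[c] for c in word.upper())

# Start at 2 and apply the rule 20 times, giving 21 terms in all.
seq = [2]
for _ in range(20):
    seq.append(scrabble(spell(seq[-1])))
print(seq)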

The script prints $[2, 6, 10, 3, 8, 9, 4, 7, 8, 9, 4, 7, 8, 9, 4, 7, 8, 9, 4, 7, 8]$, which reproduces the published sequence and shows the four-term cycle $8 \to 9 \to 4 \to 7 \to 8$ that the sequence enters.
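The split between lead-in and cycle can also be found mechanically rather than read off the printed list. The following is a minimal sketch, not part of the original script: the names `WORDS`, `SCORES`, `step`, and `lead_in_and_cycle` are illustrative, and the spelling table covers only 1 to 10, which is an assumption that holds here because the sequence starting at 2 never leaves that range.

```python
# Spellings for 1..10 only (assumption: the sequence from 2 stays in this range).
WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
         6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten"}

# Per-letter Scrabble values built from the groups given in the text.
SCORES = {}
for letters, value in [("AEILNORSTU", 1), ("DG", 2), ("BCMP", 3),
                       ("FHVWY", 4), ("K", 5), ("JX", 8), ("QZ", 10)]:
    for ch in letters:
        SCORES[ch] = value

def step(n):
    """Scrabble score of the English spelling of n."""
    return sum(SCORES[c] for c in WORDS[n].upper())

def lead_in_and_cycle(start):
    """Iterate until a value repeats; split the path at the first repeat."""
    seen = {}   # value -> index of its first appearance
    seq = []
    n = start
    while n not in seen:
        seen[n] = len(seq)
        seq.append(n)
        n = step(n)
    first = seen[n]
    return seq[:first], seq[first:]

lead_in, cycle = lead_in_and_cycle(2)
print(lead_in, cycle)  # [2, 6, 10, 3] [8, 9, 4, 7]
```

Recording the index of each value's first appearance is enough here because the map is deterministic: once a value repeats, everything after it repeats too.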

Riddler Classic

Five brothers play in the Riddler Baseball Independent Society for 20 seasons of 160 games with 4 plate appearances per game; each plate appearance is a hit or an out. Their batting averages are $.200, .250, .300, .350, .400$. What are each brother’s chances of beating DiMaggio’s 56-game hitting streak at some point in his career? Their cousin bats $.500$ but is ejected after his 10th season. What are his chances?

The Riddler, FiveThirtyEight, May 10, 2019 (original post)

Solution

The classical streak problem reduces to a per-game Bernoulli model and then a closed-form recursion for the streak probability.

Per-game hit probability. A brother with batting average $A$ has hit probability $A$ on each at-bat. The probability that he gets no hit in a four-at-bat game is $(1 - A)^{4}$, so the probability he gets at least one hit is $$p \;=\; 1 - (1 - A)^{4}.$$ For $A = .200, .250, .300, .350, .400, .500$ this gives $p = 0.5904,\ 0.6836,\ 0.7599,\ 0.8215,\ 0.8704,\ 0.9375$.
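The per-game values are quick to reproduce. A minimal sketch of the formula above; the function name `per_game_hit_prob` and the constant `AT_BATS` are illustrative choices, not taken from the original:

```python
AT_BATS = 4  # plate appearances per game, as stated in the puzzle

def per_game_hit_prob(avg, at_bats=AT_BATS):
    """Probability of at least one hit in a game: 1 - (1 - A)^at_bats."""
    return 1 - (1 - avg) ** at_bats

averages = [0.200, 0.250, 0.300, 0.350, 0.400, 0.500]
probs = {avg: per_game_hit_prob(avg) for avg in averages}
for avg, p in probs.items():
    print(f"A = {avg:.3f}  ->  p = {p:.4f}")
```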

Streak recursion. Let $P_{n}$ be the probability that a streak of length at least $r$ appears in $n$ independent Bernoulli($p$) games. Beating DiMaggio’s 56-game streak means a run of length $r = 57$, since “beating” requires strictly longer than 56.

A streak of length $\ge r$ in $n + 1$ games either was present in the first $n$ games or is newly completed at game $n + 1$. “Newly completed at game $n + 1$” means that no streak of length $r$ occurred up through game $n - r$, then game $n - r + 1$ was a non-hit, then games $n - r + 2, \ldots, n + 1$ were all hits. These three events involve disjoint sets of games, so their probabilities multiply. The recursion (Feller; Branicky) is $$P_{n + 1} \;=\; P_{n} + (1 - P_{n - r})\,(1 - p)\, p^{r}, \qquad n \ge r,$$ with the base $P_{n} = 0$ for $n < r$ and $P_{r} = p^{r}$; the base case covers the one situation in which there is no game $n - r + 1$ to miss.
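One way to gain confidence in the recursion is to compare it with exhaustive enumeration on a small instance. The sketch below is an illustrative check, not part of the original solution: the function names and the test values $p = 0.6$, $r = 3$, $n = 10$ are arbitrary assumptions chosen so that all $2^{10}$ hit/miss sequences can be summed directly.

```python
from itertools import product

def streak_prob(p, r, n):
    """P_n from the recursion, with P_k = 0 for k < r and P_r = p**r."""
    if n < r:
        return 0.0
    P = [0.0] * (n + 1)
    P[r] = p ** r
    for k in range(r + 1, n + 1):
        P[k] = P[k - 1] + (1 - P[k - r - 1]) * (1 - p) * p ** r
    return P[n]

def streak_prob_brute(p, r, n):
    """Sum the probability of every hit/miss sequence with a run of >= r hits."""
    total = 0.0
    for games in product([True, False], repeat=n):
        run = longest = 0
        for hit in games:
            run = run + 1 if hit else 0
            longest = max(longest, run)
        if longest >= r:
            hits = sum(games)  # True counts as 1
            total += p ** hits * (1 - p) ** (n - hits)
    return total

p, r, n = 0.6, 3, 10
print(streak_prob(p, r, n), streak_prob_brute(p, r, n))
```

The two numbers should agree up to floating-point rounding. Enumeration is only feasible for small $n$, which is why the recursion is needed for a 3,200-game career.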

Career totals. A brother plays $n = 20 \cdot 160 = 3{,}200$ games. The cousin plays $n = 10 \cdot 160 = 1{,}600$. Evaluating the recursion gives:

$$
\begin{aligned}
A = .200:&\quad P \;\approx\; 1.16 \times 10^{-10}, \\
A = .250:&\quad P \;\approx\; 3.82 \times 10^{-7}, \\
A = .300:&\quad P \;\approx\; 1.21 \times 10^{-4}, \\
A = .350:&\quad P \;\approx\; 7.60 \times 10^{-3}, \\
A = .400:&\quad P \;\approx\; 0.1393, \\
A = .500 \text{ (cousin)}:&\quad P \;\approx\; 0.9338.
\end{aligned}
$$
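The smallest entries can be sanity-checked without running the full recursion. When $P$ is tiny, the factor $1 - P_{n - r}$ is essentially 1, so each step adds about $(1 - p)\,p^{r}$ and $P_{n} \approx p^{r}\,[1 + (n - r)(1 - p)]$. The sketch below evaluates that first-order estimate; the name `first_order_estimate` is illustrative, and the approximation is an assumption that holds only while $P$ stays small.

```python
def first_order_estimate(avg, games, r=57, at_bats=4):
    """Approximate P_n by p^r * (1 + (n - r) * (1 - p)), valid for tiny P."""
    p = 1 - (1 - avg) ** at_bats
    return p ** r * (1 + (games - r) * (1 - p))

for avg in (0.200, 0.250, 0.300):
    print(f"A = {avg:.3f}: P ~ {first_order_estimate(avg, 3200):.2e}")
```

For the three lowest averages the estimate should land close to the recursion values listed above. Because it replaces $1 - P_{n - r}$ by 1, it is an upper bound, and it overshoots noticeably once $P$ is no longer small, as for $A = .400$.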

The headline lesson is that an extra $.050$ of batting average multiplies the streak probability by roughly 18 going from $A = .350$ to $A = .400$ and by roughly 63 going from $A = .300$ to $A = .350$. The streak is a runaway function of the per-game hit probability $p$, which itself is steep in $A$; that is why DiMaggio’s $.357$ batting average in 1941 matters so much for the streak’s plausibility, and why a $.500$ cousin almost always beats the record even with his career cut in half.

The computation

Simulate each career directly: draw every game as a Bernoulli trial with the per-game hit probability $p = 1 - (1 - A)^{4}$, track the current hitting streak, and count the fraction of careers in which it reaches 57 games (that is, exceeds 56). Cross-check against the recursion. The Monte Carlo estimate should agree with the recursion to within sampling error at the high end, where the probability is large enough to estimate by sampling.

  1. For each batting average whose recursion value exceeds $0.001$, run 200,000 careers of the corresponding length.

  2. In each career, draw game outcomes one at a time and check whether any run of 57 consecutive games is all hits.

  3. Compare the empirical frequency to the recursion’s value.

import random

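def streak_prob_recursion(p, r, n):
    """Probability of a run of at least r hits in n games, via the recursion."""
    P = [0.0] * (n + 2)   # P[k] stays 0 for k < r
    pr = p ** r
    q = 1 - p
    for k in range(r, n + 1):
        if k == r:
            P[k] = pr     # base case: the first r games are all hits
        else:
            # k - r - 1 >= 0 whenever k > r; the guard is kept for safety
            prev_minus_r = P[k - r - 1] if k - r - 1 >= 0 else 0.0
            P[k] = P[k - 1] + (1 - prev_minus_r) * q * pr
    return P[n]

def simulate(avg, games, r, trials):
    """Fraction of simulated careers containing a run of at least r hit games."""
    hits_per_game_prob = 1 - (1 - avg) ** 4
    success = 0
    for _ in range(trials):
        run = 0           # length of the current hitting streak
        beat = False
        for _ in range(games):
            if random.random() < hits_per_game_prob:
                run += 1
                if run >= r:
                    beat = True
                    break  # streak reached; the rest of the career is irrelevant
            else:
                run = 0
        if beat:
            success += 1
    return success / trials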

random.seed(2019)
r = 57
for avg, games in [(0.200, 3200), (0.250, 3200), (0.300, 3200),
                   (0.350, 3200), (0.400, 3200), (0.500, 1600)]:
    p = 1 - (1 - avg) ** 4
    P_exact = streak_prob_recursion(p, r, games)
    P_emp = simulate(avg, games, r, 200_000) if P_exact > 0.001 else float("nan")
    print(f"avg {avg:.3f}: per-game p={p:.6f}  "
          f"P(streak>={r}) exact={P_exact:.6e}  emp={P_emp:.4f}")

The script prints recursion values that match the career totals listed above, and the Monte Carlo estimates for the cases it simulates (those with exact probability above $0.001$: $A = .350$, $A = .400$, and the cousin) agree to within sampling error, $\pm 0.002$.
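The size of that sampling error can be estimated from the binomial standard error $\sqrt{P(1 - P)/N}$ with $N = 200{,}000$ careers. A small sketch, using the recursion values quoted earlier as inputs; the function name `binomial_standard_error` is illustrative:

```python
import math

def binomial_standard_error(prob, trials):
    """Standard error of an empirical frequency estimated from `trials` draws."""
    return math.sqrt(prob * (1 - prob) / trials)

TRIALS = 200_000
for label, prob in [("A = .400", 0.1393), ("cousin", 0.9338)]:
    se = binomial_standard_error(prob, TRIALS)
    print(f"{label}: standard error ~ {se:.4f}, two-sigma band ~ {2 * se:.4f}")
```

Both two-sigma bands come out below $0.002$, consistent with the tolerance quoted above.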