Chapter 226
Can The Riddler Bros. Beat Joe DiMaggio’s Hitting Streak?
Riddler Express
Unriddle this sequence; what number comes next?
The Riddler, FiveThirtyEight, May 10, 2019 (original post)
Solution
The pattern lives in letters, not arithmetic. Spell each number in English and score it with Scrabble tile values: the next term in the sequence is that score.
Scrabble letter values. Each letter has a fixed Scrabble point value: A, E, I, L, N, O, R, S, T, U each score 1; D, G score 2; B, C, M, P score 3; F, H, V, W, Y score 4; K scores 5; J, X score 8; Q, Z score 10.
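The chain. Starting from 2: "two" scores 6, "six" scores 10, "ten" scores 3, "three" scores 8, "eight" scores 9, "nine" scores 4, "four" scores 7, and "seven" scores 8.
The chain enters a cycle: 8, 9, 4, 7, after which it loops back through 8. The earlier terms 2, 6, 10, 3 are the lead-in to this cycle.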
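The lead-in and the cycle can also be found mechanically by recording where each value first appears in the chain. The following is a minimal sketch, assuming the chain starts at 2; the names `score_of` and `find_cycle` are illustrative and do not come from the original post.

```python
# Scrabble tile values, grouped by score.
GROUPS = {"AEILNORSTU": 1, "DG": 2, "BCMP": 3, "FHVWY": 4,
          "K": 5, "JX": 8, "QZ": 10}
LETTER = {ch: value for letters, value in GROUPS.items() for ch in letters}

# Names for 1..12 only. Assumption: this is enough here, because each of
# these words scores between 3 and 12, so the chain never leaves the table.
NAMES = ["one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve"]

def score_of(n):
    """Scrabble score of the English name of n, for 1 <= n <= 12."""
    return sum(LETTER[ch] for ch in NAMES[n - 1].upper())

def find_cycle(start):
    """Follow n -> score_of(n) until a value repeats; return (lead_in, cycle)."""
    seen = {}    # value -> index of its first appearance in the chain
    chain = []
    n = start
    while n not in seen:
        seen[n] = len(chain)
        chain.append(n)
        n = score_of(n)
    first = seen[n]    # index where the cycle begins
    return chain[:first], chain[first:]

lead_in, cycle = find_cycle(2)
print("lead-in:", lead_in)
print("cycle:  ", cycle)
```

Run as written, this reports the lead-in 2, 6, 10, 3 and the cycle 8, 9, 4, 7. A dictionary of first appearances is used instead of a two-pointer method because the chain is tiny and the lead-in falls out of the same bookkeeping.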
The computation
Encode the rule directly: a function from a positive integer to its English spelling, then to the Scrabble tile sum. Iterate from 2 and reproduce the published sequence; continue past the question mark to display the cycle.
Build a small spell-out function for integers 1 to 99 (the sequence stays small).
Build the Scrabble value table from the rules above.
Iterate: each next term is the Scrabble score of the spelling of the current term.
Print the first 21 terms.
# English names for the units, the teens, and the tens.
ONES = ["", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty",
        "seventy", "eighty", "ninety"]

def spell(n):
    """Spell an integer from 1 to 99 in English, without spaces or hyphens."""
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    t, u = divmod(n, 10)
    return TENS[t] + (ONES[u] if u else "")

# Scrabble values, keyed by the group of letters sharing each value.
VALS = {"AEILNORSTU": 1, "DG": 2, "BCMP": 3, "FHVWY": 4,
        "K": 5, "JX": 8, "QZ": 10}
LETTER = {}
for k, v in VALS.items():
    for ch in k:
        LETTER[ch] = v

def scrabble(word):
    """Sum the Scrabble tile values of the letters in word."""
    return sum(LETTER[c] for c in word.upper())

# Start at 2 and apply the rule 20 times, giving 21 terms in total.
seq = [2]
for _ in range(20):
    seq.append(scrabble(spell(seq[-1])))
print(seq)
The script prints [2, 6, 10, 3, 8, 9, 4, 7, 8, 9, 4, 7, 8, 9, 4, 7, 8, 9, 4, 7, 8], which reproduces the published lead-in and shows the four-term cycle that the sequence enters.
Riddler Classic
Five brothers play in the Riddler Baseball Independent Society for 20 seasons of 160 games with four plate appearances per game; each plate appearance is a hit or an out. Their batting averages are .200, .250, .300, .350, and .400. What are each brother’s chances of beating DiMaggio’s 56-game hitting streak at some point in his career? Their cousin bats .500 but is ejected after his 10th season. What are his chances?
The Riddler, FiveThirtyEight, May 10, 2019 (original post)
Solution
The classical streak problem reduces to a per-game Bernoulli model and then a closed-form recursion for the streak probability.
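Per-game hit probability. A brother with batting average $a$ has hit probability $a$ on each at-bat. The probability that he gets no hit in a four-at-bat game is $(1-a)^4$, so the probability he gets at least one hit is $p = 1 - (1-a)^4$. For $a = .200, .250, .300, .350, .400, .500$ this gives $p \approx 0.5904, 0.6836, 0.7599, 0.8215, 0.8704, 0.9375$.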
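The per-game probability is a one-line computation. A small sketch, assuming independent at-bats; the function name `per_game_hit_prob` is illustrative:

```python
def per_game_hit_prob(avg, at_bats=4):
    """Chance of at least one hit in a game of independent at-bats."""
    return 1 - (1 - avg) ** at_bats

# The five brothers' averages, followed by the cousin's.
for avg in (0.200, 0.250, 0.300, 0.350, 0.400, 0.500):
    print(f"average {avg:.3f}: per-game hit probability {per_game_hit_prob(avg):.4f}")
```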
Streak recursion. Let $P_n$ be the probability that a streak of length at least $r$ appears in $n$ independent Bernoulli($p$) games. Beating DiMaggio’s 56-game streak means a run of length $r = 57$, since “beating” requires strictly longer than 56.
A streak of length $r$ in $n$ games either was present in the first $n-1$ games or is newly completed at game $n$. “Newly completed at game $n$” means that no streak of length $r$ occurred up through game $n-r-1$, then game $n-r$ was a non-hit (or no game; treat as a virtual miss), then games $n-r+1, \dots, n$ were all hits. The recursion (Feller; Branicky) is $P_n = P_{n-1} + (1 - P_{n-r-1})(1-p)\,p^r$ with the base $P_n = 0$ for $n < r$ and $P_r = p^r$.
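The recursion $P_n = P_{n-1} + (1 - P_{n-r-1})(1-p)\,p^r$, with $P_n = 0$ for $n < r$ and $P_r = p^r$, can be sanity-checked on small cases by enumerating every hit/miss sequence. This is a sketch for illustration only; the function names and the small test values of $r$, $n$, and $p$ are assumptions chosen to keep the enumeration cheap.

```python
from itertools import product

def streak_prob(p, r, n):
    """P_n from the recursion, with P_k = 0 for k < r and P_r = p**r."""
    P = [0.0] * (n + 1)
    if n >= r:
        P[r] = p ** r
    for k in range(r + 1, n + 1):
        P[k] = P[k - 1] + (1 - P[k - r - 1]) * (1 - p) * p ** r
    return P[n]

def brute_force(p, r, n):
    """Add up the probability of every n-game sequence with a run of r hits."""
    total = 0.0
    for games in product((True, False), repeat=n):
        run = best = 0
        for hit in games:
            run = run + 1 if hit else 0
            best = max(best, run)
        if best >= r:
            hits = sum(games)
            total += p ** hits * (1 - p) ** (n - hits)
    return total

# Small cases only: the enumeration visits 2**n sequences.
for r, n in [(2, 5), (3, 8), (4, 12)]:
    exact = streak_prob(0.7, r, n)
    check = brute_force(0.7, r, n)
    print(f"r={r}, n={n}: recursion={exact:.10f} enumeration={check:.10f}")
```

The enumeration is exponential in $n$, so it only confirms the recursion on toy sizes; the career-length values still come from the recursion itself.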
Career totals. A brother plays $20 \times 160 = 3200$ games. The cousin plays $10 \times 160 = 1600$. Evaluating the recursion gives:
The headline lesson is that each extra step of batting average multiplies the streak probability by a large factor, and that factor is far larger among the weaker hitters than among the stronger ones. The streak probability is a runaway function of the per-game hit probability $p$, which itself is steep in $a$; that is why DiMaggio’s batting average in 1941 matters so much for the streak’s plausibility, and why a .500 cousin almost always beats the record even with his career cut in half.
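The size of the multiplier between neighbouring batting averages can be read off the recursion directly. A sketch, assuming a 3200-game career, a 57-game target, and four at-bats per game; `career_prob` is a hypothetical helper name, not part of the original solution.

```python
def streak_prob(p, r, n):
    """P_n from the recursion, with P_k = 0 for k < r and P_r = p**r."""
    P = [0.0] * (n + 1)
    if n >= r:
        P[r] = p ** r
    for k in range(r + 1, n + 1):
        P[k] = P[k - 1] + (1 - P[k - r - 1]) * (1 - p) * p ** r
    return P[n]

def career_prob(avg, games=3200, r=57, at_bats=4):
    """Chance of a hitting streak of at least r games in one career."""
    p = 1 - (1 - avg) ** at_bats    # per-game chance of at least one hit
    return streak_prob(p, r, games)

averages = [0.200, 0.250, 0.300, 0.350, 0.400]
probs = [career_prob(a) for a in averages]

# Ratio of streak probabilities for each pair of neighbouring averages.
for lo, hi, p_lo, p_hi in zip(averages, averages[1:], probs, probs[1:]):
    print(f"{lo:.3f} -> {hi:.3f}: streak probability multiplied by {p_hi / p_lo:,.1f}")
```

The printed ratios shrink as the average rises, which is the pattern described above.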
The computation
Re-encode the actual season: simulate each game as a four-at-bat trial under the brother’s average, track his hitting streaks across his career, and count the fraction of careers that contain a streak longer than 56 games. Cross-check against the recursion. The Monte Carlo estimate should match the recursion to roughly three decimals at the high end, where the probability is large enough to estimate by sampling.
For each batting average and career length, run 200,000 careers.
In each career, simulate each game’s outcome (one Bernoulli draw with the probability of at least one hit in four at-bats) and check whether any window of 57 consecutive games is all hits.
Compare the empirical frequency to the recursion’s value.
import random

def streak_prob_recursion(p, r, n):
    """Probability of a run of at least r successes in n Bernoulli(p) games."""
    P = [0.0] * (n + 2)    # P[k] = 0 for k < r
    pr = p ** r
    q = 1 - p
    for k in range(r, n + 1):
        if k == r:
            P[k] = pr
        else:
            prev_minus_r = P[k - r - 1] if k - r - 1 >= 0 else 0.0
            P[k] = P[k - 1] + (1 - prev_minus_r) * q * pr
    return P[n]

def simulate(avg, games, r, trials):
    """Monte Carlo estimate of the same probability over many careers."""
    # Chance of at least one hit in a four-at-bat game.
    hits_per_game_prob = 1 - (1 - avg) ** 4
    success = 0
    for _ in range(trials):
        run = 0
        beat = False
        for _ in range(games):
            if random.random() < hits_per_game_prob:
                run += 1
                if run >= r:
                    # The streak has reached r games; stop this career early.
                    beat = True
                    break
            else:
                run = 0
        if beat:
            success += 1
    return success / trials

random.seed(2019)
r = 57    # beating a 56-game streak needs 57 straight games
for avg, games in [(0.200, 3200), (0.250, 3200), (0.300, 3200),
                   (0.350, 3200), (0.400, 3200), (0.500, 1600)]:
    p = 1 - (1 - avg) ** 4
    P_exact = streak_prob_recursion(p, r, games)
    # Simulate only where the probability is large enough to sample.
    P_emp = simulate(avg, games, r, 200_000) if P_exact > 0.001 else float("nan")
    print(f"avg {avg:.3f}: per-game p={p:.6f} "
          f"P(streak>={r}) exact={P_exact:.6e} emp={P_emp:.4f}")
The script prints the recursion values matching the boxed percentages to four decimals, and the Monte Carlo estimates for the two high-probability cases (.400 and the cousin) agree to within sampling error.