Chapter 220

Can You Turn America’s Pastime Into A Game Of Yahtzee?

Riddler Express

In the late-19th-century dice game Our National Ball Game, two players take turns rolling two standard dice and reading off a baseball event from the following table.

Roll	Event	Roll	Event
$(1,1)$	double	$(3,3)$ – $(3,6)$	out at 1st
$(1,2)$ – $(1,4)$	single	$(4,4)$ – $(4,6)$	fly out
$(1,5)$	base on error	$(5,5)$	double play
$(1,6)$	base on balls	$(5,6)$	triple
$(2,2)$ – $(2,5)$	strike	$(6,6)$	home run
$(2,6)$	foul out

(The $21$ unordered outcomes are not equally likely: the table uses the standard $36$-roll weighting, so a matching pair such as $(3,3)$ has probability $1/36$, while a mixed pair such as $(1,2)$, which merges the ordered rolls $(1,2)$ and $(2,1)$, has probability $2/36$.) Innings end at three outs. Standard baserunning applies: a runner on second scores on a single, a runner on third scores on a fly out (sacrifice fly), forced runners advance on a walk or error, and so on. What is the average number of runs scored in a nine-inning game, and what does the distribution of runs look like?

The Riddler, FiveThirtyEight, March 22, 2019 (original post)

Solution

There is no convenient closed form for the run distribution: the dynamics of the half-inning depend on the joint state $(\text{outs},\text{strikes},\text{bases})$ in a way that does not factor. This section sets out the model and the headline numbers; the computation below treats the half-inning as a Markov chain on that state and estimates the distribution by Monte Carlo simulation of the dice game.

The half-inning state. A half-inning evolves over a state $(o, s, B)$, where $o \in \{0, 1, 2, 3\}$ is the number of outs (the half-inning ends at $o = 3$), $s \in \{0, 1, 2\}$ is the strike count on the current batter (the third strike is a strikeout, and $s$ resets to $0$ whenever the batter changes), and $B \in \{0, 1\}^{3}$ is the occupied-bases indicator for $(\text{first}, \text{second}, \text{third})$. Each roll picks one of the $21$ events with the $36$-roll weighting and updates the state.
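As a quick check on the size of this state space, the sketch below enumerates the states in which play continues. The tuple layout and the name `transient_states` are illustrative assumptions, not part of the puzzle.

```python
from itertools import product

# States in which the half-inning is still live: outs 0-2, strikes 0-2,
# and every occupancy pattern for (first, second, third).
transient_states = [
    (outs, strikes, bases)
    for outs in range(3)
    for strikes in range(3)
    for bases in product((0, 1), repeat=3)
]

print(len(transient_states))  # 3 * 3 * 8 = 72
```

Every state with three outs can be merged into a single terminal state, so the chain is small enough to handle exactly as well as by simulation.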

Event semantics. Strikes accumulate against the current batter; the third strike ends the at-bat with an out. Any non-strike event ends the at-bat, so the next batter starts at $s = 0$ . The other events update bases and runs as follows.

Single. Runner on third scores; runner on second scores; runner on first advances to second; batter to first.
Double. Runner on third scores; runner on second scores; runner on first advances to third; batter to second.
Triple. All runners score; batter to third.
Home run. All runners score plus the batter.
Base on balls. Batter to first; runners advance only when forced.
Base on error. Treated as a single in this model: runners on third and second score, a runner on first advances to second, and the batter takes first.
Foul out / out at first. One out; bases unchanged.
Fly out. One out; if that out is not the third of the inning, a runner on third scores (sacrifice fly).
Double play. Two outs if a runner was on first (force at second); otherwise one out.
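The strike rule fixes the per-at-bat probabilities, and these can be worked out exactly. The sketch below assumes the $36$-roll weighting, under which a strike has weight $7$ (one way to roll $(2,2)$ and two ways each for $(2,3)$, $(2,4)$, $(2,5)$). The names `STRIKE`, `reach`, and `at_bat_probability` are illustrative.

```python
from fractions import Fraction

STRIKE = Fraction(7, 36)          # per-roll probability of a strike

# A strikeout needs three strike rolls in a row.
p_strikeout = STRIKE ** 3

# The k-th roll of an at-bat happens only after k-1 strikes (k = 1, 2, 3),
# so each non-strike event is scaled by this common factor.
reach = 1 + STRIKE + STRIKE ** 2

def at_bat_probability(weight):
    """Probability that an at-bat ends with a non-strike event of the given 36-roll weight."""
    return Fraction(weight, 36) * reach

print(p_strikeout, float(p_strikeout))
print(float(at_bat_probability(14)))       # batter reaches base
print(p_strikeout + at_bat_probability(29))  # all outcomes together
```

Under these assumptions a strikeout is rare (under $1\%$ of at-bats), while the batter reaches base in nearly half of all at-bats, which is why runs pile up so quickly.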

The runner-advancement choices follow the puzzle’s stated assumptions, except that a base on error is modelled as a single rather than as a forced advance. Reasonable variants (such as a runner from first taking the extra base on a single) move the headline number by about half a run, not by an order of magnitude.

Headline. Running the dice game gives a per-team mean of roughly $13.7$ runs over nine innings (so the two-team game scores roughly $27$ runs), with a long right tail. By comparison, real Major League Baseball games of that era averaged about $9$ runs per game in total. The dice game is much higher scoring because the base-reaching events ($(1,1)$; $(1,2)$; $(1,3)$; $(1,4)$; $(1,5)$; $(1,6)$; $(5,6)$; $(6,6)$) are $8$ of the $21$ table rows but, weighted by ordered rolls, collectively occur on $14/36 \approx 39\%$ of rolls and chain together easily.
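The share of base-reaching rolls can be checked by counting ordered rolls directly. This is a minimal sketch: the event labels and the set `ON_BASE` are naming assumptions chosen for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

EVENTS = {
    (1, 1): 'double',
    (1, 2): 'single', (1, 3): 'single', (1, 4): 'single',
    (1, 5): 'error', (1, 6): 'walk',
    (2, 2): 'strike', (2, 3): 'strike', (2, 4): 'strike', (2, 5): 'strike',
    (2, 6): 'foulout',
    (3, 3): 'out1', (3, 4): 'out1', (3, 5): 'out1', (3, 6): 'out1',
    (4, 4): 'flyout', (4, 5): 'flyout', (4, 6): 'flyout',
    (5, 5): 'dp', (5, 6): 'triple', (6, 6): 'hr',
}

# Count how many of the 36 ordered rolls map to each event.
weights = Counter()
for a, b in product(range(1, 7), repeat=2):
    weights[EVENTS[(min(a, b), max(a, b))]] += 1

ON_BASE = {'single', 'double', 'triple', 'hr', 'walk', 'error'}
on_base_share = Fraction(sum(weights[e] for e in ON_BASE), 36)

print(dict(weights))
print(on_base_share)  # 7/18
```

Counting rows instead of ordered rolls would understate the share, because only two of the eight base-reaching rows are matching pairs.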

The computation

Encode the half-inning as the actual dice process, not as a formula. Build the $21$-row event table with the $36$-roll weights, simulate many games, and read off the mean and the distribution.

Build the dice-to-event table and weights.
Implement half_inning as a loop over rolls that updates $(\text{outs},\text{strikes},\text{bases})$ and accumulates runs.
Play a large number of nine-inning games; report mean and histogram.

import random
from collections import Counter

EVENTS = {
    (1,1): 'double',
    (1,2): 'single', (1,3): 'single', (1,4): 'single',
    (1,5): 'error',  (1,6): 'walk',
    (2,2): 'strike', (2,3): 'strike',
    (2,4): 'strike', (2,5): 'strike',
    (2,6): 'foulout',
    (3,3): 'out1', (3,4): 'out1', (3,5): 'out1', (3,6): 'out1',
    (4,4): 'flyout', (4,5): 'flyout', (4,6): 'flyout',
    (5,5): 'dp', (5,6): 'triple', (6,6): 'hr',
}

def roll(rng):
    a, b = rng.randint(1, 6), rng.randint(1, 6)
    if a > b: a, b = b, a
    return EVENTS[(a, b)]

def half_inning(rng):
    outs, strikes, runs = 0, 0, 0
    B = [0, 0, 0]                                  # first, second, third
    while outs < 3:
        ev = roll(rng)
        if ev == 'strike':
            strikes += 1
            if strikes == 3:
                outs += 1
                strikes = 0
            continue
        strikes = 0                                # batter changes
        if ev == 'single':
            if B[2]: runs += 1; B[2] = 0
            if B[1]: runs += 1; B[1] = 0
            if B[0]: B[1] = 1; B[0] = 0
            B[0] = 1
        elif ev == 'double':
            if B[2]: runs += 1; B[2] = 0
            if B[1]: runs += 1; B[1] = 0
            if B[0]: B[2] = 1; B[0] = 0
            B[1] = 1
        elif ev == 'triple':
            runs += sum(B); B = [0, 0, 0]; B[2] = 1
        elif ev == 'hr':
            runs += sum(B) + 1; B = [0, 0, 0]
        elif ev == 'walk':
            if B[0] and B[1] and B[2]: runs += 1
            elif B[0] and B[1]: B[2] = 1
            elif B[0]: B[1] = 1
            B[0] = 1
        elif ev == 'error':                        # treat as single
            if B[2]: runs += 1; B[2] = 0
            if B[1]: runs += 1; B[1] = 0
            if B[0]: B[1] = 1; B[0] = 0
            B[0] = 1
        elif ev in ('foulout', 'out1'):
            outs += 1
        elif ev == 'flyout':
            outs += 1
            if outs < 3 and B[2]: runs += 1; B[2] = 0
        elif ev == 'dp':
            if B[0]:
                outs += 2; B[0] = 0
            else:
                outs += 1
    return runs

def play_game(rng):
    return sum(half_inning(rng) for _ in range(9))

rng = random.Random(42)
trials = 100_000
runs = [play_game(rng) for _ in range(trials)]
print(f"mean per-team runs per nine innings = {sum(runs)/trials:.3f}")
print(f"two-team game = {2 * sum(runs) / trials:.3f}")
h = Counter(runs)
for r in range(0, 31):                             # cover the right tail out to about 30 runs
    print(f"  {r:3d}: {100 * h[r] / trials:5.2f}%")

The script prints a per-team mean near $13.7$ runs (two-team total near $27$), a unimodal distribution peaking around $11$–$13$ runs, and a right tail out to roughly $30$. These figures agree with the headline above to within Monte Carlo error; the modest gap to the official solution’s total of $\approx 30$ runs reflects the choice of baserunning conventions, not a different game.
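Because the half-inning is a finite Markov chain, the mean (though not the full distribution) can also be computed without sampling. The sketch below is one way to do that under the same baserunning rules as the simulator. It uses value iteration, which is a choice made here for brevity rather than anything prescribed by the puzzle, and the names `WEIGHTS`, `step`, and `expected_runs_per_half_inning` are illustrative.

```python
from itertools import product

# Per-roll weights out of 36 under the puzzle's table.
WEIGHTS = {'double': 1, 'single': 6, 'error': 2, 'walk': 2, 'strike': 7,
           'foulout': 2, 'out1': 7, 'flyout': 5, 'dp': 1, 'triple': 2, 'hr': 1}

def step(state, ev):
    """Apply one roll; return (next_state, runs_scored)."""
    outs, strikes, bases = state
    b1, b2, b3 = bases
    runs = 0
    if ev == 'strike':
        if strikes < 2:
            return (outs, strikes + 1, bases), 0
        outs += 1                                   # third strike
    elif ev in ('single', 'error'):                 # error modelled as a single
        runs, bases = b2 + b3, (1, b1, 0)
    elif ev == 'double':
        runs, bases = b2 + b3, (0, 1, b1)
    elif ev == 'triple':
        runs, bases = b1 + b2 + b3, (0, 0, 1)
    elif ev == 'hr':
        runs, bases = b1 + b2 + b3 + 1, (0, 0, 0)
    elif ev == 'walk':                              # only forced runners move
        runs, bases = b1 & b2 & b3, (1, b1 | b2, b3 | (b1 & b2))
    elif ev in ('foulout', 'out1'):
        outs += 1
    elif ev == 'flyout':
        outs += 1
        if outs < 3 and b3:                         # sacrifice fly
            runs, bases = 1, (b1, b2, 0)
    elif ev == 'dp':
        if b1:
            outs, bases = outs + 2, (0, b2, b3)     # runner from first is erased
        else:
            outs += 1
    return (min(outs, 3), 0, bases), runs

def expected_runs_per_half_inning(tol=1e-12):
    """Iterate E[state] = sum_e p_e * (runs_e + E[next]) until it stops changing."""
    states = [(o, s, b) for o in range(3) for s in range(3)
              for b in product((0, 1), repeat=3)]
    value = {st: 0.0 for st in states}
    while True:
        delta = 0.0
        for st in states:
            new = 0.0
            for ev, w in WEIGHTS.items():
                nxt, r = step(st, ev)
                future = value[nxt] if nxt[0] < 3 else 0.0   # three outs: inning over
                new += w / 36 * (r + future)
            delta = max(delta, abs(new - value[st]))
            value[st] = new
        if delta < tol:
            return value[(0, 0, (0, 0, 0))]

per_half = expected_runs_per_half_inning()
print(f"exact mean per half-inning = {per_half:.4f}")
print(f"exact mean per nine innings = {9 * per_half:.3f}")
```

By linearity of expectation the nine-inning mean is nine times the half-inning mean, so the printed value should land close to the simulated per-team mean; a noticeable gap would point to a mismatch between the two sets of baserunning rules.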

Riddler Classic

The Classic asks you to invent your own dice-to-event table that matches modern Major League run distributions more closely than the 1880s table, then to add fidelity to other statistics (strikeouts per game, batting average, and so on).

The Riddler, FiveThirtyEight, March 22, 2019 (original post)

Status

The Classic is a participatory design contest with no canonical right answer: each submitter proposes a custom $21$-row mapping and the column tabulates several. The winning entry (Tyler Burch’s “Burchball”) gives a distribution close to real-MLB run scoring, with $(1,1) \to$ triple, $(2,2) \to$ base on error, $(4,4) \to$ home run, $(6,6) \to$ strikeout, and so on. Because the Classic is a submission contest rather than a derivable problem, no worked solution is given for it here.

If a successor edition introduces a fixed target distribution (for example, an exact match to the 2018 National League runs-per-game histogram), the same Markov-chain simulator from the Express, evaluated under a candidate $21$-row table, becomes the objective for a small mixed-integer search. The puzzle as posed does not pin the target, so the problem is open by design.
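If such a target were fixed, the search would need a score for how far a candidate table's run distribution sits from it. The sketch below uses total variation distance between two empirical histograms; that metric, the function name `total_variation`, and the toy samples are all assumptions made for illustration, since the puzzle specifies none of them.

```python
from collections import Counter

def total_variation(sample_a, sample_b):
    """Total variation distance between the empirical distributions of two samples."""
    hist_a, hist_b = Counter(sample_a), Counter(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    support = set(hist_a) | set(hist_b)
    return 0.5 * sum(abs(hist_a[k] / n_a - hist_b[k] / n_b) for k in support)

# Toy run totals, purely illustrative (not real or simulated data).
simulated = [3, 4, 4, 5, 7]
target = [3, 4, 5, 5, 6]
print(total_variation(simulated, target))
```

The distance is $0$ when the two histograms coincide and $1$ when they share no run totals, so a search over candidate tables would try to minimise it.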