Vamshi Jandhyala

Chapter 106

Can You Solve The Puzzle Of The Baseball Division Champs?

In a sport where each team plays 162 games a season, take a division of five teams of exactly equal ability: each has a 50% chance of winning any given game. What is the expected number of wins for the team that finishes first?

The Riddler, FiveThirtyEight (Nick Keenan) (original post)

Solution

A single team of average ability wins half its games, 81 on average. The division champion is not an average team, though: it is the best of five, and being the best of a group pulls the number up. The expected first-place total is about $\boxed{88.4}$.
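For a back-of-the-envelope check of how far "best of five" pulls the total up, a normal approximation treats each season as roughly $N(81,\sigma^2)$ with $\sigma=\sqrt{162}/2\approx 6.36$, so the champion's edge is about $\sigma$ times the expected maximum of five standard normals. This is a swapped-in approximation, not the column's method; the helper name `expected_max_std_normal` and its midpoint-rule integration grid are illustrative assumptions.

```python
import math

def expected_max_std_normal(k, lo=-12.0, hi=12.0, steps=24_000):
    """E[max of k iid N(0,1)] = integral of x * k * phi(x) * Phi(x)**(k-1), by the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        phi = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)   # standard normal density
        Phi = 0.5 * (1 + math.erf(x / math.sqrt(2)))            # standard normal CDF
        total += x * k * phi * Phi ** (k - 1)                   # density of the max is k*phi*Phi^(k-1)
    return total * h

n, teams = 162, 5
mean, sd = n / 2, math.sqrt(n) / 2        # Binomial(162, 1/2): mean 81, sd about 6.36
approx = mean + sd * expected_max_std_normal(teams)
print(round(approx, 2))
```

The estimate should land close to the exact 88.39 quoted in the solution, which makes the bell a reasonable sanity check even though the answer itself rests on the exact tail sum.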

The honest caveat first: a real schedule couples the teams, since one club’s win is another’s loss, so the five totals are not quite independent. But each team plays only 76 of its 162 games inside the division and 86 outside it, and those out-of-division games swamp the coupling. The clean model that the column settles on treats every game as its own coin flip, making each team’s season an independent $\mathrm{Binomial}(162,\tfrac12)$ and the champion their maximum.
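To see how much the coupling matters, here is a Monte Carlo sketch of one possible coupled schedule. It assumes, beyond what the column states, that the 76 division games split evenly as 19 against each of the four rivals, and that every game is still a fair coin flip; the function name `mean_champion_wins` and its parameters are hypothetical. Setting `per_opponent=0` recovers the independent model.

```python
import numpy as np

def mean_champion_wins(seasons, teams=5, games=162, per_opponent=19, seed=0):
    """Monte Carlo E[champion wins] when division rivals play each other.

    Assumption: each pair of division teams meets `per_opponent` times; every
    other game is against an outside opponent. All games are fair coin flips.
    """
    rng = np.random.default_rng(seed)
    outside = games - per_opponent * (teams - 1)      # 162 - 4*19 = 86 outside games
    wins = rng.binomial(outside, 0.5, size=(seasons, teams)).astype(np.int64)
    for i in range(teams):
        for j in range(i + 1, teams):
            # Head-to-head series: every game team i wins, team j loses (the coupling).
            x = rng.binomial(per_opponent, 0.5, size=seasons)
            wins[:, i] += x
            wins[:, j] += per_opponent - x
    return wins.max(axis=1).mean()

print(round(mean_champion_wins(1_000_000), 2))                  # coupled schedule
print(round(mean_champion_wins(1_000_000, per_opponent=0), 2))  # independent model
```

Under this assumed schedule the head-to-head games should push the champion's average slightly above the independent figure: one rival's loss is another's win, which spreads the five totals apart. That direction agrees with the solution's remark that a real schedule nudges the answer up to about 88.8.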

For the maximum of five independent counts, lean on the tail rather than the bell, that is, on an exact tail sum rather than a normal approximation. The champion wins at least $w$ games unless all five fall short of $w$, so with $F$ the single-team cumulative distribution function,

$$\mathbb{E}[\text{champ}]=\sum_{w\ge 1}\Pr(\text{champ}\ge w)=\sum_{w\ge 1}\bigl(1-F(w-1)^5\bigr)=\sum_{w\ge 0}\bigl(1-F(w)^5\bigr),$$

where the terms vanish once $w\ge 162$, since $F(w)=1$ there. Summing this against the exact binomial $F$ gives 88.39, so the first-place team averages a little over 88 wins, some seven games above the 81 an average team manages. (A real big-league schedule nudges this only slightly, to about 88.8.)
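The tail-sum identity can be checked exactly. This sketch uses Python's `fractions.Fraction` to compute $\mathbb{E}[\text{champ}]$ both from the tail sum and directly as $\sum_w w\,\Pr(\text{champ}=w)$ with $\Pr(\text{champ}=w)=F(w)^5-F(w-1)^5$; in exact rational arithmetic the two agree exactly, not merely to floating-point precision. The function name `champion_expectation` is hypothetical.

```python
from fractions import Fraction
from math import comb

def champion_expectation(n=162, teams=5):
    """Exact E[max] of `teams` iid Binomial(n, 1/2) totals, computed two ways."""
    total = 2 ** n
    F, running = [], 0
    for w in range(n + 1):
        running += comb(n, w)                 # outcomes with at most w wins
        F.append(Fraction(running, total))    # F(w) = Pr(one team wins <= w)
    # Tail sum: E[max] = sum over w >= 0 of (1 - F(w)^teams)
    tail = sum(1 - F[w] ** teams for w in range(n + 1))
    # Direct sum: E[max] = sum of w * Pr(max = w), Pr(max = w) = F(w)^teams - F(w-1)^teams
    direct = sum(
        w * (F[w] ** teams - (F[w - 1] ** teams if w > 0 else 0))
        for w in range(n + 1)
    )
    return tail, direct

tail, direct = champion_expectation()
print(tail == direct, float(tail))
```

As a hand-checkable case, `champion_expectation(1, 2)` returns $3/4$ both ways: the expected maximum of two single coin flips.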

The computation

Compute the single-team distribution exactly, raise its cumulative distribution function to the fifth power for the champion, and sum the tail. A Monte Carlo season cross-checks the closed form.

```python
from math import comb
import numpy as np

n, teams = 162, 5
# cdf[w] = Pr(a single team wins at most w games), exact binomial counts over 2**n
cdf = [sum(comb(n, k) for k in range(w + 1)) / 2**n for w in range(n + 1)]
exact = sum(1 - cdf[w] ** teams for w in range(n + 1))   # E[max] via the tail sum
print(round(exact, 4))                                   # 88.3943

# Monte Carlo: 2 million simulated seasons of five independent teams
rng = np.random.default_rng(0)
sim = rng.binomial(n, 0.5, size=(2_000_000, teams)).max(axis=1).mean()
print(round(sim, 3))                                     # 88.391
```
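The same tail sum handles any division size, which makes the "best of a group pulls the number up" effect concrete. A minimal sketch that accumulates the exact binomial cumulative distribution in one pass; `expected_champion_wins` is a hypothetical name, and the one-team case should recover the plain mean of 81.

```python
from math import comb

def expected_champion_wins(n=162, teams=5):
    """Tail-sum E[max] of `teams` independent Binomial(n, 1/2) season totals."""
    running, tail = 0, 0.0
    for w in range(n + 1):
        running += comb(n, w)          # number of outcomes with at most w wins
        F = running / 2 ** n           # F(w) = Pr(one team wins <= w)
        tail += 1 - F ** teams         # Pr(champion wins more than w)
    return tail

for k in range(1, 6):
    print(k, round(expected_champion_wins(teams=k), 2))
```

Each extra rival adds less than the one before: the gain comes from reaching further into the right tail of the same distribution, and that tail thins out.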