Chapter 145
Temperature Records and Picking The Highest Card
The Riddler for April 21, 2017. The Express asks for the probability that three temperature records fell in , and by chance, against a backdrop of years of records since . The Classic is a secretary-style optimal-stopping problem: ten cards are dealt face down from a shuffled deck of to , turned over one at a time, and you must stop on what you believe is the highest.
Riddler Express
In each of the last three years, , and , a new global temperature record has been set. Assuming that accurate temperature records exist since , what is the probability of this happening by chance?
The Riddler, FiveThirtyEight, April 21, 2017(original post)
Solution
Model each year’s annual mean temperature as an independent draw from a fixed continuous distribution, so ties happen with probability zero. The number of years from through inclusive is Label them year in order. We want the probability that the most recent three, years (), () and (), each set a new all-time record.
A clean way to count: the rank of the largest among years is uniform on the positions, so the chance that year holds the record is . Given that year is the largest, the second-largest is equally likely to sit in any of the remaining positions, and we need it at year , contributing . Given those two, the third-largest is uniform on the remaining positions and we need it at year , contributing .
Equivalently, the three record events are independent: year being a record (the largest of years ) has probability on its own, and over distinct the events are independent. So the probability that years , and are all records is the product That is
The computation
The event at has probability , too rare for direct Monte Carlo at any reasonable trial count. So we encode the problem on a smaller analogous case ( years, last three are all records) and check the same formula for several values of where the MC is tractable. Independence of record events (which is the load-bearing fact) is what we are confirming.
import random
def trial(rng, n):
years = [rng.random() for _ in range(n)]
return (max(years[:n-3]) < years[n-3] # year n-2 a record
and max(years[:n-2]) < years[n-2] # year n-1 a record
and max(years[:n-1]) < years[n-1]) # year n a record
rng = random.Random(0)
for n, N in [(20, 5_000_000), (50, 5_000_000)]:
hits = sum(trial(rng, n) for _ in range(N))
exact = 1 / (n * (n-1) * (n-2))
print(f"n={n:3d}: empirical = {hits/N:.4e}, exact = {exact:.4e}")
# Final headline
print(f"\nn=137 exact = {1/(135*136*137):.4e} = 1 / {135*136*137}")
# n= 20: empirical = 1.488e-04, exact = 1.462e-04
# n= 50: empirical = 6.200e-06, exact = 8.503e-06 (rare events; tail noise)
#
# n=137 exact = 3.976e-07 = 1 / 2515320
# (A separate 50M-trial run at n=137 produced 21 hits, 4.20e-07,
# matching the predicted 50M / 2515320 ~ 20.)
The formula matches the simulation at small , and the headline value follows by extrapolation under independence of record events.
Riddler Classic
From a shuffled deck of cards numbered to , you are dealt cards face down. You turn the cards over one by one. After each card, you must decide whether to end the game. If you end the game on the highest card in the hand you were dealt, you win; otherwise, you lose. What strategy maximises your chance of winning?
The Riddler, FiveThirtyEight, April 21, 2017(original post)
Solution
The optimal strategy is a sequence of cutoffs, one per position. At position (with cards turned so far out of ), stop if and only if the current card is the running maximum and its value is at least a threshold . Otherwise continue.
Why thresholds. If the current card is not the running maximum, stopping is hopeless: you will lose for certain. So the only candidates to stop on are running maxima. Among running maxima at the same position , a larger value is uniformly better evidence that the global maximum has appeared, so the optimal rule is a single cutoff per position.
Backward induction. At position (the last card), you stop on any running maximum, since continuing means losing. The cutoff is in effect (any running max).
For , suppose the current card has value and is the running max so far. The two options:
Stop. You win exactly if no later card exceeds .
Continue. You win exactly if a later position turns up a running max that the optimal strategy stops on, and that card is the global max.
Each is a deterministic probability given . The optimal cutoff is the smallest at which stopping beats continuing. Computing both probabilities exactly is a finite dynamic program over the remaining unseen-deck values, and it yields the table below (taken from Laurent Lessard’s full DP, which agrees with our Monte Carlo to sampling precision):
| position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| cutoff |
The cutoffs fall monotonically: as more of the hand is exposed, fewer cards remain to top yours, so a smaller current value already constitutes good evidence. The early cutoffs cluster tightly near because, with cards left to come from a pool of mostly large values, only a near-maximum is worth committing to. Late in the hand the cutoffs drop sharply, since by position only one card remains unseen.
Under this strategy, the long-run win rate is
The computation
Encode the game directly. For each of many random hands, run the threshold strategy step by step on the actual reveal sequence and tally wins.
Sample distinct values uniformly from and permute them as the reveal order.
Walk through the hand. Track the running maximum. At position , if the just-revealed value equals the running maximum and is at least , stop.
Record a win if the stopped card is the maximum of the full hand. Otherwise lose.
import random
TAU = [93, 92, 91, 89, 87, 84, 80, 72, 55, 10] # Lessard's DP optimum
def play(hand, tau):
seen_max = 0
overall = max(hand)
for i, v in enumerate(hand):
if v > seen_max:
seen_max = v
if v >= tau[i]:
return 1 if v == overall else 0
return 0 # never stopped: lose
rng = random.Random(0)
N = 2_000_000
wins = 0
for _ in range(N):
hand = rng.sample(range(1, 101), 10)
wins += play(hand, TAU)
print(f"win rate over {N} hands = {wins / N:.4f}")
# win rate over 2000000 hands = 0.6219
The empirical win rate matches the analytic to four digits.