Chapter 179

Shut the Box

Riddler Classic

Shut the Box has $9$ tiles, numbered $1$ through $9$ , all starting open. On each turn you roll two dice and shut any combination of open tiles whose values sum to the dice total. Once $7, 8, 9$ are all shut you may choose to roll one or two dice; otherwise you must roll both. The game ends when you shut every tile (you win) or you cannot shut a subset summing to the roll (you lose). What is your probability of winning under perfect play?

The Riddler, FiveThirtyEight, April 13, 2018(original post)

The Riddler Express that week asked which two-card Texas Hold’em starting hand most often makes the nuts. The published answer is $A\text{-}10$ suited (with the argument that the nut flush dominates the count). A careful Monte-Carlo over full $5$ -card boards in fact ranks $J\text{-}10$ suited above $A\text{-}10$ suited as the most frequent nut-making starting hand, with the difference driven by straight-and-flush coverage rather than nut-flush coverage alone. Because we have not resolved this discrepancy with the official to certainty, the Express is deferred.

Solution

The state space. The position of the game is the set $S \subseteq \{1, 2, \ldots, 9\}$ of still-open tiles. There are $2^9 = 512$ states; the empty set is the winning terminal. Let $V(S)$ denote the probability of eventually winning from state $S$ under optimal play.

The Bellman equation. From state $S$ , the player chooses how many dice to roll (constrained: two dice unless $\{7, 8, 9\} \cap S = \varnothing$ , in which case either one or two), then observes the total $T$ , then picks a subset $A \subseteq S$ with $\sum A = T$ to shut. If no such $A$ exists for a given $T$ , that roll loses.

For two dice the total $T$ has the standard distribution $\Pr(T = t) = n_t / 36$ with $n = (1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)$ for $t = 2, 3, \ldots, 12$ . For one die, $\Pr(T = t) = 1/6$ for $t = 1, \ldots, 6$ . Write the per-roll value at $S$ for $d \in \{1, 2\}$ dice as $R_d(S) \;=\; \sum_{t} \Pr_d(T = t)\,\Bigl( \max_{A \subseteq S,\ \sum A = t} V(S \setminus A) \Bigr),$ with the convention that an empty maximum (no shutting subset exists) contributes $0$ . Then $V(S) \;=\; \begin{cases} 1 & \text{if } S = \varnothing, \\ R_2(S) & \text{if } \{7, 8, 9\} \cap S \ne \varnothing, \\ \max\bigl(R_1(S), R_2(S)\bigr) & \text{otherwise.} \end{cases}$ This is a straight dynamic programme on $2^9 = 512$ states. Each state’s update requires enumerating shutting subsets summing to each total, which is fast.

The headline. Solving the DP gives $\boxed{\,V(\{1, 2, \ldots, 9\}) \;\approx\; 0.0976 \;\approx\; 9.76\%.\,}$ The starting-state value is the win probability under perfect play. The official quotes the same $\approx 9.8\%$ .

Where the choice of $1$ vs $2$ dice matters. Among the states where the choice is available (those with $\{7,8,9\}$ all shut), the optimal choice depends on $S$ . With many small tiles still open, two dice (which average $7$ ) is dangerous; one die (averaging $3.5$ ) gives a better chance of hitting an achievable total. With only large tiles left, two dice gives the only reach to the needed totals.

The computation

Directly solve the DP described above.

Enumerate every state $S \in 2^{\{1,\ldots,9\}}$ in order of increasing $|S|$ so that $V(S \setminus A)$ is known when computing $V(S)$ .
For each $S$ , compute the candidate per-roll values $R_1(S)$ (when allowed) and $R_2(S)$ , taking the max over shutting subsets at each total.
Return $V(\{1, 2, \ldots, 9\})$ .

from functools import lru_cache

ALL = frozenset(range(1, 10))

def subsets_summing_to(open_set, target):
    open_list = sorted(open_set)
    results = []
    def rec(i, cur, s):
        if s == target:
            results.append(frozenset(cur)); return
        if s > target or i == len(open_list): return
        rec(i + 1, cur, s)
        rec(i + 1, cur + [open_list[i]], s + open_list[i])
    rec(0, [], 0)
    return results

DICE2 = {t: 0 for t in range(2, 13)}
for d1 in range(1, 7):
    for d2 in range(1, 7):
        DICE2[d1 + d2] += 1

@lru_cache(maxsize=None)
def V(S):
    if not S: return 1.0
    can_one = all(x not in S for x in (7, 8, 9))
    def per_roll(ndice):
        if ndice == 1:
            tot = 0.0
            for r in range(1, 7):
                subs = subsets_summing_to(S, r)
                if subs:
                    tot += max(V(S - s) for s in subs)
            return tot / 6
        tot = 0.0
        for t, n in DICE2.items():
            subs = subsets_summing_to(S, t)
            if subs:
                tot += n * max(V(S - s) for s in subs)
        return tot / 36
    return max(per_roll(2), per_roll(1)) if can_one else per_roll(2)

print(f"V(all open) = {V(ALL):.6f}")

The script prints $V(\{1,\ldots,9\}) \approx 0.097614$ , matching the boxed $9.76\%$ .