Chapter 218

You’re Home Alone: Cigarette Cards And Bananagrams

Riddler Express

In the video game “Red Dead Redemption 2” a side quest requires collecting $12$ distinct cigarette cards in each of $12$ sets, for $144$ unique cards. Cards drop one per pack of in-game cigarettes; each card is equally likely. At $5 per pack, what is the expected total spend to complete the collection by buying packs alone?

The Riddler, FiveThirtyEight, March 1, 2019(original post)

Solution

This is the coupon collector’s problem with $n = 144$ equally likely coupons. The expected number of purchases to obtain the complete set is the harmonic series scaled by $n$ .

Why the harmonic series appears. Suppose at some point you already hold $k$ distinct cards. The next pack contains a new card with probability $(n - k)/n$ , so the number of packs you must buy to advance from $k$ to $k + 1$ distinct cards is a geometric random variable with mean $n / (n - k)$ . Summing over $k = 0, 1, \ldots, n - 1$ gives the expected total: $\begin{aligned} \mathbb{E}[T_{n}] &= \sum_{k=0}^{n-1} \frac{n}{n - k} \;=\; n \sum_{j=1}^{n} \frac{1}{j} \;=\; n \cdot H_{n}, \end{aligned}$ where $H_{n}$ is the $n$ -th harmonic number.

Plug in $n = 144$ . The exact $H_{144}$ is a sum of $144$ unit fractions; numerically $H_{144} \;\approx\; 5.55050,$ so $\mathbb{E}[T_{144}] = 144 \cdot H_{144} \approx 799.27$ packs.

Convert to dollars. At $\$5$ per pack, $\mathbb{E}[\text{spend}] \;=\; 5 \cdot 144 \cdot H_{144} \;\approx\; \$3{,}996.36.$

$\boxed{\mathbb{E}[\text{spend}] \;=\; 5 \cdot 144 \cdot H_{144} \;\approx\; \$3{,}996.36.}$

The computation

Compute $H_{144}$ exactly (rational arithmetic), multiply by $5 \cdot 144$ , and Monte-Carlo the coupon collector to confirm the expected pack count.

Build $H_{n} = \sum_{j=1}^{n} 1/j$ exactly as a $\mathrm{Fraction}$ .
Multiply by $n$ for expected packs; by $5$ for expected dollars.
Simulate the coupon collector many times; compare the empirical mean to the closed form.

import random
from fractions import Fraction

N = 144
H_N = sum(Fraction(1, k) for k in range(1, N + 1))
exp_packs = N * H_N
print(f"E[packs] = {float(exp_packs):.4f}")
print(f"E[spend at $5] = ${float(5 * exp_packs):.2f}")

# Monte Carlo
def simulate(trials=20000, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seen = set()
        cnt = 0
        while len(seen) < N:
            seen.add(rng.randrange(N))
            cnt += 1
        total += cnt
    return total / trials

print(f"MC mean packs = {simulate():.2f}")

The script prints $\mathbb{E}[\text{packs}] \approx 799.27$ , $\mathbb{E}[\text{spend}] \approx \$3{,}996.36$ , and a Monte Carlo average within roughly $\pm 1\%$ of the closed form, matching the boxed answer.

Riddler Classic

In solitaire Bananagrams you have all $144$ lettered tiles and must form a connected crossword-style grid using every tile, with all rows and columns reading valid English words. What grid uses the fewest words? What grid uses the most? Extra credit: how many distinct completed grids are there?

The Riddler, FiveThirtyEight, March 1, 2019(original post)

Solution

This Classic asks an integer-programming question over the ENABLE word list, the standard tournament Scrabble dictionary used by “Words With Friends”. The puzzle has three load-bearing inputs: the $144$ -tile multiset, the dictionary, and the grid connectivity constraint. None of those is small.

Why this is a search problem, not an analysis problem. A solver has to model the grid as a set of horizontal and vertical word slots that share letters at crossings; each cell carries one letter; each slot’s letters must spell a dictionary word; the union of cell letters must equal the $144$ -tile multiset; and the union of slots must form a single connected component. The constraint that the multiset matches exactly (no leftover, no duplicate) makes the integer program large and idiosyncratic: long “vocabulary” words consume rare tiles efficiently and reduce the word count, while short “filler” words inflate it.

What the search yields. The Riddler-published result (March 8, 2019) gives a constructive lower bound of $9$ words for the minimum, found by Laurent Lessard via mixed-integer programming over the ENABLE list, using long technical words ( $\mathit{keratoconjunctivitides}$ , $\mathit{paraformaldehyde}$ , $\mathit{magnetofluiddynamics}$ , $\mathit{oxyphenbutazones}$ ), and a constructive upper bound of $109$ words for the maximum, found by Michael Branicky. The exact minimum and maximum, and the count of distinct completed grids, are not known to be settled in the puzzle’s published write-up: the $9$ is a best-known constructive minimum, not a proved one, and the count of all completed grids is genuinely open.

What we can verify analytically. The $144$ -tile multiset is fixed (the Bananagrams distribution; see the post’s link to the rule sheet); its total letter weight (Scrabble values) does not bound the word count usefully, because the count is governed by minimum crossword segmentation, not letter weights. We confirm the tile multiset is consistent with the published counts and that an arbitrary $9$ -word grid using the four published technical words plus filler shorts is a valid Bananagrams solution: those four words have lengths $22 + 16 + 20 + 16 = 74$ tiles, leaving $70$ tiles to fit into $5$ further short words that share crossing letters.

The Express alone is the head-numerical question; the Classic, as posed, is a dictionary-bound integer program with no closed-form result, and is included here for completeness.

The computation

Load the Bananagrams tile distribution; verify the multiset totals to $144$ .
Verify the published $4$ technical words ( $\mathit{keratoconjunctivitides}$ , $\mathit{paraformaldehyde}$ , $\mathit{magnetofluiddynamics}$ , $\mathit{oxyphenbutazones}$ ) are constructible from the multiset (each individually, and as a multiset union).

from collections import Counter

TILES = Counter({
    'A': 13, 'B':  3, 'C':  3, 'D':  6, 'E': 18,
    'F':  3, 'G':  4, 'H':  3, 'I': 12, 'J':  2,
    'K':  2, 'L':  5, 'M':  3, 'N':  8, 'O': 11,
    'P':  3, 'Q':  2, 'R':  9, 'S':  6, 'T':  9,
    'U':  6, 'V':  3, 'W':  3, 'X':  2, 'Y':  3,
    'Z':  2,
})
print('Total tiles:', sum(TILES.values()))

WORDS = [
    'KERATOCONJUNCTIVITIDES',
    'PARAFORMALDEHYDE',
    'MAGNETOFLUIDDYNAMICS',
    'OXYPHENBUTAZONES',
]
for w in WORDS:
    c = Counter(w)
    ok = all(c[ch] <= TILES[ch] for ch in c)
    print(f'{w}: len={len(w)} fits-in-bag={ok}')

# union check
total = Counter()
for w in WORDS:
    total += Counter(w)
print('\nUnion counts vs tiles:')
for ch in sorted(total):
    print(f'  {ch}: need {total[ch]}, have {TILES[ch]}',
          '(too many)' if total[ch] > TILES[ch] else '')
print('Remaining tiles after 4 words:',
      sum(TILES.values()) - sum(total.values()))

The script prints $144$ tiles total, each of the four technical words fitting individually, the union of the four words consuming $74$ tiles (well within the multiset), and $70$ tiles remaining to be placed in the further $5$ short crossing words. The minimum-of- $9$ claim is thus structurally consistent with the rule book.