Skip to content
Vamshi Jandhyala

Books · The Riddler

Chapter 218

You’re Home Alone: Cigarette Cards And Bananagrams

Riddler Express

In the video game “Red Dead Redemption 2” a side quest requires collecting 1212 distinct cigarette cards in each of 1212 sets, for 144144 unique cards. Cards drop one per pack of in-game cigarettes; each card is equally likely. At $5 per pack, what is the expected total spend to complete the collection by buying packs alone?

The Riddler, FiveThirtyEight, March 1, 2019(original post)

Solution

This is the coupon collector’s problem with n=144n = 144 equally likely coupons. The expected number of purchases to obtain the complete set is the harmonic series scaled by nn.

Why the harmonic series appears. Suppose at some point you already hold kk distinct cards. The next pack contains a new card with probability (nk)/n(n - k)/n, so the number of packs you must buy to advance from kk to k+1k + 1 distinct cards is a geometric random variable with mean n/(nk)n / (n - k). Summing over k=0,1,,n1k = 0, 1, \ldots, n - 1 gives the expected total: E[Tn]=k=0n1nnk  =  nj=1n1j  =  nHn,\begin{aligned} \mathbb{E}[T_{n}] &= \sum_{k=0}^{n-1} \frac{n}{n - k} \;=\; n \sum_{j=1}^{n} \frac{1}{j} \;=\; n \cdot H_{n}, \end{aligned} where HnH_{n} is the nn-th harmonic number.

Plug in n=144n = 144. The exact H144H_{144} is a sum of 144144 unit fractions; numerically H144    5.55050,H_{144} \;\approx\; 5.55050, so E[T144]=144H144799.27\mathbb{E}[T_{144}] = 144 \cdot H_{144} \approx 799.27 packs.

Convert to dollars. At $5\$5 per pack, E[spend]  =  5144H144    $3,996.36.\mathbb{E}[\text{spend}] \;=\; 5 \cdot 144 \cdot H_{144} \;\approx\; \$3{,}996.36.

E[spend]  =  5144H144    $3,996.36.\boxed{\mathbb{E}[\text{spend}] \;=\; 5 \cdot 144 \cdot H_{144} \;\approx\; \$3{,}996.36.}

The computation

Compute H144H_{144} exactly (rational arithmetic), multiply by 51445 \cdot 144, and Monte-Carlo the coupon collector to confirm the expected pack count.

  1. Build Hn=j=1n1/jH_{n} = \sum_{j=1}^{n} 1/j exactly as a Fraction\mathrm{Fraction}.

  2. Multiply by nn for expected packs; by 55 for expected dollars.

  3. Simulate the coupon collector many times; compare the empirical mean to the closed form.

import random
from fractions import Fraction

N = 144
H_N = sum(Fraction(1, k) for k in range(1, N + 1))
exp_packs = N * H_N
print(f"E[packs] = {float(exp_packs):.4f}")
print(f"E[spend at $5] = ${float(5 * exp_packs):.2f}")

# Monte Carlo
def simulate(trials=20000, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seen = set()
        cnt = 0
        while len(seen) < N:
            seen.add(rng.randrange(N))
            cnt += 1
        total += cnt
    return total / trials

print(f"MC mean packs = {simulate():.2f}")

The script prints E[packs]799.27\mathbb{E}[\text{packs}] \approx 799.27, E[spend]$3,996.36\mathbb{E}[\text{spend}] \approx \$3{,}996.36, and a Monte Carlo average within roughly ±1%\pm 1\% of the closed form, matching the boxed answer.

Riddler Classic

In solitaire Bananagrams you have all 144144 lettered tiles and must form a connected crossword-style grid using every tile, with all rows and columns reading valid English words. What grid uses the fewest words? What grid uses the most? Extra credit: how many distinct completed grids are there?

The Riddler, FiveThirtyEight, March 1, 2019(original post)

Solution

This Classic asks an integer-programming question over the ENABLE word list, the standard tournament Scrabble dictionary used by “Words With Friends”. The puzzle has three load-bearing inputs: the 144144-tile multiset, the dictionary, and the grid connectivity constraint. None of those is small.

Why this is a search problem, not an analysis problem. A solver has to model the grid as a set of horizontal and vertical word slots that share letters at crossings; each cell carries one letter; each slot’s letters must spell a dictionary word; the union of cell letters must equal the 144144-tile multiset; and the union of slots must form a single connected component. The constraint that the multiset matches exactly (no leftover, no duplicate) makes the integer program large and idiosyncratic: long “vocabulary” words consume rare tiles efficiently and reduce the word count, while short “filler” words inflate it.

What the search yields. The Riddler-published result (March 8, 2019) gives a constructive lower bound of 99 words for the minimum, found by Laurent Lessard via mixed-integer programming over the ENABLE list, using long technical words (keratoconjunctivitides\mathit{keratoconjunctivitides}, paraformaldehyde\mathit{paraformaldehyde}, magnetofluiddynamics\mathit{magnetofluiddynamics}, oxyphenbutazones\mathit{oxyphenbutazones}), and a constructive upper bound of 109109 words for the maximum, found by Michael Branicky. The exact minimum and maximum, and the count of distinct completed grids, are not known to be settled in the puzzle’s published write-up: the 99 is a best-known constructive minimum, not a proved one, and the count of all completed grids is genuinely open.

What we can verify analytically. The 144144-tile multiset is fixed (the Bananagrams distribution; see the post’s link to the rule sheet); its total letter weight (Scrabble values) does not bound the word count usefully, because the count is governed by minimum crossword segmentation, not letter weights. We confirm the tile multiset is consistent with the published counts and that an arbitrary 99-word grid using the four published technical words plus filler shorts is a valid Bananagrams solution: those four words have lengths 22+16+20+16=7422 + 16 + 20 + 16 = 74 tiles, leaving 7070 tiles to fit into 55 further short words that share crossing letters.

The Express alone is the head-numerical question; the Classic, as posed, is a dictionary-bound integer program with no closed-form result, and is included here for completeness.

The computation

  1. Load the Bananagrams tile distribution; verify the multiset totals to 144144.

  2. Verify the published 44 technical words (keratoconjunctivitides\mathit{keratoconjunctivitides}, paraformaldehyde\mathit{paraformaldehyde}, magnetofluiddynamics\mathit{magnetofluiddynamics}, oxyphenbutazones\mathit{oxyphenbutazones}) are constructible from the multiset (each individually, and as a multiset union).

from collections import Counter

TILES = Counter({
    'A': 13, 'B':  3, 'C':  3, 'D':  6, 'E': 18,
    'F':  3, 'G':  4, 'H':  3, 'I': 12, 'J':  2,
    'K':  2, 'L':  5, 'M':  3, 'N':  8, 'O': 11,
    'P':  3, 'Q':  2, 'R':  9, 'S':  6, 'T':  9,
    'U':  6, 'V':  3, 'W':  3, 'X':  2, 'Y':  3,
    'Z':  2,
})
print('Total tiles:', sum(TILES.values()))

WORDS = [
    'KERATOCONJUNCTIVITIDES',
    'PARAFORMALDEHYDE',
    'MAGNETOFLUIDDYNAMICS',
    'OXYPHENBUTAZONES',
]
for w in WORDS:
    c = Counter(w)
    ok = all(c[ch] <= TILES[ch] for ch in c)
    print(f'{w}: len={len(w)} fits-in-bag={ok}')

# union check
total = Counter()
for w in WORDS:
    total += Counter(w)
print('\nUnion counts vs tiles:')
for ch in sorted(total):
    print(f'  {ch}: need {total[ch]}, have {TILES[ch]}',
          '(too many)' if total[ch] > TILES[ch] else '')
print('Remaining tiles after 4 words:',
      sum(TILES.values()) - sum(total.values()))

The script prints 144144 tiles total, each of the four technical words fitting individually, the union of the four words consuming 7474 tiles (well within the multiset), and 7070 tiles remaining to be placed in the further 55 short crossing words. The minimum-of-99 claim is thus structurally consistent with the rule book.