Skip to content
Vamshi Jandhyala

Books · The Riddler

Chapter 126

Growing Words and a Point-Rich Boggle Board

The Riddler for October 21, 2016, two word puzzles. Both are computational, and both depend on a dictionary; we use the public-domain ENABLE word list the column itself suggested.

Riddler Express

What is the longest word you can build one letter at a time? Start from a valid two-letter word and, at each step, add a single letter to the front or the back so that the result is again a valid word: a three-letter word, then a four-letter word, and so on. (For example, HE \to THE \to THEM \to THEME \to THEMES.)

The Riddler, FiveThirtyEight, October 21, 2016(original post)

Solution

This is a reachability question on the dictionary, so the clean way to think about it is backwards. Call a word reachable if it can be grown by the rules from some two-letter start. A word of length three or more is reachable exactly when it is valid and at least one of its two “parents”, the word with its first letter removed or the word with its last letter removed, is itself valid and reachable. Every valid two-letter word is reachable by definition. Marking words reachable in order of length, length 2 first, gives every reachable word and its growth path in one sweep, and the answer is the longest reachable word.

Run against the ENABLE list, the longest reachable length is 9 letters,\boxed{9 \text{ letters}}, and there are exactly three nine-letter endpoints, each with a clean growth path:

la, lap, laps, lapse, elapse, relapse, relapser, relapsers
in, pin, ping, aping, raping, craping, scraping, scrapings
at, eat, eath, heath, sheath, sheathe, sheather, sheathers

The result is dictionary-dependent: the North American Scrabble list adds a few more nine-letter chains (through lassi, which ENABLE omits), and the international Collins list stretches to a ten-letter chain, hi, hin, hing, ching, eching, eeching, reeching, breeching, breechings.

The computation

Encode the backward rule directly. Load the word list, then for each length from 3 upward mark a word reachable when one of its two parents is, recording that parent so the chain can be read back.

# enable1.txt: the public-domain ENABLE word list (the list the puzzle suggests)
words = set(w.strip() for w in open('enable1.txt'))
from collections import defaultdict
bylen = defaultdict(list)
for w in words: bylen[len(w)].append(w)

reach, parent, longest_len = {}, {}, 2
for w in bylen[2]: reach[w] = True; parent[w] = None     # two-letter seeds
for L in range(3, max(bylen) + 1):
    grew = False
    for w in bylen[L]:
        if   w[1:]  in reach: reach[w] = True; parent[w] = w[1:];  grew = True
        elif w[:-1] in reach: reach[w] = True; parent[w] = w[:-1]; grew = True
    if grew: longest_len = L

def chain(w):
    out = []
    while w is not None: out.append(w); w = parent[w]
    return ", ".join(out[::-1])

print("longest chain length:", longest_len)
for w in sorted(w for w in bylen[longest_len] if w in reach):
    print(" ", chain(w))
# longest chain length: 9
#   la, lap, laps, lapse, elapse, relapse, relapser, relapsers
#   in, pin, ping, aping, raping, craping, scraping, scrapings
#   at, eat, eath, heath, sheath, sheathe, sheather, sheathers

Riddler Classic

What arrangement of letters on a 4×44\times 4 Boggle board has the most points attainable? Words are read along paths of orthogonally or diagonally adjacent cells, never reusing a cell, and must be at least three letters long. Words of length 3,4,5,6,7,8+3,4,5,6,7,8^+ score 1,1,2,3,5,111,1,2,3,5,11 points, and each distinct word counts once.

The Riddler, FiveThirtyEight, October 21, 2016(original post)

Solution

Unlike the Express, this has no tidy answer and no proof of optimality: the space of 261626^{16} boards is far too large to search, and the best results come from heuristics, hill-climbing, Metropolis sampling, letter-frequency seeding, run for hours. So the honest statement is that this is an open optimisation, and the most we can give is the best board anyone has found. The record from the column, a board independently produced by Chuong Do and matched by solver Daniel Ferguson, scores about 4,600 points (best known, not proven optimal)\boxed{4{,}600 \text{ points (best known, not proven optimal)}} on the reference Boggle solver, on a board such as SLPSEIAERNTRSEGS\begin{array}{cccc} \textsf{S}&\textsf{L}&\textsf{P}&\textsf{S}\\ \textsf{E}&\textsf{I}&\textsf{A}&\textsf{E}\\ \textsf{R}&\textsf{N}&\textsf{T}&\textsf{R}\\ \textsf{S}&\textsf{E}&\textsf{G}&\textsf{S} \end{array} The recipe behind such boards is plain: load the grid with common, flexible letters (S, E, R, T, N, A, I) and avoid dead weight, so that a few short, high-frequency stems sprout into hundreds of scoring words.

The number itself is unavoidably dictionary-dependent. What a computation can pin down rigorously is the score of a given board under a given word list, and the gap between word lists is large: the board above scores 4,6004{,}600 on the column’s reference solver but 3,6013{,}601 on the public-domain ENABLE list, a reminder that “most points” is only well defined once the dictionary is fixed.

The computation

Encode the scoring exactly: walk every self-avoiding path on the grid, pruning with a prefix trie of the dictionary, collect the distinct words found, and add up their points. Here it scores the record board against ENABLE.

words = set(w.strip().upper() for w in open('enable1.txt') if w.strip().isalpha())
trie = {}
for w in words:
    if len(w) < 3: continue
    d = trie
    for ch in w: d = d.setdefault(ch, {})
    d['$'] = True

def points(L): return {3:1, 4:1, 5:2, 6:3, 7:5}.get(L, 11 if L >= 8 else 0)

def score(board):
    g = [list(r) for r in board]; found = set()
    def dfs(r, c, node, path, seen):
        ch = g[r][c]
        if ch not in node: return
        node = node[ch]; path += ch
        if node.get('$') and len(path) >= 3: found.add(path)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < 4 and 0 <= nc < 4 and (nr, nc) not in seen:
                    dfs(nr, nc, node, path, seen | {(nr, nc)})
    for r in range(4):
        for c in range(4):
            dfs(r, c, trie, "", {(r, c)})
    return sum(points(len(w)) for w in found), len(found)

board = ["SLPS", "EIAE", "RNTR", "SEGS"]
s, n = score(board)
print(f"ENABLE score of the record board: {s} points across {n} words")
# ENABLE score of the record board: 3601 points across 1126 words