Chapter 126
Growing Words and a Point-Rich Boggle Board
The Riddler for October 21, 2016, two word puzzles. Both are computational, and both depend on a dictionary; we use the public-domain ENABLE word list the column itself suggested.
Riddler Express
What is the longest word you can build one letter at a time? Start from a valid two-letter word and, at each step, add a single letter to the front or the back so that the result is again a valid word: a three-letter word, then a four-letter word, and so on. (For example, HE THE THEM THEME THEMES.)
The Riddler, FiveThirtyEight, October 21, 2016(original post)
Solution
This is a reachability question on the dictionary, so the clean way to think about it is backwards. Call a word reachable if it can be grown by the rules from some two-letter start. A word of length three or more is reachable exactly when it is valid and at least one of its two “parents”, the word with its first letter removed or the word with its last letter removed, is itself valid and reachable. Every valid two-letter word is reachable by definition. Marking words reachable in order of length, length 2 first, gives every reachable word and its growth path in one sweep, and the answer is the longest reachable word.
Run against the ENABLE list, the longest reachable length is and there are exactly three nine-letter endpoints, each with a clean growth path:
la, lap, laps, lapse, elapse, relapse, relapser, relapsers
in, pin, ping, aping, raping, craping, scraping, scrapings
at, eat, eath, heath, sheath, sheathe, sheather, sheathers
The result is dictionary-dependent: the North American Scrabble list adds a few more nine-letter chains (through lassi, which ENABLE omits), and the international Collins list stretches to a ten-letter chain, hi, hin, hing, ching, eching, eeching, reeching, breeching, breechings.
The computation
Encode the backward rule directly. Load the word list, then for each length from 3 upward mark a word reachable when one of its two parents is, recording that parent so the chain can be read back.
# enable1.txt: the public-domain ENABLE word list (the list the puzzle suggests)
words = set(w.strip() for w in open('enable1.txt'))
from collections import defaultdict
bylen = defaultdict(list)
for w in words: bylen[len(w)].append(w)
reach, parent, longest_len = {}, {}, 2
for w in bylen[2]: reach[w] = True; parent[w] = None # two-letter seeds
for L in range(3, max(bylen) + 1):
grew = False
for w in bylen[L]:
if w[1:] in reach: reach[w] = True; parent[w] = w[1:]; grew = True
elif w[:-1] in reach: reach[w] = True; parent[w] = w[:-1]; grew = True
if grew: longest_len = L
def chain(w):
out = []
while w is not None: out.append(w); w = parent[w]
return ", ".join(out[::-1])
print("longest chain length:", longest_len)
for w in sorted(w for w in bylen[longest_len] if w in reach):
print(" ", chain(w))
# longest chain length: 9
# la, lap, laps, lapse, elapse, relapse, relapser, relapsers
# in, pin, ping, aping, raping, craping, scraping, scrapings
# at, eat, eath, heath, sheath, sheathe, sheather, sheathers
Riddler Classic
What arrangement of letters on a Boggle board has the most points attainable? Words are read along paths of orthogonally or diagonally adjacent cells, never reusing a cell, and must be at least three letters long. Words of length score points, and each distinct word counts once.
The Riddler, FiveThirtyEight, October 21, 2016(original post)
Solution
Unlike the Express, this has no tidy answer and no proof of optimality: the space of boards is far too large to search, and the best results come from heuristics, hill-climbing, Metropolis sampling, letter-frequency seeding, run for hours. So the honest statement is that this is an open optimisation, and the most we can give is the best board anyone has found. The record from the column, a board independently produced by Chuong Do and matched by solver Daniel Ferguson, scores about on the reference Boggle solver, on a board such as The recipe behind such boards is plain: load the grid with common, flexible letters (S, E, R, T, N, A, I) and avoid dead weight, so that a few short, high-frequency stems sprout into hundreds of scoring words.
The number itself is unavoidably dictionary-dependent. What a computation can pin down rigorously is the score of a given board under a given word list, and the gap between word lists is large: the board above scores on the column’s reference solver but on the public-domain ENABLE list, a reminder that “most points” is only well defined once the dictionary is fixed.
The computation
Encode the scoring exactly: walk every self-avoiding path on the grid, pruning with a prefix trie of the dictionary, collect the distinct words found, and add up their points. Here it scores the record board against ENABLE.
words = set(w.strip().upper() for w in open('enable1.txt') if w.strip().isalpha())
trie = {}
for w in words:
if len(w) < 3: continue
d = trie
for ch in w: d = d.setdefault(ch, {})
d['$'] = True
def points(L): return {3:1, 4:1, 5:2, 6:3, 7:5}.get(L, 11 if L >= 8 else 0)
def score(board):
g = [list(r) for r in board]; found = set()
def dfs(r, c, node, path, seen):
ch = g[r][c]
if ch not in node: return
node = node[ch]; path += ch
if node.get('$') and len(path) >= 3: found.add(path)
for dr in (-1, 0, 1):
for dc in (-1, 0, 1):
nr, nc = r + dr, c + dc
if (dr or dc) and 0 <= nr < 4 and 0 <= nc < 4 and (nr, nc) not in seen:
dfs(nr, nc, node, path, seen | {(nr, nc)})
for r in range(4):
for c in range(4):
dfs(r, c, trie, "", {(r, c)})
return sum(points(len(w)) for w in found), len(found)
board = ["SLPS", "EIAE", "RNTR", "SEGS"]
s, n = score(board)
print(f"ENABLE score of the record board: {s} points across {n} words")
# ENABLE score of the record board: 3601 points across 1126 words