Can You Solve The Vexing Vexillology?

A FiveThirtyEight Riddler puzzle.

By Vamshi Jandhyala in mathematics Riddler

January 3, 2020

Problem

The New York Times recently launched some new word puzzles, one of which is Spelling Bee. In this game, seven letters are arranged in a honeycomb lattice, with one letter in the center. Here’s the lattice from December 24, 2019:

Spelling Bee screenshot, with the required letter G, and the additional letters L, A, P, X, M and E. The goal is to identify as many words that meet the following criteria:

The word must be at least four letters long.

The word must include the central letter.

The word cannot include any letter beyond the seven given letters.

Note that letters can be repeated. For example, the words GAME and AMALGAM are both acceptable words. Four-letter words are worth 1 point each, while five-letter words are worth 5 points, six-letter words are worth 6 points, seven-letter words are worth 7 points, etc. Words that use all of the seven letters in the honeycomb are known as “pangrams” and earn 7 bonus points (in addition to the points for the length of the word). So in the above example, MEGAPLEX is worth 15 points.

Which seven-letter honeycomb results in the highest possible game score? To be a valid choice of seven letters, no letter can be repeated, it must not contain the letter S (that would be too easy) and there must be at least one pangram.

For consistency, please use this word list to check your game score.2

Solution

import datetime
from collections import defaultdict
from itertools import combinations
import urllib.request

# Load all dictionary words from file
DICT_URL = "https://norvig.com/ngrams/enable1.txt"


def load_dictionary(url):
    response = urllib.request.urlopen(url)
    return [l.decode('utf-8').strip() for l in response.readlines()]


ALL_WORDS = load_dictionary(DICT_URL)

print(ALL_WORDS[0:10])

Calculate word score based on the rules below

  1. Four-letter words are worth 1 point each, while five-letter words are worth 5 points, six-letter words are worth 6 points, seven-letter words are worth 7 points, etc.
  2. Words that use all of the seven letters in the honeycomb are known as “pangrams” and earn 7 bonus points(in addition to the points for the length of the word)
def word_score(word):
    bonus = 7 if len(set(word)) == 7 else 0
    return 1 if len(word) == 4 else len(word) + bonus

Valid HoneyCombPuzzle words as per the rules below

  1. The word must be at least four letters long.
  2. The word must include the central letter.
  3. The word cannot include any letter beyond the seven given letters.
HCP_WORDS = [w for w in ALL_WORDS if len(w) >= 4 and ("s" not in w)
             and len(set(w)) <= 7]

# Word scores for all HCP WORD
HCP_WORD_SCORES = {w: word_score(w) for w in HCP_WORDS}

# The set of HPC Words for a set of letters
HCP_WORDSETS = defaultdict(list)
for w in HCP_WORDS:
    HCP_WORDSETS[frozenset(w)].append(w)

Algorithm for calculating the score obtainable in a HoneyCombPuzzle

  1. Find all subsets using the 6 letters surrounding the central letter (this results in 64 subsets for each puzzle)
  2. Add the central letter to each of the subsets (as central character should be present in each word formed)
  3. For each subset, identify the list of HPC words that can be formed using letters of the subset
  4. Taking the union of all the words that can be formed by each subset, gives us all the HPC words that can be formed using the letters of the puzzle
  5. Sum the scores of all the HPC words for all the subsets to get the puzzle score
def letter_subsets(hc):
    center, surrounding = hc
    subsets = []
    for r in range(1, 7):
        for comb_r in combinations(surrounding, r):
            subset = frozenset(list(comb_r)+[center])
            subsets.append(subset)
    return subsets


def score(hc):
    return sum([HCP_WORD_SCORES[w] for subset in letter_subsets(hc)
                for w in HCP_WORDSETS[subset]])


def solution(hc):
    return [w for subset in letter_subsets(hc)
            for w in HCP_WORDSETS[subset]]


def print_solution(hc):
    print("HoneyCombPuzzle:", hc)
    words = solution(hc)
    print("Number of words:", len(words))
    for w in words:
        print(w, HCP_WORD_SCORES[w])
    sc = score(hc)
    print("Total score:", sc)

Which seven-letter honeycomb results in the highest possible game score? To be a valid choice of seven letters, no letter can be repeated, it must not contain the letter S (that would be too easy) and there must be at least one pangram.

Algorithm for identifying all valid sets of seven letter honeycombs

  1. Out of all valid HPC words, identify words which are formed with seven different letters (because the seven letters should form atleast one pangram)
  2. From each word identify the set of 7 distinct letters
  3. Each set of 7 distinct letters gives rise to 7 distinct honeycombs
  4. Identify the honeycomb with the largest score
# Words that use all of the seven letters in the honeycomb are known as “pangrams”
def best():
    def valid_honeycombs():
        pangram_sets = [set(w) for w in HCP_WORDS if len(set(w)) == 7]
        return [(c, ls.difference(c)) for ls in pangram_sets for c in ls]
    return max([(score(hc), hc) for hc in valid_honeycombs()])

if __name__ == "__main__":
    print_solution(('g', 'ameplx'))
    print("Start:", datetime.datetime.now())
    print("The Best HoneyCombPuzzle Score: %d, Puzzle: %s" % best())
    print("End:", datetime.datetime.now())