The Counterfeit Coin

Twelve coins look identical, but one is counterfeit and slightly off in weight. Whether it is heavier or lighter than the rest, nobody knows. The only instrument is a two-pan balance, which for any chosen handful on each side reports one of three things: the left side falls, the right side falls, or the two sides hold level. In three weighings, find the false coin and say whether it is heavy or light. The catch that makes the puzzle sharp is this: the three weighings must be fixed in advance, each pan loaded before any pan has moved, with no chance to let one result guide the next.

A code for each coin

Decide the weighings ahead of time and each coin acquires a fixed itinerary: in each of the three weighings it sits on the left pan, on the right pan, or off to the side. Write that itinerary as three marks, one per weighing, each $-1$ for left, $+1$ for right, $0$ for off. A coin’s itinerary is thus a vector in $\{-1, 0, +1\}^3$ , its private code.

The balance, in turn, produces three readings, one per weighing, each of the same three values: left-heavy, right-heavy, or level. Suppose the counterfeit is coin $c$ and that it is the heavy one. In any weighing, the pan holding $c$ is the one that falls, so the three readings spell out exactly $c$ ’s code. If instead $c$ is the light coin, the pan holding it rises, and the readings spell the negation of $c$ ’s code. So the sequence of readings is $+\,\text{code}(c)$ when $c$ is heavy and $-\,\text{code}(c)$ when $c$ is light, and the scheme succeeds precisely when these two dozen possible reading-sequences, $\pm\,\text{code}(c)$ over the twelve coins, are all different and none is the all-level sequence. One more thing is needed for a reading to mean anything: each weighing must put the same number of coins on each pan, or a tilt tells us nothing.

How many coins can three weighings handle? The readings form a word of three trits, $3^3 = 27$ in all. One word, all-level, cannot arise when a counterfeit is present, leaving $26$ , in thirteen mirror-pairs $\pm v$ ; naively that is room for thirteen coins. The balancing requirement spends some of that room, and the true maximum, as Dyson proved, is $\tfrac{1}{2}(3^n - 3)$ for $n$ weighings: three coins with two weighings, twelve with three, thirty-nine with four. Twelve it is.

The model

A constraint program lays out the codes directly. Encode each coin’s code as the number it spells in base three, so that distinctness, the all-off ban, and the mirror-pair ban all become arithmetic on those numbers, while the balance is a sum over each weighing’s marks.

from ortools.sat.python import cp_model

def design(n, w):
    m = cp_model.CpModel()
    c = {(i, t): m.NewIntVar(-1, 1, "") for i in range(n) for t in range(w)}
    code = [m.NewIntVar(0, 3 ** w - 1, "") for _ in range(n)]
    for i in range(n):
        m.Add(code[i] == sum((c[i, t] + 1) * 3 ** t for t in range(w)))
        m.Add(code[i] != (3 ** w - 1) // 2)  # not the all-off code
    for t in range(w):
        m.Add(sum(c[i, t] for i in range(n)) == 0)  # weighing t balances
    for i in range(n - 1):
        m.Add(code[i] < code[i + 1])  # distinct codes, in order
    for i in range(n):
        for j in range(i + 1, n):
            m.Add(code[i] + code[j] != 3 ** w - 1)  # not mirror images
    s = cp_model.CpSolver()
    if s.Solve(m) not in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        return None
    return [tuple(s.Value(c[i, t]) for t in range(w)) for i in range(n)]

The all-off code spells the middle number $(3^w - 1)/2$ , and a code and its mirror always sum to $3^w - 1$ , which is why those two bans read as they do.

Twelve coins, and not thirteen

The program returns a scheme like the one in Figure 40.1: three weighings, each a clean four against four, the other four coins set aside. To read it, perform the weighings, record the three tilts as a word, and match it against the coins’ codes. A word that is some coin’s code names that coin, heavy; the same word reversed names it, light; and because the twenty-four signed codes are all distinct, the match is never ambiguous.

Ask the same program for thirteen coins in three weighings and it returns nothing: no balancing scheme separates all twenty-six cases, exactly as Dyson’s count $\tfrac{1}{2}(3^3 - 3) = 12$ foretells. The same check at two weighings caps the count at three. The bound is not a counting accident patched onto the puzzle; it is what the model proves when asked for one coin too many.

A non-adaptive scheme for twelve coins. Each row is a coin’s itinerary, each column a weighing: L on the left pan, R on the right, a dot for set aside. Every weighing is four against four. The three tilts, read as a word, name the counterfeit and whether it is heavy or light.

Sources. The counterfeit-coin problem entered print as E. D. Schell’s “Problem E651: Weighed and Found Wanting,” American Mathematical Monthly volume $52$ ( $1945$ ), page $42$ . The general solution, including the maximum $\tfrac{1}{2}(3^n - 3)$ coins identifiable (and weighed heavy or light) in $n$ non-adaptive weighings, is Freeman J. Dyson’s, “The Problem of the Pennies,” The Mathematical Gazette volume $30$ ( $1946$ ), pages $231$ – $234$ . A different and gentler version, in which the counterfeit is known in advance to be the lighter coin, is treated by dynamic programming, both worst-case (the $\lceil \log_3 n \rceil$ bound) and expected-case, by Moshe Sniedovich, OR/MS Games: 3. Counterfeit Coin Problem, INFORMS Transactions on Education volume $3$ , number $2$ ( $2003$ ), pages $32$ – $41$ . The twelve-coin scheme drawn here is produced by the model above and re-checked to separate all twenty-four cases with every weighing balanced; that twelve is the most three weighings allow, and three the most for two, is confirmed by the model reporting infeasibility for one more, in agreement with Dyson’s formula.