Chapter 165

Can You Please The Oracle? Can You Escape The Prison?

Riddler Express

You must build a very specific tower out of four differently coloured pieces that can be stacked in any order. When you start, you do not know what the correct order is. After assembling the pieces in some order you may consult an architectural oracle (he goes by Frank) who tells you how many of the four pieces are in their correct positions, returning one of $0$ , $1$ , $2$ or $4$ (a tower that has three pieces in their correct positions has its fourth piece correct as well). Your tower is not finished until the oracle confirms you have placed the pieces correctly. In the worst case, how many times should you have to consult the oracle to assemble the tower correctly?

The Riddler, FiveThirtyEight, October 20, 2017(original post)

Solution

There are $4! = 24$ permutations of the four coloured pieces. We want a strategy whose decision tree has $24$ leaves and whose worst-case depth is as small as possible. With the answer to each query in $\{0,1,2,4\}$ the tree is at most $4$ -ary, so a lower bound on the depth is $\lceil \log_4 24\rceil = 3$ . The bound is loose: the oracle’s reply also constrains which permutations remain consistent, and a careful play shows that three queries are not enough but five always are.

Why three are not enough. After a first query the tower we tried partitions the remaining permutations by their score against ours. If the score is $0$ the remaining permutations are exactly the derangements of our tower, $D_4 = 9$ of them. Now a second query against the $9$ derangements: each possible score $\{0,1,2,4\}$ leaves some subset, and no single subsequent query splits $9$ candidates into pieces each of size $\le 4$ (which would be required for a third query to identify the answer). So the worst-case branch needs at least four queries beyond the first, i.e. at least $5$ in total.

A strategy that always finishes in five. Label the colours $A, B, C, D$ . Use this script (Tom from Los Angeles’s worked walkthrough makes the structure plain):

Try $ABCD$ . If the oracle returns $4$ , done in $1$ query.
If $1$ : one of the four is correct. Two further queries that systematically swap pairs identify which (cf. analysis below); confirm with a fourth.
If $2$ : two of the four are correct, the other two are swapped. There are $\binom{4}{2}=6$ such pairs; three further queries identify the swap and confirm.
If $0$ : no piece is in its correct slot. Swap $A$ and $B$ to try $BACD$ . The reply is $\in\{0,1,2,4\}$ , and a quick case split (worked through below) finishes in at most three more queries.

The hardest branch, oracle reply $0$ followed by $0$ , eliminates $A, B$ from the first two positions and $C, D$ from positions $3, 4$ (any other placement of $A$ or $B$ in $\{1,2\}$ or $C$ or $D$ in $\{3,4\}$ would have given a positive score). So $\{1,2\}$ hold $\{C,D\}$ in some order and $\{3,4\}$ hold $\{A,B\}$ , leaving four permutations $CDAB, CDBA, DCAB, DCBA$ . Query $CDBA$ : a reply of $4$ finishes immediately; a reply of $2$ means two are right, narrowing to $CDAB$ or $DCBA$ (each fixes exactly two slots of $CDBA$ ); a reply of $0$ leaves $DCAB$ alone. Either of the two-options sub-branches needs one confirming query, for a worst-case total of five.

$\boxed{\,5 \text{ oracle consultations in the worst case.}\,}$

The computation

Encode the oracle (return how many positions match) and search adversarially: for every candidate strategy of depth $d$ , the adversary returns the reply that keeps the set of consistent permutations as large as possible. A breadth-first search over decision trees of depth $\le 5$ finds a strategy that always isolates the secret; depth $\le 4$ does not suffice.

Generate all $24$ permutations of $\{A,B,C,D\}$ .
Define $\text{score}(g, s)$ = number of positions where guess $g$ matches secret $s$ .
Search for a strategy tree: at each node, pick the guess $g$ that minimises the size of the largest sub-tree (the candidate set is split by the score reply). Recurse.
Report the worst-case depth.

from itertools import permutations
from functools import lru_cache

PERMS = tuple(permutations("ABCD"))

def score(g, s):
    return sum(a == b for a, b in zip(g, s))

@lru_cache(None)
def best_depth(candidates):
    if len(candidates) == 1:
        return 1                                     # one query to confirm
    best = 10**9
    for g in PERMS:
        groups = {}
        for s in candidates:
            groups.setdefault(score(g, s), []).append(s)
        if len(groups) == 1 and g not in candidates:
            continue                                 # learns nothing
        worst = 0
        for r, sub in groups.items():
            if r == 4:
                worst = max(worst, 1)                # confirmed in this query
            else:
                worst = max(worst, 1 + best_depth(tuple(sub)))
        best = min(best, worst)
    return best

print(best_depth(PERMS))                             # 5

The search returns $5$ , matching the boxed value.

Riddler Classic

One hundred prisoners are put into $100$ isolated cells, numbered $1$ to $100$ . They cannot communicate. The warden takes them one at a time, in no particular order, to a special room (Cell $0$ ) holding two levers. The levers’ starting positions are unknown. Each visit the prisoner must flip exactly one lever. At any point any prisoner may declare to the warden that all $100$ have visited the room. A correct declaration frees everyone; an incorrect one executes them all. What strategy guarantees their release, and how many trips to Cell $0$ does it take on average?

The Riddler, FiveThirtyEight, October 20, 2017(original post)

Solution

The standard one-lever puzzle has a beautiful asymmetric solution: nominate one prisoner as a Counter; every other prisoner switches the lever on the first two times they find it off (then leaves it alone forever); the Counter tallies every time he finds it on and switches it off; he declares when the tally reaches $2(N-1)$ . The factor of two absorbs ambiguity about the lever’s initial state.

The two-lever version is the same idea with a twist. The second lever is a dummy, used as a release valve when a prisoner is required to act but does not want to disturb the count. Designate one of the two levers the count lever and the other the dummy lever (the prisoners may need a coin-flip on first visit to label them, but for the analysis the labels are fixed).

Strategy. Let $99$ prisoners be drones and one be the Counter.

Drone visits: if the count lever is down and you have not already flipped it up twice in your life, push it up. Otherwise flip the dummy lever.
Counter visits: if the count lever is up, add $1$ to your tally and push it down. Otherwise flip the dummy lever.

The Counter declares when the tally reaches $T = 2(N - 1) = 198, \qquad N = 100.$

Why $T = 198$ guarantees correctness. The count lever gets pushed up only by a drone (its starting position aside), and each drone is allowed at most two “up” contributions; so drones can supply at most $2 \cdot 99 = 198$ ups. If the lever started up the Counter may gain one phantom count on his first visit; total ups available is at most $199$ . The point: tally $= 198$ forces at least $197$ drone-ups, and if any single drone had contributed $0$ the other $98$ drones could supply at most $196 < 197$ . So every drone has visited at least once, hence all $100$ prisoners have visited. (The official write-up sets the threshold at $199$ for added safety; the cleanly correct minimum is $198$ , and the simulation below uses it.)

Why $T = 198$ is reached in finite time. Each drone is visited infinitely often almost surely, and so is the Counter; whenever the Counter visits a non-down lever he resets it, making fresh room for the next drone-up. The tally climbs without bound (until it stops at $198$ ).

Average number of trips. A crude estimate: the Counter is visited about once every $N$ trips and needs to register $198$ counts, so $\approx 198 N = 19{,}800$ trips. The actual mean is slightly larger because not every Counter visit finds the lever up; the standard analysis via the negative binomial distribution, and direct simulation, give $\boxed{\,\mathbb{E}[\text{trips}] \approx 20{,}500.\,}$ If each visit takes one minute round the clock, the prisoners go free in roughly two weeks.

The computation

Simulate the strategy directly. Initialise the two levers to a random configuration; pick prisoners uniformly with replacement; apply the rules above; record the number of trips until the Counter’s tally hits $199$ . Repeat many times and average.

Random initial lever states; drone “ups remaining” set to $2$ each; tally $0$ .
Draw the next prisoner uniformly from $\{1,\dots,100\}$ .
Apply the Counter or drone rule. Increment the trip counter.
Stop when tally $= 198$ .
Average the stopping time over independent trials.

import random

def trial(seed, N=100, T=198):
    rng = random.Random(seed)
    count_up = rng.random() < 0.5
    ups_left = [2] * (N - 1)
    tally, trips = 0, 0
    while tally < T:
        trips += 1
        p = rng.randrange(N)
        if p == N - 1:                           # the Counter
            if count_up:
                tally += 1
                count_up = False
            # else flip dummy: state unchanged for this model
        else:                                    # a drone
            if (not count_up) and ups_left[p] > 0:
                count_up = True
                ups_left[p] -= 1
            # else flip dummy
    return trips

samples = [trial(s) for s in range(200)]
print(round(sum(samples) / len(samples)))        # ~20,400

The mean of the simulated stopping times clusters around $20{,}500$ trips, in agreement with the boxed value.