Chapter 96

How Badly Can A Car Salesman Swindle You?

You want a car whose fair price is $N$ , but you have no idea what $N$ is. The dealer writes the five numbers $N$ , $N+100$ , $N+200$ , $N+300$ and $N+400$ on index cards (actual numbers, not formulas) and shows them to you one at a time in random order. At each card you either pay that price or ask for the next; you cannot go back, and the fifth card, if you reach it, is forced. Playing optimally, how much should you expect to pay above $N$ on average?

The Riddler, FiveThirtyEight, by Zach Wissner-Gross(original post)

Solution

The markup you pay is one of $\$0,\$100,\$200,\$300,\$400$ , shown in a random order. Because $N$ is unknown, a single card tells you nothing: its number could be any of the five. What you learn are the differences between cards. Seeing two cards $\$400$ apart pins them as the extremes $N$ and $N+400$ , which fixes $N$ and lets you wait for whichever low card you like; seeing cards $\$300$ apart narrows things almost as much.

Taking the first card blind costs the full average markup, $\$200$ , so the rule “never take the first card” already beats the dealer. From there the policy is found by backward induction: at each step you compare the markup you would pay now against the expected markup of playing on optimally, given only the differences seen so far, and you stop when stopping is no worse. Carrying that comparison back from the forced fifth card through every information state collapses the four opening gaps ( $\$100,\$200,\$300,\$400$ between the first two cards) to their best continuations, and the expected payment above $N$ comes out to $\boxed{\$90.}$ So the dealer’s theatrics are worth only ninety dollars to him, less than half of the $\$200$ a naive buyer would surrender.

The computation

Encode the game and the knowledge. The true markups are a random permutation of $\{0,1,2,3,4\}$ (hundreds of dollars). At any moment the buyer knows each seen card only relative to the first, so two orderings sharing the same differences are indistinguishable and must be played alike. Group the permutations by those differences and, by backward induction from the last card, take the cheaper of paying now or playing on.

from itertools import permutations

deck = list(permutations(range(5)))      # markups in hundreds of dollars

def value(group, k):
    # group: permutations still indistinguishable; k: index of the card now shown
    pay_now = sum(p[k] for p in group) / len(group)   # expected markup if you stop
    if k == 4:                                         # fifth card is forced
        return pay_now
    branches = {}                                      # next card splits the group
    for p in group:
        seen = tuple(p[i] - p[0] for i in range(1, k + 2))   # differences so far
        branches.setdefault(seen, []).append(p)
    play_on = sum(len(s) / len(group) * value(s, k + 1) for s in branches.values())
    return min(pay_now, play_on)

print(round(value(deck, 0) * 100))
# 90