Chapter 96
How Badly Can A Car Salesman Swindle You?
You want a car whose fair price is , but you have no idea what is. The dealer writes the five numbers , , , and on index cards (actual numbers, not formulas) and shows them to you one at a time in random order. At each card you either pay that price or ask for the next; you cannot go back, and the fifth card, if you reach it, is forced. Playing optimally, how much should you expect to pay above on average?
The Riddler, FiveThirtyEight, by Zach Wissner-Gross(original post)
Solution
The markup you pay is one of , shown in a random order. Because is unknown, a single card tells you nothing: its number could be any of the five. What you learn are the differences between cards. Seeing two cards apart pins them as the extremes and , which fixes and lets you wait for whichever low card you like; seeing cards apart narrows things almost as much.
Taking the first card blind costs the full average markup, , so the rule “never take the first card” already beats the dealer. From there the policy is found by backward induction: at each step you compare the markup you would pay now against the expected markup of playing on optimally, given only the differences seen so far, and you stop when stopping is no worse. Carrying that comparison back from the forced fifth card through every information state collapses the four opening gaps ( between the first two cards) to their best continuations, and the expected payment above comes out to So the dealer’s theatrics are worth only ninety dollars to him, less than half of the a naive buyer would surrender.
The computation
Encode the game and the knowledge. The true markups are a random permutation of (hundreds of dollars). At any moment the buyer knows each seen card only relative to the first, so two orderings sharing the same differences are indistinguishable and must be played alike. Group the permutations by those differences and, by backward induction from the last card, take the cheaper of paying now or playing on.
from itertools import permutations
deck = list(permutations(range(5))) # markups in hundreds of dollars
def value(group, k):
# group: permutations still indistinguishable; k: index of the card now shown
pay_now = sum(p[k] for p in group) / len(group) # expected markup if you stop
if k == 4: # fifth card is forced
return pay_now
branches = {} # next card splits the group
for p in group:
seen = tuple(p[i] - p[0] for i in range(1, k + 2)) # differences so far
branches.setdefault(seen, []).append(p)
play_on = sum(len(s) / len(group) * value(s, k + 1) for s in branches.values())
return min(pay_now, play_on)
print(round(value(deck, 0) * 100))
# 90