Skip to content
Vamshi Jandhyala

Books · The Riddler

Chapter 183

How Hard Is It To Find A Running Buddy?

Riddler Express

A grocer’s 1515 aisles are randomly assigned to 55 stockers. (i) How many different ways are there for one stocker to be assigned exactly three aisles out of the 1515? (ii) How many different ways are there to assign all 1515 aisles to the 55 stockers, three aisles each? (iii) If the manager may assign any number of aisles to any stocker, how many assignments are possible?

The Riddler, FiveThirtyEight, May 11, 2018(original post)

Solution

Part (i): one stocker, three aisles. The order in which the three aisles are assigned to a single stocker does not matter; the aisle set does. This is the unordered choice of 33 from 1515: (153)  =  1514136  =  455.\binom{15}{3} \;=\; \frac{15 \cdot 14 \cdot 13}{6} \;=\; 455.

Part (ii): all aisles, three each. Pick the first stocker’s three aisles, then the second’s from the 1212 remaining, then the third’s from the 99 remaining, and so on. The five stockers are distinguishable (they are different people), so the multinomial coefficient is (153)(123)(93)(63)(33)  =  15!(3!)5  =  168,168,000.\binom{15}{3} \binom{12}{3} \binom{9}{3} \binom{6}{3} \binom{3}{3} \;=\; \frac{15!}{(3!)^{5}} \;=\; 168{,}168{,}000.

Part (iii): unrestricted. Each aisle is assigned to one of the five stockers independently, with no constraint on how many a single stocker may take. By the product rule, 515  =  30,517,578,125.5^{15} \;=\; 30{,}517{,}578{,}125.

So the three answers are 455,168,168,000,30,517,578,125.\boxed{\,455,\quad 168{,}168{,}000,\quad 30{,}517{,}578{,}125.\,}

The computation

Evaluate the three counts symbolically with Python’s exact integer arithmetic and confirm the product structure of part (ii) by direct multinomial enumeration on a smaller toy case.

  1. Compute (153)\binom{15}{3}, 15!/(3!)515! / (3!)^{5}, and 5155^{15}.

  2. Cross-check the multinomial count by the product of nested binomials.

  3. Verify part (ii) on a toy case (say 99 aisles, 33 stockers, 33 each) against the brute enumeration of all such partitions.

import math
from itertools import combinations

print("(i)   ", math.comb(15, 3))
print("(ii)  ", math.factorial(15) // (math.factorial(3) ** 5))
print("nested",
      math.comb(15, 3) * math.comb(12, 3) * math.comb(9, 3)
      * math.comb(6, 3) * math.comb(3, 3))
print("(iii) ", 5 ** 15)

# toy brute force: 9 aisles, 3 stockers (A,B,C), 3 each
toy = 0
aisles = list(range(9))
for A in combinations(aisles, 3):
    rem1 = [x for x in aisles if x not in A]
    for B in combinations(rem1, 3):
        toy += 1  # C is forced
print("toy 9-aisles, 3 stockers, 3 each =", toy,
      " formula =", math.factorial(9) // (math.factorial(3) ** 3))

The three answers print, the nested-binomial product matches the multinomial, and the toy case returns 1,6801{,}680, equal to 9!/(3!)39! / (3!)^{3}, confirming the structure.

Riddler Classic

NN people go for a run. Each person’s preferred speed XiX_{i} is independent and normally distributed with mean μ\mu and variance σ2\sigma^{2}. Persons ii and jj can be running buddies if XiXjs|X_{i} - X_{j}| \le s. How large does NN need to be before the probability that a given person has a running buddy is at least 99%99\%? (Assume μ\mu, σ\sigma, and ss are fixed.)

The Riddler, FiveThirtyEight, May 11, 2018(original post)

Solution

The right thing to condition on. Fix person 11 and ask: given person 11’s speed X1=xX_{1} = x, what is the probability that none of the other N1N - 1 runners is within ss of xx? Once we condition on X1=xX_{1} = x, the other X2,,XNX_{2}, \ldots, X_{N} are independent of one another. So the no-buddy event factors: Pr(no buddyX1=x)  =  j=2NPr(xXj>s)  =  (1q(x))N1,\begin{aligned} \Pr(\text{no buddy} \mid X_{1} = x) &\;=\; \prod_{j = 2}^{N} \Pr(|x - X_{j}| > s) \\ &\;=\; \bigl(1 - q(x)\bigr)^{N - 1}, \end{aligned} where q(x)  =  Pr(xXs)  =  Φ ⁣(x+sμσ)Φ ⁣(xsμσ)q(x) \;=\; \Pr(|x - X| \le s) \;=\; \Phi\!\left(\frac{x + s - \mu}{\sigma}\right) - \Phi\!\left(\frac{x - s - \mu}{\sigma}\right) and Φ\Phi is the standard normal cumulative distribution function. Marginalising over the density φμ,σ\varphi_{\mu, \sigma} of X1X_{1}, pno buddy(N)  =  φμ,σ(x)(1q(x))N1dx.p_{\text{no buddy}}(N) \;=\; \int_{-\infty}^{\infty} \varphi_{\mu, \sigma}(x) \bigl(1 - q(x)\bigr)^{N - 1} \, \mathrm{d}x.

The headline. The probability of having a buddy is 1pno buddy(N)1 - p_{\text{no buddy}}(N). We want the smallest NN for which this is at least 0.990.99: Nmin  =  min{N2pno buddy(N)    0.01}.\boxed{\,N_{\min} \;=\; \min\Bigl\{ N \ge 2 \,\Bigm|\, p_{\text{no buddy}}(N) \;\le\; 0.01 \Bigr\}.\,}

Why the published one-liner is too small. The official write-up uses the formula 1pno buddy(N)  =  1(1G(s))N1,G(s)  =  erf ⁣(s2σ),1 - p_{\text{no buddy}}(N) \;=\; 1 - \bigl(1 - G(s)\bigr)^{N - 1}, \qquad G(s) \;=\; \mathrm{erf}\!\left(\frac{s}{2\sigma}\right), which treats the events {X1Xjs}\{|X_{1} - X_{j}| \le s\} for different jj as independent. They share the common variable X1X_{1}, so they are not. Conditional on X1X_{1} they factor cleanly; unconditional they are positively correlated, which raises pno buddyp_{\text{no buddy}} above the product formula. The official formula therefore under-estimates the NN needed and yields a value (around N18N \approx 18 for marathon-fit parameters) that is several times too small.

Worked numbers (Boston Marathon parameters). Take μ=6.8\mu = 6.8, σ=1.16\sigma = 1.16 miles per hour, and try the bracket of ss values mentioned in the official write-up.

(σ,s)(\sigma, s) official formula integral
(1.16,1.00)(1.16, 1.00) 99 2828
(1.16,0.50)(1.16, 0.50) 1818 7575
(1.16,0.25)(1.16, 0.25) 3737 165165
(1.00,1.00)(1.00, 1.00) 88 2121

The integral column is what the puzzle actually asks for. To get to 99%99\% buddy-coverage you need substantially more runners than the official formula suggests: for the s=0.5s = 0.5 case, around 7575, not 1818.

Closed-form scaling. The integral depends on the dimensionless ratio ρ=s/σ\rho = s / \sigma only. For small ρ\rho, q(x)q(x) is approximately ρφ(x0)\rho \varphi(x_{0}) for x0=(xμ)/σx_{0} = (x - \mu)/\sigma, and the integral is approximately 1/(1+(N1)ρ/π)1/21/(1 + (N - 1)\rho / \sqrt{\pi})^{1/2}. So N(π/ρ2)ln(100)N \sim (\sqrt{\pi}/\rho^{2}) \cdot \ln(100) scales like 1/ρ21/\rho^{2}, four times the scaling the official formula suggests. The intuition is that someone with an extreme speed has nobody at all near them; even with many average-paced runners, the tails dictate the no-buddy probability.

The computation

Evaluate the integral by numerical quadrature for a sweep of NN, find the threshold NN^{*} that drops pno buddyp_{\text{no buddy}} below 0.010.01, and confirm by Monte Carlo simulation.

  1. For each NN from 22 upwards, compute φ(x)(1q(x))N1dx\int \varphi(x) (1 - q(x))^{N - 1} \, \mathrm{d}x by Simpson’s rule on a wide grid (in standardised units).

  2. Return the first NN for which the integral falls below 0.010.01.

  3. Cross-check with 200,000200{,}000-trial Monte Carlo.

import math, random
from scipy import stats
from scipy.integrate import quad

def threshold(sigma, s, mu=0.0, target=0.01):
    def q(x):
        a = stats.norm.cdf((x + s - mu) / sigma)
        b = stats.norm.cdf((x - s - mu) / sigma)
        return a - b
    def no_buddy(N):
        f = lambda x: stats.norm.pdf((x - mu) / sigma) / sigma * (1 - q(x)) ** (N - 1)
        v, _ = quad(f, mu - 8 * sigma, mu + 8 * sigma)
        return v
    N = 2
    while no_buddy(N) > target:
        N += 1
    return N

def mc(sigma, s, N, mu=0.0, trials=200_000):
    has = 0
    for _ in range(trials):
        xs = [random.gauss(mu, sigma) for _ in range(N)]
        if any(abs(xs[0] - xs[j]) <= s for j in range(1, N)):
            has += 1
    return has / trials

for sigma, s in [(1.0, 1.0), (1.16, 1.0), (1.16, 0.5)]:
    N = threshold(sigma, s)
    print(f"sigma={sigma}, s={s}: N* = {N}, MC at N* = {mc(sigma, s, N):.4f}")

For σ=1.16,s=1\sigma = 1.16, s = 1 the script prints N=28N^{*} = 28 with MC buddy-rate 0.990\approx 0.990; for σ=1.16,s=0.5\sigma = 1.16, s = 0.5, N=75N^{*} = 75 with MC 0.990\approx 0.990. These match the integral column in the table above.