# Collection of interesting problems in probability

## With computational solutions in Python

By Vamshi Jandhyala in mathematics

December 6, 2019

## Problem 1

Two players A and B play a series of games where a win is worth one point. For each game, A wins with a probability $p$ and B with a probability $1-p$. The match ends when one player get a lead of two points. What is the probability that A wins the match?

### Solution

Let $x$ be the probability of A winning the game.

We have three **mutually exclusive** events where A wins with a lead of two points.

When A wins the first two games. The probability of this event is $p^2$.

When A wins the first game and B wins the second game and then A goes on to win the match. The probability of this event is $p(1-p)x$. After the first two games in this case, the difference in the scores is $0$ and it is equivalent to starting the match from the beginning.

When B wins the first game and A wins the second game and then A goes on to win tha match.The probability of this event is $(1-p)px$. After the first two games in this case, the differnce in the scores is $0$ and it is equivalent to starting the match from the beginning.

Therefore we have,

$$
\begin{align*}
x &= p^2 + p(1-p)x + (1-p)px \\

\implies x &= \frac{p^2}{1-2p(1-p)}
\end{align*}
$$

## Problem 2

Let $\alpha$ and $\beta$ be given positive real numbers, with $\alpha < \beta$. If two points are selected at random from a straight line segment of length $\beta$, what is the probability that the distance between them is at least $\alpha$?

### Solution

A point $(x,y)$ in the square $(0,\beta) \times (0,\beta)$ represents the two random points selected from a straight line of length $\beta$. For the distance between them to be at least $\alpha$, we have $\vert x - y \vert \geq \alpha$. The area of the region $\vert x \vert \leq \beta$, $\vert y \vert \leq \beta$ and $\vert x - y \vert \geq \alpha$ is given by $(\beta - \alpha)^2$ and the probability is given $\frac{(\beta - \alpha)^2}{\beta^2}$.

## Problem 3

Consider a particle situated at the origin $(0,0)$ of $\mathbb{R}^2$. At successive times a direction is chosen independently by picking an angle uniformly at random in the interval $[0,2π]$, and the particle then moves an Euclidean unit length in this direction. Find the expected squared Euclidean distance of the particle from the origin after $n$ such movements.(Cambridge Tripos, Part 1A, 2014, Paper 2, Section 1, 3F).

### Solution

The $x$ coordinate of the particle after $n$ movements is given by $\sum_{i=1}^n cos(θ_i)$.

The $y$ coordinate of the particle after $n$ movements is given by $\sum_{i=1}^n sin(θ_i)$.

The squared Euclidean distance of the particle from the origin after $n$ movements is given by

$$
\begin{align*}
d^2 &&=&& (\sum_{i=1}^n cos(\theta_i))^2 + (\sum_{i=1}^n sin(\theta_i))^2 \\

&&=&& n + \sum_{1 \leq i,j \leq n, i \neq j} cos(\theta_i - \theta_j)
\end{align*}
$$

$$ \begin{align*} \mathbb{E}[d^2] &&=&& n + \sum_{1 \leq i,j \leq n, i \neq j} \mathbb{E}[cos(\theta_i - \theta_j)] \end{align*} $$

where each $θ_i$ is uniformly distributed between $[0,2π]$.

It can be shown that $θ_i−θ_j$ follows a triangular distribution and that $\mathbb{E}[cos(\theta_i - \theta_j)] = 0$.

Therefore the expected value of the squared Euclidean distance from the origin after $n$ movements is $n$.

### Computational verification

```
from random import uniform
from math import pi, cos, sin
import matplotlib.pyplot as plt
def expSqrLen(n, its):
el = 0
for i in xrange(its):
xsum = sum(cos(uniform(0, 2*pi)) for i in xrange(n))
ysum = sum(sin(uniform(0, 2*pi)) for i in xrange(n))
el += xsum*xsum + ysum*ysum
return el / its
x = range(1, 26)
y = [expSqrLen(n, 10000) for n in x]
plt.plot(x, y)
plt.xlabel('n')
plt.ylabel('Expected value of length^2')
plt.show()
```

## Problem 4

Suppose $n$ balls are thrown randomly into $n$ boxes, so each ball lands in each box with uniform probability. Also, suppose the outcome of each throw is independent of all the other throws.

a) Let $X_i$ be an indicator random variable whose value is $1$ if box $i$ is empty and $0$ otherwise. Write a simple closed form expression for the probability distribution of $X_i$. Are $X_1, X_2,\dots, X_n$ independent random variables?

b) Find a constant, $c$, such that the expected number of empty boxes is asymptotically equal $cn$.

### Solution

a) The probability of a ball not falling in a box in a throw is $\frac{n-1}{n}$. The probability that the box $i$ to be empty after $n$ throws is given by $(\frac{n-1}{n})^{n}$ as each throw is independent. The random variables are pairwise independent because knowing whether box \(i\) is empty or not doesn’t give any information about whether box $j$ is empty.

b) The expected number of empty boxes asymptotically is given by

$$ \mathop{\mathbb{E}}(\sum_{i=1}^{n}X_i) = \frac{(n-1)^n}{n^{n-1}} \sim \frac{n}{e} $$

### Computational verification

```
from numpy import random
from math import exp
runs = 10000
n_balls = 10000
s = 0
for _ in range(runs):
n_empty = n_balls - len(set(random.randint(1, n_balls, n_balls)))
s += n_empty
print("Simulation : %f, Derivation: %f" % (s*1.0/runs, n_balls/exp(1)))
Simulation result: 3678.818300, Calculated result: 3678.794412
```

### References

MIT OCW - Mathematics for Computer Science - Assignment 12

## Problem 5

At time $t = 1$ choose two numbers $x_1, y_1$ uniformly and independently from $[0, 1]$. At time $t= 2$ choose two further numbers $x_2, y_2$, etc. What is the expected time $n$ at $\sum_{i=1}^n (x_i^2+y_i^2) \geq 1$ for the first time?

### Solution 1

Let $R =X^2 + Y^2$ where $X \thicksim U[0,1]$ and $Y \thicksim U[0,1]$.

The CDF of $R$ is given by

\begin{equation} \label{1} \mathbb{P}[R \leq r] = \mathbb{P}[X^2 + Y^2 \leq r] = \frac{\pi r}{4} \end{equation} when $0 \leq r \leq 1$.

The PDF, $g(r)$ of $R$ is therefore $g(r) = \frac{\pi}{4}$ when $0 \leq r \leq 1$.

Let $t(s)$ be the expected time at which $\sum (x_i^2+y_i^2) \geq s$, we then have

\begin{align} \label{2} t(s) = 1 + \int_{0}^s t(s-r) g(r) dr = 1 + \frac{\pi}{4} \int_{0}^s t(r) dr \end{align} where $s \leq 1$.

Differentiating $\ref{2}$, we get

\begin{equation} \label{3} t'(s) = \frac{\pi}{4} t(s) \end{equation}

Integrating $\ref{3}$, we get

\begin{equation} \label{4} t(s) = e^{\frac{\pi s}{4}} \end{equation}

as $t(0) = 1$.

To understand more about differentiating under the Integral sign, please refer to differentiating under the integral sign.

$\therefore$ the expected time $n$ when $\sum_{i=1}^n (x_i^2+y_i^2) \geq 1$ for the first time is $e^{\frac{\pi}{4}} = 2.19$.

### Solution 2 (Credit: Vaibhav Mehrotra)

Expectation of $N$, the random variable which denotes the time it takes for

$\sum_{i=1}^n (x_i^2+y_i^2) \geq 1$ is given by

\begin{equation} \mathbb{E}[N] = \sum_{i=1}^{\infty} i\mathbb{P}[N = i] = \sum_{i=1}^{\infty} \mathbb{P}[N \geq i] \end{equation}

The probability of $\mathbb{P}[N \geq d]$ is equivalent to $\mathbb{P}[\sum_{i=1}^{d-1} (x_i^2+y_i^2) < 1]$ and is given by

\begin{equation} \label{6} \mathbb{P}[N \geq d] = \frac{V_{2d-2}(1)}{2^{2d-2}} = \frac{\pi^{d-1}}{4^{d-1}(d-1)!} \end{equation}

where $V_{2d}(1)$ is the volume of a ball of radius $1$ in $2d$-dimensional space.

For details please refer to volume of an n-ball.

$\therefore$ the expected time $n$ when $\sum_{i=1}^n (x_i^2+y_i^2) \geq 1$ for the first time is given by

\begin{equation} \label{7} \mathbb{E}[N] = \sum_{i=0}^{\infty} \frac{\pi^{i}}{4^{i}i!} = e^{\frac{\pi}{4}} \end{equation}

### Computational verification

```
from random import random
runs = 10000
sum_t = 0
for _ in range(runs):
t = 1
x, y = random(), random()
d = x**2 + y**2
while(d < 1.0):
x, y = random(), random()
d += x**2 + y**2
t += 1
sum_t += t
print("Expected time from simulation = ", sum_t/runs)
```

### References

Volume of an n-ball in Wikipedia

Differentiating under the Integral sign

## Problem 6

Starting from $0$, we keep adding a different random number each of which is the minimum of two independent uniform random variables between $0$ and $1$. On average, how many such random pairs need to be generated to get the sum that exceeds 1?

### Solution

#### Integral equation for expectation

Let $f(t)$ be the expected number of pairs that need to be drawn so that $\sum \min(X,Y) \geq t$.

We have the following integral equation:

$$ \begin{equation} f(t) = 1 + \int_0^t f(t-z)g(z)dz \end{equation} $$ where $g(z)$ is the PDF of $Z = \min (X,Y)$ where $X \sim \mathcal{U}[0,1]$ and $Y \sim \mathcal{U}[0,1]$.

#### PDF of $\min (\mathcal{U}[0,1],\mathcal{U}[0,1])$

We have

$$ \mathbb{P}[Z \geq z] = \mathbb{P}[X \geq z]\cdot\mathbb{P}[Y \geq z] = (1-z)^2 $$

The CDF of $Z$ is therefore $1-(1-z)^2$.

Differentiating the CDF, we get the PDF of $Z$, $g(z) = 2(1-z)$.

#### Solution of the integral equation

Substituting $g(z)$ into the integral equation for expectation, we get

$$
\begin{align*}
f(t) &= 1 + \int_0^t f(t-z)2(1-z)dz \\

\implies f(t) &= 1 + 2\int_0^t f(u)(1-t+u)du \text{ where } u = t-z \\

\implies f(t) &= 1 + 2(1-t)\int_0^t f(u)du + 2\int_0^t uf(u)du
\end{align*}
$$

Differentiating on both sides, we get

$$ f'(t) = 2f(t) - 2\int_0^t f(u)du $$

Differentiating on both sides again, we get

$$ f''(t) = 2f'(t) - 2f(t) $$

Any $f$ which is a solution of the above equation has to satisfy the conditions $f(0)=1$ and $f'(0) = 2$.

The solution to the above equation is $f(t) = e^t(\sin t+ \cos t)$.

Therefore, on average the number of random pairs that need to be generated to get to a sum of $1$ is given by $f(1) = e(\sin 1+ \cos 1) \sim 3.756$.

### Computational verification

```
from random import random
runs = 100000
tn = 0
for _ in range(runs):
t = 0
while t <= 1:
m = min(random(), random())
t += m
tn += 1
print(tn/runs)
```

## Problem 7

Starting from $0$, we keep adding a different random number each of which is the maximum of two independent uniform random variables between $0$ and $1$. On average, how many such random pairs need to be generated to get the sum that exceeds 1?

### Solution

#### Integral equation for expectation

Let $f(t)$ be the expected number of pairs that need to be drawn so that $\sum \max (X,Y) \geq t$.

We have the following integral equation:

$$ \begin{equation} f(t) = 1 + \int_0^t f(t-z)g(z)dz \end{equation} $$ where $g(z)$ is the PDF of $Z = \max (X,Y)$ where $X \sim \mathcal{U}[0,1]$ and $Y \sim \mathcal{U}[0,1]$.

#### PDF of $\max (\mathcal{U}[0,1],\mathcal{U}[0,1])$

We have

$$ \mathbb{P}[Z \leq z] = \mathbb{P}[X \leq z]\cdot\mathbb{P}[Y \leq z] = z^2 $$

The CDF of $Z$ is therefore $z^2$.

Differentiating the CDF, we get the PDF of $Z$, $g(z) = 2z$.

#### Solution of the integral equation

Substituting $g(z)$ into the integral equation for expectation, we get

$$
\begin{align*}
f(t) &= 1 + \int_0^t f(t-z)2zdz \\

\implies f(t) &= 1 + 2\int_0^t f(u)(t-u)du \text{ where } u = t-z
\end{align*}
$$

Differentiating on both sides, we get

$$ f'(t) = 2\int_0^t f(u)du $$

Differentiating on both sides again, we get

$$ f''(t) = 2f(t) $$

Any $f$ which is a solution of the above equation has to satisfy the conditions $f(0)=1$ and $f'(0) = 0$.

The solution to the above equation is $f(t) = \frac{1}{2}e^{-\sqrt{2}t}(e^{2\sqrt{2}t}+1)$.

Therefore, on average the number of random pairs that need to be generated to get to a sum of $1$ is given by $f(1) = \frac{1}{2}e^{-\sqrt{2}}(e^{2\sqrt{2}}+1) \sim 2.178$.

### Computational verification

```
from random import random
runs = 100000
tn = 0
for _ in range(runs):
t = 0
while t <= 1:
m = max(random(), random())
t += m
tn += 1
print(tn/runs)
```

## Problem 8

From a string of length $1$m we choose a length uniformly at random between $0$m and $1$m and cut as many of these lengths as possible from the string. What is the expected length of the remaining string (in meters)?

### Solution

Let $X \sim \mathcal{U}[0,1]$ and $Y = 1 - nX$ where $0 \leq 1 - nX \leq X$ and $n = 1,2,\dots$. Therefore
$$
\begin{align*}
\mathbb{E}[Y] &= \mathbb{E}[\mathbb{E}[Y|n]] = \mathbb{E}[\mathbb{E}[1-nX|\frac{1}{n+1} \leq X \leq \frac{1}{n}]] \\

&= \sum_{n=1}^\infty \left(1 - n(\frac{\frac{1}{n} + \frac{1}{n+1}}{2})
\right)(\frac{1}{n} - \frac{1}{n+1}) \\

&= \frac{1}{2}\left(\sum_{n=1}^\infty(\frac{1}{n} -\frac{1}{n+1}) - \sum _{n=1}^\infty\frac{1}{(n+1)^2} \right) \\

&= \frac{1}{2}\left(1 - (\frac{\pi^2}{6}-1) \right) = 1 - \frac{\pi^2}{12}
\end{align*}
$$

## Problem 9

Consider a random shuffle of a $52$ card deck. The ‘Shuffle Index’ is then determined by the thirteen ranks of the deck. If any two cards of the same rank appear next to each other after the shuffle, then that rank contributes $0$ to the index. Else the rank contributes $1$. What is the expected ‘Shuffle Index’ of a random shuffle?

### Solution

The ‘Shuffle index’ can be written as $S=\sum_{i=1}^{13}R_i$ where $R_i$ is the indicator random variable taking the value $1$ when no two cards of rank $i$ appear next to each other and $0$ otherwise. $\mathbb{E}[R_i] = {49 \choose 4}/{52 \choose 4}$. Therefore, $\mathbb{E}[S] = 13 {49 \choose 4}/{52 \choose 4} \approx 10.174$.

### Computational Verification

```
from random import shuffle
from itertools import product
runs = 100000
deck = list(product(range(4), range(13)))
total_si = 0
for _ in range(runs):
shuffle(deck)
si = [1]*13
for i in range(51):
if deck[i][1] == deck[i+1][1]:
si[deck[i][1]] = 0
total_si += sum(si)
print(total_si/runs)
```

## Problem 10 (Credit: Muthu Veerappan)

It is well known that if you break a stick of unit length at two points, the probability the broken pieces form a triangle is $\frac{1}{4}$. What is the expected area of the triangle?

### Solution

Let $X$ and $Y$ be two uniform random variables on $[0,1]$. The lengths of the three pieces are $x$, $y-x$ and $1-y$ when $x<y$. If the three pieces form a triangle, we have the following inequalities

$$
\begin{align*}
x + y - x > 1 - y \implies y > \frac{1}{2}\\

y - x + 1 - y > x \implies x < \frac{1}{2}\\

x + 1 - y > y - x \implies y - x < \frac{1}{2}
\end{align*}
$$

The set of all $(x,y)$ satisfying the above is given by the region

$$ C_1= {(u,v): v > \frac{1}{2}, u < \frac{1}{2}, v-u < \frac{1}{2} }. $$

Similarly when $x>y$, we get the region

$$ C_2= {(u,v): u > \frac{1}{2}, v < \frac{1}{2}, u-v < \frac{1}{2} }. $$

Area of the triangle using *Heron’s formula* is

$$ A(x,y) = \sqrt{\frac{1}{2}(\frac{1}{2}-x)(\frac{1}{2}-y+x)(y - \frac{1}{2})} $$

whenever $(x,y) \in C_1 \cup C_2$.

The area of the region $C_1=C_2=\frac{1}{8}.$

The joint density function $f(x,y)=1.$

Therefore, $\mathbb{P}[C_1]= \int_{C_1} f(u,v) dudv = \frac{1}{4}.$

The conditional joint density $f_{X,Y|C_1}(x,y) = \frac{f(x,y)}{\mathbb{P}[C_1]} = 4.$

Using symmetry, the conditional expectation of the area given that a triangle can be formed is given by

$$
\begin{align*}
&\int_{C_1} f_{U,V|C_1}(u,v) A(u,v) du dv + \int_{C_2} f_{U,V|C_1}(u,v) A(u,v) du dv\\

&=2 \int_0^{\frac{1}{2}} \int_{\frac{1}{2}}^{\frac{1}{2}+u} 4 \sqrt{\frac{1}{2}(\frac{1}{2}-u)(\frac{1}{2}-v+u)(v - \frac{1}{2})} dv du \

&= \frac{\pi}{105}.
\end{align*}
$$

### Computational Verification

```
from random import random
from math import sqrt, pi
runs = 1000000
cnt = 0
total_area = 0
for _ in range(runs):
u,v = sorted([random(), random()])
if u < 0.5 and v > 0.5 and (v-u) < 0.5:
a = sqrt(0.5*(0.5-u)*(v-0.5)*(0.5-v+u))
cnt += 1
total_area += a
print(total_area/cnt, pi/105)
```

## Problem 11

Choose three points at random on the circumference of a unit circle centred at the origin. What is the expected length of the arc that contains the point $(1,0)$?

### Solution

Let the point $(1,0)$ be marked on a circle. The problem of marking three other random points is equivalent to marking four random points on the circle. The expected length of the arc between any two random points is $2\cdot\pi/4$. The expected length of the arc containing the mark $(1,0$ is therefore $2\cdot2\cdot\pi/4 = \pi$.

### Computational Verification

```
from random import uniform
from math import pi
runs = 100000
for n in range(3,10):
tl = 0
for _ in range(runs):
points = sorted([uniform(0, 2*pi) for _ in range(n)])
tl += (points[0] + 2*pi-points[-1])
print("n= %d" % n, tl/runs, 4*pi/(n+1))
```

## Problem 12 (Credit: Muthu Veerappan)

Consider a wire of length $1$ unit. Five birds sit on this wire with all positions equally likely. For each bird, colour the segment from that bird to its nearest neighbour. What is the expected length of the wire that is coloured?

### Solution

From the simulation below, we see that the expected length of the wire that is coloured is $\approx \textbf{0.463}$.

```
from random import random
def colored_length(runs=100000):
total_len = 0
for _ in range(runs):
birds = sorted([random() for _ in range(5)])
d1, d2, d3, d4 = [birds[i]-birds[i-1] for i in range(1,5)]
total_len += d1 + d4 + (d2 if d2 < d1 or d2 < d3 else 0) + \
(d3 if d3 < d4 or d3 < d2 else 0)
return total_len/runs
print(colored_length())
```