Using Probabilistic Method in Combinatorics

A powerful tool especially for attacking existence problems

By Vamshi Jandhyala in mathematics

March 16, 2020

$$ \newcommand\EE{\mathbb E} \newcommand\PP{\mathbb P} $$

Problem [IMC 2002, 6]

Two hundred students participated in a mathematical contest.

They had $6$ problems to solve. It is known that each problem was correctly solved by at least $120$ participants.

Prove that there must be two participants such that every problem was solved by at least one of these two students.

Solution

Let $M_i$ be the indicator variable which takes the value $1$ when a random pair of participants both miss the question $i$ and $0$ otherwise.

We have $\PP[M_i=1] = \EE[M_i] \leq 0.4^2= 0.16$.

The expected number of questions which are missed by both participants of a random pair of participants is given by

$$ \begin{align} \sum_{i=1}^{6}\EE[M_i] \leq 6 \cdot 0.16 = 0.96 < 1
\end{align} $$

Computational verification

from random import sample

num_conts = 200
num_capq = 120
num_questions = 6
runs = 10000
tot_successes = 0
conts = list(range(num_conts))
questions = list(range(num_questions))
for _ in range(runs):
    answers = [sample(conts, num_capq) for _ in questions]
    x, y = sample(conts, 2)
    tot_successes += sum([1 if (x not in answers[i]) and (y not in answers[i])
                          else 0 for i in questions])
print(tot_successes/runs)

Problem [IMO Shortlist 1987]

Show that we can colour the elements of the set ${1, 2, \dots, 1987}$ with $4$ colours so that any arithmetic progression of ten terms, each in the set, is not monochromatic.

Solution

Let $S$ be the set of all arithmetic progression of ten terms where each term is in the set ${1, 2, \dots, 1987}$.

Let $\vert S \vert = n$ be the number of elements in $S$.

If the first term of a 10-term progression is $k$ and the difference is $d$, then $1 \leq k \leq 1978$ and $d \leq \left[ \frac{1987−k}{9} \right]$; hence

$$ \begin{align} n = \sum_{k=1}^{1978} \left[ \frac{1987−k}{9} \right] < \frac{1986 + 1985 + \cdots +9}{9} = \frac{1995 \cdot 1978}{18} \end{align} $$

The probability of an element of $S$ being monochromatic is $4\left( \frac{1}{4} \right)^{10} = \left( \frac{1}{4} \right)^9$.

The probability of at least one element of $S$ being monochromatic is $\PP[S_1 \cup S_2 \dots S_n]$ where $S_i$ is the event of element $i$ of the set $S$ being monochromatic. We have

$$ \begin{align} \PP[S_1 \cup S_2 \cup \dots \cup S_n] \leq \PP[S_1] + \dots + \PP[S_n] = \frac{n}{4^9} \end{align} $$

Therefore the probability that no element in set $S$ is monochromatic is given by

$$ \begin{align} 1 - \PP[S_1 \cup S_2 \cup \dots \cup S_n] \geq 1 - \frac{n}{4^9} > 0 \end{align} $$

because $n < \frac{1995 \cdot 1978}{18} < 4^9$.

Problem [Zarankiewicz]

Show that there exists a partition of the set of positive integers into two classes such that neither class contains an infinite arithmetic progression, and neither class contains $3$ consecutive integers.

Solution

Color each odd integer red or blue randomly. Colour each even integer the opposite colour of the preceding odd integer.

As any set of three consecutive integers has two consecutive integers, one of which is even and one of which is odd.

They both will have different colours based on the above colouring scheme. So we cannot have $3$ consecutive integers all having the same colour.

Problem [IMO 1987]

Let $p_n(k)$ be the number of permutations of the set ${1, \dots , n}$, $n \geq 1$, which have exactly $k$ fixed points. Prove that

$$ \begin{align} \sum_{k=0}^n k p_n(k) = n! \end{align} $$

(Remark: A permutation f of a set S is a one-to-one mapping of S onto itself.An element i in S is called a fixed point of the permutation f if f(i) = i.)

Solution

The above problem is equivalent to proving that the expected number of fixed points in a permutation is $1$ because $\frac{p_n(k)}{n!}$ is the probability of a permutation having $k$ fixed points.

Let $K$ be the random variable indicating the number of fixed points in a permutation.

We have $K=P_1 + \dots + P_n$ where $P_i$ is an indicator random variable which takes the value $1$ when $i$ is a fixed point of the permutation and $0$ otherwise.

We have $\PP[P_i=1] = \EE[P_i] = \frac{1}{n}$. Therefore,

$$ \begin{align} \EE[K] = \sum_{i=1}^n\EE[P_i] = n\cdot \frac{1}{n} = 1 \end{align} $$

Computational verification

from random import shuffle

num = 10
runs = 1000000
tot_successes = 0
nums = list(range(num))
for _ in range(runs):
    shuffle(nums)
    tot_successes += sum([1 if i == j else 0 for i, j in enumerate(nums)])
print(tot_successes/runs)

Problem

With the same notation as in the previous problem, determine the value of

$$ \begin{align*} \sum_{k=0}^n k^2 p_n(k) \end{align*} $$

Solution

As mentioned in the previous solution, $\frac{p_n(k)}{n!}$ is the probability of a permutation having $k$ fixed points, the above expression is equivalent to finding $n!\EE[K^2]$ where $K$ is the random variable indicating the number of fixed points in a permutation.

We have $K=P_1 + \dots + P_n$ where $P_i$ is an indicator random variable which takes the value $1$ when $i$ is a fixed point and $0$ otherwise.

We have $\PP[P_i=1] = \EE[P_i] = \EE[P_i^2] = \frac{1}{n}$ and $\PP[P_i=1, P_j=1] = \EE[P_i P_j] = \frac{1}{n}\frac{1}{n-1}$. Therefore,

$$ \begin{align} \EE[K^2] = \sum_{i=1}^n\EE[P_i^2] + 2\sum_{i=1}^{n-1} \sum_{j=i+1}^{n}\EE[P_i P_j] \
= n\cdot \frac{1}{n} + 2 \frac{n(n-1)}{2} \frac{1}{n}\frac{1}{n-1} = 2 \end{align} $$

which means $\sum_{k=0}^n k^2 p_n(k) = 2n!$.

Computational verification

from random import shuffle

num = 23
runs = 100000
tot_successes = 0
nums = list(range(num))
for _ in range(runs):
    shuffle(nums)
    k = sum([1 if i == j else 0 for i, j in enumerate(nums)])
    tot_successes += k**2
print(tot_successes/runs)

Problem [Kraft’s inequality 1949]

Let $C$ be a set of binary strings. (A binary string is a finite sequence of 0’s and 1’s). Say that $C$ is a prefix-free code if no string in $C$ is a prefix of another string in $C$.

1.Show that if $C$ is a prefix-free code, then

$$ \begin{align} \sum_{x \in C} \frac{1}{2^{\vert x \vert}} \leq 1 \end{align} $$

Here $\vert x \vert$ means the length of string $x$.

2.Show that if $C$ is a prefix-free code with $n \geq 1$ strings, then the average length of a string in $C$ is at least $log_2n$.

Solution

Let $r$ be a random infinite string. The probability that a particular string of length $k$ is a a prefix of $r$ is $\frac{1}{2^k}$.

1.The probability that $C$ contains a prefix of $r$ is given by $\PP[C_1 \cup \dots \cup C_n]$ where $C_i$ is the event that $c_i \in C$ is a prefix of $r$.

We have $\PP[C_i] = \frac{1}{2^{\vert c_i \vert}}$ and $\PP[C_i \cap C_j] = 0$.

$C_i$ and $C_j$ are disjoint because if $c_i$ and $c_j$ are both prefixes of $r$, then either of $c_i$ or $c_j$ has to be a prefix of the other and this is not possible because $C$ is a prefix-free code.

The same argument can be extended to intersection of more than two events.Therefore

$$ \begin{align} \PP[C_1 \cup \dots \cup C_n] = \sum_{x \in C} \frac{1}{2^{\vert x \vert}} \leq 1 \end{align} $$

2.Using $AM \geq GM$, we have

$$ \begin{align} \frac{1}{n} \geq \frac{\sum_{x \in C} \frac{1}{2^{\vert x \vert}}}{n} \geq \left(\frac{1}{2^{\sum_{i=1}^n \vert x_i \vert}} \right)^\frac{1}{n} \end{align} $$

Taking the logarithm to the base 2 on both sides, we have

$$ \begin{align} \frac{\sum_{i=1}^n \vert x_i \vert}{n} \geq log_2 n \end{align} $$

Problem [Iberoamerican Olympiad 2001, 9]

Let $S$ be a set with $n$ elements. Let $S_1, S_2, \dots , S_k$ be subsets of $S$ (for $k \geq 2$), each with at least $r$ elements.

Prove that there exist $i$ and $j$ (with $1 \leq i < j \leq k$) such that the number of common elements of $S_i$ and $S_j$ is at least

$$ \begin{align} r - \frac{nk}{4(k-1)} \end{align} $$

Solution

Sample a pair of sets uniformly at random (i.e., randomly pick one of the ${k \choose 2}$ possible pairs).

Let $X$ be the number of common elements which are in both chosen sets.

Note that $X = X_1 + \dots + X_n$, where each $X_i$ is the ${0, 1}$-random variable telling whether the $i-th$ element was in both chosen sets.

By linearity of expectation, $\EE[X] = \EE[X_1] + \dots + \EE[X_n]$. Let $m_i$ be the number of sets that the $i-th$ element belongs to.

Then, each $\EE[X_i] = \PP[\text{i-th element is in both picked sets}] = {m_i \choose 2}/{k \choose 2}$.

The only piece of information we know about the ${m_i}$ is that their sum $\sum_i m_i \geq k \cdot r$.

We have the inequality ${m_i \choose 2} \geq (k-1)m_i/2 - k^2/8$.

This can easily derived from the fact that $(m-\frac{k}{2})^2 \geq 0$. Therefore,

$$ \begin{align} \EE[X] \geq \left( \frac{(k-1) }{2} \cdot k \cdot r - n \cdot \frac{k^2}{8} \right)/{k \choose 2} \geq r - \frac{n \cdot k}{4(k-1)} \end{align} $$

Problem [AHSME 1989]

Suppose that $7$ boys and $13$ girls line up in a row. Let $S$ be the number of places in the row where a boy and a girl are standing next to each other.

For example, for the row $GBBGGGBGBGGGBGBGGBGG$ we have $S = 12$.

Find the expected value of $S$.

Solution

Let $A_i$ be the indicator variable taking the value $1$ if an adjacent pair of places has a boy and girl and $0$ otherwise.

The probability $\PP[A_i = 1] = \frac{7}{20} \cdot \frac{13}{19} + \frac{13}{20} \cdot \frac{7}{19} $.

We have $S= \sum_{i=1}^{19} A_i$.

Therefore the expected value of $S$ is

$$ \begin{align} \EE[S] = \sum_{i=1}^{19} \EE[A_i] = 19 \cdot 2 \cdot \frac{7 \cdot 13}{20 \cdot 19} = 9.1 \end{align} $$

Computational verification

from random import shuffle

row = ['B']*7 + ['G']*13
runs = 1000000
tot_succ = 0
for _ in range(runs):
    shuffle(row)
    for i in range(1, len(row)):
        if row[i-1] != row[i]:
            tot_succ += 1
print(tot_succ/runs)

Problem [AIME 2006 #6]

Let $\mathcal{S}$ be the set of real numbers that can be represented as repeating decimals of the form $0.\overline{abc}$ where $a, b, c$ are distinct digits.

Find the sum of the elements of $\mathcal{S}$.

Solution

The sum of the elements in $\mathcal{S}$ is given by $\left\lvert \mathcal S \right\rvert \cdot \EE[0.\overline{ABC}]$ as probability of each element of $\mathcal{S}$ is given by $\frac{1}{\left\lvert \mathcal S \right\rvert}$.

Let $A$, $B$ and $C$ have the distribution $\mathcal{U}[0,9]$. The expectation,

$$ \begin{align*} \EE[0.\overline{ABC}] = \EE[\frac{100A}{999} + \frac{10B}{999} + \frac{C}{999}] \
= \frac{100}{999}\cdot \frac{9}{2} +\frac{10}{999}\cdot \frac{9}{2} + \frac{1}{999}\cdot \frac{9}{2} = \frac{1}{2} \end{align*} $$

Therefore the sum of the elements of $\mathcal{S}$ is $720 \cdot \frac{1}{2}= 360$.

Computational verification

from fractions import Fraction
from itertools import product

sum = 0
for a, b, c in product(*[range(0, 10)]*3):
    if a != b and b != c and a != c:
        sum += Fraction(1, 999)*(100*a + 10*b + c)
print(sum)

Problem[NIMO 4.3]

One day, a bishop and a knight were on squares in the same row of an infinite chessboard, when a huge meteor storm occurred, placing a meteor in each square on the chessboard independently and randomly with probability $p$.

Neither the bishop nor the knight were hit, but their movement may have been obstructed by the meteors.

For what value of $p$ is the expected number of valid squares that the bishop can move to (in one move) equal to the expected number of squares that the knight can move to (in one move)?

Solution

Let the knight be at the purple square to begin with

The knight can move to any of the blue squares in one move starting from the central purple square.

The expected number of blue squares that will not contain a meteor is $8(1-p)$ by linearity of expectation.

Staring from the central purple square, the bishop can move to any of the green squares, pink squares provided the corresponding green squares don’t have a meteor, red squares provided the corresponding green and pink squares don’t have a meteor and so on diagonally.

The expected number of green squares without the meteor is $4(1-p)$, pink squares is $4(1-p)^2$, red squares is $4(1-p)^3$ and so on for all the squares located diagonally.

The value of $p$ for which the expected number of valid squares that the bishop can move to (in one move) equal to the expected number of squares that the knight can move to (in one move) is given by

$$ \begin{align} 8(1-p) &= 4\left\{(1-p) + (1-p)^2 + (1-p)^3 + \dots + \right\} \
\implies p &= \frac{1}{2} \end{align} $$

The squares the bishop and knight can move to in one step(the diagonal squares extend infinitely)

Problem [Iran TST 2008/6]

Suppose $799$ teams participate in a round-robin tournament. Prove that one can find two disjoint groups $A$ and $B$ of seven teams each such that all teams in $A$ defeated all teams in $B$.

Solution

A round robin tournament is a complete graph where each vertex is a team and a directed edge indicates the direction of win/loss.

Sample $A$ as a random $7$-set. Let $X$ be the number of guys that are totally dominated by $A$.

Letting $d_{v}^-$ denote the in-degree of $v$, we have $\EE[X] = \sum_{v}{d_{v}^- \choose 7} / {799 \choose 7}$ But $\sum_v d_{v}^- = {799 \choose 2}$, which means that the average in-degree is exactly $399$.

By convexity,

$$ \begin{align} \EE[X] \geq 799 · {399 \choose 7} / {799 \choose 7} \approx 800·(1/2)^7 \approx 6.25, \end{align} $$

which is enough since $X$ is an integer. Pick $7$ teams $B$ from the dominated group.

Problem

Let $n$ be a positive integer. Suppose $11n$ points are arranged in a circle, colored with one of $n$ colors, so that each color appears exactly $11$ times.

Prove that one can select a point of every color such that no two are adjacent.

Solution

Let’s label the points around the circle $a_1, a_2, \dots, a_{11n}$, and $a0 = a11n$. Let’s pick a point of each color randomly to form a set $B$ of $n$ points.

For each $1 \geq i \geq 11n$, let $X_i$ be the event that both $a_i$ and $a_{i+1}$ are in $B$. Note that

$$ \begin{align} \PP[X_i] &= \frac{1}{11} \cdot \frac{1}{11} \text{if $a_i$ and $a_{i+1}$ are of different colours}\
&= 0 \text{ otherwise} \end{align} $$

However, we see that each $X_i$ is independent from $X_j$ for all $j$ unless $a_j$ or $a_{j+1}$ has the same color as $a_i$ or $a_{i+1}$.

Note that $10$ other points are colored the same color as $a_i$, and each point appears in two consecutive pairs of points.

So there are at most $2 · 10 + 1 = 21$ pairs $(a_j , a_{j+1})$ other than $(a_i, , a_{i+1})$ sharing a color with $a_i$.

Similarly, there are at most another $21$ pairs sharing a color with $a_{i+1}$.

Thus $X_i$ and $X_j$ are independent for all but at most $42$ values of $j$.

Now we can apply Lov'asz Local Lemma with $p = \frac{1}{11^2}$ and $d = 42$. Since

$$ \begin{align} e p (d + 1) = e · \frac{1}{11^2} \cdot 43 < \frac{28}{10} \cdot \frac{43}{121} < 1 \end{align} $$

Lov'asz Local Lemma tells us that the probability that none of the bad events occur is positive.

Therefore, there must be a set of $n$ points each with a different color and no two adjacent.

Problem [Russia 1996]

In the Duma there are $1600$ delegates, who have formed $16000$ committees of $80$ people each.

Prove that one can find two committees having no fewer than four common members.

Solution

Sample a pair of committees uniformly at random (i.e., randomly pick one of the ${16000 \choose 2}$ possible pairs).

Let $X$ be the number of people who are in both chosen committees.

Note that $X = X_1 + \dots + X_{1600}$, where each $X_i$ is the {0, 1}-random variable telling whether the $i-th$ person was in both chosen committees.

By linearity of expectation, $\EE[X] = \EE[X_1] + \dots + \EE[X_{1600}]$.

Then, each $\EE[X_i] = \PP[\text{i-th person is in both committees}] = {n_i \choose 2} / {16000 \choose 2}$.

The only piece of information we know about the ${n_i}$ is that their $\sum n_i = 16000 · 80$, so this suggests that we use convexity to bound $\EE[X]$ in terms of the average of ${n_i}$, which we denote by $\overline{n} = (16000 · 80)/1600 = 800$.

We have,

$$ \begin{align} \EE[X] \geq 1600 \cdot {\overline{n} \choose 2} / {16000 \choose 2} \
= 1600 \cdot {800 \choose 2} / {16000 \choose 2} = 1600 ·\frac{800 · 799}{16000 · 15999} = 3.995. \end{align} $$

(One could see that since $799 \approx 800$ and $15999 \approx 16000$, the last fraction is roughly $1/400$.)

But by the Lemma, we know that some outcome of the probabilistic sampling produces an $X \geq 3.995$.

Since $X$ is always an integer, that outcome must in fact have $X \geq 4$. In particular, we conclude that some pair of committees has $\geq 4$ common m.

Problem [Russia 1999]

In a certain school, every boy likes at least one girl.

Prove that we can find a set $S$ of at least half the students in the school such that each boy in $S$ likes an odd number of girls in $S$.

Solution

Choose girls independently with probability $1/2$, and then let the set of boys be all of those who have an odd number of friends in the girl group.

Let $X$ be the number of boys and girls selected, and break this into the sum of indicators.

For each girl, obviously the indicator adds $1/2$ to the sum.

For each boy, the probability that he joins is precisely the probability that $Bin[k, 1/2]$ is odd, where $k$ was the number of girls he knew.

To see that this probability is $1/2$, note that it is the parity of the sum of $k$ independent coin flips.

In particular, the final flip independently flips or retains the final parity, hence odd with probability $1/2$.

Problem

In a $100 × 100$ array, each of the numbers $1, 2, \dots , 100$ appears exactly $100$ times.

Show that there is a row or a column in the array with at least 10 distinct numbers.

Solution

Let $n = 100$. Choose a random row or column ($2n$ choices). Let $X$ be the number of distinct entries in it.

Now $X = \sum_{i=1}^n I_i$ , where each $I_i$ is the indicator variable of $i$ appearing (possibly more than once) in our random row or column.

Clearly, each $\EE[I_i] = \PP[I_i \geq 1]$.

To lower-bound this,observe that the worst-case is if all $n$ appearances of $i$ are in some $\sqrt{n} × \sqrt{n}$ sub matrix, which gives $\PP[I_i \geq 1] \geq 2\sqrt{n}/(2n) = 1/\sqrt{n}$.

Hence by linearity, $\EE[X] \geq \sqrt{n}$.