Books · The Fiddler: Solutions
Chapter 101
Can You Paint by Number?
An infinitely long strip of canvas, cm wide, is divided into squares. Each square is independently labelled or with equal probability, then painted red for and blue for . A cluster is a maximal run of same-coloured squares. On average, how large is a cluster?
The Fiddler, Zach Wissner-Gross, April 5, 2024(original post)
Solution
A cluster is a run of identical fair coin flips. Reading along the strip, once a cluster has begun each following square continues it with probability and ends it with probability , so a cluster’s length is geometric with mean . The same count from the other side: each square starts a new cluster exactly when it differs from its left neighbour, which happens half the time, so there is one cluster for every two squares. Either way the average cluster size is
The computation
Lay down a long random strip and count clusters directly as one more than the number of colour changes between adjacent squares; the strip length divided by that count is the average size.
import numpy as np
rng = np.random.default_rng(0)
g = rng.integers(0, 2, size=4_000_000)
breaks = 1 + np.count_nonzero(g[1:] != g[:-1]) # number of clusters
print(len(g) / breaks) # ~2.0
Extra Credit
Now the strip is cm wide, a grid, with squares adjacent only across shared edges. What is the average cluster size?
Solution
In two dimensions a run-length count no longer works, because a cluster can branch. Euler’s formula does, though. For the graph whose vertices are the cells of one colour and whose edges join same-coloured edge-neighbours, the number of connected pieces is where counts the independent cycles, which on this grid are exactly the monochromatic blocks (a filled square encloses one face). Take expectations per column of the strip: cells; the same-colour edges average (the one vertical pair) plus (the two horizontal pairs reaching the next column); and a given block is all one colour with probability . So each column contributes clusters. With two cells per column, the average cluster size is :
The computation
Generate a tall random strip and flood-fill its four-connected same-colour clusters, then divide the cell count by the number of clusters.
import numpy as np
from collections import deque
def avg_cluster(L, seed): # 2 x L strip, 4-connected clusters
rng = np.random.default_rng(seed); g = rng.integers(0, 2, size=(2, L))
seen = np.zeros_like(g, bool); n = 0
for i in range(2):
for j in range(L):
if not seen[i, j]:
n += 1; c = g[i, j]; q = deque([(i, j)]); seen[i, j] = True
while q:
a, b = q.popleft()
for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
x, y = a + da, b + db
if 0 <= x < 2 and 0 <= y < L and not seen[x, y] and g[x, y] == c:
seen[x, y] = True; q.append((x, y))
return 2 * L / n
print(avg_cluster(3_000_000, 0)) # ~3.200