Pooled Testing

Replication of the results in the paper for my own understanding.

By Vamshi Jandhyala

June 8, 2020

Pooled testing increases the efficiency of virus testing when only a limited number of tests is available. The idea is to pool samples taken from several subjects and test the combined sample with a single test. If the test is negative, all subjects are declared negative. If the test is positive, every individual in the pool is tested separately to find the infected ones. The optimal pool size depends on the fraction of infected people in the population and on the false positive and false negative rates of the test in use. I derive the formulas for the optimal pool size, the effective number of persons tested per test, and the false negatives of the pooled test, and provide Python code for the simulations.

Model

Assumptions

  1. a fraction $\lambda$ of the population is infected,
  2. tests have a false positive rate of $\gamma_+$ and a false negative rate of $\gamma_-$. It is assumed that testing a pooled sample does not change the false positive and false negative rates of the test,
  3. samples are pooled into groups of size $\omega$,
  4. to control false negatives, each pooled test is replicated $r$ times. If the majority of the $r$ replicates are positive, the pooled sample is declared positive,
  5. if the pooled test is positive, each individual in the group is tested separately.

Output

Under these assumptions the following will be derived

  1. the optimal group size $\omega^{opt}$,
  2. the effective number of persons that can be tested with one test, $PPT$ (persons per test),
  3. an estimate for the upper bound for the fraction of infected individuals that are missed by the pooled testing procedure (applied to the population). We call it the false negative factor for pooled testing and denote it by $FNPT$,
  4. the false negative rate, $FNR$, of pooled testing, which is the fraction of infected individuals the pooled test will miss.

Persons Per Test (PPT)

A group is positive if at least one of its members is positive. The probability of a group having at least one positive member is

$$ \begin{equation} p = 1 - (1 - \lambda)^{\omega} \end{equation} $$

where $(1 - \lambda)^{\omega}$ is the probability that all $\omega$ members of the group are negative. Here we assume that each person in the group is infected independently of the others.

The probability of a false positive test, i.e. the test showing positive although no one in the group is positive, is given by

$$ \begin{equation} (1 - \lambda)^{\omega} \gamma_+ = (1-p)\gamma_+ \end{equation} $$

Similarly, the probability of a false negative test, i.e. the test showing negative although at least one person in the group is positive, is given by $p \gamma_-$.

False positives do not decrease the chances to capture a true positive but only decrease the efficiency in using the available tests. More importantly, tests will miss positive individuals in $p\gamma_-$ cases on average.

Hence the probability $P_+$ that a single pooled test shows positive combines two cases: the pooled sample is actually positive and the test detects it (the false negative cases are missed), or the pooled sample is negative and the test shows positive anyway. It is given by

$$ \begin{align} P_+ &= p(1-\gamma_-)+(1-p)\gamma_+ \\
&= p(1-\gamma_- - \gamma_+)+\gamma_+ \end{align} $$

To see how replicates affect the false negatives of the pooled test, we take $r$ replicates and apply the majority rule. Treating the replicates as independent tests that each show positive with probability $P_+$, the probability $P_+^*$ that the majority of the $r$ replicates identifies the group as positive becomes

$$ \begin{equation} P_+^* = \sum_{i > r/2}^r{r \choose i} P_+^i (1-P_+)^{(r-i)} \end{equation} $$
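The two formulas for $P_+$ and $P_+^*$ can be evaluated directly. The following is a minimal sketch using only the standard library; the function names `single_test_positive` and `majority_positive` are illustrative and do not come from the paper, and the parameter values are the ones used in the Simulation section.

```python
from math import comb


def single_test_positive(lam, omega, gamma_neg, gamma_pos):
    """P_+: probability that one test of a pool of size omega shows positive."""
    p = 1 - (1 - lam) ** omega  # pool contains at least one infected person
    return p * (1 - gamma_neg) + (1 - p) * gamma_pos


def majority_positive(p_plus, r):
    """P_+^*: probability that more than r/2 of r replicates show positive,
    treating each replicate as an independent draw with probability p_plus."""
    return sum(comb(r, i) * p_plus**i * (1 - p_plus) ** (r - i)
               for i in range(r // 2 + 1, r + 1))


p_plus = single_test_positive(lam=0.01, omega=10, gamma_neg=0.02, gamma_pos=0.0012)
for r in (1, 3, 5):
    print(r, majority_positive(p_plus, r))
```

For $r = 1$ the sum has a single term and `majority_positive` returns $P_+$ itself.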

The expected number of tests per group is given by

$$ \begin{equation} r\cdot(1 - P_+^*) + (r +\omega) P_+^* = r + \omega P_+^* \end{equation} $$

We carry out $r$ tests in total when the group is declared negative and $r + \omega$ tests when it is declared positive after the initial $r$ tests. The additional $\omega$ tests are one individual test for each person in the group.
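The expected number of tests per group can be checked against a small Monte Carlo experiment. The sketch below is restricted to $r = 1$ (a single pooled test per group), where the expectation is $1 + \omega P_+$. The function names, the number of simulated groups and the seed are all illustrative choices, not part of the paper.

```python
import random


def analytic_tests_per_group(lam, omega, gamma_neg, gamma_pos):
    """1 + omega * P_+: expected tests per group for r = 1."""
    p = 1 - (1 - lam) ** omega
    p_plus = p * (1 - gamma_neg) + (1 - p) * gamma_pos
    return 1 + omega * p_plus


def simulated_tests_per_group(lam, omega, gamma_neg, gamma_pos,
                              n_groups=20_000, seed=0):
    """Average number of tests spent per group over n_groups simulated pools."""
    rng = random.Random(seed)
    total_tests = 0
    for _ in range(n_groups):
        # Each member is infected independently with probability lam.
        infected = [rng.random() < lam for _ in range(omega)]
        # The pooled test errs with rate gamma_neg or gamma_pos.
        prob_pos = 1 - gamma_neg if any(infected) else gamma_pos
        total_tests += 1                # the pooled test itself
        if rng.random() < prob_pos:     # pool reads positive: test everyone
            total_tests += omega
    return total_tests / n_groups


print(analytic_tests_per_group(0.01, 10, 0.02, 0.0012))
print(simulated_tests_per_group(0.01, 10, 0.02, 0.0012))
```

The two printed values should agree up to sampling noise, which shrinks as `n_groups` grows.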

The expected number of tests per person $Q$ is given by

$$ \begin{equation} Q = \frac{1}{\omega}(r + \omega P_+^*) = \frac{r}{\omega} + P_+^* \end{equation} $$

The persons per test is then simply $PPT = 1/Q$.
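Since $PPT$ depends on $\omega$ only through $r/\omega$ and $P_+^*$, the optimal pool size can be found by scanning integer pool sizes and keeping the one with the largest $PPT$. The sketch below does this by brute force; the helper names and the search limit `max_omega` are assumptions made for illustration.

```python
from math import comb


def persons_per_test(omega, r, lam, gamma_neg, gamma_pos):
    """PPT = 1 / Q with Q = r / omega + P_+^*."""
    p = 1 - (1 - lam) ** omega
    p_plus = p * (1 - gamma_neg - gamma_pos) + gamma_pos
    p_star = sum(comb(r, i) * p_plus**i * (1 - p_plus) ** (r - i)
                 for i in range(r // 2 + 1, r + 1))
    return 1 / (r / omega + p_star)


def optimal_pool_size(r, lam, gamma_neg, gamma_pos, max_omega=100):
    """Integer pool size in 1..max_omega that maximises PPT."""
    return max(range(1, max_omega + 1),
               key=lambda w: persons_per_test(w, r, lam, gamma_neg, gamma_pos))


best = optimal_pool_size(r=1, lam=0.01, gamma_neg=0.02, gamma_pos=0.0012)
print(best, persons_per_test(best, 1, 0.01, 0.02, 0.0012))
```

With $\lambda = 0.01$, $\gamma_- = 0.02$, $\gamma_+ = 0.0012$ and $r = 1$, the scan selects a pool size of 11 with roughly 5.14 persons per test.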

False Negative Rate (FNR)

A person who actually has the infection is not caught in the following cases:

  1. The majority of the $r$ tests performed on the group the person belongs to are false negatives
  2. The group turns out to be positive after $r$ tests but the person tests negative because of an individual false negative test

The probability of missing the infection present in a tested person is given by the *False Negative Rate*

$$ \begin{equation} FNR = \gamma_-^* + (1-\gamma_-^*)\gamma_- \end{equation} $$

where $\gamma_-^*$ is given by

$$ \begin{equation} \gamma_-^* = \sum_{i \geq r/2}^r{r \choose i} \gamma_-^i (1-\gamma_-)^{(r-i)} \end{equation} $$

The $FNR$ is the number of individuals one expects to miss in pooled testing on *average* per infected person.

For $r=1$ we have $\gamma_-^*= \gamma_-$ and therefore $FNR = 2\gamma_- - \gamma_-^2$, which is $\sim 2\gamma_-$ for small $\gamma_-$.

If there are no biases or correlations within or between groups, the number of missed infections per person in the population will be $\lambda FNR$, which is $\sim 2\lambda \gamma_-$ for $r=1$ and is independent of the pool size.
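The false negative rate can be computed from the binomial sum for $\gamma_-^*$. The following sketch uses illustrative function names and the value $\gamma_- = 0.02$ from the Simulation section.

```python
from math import ceil, comb


def group_false_negative(gamma_neg, r):
    """gamma_-^*: at least r/2 of the r replicates on an infected pool
    show negative, so the pool is not declared positive."""
    return sum(comb(r, i) * gamma_neg**i * (1 - gamma_neg) ** (r - i)
               for i in range(ceil(r / 2), r + 1))


def false_negative_rate(gamma_neg, r):
    """FNR: missed at the pool stage, or caught there but missed individually."""
    g_star = group_false_negative(gamma_neg, r)
    return g_star + (1 - g_star) * gamma_neg


for r in (1, 3, 5):
    print(r, false_negative_rate(0.02, r))
```

For $r = 1$ this returns $2\gamma_- - \gamma_-^2 = 0.0396$. As $r$ grows, $\gamma_-^*$ shrinks and the $FNR$ approaches the individual test's $\gamma_-$ from above.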

False Negative Factor Per Pooled Test (FNPT)

The upper bound for the fraction of infected individuals that are missed by the pooled testing procedure $FNPT$ is

$$ \begin{equation} FNPT \equiv p \cdot FNR = p(\gamma_-^* + (1-\gamma_-^*)\gamma_-) \end{equation} $$

This is the expected number of missed infections per tested person.

$$ \begin{align*} FNPT = p \cdot FNR &= (1-(1-\lambda)^{\omega})FNR \\
&\sim (1-(1-\omega\lambda))FNR \\
&= \omega\lambda FNR \end{align*} $$

for small $\lambda$.
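To get a feeling for how good the small-$\lambda$ approximation is, the sketch below compares the exact $FNPT$ with the linearised $\omega\lambda FNR$. The function names are illustrative, and the $FNR$ value is the $r = 1$ case for $\gamma_- = 0.02$.

```python
def fnpt_exact(omega, lam, fnr):
    """FNPT = (1 - (1 - lam)**omega) * FNR."""
    return (1 - (1 - lam) ** omega) * fnr


def fnpt_linear(omega, lam, fnr):
    """First-order approximation omega * lam * FNR, valid for small omega * lam."""
    return omega * lam * fnr


fnr = 2 * 0.02 - 0.02**2  # FNR for r = 1 and gamma_- = 0.02
for omega in (5, 10, 50):
    print(omega, fnpt_exact(omega, 0.01, fnr), fnpt_linear(omega, 0.01, fnr))
```

Because $(1-\lambda)^{\omega} \geq 1 - \omega\lambda$, the linear form never underestimates the exact value, and the gap widens as $\omega\lambda$ grows.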

FNPT vs FNR

The advantage of $FNPT$ over $FNR$ shows when testing biased groups: there, cases can be correlated, so the chance of multiple infections within a group is higher than in the population as a whole. $FNPT$ captures this situation by considering the upper bound rather than the average.

Simulation

from math import ceil, pow
from scipy.special import comb
import altair as alt
import pandas as pd

alt.renderers.enable('default')


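def pooling_model(ps, r, lambd, gamma_neg, gamma_pos):
    # Probability that a pool of size ps contains at least one infected person
    p = 1 - pow(1-lambd, ps)
    # P_+: probability that a single pooled test shows positive
    tst_pos = p*(1-gamma_neg-gamma_pos) + gamma_pos
    # P_+^*: more than r/2 of the r replicates show positive (strict majority)
    rep_pos = sum([comb(r, i)*pow(tst_pos, i)*pow(1-tst_pos, r-i)
                   for i in range(r//2 + 1, r + 1)])
    # Expected tests per person and its inverse, persons per test
    Q = rep_pos + r/ps
    PPT = 1/Q
    # gamma_-^*: at least r/2 replicates on an infected pool show negative
    gamma_star = sum([comb(r, i)*pow(gamma_neg, i)*pow(1-gamma_neg, r-i)
                      for i in range(ceil(r/2), r + 1)])
    FNPT = p*(gamma_star + (1-gamma_star)*gamma_neg)
    return {"Pool Size": ps, "PPT": PPT, "FNPT": FNPT,
            "Infection Rate": lambd, "r": "r=" + str(r)}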


def model_run(pool_sizes, rep_sizes, infection_levels, gamma_neg, gamma_pos):
    # Collect one row per parameter combination, then build the frame once.
    # (DataFrame.append was removed in pandas 2.0.)
    rows = [pooling_model(ps, r, lambd, gamma_neg, gamma_pos)
            for lambd in infection_levels
            for ps in pool_sizes
            for r in rep_sizes]
    return pd.DataFrame(rows)


def pool_size_ppt_fnpt_chart(source):
    chart = alt.hconcat(data=source)
    for y_encoding in ['PPT:Q', 'FNPT:Q']:
        chart |= alt.Chart(source).mark_line().encode(
                    x='Pool Size',
                    y=y_encoding,
                    color=alt.Color('r',
                                    legend=alt.Legend(title="Replication"),
                                    scale=alt.Scale(scheme='dark2')
                                    )
                )
    return chart

"""Model params"""

pool_sizes = range(1, 100)
rep_sizes = [1, 3, 5]
infection_levels = [0.01]
gamma_neg = 0.02
gamma_pos = 0.0012

model1_output = model_run(pool_sizes, rep_sizes, infection_levels,
                          gamma_neg, gamma_pos)
pool_size_ppt_fnpt_chart(model1_output)

Results

(a) Increase of test efficiency in persons per test, $PPT$. The maximum of this curve indicates the optimal pool size, $\omega^{opt}$, for a given infection rate and given false negative and false positive rates of the test. The maximum efficiency gain is naturally found for $r=1$ and is about $5.1$ persons per test.

(b) False negative factor for the pooled sample, $FNPT$. The result shows that taking more replicates decreases the false negatives. However, note that this also decreases the efficiency to $PPT \approx 4.3$. Parameters: $\gamma_+ = 0.0012$ and $\gamma_- = 0.02$.

Results for the optimal pool size, $\omega^{opt}$, and persons per test, $PPT$, are shown in figure (a) for a population-wide infection level of 1%. Figure (b) shows the increase of $FNPT$ with pool size. Here we use a false negative rate of $\gamma_- = 0.02$ and a false positive rate of $\gamma_+ = 0.0012$.

References

Boosting test-efficiency by pooled testing strategies for SARS-CoV-2, Rudolf Hanel and Stefan Thurner, 2020