Unbiased Estimators

Powerful ideas in Statistical Inference.

By Vamshi Jandhyala in mathematics

November 20, 2020

Point estimation and unbiased estimators

Point estimation refers to providing a single “best guess” of some quantity of interest. The quantity of interest could be a parameter in a parametric model, a cdf $F$, a probability density function $f$, a regression function $r$, or a prediction for a future value $Y$ of some random variable. By convention, we denote a point estimate of $\theta$ by $\hat{\theta}$ or $\hat{\theta}_n$. Remember that $\theta$ is a fixed, unknown quantity. The estimator $\hat{\theta}$ depends on the data, so $\hat{\theta}$ is a random variable.

More formally, let $X_1,\dots,X_n$ be $n$ iid data points from some distribution $F$. A point estimator $\hat{\theta}_n$ of a parameter $\theta$ is some function of $X_1,\dots,X_n$:

$$ \hat{\theta}_n = g(X_1,\dots,X_n). $$

The bias of an estimator is defined by $\operatorname{bias}(\hat{\theta}_n) = \mathbb{E}_{\theta}(\hat{\theta}_n) - \theta$.

We say that $\hat{\theta}_n$ is unbiased if its bias is zero for every value of $\theta$, that is, if

$$ \mathbb{E}_\theta(\hat{\theta}_n) = \theta. $$
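The bias can also be approximated numerically: simulate many datasets from a distribution with a known $\theta$, apply the estimator to each, and compare the average estimate with $\theta$. The sketch below is an illustration under our own assumptions, not part of the original discussion: `estimate_bias` and `normal_sampler` are hypothetical helpers, and the test case (normal data with true variance $\theta = 4$ and $n = 5$) is chosen only to contrast the variance estimators with divisors $n$ and $n-1$.

```python
import random
import statistics
from typing import Callable, List, Sequence


def estimate_bias(
    estimator: Callable[[Sequence[float]], float],
    sampler: Callable[[random.Random, int], List[float]],
    theta: float,
    n: int,
    reps: int = 20_000,
    seed: int = 0,
) -> float:
    """Monte Carlo approximation of bias(theta_hat_n) = E_theta[theta_hat_n] - theta.

    Hypothetical helper: `sampler(rng, n)` must return n iid draws from a
    distribution whose true parameter value is `theta`.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    estimates = [estimator(sampler(rng, n)) for _ in range(reps)]
    return statistics.fmean(estimates) - theta


def normal_sampler(rng: random.Random, n: int) -> List[float]:
    # Assumed example: N(0, 2^2) data, so the true variance theta is 4.0
    return [rng.gauss(0.0, 2.0) for _ in range(n)]


# Divisor n (statistics.pvariance): theoretical bias is -theta / n = -0.8
bias_divisor_n = estimate_bias(statistics.pvariance, normal_sampler, theta=4.0, n=5)
# Divisor n - 1 (statistics.variance): theoretical bias is 0
bias_divisor_n_minus_1 = estimate_bias(statistics.variance, normal_sampler, theta=4.0, n=5)
print(bias_divisor_n, bias_divisor_n_minus_1)
```

The simulated values only approximate the true bias; their accuracy improves as `reps` grows.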


Atanu Das, the Indian archer, wants to find his chances of hitting the bull’s eye. With the upcoming Tokyo Olympics in mind, he has been practising a lot these days. Every day, he practises for a fixed amount of time and notes down how many times he hits the bull’s eye. Assume that these daily counts are independent and identically distributed according to the Poisson probability mass function $f(x) = e^{-\lambda}\lambda^x/x!$ for $x = 0, 1, 2, \dots$, and $f(x) = 0$ otherwise. Mr. Das wants to estimate the parameter $\lambda$. He has already recorded the number of times he hit the bull’s eye in his last $n$ practice sessions. Denote these counts by $X_1, \dots, X_n$ and let their joint probability mass function be $f_{\mathbf{X}}(X_1, \dots, X_n)$.

(a) Find an expression for $\lambda$ (in terms of $X_1, \dots, X_n$) that maximizes $f_\mathbf{X}(X_1, \dots, X_n)$. Call it $\hat{\lambda}_1$.

(b) Show that $\hat{\lambda}_1$ is an unbiased estimator of $\lambda$.

(c) Now consider the estimator

$$ \hat{\lambda}_2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2, \qquad \text{where } \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i. $$

Is it unbiased for $\lambda$ as well?

(d) Mr. Das has shared the following data from his last 15 practice sessions: 17, 20, 21, 18, 20, 25, 24, 16, 15, 18, 23, 28, 26, 23, 21. What will be your estimate for $\lambda$?


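a) The joint probability mass function is $f_\mathbf{X}(x_1, \dots, x_n) = \prod_{i=1}^n e^{-\lambda}\lambda^{x_i}/x_i! = e^{-n\lambda}\lambda^{\sum_{i=1}^n x_i}/\prod_{i=1}^n x_i!$. Since $\log$ is strictly increasing, the $\lambda$ that maximizes $f_\mathbf{X}$ also maximizes $\log f_\mathbf{X}$. We have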

$$ \log f_\mathbf{X} = -n\lambda + \left(\sum_{i=1}^n x_i\right)\log \lambda - \sum_{i=1}^n \log (x_i!) $$

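Differentiating $\log f_\mathbf{X}$ with respect to $\lambda$ and setting the derivative equal to $0$, we get

$$ -n + \frac{1}{\lambda}\sum_{i=1}^n x_i = 0 \quad\Longrightarrow\quad \hat{\lambda}_1 = \frac{1}{n}\sum_{i=1}^n X_i = \bar{X}. $$

The second derivative, $-\sum_{i=1}^n x_i/\lambda^2$, is negative whenever $\sum_{i=1}^n x_i > 0$, so this stationary point is a maximum.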
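As a numerical sanity check of this derivation, the sketch below evaluates the Poisson log-likelihood on a fine grid of $\lambda$ values, using the data from part (d), and confirms that the grid maximizer sits at the sample mean. The grid bounds and step are arbitrary choices, and `poisson_log_likelihood` and `grid_argmax` are illustrative helpers, not library functions.

```python
import math
from typing import Sequence

# Bull's-eye counts from part (d)
sessions = [17, 20, 21, 18, 20, 25, 24, 16, 15, 18, 23, 28, 26, 23, 21]


def poisson_log_likelihood(lam: float, xs: Sequence[int]) -> float:
    """log f_X = -n*lam + (sum x_i) * log(lam) - sum log(x_i!)."""
    # math.lgamma(x + 1) equals log(x!) for non-negative integers x
    return (
        -len(xs) * lam
        + sum(xs) * math.log(lam)
        - sum(math.lgamma(x + 1) for x in xs)
    )


def grid_argmax(xs: Sequence[int], lo: float = 1.0, hi: float = 40.0, step: float = 0.01) -> float:
    """Return the grid point in [lo, hi] with the largest log-likelihood."""
    n_points = int(round((hi - lo) / step)) + 1
    grid = [lo + k * step for k in range(n_points)]
    return max(grid, key=lambda lam: poisson_log_likelihood(lam, xs))


closed_form = sum(sessions) / len(sessions)  # lambda_hat_1 = sample mean
numerical = grid_argmax(sessions)
print(closed_form, numerical)
```

A grid search is used here only because it is transparent; the closed form makes it unnecessary in practice.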

b) Since $\mathbb{E}[X_i] = \lambda$ for a Poisson random variable, $\mathbb{E}[\hat{\lambda}_1] = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[X_i] = \frac{n\lambda}{n} = \lambda$. Therefore $\hat{\lambda}_1$ is an unbiased estimator of $\lambda$.

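c) For a Poisson random variable, $\operatorname{Var}(X_i) = \lambda$. Since the sample variance is an unbiased estimator of the population variance for iid data with finite variance, we can say that $\mathbb{E}[\hat{\lambda}_2] = \lambda$. We can also verify this directly.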

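Using $\mathbb{E}[X_i^2] = \operatorname{Var}(X_i) + (\mathbb{E}[X_i])^2 = \lambda + \lambda^2$ and, by independence, $\mathbb{E}[X_iX_k] = \lambda^2$ for $i \neq k$, we get

$$ \begin{align*} \mathbb{E}(\bar{X}^2) &= \frac{1}{n^2}\,\mathbb{E}\left[\left(\sum_{i=1}^n X_i\right)^2\right] = \frac{1}{n^2}\left(n\lambda + n\lambda^2 + n(n-1)\lambda^2\right) \\
&= \frac{1}{n^2}\left(n \lambda + n^2 \lambda^2\right) = \frac{\lambda}{n} + \lambda^2, \\
\mathbb{E}(\bar{X}X_i) &= \frac{\mathbb{E}(X_i^2) + \sum_{k=1,k\neq i}^n\mathbb{E}(X_iX_k) }{n}\\
&= \frac{\lambda + \lambda^2 + (n-1)\lambda^2 }{n} = \frac{\lambda}{n} + \lambda^2. \end{align*} $$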


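Expanding $(X_i - \bar{X})^2 = X_i^2 - 2\bar{X}X_i + \bar{X}^2$ and taking expectations term by term,

$$ \begin{align*} \mathbb{E}[\hat{\lambda}_2] &= \frac{1}{n-1} \sum_{i=1}^n \left\{\mathbb{E}(X_i^2) - 2\,\mathbb{E}(\bar{X}X_i) + \mathbb{E}(\bar{X}^2)\right\} \\
&= \frac{1}{n-1} \sum_{i=1}^n \left\{\lambda + \lambda^2 - 2\left(\lambda^2 + \frac{\lambda}{n}\right) + \frac{\lambda}{n} + \lambda^2 \right\} \\
&= \frac{1}{n-1} \cdot n\left(\lambda - \frac{\lambda}{n}\right) = \lambda \end{align*} $$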

which indicates that $\hat{\lambda}_2$ is an unbiased estimator of $\lambda$.
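A quick simulation makes parts (b) and (c) concrete. The sketch below draws many samples of size $n$ from a Poisson distribution with assumed values $\lambda = 4$ and $n = 10$, computes $\hat{\lambda}_1$ and $\hat{\lambda}_2$ on each, and reports their averages and spreads. Python's standard `random` module has no Poisson sampler, so `poisson_draw` is a hypothetical helper implementing Knuth's multiplication method, which is adequate for small $\lambda$. Both averages should land close to $\lambda$, while $\hat{\lambda}_2$ should spread out far more: for Poisson data its variance is $\lambda/n + 2\lambda^2/(n-1)$, versus $\lambda/n$ for $\hat{\lambda}_1$.

```python
import math
import random
import statistics
from typing import List, Tuple


def poisson_draw(rng: random.Random, lam: float) -> int:
    """One Poisson(lam) draw via Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate(lam: float, n: int, reps: int, seed: int = 1) -> Tuple[List[float], List[float]]:
    """Return the lists of lambda_hat_1 and lambda_hat_2 values over `reps` samples."""
    rng = random.Random(seed)
    hat1, hat2 = [], []
    for _ in range(reps):
        xs = [poisson_draw(rng, lam) for _ in range(n)]
        mean = sum(xs) / n
        hat1.append(mean)                                        # sample mean
        hat2.append(sum((x - mean) ** 2 for x in xs) / (n - 1))  # sample variance
    return hat1, hat2


LAM, N, REPS = 4.0, 10, 10_000  # assumed values for illustration
hat1, hat2 = simulate(LAM, N, REPS)
mean1, mean2 = statistics.fmean(hat1), statistics.fmean(hat2)
var1, var2 = statistics.pvariance(hat1), statistics.pvariance(hat2)
print(f"lambda_hat_1: mean {mean1:.3f}, variance {var1:.3f} (theory {LAM / N:.3f})")
print(f"lambda_hat_2: mean {mean2:.3f}, variance {var2:.3f} "
      f"(theory {LAM / N + 2 * LAM**2 / (N - 1):.3f})")
```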

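d) Both estimators are unbiased, but $\hat{\lambda}_1 = \bar{X}$ should be preferred as an estimate for $\lambda$: it is the maximum likelihood estimate, and it has the smaller variance, $\operatorname{Var}(\bar{X}) = \lambda/n$, compared with $\operatorname{Var}(\hat{\lambda}_2) = \lambda/n + 2\lambda^2/(n-1)$ for Poisson data. For the given data, $\sum_{i=1}^{15} x_i = 315$, so $\hat{\lambda} = 315/15 = 21$.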
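The arithmetic for part (d) can be checked with Python's standard `statistics` module. Computing $\hat{\lambda}_2$ on the same 15 sessions as well shows that two unbiased estimates of the same $\lambda$ can differ considerably on a single sample.

```python
import statistics

# Bull's-eye counts from the last 15 practice sessions (part (d))
sessions = [17, 20, 21, 18, 20, 25, 24, 16, 15, 18, 23, 28, 26, 23, 21]

lambda_hat_1 = statistics.fmean(sessions)     # sample mean: 315 / 15 = 21.0
lambda_hat_2 = statistics.variance(sessions)  # divisor n - 1: 204 / 14, about 14.57
print(lambda_hat_1, lambda_hat_2)
```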


Let $X$ be an observation from the probability mass function

$$ f(x|\theta) = \begin{cases} \left(\frac{\theta}{2}\right)^{|x|}(1-\theta)^{1-|x|} & \text{if } x \in \{-1,0,1\} \\
0 &\text{otherwise}, \end{cases} $$

where $0 \leq \theta \leq 1$.

Define an estimator $T(X)$ by

$$ T(X) = \begin{cases} 2 & \text{if $X=1$} \\
0 &\text{ otherwise}. \end{cases} $$

Show that $T(X)$ is an unbiased estimator of $\theta$. Find a better estimator than $T(X)$ and prove that it is better.
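The solution relies on reading off $f(1|\theta) = f(-1|\theta) = \theta/2$ and $f(0|\theta) = 1-\theta$ from the formula, which also shows the probabilities sum to $1$. The sketch below checks this with exact rational arithmetic from Python's `fractions` module; the function name `pmf` and the chosen $\theta$ values are our own illustrative choices.

```python
from fractions import Fraction


def pmf(x: int, theta: Fraction) -> Fraction:
    """f(x | theta) = (theta/2)^|x| * (1 - theta)^(1 - |x|) on {-1, 0, 1}, else 0."""
    if x not in (-1, 0, 1):
        return Fraction(0)
    return (theta / 2) ** abs(x) * (1 - theta) ** (1 - abs(x))


for theta in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)):
    probs = {x: pmf(x, theta) for x in (-1, 0, 1)}
    # The three probabilities should always sum to exactly 1
    print(theta, probs, sum(probs.values()))
```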


We have $\mathbb{E}[T(X)] = 2f(1) = 2 \cdot \frac{\theta}{2} = \theta$, so $T(X)$ is unbiased. Its variance is $\operatorname{Var}[T(X)] = \mathbb{E}[T(X)^2] - \theta^2 = 2^2f(1) - \theta^2 = 2\theta - \theta^2$. Define a new estimator $U(X)$ as follows:

$$ U(X) = \begin{cases} 1 & \text{if $|X|=1$} \\
0 &\text{ otherwise}. \end{cases} $$

We have $\mathbb{E}[U(X)] = 1 \cdot f(1) + 1 \cdot f(-1) = \frac{\theta}{2} + \frac{\theta}{2} = \theta$ and $\operatorname{Var}[U(X)] = 1^2f(1) + 1^2f(-1) - \theta^2 = \theta - \theta^2$.

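$U(X)$ is also an unbiased estimator of $\theta$, and $\operatorname{Var}[T(X)] - \operatorname{Var}[U(X)] = \theta \geq 0$, with strict inequality whenever $\theta > 0$. Since the mean squared error of an unbiased estimator equals its variance, $U(X)$ is a better estimator than $T(X)$.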
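The moment calculations above can be double-checked by enumerating the three support points exactly. In this sketch, `T` and `U` mirror the two estimators defined in the text (capitalized to match the notation), while `pmf` and `mean_and_variance` are illustrative helpers we introduce; `Fraction` keeps the arithmetic exact so the results can be compared with $\theta$, $2\theta - \theta^2$ and $\theta - \theta^2$ without rounding error.

```python
from fractions import Fraction
from typing import Callable, Tuple

SUPPORT = (-1, 0, 1)


def pmf(x: int, theta: Fraction) -> Fraction:
    """f(x | theta) = (theta/2)^|x| * (1 - theta)^(1 - |x|) on {-1, 0, 1}."""
    return (theta / 2) ** abs(x) * (1 - theta) ** (1 - abs(x))


def T(x: int) -> int:
    return 2 if x == 1 else 0  # original estimator: 2 when X = 1


def U(x: int) -> int:
    return 1 if abs(x) == 1 else 0  # improved estimator: 1 when |X| = 1


def mean_and_variance(estimator: Callable[[int], int], theta: Fraction) -> Tuple[Fraction, Fraction]:
    """Exact E[est(X)] and Var[est(X)] = E[est(X)^2] - E[est(X)]^2."""
    mean = sum(estimator(x) * pmf(x, theta) for x in SUPPORT)
    second_moment = sum(estimator(x) ** 2 * pmf(x, theta) for x in SUPPORT)
    return mean, second_moment - mean ** 2


for theta in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    mean_t, var_t = mean_and_variance(T, theta)
    mean_u, var_u = mean_and_variance(U, theta)
    print(theta, mean_t, var_t, mean_u, var_u)
```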