Stock return distributions

If we assume that stock returns follow a normal distribution, there is very little probability that asset returns take on very large positive or very large negative values. In this particular case, there is only a \(0.5\) percent chance of a daily return above two percent or below minus two percent.

In reality, we observe much larger returns than the Gaussian assumption allows. The normality assumption is a simplification that underestimates the magnitude of extreme returns.
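To make the thin-tail claim concrete, here is a minimal sketch that computes the probability of a return beyond a symmetric threshold under a normal distribution. The function name `gaussian_tail_probability` and the inputs (zero mean, 0.7 percent daily volatility) are assumptions chosen for illustration, not values taken from the text.

```python
from scipy.stats import norm

def gaussian_tail_probability(threshold, mu, sigma):
    """Probability that a normal return is below -threshold or above +threshold."""
    lower = norm.cdf(-threshold, loc=mu, scale=sigma)  # P[R < -threshold]
    upper = norm.sf(threshold, loc=mu, scale=sigma)    # P[R > +threshold]
    return lower + upper

# Hypothetical inputs: zero mean, 0.7% daily volatility, 2% threshold
p = gaussian_tail_probability(0.02, mu=0.0, sigma=0.007)
print(f"Gaussian probability of a move beyond 2%: {p:.4%}")
```

Comparing this number with the observed frequency of such moves in a real return series is one way to see how far the data depart from normality.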

Skewness

Skewness is a measure of asymmetry of the distribution. The normal bell-shaped distribution is symmetric. So the probability of getting an outcome below the mean is exactly the same as the probability of getting an outcome above the mean.

If a distribution has negative skewness, its left tail is longer: extreme outcomes below the mean are more likely than equally extreme outcomes above the mean. The opposite is true for a positively skewed distribution.

Skewness is given by \(\frac{\mathbb{E}[(R - \overline{R})^3]}{(\mathbb{E}[(R - \overline{R})^2])^{3/2}}\).

Code for calculating skewness

def skewness1(r):
    demeaned_r = r - r.mean()
    # use the population standard deviation, so set ddof=0
    sigma_r = r.std(ddof=0)
    exp = (demeaned_r**3).mean()
    return exp/sigma_r**3

import scipy.stats
def skewness2(r):
    return scipy.stats.skew(r)

Kurtosis

Kurtosis is a measure of the thickness of the tails of the distribution. The Gaussian distribution has very thin tails: its density decreases very sharply to zero, which implies that the probability of very large negative or positive outcomes is very small. In reality, return distributions tend to have fatter tails.

Kurtosis is given by \(\frac{\mathbb{E}[(R - \overline{R})^4]}{(\mathbb{E}[(R - \overline{R})^2])^2}\).

For the Gaussian distribution, the kurtosis is equal to three. Any return distribution with a kurtosis higher than three is called a fat-tailed distribution.

Code for calculating Kurtosis

def kurtosis1(r):
    demeaned_r = r - r.mean()
    # use the population standard deviation, so set ddof=0
    sigma_r = r.std(ddof=0)
    exp = (demeaned_r**4).mean()
    return exp/sigma_r**4

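import scipy.stats
def kurtosis2(r):
    # scipy returns excess kurtosis (kurtosis minus 3) by default;
    # fisher=False returns the raw kurtosis, matching kurtosis1
    return scipy.stats.kurtosis(r, fisher=False)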
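As a quick illustration of the "kurtosis equals three for a Gaussian" benchmark, the sketch below compares a simulated normal sample with a simulated fat-tailed sample. The helper `kurtosis_raw`, the seed, the sample sizes, and the choice of a Student-t distribution with 5 degrees of freedom as a stand-in for fat-tailed returns are all assumptions made for this example.

```python
import numpy as np

def kurtosis_raw(x):
    """Raw (non-excess) kurtosis using population moments."""
    x = np.asarray(x, dtype=float)
    demeaned = x - x.mean()
    return (demeaned**4).mean() / x.std(ddof=0)**4

rng = np.random.default_rng(42)
normal_sample = rng.normal(size=200_000)                # thin tails
fat_tailed_sample = rng.standard_t(df=5, size=200_000)  # fatter tails

k_normal = kurtosis_raw(normal_sample)
k_fat = kurtosis_raw(fat_tailed_sample)
print(f"normal sample kurtosis:     {k_normal:.2f}")
print(f"fat-tailed sample kurtosis: {k_fat:.2f}")
```

The normal sample should land close to three, while the Student-t sample should come out clearly above it.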

Hedge Fund Returns

Hedge fund returns, for example, are not normally distributed. If you look at the skewness numbers, many of them are negative, and some strongly so. More often than not, hedge fund returns are more likely to show extreme outcomes below the mean than above it.

When we look at the kurtosis, or the excess kurtosis (kurtosis minus three), we get numbers that are strongly positive. More often than not, the kurtosis of the hedge fund return distribution is much higher than three. This suggests that hedge fund returns are severely non-normal.

Non normality tests

There are many tests that tell us whether a given distribution is statistically different from the normal distribution. One of the most commonly used is the Jarque-Bera test. When the sample skewness is zero and the sample excess kurtosis is zero, as expected under the null hypothesis of normally distributed returns, the Jarque-Bera statistic equals zero. The test statistic is always nonnegative. If it is far from zero, it signals that the data do not follow a normal distribution.

Code for checking if returns follow Normal distribution

import pandas as pd
import scipy.stats
def is_normal(r, level=0.01):
    """
    Applies the Jarque-Bera test to determine if a Series is normal or not
    Test is applied at the 1% level by default
    Returns True if the hypothesis of normality is not rejected, False otherwise
    """
    if isinstance(r, pd.DataFrame):
        # apply the test column by column, passing the level through
        return r.aggregate(is_normal, level=level)
    else:
        statistic, p_value = scipy.stats.jarque_bera(r)
        # a p-value above the level means normality cannot be rejected
        return p_value > level
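The Jarque-Bera statistic combines sample skewness \(S\) and sample kurtosis \(K\) as \(JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)\), and is compared against a chi-squared distribution with two degrees of freedom. The sketch below computes it by hand and checks it against `scipy.stats.jarque_bera`. The helper name `jarque_bera_manual` and the simulated sample are assumptions for illustration.

```python
import numpy as np
import scipy.stats

def jarque_bera_manual(x):
    """Jarque-Bera statistic and asymptotic p-value from population moments."""
    x = np.asarray(x, dtype=float)
    n = x.size
    demeaned = x - x.mean()
    sigma = x.std(ddof=0)
    s = (demeaned**3).mean() / sigma**3   # sample skewness
    k = (demeaned**4).mean() / sigma**4   # sample (raw) kurtosis
    statistic = n / 6 * (s**2 + (k - 3)**2 / 4)
    p_value = scipy.stats.chi2.sf(statistic, df=2)
    return statistic, p_value

rng = np.random.default_rng(0)
sample = rng.normal(size=1_000)           # hypothetical return sample

jb_manual, p_manual = jarque_bera_manual(sample)
jb_scipy, p_scipy = scipy.stats.jarque_bera(sample)
print(jb_manual, jb_scipy)
print(p_manual, p_scipy)
```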

Semi Volatility or Semi Deviation

A real concern for investors is the probability of a large loss, that is, the probability of large negative returns on the portfolios they hold.
What really bothers investors is uncertainty or volatility on the downside. Semi-volatility, or semi-deviation, only looks at the volatility of returns below the mean. Semi-deviation is given by \(\sqrt{\frac{\sum_{R_i < \overline{R}} (R_i-\overline{R})^2}{n-1}}\) where \(n\) is the number of observations where the return is less than the mean.

This measure is meaningful because it focuses on losses, which are what investors are concerned about, rather than gains. However, it tells us nothing about the magnitude of the extreme losses that can occur below the mean; it only captures the average deviation, or average uncertainty, when returns are below the mean.

Code for calculating Semi Volatility

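import pandas as pd
def semideviation(r):
    """
    Returns the semideviation aka negative semideviation of r
    r must be a Series or a DataFrame, else raises a TypeError
    """
    if isinstance(r, pd.Series):
        # Note: this keeps returns below zero (not below the mean) and
        # measures their dispersion around their own mean with ddof=0,
        # so it is a variant of the formula given in the text
        is_negative = r < 0
        return r[is_negative].std(ddof=0)
    elif isinstance(r, pd.DataFrame):
        return r.aggregate(semideviation)
    else:
        raise TypeError("Expected r to be a Series or DataFrame")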
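For comparison, here is a minimal sketch that follows the semi-deviation formula from the text literally: it keeps only returns below the sample mean, measures their deviations from that mean, and divides by \(n-1\). The function name `semideviation_below_mean` and the four sample returns are hypothetical.

```python
import numpy as np
import pandas as pd

def semideviation_below_mean(r):
    """Semi-deviation of returns below the mean, with an n-1 denominator."""
    r = pd.Series(r, dtype=float)
    mean = r.mean()
    downside = r[r < mean] - mean   # deviations of below-mean returns
    n = len(downside)
    if n < 2:
        return float("nan")         # formula is undefined for n < 2
    return float(np.sqrt((downside**2).sum() / (n - 1)))

# Hypothetical returns with mean 0.005; two of them are below the mean
returns = pd.Series([-0.02, 0.0, 0.01, 0.03])
semi_dev = semideviation_below_mean(returns)
print(semi_dev)
```

Here the below-mean deviations are -0.025 and -0.005, so the result is \(\sqrt{0.00065}\).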

Value at risk (VaR)

Value at risk focuses on the extreme downside: the big losses that can happen and that can potentially wipe out the entire portfolio. Value at risk is the maximum loss that a portfolio can suffer with a given probability.

For example, we first specify a confidence level, say \(99\) percent. A \(99\) percent VaR means that we are looking at the worst possible outcome after excluding the one percent most extreme losses. Typically, we also specify the time period. A \(99\) percent monthly value at risk estimate, for instance, tells us the maximum loss that you can take in \(99\) percent of the cases over a one-month period.

\[ VaR_{\alpha}(R) = -F_R^{-1}(1-\alpha) \]

Estimating VaR using historic returns

For a \(99\) percent value at risk from a sample of historical returns, the worst one percent of outcomes are set aside, and the VaR is the worst of the remaining returns, reported as a positive loss.

Pros

Very simple to implement.

Another advantage is that there’s no assumption about asset return distributions.

Cons

The estimate can be quite sensitive to the sample period; that is, there is a fair amount of sample risk.

Code for estimating VaR

import numpy as np
import pandas as pd
def var_historic(r, level=5):
    """
    Returns the historic Value at Risk at a specified level
    i.e. finds the number such that "level" percent of the returns
    fall below that number, and the (100-level) percent are above,
    then returns it with the sign flipped so that VaR is a positive loss
    """
    if isinstance(r, pd.DataFrame):
        return r.aggregate(var_historic, level=level)
    elif isinstance(r, pd.Series):
        return -np.percentile(r, level)
    else:
        raise TypeError("Expected r to be a Series or DataFrame")
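A small worked example may help fix the sign convention. The sketch below uses a hypothetical, evenly spaced sample of 100 returns from -5.0 percent to +4.9 percent and computes the 5 percent historic VaR directly with `np.percentile`, which interpolates linearly between neighbouring sorted values by default.

```python
import numpy as np

# Hypothetical sample: 100 returns from -0.050 to 0.049 in steps of 0.001
returns = np.arange(-50, 50) / 1000
level = 5

cutoff = np.percentile(returns, level)  # 5th percentile of returns
var_5 = -cutoff                         # VaR is reported as a positive loss
share_below = (returns < cutoff).mean() # fraction of returns worse than the cutoff

print(f"5% historic VaR: {var_5:.5f}")
print(f"share of returns below the cutoff: {share_below:.2f}")
```

The cutoff falls between the fifth and sixth worst returns (-0.046 and -0.045), so the VaR is 0.04505 and exactly 5 percent of the sample lies below the cutoff.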

Estimating VaR using parametric Gaussian

Approaches that involve making an assumption about the return distribution are known as parametric approaches. The focus is on estimating the parameters of that return distribution.

For a Gaussian distribution, only the mean and volatility of the distribution need to be estimated.

For a confidence level \(\alpha\) (for example, \(\alpha = 95\%\)), the VaR under the Gaussian distribution is given by

\[ \begin{align*} \mathbb{P}[R \leq -R_{VaR}] &= 1-\alpha \\ \implies \mathbb{P}\left[\frac{R - \mu}{\sigma} \leq \frac{-R_{VaR} - \mu}{\sigma}\right] &= 1 - \alpha \\ \implies \Phi^{-1}(1 -\alpha) &= -\frac{R_{VaR} + \mu}{\sigma} \\ \implies R_{VaR} &= -\left(\mu + \sigma \Phi^{-1}(1 -\alpha)\right) \end{align*} \]
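Solving \(\mathbb{P}[R \leq -R_{VaR}] = 1-\alpha\) for a Gaussian return gives \(R_{VaR} = -(\mu + \sigma\Phi^{-1}(1-\alpha))\), which is the expression the `var_gaussian` function in this document evaluates. The sketch below plugs in numbers; the function name `gaussian_var` and the inputs (1 percent mean, 5 percent volatility) are assumptions for illustration.

```python
from scipy.stats import norm

def gaussian_var(mu, sigma, alpha=0.95):
    """Parametric Gaussian VaR at confidence level alpha, as a positive loss."""
    z = norm.ppf(1 - alpha)   # Phi^{-1}(1 - alpha), negative for alpha > 0.5
    return -(mu + z * sigma)

# Hypothetical monthly mean of 1% and volatility of 5%
var_95 = gaussian_var(mu=0.01, sigma=0.05, alpha=0.95)
print(f"95% Gaussian VaR: {var_95:.4f}")
```

With \(\Phi^{-1}(0.05) \approx -1.645\), the result is roughly \(-(0.01 - 1.645 \times 0.05) \approx 0.0722\).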

Pros

It is very simple to implement.

Cons

The Gaussian assumption is not a good assumption for asset returns: asset returns have much fatter tails, so this assumption leads us to underestimate value at risk.
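The underestimation can be illustrated on simulated data. The sketch below draws a fat-tailed sample and compares the 1 percent historic VaR with the 1 percent Gaussian VaR computed from the same sample. The Student-t distribution with 5 degrees of freedom, the 0.01 scaling, the seed, and the sample size are assumptions standing in for real returns.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
# Hypothetical fat-tailed returns: scaled Student-t with 5 degrees of freedom
returns = 0.01 * rng.standard_t(df=5, size=100_000)
level = 1  # percent

# Historic VaR: read the loss directly off the empirical distribution
var_hist = -np.percentile(returns, level)

# Gaussian VaR: only the sample mean and volatility are used
z = norm.ppf(level / 100)
var_gauss = -(returns.mean() + z * returns.std(ddof=0))

print(f"historic VaR: {var_hist:.4f}")
print(f"Gaussian VaR: {var_gauss:.4f}")
```

On this sample the Gaussian figure should come out below the historic one, because the normal quantile ignores the extra mass in the tails.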

Parametric vs Non Parametric approaches

There are many different methods for estimating value at risk, and at the end of the day the choice is a trade-off between sample risk and model risk. If you adopt a parametric assumption, you take on less sample risk, but you introduce model risk, i.e. the risk of specifying the wrong model, also called specification risk.

Estimating Cornish-Fisher value at risk

Cornish-Fisher value at risk is a semi-parametric approach that does not force you to assume any particular return distribution. It relies on the Cornish-Fisher expansion, published by two statisticians, Cornish and Fisher, in 1937. The expansion is useful because it relates the \(\alpha\) quantile of a non-Gaussian distribution to the \(\alpha\) quantile of the Gaussian distribution.

Cornish-Fisher value at risk has become a very commonly used methodology for computing value at risk estimates in a non-Gaussian setting.
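Written out, the adjustment applied in the `var_gaussian` code of this section takes the Gaussian quantile \(z_\alpha = \Phi^{-1}(1-\alpha)\) and corrects it using the skewness \(S\) and kurtosis \(K\) of the returns. The symbols \(\tilde{z}_\alpha\), \(S\), and \(K\) are notation introduced here for illustration:

\[ \tilde{z}_\alpha = z_\alpha + \frac{1}{6}(z_\alpha^2 - 1)S + \frac{1}{24}(z_\alpha^3 - 3z_\alpha)(K-3) - \frac{1}{36}(2z_\alpha^3 - 5z_\alpha)S^2 \]

\[ VaR_{\alpha}^{CF} = -(\mu + \tilde{z}_\alpha \sigma) \]

When \(S = 0\) and \(K = 3\), every correction term vanishes, \(\tilde{z}_\alpha = z_\alpha\), and the estimate reduces to the Gaussian VaR.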

Code for estimating VaR

from scipy.stats import norm
def var_gaussian(r, level=5, modified=False):
    """
    Returns the Parametric Gaussian VaR of a Series or DataFrame
    If "modified" is True, then the modified VaR is returned,
    using the Cornish-Fisher modification
    """
    # compute the Z score assuming it was Gaussian
    z = norm.ppf(level/100)
    if modified:
        # modify the Z score based on observed skewness and kurtosis
        # (skewness1 and kurtosis1 are the functions defined earlier;
        # kurtosis1 returns raw kurtosis, hence the k-3 term)
        s = skewness1(r)
        k = kurtosis1(r)
        z = (z +
                (z**2 - 1)*s/6 +
                (z**3 -3*z)*(k-3)/24 -
                (2*z**3 - 5*z)*(s**2)/36
            )
    return -(r.mean() + z*r.std(ddof=0))

Conditional Value at Risk

What happens beyond value at risk may be of even greater concern to investors than the value at risk number itself. Conditional value at risk is the expected loss beyond value at risk: it is the negative of the expected return, conditional on the return falling at or below minus the value at risk. Both value at risk and conditional value at risk are defined as positive numbers.

Conditional value at risk is given by \[ CVaR = -\mathbb{E}[R|R\leq -VaR] = -\frac{\int_{-\infty}^{-VaR}xf_R(x)dx}{F_R(-VaR)} \]

Code for estimating CVaR using historic returns

def cvar_historic(r, level=5):
    """
    Computes the Conditional VaR of Series or DataFrame
    """
    if isinstance(r, pd.Series):
        is_beyond = r <= -var_historic(r, level=level)
        return -r[is_beyond].mean()
    elif isinstance(r, pd.DataFrame):
        return r.aggregate(cvar_historic, level=level)
    else:
        raise TypeError("Expected r to be a Series or DataFrame")
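As a small worked example of the definition \(CVaR = -\mathbb{E}[R \mid R \leq -VaR]\), the sketch below uses a hypothetical, evenly spaced sample of 100 returns and averages the returns at or beyond the 5 percent historic VaR.

```python
import numpy as np

# Hypothetical sample: 100 returns from -0.050 to 0.049 in steps of 0.001
returns = np.arange(-50, 50) / 1000
level = 5

var_5 = -np.percentile(returns, level)  # historic VaR as a positive loss
tail = returns[returns <= -var_5]       # returns at or beyond the VaR
cvar_5 = -tail.mean()                   # average loss in the tail

print(f"5% historic VaR:  {var_5:.5f}")
print(f"5% historic CVaR: {cvar_5:.5f}")
```

The tail here contains the five worst returns (-0.050 to -0.046), so the CVaR is 0.048, which is larger than the VaR of 0.04505, as expected for an average taken over losses beyond the VaR.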