In this section we briefly discuss some distributions that often come up in Statistics.

Distributions Arising in Statistics

Chi-square Distribution

Definition (2.2.1)

A random variable X is said to have a chi-square distribution with n degrees of freedom, \(X\sim \chi^2(n)\), if it has density

\[f(x|n)=\frac1{\Gamma(n/2)2^{n/2}}x^{n/2-1}e^{-x/2};x>0\]

Of course we have \(X\sim\Gamma(n/2,2)\), that is, a gamma distribution with shape \(n/2\) and scale 2.
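
A quick numerical check of this identity in R (recall that dgamma uses the shape/scale parametrization):

# chi-square density agrees with a gamma density with shape n/2 and scale 2
x <- seq(0.1, 20, by=0.1)
n <- 5
max(abs(dchisq(x, df=n) - dgamma(x, shape=n/2, scale=2)))  # essentially 0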

# chi-square densities for df = 1, 3, 5 and 7, arranged on a 2x2 grid
library(grid)
pushViewport(viewport(layout = grid.layout(2, 2)))
print(ggcurve(fun=function(x) dchisq(x, 1), A=0, B=5),
  vp=viewport(layout.pos.row=1, layout.pos.col=1))
print(ggcurve(fun=function(x) dchisq(x, 3), A=0, B=7),
  vp=viewport(layout.pos.row=1, layout.pos.col=2))
print(ggcurve(fun=function(x) dchisq(x, 5), A=0, B=10),
  vp=viewport(layout.pos.row=2, layout.pos.col=1))
print(ggcurve(fun=function(x) dchisq(x, 7), A=0, B=20),
  vp=viewport(layout.pos.row=2, layout.pos.col=2))

Say \(Z\sim N(0,1)\) and let \(X=Z^2\), then for \(x>0\)

\[ \begin{aligned} &F_X(x) =P(X<x) = P(Z^2<x) = \\ &P(-\sqrt{x}<Z<\sqrt{x}) = \\ &\int_{-\sqrt{x}}^{\sqrt{x}} \frac1{\sqrt{2\pi}} e^{-t^2/2}dt \\ &f_X(x) = \frac{d F_X(x)}{dx} = \frac{d }{dx} \int_{-\sqrt{x}}^{\sqrt{x}} \frac1{\sqrt{2\pi}} e^{-t^2/2}dt=\\ &\frac1{\sqrt{2\pi}} e^{-(\sqrt{x})^2/2} \frac{1}{2\sqrt{x}}- \frac1{\sqrt{2\pi}} e^{-(-\sqrt{x})^2/2}\frac{-1}{2\sqrt{x}} = \\ &\frac1{\sqrt{2\pi}}\frac{1}{\sqrt{x}} e^{-x/2} =\\ &\frac{1}{\Gamma(1/2)2^{1/2}}x^{1/2-1}e^{-x/2} \end{aligned} \]

so \(X\sim \chi^2(1)\)
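
Here is a quick simulation check, comparing the quantiles of squared standard normal variates with the corresponding \(\chi^2(1)\) quantiles:

# square standard normals and compare their quantiles with chi-square(1) quantiles
z2 <- rnorm(1e5)^2
p <- c(0.1, 0.25, 0.5, 0.75, 0.9)
round(rbind(simulated=quantile(z2, p), theory=qchisq(p, 1)), 3)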

We have the following properties of a chi-square distribution:

Theorem (2.2.2)

Say \(X\sim \chi^2(n)\), \(Y\sim \chi^2(m)\) and X and Y are independent. Then

  • \(E[X]=n\)
  • \(var(X)=2n\)
  • \(X+Y\sim \chi^2(n+m)\)

proof

\[ \begin{aligned} &E[X^k] =\int_0^\infty x^k \frac{1}{\Gamma(n/2)2^{n/2}}x^{n/2-1}e^{-x/2} dx =\\ &\frac{1}{\Gamma(n/2)2^{n/2}}\int_0^\infty x^{k+n/2-1}e^{-x/2} dx = \\ &\frac{\Gamma((2k+n)/2)2^{(2k+n)/2}}{\Gamma(n/2)2^{n/2}}\int_0^\infty \frac{1}{\Gamma((2k+n)/2)2^{(2k+n)/2}} x^{(2k+n)/2-1}e^{-x/2} dx = \\ &\frac{\Gamma(k+n/2)2^{k+n/2}}{\Gamma(n/2)2^{n/2}} = \\ &\frac{(k+n/2-1)(k+n/2-2)..n/2\Gamma(n/2)2^{k}}{\Gamma(n/2)} = \\ &(k+n/2-1)(k+n/2-2)..(n/2)2^k\\ &E[X] = n/2\times 2 =n \\ &var(X) = E[X^2]-E[X]^2= \\ &(n/2+1)(n/2)2^2 - n^2 = n^2+2n-n^2=2n \end{aligned} \] For the last part we use the convolution formula:

\[ \begin{aligned} &f_{X+Y}(z) =\int_{-\infty}^\infty f_X(t)f_Y(z-t)dt= \\ &\int_{0}^z f_X(t)f_Y(z-t)dt= (0<t<z) \\ &\int_{0}^z \frac1{\Gamma(n/2)2^{n/2}} t^{n/2-1}e^{-t/2}\frac1{\Gamma(m/2)2^{m/2}} (z-t)^{m/2-1}e^{-(z-t)/2} dt = \\ &\frac1{\Gamma(n/2)2^{n/2}}\frac1{\Gamma(m/2)2^{m/2}}e^{-z/2} \int_{0}^z t^{n/2-1}(z-t)^{m/2-1}dt=(u=t/z, du=dt/z)\\ &\frac1{\Gamma(n/2)\Gamma(m/2)2^{(n+m)/2}}e^{-z/2} \int_{0}^1 (zu)^{n/2-1}(z-zu)^{m/2-1}zdu =\\ &\frac1{\Gamma(n/2)\Gamma(m/2)2^{(n+m)/2}}e^{-z/2} z^{n/2+m/2-1}\int_{0}^1 u^{n/2-1}(1-u)^{m/2-1}du =\\ &\frac1{\Gamma((n+m)/2)2^{(n+m)/2}}z^{(n+m)/2-1}e^{-z/2} \int_{0}^1\frac{\Gamma((n+m)/2)}{\Gamma(n/2)\Gamma(m/2)} u^{n/2-1}(1-u)^{m/2-1}du =\\ &\frac1{\Gamma((n+m)/2)2^{(n+m)/2}}z^{(n+m)/2-1}e^{-z/2} \end{aligned} \]

because the last integrand is a Beta density.
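
These properties are easy to check by simulation, for example with n=5 and m=3:

x <- rchisq(1e5, 5)
y <- rchisq(1e5, 3)
round(c(mean(x), var(x)), 2)   # should be close to n=5 and 2n=10
# X+Y should be chi-square(8): compare quantiles
p <- c(0.1, 0.25, 0.5, 0.75, 0.9)
round(rbind(simulated=quantile(x+y, p), theory=qchisq(p, 8)), 2)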

From this theorem it follows that if \(Z_1,..,Z_n\) are iid N(0,1), then \(\sum Z_i^2\sim \chi^2(n)\).

Definition (2.2.3)

Say \(X_1, .., X_n\) are a sample, then the sample variance is defined by

\[s^2=\frac1{n-1}\sum_{i=1}^n (x_i-\bar{x})^2\]
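
Note that this is exactly what R's var command computes (denominator n-1):

x <- rnorm(10)
c(var(x), sum((x - mean(x))^2)/(length(x) - 1))  # identical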

Theorem (2.2.4)

Say \(X_1, ..., X_n\) are iid \(N(\mu,\sigma)\). Then

\[(n-1)S^2/\sigma^2\sim \chi^2(n-1)\]

proof

First note that

\[ \begin{aligned} &\sum_{i=1}^n (x_i-\bar{x})^2 = \\ &\sum_{i=1}^n (x_i-\mu+\mu-\bar{x})^2 = \\ &\sum_{i=1}^n \left[(x_i-\mu)^2+2(x_i-\mu)(\mu-\bar{x})+(\mu-\bar{x})^2\right] = \\ &\sum_{i=1}^n(x_i-\mu)^2+2(\mu-\bar{x})\sum_{i=1}^n(x_i-\mu)+n(\mu-\bar{x})^2=\\ &\sum_{i=1}^n(x_i-\mu)^2+2(\mu-\bar{x})(\sum_{i=1}^nx_i-n\mu)+n(\mu-\bar{x})^2=\\ &\sum_{i=1}^n(x_i-\mu)^2+2(\mu-\bar{x})(n\bar{x}-n\mu)+n(\mu-\bar{x})^2=\\ &\sum_{i=1}^n(x_i-\mu)^2-2n(\mu-\bar{x})^2+n(\mu-\bar{x})^2=\\ &\sum_{i=1}^n(x_i-\mu)^2-n(\mu-\bar{x})^2 \end{aligned} \]

Now we know that \(\frac{X_i-\mu}{\sigma}\sim N(0,1)\), and so \(\sum_{i=1}^n\left(\frac{X_i-\mu}{\sigma}\right)^2\sim \chi^2(n)\). Also \(\bar{X}\sim N(\mu,\sigma/\sqrt{n})\), and so \(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim N(0,1)\) and \(\frac{n(\bar{X}-\mu)^2}{\sigma^2}\sim\chi^2(1)\). Finally

\[(n-1)s^2/\sigma^2=\sum_{i=1}^n\left(\frac{X_i-\mu}{\sigma}\right)^2-\frac{n(\bar{X}-\mu)^2}{\sigma^2}\sim\chi^2(n-1)\]

where the last step uses the fact that \(\bar{X}\) and \(s^2\) are independent (see the note below), so that the moment generating functions factor and the degrees of freedom subtract.

Note: we use “n-1” instead of “n” because then \(s^2\) is an unbiased estimator of \(\sigma^2\), that is \(E[s^2]=\sigma^2\).

Note: another important feature here is that \(\bar{X}\perp s^2\), that is, the sample mean and the sample variance are independent.
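
Theorem (2.2.4) and the independence note can both be illustrated by simulation, say with samples of size n=10 from a N(5, 2) distribution:

B <- 1e4; n <- 10; mu <- 5; sigma <- 2
xbar <- rep(0, B); s2 <- rep(0, B)
for(i in 1:B) {
  x <- rnorm(n, mu, sigma)
  xbar[i] <- mean(x)
  s2[i] <- var(x)
}
p <- c(0.1, 0.25, 0.5, 0.75, 0.9)
# (n-1)s^2/sigma^2 should follow a chi-square(n-1) distribution
round(rbind(simulated=quantile((n-1)*s2/sigma^2, p), theory=qchisq(p, n-1)), 2)
round(cor(xbar, s2), 3)   # should be close to 0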

Student’s t Distribution (by W.S. Gosset)

Definition (2.2.5)

Say \(X\sim N(0,1), Y\sim\chi^2(n)\) and \(X \perp Y\). Then

\[T_n=X/\sqrt{Y/n}\]

has a Student’s t distribution with n degrees of freedom, \(T_n\sim t(n)\), that is

\[f(t|n)=\frac{\Gamma(\frac{n+1}2)}{\Gamma(\frac{n}2)}\frac1{\sqrt{\pi n}}\frac1{(1+t^2/n)^{(n+1)/2}}\]

Note

\[\frac1{(1+t^2/n)^{(n+1)/2}}=\frac1{(1+\frac{t^2/2}{n/2})^{n/2}}\frac1{(1+\frac{t^2/2}{n/2})^{1/2}}\rightarrow_{n\rightarrow\infty}e^{-t^2/2}\]

so \(T_n \rightarrow N(0,1)\) in distribution.
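
We can see this convergence numerically by comparing the t densities with the standard normal density on a grid:

# maximum difference between dt(x, n) and dnorm(x) shrinks as n grows
x <- seq(-4, 4, length=250)
round(c(df5=max(abs(dt(x, 5) - dnorm(x))),
        df50=max(abs(dt(x, 50) - dnorm(x))),
        df500=max(abs(dt(x, 500) - dnorm(x)))), 4)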

We have \(E[T_n]=0\) if n>1 (and does not exist if n=1) and \(var(T_n)=n/(n-2)\) if n>2 (and does not exist if \(n\le 2\)).

The importance of this distribution in Statistics comes from the following:

Theorem (2.2.6)

Say \(X_1, ..., X_n\) are iid \(N(\mu,\sigma)\). Then

\[\sqrt n \frac{\bar{X}-\mu}{s}\sim t(n-1)\]

Note: \(s\) is of course an estimate of the population standard deviation, so this formula tries to standardize the sample mean without knowing the exact standard deviation.
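
Here is a simulation sketch of theorem (2.2.6), say with n=8, \(\mu=5\) and \(\sigma=2\):

B <- 1e4; n <- 8; mu <- 5; sigma <- 2
tstat <- rep(0, B)
for(i in 1:B) {
  x <- rnorm(n, mu, sigma)
  tstat[i] <- sqrt(n)*(mean(x) - mu)/sd(x)
}
p <- c(0.05, 0.25, 0.5, 0.75, 0.95)
round(rbind(simulated=quantile(tstat, p), theory=qt(p, n-1)), 2)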

Snedecor’s F Distribution

Definition (2.2.7)

X is said to have an F distribution with n and m degrees of freedom, \(X\sim F(n,m)\), if

\[f(x;n,m)=\frac{\Gamma((n+m)/2)}{\Gamma(n/2)\Gamma(m/2)}(\frac{n}m)^{n/2}\frac{x^{n/2-1}}{(1+nx/m)^{(n+m)/2}}\]

if x>0

Theorem (2.2.8)

Say \(X\sim \chi^2(n),Y\sim \chi^2(m)\), independent, then the random variable \(F=\frac{X/n}{Y/m}\sim F(n,m)\).

We have \(E[F] = m/(m-2)\) if \(m>2\); note that the mean does not depend on n.
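
Again this is easy to check by simulation, say with n=5 and m=10:

n <- 5; m <- 10
Fsim <- (rchisq(1e5, n)/n)/(rchisq(1e5, m)/m)
p <- c(0.1, 0.25, 0.5, 0.75, 0.9)
round(rbind(simulated=quantile(Fsim, p), theory=qf(p, n, m)), 2)
round(c(mean(Fsim), m/(m-2)), 2)   # should be close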

Theorem (2.2.9)

Say \(X_1, ..., X_n\) are iid \(N(\mu_x,\sigma_x)\) and \(Y_1, ..., Y_m\) are iid \(N(\mu_y,\sigma_y)\). Furthermore \(X_i,Y_j\) are independent for all i and j. Then

\[F=\frac{s_x^2/\sigma_x^2}{s_y^2/\sigma_y^2}\sim F(n-1,m-1)\]
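
A simulation sketch of theorem (2.2.9), with (for example) n=10, m=15, \(\sigma_x=1\) and \(\sigma_y=2\):

B <- 1e4; n <- 10; m <- 15; sigx <- 1; sigy <- 2
ratio <- rep(0, B)
for(i in 1:B) {
  x <- rnorm(n, 0, sigx)
  y <- rnorm(m, 0, sigy)
  ratio[i] <- (var(x)/sigx^2)/(var(y)/sigy^2)
}
p <- c(0.1, 0.25, 0.5, 0.75, 0.9)
round(rbind(simulated=quantile(ratio, p), theory=qf(p, n-1, m-1)), 2)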


A very nice tool describing these and many other distributions as well as their relationships was created by Lawrence M. Leemis and Raghu Pasupathy. It is described in their August 2019 Chance article “The ties that bind” and can be found at http://www.math.wm.edu/~leemis/chart/UDR/UDR.html.

Order Statistics

Many statistical methods, for example the median and the range, are based on an ordered data set. In this section we study some of the common distributions of order statistics.

One of the difficulties when dealing with order statistics is ties, that is, the same observation appearing more than once. Ties should only occur for discrete data, because for continuous data the probability of a tie is zero. They may happen anyway because of rounding, but we will ignore them in what follows.

Say \(X_1, .., X_n\) are iid with density f. Then \(X_{(i)}\) is called the ith order statistic, where \(X_{(1)}< ... < X_{(i)} < ... < X_{(n)}\) is the ordered sample.

Note \(X_{(1)} = \min \{X_i\}\) and \(X_{(n)} = \max \{X_i\}\).

Let’s find the density of \(X_{(i)}\). For this let Y be a r.v. that counts the number of \(X_j \le x\) for some fixed number x. We can think of Y as the number of “successes” of n independent Bernoulli trials with success probability \(p = P(X_i \le x) = F(x)\) for i=1,..,n.

So \(Y\sim Bin(n,F(x))\). Note also that the event \(\{Y\ge i\}\) means that at least i observations are less than or equal to x, so the ith smallest, \(X_{(i)}\), is less than or equal to x. Therefore

\[ \begin{aligned} &F_{X_{(i)}}(x) = P(X_{(i)}\le x) = \\ &P(Y\ge i) = \sum_{k=i}^n {n\choose k}F(x)^k(1-F(x))^{n-k} \end{aligned} \]

with that we find

\[ \begin{aligned} &\frac{d F_{X_{(i)}}(x)}{dx} = \frac{d}{dx} \sum_{k=i}^n {n\choose k}F(x)^k(1-F(x))^{n-k}\\ &\sum_{k=i}^n \frac{d}{dx}\left[ {n\choose k}F(x)^k(1-F(x))^{n-k} \right] = \\ &\sum_{k=i}^n {n\choose k} \left[ kF(x)^{k-1}f(x)(1-F(x))^{n-k}+F(x)^{k}(n-k)(1-F(x))^{n-k-1}(-f(x)) \right] = \\ &\sum_{k=i}^n {n\choose k} \left[ kF(x)^{k-1}(1-F(x))^{n-k}-(n-k)F(x)^{k}(1-F(x))^{n-k-1} \right]f(x) \end{aligned} \]

to simplify the notation for a while let’s set \(t=F(x)\). Also note that the factor \(f(x)\) does not depend on k, and so we are left with the sum

\[ \begin{aligned} &\sum_{k=i}^n {n\choose k} \left[ kt^{k-1}(1-t)^{n-k}-(n-k)t^{k}(1-t)^{n-k-1} \right] = \\ &\sum_{k=i}^n {n\choose k} kt^{k-1}(1-t)^{n-k}-\sum_{k=i}^{n-1}{n\choose k}(n-k)t^{k}(1-t)^{n-k-1} \quad (\text{the }k=n\text{ term vanishes since }n-k=0) = \\ &{n\choose i} it^{i-1}(1-t)^{n-i}+\\ &\sum_{k=i+1}^n {n\choose k} kt^{k-1}(1-t)^{n-k}-\sum_{k=i}^{n-1}{n\choose k}(n-k)t^{k}(1-t)^{n-k-1} = \\ &\frac{n!}{(n-i)!i!} it^{i-1}(1-t)^{n-i}+\\ &\sum_{k=i}^{n-1} {n\choose k+1} (k+1)t^{k}(1-t)^{n-k-1}-\sum_{k=i}^{n-1}{n\choose k}(n-k)t^{k}(1-t)^{n-k-1} \end{aligned} \]

where the last equality follows from a change of summation index.

Note that

\[ \begin{aligned} &{n\choose k+1} (k+1) = \frac{n!(k+1)}{(n-k-1)!(k+1)!}=\frac{n!}{(n-k-1)!k!}\\ &{n\choose k} (n-k) = \frac{n!(n-k)}{(n-k)!k!}=\frac{n!}{(n-k-1)!k!} \end{aligned} \] and so the two sums are actually the same and therefore cancel out. So we find

\[f_{X_{(i)}}(x) = \frac{n!}{(n-i)!(i-1)!} F(x)^{i-1}(1-F(x))^{n-i}f(x)\]

Example (2.2.10)

Say \(X_1, .., X_n\) are iid U[0,1]. Then for 0<x<1 we have f(x)=1 and F(x)=x. Therefore

\[f_{X_{(i)}}(x)=\frac{n!}{(i-1)!(n-i)!}x^{i-1}(1-x)^{n-i}=\] \[\frac{\Gamma(n+1)}{\Gamma(i)\Gamma(n-i+1)}x^{i-1}(1-x)^{(n-i+1)-1}\]

so \(X_{(i)}\sim Beta(i, n-i+1)\); a quick simulation check is given after the list below. Therefore

  • \(E[X_{(i)}]=\frac{i}{n+1}\)
  • \(var(X_{(i)})=\frac{i(n-i+1)}{(n+1)^2(n+2)}\)
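
Here is the simulation check mentioned above, say with n=10 and i=3:

B <- 1e4; n <- 10; i <- 3
xi <- rep(0, B)
for(k in 1:B) xi[k] <- sort(runif(n))[i]   # ith order statistic of a uniform sample
p <- c(0.1, 0.25, 0.5, 0.75, 0.9)
round(rbind(simulated=quantile(xi, p), theory=qbeta(p, i, n - i + 1)), 3)
round(c(mean(xi), i/(n + 1)), 3)   # should be close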

Empirical Distribution Function

The empirical distribution function of a sample \(X_1, .., X_n\) is defined as follows:

\[\hat{F}(x)=\frac1n \sum_{i=1}^n I_{(-\infty, x]}(X_i)=\frac{\# X_i\le x}{n}\] so it is the sample equivalent of the regular distribution function:

  • \(F(x)=P(X\le x)\) is the probability that the rv \(X \le x\).

  • \(\hat{F}(x)\) is the proportion of \(X_1, .., X_n \le x\).

# empirical cdf of a sample of size 10 from N(0,1), together with the true cdf
library(ggplot2)
df <- data.frame(x = rnorm(10))
ggplot(df, aes(x)) + 
  stat_ecdf(geom = "step") +
  stat_function(fun=pnorm) +
  xlim(c(-3, 3))

# the same for a sample of size 100: the empirical cdf is much closer to the true cdf
df <- data.frame(x = rnorm(100))
ggplot(df, aes(x)) + 
  stat_ecdf(geom = "step") +
  stat_function(fun=pnorm)

Let \(Z_i=I_{(-\infty, x]}(X_i)\), then \(P(Z_i=1)=P(X_i\le x)=F(x)\), and so \(Z_i\sim Ber(F(x))\). \(X_1,...,X_n\) are independent, and therefore \(\sum_{i=1}^nZ_i\sim Bin(n,F(x))\). By the weak law of large numbers

\[\hat{F}(x)=\frac1n \sum_{i=1}^nZ_i\rightarrow E[Z_1]=F(x)\] in probability. By the central limit theorem

\[\sqrt{n}\frac{\hat{F}(x)-F(x)}{\sqrt{F(x)(1-F(x))}}\rightarrow N(0,1)\]

in distribution.
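
We can illustrate this by simulation, say at x=0 for samples of size n=100 from a standard normal distribution:

B <- 1e4; n <- 100; x0 <- 0
Fhat <- rep(0, B)
for(i in 1:B) Fhat[i] <- mean(rnorm(n) <= x0)   # proportion of observations <= x0
z <- sqrt(n)*(Fhat - pnorm(x0))/sqrt(pnorm(x0)*(1 - pnorm(x0)))
round(c(mean(z), sd(z)), 2)   # should be close to 0 and 1
p <- c(0.1, 0.25, 0.5, 0.75, 0.9)
round(rbind(simulated=quantile(z, p), theory=qnorm(p)), 2)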

Exponential Family

Definition

A distribution is said to belong to the exponential family if its density can be written as

\[ f(x;\theta) =h(x)\exp \left\{ \theta^T T(x) -A(\theta) \right\} \] where

  • \(\theta\) is a vector of parameters
  • \(T(x)\) is a vector of sufficient statistics
  • A is a function of \(\theta\) alone and h is a function of x alone

we have

\[ \begin{aligned} &\int f(x;\theta) dx = \\ &\int h(x)\exp \left\{ \theta^T T(x) -A(\theta) \right\} dx = \\ & \exp \left\{ -A(\theta) \right\} \int h(x)\exp \left\{ \theta^T T(x) \right\} dx = 1\\ \end{aligned} \]

so

\[ A(\theta) = \log \left[ \int h(x)\exp \left\{ \theta^T T(x) \right\} dx\right] \]

Example (2.2.11)

  • Bernoulli

\[ \begin{aligned} & f(x;p) = p^x (1-p)^{1-x}=\\ & \exp \left\{ x \log p + (1-x) \log (1-p) \right\} = \\ & \exp \left\{ x (\log p - \log (1-p)) + \log (1-p) \right\} =\\ & \exp \left\{ x \log \frac{p}{1-p} + \log (1-p) \right\} =\\ &\exp \left\{ x\theta - \log(1+e^\theta)\right\} \end{aligned} \] where

\[ \begin{aligned} & \theta = \log \frac{p}{1-p}\\ & h(x) = 1 \\ & T(x) = x\\ &A(\theta) = \log(1+e^\theta) \end{aligned} \]

because

\[ \begin{aligned} &\theta =\log \frac{p}{1-p} \\ &e^\theta = \frac{p}{1-p} \\ &p = \frac{e^\theta}{1+e^\theta}\\ &1-p = \frac{1}{1+e^\theta}\\ &\log(1-p) = -\log(1+e^\theta)\\ \end{aligned} \]

  • Normal

\[ \begin{aligned} &\frac{1}{\sqrt{2\pi \sigma^2}} \exp \left\{ -\frac1{2 \sigma^2} \left( x-\mu \right)^2 \right\} = \\ & \frac{1}{\sqrt{2\pi }}\exp \left\{- \frac1{2 \sigma^2} \left( x^2-2x \mu + \mu^2 \right)- \log \sigma \right \} = \\ &\frac{1}{\sqrt{2\pi }}\exp \left\{ -\frac{x^2}{2 \sigma^2} + \frac{x \mu}{ \sigma^2} - \frac{\mu^2}{2 \sigma^2} - \log \sigma \right \} \end{aligned} \] so

\[ \begin{aligned} & \theta = (\mu/\sigma^2, -1/(2\sigma^2))^T\\ & h(x) = \frac{1}{\sqrt{2\pi }} \\ & T(x) = (x, x^2)^T\\ &A(\theta) = \frac{\mu^2}{2 \sigma^2} + \log \sigma = \\ &-\theta_1^2/(4\theta_2)-\frac12 \log (-2 \theta_2) \end{aligned} \]
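
Both examples are easily checked numerically by comparing the exponential family form with the usual densities (dbinom with size 1 for the Bernoulli, dnorm for the normal); the parameter values below are just for illustration:

# Bernoulli: h(x)=1, T(x)=x, theta=log(p/(1-p)), A(theta)=log(1+e^theta)
p <- 0.3
theta <- log(p/(1 - p))
c(exp(0*theta - log(1 + exp(theta))), dbinom(0, 1, p))   # x=0
c(exp(1*theta - log(1 + exp(theta))), dbinom(1, 1, p))   # x=1
# Normal: h(x)=1/sqrt(2 pi), T(x)=(x, x^2), theta=(mu/sigma^2, -1/(2 sigma^2))
mu <- 1; sigma <- 2; x <- 0.5
theta <- c(mu/sigma^2, -1/(2*sigma^2))
A <- -theta[1]^2/(4*theta[2]) - log(-2*theta[2])/2
c(exp(sum(theta*c(x, x^2)) - A)/sqrt(2*pi), dnorm(x, mu, sigma))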

Theorem (2.2.12)

The product of densities from exponential families is again of exponential family form.

proof

\[ \begin{aligned} &h_1(x)\exp \left\{ \theta_1^T T_1(x) -A_1(\theta_1) \right\}h_2(x)\exp \left\{ \theta_2^T T_2(x) -A_2(\theta_2) \right\} = \\ &h_1(x)h_2(x)\exp \left\{ \theta_1^T T_1(x) + \theta_2^T T_2(x) -A_1(\theta_1)-A_2(\theta_2) \right\} =\\ &h(x)\exp \left\{ \psi^T S(x) -A(\psi)\right\} \end{aligned} \]

where

\[ \begin{aligned} &h(x) = h_1(x)h_2(x)\\ &\psi =(\theta_1, \theta_2)^T \\ &S = (T_1, T_2)^T\\ &A(\psi) = A_1(\theta_1)+A_2(\theta_2) \end{aligned} \] The importance of exponential families is that they share many properties and that many theorems can be proven for all of them simultaneously.