Central Limit Theorems

Recall: a random variable \(X\) is said to be normally distributed with mean \(\mu\) and variance \(\sigma^2\) if it has density:

\[f(x)=\frac1{\sqrt{2\pi\sigma^2}}\exp \left\{-\frac1{2\sigma^2} \left(x-\mu \right)^2\right\}\]

We use the symbol \(\Phi\) for the distribution function and \(\phi\) for the density of a standard normal r.v.

Let \(X_1 , X_2 , ...\) be a sequence of r.v.’s with means \(E[X_i ]=\mu_i\) and variances \(var(X_i )=\sigma_i^2\). Let \(\bar{X}_n\) be the sample mean of the first \(n\) observations. Then a central limit theorem would assert that

\[P \left(\frac1{\sqrt n}\sum_{i=1}^n \frac{X_i-\mu_i}{\sigma_i}\le x \right)\rightarrow \Phi(x)\] for all \(x\in \mathbb{R}\), or that this standardized sum converges to a standard normal in distribution.
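As a quick sanity check, here is a minimal simulation sketch (the choice of iid \(Exp(1)\) rv’s, so \(\mu_i=\sigma_i=1\), and the sample sizes are ours, for illustration only), comparing the empirical cdf of the standardized sum with \(\Phi\):

B=10000; n=50
# B standardized sums of n iid Exp(1) rv's (mu=1, sigma=1)
z=replicate(B, sum(rexp(n)-1)/sqrt(n))
# empirical vs normal cdf at a few points
round(rbind(empirical=ecdf(z)(c(-2, -1, 0, 1, 2)),
            normal=pnorm(c(-2, -1, 0, 1, 2))), 3)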

Note the plural “s” in the title. As with the laws of large numbers, there are many central limit theorems, all with different conditions on

  1. dependence between the \(X_i\)’s
  2. \(\mu_i\)’s
  3. \(\sigma_i\)’s

As a rough guide, we need some combination of

  1. not too strong a dependence
  2. \(\mu_i \rightarrow \mu\) finite
  3. \(\sigma_i\) going neither to 0 nor to \(\infty\) too fast

The version of the CLT for Bernoulli rv’s is famous all by itself; it is called the de Moivre-Laplace theorem. It was the first CLT with a rigorous proof.

Theorem (3.4.1)

DeMoivre-Laplace

Let \(\{X_n \}\) be independent rv’s with \(X_i\sim Ber(p)\). Let \(S_n = \sum_{i=1}^n X_i\) and let \(Z \sim N(0,1)\). Then

\[\frac{S_n-np}{\sqrt{np(1-p)}}\rightarrow Z\] in distribution.

The theorem was proven by Abraham de Moivre in 1738 for the case \(p=1/2\), and generalized to \(p \ne 1/2\) by Pierre-Simon Laplace in his famous book Théorie Analytique des Probabilités, published in 1812.

proof

We begin by showing that if \(n\) is large and \(k\) is in the neighborhood of \(np\), then \[{{n}\choose{k}}p^kq^{n-k}\approx \frac1{\sqrt{2\pi npq}}\exp \left\{-\frac{(k-np)^2}{2npq} \right\}\]

We will make use of Stirling’s formula for n!:

\[n!\approx n^ne^{-n}\sqrt{2\pi n}\] Now

\[ \begin{aligned} &{{n}\choose{k}}p^kq^{n-k} = \frac{n!}{(n-k)!k!}p^kq^{n-k}\approx \\ &\frac{n^ne^{-n}\sqrt{2\pi n}}{(n-k)^{n-k}e^{-(n-k)}\sqrt{2\pi (n-k)}k^ke^{-k}\sqrt{2\pi k}}p^kq^{n-k} = \\ &\sqrt{\frac{n}{2\pi(n-k)k}}\frac{n^n}{(n-k)^{n-k}k^k}p^kq^{n-k} = \\ &\sqrt{\frac{n}{2\pi(n-k)k}} \left(\frac{np}{k} \right)^k\left(\frac{nq}{n-k} \right)^{n-k}=\\ &\sqrt{\frac{n}{2\pi(n-k)k}} \left(\frac{k}{np} \right)^{-k}\left(\frac{n-k}{nq} \right)^{-(n-k)} \end{aligned} \]

Define \(x=\frac{k-np}{\sqrt{npq}}\), then

\[ \begin{aligned} &1+x\sqrt{\frac{q}{np}} = 1+\frac{k-np}{\sqrt{npq}}\sqrt{\frac{q}{np}}=1+\frac{k-np}{np}=\frac{k}{np}\\ &1-x\sqrt{\frac{p}{nq}} = 1-\frac{k-np}{\sqrt{npq}}\sqrt{\frac{p}{nq}}=\\ &1-\frac{k-np}{nq}=\frac{nq-k+np}{nq}=\frac{n-k}{nq} \end{aligned} \]

and then

\[ \begin{aligned} &{{n}\choose{k}}p^kq^{n-k}\approx \\ &\sqrt{\frac{n}{2\pi(n-k)k}} \left(\frac{k}{np} \right)^{-k}\left(\frac{n-k}{nq} \right)^{-(n-k)} = \\ &\sqrt{\frac{n}{2\pi(n-k)k}} \left(1+x\sqrt{\frac{q}{np}} \right)^{-k}\left(1-x\sqrt{\frac{p}{nq}}\right)^{-(n-k)} = \\ &\sqrt{\frac{1}{2\pi n\frac{n-k}{n}\frac{k}{n}}} \left(1+x\sqrt{\frac{q}{np}} \right)^{-k}\left(1-x\sqrt{\frac{p}{nq}}\right)^{-(n-k)} = \\ &\sqrt{\frac{1}{2\pi n(1-\frac{k}{n})\frac{k}{n}}} \left(1+x\sqrt{\frac{q}{np}} \right)^{-k}\left(1-x\sqrt{\frac{p}{nq}}\right)^{-(n-k)} \approx \\ &\frac{1}{\sqrt{2\pi npq}} \left(1+x\sqrt{\frac{q}{np}} \right)^{-k}\left(1-x\sqrt{\frac{p}{nq}}\right)^{-(n-k)} \end{aligned} \]

where we use the fact that \(k\approx np\), so \(k/n\approx p\) and \(1-k/n\approx q\). This gives the first factor in the claimed approximation.

Next

\[ \begin{aligned} &\left(1+x\sqrt{\frac{q}{np}} \right)^{-k}\left(1-x\sqrt{\frac{p}{nq}}\right)^{-(n-k)} = \\ &\exp \left\{\log \left[\left(1+x\sqrt{\frac{q}{np}} \right)^{-k}\left(1-x\sqrt{\frac{p}{nq}}\right)^{-(n-k)} \right] \right\} = \\ &\exp \left\{(-k)\log \left(1+x\sqrt{\frac{q}{np}} \right)-(n-k)\log\left(1-x\sqrt{\frac{p}{nq}}\right) \right\} = \\ &\exp \left\{-(np+x\sqrt{npq})\log \left(1+x\sqrt{\frac{q}{np}} \right)-(nq-x\sqrt{npq})\log\left(1-x\sqrt{\frac{p}{nq}}\right) \right\} \end{aligned} \]

where the last expression follows from \(x=\frac{k-np}{\sqrt{npq}}\), so that \(k=np+x\sqrt{npq}\) and

\(n-k=n-np-x\sqrt{npq}=nq-x\sqrt{npq}\).

Next we will use the Taylor expansion of \(\log(1\pm x)\), which says if x is close to 0 then

\[\log(1\pm x) \approx \pm x-x^2/2\]

so

\[ \begin{aligned} &\exp \left\{-(np+x\sqrt{npq})\log \left(1+x\sqrt{\frac{q}{np}} \right)-(nq-x\sqrt{npq})\log\left(1-x\sqrt{\frac{p}{nq}}\right) \right\} \approx \\ &\exp \left\{-(np+x\sqrt{npq}) \left(x\sqrt{\frac{q}{np}}-\frac{qx^2}{2np} \right)-(nq-x\sqrt{npq})\left(-x\sqrt{\frac{p}{nq}}-\frac{px^2}{2nq}\right) \right\} \end{aligned} \]

Consider these terms one by one:

\[ \begin{aligned} &(np+x\sqrt{npq})\left(x\sqrt{\frac{q}{np}}-\frac{qx^2}{2np} \right) =\\ &npx\sqrt{\frac{q}{np}} + x\sqrt{npq}x\sqrt{\frac{q}{np}}-np\frac{qx^2}{2np}-x\sqrt{npq}\frac{qx^2}{2np}=\\ &x\sqrt{npq} + qx^2-\frac{qx^2}{2}-\frac{\sqrt{q^3}x^3}{2\sqrt{np}}=\\ &\sqrt{npq}x+\frac{qx^2}{2}-\frac{\sqrt{q^3}x^3}{2\sqrt{np}}\\ &\\ &(nq-x\sqrt{npq})\left(-x\sqrt{\frac{p}{nq}}-\frac{px^2}{2nq}\right)=\\ &-nqx\sqrt{\frac{p}{nq}}+x\sqrt{npq}x\sqrt{\frac{p}{nq}}-nq\frac{px^2}{2nq}+x\sqrt{npq}\frac{px^2}{2nq}= \\ &-\sqrt{npq}x+px^2-\frac{px^2}{2}+\frac{\sqrt{p^3}x^3}{2\sqrt{nq}}= \\ &-\sqrt{npq}x+\frac1{2}px^2+\frac{\sqrt{p^3}x^3}{2\sqrt{nq}} \end{aligned} \] Replacing this in the expression above yields

\[ \begin{aligned} &\exp \left\{ -\sqrt{npq}x-\frac1{2}qx^2+\frac{\sqrt{q^3}x^3}{2\sqrt{np}}+\sqrt{npq}x-\frac1{2}px^2-\frac{\sqrt{p^3}x^3}{2\sqrt{nq}}\right\} = \\ &\exp \left\{ -\frac1{2}(p+q)x^2+\frac{\sqrt{q^3}x^3}{2\sqrt{np}}-\frac{\sqrt{p^3}x^3}{2\sqrt{nq}}\right\} = \\ &\exp \left\{ -\frac1{2}x^2+\frac{\sqrt{q^3}x^3}{2\sqrt{np}}-\frac{\sqrt{p^3}x^3}{2\sqrt{nq}}\right\} \rightarrow_{n\rightarrow \infty} \\ &\exp \left\{ -\frac1{2}x^2\right\} = \\ &\exp \left\{ -\frac1{2}\left(\frac{k-np}{\sqrt{npq}}\right)^2\right\}=\\ &\exp \left\{ -\frac1{2}\frac{(k-np)^2}{npq}\right\} \end{aligned} \]

Finally we can finish the proof. For this let \(y\in\mathbb{R}\), \(n\ge 1\) and \(k(y)=\lfloor np+y\sqrt{npq}\rfloor\), and note that for integers \(x\) we have \(k\left(\frac{x-np}{\sqrt{npq}}\right)=x\). Now

\[ \begin{aligned} &\sum_{j=0}^{k(y)} {{n}\choose{j}}p^jq^{n-j} \approx \\ &\sum_{j=0}^{k(y)} \frac1{\sqrt{2\pi npq}} \exp \left\{ -\frac1{2}\frac{(j-np)^2}{npq}\right\} = \\ &\sum_{j=0}^{k(y)} \frac1{\sqrt{2\pi}} \exp \left\{ -\frac1{2}\frac{(j-np)^2}{npq}\right\} \left( \frac{j+1}{\sqrt{ npq}}-\frac{j}{\sqrt{npq}}\right) \approx \\ &\int_{-\infty}^{k(y)} \frac1{\sqrt{2\pi npq}} \exp \left\{ -\frac1{2}\frac{(x-np)^2}{npq}\right\} dx =\\ &\int_{-\infty}^{(k(y)-np)/\sqrt{npq}} \frac1{\sqrt{2\pi}} e^{-t^2/2}dt \rightarrow \int_{-\infty}^y \frac1{\sqrt{2\pi}} e^{-t^2/2}dt = \Phi(y) \end{aligned} \]

where the second-to-last step substitutes \(t=(x-np)/\sqrt{npq}\) and the last step uses \((k(y)-np)/\sqrt{npq}\rightarrow y\).

and we are done!


Notice that the proof here is still incomplete in two ways: first, we did not discuss the remainder term of the Taylor polynomial, and second, we should have been more precise about the claim that the arguments \(x\sqrt{q/np}\) and \(x\sqrt{p/nq}\) of the logarithms go to 0 as \(n\rightarrow \infty\).
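Still, a quick numerical check suggests the local approximation is already quite good for moderate \(n\) (a sketch; \(n=100\) and \(p=0.3\) are our choices for illustration):

n=100; p=0.3; q=1-p
k=25:35  # values of k near np=30
exact=dbinom(k, n, p)
normal.approx=exp(-(k-n*p)^2/(2*n*p*q))/sqrt(2*pi*n*p*q)
round(cbind(k, exact, normal.approx), 4)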

Example (3.4.2)

\[ \begin{aligned} &P \left( \frac{\sum_{i=1}^nX_i-np}{\sqrt{npq}}\le x\right) = \\ &P \left(\sum_{i=1}^nX_i\le np+x\sqrt{npq} \right) = \\ &\sum_{i=0}^{\lfloor np+x\sqrt{npq}\rfloor} {{n}\choose{i}}p^iq^{n-i}=\\ &pbinom(\lfloor np+x\sqrt{npq}\rfloor, n, p) \end{aligned} \]

The following graph shows these probabilities for \(p=1/2, x=1\), together with \(\Phi(1)=0.8413\), for \(n=1,...,500\):
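Code for such a graph might look as follows (a sketch; it assumes the ggplot2 library is available):

library(ggplot2)
p=1/2; x=1
n=1:500
# exact binomial probabilities P(standardized sum <= 1)
y=pbinom(floor(n*p+x*sqrt(n*p*(1-p))), n, p)
df=data.frame(n=n, y=y)
ggplot(data=df, aes(n, y)) +
  geom_line() +
  geom_hline(yintercept=pnorm(1), col="blue")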


Here is the most basic version of a general CLT:

Theorem (3.4.3)

Liapunov 1901

Say \(\{X_n;n=1,2,... \}\) are independent and identically distributed with mean \(\mu\) and standard deviation \(\sigma\). Moreover, the mgf of the \(X_n\)’s exists in an open neighborhood of 0. Then

\[P \left( \sqrt n\frac{\bar{X}_n-\mu}{\sigma}\le x\right)\rightarrow \Phi(x)\] for all \(x \in \mathbb{R}\).

proof

We will show that the mgf’s of \(\sqrt{n}(\bar{X}_n -\mu)/\sigma\) converge to the mgf of a standard normal rv.

Let \(Y_n =(X_n -\mu)/\sigma\), then

\[\sqrt{n}(\bar{X}_n -\mu)/\sigma = 1/\sqrt{n} \sum Y_i\]

and so

\[ \begin{aligned} &\psi_{\sqrt n\frac{\bar{X}_n-\mu}{\sigma}}(t) = E \left[\exp \left\{ t\sqrt n\frac{\bar{X}_n-\mu}{\sigma}\right\} \right] =\\ &E \left[\exp \left\{ t/\sqrt{n} \sum Y_i\right\} \right]=\\ &\prod_{i=1}^n E \left[\exp \left\{ (\frac{t}{\sqrt{n}}) Y_i\right\} \right] = \\ & \left(\psi_{Y_1}(\frac{t}{\sqrt{n}}) \right)^n \end{aligned} \]

We now expand this into a Taylor series:

\[ \begin{aligned} &\psi_{Y_1}(\frac{t}{\sqrt{n}}) = \\ &\sum_{i=0}^\infty \psi_{Y_1}^{(i)}(0)\frac{(\frac{t}{\sqrt{n}})^i}{i!} = \\ &1+0\times(\frac{t}{\sqrt{n}})/1!+1\times (\frac{t}{\sqrt{n}})^2/2! +R(\frac{t}{\sqrt{n}}) = \\ &1+\frac{t^2}{2n}+R(\frac{t}{\sqrt{n}}) \end{aligned} \]

because \(\psi_{Y_1}(0) = E[Y_1^0]=E[1]=1\), \(\psi^{(1)}_{Y_1}(0) = E[Y_1^1]=0\) and \(\psi^{(2)}_{Y_1}(0) = E[Y_1^2]=var(Y_1)=1\).

An application of Taylor’s theorem shows the remainder term

\[nR(t/\sqrt{n}) \rightarrow 0\text{ as }n\rightarrow \infty\]

So

\[\left(\psi_{Y_1}(\frac{t}{\sqrt{n}}) \right)^n =\\ \left(1+\frac{t^2}{2n}+R(\frac{t}{\sqrt{n}}) \right)^n=\\\left(1+\frac{t^2/2+nR(\frac{t}{\sqrt{n}})}{n} \right)^n\rightarrow e^{t^2/2}\]
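We can watch this convergence numerically in a case where the mgf is known. If \(X_i\sim Exp(1)\), then \(Y_1=X_1-1\) has mgf \(\psi_{Y_1}(t)=e^{-t}/(1-t)\) for \(t<1\) (a small sketch; the values of \(t\) and \(n\) are our choices):

# mgf of Y = X-1 with X ~ Exp(1)
psi=function(t) {exp(-t)/(1-t)}
t=1.5
for(n in c(10, 100, 1000, 10000))
  print(round(c(n=n, mgf=psi(t/sqrt(n))^n, limit=exp(t^2/2)), 4))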

Example (3.4.4)

Maybe the most important quantity in Statistics is the sample mean \(\bar{X}=1/n\sum X_i\) . Here is an example: say the ages of people in a town have some distribution with mean 31.37 and standard deviation 12.34. If we randomly select a person, what is the probability that person is over 35 years old?

We have a rv \(X\) with \(\mu=31.37\) and \(\sigma=12.34\). We want \(P(X>35.0)\), but we don’t know the density of \(X\), so there is no way to compute this probability.

Let’s say we could sample 25 people; what is the probability that their mean age is over 35? Now we want

\[P(\bar{X}>35.0)\]

and we have

\[ \begin{aligned} &P(\bar{X}>35.0) = \\ &P\left(\sqrt{25}\frac{\bar{X}-31.37}{12.34}>\sqrt{25}\frac{35-31.37}{12.34}\right) \approx \\ &P(Z>1.47) = \\ &1-\Phi(1.47) \end{aligned} \]

round(1-pnorm(1.47), 4)
## [1] 0.0708
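The answer relies on \(n=25\) being large enough for the CLT to kick in. We can check this by simulation if we assume some concrete age distribution with this mean and standard deviation, say a Gamma (an assumption; the true age distribution is unknown):

# assume ages ~ Gamma with mean 31.37 and sd 12.34 (an assumption!)
mu=31.37; sigma=12.34
shape=(mu/sigma)^2; rate=mu/sigma^2
xbar=replicate(1e4, mean(rgamma(25, shape, rate)))
mean(xbar>35)  # compare with 1-pnorm(1.47) = 0.0708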

Example (3.4.5)

Say we want to do a mail survey, that is, we send letters with questionnaires to randomly selected people and hope they fill them out and send them back. From long experience it is known that such surveys have a “return rate” of about \(25\%\), that is, only 1 in 4 people send their survey back. How many surveys do we need to send out to be \(99\%\) sure to get more than 100 back?

Say we send out n questionnaires. Let the rv \(X\) be the number of questionnaires we get back, then \(X \sim Bin(n,0.25)\). We need to solve the equation \(P(X>100) = 0.99\).

How do we find n? Note that

\[\mu_X = np = 0.25n\] and
\[\sigma_X = \sqrt{npq} = \sqrt{n\times 0.25\times 0.75} = 0.433\sqrt{n}\]

and so \(X\) is approximately normal with mean \(0.25n\) and standard deviation \(0.433\sqrt{n}\).

We need n such that

\[0.99 = P(X>100) = 1-P(X\le 100)\]

or

\[P(X \le 100)=0.01\]

so by the de Moivre-Laplace theorem

\[0.01=P \left( \frac{X-0.25n}{0.433\sqrt n}<\frac{100-0.25n}{0.433\sqrt n}\right)\approx\Phi\left(\frac{100-0.25n}{0.433\sqrt n}\right)\]

and so

\[\frac{100-0.25n}{0.433\sqrt n} = \Phi^{-1}(0.01) = -2.326\]
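In R:

qnorm(0.01)
## [1] -2.326348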

now: \[ \begin{aligned} &\frac{100-0.25n}{0.433\sqrt n} = -2.326\\ &100-0.25n = -2.326\times 0.433\sqrt n = -1.0072\sqrt n\\ &(100-0.25n)^2 = (-1.0072\sqrt n)^2=1.0144n\\ &10000-50n+0.0625n^2 = 1.0144n\\ &0.0625n^2-51.0144n+10000=0\\ &n_{1,2}= \left( 51.0144\pm\sqrt{51.0144^2-4\times10000\times0.0625}\right)/(2\times0.0625)=\\ &\frac{51.0144\pm10.12}{0.125} \end{aligned} \] which gives either \(n=(51.0144-10.12)/0.125=327\) or \(n=(51.0144+10.12)/0.125=489\).

The quadratic equation gives us two possible solutions, so let’s check which one is right. We find

\[\Phi\left(\frac{100-0.25\times 327}{0.433\sqrt{327}}\right)=\Phi(2.33)=0.9901\]
\[\Phi\left(\frac{100-0.25\times 489}{0.433\sqrt{489}}\right)=\Phi(-2.32)=0.0102\]

so we see \(n=489\) is the correct answer; the other root is an artifact of squaring the equation.
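We can also verify this against the exact binomial distribution (any small discrepancy from 0.99 is due to the normal approximation):

1-pbinom(100, 489, 0.25)  # should be close to 0.99
1-pbinom(100, 327, 0.25)  # the spurious root; far below 0.99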


As we saw above, the CLT is really a family of theorems, all with the same conclusion but with different assumptions. In fact, there are probably a thousand different CLTs! Here is what is probably the most famous of them:

Theorem (3.4.6)

Lindeberg-Feller 1922

Let \(X_1, X_2, ...\) be independent random variables with \(E[X_n]=0\) and \(var(X_n)=\sigma_n^2< \infty\). Let \(S_n=\sum_{i=1}^n X_i\), \(s_n^2=\sum_{i=1}^n \sigma_i^2\) and for \(\epsilon>0\) define

\[\Lambda_n(\epsilon) = \frac1{s_n^2} \sum_{i=1}^n E\left\{ X_i^2I(|X_i|\ge \epsilon s_n)\right\}\]

If

\[\Lambda_n (\epsilon)\rightarrow 0\] as \(n\rightarrow \infty\)

for every \(\epsilon >0\), then \(S_n/s_n\) converges to a standard normal in distribution.

Note that the condition \(E[X_i]=0\) is no real restriction, because one can always center the variables: set \(X_i=Y_i-E[Y_i]\).

Note: The condition on \(\Lambda_n(\epsilon)\) of the theorem is known as the Lindeberg condition. Feller showed that it is in some sense not only sufficient but also necessary. In that sense this is the ultimate CLT for independent rv’s.

Note: \(E\left\{ X_i^2 I(|X_i|\ge \epsilon s_n)\right\}\ge 0\), so \(\Lambda_n (\epsilon)\rightarrow 0\) implies \(s_n\rightarrow \infty\), which in turn implies that the \(\sigma_n\)’s can go to 0, but not too fast.

Note: if the \(X_i\)’s are uniformly bounded, say \(|X_i|<M\) for some \(M>0\), and \(s_n\rightarrow \infty\), then for \(n\) large enough we have \(\epsilon s_n>M\), so \(I(|X_i|\ge \epsilon s_n)=0\) and the condition is fulfilled.

Example (3.4.7)

Say \(Y_1, Y_2,...\) are iid with mean \(\mu\) and sd \(\sigma\). Set \(X_i =Y_i-\mu\). Now

\[ \begin{aligned} &s_n^2=\sum_{i=1}^n \sigma^2 = n\sigma^2\\ &E\left\{ X_i^2I(|X_i|\ge \epsilon s_n)\right\} = \\ &E\left\{ X_1^2I(|X_1|\ge \epsilon \sqrt{n}\sigma)\right\} \\ &\Lambda_n(\epsilon) = \frac1{n\sigma^2} \sum_{i=1}^n E\left\{ X_i^2I(|X_i|\ge \epsilon \sqrt{n}\sigma)\right\} = \\ &\frac1{\sigma^2} E\left\{ X_1^2I(|X_1|\ge \epsilon \sqrt{n}\sigma)\right\} \\ &\sigma^2=EX_1^2=\int_{-\infty}^\infty x^2f(x)dx=\\ &\int_{|x|<\epsilon \sqrt{n}\sigma} x^2f(x)dx+\int_{|x|\ge\epsilon \sqrt{n}\sigma} x^2f(x)dx=\\ &E\left\{ X_1^2I(|X_1|<\epsilon \sqrt{n}\sigma)\right\}+E\left\{ X_1^2I(|X_1|\ge \epsilon \sqrt{n}\sigma)\right\} \end{aligned} \]

but by monotone convergence the left term converges to \(\sigma^2\), so the right term has to converge to 0, and therefore \(\Lambda_n(\epsilon)\rightarrow 0\): an iid sequence with finite variance satisfies the Lindeberg condition.
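As a numerical illustration of this last step, here is a sketch that evaluates the truncated second moment \(E\left\{X_1^2 I(|X_1|\ge \epsilon\sqrt n\sigma)\right\}\) by numerical integration, taking \(X_1\sim N(0,1)\) (so \(\sigma=1\)) and \(\epsilon=0.1\), both our choices:

# E[X^2 I(|X| >= eps*sqrt(n))] for X ~ N(0,1); by symmetry twice the upper tail
tail2nd=function(n, eps=0.1)
  2*integrate(function(x) {x^2*dnorm(x)}, eps*sqrt(n), Inf)$value
round(sapply(c(10, 100, 1000, 10000), tail2nd), 5)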

Example (3.4.8)

The CLT has found applications in just about every field of mathematics and science. Here is an application in number theory:

Theorem (3.4.9)

Erdős-Kac CLT

Say we pick an integer \(m\) at random from \(\{1,2,..,n\}\), and let \(\omega(m)\) denote its number of distinct prime divisors. Then

\[P \left(\frac{\omega(m)-\log\log n}{\sqrt{\log\log n}}\le x \right)\rightarrow \Phi(x)\]

that is, the number of distinct prime divisors is approximately \(N(\log\log n,\log\log n)\).

proof omitted
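Even without a proof we can check the theorem by simulation (a sketch; omega is our own helper that counts distinct prime divisors by trial division, and because the norming is \(\log\log n\) the convergence is very slow, so the agreement is only rough):

# count distinct prime divisors by trial division
omega=function(m) {
  cnt=0; d=2
  while(d*d<=m) {
    if(m%%d==0) {
      cnt=cnt+1
      while(m%%d==0) m=m%/%d
    }
    d=d+1
  }
  if(m>1) cnt=cnt+1
  cnt
}
n=1e6
m=sample(n, 2000)  # 2000 random integers from 1,...,n
z=(sapply(m, omega)-log(log(n)))/sqrt(log(log(n)))
round(rbind(empirical=ecdf(z)(c(-1, 0, 1)),
            normal=pnorm(c(-1, 0, 1))), 2)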

In all approximation theorems like the central limit theorem, a major issue is how good the approximation is for finite \(n\), that is, how far we still are from the limit in a specific case. The following theorem gives some answers:

Theorem (3.4.10)

Berry-Esseen

Let \(X_1, X_2, ...\) be iid rv’s with \(E[X_1]=0\), \(var(X_1)=\sigma^2\) and \(E[|X_1|^3]=\rho<\infty\). Then if \(F_n\) is the cdf of \(S_n/(\sigma\sqrt{n})\), where \(S_n=\sum_{i=1}^n X_i\), we have

\[\sup_x |F_n(x)-\Phi(x)|\le \frac{C\rho}{\sigma^3\sqrt n}\]

proof omitted

Calculated values of the constant \(C\) have decreased markedly over the years, from the original value of 7.59 by Esseen (1942), to 0.7882 by van Beek (1972), then 0.7655 by Shiganov (1986), then 0.7056 by Shevtsova (2007), then 0.7005 by Shevtsova (2008), then 0.5894 by Tyurin (2009), then 0.5129 by Korolev & Shevtsova (2009), then 0.4785 by Tyurin (2010). A detailed review can be found in Korolev & Shevtsova (2009) and Korolev & Shevtsova (2010). The best estimate as of 2012 is \(C=0.4748\).

Example (3.4.11)

Say \(Z_i \sim Ber(p)\), and let \(X_i =Z_i /p-1\). Then

\[ \begin{aligned} &E[X_i] =E[Z_i/p-1]=p/p-1=0 \\ &E[X_i^2] =E[(Z_i/p-1)^2]=\frac1{p^2}E[(Z_i-p)^2] = \\ &\frac1{p^2}var(Z_i) = \frac1{p^2}p(1-p)=\frac{1-p}{p}\\ &\rho=E[|X_i|^3] =\frac1{p^3}E[|Z_i-p|^3] =\\ &\frac1{p^3} \left( |0-p|^3(1-p)+ |1-p|^3p\right)=\\ &\frac{1-p}{p^2} \left[p^2+(1-p)^2 \right]\\ &\frac{C\rho}{\sigma^3}=\frac{0.4748\frac{1-p}{p^2} \left[p^2+(1-p)^2 \right]}{(\frac{1-p}{p})^{3/2}}=\frac{0.4748\left[p^2+(1-p)^2 \right]}{\sqrt{p(1-p)}} \end{aligned} \]

If \(p=1/2\), the constant \(C\rho/\sigma^3\) equals 0.4748, so the bound is \(0.4748/\sqrt n\). As \(p\) gets close to 0 or 1, the bound goes to \(\infty\).
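A quick plot of the constant \(C\rho/\sigma^3=0.4748\left[p^2+(1-p)^2\right]/\sqrt{p(1-p)}\) as a function of \(p\) shows this behavior (a sketch; assumes ggplot2 is loaded):

p=seq(0.05, 0.95, length=200)
df=data.frame(x=p, y=0.4748*(p^2+(1-p)^2)/sqrt(p*(1-p)))
ggplot(data=df, aes(x, y)) +
  geom_line()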

Say we have \(p=0.25\) and \(n=100\). The following graph draws \(F_n(x)-\Phi(x)\) and the bounds

\[\pm \frac{0.4748\left[(1/4)^2+(1-1/4)^2 \right]}{\sqrt{\frac14\times\left(1-\frac14\right)\times 100}}=\pm 0.0685\]

library(ggplot2)
p=0.25; n=100
# Berry-Esseen bound C*rho/(sigma^3*sqrt(n)) for the Bernoulli case
bnd=0.4748*(p^2+(1-p)^2)/sqrt(n*p*(1-p))
bnd
## [1] 0.06853148
# F_n(x)-Phi(x) on a grid of x values
x=seq(-3, 3, length=250)
x1=floor(n*p+x*sqrt(n*p*(1-p)))
Fn=pbinom(x1, n, p)
df=data.frame(x=x, y=Fn-pnorm(x))
ggplot(data=df, aes(x, y)) +
  geom_line(size=1.25) +
  geom_hline(yintercept = c(-1,1)*bnd, col="blue", size=2)