The Normal (Gaussian) Distribution

Normal Distribution

Definition (2.3.1)

\(X\) is said to have a normal distribution with mean \(\mu\) and variance \(\sigma^2\) (\(X \sim N(\mu,\sigma)\)) if it has density

\[f(x)=\frac1{\sqrt{2\pi \sigma^2}}\exp\left\{-\frac1{2\sigma^2}(x-\mu)^2 \right\}\]

If \(\mu=0\) and \(\sigma=1\) it is called a standard normal rv, and often denoted by \(Z\) instead of \(X\).

Careful: some papers and textbooks define the normal as \(X \sim N( \mu , \sigma^2)\), that is they use the variance instead of the standard deviation.

Theorem (2.3.2)

  1. If \(Z \sim N(0,1)\), then \(X= \mu + \sigma Z \sim N( \mu , \sigma )\)

  2. If \(X \sim N( \mu , \sigma )\), then \(Z=(X- \mu )/ \sigma \sim N(0,1)\)

proof

\[ \begin{aligned} &F_X(x) = P(X\le x) = \\ &P(\mu + \sigma Z\le x) = P(Z\le \frac{x-\mu}{\sigma})=F_Z\left(\frac{x-\mu}{\sigma}\right)\\ &f_X(x) = \frac{d}{dx}F_Z\left(\frac{x-\mu}{\sigma}\right)=f_Z\left(\frac{x-\mu}{\sigma}\right)\frac1\sigma=\\ &\frac1{\sqrt{2\pi}}\exp \left\{-\frac12 \left(\frac{x-\mu}{\sigma}\right)^2\right\}\frac1\sigma = \\ &\frac1{\sqrt{2\pi\sigma^2}}\exp \left\{-\frac1{2\sigma^2} \left(x-\mu\right)^2\right\} \end{aligned} \]

part 2 follows in the same way.

One consequence of this theorem is that we can often do a proof for the standard normal, and then quickly generalize it to all normals.
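For instance, here is a small numerical illustration in Python (the values \(\mu=2\) and \(\sigma=1.5\) are arbitrary choices, not from the text): it simulates a standard normal \(Z\), forms \(X=\mu+\sigma Z\), and compares the empirical cdf of \(X\) with the \(N(\mu,\sigma)\) cdf.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma = 2.0, 1.5          # arbitrary illustrative values
z = rng.standard_normal(100_000)
x = mu + sigma * z            # by the theorem, X = mu + sigma*Z should be N(mu, sigma)

for t in [-1.0, 2.0, 4.5]:
    empirical = np.mean(x <= t)                       # P(X <= t) from the simulation
    theoretical = norm.cdf(t, loc=mu, scale=sigma)    # N(mu, sigma) cdf
    print(f"t={t:5.1f}  empirical={empirical:.4f}  cdf={theoretical:.4f}")
```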

Theorem (2.3.3)

The function above is indeed a pdf for all \(\mu\) and \(\sigma >0\).

proof

  • \(f(x) \ge 0\) for all \(x\) is obvious.

  • \(\int_{-\infty}^\infty f(x)dx=1\)?

first we show this for a standard normal. We will show that

\[\int_0^\infty \exp \left\{-z^2/2 \right\}dz=\sqrt{\frac\pi2}\]

from which the result follows immediately by the symmetry of the standard normal density.

Now

\[ \begin{aligned} &\left[\int_0^\infty \exp \left\{-z^2/2 \right\}dz\right]^2 = \\ &\left[\int_0^\infty \exp \left\{-v^2/2 \right\}dv\right]\left[\int_0^\infty \exp \left\{-u^2/2 \right\}du\right] = \\ &\int_0^\infty \int_0^\infty \exp \left\{-(u^2+v^2)/2 \right\}du\,dv \end{aligned} \] Now we change to polar coordinates: \(u=r\cos(\theta)\) and \(v=r\sin(\theta)\), so \(u^2+v^2=r^2\) and \(du\,dv=r\,d\theta\, dr\), and therefore

\[ \begin{aligned} &\left[\int_0^\infty \exp \left\{-z^2/2 \right\}dz\right]^2 = \\ & \int_0^\infty\int_0^{\pi/2} re^{-r^2/2}d\theta dr = \\ & \int_0^\infty \frac{\pi}{2} re^{-r^2/2} dr = \\ &-\frac{\pi}{2} e^{-r^2/2}|_0^\infty = \frac{\pi}{2} \end{aligned} \]

the general case now follows easily:

\[ \begin{aligned} &P(- \infty <X< \infty ) = \\ &P(- \infty <(X- \mu )/ \sigma < \infty ) = \\ &P(- \infty <Z< \infty ) = 1 \end{aligned} \]
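Both the half-line Gaussian integral used in the proof and the normalization of the general density are easy to confirm numerically; here is a small sketch in Python (\(\mu=1\), \(\sigma=2\) are arbitrary choices):

```python
import numpy as np
from scipy.integrate import quad

# the half-line Gaussian integral from the proof
val, _ = quad(lambda z: np.exp(-z**2 / 2), 0, np.inf)
print(val, np.sqrt(np.pi / 2))     # both ~1.2533

# the N(mu, sigma) density integrates to 1 (mu=1, sigma=2 chosen arbitrarily)
mu, sigma = 1.0, 2.0
f = lambda x: np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
total, _ = quad(f, -np.inf, np.inf)
print(total)                       # ~1.0
```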

Example (2.3.4)

we previously said that \(\Gamma (1/2)=\sqrt{\pi}\). Here is a proof that uses the standard normal distribution:

\[ \begin{aligned} &\Gamma(\frac12) =\int_0^\infty t^{\frac12-1}e^{-t}dt= \\ &\int_0^\infty \frac1{\sqrt{t}}e^{-t}dt = \text{ }(x=\sqrt{2t})\\ &\sqrt2\int_0^\infty e^{-x^2/2}dx = \\ &\sqrt2\frac12\int_{-\infty}^\infty e^{-x^2/2}dx = \\ &\sqrt\pi\int_{-\infty}^\infty \frac1{\sqrt{2\pi}}e^{-x^2/2}dx = \sqrt\pi\\ \end{aligned} \]
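A quick numerical check of this identity, using the same substitution \(x=\sqrt{2t}\):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

# Gamma(1/2), the substituted integral sqrt(2)*int_0^inf exp(-x^2/2)dx, and sqrt(pi)
val, _ = quad(lambda x: np.exp(-x**2 / 2), 0, np.inf)
print(gamma(0.5), np.sqrt(2) * val, np.sqrt(np.pi))   # all ~1.7724539
```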

Theorem (2.3.5)

Say \(X \sim N( \mu , \sigma )\) then

  1. \(E[X]= \mu\) and \(var(X)= \sigma^2\)
  2. \(\psi(t)=\exp( \mu t+ \sigma^2t^2/2)\)
  3. \(P(X> \mu ) = P(X< \mu ) =1/2\) and \(P(X> \mu +x) = P(X< \mu -x)\)

proof

  1. \[E[Z] = \int_{-\infty}^\infty x\frac1{\sqrt{2\pi}}e^{-x^2/2}dx = 0\] because the integrand is an odd function.

\[ \begin{aligned} &E[Z^2] = \int_{-\infty}^\infty x^2\frac1{\sqrt{2\pi}}e^{-x^2/2}dx = \\ &\int_{-\infty}^\infty x\left(x\frac1{\sqrt{2\pi}}e^{-x^2/2}\right)dx = \\ &x\left(-\frac1{\sqrt{2\pi}}e^{-x^2/2}\right)|_{-\infty}^\infty - \int_{-\infty}^\infty -\frac1{\sqrt{2\pi}}e^{-x^2/2}dx = 1\\ &\\ &var(Z)=1-0^2=1\\ &E[X]=E[\mu+\sigma Z]=\mu\\ &var(X)=var(\mu+\sigma Z)=\sigma^2 var(Z)=\sigma^2 \end{aligned} \]

  2.

\[ \begin{aligned} &\psi_Z(t) = \int_{-\infty}^\infty e^{tx}\frac1{\sqrt{2\pi}}e^{-x^2/2}dx =\\ &\int_{-\infty}^\infty \frac1{\sqrt{2\pi}}e^{tx-x^2/2}dx = \\ &\int_{-\infty}^\infty \frac1{\sqrt{2\pi}}e^{-\frac12(x^2-2tx)}dx = \\ &\int_{-\infty}^\infty \frac1{\sqrt{2\pi}}e^{-\frac12(x^2-2tx+t^2)+t^2/2}dx = \\ &e^{t^2/2}\int_{-\infty}^\infty \frac1{\sqrt{2\pi}}e^{-\frac12(x-t)^2}dx = e^{t^2/2}\\ &\\ &\psi_X(t)=E[e^{tX}]=E[e^{t(\mu+\sigma Z)}]=\\ &e^{t\mu}E[e^{(t\sigma) Z}]=\\ &e^{t\mu}\psi_Z(\sigma t)=\\ &e^{t\mu}e^{(t\sigma)^2/2}=\exp \left\{ \mu t+\sigma^2t^2/2\right\} \end{aligned} \]

Note that the density of \(X\) is symmetric around \(\mu\), and so

\[P(X> \mu +x) = P(X< \mu -x)\] with \(x=0\) it follows that

\[P(X> \mu ) = P(X< \mu ) = 1-P(X> \mu )\]

and so \(P(X> \mu ) = 1/2\).
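Parts 1 and 2 can be checked by simulation; in the sketch below the values \(\mu=-1\), \(\sigma=0.8\) and \(t=0.7\) are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, t = -1.0, 0.8, 0.7      # arbitrary illustrative values
x = rng.normal(mu, sigma, 1_000_000)

print(x.mean(), mu)                           # E[X] = mu
print(x.var(), sigma**2)                      # var(X) = sigma^2
print(np.mean(np.exp(t * x)),                 # empirical mgf E[exp(tX)]
      np.exp(mu * t + sigma**2 * t**2 / 2))   # psi(t) = exp(mu*t + sigma^2*t^2/2)
```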

Example (2.3.6)

We have seen before that the Cauchy rv has very heavy tails, that is the probabilities \(P(X>t)\) go to 0 only slowly as \(t\) grows. On the other hand the normal distribution has very thin tails. There is also a distribution that is somewhat in between, called the \(t\) distribution with \(n\) degrees of freedom. It has density

\[f(t)=\frac{\Gamma(\frac{n+1}2)}{\Gamma(\frac{n}2)}\frac1{\sqrt{\pi n}}\frac1{(1+t^2/n)^{(n+1)/2}}\]

For \(n=1\) this is the Cauchy distribution; as \(n \rightarrow \infty\) it approaches the standard normal distribution.
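The difference in tail behavior is easy to see numerically; here \(P(X>x)\) is computed for the Cauchy, the t distribution with 5 degrees of freedom (an arbitrary choice) and the standard normal:

```python
from scipy.stats import cauchy, norm, t as tdist

# P(X > x) for three distributions: heavy, intermediate, and thin tails
for x in [2, 3, 5]:
    print(f"P(X>{x}):  Cauchy {cauchy.sf(x):.5f}   "
          f"t(5 df) {tdist.sf(x, 5):.5f}   "
          f"N(0,1) {norm.sf(x):.7f}")
```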

Theorem (2.3.6a)

Say \(X\sim N(\mu, \sigma)\), then

  • the kth moment is given by

\[\mu_k= \sum_{j=0}^k {k\choose j}\mu^j \sigma^{k-j} \left(\prod_{i=0}^{(k-j)/2-1}(2i+1)\right)I\{k-j \text{ even}\}\]

  • the kth central moment is given by

\[\kappa_k=\begin{cases} 0&\text{if } k \text{ is odd}\\ \sigma^k \prod_{i=0}^{k/2-1}(2i+1)&\text{if } k \text{ is even} \end{cases}\]

  • the skewness and the (excess) kurtosis are both 0.

proof

Say Z is standard normal, then

\[ \begin{aligned} &\sqrt{2\pi}E[Z^k] = \sqrt{2\pi}\int_{-\infty}^{\infty} x^k \frac1{\sqrt{2\pi}} \exp\{-x^2/2\} dx = \\ &\int_{-\infty}^{\infty} x^{k-1} \left[x\exp\{-x^2/2\}\right] dx = \\ &x^{k-1} \left[-\exp\{-x^2/2\}\right]|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} (k-1)x^{k-2} \left[-\exp\{-x^2/2\}\right] dx = \\ &(k-1)\int_{-\infty}^{\infty} x^{k-2} \exp\{-x^2/2\} dx =\\ &(k-1)\sqrt{2\pi}E[Z^{k-2}] \end{aligned} \]

therefore

\[ \begin{aligned} &E[Z^{2k+1}] =(2k)E[Z^{2k-1}]=...=cE[Z]=0 \\ &E[Z^{2k}] =(2k-1)E[Z^{2k-2}]=...\\ &=\left[\prod_{i=0}^{k-1}(2i+1) \right]E[Z^2]=\\ &\prod_{i=0}^{k-1}(2i+1) =1\times 3\times... \times (2k-1) \end{aligned} \]

Now

\[ \begin{aligned} &\mu_k=E[X^k] = \\ &E[(\mu+\sigma Z)^k]=\\ &E\left[\sum_{j=0}^k {k\choose j}\mu^j (\sigma Z)^{k-j} \right] = \\ &\sum_{j=0}^k {k\choose j}\mu^j \sigma^{k-j} E[Z^{k-j}] = \\ &\sum_{j=0}^k {k\choose j}\mu^j \sigma^{k-j} \prod_{i=0}^{(k-j)/2-1}(2i+1)I\{k-j \text{ even}\} \end{aligned} \] for the central moments we find

\[ \begin{aligned} &\kappa_k=E[(X-\mu)^k] = \\ &E[(\mu+\sigma Z-\mu)^k]=\sigma^kE[Z^k]=\\ &\begin{cases} 0&\text{if } k \text{ is odd}\\ \sigma^k \prod_{i=0}^{k/2-1}(2i+1)&\text{if } k \text{ is even} \end{cases} \end{aligned} \] and finally

\[ \begin{aligned} &\gamma_1 = \frac{\kappa_3}{\kappa_2^{3/2}} = \frac{0}{\sigma^3}=0 \\ &\gamma_2 = \frac{\kappa_4}{\kappa_2^2} - 3 = \frac{3\sigma^4}{(\sigma^2)^2}-3=0 \end{aligned} \]
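The central-moment formula is easy to verify numerically; the sketch below computes \(E[(X-\mu)^k]\) by integration and compares it with \(\sigma^k\prod(2i+1)\) (\(\sigma=1.7\) is an arbitrary choice):

```python
import numpy as np
from scipy.integrate import quad

sigma = 1.7                       # arbitrary; central moments do not depend on mu
phi = lambda x: np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

def central_moment(k):            # E[(X-mu)^k] by numerical integration
    return quad(lambda x: x**k * phi(x), -np.inf, np.inf)[0]

def formula(k):                   # 0 for odd k, sigma^k * 1*3*...*(k-1) for even k
    if k % 2 == 1:
        return 0.0
    return sigma**k * np.prod([2 * i + 1 for i in range(k // 2)])

for k in range(1, 7):
    print(k, round(central_moment(k), 4), round(formula(k), 4))
```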

Theorem (2.3.7)

Say \(X \sim N( \mu , \sigma ), Y \sim N(\nu, \tau )\) and X and Y are independent. Then \(X+Y\) and \(X-Y\) are also normal.

proof

\[ \begin{aligned} &\psi _{X+Y} (t) = \\ &\psi _X (t) \psi _Y (t) = \\ &\exp( \mu t+ \sigma ^2t^2/2) \exp(\nu t+ \tau ^2t^2/2) = \\ &\exp(( \mu +\nu)t+( \sigma ^2+ \tau ^2)t^2/2) \end{aligned} \] and so \(X+Y \sim N\left( \mu +\nu, \sqrt{\sigma ^2+ \tau ^2}\right)\)

\[ \begin{aligned} &\psi _{-Y} (t) = \\ &E[\exp(t[-Y])] = \\ &E[\exp((-t)Y)] = \\ &\exp(\nu(-t)+ \tau ^2(-t)^2/2) =\\ &\exp(-\nu t+ \tau ^2t^2/2) = \\ \end{aligned} \]

and so \(-Y \sim N(-\nu, \tau )\)

finally \(X-Y \sim N\left( \mu -\nu, \sqrt{\sigma ^2+ \tau ^2}\right)\)
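Here is a small simulation sketch of this theorem (all parameter values are arbitrary choices): it checks that \(X+Y\) and \(X-Y\) have the claimed means and standard deviations, and compares one tail probability of \(X+Y\) with the corresponding normal value.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
mu, sigma, nu, tau = 1.0, 2.0, -0.5, 1.5    # arbitrary illustrative values
x = rng.normal(mu, sigma, 500_000)
y = rng.normal(nu, tau, 500_000)

s, d = x + y, x - y
print(s.mean(), mu + nu, s.std(), np.sqrt(sigma**2 + tau**2))
print(d.mean(), mu - nu, d.std(), np.sqrt(sigma**2 + tau**2))
# compare a tail probability with the claimed normal distribution of X+Y
print(np.mean(s > 3), norm.sf(3, loc=mu + nu, scale=np.sqrt(sigma**2 + tau**2)))
```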


Because of the importance of the normal distribution a number of theorems have been found to characterize it. Here is one such result:

Theorem (2.3.8)

Bernstein

If \(X\perp Y\) and \(X + Y \perp X-Y\), then \(X\) and \(Y\) are normal.

proof

We will do this proof through a couple of lemmas:

Lemma

If \(X\) and \(Y\) are iid normal, then \(X+Y\) and \(X-Y\) are also independent normal.

proof

We have already shown that \(X+Y\) and \(X-Y\) are normal. Now

\[ \begin{aligned} &cov(X-Y,X+Y) = \\ &cov(X,X) + cov(X,Y) + cov(-Y,X) + cov(-Y,Y) = \\ &var(X) + cov(X,Y) - cov(Y,X) -var(Y) = 0 \end{aligned} \]

and as we shall see in a little bit (theorem 2.3.13), for jointly normal random variables zero covariance implies independence, and so \(X-Y\) and \(X+Y\) are independent.

Lemma

If \(X\) and \(Z\) are independent such that \(Z\) and \(X+Z\) are normal, then \(X\) is normal as well.

proof

We use the mgf:

\[ \begin{aligned} &\psi_{X+Z}(t) = \psi_{X}(t)\psi_{Z}(t)\\ &\psi_{X}(t) = \psi_{X+Z}(t)/\psi_{Z}(t) =\\ &\exp( a t+ b t^2/2)/ \exp(c t+ dt^2/2) = \\ &\exp( (a-c) t+ (b-d) t^2/2) \end{aligned} \] this is the mgf of a normal random variable if \(b-d>0\), so it can be a variance. But \(X\) and \(Z\) are independent, therefore

\[b-d=var(X+Z)-var(Z)=var(X)+var(Z)-var(Z)=var(X)>0\]

Lemma

If \(X\), \(Z\) are independent random variables and \(Z\) is normal, then \(X+Z\) has a non-vanishing probability density function which has derivatives of all orders.

proof

wlog assume \(Z \sim N(0,1/\sqrt 2)\). Consider the function

\[f(x) = E[\exp\left(-(x-X)^2\right)]\]

Then \(f(x) \ne 0\) for each x because \(\exp\left(-(x-X)^2\right)>0\). Moreover all derivatives exist and are bounded uniformly because

\[\lim_{x\rightarrow \pm \infty} x^k\exp(-x^2) = 0\]

so \(x^k\exp(-x^2)\) has a finite minimum and maximum for all k, and therefore f has derivatives of all orders.

Now

\[ \begin{aligned} &\int_{-\infty}^{t} f_{X+Z}(x)dx = F_{X+Z}(t) = P(X+Z\le t) = \\ &\int_{-\infty}^\infty F_X(t-z)f_Z(z) dz = \\ &\int_{-\infty}^\infty F_X(t-z)\frac1{\sqrt{2\pi\times 1/2}} \exp \{ -\frac{z^2}{2\times 1/2} \} dz = \\ &\frac1{\sqrt{\pi}}\int_{-\infty}^\infty F_X(t-z) \exp \{ -z^2 \} dz = \\ &\frac1{\sqrt{\pi}}\int_{-\infty}^\infty \left[ \int_{-\infty}^{t-z} f_X(y)dy \right]\exp \{ -z^2 \} dz = \\ &\frac1{\sqrt{\pi}}\int_{-\infty}^\infty \left[ \int_{-\infty}^{t} f_X(x-z)dx \right]\exp \{ -z^2 \} dz = \\ &\frac1{\sqrt{\pi}}\int_{-\infty}^\infty \int_{-\infty}^{t} f_X(x-z) \exp \{ -z^2 \} dxdz = \\ &\frac1{\sqrt{\pi}}\int_{-\infty}^\infty \int_{-\infty}^{t} f_X(y) \exp \{ -(x-y)^2 \} dxdy = \\ &\frac1{\sqrt{\pi}} \int_{-\infty}^{t} \left[ \int_{-\infty}^\infty\exp \{ -(x-y)^2 \} f_X(y) dy\right ] dx = \\ &\frac1{\sqrt{\pi}} \int_{-\infty}^{t} E\left[\exp \{ -(x-X)^2 \} \right ] dx \\ &\text{so}\\ &f_{X+Z}(t) =\frac1{\sqrt{\pi}}E \left[\exp \{ -(t-X)^2 \} \right ] = \frac{f(t)}{\sqrt{\pi}} \\ \end{aligned} \]

and so \(f\) is proportional to the density of \(X+Z\), which is therefore non-vanishing and has derivatives of all orders.


Note that this smoothing trick — adding an independent normal random variable to any (discrete or continuous) random variable produces a density with derivatives of all orders — is used in many proofs.


Now for the finish of Bernstein’s theorem: First we change notation and use rv’s \(X_1\) and \(X_2\) . So we know \(X_1\) and \(X_2\) are independent and so are \(X_1 +X_2\) and \(X_1 -X_2\).

Let \(Z_1\) and \(Z_2\) be iid normal rv’s, independent of \(X_1\) and \(X_2\). Then define rv’s

\[Y_k = X_k +Z_k\]

By the third lemma each of the \(Y_k\)’s has a smooth non-zero pdf.

The joint density of \((Y_1 +Y_2, Y_1 -Y_2)\) is found via the transformation

\[u=x+y,v=x-y\] which has inverse transform

\[x=(u+v)/2,y=(u-v)/2\]

and Jacobian \(J=-1/2\). Therefore

\[f_{Y_1+Y_2,Y_1-Y_2}(u,v)=\frac12 f_{Y_1}(\frac{u+v}2)f_{Y_2}(\frac{u-v}2)\]

\(Y_1 +Y_2=(X_1+X_2)+(Z_1+Z_2)\) and \(Y_1 -Y_2=(X_1-X_2)+(Z_1-Z_2)\) are independent: \(X_1+X_2 \perp X_1-X_2\) by assumption, \(Z_1+Z_2 \perp Z_1-Z_2\) by the first lemma, and the \(X\)'s are independent of the \(Z\)'s. Therefore the joint density above factors into two functions, one of \(u\) and the other of \(v\). Let's call them \(a(u)\) and \(b(v)\).

Consider the functions

\[Q_k(x) = \log(f_k(x))\]

where \(f_k\) denotes the density of \(Y_k\).

We will now show that \(Q_k\) is a quadratic function by showing that its second derivative is constant:

First the \(Q_k\)’s are twice differentiable by the smoothness argument, and we have

\[ \begin{aligned} &Q_1 (x+y)+Q_2 (x-y) = \\ &\log(f_1 (x+y))+\log(f_2 (x-y)) =\\ &\log(f_1 (x+y)f_2 (x-y)) =\\ &\log (2a(2x)b(2y)) =\\ &\log 2+\log a(2x) + \log b(2y) \end{aligned} \]

so

\[ \begin{aligned} &\frac{d}{dx}\{Q_1 (x+y)+Q_2 (x-y)\} = \\ &\frac{d}{dx}\{\log 2+ \log a(2x) + \log b(2y)\} = \\ &\frac{d}{dx}\{\log a(2x)\} \\ &\text{and so}\\ &\frac{d^2}{dxdy}\{Q_1 (x+y)+Q_2 (x-y)\} = \\ &\frac{d}{dy}\frac{d}{dx}\{\log a(2x)\} = 0 \end{aligned} \] but also

\[\frac{d^2}{dxdy}\{Q_1 (x+y)+Q_2 (x-y)\} = Q''_1 (x+y)-Q''_2 (x-y)\]

and so

\[Q''_1 (x+y) = Q''_2 (x-y)\]

taking x=y we have

\[Q''_1 (2x) = Q''_2 (0) = \text{const}\]

and taking x=-y we have

\[Q''_2 (2y) = Q''_1 (0) = \text{const}\]

Therefore the \(Q_k\) functions have constant second derivatives, and so have to be quadratic functions. Say

\[Q_k (x) = a_k x^2+b_k x+c_k\]

and so

\[f_k (x) = \exp(a_k x^2+b_k x+c_k )\]

as a pdf \(f_k\) has to be integrable, so \(a_k <0\). Completing the square shows that \(f_k\) is proportional to a normal density with mean \(-b_k/(2a_k)\) and variance \(-1/(2a_k)\), and the constant \(c_k\) is fixed by the requirement that \(f_k\) integrates to 1. Therefore \(f_k\) is a normal density, and so \(Y_1\) and \(Y_2\) are normal.

So \(Y_1\) and \(Y_2\) are normal (note that the independence of \(Y_1+Y_2\) and \(Y_1-Y_2\) used above relied on the first lemma, applied to \(Z_1\) and \(Z_2\)).

The theorem now follows from the second lemma: \(X_k\) and \(Z_k\) are independent, \(Z_k\) and \(X_k+Z_k=Y_k\) are normal, and therefore \(X_k\) is normal.

Bivariate Normal RV

Definition (2.3.9)

Let \(\mu _1 , \mu _2 \in \mathbb{R}\), \(\sigma _1 , \sigma _2 \in \mathbb{R}^+\) and \(\rho \in (-1,1)\), then the random vector \((X,Y)\) is said to have a bivariate normal distribution if it has joint density

\[f(x,y)=\frac1{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp \left\{-\frac{1}{2(1-\rho^2)} \left[\frac{(x-\mu_1)^2}{\sigma_1^2}-\frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2}+\frac{(y-\mu_2)^2}{\sigma_2^2}\right]\right\}\]

Theorem (2.3.10)

  1. Let \((X,Y)\) be a bivariate normal. Let \(U=(X- \mu _1 )/ \sigma _1\) and \(V=(Y- \mu _2 )/ \sigma _2\). Then \((U,V)\) is a bivariate normal random vector with

\[\mu _U = \mu _V = 0, \sigma _U = \sigma _V = 1,\rho _{UV} = \rho\]

  2. Let \((U,V)\) be a bivariate normal random vector with \(\mu _U = \mu _V = 0, \sigma _U = \sigma _V = 1, \rho_{UV}=\rho\). Let \(X= \mu _1 + \sigma _1 U\) and \(Y= \mu _2 + \sigma _2 V\). Then \((X,Y)\) is a bivariate normal with parameters \(\mu _1 , \mu _2 , \sigma _1 , \sigma _2\) and \(\rho\)

proof

follows from a simple application of the transformation theorem

Theorem (2.3.11)

Let \((X,Y)\) be a bivariate normal. Then \(X \sim N( \mu _1 , \sigma _1 )\).

proof

Let \(U=(X- \mu _1 )/ \sigma _1\) and \(V=(Y- \mu _2 )/ \sigma _2\), then

\[ \begin{aligned} &f_U(x) = \int_{-\infty}^{\infty} f_{U,V}(x,y)dy=\\ &\int_{-\infty}^{\infty} \frac1{2\pi\sqrt{1-\rho^2}}\exp \left\{-\frac{1}{2(1-\rho^2)} \left[x^2-2\rho xy+y^2\right]\right\}dy = \\ &\int_{-\infty}^{\infty} \frac1{2\pi\sqrt{1-\rho^2}}\exp \left\{-\frac{1}{2(1-\rho^2)} \left[x^2-2\rho xy+y^2+x^2\rho^2-x^2\rho^2\right]\right\} dy = \\ &\frac1{\sqrt{2\pi}}e^{-x^2/2} \int_{-\infty}^{\infty} \frac1{\sqrt{2\pi(1-\rho^2)}}\exp \left\{-\frac{1}{2(1-\rho^2)} \left[y-\rho x\right]^2\right\}dy = \\ &\frac1{\sqrt{2\pi}}e^{-x^2/2} \end{aligned} \] and so

\[ \begin{aligned} &F_X(x) = P(X<x) = P(\mu_1+\sigma_1 U<x) = \\ &P(U<\frac{x-\mu_1}{\sigma_1}) = F_U(\frac{x-\mu_1}{\sigma_1})\\ &\\ &f_X(x) = f_U(\frac{x-\mu_1}{\sigma_1})\frac1{\sigma_1} = \\ &\frac1{\sqrt{2\pi \sigma_1^2}} \exp \left\{ -\frac12 \left(\frac{x-\mu_1}{\sigma_1}\right)^2\right\} \end{aligned} \] and so \(X\sim N(\mu_1,\sigma_1)\)

Theorem (2.3.12)

Let \((X,Y)\) be a bivariate normal. Then \(cor(X,Y) = \rho\).

proof

Let \(U=(X- \mu _1 )/ \sigma _1\) and \(V=(Y- \mu _2 )/ \sigma _2\), then

We already have \(E[U]=E[V]=0\). Now

\[ \begin{aligned} &cov(U,V) = E[UV] = \\ &\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\frac1{2\pi\sqrt{1-\rho^2}}\exp \left\{-\frac{1}{2(1-\rho^2)} \left[x^2-2\rho xy+y^2\right]\right\}dxdy = \\ &\int_{-\infty}^{\infty} x\frac1{\sqrt{2\pi}}e^{-x^2/2} \left[ \int_{-\infty}^{\infty} y\frac1{\sqrt{2\pi(1-\rho^2)}}\exp \left\{-\frac{1}{2(1-\rho^2)} \left[y-\rho x\right]^2\right\}dy\right]dx = \\ &\int_{-\infty}^{\infty} x\frac1{\sqrt{2\pi}}e^{-x^2/2} \left[\rho x\right]dx = \\ &\rho E[U^2]=\rho \end{aligned} \]

and so

\[ \begin{aligned} &cov(X,Y) = cov(\mu_1+\sigma_1 U, \mu_2+\sigma_2 V) =\\ &\sigma_1\sigma_2cov(U,V) = \sigma_1\sigma_2\rho\\ &cor(X,Y) = \frac{cov(X,Y)}{\sigma_1\sigma_2}=\rho\\ \end{aligned} \]
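A simulation check (the parameter values and \(\rho=0.6\) are arbitrary choices); the pair \((U,V)\) is generated with the standard construction \(V=\rho U+\sqrt{1-\rho^2}W\), with \(W\) an independent standard normal, which produces a standardized bivariate normal with correlation \(\rho\):

```python
import numpy as np

rng = np.random.default_rng(3)
mu1, sigma1, mu2, sigma2, rho = 0.5, 1.2, -1.0, 2.0, 0.6   # arbitrary values
n = 500_000

u = rng.standard_normal(n)
w = rng.standard_normal(n)
v = rho * u + np.sqrt(1 - rho**2) * w     # (U, V) standardized bivariate normal, cor = rho

x = mu1 + sigma1 * u
y = mu2 + sigma2 * v
print(np.corrcoef(x, y)[0, 1], rho)                 # sample correlation vs rho
print(np.cov(x, y)[0, 1], sigma1 * sigma2 * rho)    # sample covariance vs sigma1*sigma2*rho
```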

Theorem (2.3.13)

let \((X, Y)\) be a bivariate normal rv, then

\[X\perp Y \text{ iff }cor(X,Y)=0\]

proof

one direction is always true. For the other we have if \(\rho =0\)

\[ \begin{aligned} &f(x,y)=\frac1{2\pi\sigma_1\sigma_2\sqrt{1-0^2}}\exp \left\{-\frac{1}{2(1-0^2)} \left[\frac{(x-\mu_1)^2}{\sigma_1^2}-\frac{2\times 0(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2}+\frac{(y-\mu_2)^2}{\sigma_2^2}\right]\right\} = \\ &\frac1{2\pi\sigma_1\sigma_2}\exp \left\{-\frac{1}{2} \left[\frac{(x-\mu_1)^2}{\sigma_1^2}+\frac{(y-\mu_2)^2}{\sigma_2^2}\right]\right\} = \\ &\left(\frac1{\sqrt{2\pi \sigma_1^2}} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}}\right)\left(\frac1{\sqrt{2\pi \sigma_2^2}} e^{-\frac{(y-\mu_2)^2}{2\sigma_2^2}}\right) = f_X(x)f_Y(y) \end{aligned} \]

Example (2.3.14)

It is not true that two normal rv’s \(X\) and \(Y\) with \(cor(X,Y)=0\) are always independent. Consider \(X\sim N(0,1)\), \(Z\) a Rademacher rv independent of \(X\) (that is \(P(Z=-1)=P(Z=1)=1/2\)), and \(Y=ZX\). Then

\[ \begin{aligned} &P(Y<y) =P(ZX<y) = \\ &P(ZX<y|Z=-1)P(Z=-1)+P(ZX<y|Z=1)P(Z=1) = \\ &P((-1)X<y)\frac12+P(1X<y)\frac12 = \\ &P(X>-y)\frac12+P(X<y)\frac12 = \\ &P(X<y)\frac12+P(X<y)\frac12 = \\ &P(X<y) \end{aligned} \] so \(Y\sim N(0,1)\). Finally

\[ \begin{aligned} &cov(X,Y) =E[XY] -E[X]E[Y]= \\ &E[XZX] = E[ZX^2] = E[Z]E[X^2]=0 \end{aligned} \]

However, clearly \(X\) and \(Y\) are not independent: for example \(|Y|=|X|\), so \(Y\) is completely determined by \(X\) up to its sign.
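Simulating this construction shows both facts at once: the sample correlation is essentially 0, yet \(|Y|=|X|\), so for example \(P(|X|>1,|Y|>1)=P(|X|>1)\), which is much larger than \(P(|X|>1)P(|Y|>1)\):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000
x = rng.standard_normal(n)
z = rng.choice([-1, 1], size=n)     # Rademacher rv, independent of x
y = z * x                           # Y = Z*X is again N(0,1)

print(np.corrcoef(x, y)[0, 1])                           # ~0: uncorrelated
print(np.mean((np.abs(x) > 1) & (np.abs(y) > 1)))        # = P(|X|>1) ~ 0.317
print(np.mean(np.abs(x) > 1) * np.mean(np.abs(y) > 1))   # product ~ 0.100, so not independent
```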

Example (2.3.15)

the joint distribution of two normal rv’s need not be bivariate normal.

Say \(X \sim N(0,1)\) and let \(Y=-X\) if \(|X|>1\) and \(Y=X\) if \(|X| < 1\), then

  • if \(y<-1\)

\[P(Y< y) = P(-X<y) = P(X>-y)=P(X<y)\]

  • if \(-1<y<1\)

\[P(Y< y) = P(Y<-1)+P(-1<Y<y)=P(-X<-1)+P(-1<X<y) =\] \[P(X>1)+P(-1<X<y)=P(X<-1)+P(-1<X<y) = P(X<y)\]

  • if \(y>1\)

\[P(Y< y) = P(Y<-1)+P(-1<Y<1)+P(1<Y<y)=\] \[P(-X<-1)+P(-1<X<1)+P(1<-X<y)=\] \[P(X<-1)+P(-1<X<1)+P(-y<X<-1)=\] \[P(X<-1)+P(-1<X<1)+P(1<X<y)=P(X<y)\] so \(Y \sim N(0,1)\) as well, but \((X,Y)\) is not bivariate normal: if \(X<-1\) then \(Y=-X>1\), so for example \(P(X<-1, Y<-1)=0\), whereas a bivariate normal assigns positive probability to every such region.

Theorem (2.3.16)

say \((X,Y)\) is a bivariate normal rv, then

  1. \(Z = X + Y \sim N\left( \mu _1 + \mu _2 ,\sqrt{ \sigma _1 ^2+ \sigma _2 ^2+2 \sigma _1 \sigma _2 \rho }\right)\)

  2. \(Z = X|Y=y \sim N\left( \mu _1 + \rho ( \sigma _1 / \sigma _2 )(y- \mu _2 ), \sigma _1 \sqrt{1- \rho ^2}\right)\)

proof

  1. is obvious: if \((X,Y)\) is a bivariate normal rv, then \(X \sim N( \mu _1 , \sigma _1 )\), \(Y \sim N( \mu _2 , \sigma _2 )\) and \(cor(X,Y)=\rho\). Therefore \(X+Y\) has a normal distribution with

\[E[X+Y] = \mu _1 + \mu _2\]

and

\[var(X+Y) = var(X)+var(Y)+2cov(X,Y) = \sigma_1^2+ \sigma_2^2+2 \sigma_1 \sigma_2 \rho\].

Let \(U=(X- \mu _1 )/ \sigma _1\) and \(V=(Y- \mu _2 )/ \sigma _2\), then

\[ \begin{aligned} &f_{U|V=v}(x|y) = \frac{f_{U,V}(x,y)}{f_V(y)}=\\ & \frac{\frac1{2\pi\sqrt{1-\rho^2}}\exp \left\{-\frac{1}{2(1-\rho^2)} \left[x^2-2\rho xy+y^2\right]\right\}}{\frac1{\sqrt{2\pi}}\exp \left\{-y^2/2 \right\} } = \\ &\frac1{\sqrt{2\pi(1-\rho^2)}}\exp \left\{-\frac{1}{2(1-\rho^2)} \left[x^2-2\rho xy+y^2-(1-\rho^2)y^2\right]\right\} = \\ &\frac1{\sqrt{2\pi(1-\rho^2)}}\exp \left\{-\frac{1}{2(1-\rho^2)} \left[x^2-2\rho xy+\rho^2 y^2\right]\right\}=\\ &\frac1{\sqrt{2\pi(1-\rho^2)}}\exp \left\{-\frac{1}{2(1-\rho^2)} \left[x-\rho y\right]^2\right\} \end{aligned} \]

and so \(U|V=y \sim N( \rho y,\sqrt{1- \rho ^2})\)

Now

\[ \begin{aligned} &F_{X|Y=y}(x|y)=P(X<x|Y=y) = \\ &P(\mu_1+\sigma_1 U<x | \mu_2+\sigma_2 V = y) =\\ &P(U<\frac{x-\mu_1}{\sigma_1}| V=\frac{y-\mu_2}{\sigma_2}) \\ & \\ &f_{X|Y=y}(x|y) = \\ &\frac1{\sqrt{2\pi(1-\rho^2)}}\exp \left\{-\frac{1}{2(1-\rho^2)} \left[\frac{x-\mu_1}{\sigma_1}-\rho \frac{y-\mu_2}{\sigma_2}\right]^2\right\}\frac1{\sigma_1} = \\ &\frac1{\sqrt{2\pi\sigma_1^2(1-\rho^2)}}\exp \left\{-\frac{1}{2\sigma_1^2\sigma_2^2(1-\rho^2)} \left[\sigma_2x-\sigma_2\mu_1-\rho \sigma_1y+\rho\sigma_1\mu_2\right]^2\right\} = \\ &\frac1{\sqrt{2\pi\sigma_1^2(1-\rho^2)}}\exp \left\{-\frac{1}{2\sigma_1^2\sigma_2^2(1-\rho^2)}\sigma_2^2 \left[x-(\mu_1+\rho (\sigma_1/\sigma_2)(y-\mu_2))\right]^2\right\} =\\ &\frac1{\sqrt{2\pi\sigma_1^2(1-\rho^2)}}\exp \left\{-\frac{1}{2\sigma_1^2(1-\rho^2)} \left[x-(\mu_1+\rho (\sigma_1/\sigma_2)(y-\mu_2))\right]^2\right\} \end{aligned} \] and so \(X|Y=y \sim N\left(\mu_1+\rho (\sigma_1/\sigma_2)(y-\mu_2), \sigma_1\sqrt{1-\rho^2}\right)\)
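Here is a simulation sketch of part 2 (all parameter values are arbitrary choices): pairs are generated from a bivariate normal, the sample is restricted to a thin slice \(Y\approx y_0\), and the conditional mean and standard deviation are compared with \(\mu_1+\rho(\sigma_1/\sigma_2)(y_0-\mu_2)\) and \(\sigma_1\sqrt{1-\rho^2}\):

```python
import numpy as np

rng = np.random.default_rng(5)
mu1, sigma1, mu2, sigma2, rho = 1.0, 2.0, -1.0, 0.5, 0.7   # arbitrary values
n = 2_000_000

u = rng.standard_normal(n)
v = rho * u + np.sqrt(1 - rho**2) * rng.standard_normal(n)
x, y = mu1 + sigma1 * u, mu2 + sigma2 * v

y0 = -0.8                                   # condition on Y ~ y0 (thin slice)
keep = np.abs(y - y0) < 0.01
print(x[keep].mean(), mu1 + rho * (sigma1 / sigma2) * (y0 - mu2))   # conditional mean
print(x[keep].std(), sigma1 * np.sqrt(1 - rho**2))                  # conditional sd
```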

Multivariate Normal RV

Definition (2.3.17)

Let \(\mathbf{\mu}=( \mu_1 ,.., \mu_n )^T\) be a vector and \(\Sigma =[ \sigma _{ij} ]\) be a symmetric positive semi-definite matrix (ie \(x^T \Sigma x \ge 0\) for all \(x\)) which is also invertible (so in fact positive definite), then the random vector

\[\mathbf{X} = (X_1 ,..,X_n )^T\]

has a multivariate normal distribution if it has joint density

\[f(\pmb{x})=(2\pi)^{-n/2}|\Sigma|^{-1/2}\exp \left\{-\frac12(\pmb{x-\mu})^T\Sigma^{-1}(\pmb{x-\mu}) \right\}\]

where \(|\Sigma|\) is the determinant of \(\Sigma\).

Example (2.3.18)

\(n=1\):

\(\Sigma=[a]\), \(x^T \Sigma x = ax^2 \ge 0\) iff \(a \ge 0\)

\(|\Sigma|=a, \Sigma^{-1}=1/a\), and so

\[ \begin{aligned} &f(\pmb{x})=(2\pi)^{-n/2}|\Sigma|^{-1/2}\exp \left\{-\frac12(\pmb{x-\mu})^T\Sigma^{-1}(\pmb{x-\mu}) \right\} = \\ &(2\pi)^{-1/2}a^{-1/2}\exp \left\{-\frac12(x-\mu)\frac1{a}(x-\mu) \right\} = \\ &\frac1{\sqrt{2\pi a}}\exp \left\{-\frac1{2a}(x-\mu)^2 \right\} \end{aligned} \]

so \(a\) is the variance of \(X\).

Example

\(n=2\): we have a symmetric 2x2 matrix \(\Sigma\):

\[ \pmb{\Sigma} = \begin{pmatrix} a & b \\ b & c \end{pmatrix}\\ \] For \(\Sigma\) to be a covariance matrix it has to be positive semi-definite, that is we need

\[ \begin{aligned} &\pmb{x^T\Sigma x} = \\ &(x \text{ }y) \begin{pmatrix} a & b \\ b & c \end{pmatrix} \begin{pmatrix}x \\y\end{pmatrix}= \\ &ax^2+2bxy+cy^2\ge 0 \end{aligned} \] What does this imply for \(a,b,c\)? First if \(y=0\) we need \(ax^2\ge 0\), or \(a\ge 0\). Similarly we need \(c\ge 0\).

Let \(u=\sqrt a x\), \(v=\sqrt c y\) and \(d=\frac{b}{\sqrt{ac}}\), then we need

\[u^2+2duv+v^2\ge 0\] this implies

\[ \begin{aligned} &u^2+2duv+v^2 = \\ &u^2+2duv+d^2v^2-d^2v^2+v^2 = \\ &(u+dv)^2+(1-d^2)v^2 \ge 0 \end{aligned} \] and this means we need \(|d|\le 1\).

So in order for \(\Sigma\) to be positive semidefinite we need \(a,c \ge 0\) and \(|d| = |b/\sqrt{ac}| \le 1\) or \(|b| \le \sqrt{ac}\).

Inspired by the above calculation let’s write \(\Sigma\) as follows:

\[ \pmb{\Sigma} = \begin{pmatrix} \sigma_x^2 & \sigma_x\sigma_y\rho \\ \sigma_x\sigma_y\rho & \sigma_y^2 \end{pmatrix}\\ \]

Note that this is just as general as before, with \(a= \sigma _x ^2\) , \(c= \sigma _y ^2\) and \(b= \rho \sigma _x \sigma _y\).

Now

\[ \begin{aligned} &|\Sigma| = \sigma_x^2\sigma_y^2(1-\rho^2)\\ &\\ &\pmb{\Sigma}^{-1} = \frac1{\sigma_x^2\sigma_y^2(1-\rho^2)} \begin{pmatrix} \sigma_y^2 & -\sigma_x\sigma_y\rho \\ -\sigma_x\sigma_y\rho & \sigma_x^2 \end{pmatrix}\\ &\\ &\pmb{x^T\Sigma^{-1} x} = \\ &(x \text{ }y) \frac1{\sigma_x^2\sigma_y^2(1-\rho^2)} \begin{pmatrix} \sigma_y^2 & -\sigma_x\sigma_y\rho \\ -\sigma_x\sigma_y\rho & \sigma_x^2 \end{pmatrix} \begin{pmatrix}x \\y\end{pmatrix}=\\ &\frac1{\sigma_x^2\sigma_y^2(1-\rho^2)}(x \text{ }y) \begin{pmatrix} \sigma_y^2x -\sigma_x\sigma_y\rho y \\ \sigma_x^2 y-x\sigma_x\sigma_y\rho \end{pmatrix} =\\ &\frac1{\sigma_x^2\sigma_y^2(1-\rho^2)}\left(\sigma_y^2x^2 -\sigma_x\sigma_y\rho xy+\sigma_x^2 y^2-\sigma_x\sigma_y\rho xy\right) =\\ &\frac1{1-\rho^2}\left(\frac{x^2}{\sigma_x^2}-2\rho\frac{xy}{\sigma_x\sigma_y}+\frac{y^2}{\sigma_y^2} \right) \end{aligned} \]

and so we have a bivariate normal.
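The determinant and the quadratic form above are easy to verify numerically for particular values (\(\sigma_x\), \(\sigma_y\), \(\rho\) and the point \((x,y)\) below are arbitrary choices):

```python
import numpy as np

sx, sy, rho = 1.3, 0.7, -0.4                       # arbitrary values
Sigma = np.array([[sx**2, sx*sy*rho],
                  [sx*sy*rho, sy**2]])

print(np.linalg.det(Sigma), sx**2 * sy**2 * (1 - rho**2))   # |Sigma|

x, y = 0.9, -1.1                                   # arbitrary point
v = np.array([x, y])
quad_form = v @ np.linalg.inv(Sigma) @ v           # x^T Sigma^{-1} x
formula = (x**2/sx**2 - 2*rho*x*y/(sx*sy) + y**2/sy**2) / (1 - rho**2)
print(quad_form, formula)                          # agree
```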

Theorem (2.3.19)

Say \(\mathbf{X}\) has a multivariate normal distribution. Then

\[\mathbf{Z} = \left( (X_1 - \mu_1 )/\sqrt{\sigma_{11}} ,.., (X_n - \mu_n )/\sqrt{\sigma_{nn}}\right)^T\]

has a multivariate normal distribution with mean vector \(\pmb{\mu}_Z=(0,..,0)^T\) and covariance matrix \(\Sigma_Z=[ \sigma_{ij}/\sqrt{\sigma_{ii}\sigma_{jj}} ]\), the correlation matrix of \(\pmb{X}\). Also

  1. \(X_i \sim N( \mu_i , \sqrt{\sigma_{ii}} )\)

  2. \(cov(X_i ,X_j ) = \sigma_{ij}\)

without proof

Theorem (2.3.20)

Say \(\pmb{X}\sim N_p(\pmb{\mu}, \pmb{\Sigma})\), then the moment generating function is given by

\[\psi(\pmb{t})=\exp\{\pmb{t}'\pmb{\mu}+\frac12\pmb{t}'\pmb{\Sigma}\pmb{t}\}\]

proof

Let \(\pmb{Z}\sim N(\pmb{0}, \pmb{I})\), then \(\pmb{t'Z}\sim N(\pmb{t'0},\sqrt{\pmb{t'It}})=N(0,\sqrt{\pmb{t't}})\). Let the random variable \(U\sim N(0, \sqrt{\pmb{t't}})\), then

\[ \begin{aligned} &\psi_{\pmb{Z}}(\pmb{t}) = E\left[e^{t'Z}\right] = \\ &E\left[e^{1U}\right] = \psi_U(1) = \\ &e^{(\pmb{t't})1^2/2} =e^{\pmb{t't}/2} \end{aligned} \] and so

\[ \begin{aligned} &\psi_{\pmb{X}}(\pmb{t}) = E\left[e^{\pmb{t'X}}\right] = \\ &E\left[e^{\pmb{t}'(\pmb{\Sigma}^{1/2}\pmb{Z}+\pmb{\mu})}\right] = \\ &E\left[e^{(\pmb{t}'\pmb{\Sigma}^{1/2})\pmb{Z})}\right]e^{\pmb{t'\mu}} = \\ &e^{(\pmb{t}'\pmb{\Sigma}^{1/2})(\pmb{t}'\pmb{\Sigma}^{1/2})'/2}e^{\pmb{t'\mu}} = \\ &e^{\pmb{t}'\pmb{\Sigma}^{1/2}\pmb{\Sigma}^{1/2}\pmb{t}/2}e^{\pmb{t'\mu}} = \\ &\exp\{\pmb{t}'\pmb{\mu}+\frac12\pmb{t}'\pmb{\Sigma}\pmb{t}\} \end{aligned} \]
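As a numerical check of this formula, the sketch below samples \(\pmb{X}=\pmb{\mu}+A\pmb{Z}\) with \(A\) the Cholesky factor of \(\Sigma\) (so \(AA'=\Sigma\), playing the role of \(\Sigma^{1/2}\) in the proof) and compares the empirical value of \(E[e^{\pmb{t'X}}]\) with \(\exp\{\pmb{t}'\pmb{\mu}+\frac12\pmb{t}'\pmb{\Sigma}\pmb{t}\}\); all numerical values are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(6)
mu = np.array([1.0, -0.5])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])                     # arbitrary positive definite matrix
A = np.linalg.cholesky(Sigma)                      # A @ A.T == Sigma

n = 1_000_000
Z = rng.standard_normal((n, 2))
X = mu + Z @ A.T                                   # rows are draws from N_2(mu, Sigma)

t = np.array([0.3, -0.2])                          # arbitrary point at which to evaluate the mgf
print(np.mean(np.exp(X @ t)))                      # empirical E[exp(t'X)]
print(np.exp(t @ mu + 0.5 * t @ Sigma @ t))        # exp(t'mu + t'Sigma t / 2)
```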

We have the following characterization of a multivariate normal distribution, in some ways a generalization of Bernstein’s theorem:

Theorem (2.3.21)

Let \(\pmb{X}=(X_1 ,..,X_n)^T\). Then \(\pmb{X}\) has a multivariate normal distribution if and only if every linear combination \(t_1 X_1 +..+t_n X_n\) has a normal distribution.

proof

one direction is obvious because the marginals of a multivariate normal rv are normal and the sum of normals is normal. The other direction can be shown using mgf's, similar to the proof of the last theorem.
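This also gives a convenient practical check: for any fixed \(\pmb{t}\) the linear combination \(\pmb{t'X}\) should behave exactly like a \(N(\pmb{t'\mu},\sqrt{\pmb{t'\Sigma t}})\) random variable. A small simulation sketch (all numerical values are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, -0.5],
                  [0.0, -0.5, 1.5]])               # arbitrary positive definite matrix
X = rng.multivariate_normal(mu, Sigma, size=500_000)

t = np.array([1.0, -2.0, 0.5])                     # arbitrary coefficients
lin = X @ t                                        # t1*X1 + t2*X2 + t3*X3
m, s = t @ mu, np.sqrt(t @ Sigma @ t)
print(lin.mean(), m, lin.std(), s)                 # mean and sd of the linear combination
print(np.mean(lin <= m + s), norm.cdf(1))          # compare one cdf value
```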

Theory of Errors

In real life almost any measuring device makes some errors. Some instruments are lousy and make big ones, other instruments are excellent and make small ones.

Example

You want to measure the length of time a certain streetlight stays red. You ask 10 friends to go with you, and everyone makes a guess.

Example

You want to measure the length of time a certain streetlight stays red. You ask 10 friends to go with you. You have a stopwatch that you give to each friend.

Clearly in the second case we expect to get much smaller errors.

Around 1800 Carl Friedrich Gauss was thinking about what one could say in great generality about such measurement errors. He came up with the following rules that (almost) all measurement errors should follow, no matter what the instrument:

  • Small errors are more likely than large errors.

  • An error of \(\epsilon\) is just as likely as an error of \(-\epsilon\).

  • In the presence of several measurements of the same quantity, the most likely value of the quantity being measured is their average.

Now it is quite astonishing that JUST FROM THESE THREE rules he was able to derive the normal distribution!