Sums of Squares, Mean and Variance of Quadratic Forms

Sums of Squares

Example (5.3.1)

Let \(\pmb{x}=(x_1,...,x_n)'\) be a random sample from some population with mean \(\mu\) and standard deviation \(\sigma\). Then the total sum of squares is given by \(\sum\limits_{i=1}^n x_i^2\). Let \(\bar{x}=\frac1n \sum_{i=1}^n x_i\) be the sample mean, then

\[ \begin{aligned} &\sum_{i=1}^n (x_i-\bar{x})^2 = \\ &\sum_{i=1}^n \left( x_i^2-2x_i\bar{x}+\bar{x}^2 \right) = \\ &\sum_{i=1}^n x_i^2-\sum_{i=1}^n 2x_i\bar{x}+\sum_{i=1}^n \bar{x}^2 = \\ &\sum_{i=1}^n x_i^2-2\bar{x}\sum_{i=1}^n x_i+n\bar{x}^2 = \\ &\sum_{i=1}^n x_i^2-2\bar{x}n\bar{x}+n\bar{x}^2 = \\ &\sum_{i=1}^n x_i^2-n\bar{x}^2 \end{aligned} \]

and so we find that the total sum of squares can be partitioned into a sum of squares about the mean and the sum of squares due to the mean:

\[\sum_{i=1}^n x_i^2=\sum_{i=1}^n (x_i-\bar{x})^2+n\bar{x}^2\]

This can also be written as a quadratic form:

\[\sum_{i=1}^n x_i^2 = \pmb{x}'\pmb{x} = \pmb{x}'\pmb{I}\pmb{x}\]

Recall that \(\pmb{j}=\begin{pmatrix}1&...&1\end{pmatrix}'\) and

\[ \pmb{J} = \begin{pmatrix} 1 & ... & 1 \\ \vdots & & \vdots\\ 1 & ... & 1\\ \end{pmatrix} \]

then

\[\bar{x} = \frac1n \pmb{j}'\pmb{x}\] and

\[ \begin{aligned} &n\bar{x}^2 = n(\frac1n \pmb{j}'\pmb{x})^2 = \\ &\frac1n \pmb{x}'\pmb{j}\pmb{j}'\pmb{x} = \\ &\frac1n \pmb{x}'\pmb{J}\pmb{x} = \\ & \pmb{x}'(\frac1n\pmb{J})\pmb{x} \\ \end{aligned} \]

and so

\[\sum_{i=1}^n x_i^2 = \pmb{x}'(\pmb{I}-\frac1n\pmb{J})\pmb{x}+\pmb{x}'(\frac1n\pmb{J})\pmb{x}\]
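This partition is easy to check numerically in R (with an arbitrary test vector):

x=c(3, 1, 4, 1, 5, 9)  # an arbitrary test vector
n=length(x)
J=matrix(1, n, n)
c(sum(x^2), sum((x-mean(x))^2)+n*mean(x)^2)  # both sides of the partition
c(t(x)%*%(diag(n)-J/n)%*%x, t(x)%*%(J/n)%*%x)  # the two quadratic forms; they add up to sum(x^2)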

Theorem (5.3.2)

  1. \(\pmb{I}= \left(\pmb{I}-\frac1n\pmb{J} \right)+\frac1n\pmb{J}\)
  2. \(\pmb{I}, \pmb{I}-\frac1n\pmb{J}, \frac1n\pmb{J}\) are idempotent
  3. \((\pmb{I}-\frac1n\pmb{J})(\frac1n\pmb{J})=\pmb{O}\)

proof

  1. obvious
  2. For \(\pmb{I}-\frac1n\pmb{J}\):

\[ \begin{aligned} &(\pmb{I}-\frac1n\pmb{J})^2=\\ &(\pmb{I}-\frac1n\pmb{J})(\pmb{I}-\frac1n\pmb{J}) = \\ &\pmb{I}-\frac1n\pmb{J}-\frac1n\pmb{J}+(\frac1n\pmb{J})^2 = \\ &\pmb{I}-\frac2n\pmb{J}+\frac1{n^2}\pmb{J}\pmb{J} = \\ &\pmb{I}-\frac2n\pmb{J}+\frac1{n^2}(n\pmb{J}) = \\ &\pmb{I}-\frac2n\pmb{J}+\frac1n\pmb{J} = \\ &\pmb{I}-\frac1n\pmb{J} \end{aligned} \]

this calculation also shows that \((\frac1n\pmb{J})^2 = \frac1n\pmb{J}\), and \(\pmb{I}\) is clearly idempotent. For 3. we have

\[(\pmb{I}-\frac1n\pmb{J})(\frac1n\pmb{J}) = \frac1n\pmb{J}-\frac1{n^2}\pmb{J}\pmb{J} = \frac1n\pmb{J}-\frac1n\pmb{J} =\pmb{O}\]
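These identities are also easy to verify numerically, say for n=4 (an arbitrary choice):

n=4
J=matrix(1, n, n)
A=diag(n)-J/n
all.equal(A%*%A, A)  # I-J/n is idempotent
all.equal((J/n)%*%(J/n), J/n)  # J/n is idempotent
all.equal(A%*%(J/n), matrix(0, n, n))  # the product is the zero matrix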

Mean and Variance of Quadratic Forms

Theorem (5.3.3)

If \(\pmb{X}\) is a random vector with mean \(\pmb{\mu}\) and covariance matrix \(\pmb{\Sigma}\) and if \(\pmb{A}\) is a symmetric matrix of constants, then

\[E[\pmb{X}'\pmb{A}\pmb{X}] = tr(\pmb{A}\pmb{\Sigma})+\pmb{\mu}'\pmb{A}\pmb{\mu}\]

proof

Note that \(\pmb{X}'\pmb{A}\pmb{X}\) is a scalar, so we always have \(\pmb{X}'\pmb{A}\pmb{X} = tr(\pmb{X}'\pmb{A}\pmb{X})\)

\[ \begin{aligned} &\pmb{\Sigma}=E[\pmb{X}\pmb{X}'] - \pmb{\mu}\pmb{\mu}'\\ &E[\pmb{X}\pmb{X}'] = \pmb{\Sigma}+ \pmb{\mu}\pmb{\mu}' \\ &E[\pmb{X}'\pmb{A}\pmb{X}] = \\ &E[tr(\pmb{X}'\pmb{A}\pmb{X})] = \text{ (by 4.2.11)} \\ &E[tr(\pmb{A}\pmb{X}\pmb{X}')] = \\ &tr(E[\pmb{A}\pmb{X}\pmb{X}']) = \\ &tr(\pmb{A}E[\pmb{X}\pmb{X}']) = \\ &tr(\pmb{A}[\pmb{\Sigma}+ \pmb{\mu}\pmb{\mu}']) = \\ &tr(\pmb{A}\pmb{\Sigma}+ \pmb{A}\pmb{\mu}\pmb{\mu}') = \\ &tr(\pmb{A}\pmb{\Sigma})+ tr(\pmb{A}\pmb{\mu}\pmb{\mu}') = \\ &tr(\pmb{A}\pmb{\Sigma})+ tr(\pmb{\mu}'\pmb{A}\pmb{\mu}) = \\ &tr(\pmb{A}\pmb{\Sigma})+\pmb{\mu}'\pmb{A}\pmb{\mu} \end{aligned} \]

Example (5.3.4)

Let \(X\sim N(\mu, \sigma^2); \pmb{A}=(a), a>0\), then \(\pmb{\Sigma}=(\sigma^2)\)

\[ \begin{aligned} &E[\pmb{X}'\pmb{A}\pmb{X}] = E[aX^2]=aE[X^2]=\\ &a(var(X)+E[X]^2)=a(\sigma^2+\mu^2)=\\ &a\sigma^2+a\mu^2 = \\ &tr(\pmb{A}\pmb{\Sigma})+\pmb{\mu}'\pmb{A}\pmb{\mu} \\ \end{aligned} \]

Example (5.3.5)

Say \(\pmb{X} = \begin{pmatrix} X \\ Y \end{pmatrix}\sim N( \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 3&1 \\ 1&2 \end{pmatrix})\) and we want to find \(E[X^2+2XY+2Y^2]\).

Direct solution:

\[ \begin{aligned} &E[X^2+2XY+2Y^2] = \\ &E[X^2]+2E[XY]+2E[Y^2] = \\ &var(X)+E[X]^2+2(cov(X,Y)+E[X]E[Y]) +2(var(Y)+E[Y]^2) = \\ &3+1^2+2(1+1\cdot 2)+2(2+2^2)=4+6+12=22 \end{aligned} \]

or using the formula above: say \(\pmb{A}=\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\), then

\[ \begin{aligned} &\pmb{X}'\pmb{A}\pmb{X} = \begin{pmatrix} X & Y \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix} =\\ & \begin{pmatrix} X & Y \end{pmatrix} \begin{pmatrix} X+Y \\ X+2Y \end{pmatrix} =\\ &X(X+Y)+Y(X+2Y) = X^2+2XY+2Y^2 \\ &\pmb{A}\pmb{\Sigma}= \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 3&1 \\ 1&2 \end{pmatrix}=\begin{pmatrix} 4&3 \\ 5&5 \end{pmatrix}\\ &tr(\pmb{A}\pmb{\Sigma})=4+5=9\\ &\pmb{\mu}'\pmb{A}\pmb{\mu}= \begin{pmatrix} 1 & 2 \end{pmatrix} \begin{pmatrix} 1&1 \\ 1&2 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix}= \begin{pmatrix} 1 & 2 \end{pmatrix} \begin{pmatrix} 3\\5 \end{pmatrix} = 13\\ &E[\pmb{X}'\pmb{A}\pmb{X}] = tr(\pmb{A}\pmb{\Sigma})+\pmb{\mu}'\pmb{A}\pmb{\mu} =9+13=22 \end{aligned} \]
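Let's also check this by simulation, in the same way as in example (5.3.10) below:

library(mvtnorm)
x=rmvnorm(1e5, c(1, 2), matrix(c(3, 1, 1, 2), 2, 2))
mean(x[,1]^2+2*x[, 1]*x[, 2]+2*x[, 2]^2)  # should be close to 22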

Example (5.3.5a)

Say \(\pmb{X} = \begin{pmatrix}X_1 &...&X_n \end{pmatrix}'\) has a multivariate normal distribution with mean vector \(\pmb{\mu}\) and covariance matrix \(\pmb{\Sigma}\). We want to find \(E\left[\sum_{i=1}^n X_i^2\right]\). Of course we have

\[ \begin{aligned} &E\left[\sum_{i=1}^n X_i^2\right] = \\ &\sum_{i=1}^nE\left[ X_i^2\right] = \\ &\sum_{i=1}^n \left(var(X_i)+E[X_i]^2\right) = \\ &\sum_{i=1}^n var(X_i)+\sum_{i=1}^n E[X_i]^2 = \\ &\sum_{i=1}^n \sigma_{ii}+\sum_{i=1}^n \mu_i^2 = \\ &tr(\pmb{\Sigma})+\pmb{\mu'\mu} =\\ &tr(\pmb{I\Sigma})+\pmb{\mu'I\mu} \end{aligned} \]

but \[\sum_{i=1}^n X_i^2=\pmb{X'X}=\pmb{X'IX}\] is a quadratic form, so this is exactly what theorem (5.3.3) yields.
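Again a quick simulation check, reusing \(\pmb{\mu}\) and \(\pmb{\Sigma}\) from example (5.3.5), where \(tr(\pmb{\Sigma})+\pmb{\mu'\mu}=5+5=10\):

library(mvtnorm)
mu=c(1, 2)
S=matrix(c(3, 1, 1, 2), 2, 2)
x=rmvnorm(1e5, mu, S)
c(mean(rowSums(x^2)), sum(diag(S))+sum(mu^2))  # both should be close to 10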

Definition (5.3.6)

Let \(\pmb{x}=(x_1,...,x_n)'\) be a sample from a random vector \(\pmb{X}\). The sample variance is defined by

\[s^2=\frac1{n-1} \sum_{i=1}^n (x_i-\bar{x})^2\]

By (5.3.1) we have

\[(n-1)s^2= \pmb{x}'(\pmb{I}-\frac1n\pmb{J})\pmb{x}\]

Say \(E[X_1]=\mu\) and \(var(X_1)=\sigma^2\). In a random sample the \(X_i\)'s are independent and identically distributed, so \(E[\pmb{X}] = \mu \pmb{j}\) and \(cov(\pmb{X})=\sigma^2\pmb{I}\). Set

\[\pmb{A}=\pmb{I}-\frac1n \pmb{J}, \pmb{\Sigma}=\sigma^2\pmb{I}\] and \(\pmb{\mu}=\mu\pmb{j}\), therefore

\[ \begin{aligned} &E[\sum_{i=1}^n (X_i-\bar{X})^2] = \\ &E[(n-1)s^2] =\\ &E[\pmb{X}'(\pmb{I}-\frac1n\pmb{J})\pmb{X}] = \text{(by 5.3.3) } \\ &tr\left[(\pmb{I}-\frac1n \pmb{J})\sigma^2\pmb{I} \right] + \mu\pmb{j}'(\pmb{I}-\frac1n \pmb{J})\mu\pmb{j} = \\ &\sigma^2tr\left[\pmb{I}-\frac1n \pmb{J} \right] + \mu^2\left(\pmb{j}'\pmb{j}- \frac1n\pmb{j}'\pmb{j}\pmb{j}'\pmb{j} \right)= \\ &\sigma^2(n-\frac{n}n)+\mu^2(n-\frac1n n^2) = \\ &\sigma^2(n-1) \end{aligned} \]

and so

\[E[s^2]=\frac1{n-1} E[\sum_{i=1}^n (x_i-\bar{x})^2] = \sigma^2\]

and we see that \(s^2\) is an unbiased estimator of \(\sigma^2\).
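Let's check with R, say with samples of size 20 from a \(N(5, 3^2)\) (arbitrary choices), so that \(\sigma^2=9\):

# average of s^2 over many samples; should be close to sigma^2=9
mean(replicate(1e4, var(rnorm(20, 5, 3))))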

Theorem (5.3.7)

Let \(\pmb{X}\sim N_p(\pmb{\mu}, \pmb{\Sigma})\) and let \(\pmb{A}\) be a symmetric matrix of constants, then the moment generating function of \(\pmb{X}'\pmb{A}\pmb{X}\) is given by

\[\psi(t)=\vert \pmb{I}-2t\pmb{A}\pmb{\Sigma} \vert^{-1/2}\exp \left\{ - \pmb{\mu}'\left[ \pmb{I}- (\pmb{I}-2t\pmb{A}\pmb{\Sigma})^{-1} \right]\pmb{\Sigma}^{-1}\pmb{\mu}/2 \right\}\]

proof

\[ \begin{aligned} &\psi(t)= c_1\int .. \int \exp\{t\pmb{x}'\pmb{A}\pmb{x}\}\exp\{-(\pmb{x}-\pmb{\mu})'\pmb{\Sigma}^{-1}(\pmb{x}-\pmb{\mu})/2\} d\pmb{x} =\\ &c_1\int .. \int \exp\{-[(\pmb{x}-\pmb{\mu})'\pmb{\Sigma}^{-1}(\pmb{x}-\pmb{\mu})-2t\pmb{x}'\pmb{A}\pmb{x}]/2\} d\pmb{x} =\\ &c_1\int .. \int \exp\{-[\pmb{x}'(\pmb{I}-2t\pmb{A}\pmb{\Sigma})\pmb{\Sigma}^{-1}\pmb{x} -2\pmb{\mu}'\pmb{\Sigma}^{-1}\pmb{x} + \pmb{\mu}'\pmb{\Sigma}^{-1}\pmb{\mu}]/2 \} d\pmb{x} \end{aligned} \]

where

\[c_1=1/ \left[ (2\pi)^{p/2} \vert \pmb{\Sigma} \vert^{1/2} \right]\]

Now if t is close to 0, \(\pmb{I}-2t\pmb{A}\pmb{\Sigma}\) is nonsingular. Let \(\pmb{\theta}'=\pmb{\mu}'(\pmb{I}-2t\pmb{A}\pmb{\Sigma})^{-1}\) and \(\pmb{V}^{-1}=(\pmb{I}-2t\pmb{A}\pmb{\Sigma})\pmb{\Sigma}^{-1}\), then we have

\[\psi(t)= c_1c_2\int .. \int c_3\exp\{-(\pmb{x}-\pmb{\theta})'\pmb{V}^{-1}(\pmb{x}-\pmb{\theta})/2\} d\pmb{x}\]

where

\[c_2=(2\pi)^{p/2} \vert \pmb{V} \vert^{1/2}\exp\{-(\pmb{\mu}'\pmb{\Sigma}^{-1}\pmb{\mu}-\pmb{\theta}'\pmb{V}^{-1}\pmb{\theta})/2 \}\]

and

\[c_3=1/ \left[ (2\pi)^{p/2} \vert \pmb{V} \vert^{1/2} \right]\]

The integral is one because the integrand is the density of a multivariate normal \(N_p(\pmb{\theta}, \pmb{V})\), and therefore

\[\psi(t)= c_1c_2= \frac{\vert \pmb{V} \vert^{1/2}}{\vert \pmb{\Sigma} \vert^{1/2}} \exp\{-(\pmb{\mu}'\pmb{\Sigma}^{-1}\pmb{\mu}-\pmb{\theta}'\pmb{V}^{-1}\pmb{\theta})/2\}\]

Now \(\vert \pmb{V} \vert = \vert \pmb{\Sigma} \vert / \vert \pmb{I}-2t\pmb{A}\pmb{\Sigma} \vert\), so the leading factor equals \(\vert \pmb{I}-2t\pmb{A}\pmb{\Sigma} \vert^{-1/2}\), and substituting \(\pmb{\theta}\) and \(\pmb{V}\) in the exponent yields the result.

Example (5.3.7a)

Say \(\pmb{X} \sim N_p(\pmb{0},\pmb{I})\) and let \(Y=\sum_{i=1}^p X_i^2\). Then \(Y=\pmb{X'X}=\pmb{X'IX}\), and so the mgf is given by

\[ \begin{aligned} &\psi_Y(t)=\vert \pmb{I}-2t\pmb{A}\pmb{\Sigma} \vert^{-1/2}\exp \left\{ - \pmb{\mu}'\left[ \pmb{I}- (\pmb{I}-2t\pmb{A}\pmb{\Sigma})^{-1} \right]\pmb{\Sigma}^{-1}\pmb{\mu}/2 \right\}=\\ &\vert \pmb{I}-2t\pmb{I} \vert^{-1/2}\exp \left\{ - \pmb{0}'\left[ \pmb{I}- (\pmb{I}-2t\pmb{I})^{-1} \right]\pmb{0}/2 \right\}=\\ &\vert (1-2t)\pmb{I} \vert^{-1/2}=(1-2t)^{-p/2} \end{aligned} \]

which is the mgf of a \(\chi^2\) distribution with p degrees of freedom.
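A quick check with R, say for p=5 (an arbitrary choice): \(\pmb{X'X}\) should then have the mean and variance of a \(\chi^2(5)\):

p=5
x=matrix(rnorm(1e5*p), ncol=p)  # 1e5 observations from N_p(0, I)
y=rowSums(x^2)  # X'X for each observation
c(mean(y), var(y))  # should be close to p=5 and 2p=10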

Theorem (5.3.8)

Say \(\pmb{X}\sim N_p( \pmb{\mu}, \pmb{\Sigma})\), then

\[var(\pmb{X}'\pmb{A}\pmb{X}) = 2tr[(\pmb{A}\pmb{\Sigma})^2]+4\pmb{\mu}'\pmb{A}\pmb{\Sigma}\pmb{A}\pmb{\mu}\]

proof

From probability theory we know that for any rv Y with mgf \(\psi\) we have

\[E[Y^k] = \frac{d^k \psi(t)}{dt^k}\vert_{t=0}\]

Therefore we have

\[ \begin{aligned} &\frac{d \log \psi(t)}{dt} = \frac{\psi'(t)}{\psi(t)}\\ &\frac{d^2 \log \psi(t)}{dt^2} = \frac{\psi''(t)\psi(t)-(\psi'(t))^2}{(\psi(t))^2}\\ &\frac{d^2 \log \psi(t)}{dt^2}\vert_{t=0} = \frac{E[Y^2]E[Y^0]-(E[Y])^2}{(E[Y^0])^2} =\\ &E[Y^2]-(E[Y])^2=var(Y) \end{aligned} \]

By (5.3.7) we have

\[m(t) = \log \psi_{\pmb{X}'\pmb{A}\pmb{X}}(t)=-\frac12 \log \vert \pmb{C} \vert -\frac12 \pmb{\mu}'(\pmb{I}-\pmb{C}^{-1})\pmb{\Sigma}^{-1}\pmb{\mu}\]

where \(\pmb{C}=\pmb{I}-2t\pmb{A}\pmb{\Sigma}\).

For the first term we find

\[ \begin{aligned} &\frac{d^2\log \vert \pmb{C}\vert}{dt^2} = \\ &\frac{d}{dt}\frac{d \vert \pmb{C} \vert/dt}{\vert \pmb{C} \vert}=\\ &\frac{(d^2 \vert \pmb{C} \vert/dt^2)\vert \pmb{C} \vert-(d \vert \pmb{C} \vert/dt)^2}{\vert \pmb{C} \vert^2} = \\ &\frac1{\vert \pmb{C} \vert}\left[ \frac{d^2\vert \pmb{C} \vert}{dt^2} \right]-\frac1{\vert \pmb{C} \vert^2}\left[ \frac{d\vert \pmb{C} \vert}{dt} \right]^2 \end{aligned} \]

For the second term, note that \(\frac{\partial \pmb{C}}{\partial t}=-2\pmb{A}\pmb{\Sigma}\). By (4.3.28) we have

\[\frac{\partial \pmb{A}^{-1}}{\partial x} = -\pmb{A}^{-1}\frac{\partial \pmb{A}}{\partial x}\pmb{A}^{-1}\] and so

\[\frac{\partial \pmb{C}^{-1}}{\partial t} = -\pmb{C}^{-1}\frac{\partial \pmb{C}}{\partial t}\pmb{C}^{-1}=2\pmb{C}^{-1}\pmb{A\Sigma}\pmb{C}^{-1}\]

For the second derivative we find

\[ \begin{aligned} &\frac{\partial^2 \pmb{C}^{-1}}{\partial t^2} = \\ &\frac{d}{dt}\left\{2\pmb{C}^{-1}\pmb{A\Sigma}\pmb{C}^{-1}\right\} = \\ &2\frac{d}{dt}\left\{\pmb{C}^{-1}\right\}\pmb{A\Sigma}\pmb{C}^{-1} +2\pmb{C}^{-1}\pmb{A\Sigma}\frac{d}{dt}\left\{\pmb{C}^{-1}\right\} = \\ &4\pmb{C}^{-1}\pmb{A\Sigma}\pmb{C}^{-1}\pmb{A\Sigma}\pmb{C}^{-1} +4\pmb{C}^{-1}\pmb{A\Sigma}\pmb{C}^{-1}\pmb{A\Sigma}\pmb{C}^{-1} = \\ &8\pmb{C}^{-1}\left\{\pmb{A\Sigma}\pmb{C}^{-1}\pmb{A\Sigma}\right\}\pmb{C}^{-1} \\ \end{aligned} \]

so now we have

\[ \begin{aligned} &m''(t) = \frac{d^2}{dt^2}\left\{ -\frac12 \ln \vert \pmb{C} \vert -\frac12 \pmb{\mu}'(\pmb{I}-\pmb{C}^{-1})\pmb{\Sigma}^{-1}\pmb{\mu}\right\} = \\ &-\frac12 \frac1{\vert \pmb{C} \vert}\left[ \frac{d^2\vert \pmb{C} \vert}{dt^2} \right]+\frac12 \frac1{\vert \pmb{C} \vert^2}\left[ \frac{d\vert \pmb{C} \vert}{dt} \right]^2 -\frac12\pmb{\mu}'\left(-8\pmb{C}^{-1}\left\{\pmb{A\Sigma}\pmb{C}^{-1}\pmb{A\Sigma}\right\}\pmb{C}^{-1}\right)\pmb{\Sigma}^{-1}\pmb{\mu} = \\ &-\frac12 \frac1{\vert \pmb{C} \vert}\left[ \frac{d^2\vert \pmb{C} \vert}{dt^2} \right]+\frac12 \frac1{\vert \pmb{C} \vert^2}\left[ \frac{d\vert \pmb{C} \vert}{dt} \right]^2 +4\pmb{\mu}'\pmb{C}^{-1}\left\{\pmb{A\Sigma}\pmb{C}^{-1}\pmb{A\Sigma}\right\}\pmb{C}^{-1}\pmb{\Sigma}^{-1}\pmb{\mu} \\ \end{aligned} \]

If t=0 we have \(\pmb{C}=\pmb{I}\) and the second term becomes \(4\pmb{\mu}'\pmb{A\Sigma A}\pmb{\mu}\), as required.

For the first term recall that the determinant of a matrix is equal to the product of the eigenvalues, so if the eigenvalues of \(\pmb{A}\pmb{\Sigma}\) are \(\lambda_1,..,\lambda_p\) we have

\[ \begin{aligned} &\vert \pmb{C} \vert = \prod_{i=1}^p (1-2t\lambda_i) = 1-2t\sum_{i=1}^p \lambda_i +4t^2\sum_{i<j} \lambda_i \lambda_j -\cdots+ (-1)^p 2^p t^p \prod_{i=1}^p \lambda_i;\,\, \vert \pmb{C} \vert_{t=0} =1\\ &\frac{d \vert \pmb{C} \vert}{dt} = -2\sum_{i=1}^p \lambda_i +8t\sum_{i<j} \lambda_i \lambda_j +\text{ higher order terms in t}\\ &\frac{d \vert \pmb{C} \vert}{dt}\vert_{t=0} = -2\sum_{i=1}^p \lambda_i = -2tr(\pmb{A}\pmb{\Sigma})\\ &\frac{d^2 \vert \pmb{C} \vert}{dt^2} = 8\sum_{i<j} \lambda_i \lambda_j +\text{ higher order terms in t}\\ &\frac{d^2 \vert \pmb{C} \vert}{dt^2}\vert_{t=0} =8\sum_{i<j} \lambda_i \lambda_j \end{aligned} \]

so the first term at t=0 is

\[ \begin{aligned} &-\frac12 \frac1{\vert \pmb{C} \vert}\left[ \frac{d^2\vert \pmb{C} \vert}{dt^2}\vert_{t=0} \right]+\frac12 \frac1{\vert \pmb{C} \vert^2}\left[ \frac{d\vert \pmb{C} \vert}{dt}\vert_{t=0} \right]^2 = \\ &-\frac12 \cdot 8\sum_{i<j} \lambda_i \lambda_j+\frac12 [-2tr(\pmb{A}\pmb{\Sigma})]^2 =\\ &2[tr(\pmb{A}\pmb{\Sigma})]^2-4\sum_{i<j} \lambda_i \lambda_j \end{aligned} \]

Now \([tr(\pmb{A}\pmb{\Sigma})]^2 = \left(\sum_{i=1}^p \lambda_i\right)^2 = \sum_{i=1}^p \lambda_i^2+2\sum_{i<j} \lambda_i \lambda_j\) and \(tr([\pmb{A}\pmb{\Sigma}]^2)=\sum_{i=1}^p \lambda_i^2\), so

\[2[tr(\pmb{A}\pmb{\Sigma})]^2-4\sum_{i<j} \lambda_i \lambda_j = 2tr([\pmb{A}\pmb{\Sigma}]^2)\]

as required.

Example (5.3.9)

Let \(X\sim N(\mu, \sigma^2); \pmb{A}=(a), a>0\), then \(\pmb{\Sigma}=(\sigma^2)\). Let \(Z\sim N(0,1)\) and recall that \(Z^2\sim \chi^2(1)\), and therefore \(E[Z]=0\), \(E[Z^2]=1\), \(E[Z^3]=0\) and \(E[Z^4]=3\) (\(=2+1=\) variance + mean\(^2\) of a \(\chi^2(1)\)). So

\[ \begin{aligned} &var(x'Ax) = var(aX^2)=a^2var(X^2)= a^2(E[X^4]-E[X^2]^2)\\ &E[X^2] = var(X)+E[X]^2 = \sigma^2+\mu^2 \\ &E[X^4] = E[(\sigma Z+\mu)^4]=\\ &\sigma^4 E[Z^4]+4\sigma^3 E[Z^3]\mu+6\sigma^2 E[Z^2]\mu^2+4\sigma E[Z]\mu^3+\mu^4 =\\ &\sigma^4\times 3+4\sigma^3\times 0\mu+6\sigma^2\times 1\mu^2+4\sigma\times0 \mu^3+\mu^4 =\\ &3\sigma^4+6\sigma^2 \mu^2+\mu^4\\ &var(x'Ax) = a^2\left(3\sigma^4+6\sigma^2 \mu^2+\mu^4-(\sigma^2+\mu^2)^2\right) = \\ &a^2\left(3\sigma^4+6\sigma^2 \mu^2+\mu^4-\sigma^4-2\sigma^2\mu^2-\mu^4\right) =\\ &a^2\left(2\sigma^4+4\sigma^2 \mu^2\right) = 2a^2\sigma^2\left(\sigma^2+2 \mu^2\right) \end{aligned} \]

but also

\[ \begin{aligned} &2tr((\pmb{A}\pmb{\Sigma})^2)+4\pmb{\mu}'\pmb{A}\pmb{\Sigma}\pmb{A}\pmb{\mu} =\\ &2a^2\sigma^4+4\mu a\sigma^2a\mu = 2a^2\sigma^2(\sigma^2+2\mu^2) \end{aligned} \]
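A quick simulation check, with arbitrary values \(a=2, \mu=1, \sigma=2\), so that \(2a^2\sigma^2(\sigma^2+2\mu^2)=2\cdot 4\cdot 4\cdot 6=192\):

a=2; mu=1; sigma=2
x=rnorm(1e6, mu, sigma)
c(var(a*x^2), 2*a^2*sigma^2*(sigma^2+2*mu^2))  # simulation vs formula (=192)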

Example (5.3.10)

Say \(\pmb{X} = \begin{pmatrix} X \\ Y \end{pmatrix}\sim N( \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 3&1 \\ 1&2 \end{pmatrix})\) and we want to find \(var(X^2+2XY+2Y^2)\).

Let’s try this directly:

\[ \begin{aligned} &var(X^2+2XY+2Y^2) = \\ &var(X^2)+var(2XY)+var(2Y^2)+\\ &2\left[ cov(X^2, 2XY)+cov(X^2, 2Y^2)+cov(2XY, 2Y^2) \right] = \\ &var(X^2)+4var(XY)+4var(Y^2)+\\ &2\left[ 2cov(X^2, XY)+2cov(X^2, Y^2)+4cov(XY, Y^2) \right] \end{aligned} \]

and many of these terms are not easy to calculate. For example, to find \(var(XY)\) we would need the distribution of the product of two correlated normal random variables. But we can use the formula above:

recall that \(\pmb{A}=\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\) and \(\pmb{A}\pmb{\Sigma}=\begin{pmatrix} 4&3 \\ 5&5 \end{pmatrix}\), so

\[ \begin{aligned} &(\pmb{A}\pmb{\Sigma})^2= \begin{pmatrix} 4&3 \\ 5&5 \end{pmatrix} \begin{pmatrix} 4&3 \\ 5&5 \end{pmatrix}= \begin{pmatrix} 31&27 \\ 45&40 \end{pmatrix}\\ &tr((\pmb{A}\pmb{\Sigma})^2)=31+40=71 \end{aligned} \]

\[ \begin{aligned} &\pmb{\mu}'\pmb{A}\pmb{\Sigma}\pmb{A}\pmb{\mu} = \\ & \begin{pmatrix} 1& 2 \end{pmatrix} \begin{pmatrix} 4&3 \\ 5&5 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 1\\ 2 \end{pmatrix}=\\ & \begin{pmatrix} 1& 2 \end{pmatrix} \begin{pmatrix} 7&10 \\ 10&15 \end{pmatrix} \begin{pmatrix} 1\\ 2 \end{pmatrix}=107 \\ &var(\pmb{X}'\pmb{A}\pmb{X}) = 2tr[(\pmb{A}\pmb{\Sigma})^2]+4\pmb{\mu}'\pmb{A}\pmb{\Sigma}\pmb{A}\pmb{\mu} = \\ &2\times 71+4\times 107 = 570 \end{aligned} \]

Let’s check this with R:

library(mvtnorm)
x=rmvnorm(2e5, c(1, 2), matrix(c(3, 1, 1, 2), 2, 2))
var(x[,1]^2+2*x[, 1]*x[, 2]+2*x[, 2]^2)
## [1] 570.5925

Theorem (5.3.11)

Say \(\pmb{X}\sim N_p( \pmb{\mu}, \pmb{\Sigma})\), then

\[cov(\pmb{X},\pmb{X}'\pmb{A}\pmb{X}) = 2\pmb{\Sigma}\pmb{A}\pmb{\mu}\]

proof

by definition

\[ \begin{aligned} &cov(\pmb{X},\pmb{X}'\pmb{A}\pmb{X})=\\ &E[(\pmb{X}-E[\pmb{X}])(\pmb{X}'\pmb{A}\pmb{X}-E[\pmb{X}'\pmb{A}\pmb{X}])] = \\ &E[(\pmb{X}-\pmb{\mu})(\pmb{X}'\pmb{A}\pmb{X}-tr(\pmb{A}\pmb{\Sigma})-\pmb{\mu}'\pmb{A}\pmb{\mu})] = \\ &E[(\pmb{X}-\pmb{\mu})\left((\pmb{X}-\pmb{\mu})'\pmb{A}(\pmb{X}-\pmb{\mu})-tr(\pmb{A}\pmb{\Sigma})+2(\pmb{X}-\pmb{\mu})'\pmb{A}\pmb{\mu}\right)] = \\ &E[(\pmb{X}-\pmb{\mu})(\pmb{X}-\pmb{\mu})'\pmb{A}(\pmb{X}-\pmb{\mu})]-E[(\pmb{X}-\pmb{\mu})]tr(\pmb{A}\pmb{\Sigma})+2E[(\pmb{X}-\pmb{\mu})(\pmb{X}-\pmb{\mu})'\pmb{A}\pmb{\mu}] = \\ &E[(\pmb{X}-\pmb{\mu})(\pmb{X}-\pmb{\mu})'\pmb{A}(\pmb{X}-\pmb{\mu})]-E[\pmb{X}-\pmb{\mu}]tr(\pmb{A}\pmb{\Sigma})+2E[(\pmb{X}-\pmb{\mu})(\pmb{X}-\pmb{\mu})']\pmb{A}\pmb{\mu} = \\ &\pmb{0}-\pmb{0}+2\pmb{\Sigma}\pmb{A}\pmb{\mu} \end{aligned} \]

The first term is \(\pmb{0}\) because it is the third central moment of a multivariate normal.
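Let's check this with R, reusing \(\pmb{\mu}\), \(\pmb{\Sigma}\) and \(\pmb{A}\) from example (5.3.10); there \(2\pmb{\Sigma}\pmb{A}\pmb{\mu}=(28, 26)'\):

library(mvtnorm)
A=matrix(c(1, 1, 1, 2), 2, 2)
S=matrix(c(3, 1, 1, 2), 2, 2)
mu=c(1, 2)
x=rmvnorm(2e5, mu, S)
q=x[,1]^2+2*x[,1]*x[,2]+2*x[,2]^2  # x'Ax for each observation
c(cov(x[,1], q), cov(x[,2], q))  # should be close to 2*S%*%A%*%mu = (28, 26)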

Corollary (5.3.12)

Say \(\pmb{X}\sim N_p( \pmb{\mu}, \pmb{\Sigma})\), and let \(\pmb{B}\) be an \(n\times p\) matrix of constants, then

\[cov(\pmb{BX},\pmb{X}'\pmb{A}\pmb{X}) = 2\pmb{B\Sigma}\pmb{A}\pmb{\mu}\]

Theorem (5.3.13)

Let \(\pmb{V}= \begin{pmatrix} \pmb{X} \\ \pmb{Y} \end{pmatrix}\) be partitioned in the usual way, with \(\pmb{\Sigma}_{xy}=cov(\pmb{X},\pmb{Y})\) a \(p\times q\) matrix. Let \(\pmb{A}\) be a \(p\times q\) matrix of constants, then

\[E[\pmb{X'AY}]=tr(\pmb{A\Sigma}_{yx})+ \pmb{\mu}'_x \pmb{A\mu}_y\]

where \(\pmb{\Sigma}_{yx}=\pmb{\Sigma}_{xy}'\).

proof

similar to the proof of (5.3.3)

Example (5.3.13a)

Say \(E[X]=\mu_x\), \(var(X)=\sigma_x^2\), \(E[Y]=\mu_y\), \(var(Y)=\sigma_y^2\) and \(cor(X,Y)=\rho\). If \(\pmb{A}=(a)\) then \(\pmb{X'AY}=aXY\), \(\pmb{\Sigma}_{yx}=(\sigma_x\sigma_y\rho)\) and so

\[ \begin{aligned} &E[aXY] =aE[XY]=a\left(cov(X,Y)+E[X]E[Y]\right)=a\sigma_x\sigma_y\rho+a\mu_x\mu_y \\ &tr(\pmb{A\Sigma}_{yx})+ \pmb{\mu}'_x \pmb{A\mu}_y = a\sigma_x\sigma_y\rho+a\mu_x\mu_y \end{aligned} \]
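Checking by simulation, with a=2 and the distribution from example (5.3.5) (arbitrary choices), so that \(a\,cov(X,Y)+a\mu_x\mu_y = 2\cdot 1+2\cdot 2=6\):

library(mvtnorm)
a=2
S=matrix(c(3, 1, 1, 2), 2, 2)  # cov(X,Y)=1
x=rmvnorm(1e5, c(1, 2), S)
c(mean(a*x[,1]*x[,2]), a*S[1,2]+a*1*2)  # simulation vs formula (=6)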

Definition (5.3.14)

We have a random sample

\[\begin{pmatrix} X \\ Y \end{pmatrix}= \left(\begin{pmatrix} x_1 \\ y_1 \end{pmatrix},..,\begin{pmatrix} x_n \\ y_n \end{pmatrix} \right)\]

from some bivariate distribution with means \(\mu_x\) and \(\mu_y\) and variances \(\sigma_x^2\) and \(\sigma_y^2\). A standard estimator of the population covariance \(\sigma_{xy}\) is the sample covariance defined by

\[s_{xy}=\frac1{n-1}\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})\]

We find

\[ \begin{aligned} &s_{xy}=\frac1{n-1}\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})=\\ &\frac1{n-1}\left[\sum_{i=1}^n x_iy_i-n\bar{x}\bar{y}\right]= \\ &\frac1{n-1}\pmb{x}'[\pmb{I}-(1/n)\pmb{J}]\pmb{y} \end{aligned} \]
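The matrix form is easy to verify in R with arbitrary vectors:

x=c(3, 1, 4, 1, 5)
y=c(2, 7, 1, 8, 2)
n=length(x)
A=diag(n)-matrix(1, n, n)/n
c(cov(x, y), t(x)%*%A%*%y/(n-1))  # both compute s_xy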

Since \((x_i, y_i)\) is independent of \((x_j, y_j)\) for \(i\ne j\), we can define a random vector \(\pmb{V} = \begin{pmatrix} \pmb{X} \\ \pmb{Y} \end{pmatrix}\) with \(E[\pmb{V}] = \begin{pmatrix} \mu_x\pmb{j} \\ \mu_y\pmb{j} \end{pmatrix}\) and covariance matrix

\[\pmb{\Sigma}= \begin{pmatrix} \sigma_x^2\pmb{I} &\sigma_{xy}\pmb{I} \\ \sigma_{xy}\pmb{I} & \sigma_y^2\pmb{I} \end{pmatrix}\]

Let \(\pmb{A}=\pmb{I}-(1/n)\pmb{J}\) and so

\[ \begin{aligned} &E[\pmb{x}'[\pmb{I}-(1/n)\pmb{J}]\pmb{y}] = \text{ (by 5.3.13)} \\ &tr\left[(\pmb{I}-(1/n)\pmb{J})\sigma_{xy}\pmb{I}\right]+\mu_x\pmb{j}'(\pmb{I}-(1/n)\pmb{J})\mu_y\pmb{j} = \\ &\sigma_{xy}tr[\pmb{I}-(1/n)\pmb{J}]+\mu_x\mu_y(\pmb{j}'\pmb{j}-(1/n)\pmb{j}'\pmb{j}\pmb{j}'\pmb{j}) = \\ &\sigma_{xy}(n-1)+0=(n-1)\sigma_{xy} \end{aligned} \]

and so \(s_{xy}\) is an unbiased estimator of \(\sigma_{xy}\).

Let’s check with R:

library(mvtnorm)
S=matrix(c(2, 0.5, 0.5, 4), 2, 2)
x=rmvnorm(1e5, c(1, 2), S)
cov(x)
##          [,1]     [,2]
## [1,] 1.996217 0.488235
## [2,] 0.488235 4.025144
sum( (x[,1]-mean(x[,1]))*(x[,2]-mean(x[,2])))/(nrow(x)-1)
## [1] 0.488235