Let \pmb{x}=(x_1,..,x_n)' be a random sample from some population with mean \mu and standard deviation \sigma. Then the total sum of squares is given by \sum_{i=1}^n x_i^2. Let \bar{x}=\frac1n \sum_{i=1}^n x_i be the sample mean, then

\begin{aligned}
\sum_{i=1}^n (x_i-\bar{x})^2 &= \sum_{i=1}^n (x_i^2-2x_i\bar{x}+\bar{x}^2) = \sum_{i=1}^n x_i^2-\sum_{i=1}^n 2x_i\bar{x}+\sum_{i=1}^n \bar{x}^2 \\
&= \sum_{i=1}^n x_i^2-2\bar{x}\sum_{i=1}^n x_i+n\bar{x}^2 = \sum_{i=1}^n x_i^2-2\bar{x}\,n\bar{x}+n\bar{x}^2 = \sum_{i=1}^n x_i^2-n\bar{x}^2
\end{aligned}

and so we find that the total sum of squares can be partitioned into a sum of squares about the mean and the sum of squares due to the mean:

\sum_{i=1}^n x_i^2=\sum_{i=1}^n (x_i-\bar{x})^2+n\bar{x}^2

This can also be written as a quadratic form:

\sum_{i=1}^n x_i^2=\pmb{x}'\pmb{x}=\pmb{x}'\pmb{I}\pmb{x}
Recall that \pmb{j}=(1 \ldots 1)' and

\pmb{J}=\begin{pmatrix} 1 & \ldots & 1 \\ \vdots & & \vdots \\ 1 & \ldots & 1 \end{pmatrix}

then

\bar{x}=\frac1n \pmb{j}'\pmb{x} and

n\bar{x}^2=n\left(\frac1n \pmb{j}'\pmb{x}\right)^2=\frac1n \pmb{x}'\pmb{j}\pmb{j}'\pmb{x}=\frac1n \pmb{x}'\pmb{J}\pmb{x}=\pmb{x}'\left(\frac1n \pmb{J}\right)\pmb{x}

and so

\sum_{i=1}^n x_i^2=\pmb{x}'\left(\pmb{I}-\frac1n \pmb{J}\right)\pmb{x}+\pmb{x}'\left(\frac1n \pmb{J}\right)\pmb{x}
proof
By direct calculation,

\begin{aligned}
\left(\pmb{I}-\frac1n \pmb{J}\right)^2 &= \left(\pmb{I}-\frac1n \pmb{J}\right)\left(\pmb{I}-\frac1n \pmb{J}\right) = \pmb{I}-\frac1n \pmb{J}-\frac1n \pmb{J}+\left(\frac1n \pmb{J}\right)^2 \\
&= \pmb{I}-\frac2n \pmb{J}+\frac1{n^2}\pmb{J}\pmb{J} = \pmb{I}-\frac2n \pmb{J}+\frac1{n^2}(n\pmb{J}) = \pmb{I}-\frac2n \pmb{J}+\frac1n \pmb{J} = \pmb{I}-\frac1n \pmb{J}
\end{aligned}

so \pmb{I}-\frac1n \pmb{J} is idempotent. The same calculation, using \pmb{J}\pmb{J}=n\pmb{J}, also shows that \left(\frac1n \pmb{J}\right)^2=\frac1n \pmb{J}. Finally

\left(\pmb{I}-\frac1n \pmb{J}\right)\left(\frac1n \pmb{J}\right)=\frac1n \pmb{J}-\frac1{n^2}\pmb{J}\pmb{J}=\frac1n \pmb{J}-\frac1n \pmb{J}=\pmb{0}
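As a quick numerical sanity check of the decomposition and of these matrix identities (a sketch only; the data vector is arbitrary):

x = c(3, 1, 4, 1, 5)
n = length(x)
J = matrix(1, n, n)      # matrix of ones
A = diag(n) - J/n        # I - (1/n)J
c(sum(x^2), t(x) %*% A %*% x + t(x) %*% (J/n) %*% x)   # both entries should agree
max(abs(A %*% A - A))       # idempotency: should be 0 up to rounding
max(abs(A %*% (J/n)))       # product with (1/n)J: should be 0 up to rounding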
If \pmb{X} is a random vector with mean \pmb{\mu} and covariance matrix \pmb{\Sigma} and if \pmb{A} is a symmetric matrix of constants, then

E[\pmb{X}'\pmb{A}\pmb{X}]=tr(\pmb{A}\pmb{\Sigma})+\pmb{\mu}'\pmb{A}\pmb{\mu}
proof
Note that \pmb{X}'\pmb{A}\pmb{X} is a scalar, so always \pmb{X}'\pmb{A}\pmb{X}=tr(\pmb{X}'\pmb{A}\pmb{X}). Also, from \pmb{\Sigma}=E[\pmb{X}\pmb{X}']-\pmb{\mu}\pmb{\mu}' we get E[\pmb{X}\pmb{X}']=\pmb{\Sigma}+\pmb{\mu}\pmb{\mu}'. Therefore

\begin{aligned}
E[\pmb{X}'\pmb{A}\pmb{X}] &= E[tr(\pmb{X}'\pmb{A}\pmb{X})] \\
&= E[tr(\pmb{A}\pmb{X}\pmb{X}')] \text{ (by 4.2.11)} \\
&= tr(E[\pmb{A}\pmb{X}\pmb{X}']) = tr(\pmb{A}E[\pmb{X}\pmb{X}']) = tr(\pmb{A}[\pmb{\Sigma}+\pmb{\mu}\pmb{\mu}']) \\
&= tr(\pmb{A}\pmb{\Sigma}+\pmb{A}\pmb{\mu}\pmb{\mu}') = tr(\pmb{A}\pmb{\Sigma})+tr(\pmb{A}\pmb{\mu}\pmb{\mu}') \\
&= tr(\pmb{A}\pmb{\Sigma})+tr(\pmb{\mu}'\pmb{A}\pmb{\mu}) = tr(\pmb{A}\pmb{\Sigma})+\pmb{\mu}'\pmb{A}\pmb{\mu}
\end{aligned}
Let X\sim N(\mu,\sigma^2); \pmb{A}=(a), a>0, then \pmb{\Sigma}=(\sigma^2) and

E[\pmb{x}'\pmb{A}\pmb{x}]=E[aX^2]=aE[X^2]=a(var(X)+E[X]^2)=a(\sigma^2+\mu^2)=a\sigma^2+a\mu^2=tr(\pmb{A}\pmb{\Sigma})+\pmb{\mu}'\pmb{A}\pmb{\mu}
Say \pmb{X}=\begin{pmatrix} X \\ Y \end{pmatrix}\sim N\left(\begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}\right) and we want to find E[X^2+2XY+2Y^2].
Direct solution:
\begin{aligned}
E[X^2+2XY+2Y^2] &= E[X^2]+2E[XY]+2E[Y^2] \\
&= var(X)+E[X]^2+2\left(cov(X,Y)+E[X]E[Y]\right)+2\left(var(Y)+E[Y]^2\right) \\
&= 3+1^2+2(1+1\cdot 2)+2(2+2^2) = 4+6+12 = 22
\end{aligned}
or, using the formula above, with \pmb{A}=\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} we find
\begin{aligned}
\pmb{X}'\pmb{A}\pmb{X} &= \begin{pmatrix} X & Y \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} X & Y \end{pmatrix}\begin{pmatrix} X+Y \\ X+2Y \end{pmatrix} = X(X+Y)+Y(X+2Y) = X^2+2XY+2Y^2 \\
\pmb{A}\pmb{\Sigma} &= \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 4 & 3 \\ 5 & 5 \end{pmatrix} \\
tr(\pmb{A}\pmb{\Sigma}) &= 4+5 = 9 \\
\pmb{\mu}'\pmb{A}\pmb{\mu} &= \begin{pmatrix} 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 & 2 \end{pmatrix}\begin{pmatrix} 3 \\ 5 \end{pmatrix} = 13 \\
E[\pmb{X}'\pmb{A}\pmb{X}] &= tr(\pmb{A}\pmb{\Sigma})+\pmb{\mu}'\pmb{A}\pmb{\mu} = 9+13 = 22
\end{aligned}
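We can also verify this expectation by simulation, in the same spirit as the checks further below (a sketch; it assumes the mvtnorm package, and the average should come out close to 22):

library(mvtnorm)
x=rmvnorm(1e5, c(1, 2), matrix(c(3, 1, 1, 2), 2, 2))
mean(x[, 1]^2+2*x[, 1]*x[, 2]+2*x[, 2]^2)   # should be close to 22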
Say \pmb{X}=(X_1, ..., X_n)' has a multivariate normal distribution with mean vector \pmb{\mu} and covariance matrix \pmb{\Sigma}. We want to find E[\sum_{i=1}^n X_i^2]. Of course we have

\begin{aligned}
E\left[\sum_{i=1}^n X_i^2\right] &= \sum_{i=1}^n E[X_i^2] = \sum_{i=1}^n \left(var(X_i)+E[X_i]^2\right) = \sum_{i=1}^n var(X_i)+\sum_{i=1}^n E[X_i]^2 \\
&= \sum_{i=1}^n \sigma_{ii}+\sum_{i=1}^n \mu_i^2 = tr(\pmb{\Sigma})+\pmb{\mu}'\pmb{\mu} = tr(\pmb{I}\pmb{\Sigma})+\pmb{\mu}'\pmb{I}\pmb{\mu}
\end{aligned}

but \sum_{i=1}^n X_i^2=\pmb{X}'\pmb{X}=\pmb{X}'\pmb{I}\pmb{X}, so this is exactly what the theorem above gives with \pmb{A}=\pmb{I}.
Let \pmb{x}=(x_1,..,x_n)' be a sample from a random vector \pmb{X}. The sample variance is defined by

s^2=\frac1{n-1}\sum_{i=1}^n (x_i-\bar{x})^2
By (5.3.1) we have
(n-1)s^2=\pmb{x}'\left(\pmb{I}-\frac1n \pmb{J}\right)\pmb{x}
Say E[X_1]=\mu and var(X_1)=\sigma^2. In a random sample the X_i's are independent and identically distributed, so E[\pmb{X}]=\mu\pmb{j} and cov(\pmb{X})=\sigma^2\pmb{I}. Set

\pmb{A}=\pmb{I}-\frac1n \pmb{J}, \pmb{\Sigma}=\sigma^2\pmb{I} and \pmb{\mu}=\mu\pmb{j}, therefore
\begin{aligned}
E\left[\sum_{i=1}^n (X_i-\bar{X})^2\right] &= E[(n-1)s^2] = E\left[\pmb{X}'\left(\pmb{I}-\frac1n \pmb{J}\right)\pmb{X}\right] \\
&= tr\left[\left(\pmb{I}-\frac1n \pmb{J}\right)\sigma^2\pmb{I}\right]+\mu\pmb{j}'\left(\pmb{I}-\frac1n \pmb{J}\right)\mu\pmb{j} \text{ (by 5.3.2)} \\
&= \sigma^2 tr\left[\pmb{I}-\frac1n \pmb{J}\right]+\mu^2\left(\pmb{j}'\pmb{j}-\frac1n \pmb{j}'\pmb{j}\pmb{j}'\pmb{j}\right) \\
&= \sigma^2\left(n-\frac{n}{n}\right)+\mu^2\left(n-\frac1n n^2\right) = \sigma^2(n-1)
\end{aligned}
and so
E[s^2]=\frac1{n-1}E\left[\sum_{i=1}^n (X_i-\bar{X})^2\right]=\sigma^2

and we see that s^2 is an unbiased estimator of \sigma^2.
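As a small simulation check of the unbiasedness (a sketch; the choices n = 10, \mu = 5 and \sigma^2 = 4 are arbitrary, and the average of the simulated sample variances should be close to 4):

sim=replicate(1e4, var(rnorm(10, 5, 2)))   # var() uses the 1/(n-1) definition
mean(sim)   # should be close to sigma^2 = 4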
Let \pmb{X}\sim N_p(\pmb{\mu},\pmb{\Sigma}), then the moment generating function of \pmb{X}'\pmb{A}\pmb{X} is given by

\psi(t)=\vert \pmb{I}-2t\pmb{A}\pmb{\Sigma}\vert^{-1/2}\exp\left\{-\pmb{\mu}'\left[\pmb{I}-(\pmb{I}-2t\pmb{A}\pmb{\Sigma})^{-1}\right]\pmb{\Sigma}^{-1}\pmb{\mu}/2\right\}
proof
\begin{aligned}
\psi(t) &= E\left[e^{t\pmb{X}'\pmb{A}\pmb{X}}\right] = c_1\int..\int \exp\{t\pmb{x}'\pmb{A}\pmb{x}\}\exp\{-(\pmb{x}-\pmb{\mu})'\pmb{\Sigma}^{-1}(\pmb{x}-\pmb{\mu})/2\}d\pmb{x} \\
&= c_1\int..\int \exp\{-[(\pmb{x}-\pmb{\mu})'\pmb{\Sigma}^{-1}(\pmb{x}-\pmb{\mu})-2t\pmb{x}'\pmb{A}\pmb{x}]/2\}d\pmb{x} \\
&= c_1\int..\int \exp\{-[\pmb{x}'(\pmb{I}-2t\pmb{A}\pmb{\Sigma})\pmb{\Sigma}^{-1}\pmb{x}-2\pmb{\mu}'\pmb{\Sigma}^{-1}\pmb{x}+\pmb{\mu}'\pmb{\Sigma}^{-1}\pmb{\mu}]/2\}d\pmb{x}
\end{aligned}

where

c_1=1/[(2\pi)^{p/2}\vert \pmb{\Sigma}\vert^{1/2}]
Now if t is close to 0, \pmb{I}-2t\pmb{A}\pmb{\Sigma} is nonsingular. Let \pmb{\theta}'=\pmb{\mu}'(\pmb{I}-2t\pmb{A}\pmb{\Sigma})^{-1} and \pmb{V}^{-1}=(\pmb{I}-2t\pmb{A}\pmb{\Sigma})\pmb{\Sigma}^{-1}, then we have

\psi(t)=c_1c_2\int..\int c_3\exp\{-(\pmb{x}-\pmb{\theta})'\pmb{V}^{-1}(\pmb{x}-\pmb{\theta})/2\}d\pmb{x}
where

c_2=(2\pi)^{p/2}\vert \pmb{V}\vert^{1/2}\exp\{-(\pmb{\mu}'\pmb{\Sigma}^{-1}\pmb{\mu}-\pmb{\theta}'\pmb{V}^{-1}\pmb{\theta})/2\}

and

c_3=1/[(2\pi)^{p/2}\vert \pmb{V}\vert^{1/2}]
The integral equals one because the integrand is a multivariate normal density, and therefore

\psi(t)=c_1c_2=\frac{(2\pi)^{p/2}\vert \pmb{V}\vert^{1/2}}{(2\pi)^{p/2}\vert \pmb{\Sigma}\vert^{1/2}}\exp\{-(\pmb{\mu}'\pmb{\Sigma}^{-1}\pmb{\mu}-\pmb{\theta}'\pmb{V}^{-1}\pmb{\theta})/2\}=\left(\frac{\vert \pmb{V}\vert}{\vert \pmb{\Sigma}\vert}\right)^{1/2}\exp\{-(\pmb{\mu}'\pmb{\Sigma}^{-1}\pmb{\mu}-\pmb{\theta}'\pmb{V}^{-1}\pmb{\theta})/2\}

Since \vert \pmb{V}\vert/\vert \pmb{\Sigma}\vert=\vert \pmb{I}-2t\pmb{A}\pmb{\Sigma}\vert^{-1} and \pmb{\theta}'\pmb{V}^{-1}\pmb{\theta}=\pmb{\mu}'(\pmb{I}-2t\pmb{A}\pmb{\Sigma})^{-1}\pmb{\Sigma}^{-1}\pmb{\mu}, replacing these terms yields the result.
Say \pmb{X}\sim N_p(\pmb{0},\pmb{I}) and let Y=\sum_{i=1}^p X_i^2. Then Y=\pmb{X}'\pmb{X}=\pmb{X}'\pmb{I}\pmb{X}, and so the mgf is given by

\begin{aligned}
\psi_Y(t) &= \vert \pmb{I}-2t\pmb{A}\pmb{\Sigma}\vert^{-1/2}\exp\left\{-\pmb{\mu}'\left[\pmb{I}-(\pmb{I}-2t\pmb{A}\pmb{\Sigma})^{-1}\right]\pmb{\Sigma}^{-1}\pmb{\mu}/2\right\} \\
&= \vert \pmb{I}-2t\pmb{I}\vert^{-1/2}\exp\left\{-\pmb{0}'\left[\pmb{I}-(\pmb{I}-2t\pmb{I})^{-1}\right]\pmb{0}/2\right\} \\
&= \vert (1-2t)\pmb{I}\vert^{-1/2}=(1-2t)^{-p/2}
\end{aligned}

which is the mgf of a \chi^2 distribution with p degrees of freedom.
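We can check the mgf numerically; a sketch with the arbitrary choices p = 3 and t = 0.1 (the Monte Carlo estimate of E[e^{tY}] should be close to (1-2t)^{-3/2}):

p=3; t=0.1
y=rowSums(matrix(rnorm(1e6*p), ncol=p)^2)   # Y = sum of p squared standard normals
c(mean(exp(t*y)), (1-2*t)^(-p/2))           # the two numbers should be close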
Say \pmb{X}\sim N_p(\pmb{\mu},\pmb{\Sigma}), then

var(\pmb{X}'\pmb{A}\pmb{X})=2tr\left[(\pmb{A}\pmb{\Sigma})^2\right]+4\pmb{\mu}'\pmb{A}\pmb{\Sigma}\pmb{A}\pmb{\mu}
proof
From probability theory we know that for any rv Y with mgf ψ we have
E[Y^k]=\frac{d^k\psi(t)}{dt^k}\Big|_{t=0}

Therefore we have

\begin{aligned}
&\frac{d\log\psi(t)}{dt}=\frac{\psi'(t)}{\psi(t)} \\
&\frac{d^2\log\psi(t)}{dt^2}=\frac{\psi''(t)\psi(t)-\psi'(t)^2}{\psi(t)^2}
\end{aligned}

and since \psi(0)=1, \psi'(0)=E[Y] and \psi''(0)=E[Y^2], evaluating at t=0 gives E[Y^2]-E[Y]^2=var(Y). So var(\pmb{X}'\pmb{A}\pmb{X}) is the second derivative of \log\psi_{\pmb{X}'\pmb{A}\pmb{X}} at t=0.
By (5.3.7) we have
m(t) = \log \psi_{\pmb{X}'\pmb{A}\pmb{X}}(t)=-\frac12 \log \vert \pmb{C} \vert -\frac12 \pmb{\mu}'(\pmb{I}-\pmb{C}^{-1})\pmb{\Sigma}^{-1}\pmb{\mu}
where \pmb{C}=\pmb{I}-2t\pmb{A}\pmb{\Sigma}.
For the first term we find
\begin{aligned} &\frac{d^2\log \vert \pmb{C}\vert}{dt^2} = \\ &\frac{d}{dt}\frac{d \vert \pmb{C} \vert/dt}{\vert \pmb{C} \vert}=\\ &\frac{(d^2 \vert \pmb{C} \vert/dt^2)\vert \pmb{C} \vert-(d \vert \pmb{C} \vert/dt)^2}{\vert \pmb{C} \vert^2} = \\ &\frac1{\vert \pmb{C} \vert}\left[ \frac{d^2\vert \pmb{C} \vert}{dt^2} \right]-\frac1{\vert \pmb{C} \vert^2}\left[ \frac{d\vert \pmb{C} \vert}{dt} \right]^2 \end{aligned}
For the second term, note that \frac{\partial \pmb{C}}{\partial t}=-2\pmb{A}\pmb{\Sigma}. By (4.3.28) we have
\frac{\partial \pmb{A}^{-1}}{\partial x} = -\pmb{A}^{-1}\frac{\partial \pmb{A}}{\partial x}\pmb{A}^{-1} and so
\frac{\partial \pmb{C}^{-1}}{\partial t} = -\pmb{C}^{-1}\frac{\partial \pmb{C}}{\partial t}\pmb{C}^{-1}=2\pmb{C}^{-1}\pmb{A\Sigma}\pmb{C}^{-1}
For the second derivative we find
\begin{aligned} &\frac{\partial^2 \pmb{C}^{-1}}{\partial t^2} = \\ &\frac{d}{dt}\left\{2\pmb{C}^{-1}\pmb{A\Sigma}\pmb{C}^{-1}\right\} = \\ &2\frac{d}{dt}\left\{\pmb{C}^{-1}\right\}\pmb{A\Sigma}\pmb{C}^{-1} +2\pmb{C}^{-1}\pmb{A\Sigma}\frac{d}{dt}\left\{\pmb{C}^{-1}\right\} = \\ &4\pmb{C}^{-1}\pmb{A\Sigma}\pmb{C}^{-1}\pmb{A\Sigma}\pmb{C}^{-1} +4\pmb{C}^{-1}\pmb{A\Sigma}\pmb{C}^{-1}\pmb{A\Sigma}\pmb{C}^{-1} = \\ &8\pmb{C}^{-1}\left\{\pmb{A\Sigma}\pmb{C}^{-1}\pmb{A\Sigma}\right\}\pmb{C}^{-1} \\ \end{aligned}
so now we have
\begin{aligned} &m''(t) = \frac{d^2}{dt^2}\left\{ -\frac12 \ln \vert \pmb{C} \vert -\frac12 \pmb{\mu}'(\pmb{I}-\pmb{C}^{-1})\pmb{\Sigma}^{-1}\pmb{\mu}\right\} = \\ &-\frac12 \frac1{\vert \pmb{C} \vert}\left[ \frac{d^2\vert \pmb{C} \vert}{dt^2} \right]+\frac12 \frac1{\vert \pmb{C} \vert^2}\left[ \frac{d\vert \pmb{C} \vert}{dt} \right]^2 -\frac12\pmb{\mu}'\left(-8\pmb{C}^{-1}\left\{\pmb{A\Sigma}\pmb{C}^{-1}\pmb{A\Sigma}\right\}\pmb{C}^{-1}\right)\pmb{\Sigma}^{-1}\pmb{\mu} = \\ &-\frac12 \frac1{\vert \pmb{C} \vert}\left[ \frac{d^2\vert \pmb{C} \vert}{dt^2} \right]+\frac12 \frac1{\vert \pmb{C} \vert^2}\left[ \frac{d\vert \pmb{C} \vert}{dt} \right]^2 +4\pmb{\mu}'\pmb{C}^{-1}\left\{\pmb{A\Sigma}\pmb{C}^{-1}\pmb{A\Sigma}\right\}\pmb{C}^{-1}\pmb{\Sigma}^{-1}\pmb{\mu} \\ \end{aligned}
If t=0 we have \pmb{C}=\pmb{I} and the second term becomes 4\pmb{\mu}'\pmb{A\Sigma A}\pmb{\mu}, as required.
For the first term recall that the determinant of a matrix is equal to the product of the eigenvalues, so if the eigenvalues of \pmb{A}\pmb{\Sigma} are \lambda_1,..,\lambda_p we have
\begin{aligned} &\vert \pmb{C} \vert = \prod_{i=1}^p (1-2t\lambda_i) = 1-2t\sum_{i=1}^p \lambda_i +4t^2\sum_{i< j} \lambda_i \lambda_j - \ldots +(-1)^p 2^p t^p \prod_{i=1}^p \lambda_i ;\quad \vert \pmb{C} \vert_{t=0} =1\\ &\frac{d \vert \pmb{C} \vert}{dt} = -2\sum_{i=1}^p \lambda_i +8t\sum_{i< j} \lambda_i \lambda_j +\text{ higher order terms in } t\\ &\frac{d \vert \pmb{C} \vert}{dt}\Big\vert_{t=0} = -2\sum_{i=1}^p \lambda_i = -2tr(\pmb{A}\pmb{\Sigma})\\ &\frac{d^2 \vert \pmb{C} \vert}{dt^2} = 8\sum_{i< j} \lambda_i \lambda_j +\text{ higher order terms in } t\\ &\frac{d^2 \vert \pmb{C} \vert}{dt^2}\Big\vert_{t=0} =8\sum_{i< j} \lambda_i \lambda_j \end{aligned}
so the first term at t=0 is
\begin{aligned} &-\frac12 \frac1{\vert \pmb{C} \vert}\left[ \frac{d^2\vert \pmb{C} \vert}{dt^2}\Big\vert_{t=0} \right]+\frac12 \frac1{\vert \pmb{C} \vert^2}\left[ \frac{d\vert \pmb{C} \vert}{dt}\Big\vert_{t=0} \right]^2 = \\ &-\frac12\, 8\sum_{i< j} \lambda_i \lambda_j+\frac12 [-2tr(\pmb{A}\pmb{\Sigma})]^2 =\\ &2[tr(\pmb{A}\pmb{\Sigma})]^2-4\sum_{i< j} \lambda_i \lambda_j \end{aligned}
and since

[tr(\pmb{A}\pmb{\Sigma})]^2 = \left(\sum_{i=1}^p \lambda_i\right)^2 = \sum_{i=1}^p \lambda_i^2+2\sum_{i< j} \lambda_i \lambda_j = tr([\pmb{A}\pmb{\Sigma}]^2)+2\sum_{i< j} \lambda_i \lambda_j

the first term equals 2tr([\pmb{A}\pmb{\Sigma}]^2), as required.
Let X\sim N(\mu, \sigma^2); \pmb{A}=(a), a>0, then \pmb{\Sigma}=(\sigma^2). Let Z\sim N(0,1) and recall that Z^2\sim \chi^2(1), and therefore E[Z]=0, E[Z^2]=1, E[Z^3]=0 and E[Z^4]=3 (=2+1= variance plus squared mean of a \chi^2(1)). So
\begin{aligned} &var(x'Ax) = var(aX^2)=a^2var(X^2)= a^2(E[X^4]-E[X^2]^2)\\ &E[X^2] = var(X)+E[X]^2 = \sigma^2+\mu^2 \\ &E[X^4] = E[(\sigma Z+\mu)^4]=\\ &\sigma^4 E[Z^4]+4\sigma^3 E[Z^3]\mu+6\sigma^2 E[Z^2]\mu^2+4\sigma E[Z]\mu^3+\mu^4 =\\ &\sigma^4\times 3+4\sigma^3\times 0\mu+6\sigma^2\times 1\mu^2+4\sigma\times0 \mu^3+\mu^4 =\\ &3\sigma^4+6\sigma^2 \mu^2+\mu^4\\ &var(x'Ax) = a^2\left(3\sigma^4+6\sigma^2 \mu^2+\mu^4-(\sigma^2+\mu^2)^2\right) = \\ &a^2\left(3\sigma^4+6\sigma^2 \mu^2+\mu^4-\sigma^4-2\sigma^2\mu^2-\mu^4\right) =\\ &a^2\left(2\sigma^4+4\sigma^2 \mu^2\right) = 2a^2\sigma^2\left(\sigma^2+2 \mu^2\right) \end{aligned}
but also
\begin{aligned} &2tr((\pmb{A}\pmb{\Sigma})^2)+4\pmb{\mu}'\pmb{A}\pmb{\Sigma}\pmb{A}\pmb{\mu} =\\ &2a^2\sigma^4+4\mu a\sigma^2a\mu = 2a^2\sigma^2(\sigma^2+2\mu^2) \end{aligned}
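Again a quick simulation check (a sketch; with the arbitrary choices a = 2, \mu = 1 and \sigma^2 = 3 the formula gives 2\cdot 4\cdot 3\cdot(3+2)=120):

a=2; mu=1; sigma=sqrt(3)
x=rnorm(1e6, mu, sigma)
c(var(a*x^2), 2*a^2*sigma^2*(sigma^2+2*mu^2))   # both should be close to 120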
Say \pmb{X} = \begin{pmatrix} X \\ Y \end{pmatrix}\sim N( \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 3&1 \\ 1&2 \end{pmatrix}) and we want to find var(X^2+2XY+2Y^2).
Let’s try this directly:
\begin{aligned} &var(X^2+2XY+2Y^2) = \\ &var(X^2)+var(2XY)+var(2Y^2)+\\ &2\left[ cov(X^2, 2XY)+cov(X^2,2Y^2)+cov(2XY,2Y^2) \right] = \\ &var(X^2)+4var(XY)+4var(Y^2)+\\ &2\left[ 2cov(X^2,XY)+2cov(X^2,Y^2)+4cov(XY,Y^2) \right] \end{aligned}
and many of these terms are not easy to calculate. For example, to find var(XY) we would need the moments of the product of two correlated normal random variables. But we can use the formula above:
Recall that \pmb{A}=\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} and \pmb{A}\pmb{\Sigma}=\begin{pmatrix} 4&3 \\ 5&5 \end{pmatrix}, so
\begin{aligned} &(\pmb{A}\pmb{\Sigma})^2= \begin{pmatrix} 4&3 \\ 5&5 \end{pmatrix} \begin{pmatrix} 4&3 \\ 5&5 \end{pmatrix}= \begin{pmatrix} 31&27 \\ 45&40 \end{pmatrix}\\ &tr((\pmb{A}\pmb{\Sigma})^2)=31+40=71 \end{aligned}
\begin{aligned} &\pmb{\mu}'\pmb{A}\pmb{\Sigma}\pmb{A}\pmb{\mu} = \\ & \begin{pmatrix} 1& 2 \end{pmatrix} \begin{pmatrix} 4&3 \\ 5&5 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 1\\ 2 \end{pmatrix}=\\ & \begin{pmatrix} 1& 2 \end{pmatrix} \begin{pmatrix} 7&10 \\ 10&15 \end{pmatrix} \begin{pmatrix} 1\\ 2 \end{pmatrix}=107 \\ &var(\pmb{X}'\pmb{A}\pmb{X}) = 2tr[(\pmb{A}\pmb{\Sigma})^2]+4\pmb{\mu}'\pmb{A}\pmb{\Sigma}\pmb{A}\pmb{\mu} = \\ &2\times 71+4\times 107 = 570 \end{aligned}
Let’s check this with R:
library(mvtnorm)
x=rmvnorm(2e5, c(1, 2), matrix(c(3, 1, 1, 2), 2, 2))
var(x[,1]^2+2*x[, 1]*x[, 2]+2*x[, 2]^2)
## [1] 570.5925
Say \pmb{X}\sim N_p( \pmb{\mu}, \pmb{\Sigma}), then
cov(\pmb{X},\pmb{X}'\pmb{A}\pmb{X}) = 2\pmb{\Sigma}\pmb{A}\pmb{\mu}
proof
by definition
\begin{aligned} &cov(\pmb{X},\pmb{X}'\pmb{A}\pmb{X})=\\ &E[(\pmb{X}-E[\pmb{X}])(\pmb{X}'\pmb{A}\pmb{X}-E[\pmb{X}'\pmb{A}\pmb{X}])] = \\ &E[(\pmb{X}-\pmb{\mu})(\pmb{X}'\pmb{A}\pmb{X}-tr(\pmb{A}\pmb{\Sigma})-\pmb{\mu}'\pmb{A}\pmb{\mu})] = \\ &E[(\pmb{X}-\pmb{\mu})\left((\pmb{X}-\pmb{\mu})'\pmb{A}(\pmb{X}-\pmb{\mu})-tr(\pmb{A}\pmb{\Sigma})+2(\pmb{X}-\pmb{\mu})'\pmb{A}\pmb{\mu}\right)] = \\ &E[(\pmb{X}-\pmb{\mu})(\pmb{X}-\pmb{\mu})'\pmb{A}(\pmb{X}-\pmb{\mu})]-E[(\pmb{X}-\pmb{\mu})]tr(\pmb{A}\pmb{\Sigma})+2E[(\pmb{X}-\pmb{\mu})(\pmb{X}-\pmb{\mu})'\pmb{A}\pmb{\mu}] = \\ &E[(\pmb{X}-\pmb{\mu})(\pmb{X}-\pmb{\mu})'\pmb{A}(\pmb{X}-\pmb{\mu})]-E[\pmb{X}-\pmb{\mu}]tr(\pmb{A}\pmb{\Sigma})+2E[(\pmb{X}-\pmb{\mu})(\pmb{X}-\pmb{\mu})']\pmb{A}\pmb{\mu} = \\ &\pmb{0}-\pmb{0}+2\pmb{\Sigma}\pmb{A}\pmb{\mu} \end{aligned}
The first term is \pmb{0} because it involves the third central moments of a multivariate normal, which are all zero.
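A simulation check, reusing \pmb{\mu}, \pmb{\Sigma} and \pmb{A} from the examples above (a sketch; here 2\pmb{\Sigma}\pmb{A}\pmb{\mu}=(28, 26)'):

library(mvtnorm)
mu=c(1, 2); S=matrix(c(3, 1, 1, 2), 2, 2); A=matrix(c(1, 1, 1, 2), 2, 2)
x=rmvnorm(1e6, mu, S)
q=x[, 1]^2+2*x[, 1]*x[, 2]+2*x[, 2]^2        # x'Ax for each simulated vector
c(cov(x[, 1], q), cov(x[, 2], q))            # Monte Carlo estimates
2*S %*% A %*% mu                             # should be close to the estimates: (28, 26)'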
Say \pmb{X}\sim N_p( \pmb{\mu}, \pmb{\Sigma}), and let \pmb{B} be an n\times p matrix of constants, then
cov(\pmb{BX},\pmb{X}'\pmb{A}\pmb{X}) = 2\pmb{B\Sigma}\pmb{A}\pmb{\mu}
Let \pmb{V}= \begin{pmatrix} \pmb{X} \\ \pmb{Y} \end{pmatrix} be partitioned in the usual way, with \pmb{\Sigma}_{xy}=cov(\pmb{X},\pmb{Y}) a p\times q matrix. Let \pmb{A} be a p\times q matrix of constants, then

E[\pmb{x'Ay}]=tr(\pmb{A\Sigma}_{yx})+ \pmb{\mu}'_x \pmb{A\mu}_y
proof

Similar to the proof of the expectation formula for \pmb{X}'\pmb{A}\pmb{X} above.
Say E[X]=\mu_x, var(X)=\sigma_x^2, E[Y]=\mu_y, var(Y)=\sigma_y^2 and cor(X,Y)=\rho. If \pmb{A}=(a) then \pmb{x'Ay}=axy, \pmb{\Sigma}_{yx}=(\sigma_x\sigma_y\rho) and so

\begin{aligned} &E[aXY] =aE[XY]=a\left(cov(X,Y)+E[X]E[Y]\right)=a\sigma_x\sigma_y\rho+a\mu_x\mu_y \\ &tr(\pmb{A\Sigma}_{yx})+ \pmb{\mu}'_x \pmb{A\mu}_y = a\sigma_x\sigma_y\rho+a\mu_x\mu_y \end{aligned}
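A quick simulation check of this expectation, reusing the bivariate normal from the earlier examples (a sketch; there \mu_x=1, \mu_y=2 and cov(X,Y)=1, and with the arbitrary choice a=2 both sides equal 2(1+1\cdot 2)=6):

library(mvtnorm)
x=rmvnorm(1e6, c(1, 2), matrix(c(3, 1, 1, 2), 2, 2))
a=2
mean(a*x[, 1]*x[, 2])   # should be close to a*(cov(X,Y)+mu_x*mu_y) = 6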
We have a random sample
\begin{pmatrix} X \\ Y \end{pmatrix}= \left(\begin{pmatrix} x_1 \\ y_1 \end{pmatrix},..,\begin{pmatrix} x_n \\ y_n \end{pmatrix} \right)
from some bivariate distribution with means \mu_x and \mu_y, variances \sigma_x^2, \sigma_y^2 and covariance \sigma_{xy}. A standard estimator of the population covariance \sigma_{xy} is the sample covariance defined by
s_{xy}=\frac1{n-1}\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})
We find
\begin{aligned} &s_{xy}=\frac1{n-1}\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})=\\ &\frac1{n-1}\left[\sum_{i=1}^n x_iy_i-n\bar{x}\bar{y}\right]= \\ &\frac1{n-1}\pmb{x}'[\pmb{I}-(1/n)\pmb{J}]\pmb{y} \end{aligned}
Since (x_i, y_i) is independent of (x_j, y_j) if i\ne j, we can define a random vector \pmb{V} = \begin{pmatrix} \pmb{X} \\ \pmb{Y} \end{pmatrix} with E[\pmb{V}] = \begin{pmatrix} \mu_x\pmb{j} \\ \mu_y\pmb{j} \end{pmatrix} and covariance matrix
\pmb{\Sigma}= \begin{pmatrix} \sigma_x^2\pmb{I} &\sigma_{xy}\pmb{I} \\ \sigma_{xy}\pmb{I} & \sigma_y^2\pmb{I} \end{pmatrix}
Let \pmb{A}=\pmb{I}-(1/n)\pmb{J}; then
\begin{aligned} &E[\pmb{x}'[\pmb{I}-(1/n)\pmb{J}]\pmb{y}] = \\ &tr\left[(\pmb{I}-(1/n)\pmb{J})\sigma_{xy}\pmb{I}\right]+\mu_y\pmb{j}'(\pmb{I}-(1/n)\pmb{J})\mu_x\pmb{j} = \\ &\sigma_{xy}tr[\pmb{I}-(1/n)\pmb{J}]+\mu_x\mu_y(\pmb{j}'\pmb{j}-(1/n)\pmb{j}'\pmb{j}\pmb{j}'\pmb{j}) = \\ &\sigma_{xy}(n-1)+0=(n-1)\sigma_{xy} \end{aligned}
and so s_{xy} is an unbiased estimator of \sigma_{xy}.
Let’s check with R:
library(mvtnorm)
S=matrix(c(2, 0.5, 0.5, 4), 2, 2)
x=rmvnorm(1e5, c(1, 2), S)
cov(x)
## [,1] [,2]
## [1,] 1.996217 0.488235
## [2,] 0.488235 4.025144
sum( (x[,1]-mean(x[,1]))*(x[,2]-mean(x[,2])))/(nrow(x)-1)
## [1] 0.488235