Estimation in ANOVA

Generally, the main interest in regression is in estimating the parameter vector \(\pmb{\beta}\) and/or in prediction. In contrast, in an ANOVA problem the interest is most often in hypothesis tests about the \(\pmb{\beta}\). However, as always in frequentist statistics, estimation and testing are closely related, and so we will discuss estimation first.

Estimation of \(\pmb{\beta}\)

We consider the model

\[\pmb{y}=\pmb{X\beta}+\pmb{\epsilon}\]

where \(\pmb{X}\) is \(n\times p\) with rank \(k<p\le n\). We also have the assumptions \(E[\pmb{y}]=\pmb{X\beta}\), \(cov(\pmb{y})=\sigma^2\pmb{I}\).

Using least squares we need to find \(\pmb{\hat{\beta}}\) that minimizes

\[\pmb{\epsilon'\epsilon}= (\pmb{y}-\pmb{X\hat{\beta}})'(\pmb{y}-\pmb{X\hat{\beta}})\]

As before we can expand this, differentiate, and arrive at the normal equations:

\[\pmb{X'X\hat{\beta}}=\pmb{X'y}\]

but now \(\pmb{X'X}\) is singular and has no inverse. In fact this system of equations has infinitely many solutions:

Theorem (7.2.1)

If \(\pmb{X}\) is \(n\times p\) with rank \(k<p\le n\), the system of equations

\[\pmb{X'X\hat{\beta}}=\pmb{X'y}\]

is consistent (that is, it has at least one solution).

proof omitted

Because the system of equations is consistent, by (4.2.14) a solution is given by

\[\pmb{\hat{\beta}}=(\pmb{X'X})^{-}\pmb{X'y}\]

where \((\pmb{X'X})^{-}\) is a generalized inverse of \(\pmb{X'X}\).

For any generalized inverse we have

\[E[\pmb{\hat{\beta}}]=(\pmb{X'X})^{-}\pmb{X'}E[\pmb{y}] = (\pmb{X'X})^{-}\pmb{X'X\beta}\]

and so \(\pmb{\hat{\beta}}\) is an unbiased estimator of \((\pmb{X'X})^{-}\pmb{X'X\beta}\). However, since \((\pmb{X'X})^{-}\pmb{X'X}\ne\pmb{I}\), \(\pmb{\hat{\beta}}\) is not an unbiased estimator of \(\pmb{\beta}\). In fact, \(E[\pmb{\hat{\beta}}]\) depends on the particular choice of \((\pmb{X'X})^{-}\).

Is there a matrix \(\pmb{A}\) such that \(E[\pmb{Ay}]=\pmb{\beta}\)? If so, then

\[\pmb{\beta}=E[\pmb{Ay}]=E[\pmb{A(X\beta+\epsilon})]=E[\pmb{AX\beta}]+\pmb{A}E[\pmb{\epsilon}]=\pmb{AX\beta}\]

This must hold for all \(\pmb{\beta}\), and so we must have \(\pmb{AX}=\pmb{I}\). But \(rank(\pmb{AX})\le rank(\pmb{X})=k<p\), while \(rank(\pmb{I})=p\), and so no such matrix \(\pmb{A}\) can exist.

Example (7.2.2)

Let’s consider a simple oneway model with two groups and three repeated measurements:

\[y_{ij}=\mu+\alpha_i+\epsilon_{ij}\]

with i=1,2 and j=1,2,3. So

\[\pmb{\beta} = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \end{pmatrix}\] \[ \pmb{X} = \begin{pmatrix} 1 & 1& 0\\ 1 & 1& 0\\ 1 & 1& 0\\ 1 & 0& 1\\ 1 & 0& 1\\ 1 & 0& 1\\ \end{pmatrix} \]

\[ \pmb{X'X} = \begin{pmatrix} 6 & 3& 3\\ 3 & 3& 0\\ 3 & 0 & 3\\ \end{pmatrix} \]

A generalized inverse is given by

\[ (\pmb{X'X})^{-} = \begin{pmatrix} 0 & 0& 0\\ 0 & 1/3& 0\\ 0 & 0 & 1/3\\ \end{pmatrix} \] let’s check:

X=cbind(1, c(1,1,1,0,0,0), c(0,0,0,1,1,1))
X
##      [,1] [,2] [,3]
## [1,]    1    1    0
## [2,]    1    1    0
## [3,]    1    1    0
## [4,]    1    0    1
## [5,]    1    0    1
## [6,]    1    0    1
X.X=t(X)%*%X
X.X
##      [,1] [,2] [,3]
## [1,]    6    3    3
## [2,]    3    3    0
## [3,]    3    0    3
g.X=diag(c(0,1,1)/3)
g.X
##      [,1]      [,2]      [,3]
## [1,]    0 0.0000000 0.0000000
## [2,]    0 0.3333333 0.0000000
## [3,]    0 0.0000000 0.3333333
X.X%*%g.X%*%X.X
##      [,1] [,2] [,3]
## [1,]    6    3    3
## [2,]    3    3    0
## [3,]    3    0    3

so this is indeed a generalized inverse. So now

\[ \pmb{X'y} = \begin{pmatrix} \sum_{i,j} y_{ij}\\ \sum_{j} y_{1j} \\ \sum_{j} y_{2j} \end{pmatrix} = \begin{pmatrix} y_{..}\\ y_{1.} \\ y_{2.} \end{pmatrix}\\ \text{ }\\ \pmb{\hat{\beta}} = (\pmb{X'X})^{-}\pmb{X'y} = \\ \begin{pmatrix} 0 & 0& 0\\ 0 & 1/3& 0\\ 0 & 0 & 1/3\\ \end{pmatrix} \begin{pmatrix} \sum_{i,j} y_{ij}\\ \sum_{j} y_{1j} \\ \sum_{j} y_{2j} \end{pmatrix} = \begin{pmatrix} 0\\ \frac13 \sum_{j} y_{1j} \\ \frac13 \sum_{j} y_{2j} \end{pmatrix} = \begin{pmatrix} 0\\ \bar{y}_{1.} \\ \bar{y}_{2.} \end{pmatrix} \]

Note

\[ \begin{aligned} &E[\bar{y}_{i.}] = \\ &E[\frac13 \sum_{j} y_{ij}] = \\ &\frac13 \sum_{j} E[y_{ij}] = \\ &\frac13 \sum_{j} E[\mu+\alpha_i+\epsilon_{ij}] = \\ &\frac13 \sum_{j}\left[\mu+\alpha_i+0\right] = \\ &\frac13 3\left[\mu+\alpha_i\right] = \mu+\alpha_i \end{aligned} \]
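A quick numerical check, reusing the X and g.X computed above and some made-up values for y: this generalized inverse indeed returns \((0, \bar{y}_{1.}, \bar{y}_{2.})'\), while the Moore-Penrose inverse from MASS::ginv returns a different solution of the normal equations, illustrating that \(\pmb{\hat{\beta}}\) depends on the choice of \((\pmb{X'X})^{-}\):

y=c(5, 7, 6, 10, 11, 12)   # made-up observations, three per group
g.X%*%t(X)%*%y             # (0, ybar_1., ybar_2.)'
library(MASS)
ginv(X.X)%*%t(X)%*%y       # a different solution of the normal equations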

Example (7.2.3)

Let’s return to the hearing aid data set, see (7.1.2). Here we have a model of the form

\[y_{ij}=\mu+\alpha_i+\beta_j+\epsilon_{ij}\] i=1,..,4 and j=1,..,24.

X=make.X(4, 24)
y=as.matrix(hearingaid[, 1, drop=FALSE])
X.X=t(X)%*%X
library(MASS)
gX=ginv(X.X)
betahat=gX%*%t(X)%*%y
round(c(betahat), 3)
##  [1]   9.677   2.419   2.419   2.419   2.419 -11.097 -10.097  -9.097  -8.097
## [10]  -7.097  -6.097  -5.097  -4.097  -3.097  -2.097  -1.097  -0.097   0.903
## [19]   1.903   2.903   3.903   4.903   5.903   6.903   7.903   8.903   9.903
## [28]  10.903  11.903
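Here ginv from MASS returns the Moore-Penrose inverse, which is one particular generalized inverse. A quick sanity check that it satisfies the defining property:

# check the generalized inverse property (X'X)(X'X)^-(X'X) = X'X (should be 0 up to rounding)
round(max(abs(X.X%*%gX%*%X.X - X.X)), 10)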

Estimable Functions

If we cannot estimate \(\pmb{\beta}\), can we instead estimate a linear function of \(\pmb{\beta}\), say \(\pmb{\lambda'\beta}\)?

Definition (7.2.4)

A linear function of parameters \(\pmb{\lambda'\beta}\) is said to be estimable if there exists a vector \(\pmb{a}\) such that \(E[\pmb{a'y}]=\pmb{\lambda'\beta}\).

Theorem (7.2.5)

Say \(\pmb{y}=\pmb{X\beta}+\pmb{\epsilon}\) with \(E[\pmb{y}]=\pmb{X\beta}\) and \(\pmb{X}\) is \(n\times p\) of rank \(k<p\le n\). A linear function \(\pmb{\lambda'\beta}\) is estimable if and only if one of the following conditions holds

  1. \(\pmb{\lambda}'\) is a linear combination of the rows of \(\pmb{X}\).
  2. \(\pmb{\lambda}'\) is a linear combination of the rows of \(\pmb{X'X}\) or \(\pmb{\lambda}\) is a linear combination of the columns of \(\pmb{X'X}\).
  3. Either \(\pmb{X'X(X'X)^{-}\lambda}=\pmb{\lambda}\) or \(\pmb{\lambda'(X'X)^{-}X'X}=\pmb{\lambda}'\)

proof

  1. say \(\pmb{\lambda}'\) is a linear combination of the rows of \(\pmb{X}\), then there exists \(\pmb{a}\) such that \(\pmb{\lambda}'=\pmb{a'X}\) and so

\[E[\pmb{a'y}]=\pmb{a'}E[\pmb{y}]=\pmb{a'X\beta}=\pmb{\lambda'\beta}\]

Conversely, if \(\pmb{\lambda'\beta}\) is estimable there exists \(\pmb{a}\) such that \(E[\pmb{a'y}]=\pmb{\lambda'\beta}\). Therefore \(\pmb{a'X\beta}=\pmb{\lambda'\beta}\) for all \(\pmb{\beta}\), and therefore \(\pmb{a'X}=\pmb{\lambda'}\), and so \(\pmb{\lambda}'\) is a linear combination of the rows of \(\pmb{X}\).

proofs of ii and iii omitted.

Example (7.2.6)

Consider again the simple oneway model with two groups and three repeated measurements from (7.2.2). That is \(\pmb{\beta} = \begin{pmatrix} \mu & \alpha_1 &\alpha_2 \end{pmatrix}'\) and

\[ \pmb{X} = \begin{pmatrix} 1 & 1&0 \\ 1 & 1&0 \\ 1 & 1&0 \\ 1 & 0&1 \\ 1 & 0&1 \\ 1 & 0&1 \\ \end{pmatrix} \]

We want to show that \(\alpha_1-\alpha_2\) is estimable. Note that

\[\alpha_1-\alpha_2=\begin{pmatrix} 0 & 1 & -1 \end{pmatrix}\begin{pmatrix} \mu \\ \alpha_1 \\\alpha_2 \end{pmatrix}=\pmb{\lambda'\beta}\] we see that \(\pmb{\lambda}'=\begin{pmatrix} 0 & 1 & -1 \end{pmatrix}\).

Using (7.2.5i):

We need to find \(\pmb{a}\) such that \(\pmb{a'X}=\pmb{\lambda}'\). In fact, if \(\pmb{a}'= \begin{pmatrix} 0 & 0 & 1 & -1 & 0 & 0 \end{pmatrix}\), then we have \(\pmb{a'X}=\pmb{\lambda}'\).

\(\pmb{a}\) here is not unique; there are many other choices.

Using (7.2.5ii):

We have

\[ \pmb{X'X} = \begin{pmatrix} 6 & 3& 3\\ 3 & 3& 0\\ 3 & 0& 3\\ \end{pmatrix} \]

Now we need a vector \(\pmb{a}\) such that \(\pmb{X'Xa}=\begin{pmatrix} 0 & 1 & -1 \end{pmatrix}'\). One such vector is \(\pmb{a} = \begin{pmatrix} 0 &1/3 & -1/3 \end{pmatrix}'\). Again there are other possibilities.

Using (7.2.5iii):

We saw before that a generalized inverse is given by

\[ (\pmb{X'X})^{-} = \begin{pmatrix} 0 & 0& 0\\ 0 & 1/3& 0\\ 0 & 0 & 1/3\\ \end{pmatrix} \]

and we see easily that for \(\pmb{\lambda}'=\begin{pmatrix} 0 & 1 & -1 \end{pmatrix}\) we have \(\pmb{X'X}(\pmb{X'X})^{-}\pmb{\lambda}=\pmb{\lambda}\).
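All three conditions can also be verified numerically, reusing X, X.X and g.X from (7.2.2):

lam=c(0, 1, -1)
a1=c(0, 0, 1, -1, 0, 0)
t(a1)%*%X                # condition i:   a'X = lambda'
X.X%*%(c(0, 1, -1)/3)    # condition ii:  X'X a = lambda
X.X%*%g.X%*%lam          # condition iii: X'X (X'X)^- lambda = lambda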

Theorem (7.2.7)

The number of linearly independent estimable functions of \(\pmb{\beta}\) is equal to the rank of \(\pmb{X}\).

proof omitted

From the two theorems above it is clear that we can check the rows of \(\pmb{X}\) or of \(\pmb{X'X}\) to see which functions are estimable.

Example (7.2.8)

Consider a two-way model with two factors, each with two levels, and no repeated measurements. That is \(\pmb{\beta} = \begin{pmatrix} \mu & \alpha_1 &\alpha_2 &\beta_1 &\beta_2 \end{pmatrix}'\) and

\[ \pmb{X} = \begin{pmatrix} 1 & 1&0 & 1 & 0 \\ 1 & 1&0 & 0 & 1 \\ 1 & 0&1 & 1 & 0 \\ 1 & 0&1 & 0 & 1 \\ \end{pmatrix} \]

To get to a matrix with only linearly independent rows we can proceed as follows:

  • subtract the first row from all others:

\[ \begin{pmatrix} 1 & 1&0 & 1 & 0 \\ 0 & 0&0 & -1 & 1 \\ 0 & -1&1 & 0 & 0 \\ 0 & -1&1 & -1 & 1 \\ \end{pmatrix} \]

  • subtract second and third from fourth:

\[ \begin{pmatrix} 1 & 1&0 & 1 & 0 \\ 0 & 0&0 & -1 & 1 \\ 0 & -1&1 & 0 & 0 \\ 0 & 0&0 & 0 & 0 \\ \end{pmatrix} \]

taking the first three rows as \(\pmb{\lambda}_1'\),\(\pmb{\lambda}_2'\) and \(\pmb{\lambda}_3'\), we find

\[ \begin{aligned} &\pmb{\lambda}_1'\pmb{\beta} = \mu+\alpha_1+\beta_1\\ &\pmb{\lambda}_2'\pmb{\beta} = \beta_2-\beta_1 \\ &\pmb{\lambda}_3'\pmb{\beta} = \alpha_2-\alpha_1\\ \end{aligned} \]
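A numerical check of this example (the names X2 and H below are just for this check): the design matrix has rank 3, and condition iii of (7.2.5) confirms that \(\beta_2-\beta_1\) is estimable while, say, \(\alpha_1\) by itself is not.

X2=cbind(1, c(1,1,0,0), c(0,0,1,1), c(1,0,1,0), c(0,1,0,1))
qr(X2)$rank                      # 3 linearly independent estimable functions
library(MASS)
H=t(X2)%*%X2%*%ginv(t(X2)%*%X2)  # X'X (X'X)^-
round(H%*%c(0,0,0,-1,1), 3)      # returns lambda, so beta2-beta1 is estimable
round(H%*%c(0,1,0,0,0), 3)       # not equal to lambda, so alpha1 is not estimable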

Definition (7.2.9)

Let \(\pmb{a}\) be a vector such that \(\sum a_i=0\). Then \(\pmb{a'\beta}\) is called a contrast.

Example (7.2.10)

In the example above the second and third linear combinations are contrasts.

Theorem (7.2.11)

Let \(\pmb{\lambda'\beta}\) be an estimable function. Let \(\pmb{\hat{\beta}}\) be any solution to the normal equations \(\pmb{X'X\beta}=\pmb{X'y}\) and let \(\pmb{a}\) be any solution to \(\pmb{X'Xa}=\pmb{\lambda}\). Then the two estimators \(\pmb{\lambda'\hat{\beta}}\) and \(\pmb{a'X'y}\) have the following properties:

  1. \(E[\pmb{\lambda'\hat{\beta}}]=E[\pmb{a'X'y}]=\pmb{\lambda'\beta}\)
  2. \(\pmb{\lambda'\hat{\beta}}=\pmb{a'X'y}\)
  3. \(\pmb{\lambda'\hat{\beta}}\) and \(\pmb{a'X'y}\) are invariant to the choice of \(\pmb{\hat{\beta}}\) or \(\pmb{a}\)

proof

  1. By (7.2.1) we have

\[E[\pmb{\lambda'\hat{\beta}}]= \pmb{\lambda'}(\pmb{X'X})^{-}\pmb{X'X\beta}\]

by (7.2.5iii) \(\pmb{\lambda}'(\pmb{X'X})^{-}\pmb{X'X}=\pmb{\lambda}'\), and so

\[E[\pmb{\lambda'\hat{\beta}}]=\pmb{\lambda'\beta}\]

Also, \(\pmb{X'Xa}=\pmb{\lambda}\) implies \(\pmb{a'X'X}=\pmb{\lambda}'\), and so

\[E[\pmb{a'X'y}]=\pmb{a'X'}E[\pmb{y}]=\pmb{a'X'X\beta}=\pmb{\lambda'\beta}\]

proofs of ii and iii omitted

Example (7.2.12)

In example (7.2.6) we saw that the linear function \(\pmb{\lambda'\beta} = \alpha_1-\alpha_2\) was estimable with \(\pmb{a} = \begin{pmatrix} 0 & 1/3 & -1/3 \end{pmatrix}'\), so now

\[ \begin{aligned} &\pmb{a'X'y} = \\ &\begin{pmatrix} 0 & 1/3 & -1/3 \end{pmatrix} \begin{pmatrix} 1 & 1& 1 & 1 & 1& 1\\ 1 & 1& 1 & 0&0&0\\ 0&0&0&1&1&1 \end{pmatrix} \begin{pmatrix} y_{11} \\ y_{12} \\y_{13} \\y_{21} \\y_{22} \\y_{23} \\ \end{pmatrix}=\\ &\begin{pmatrix} 0 & 1/3 & -1/3 \end{pmatrix} \begin{pmatrix} y_{..} \\ y_{1.}\\y_{2.} \end{pmatrix}=\\ &y_{1.}/3-y_{2.}/3 = \bar{y}_{1.}-\bar{y}_{2.} \end{aligned} \]

Alternatively, to use the estimator \(\pmb{\lambda'\hat{\beta}}\) we need a solution of the normal equations \(\pmb{X'X\hat{\beta}}=\pmb{X'y}\):

\[ \begin{aligned} &\begin{pmatrix} 6 & 3& 3\\ 3 & 3& 0\\ 3 & 0& 3\\ \end{pmatrix} \begin{pmatrix} \hat{\mu} \\ \hat{\alpha}_1 \\ \hat{\alpha}_2 \end{pmatrix}= \begin{pmatrix} y_{..} \\ y_{1.}\\y_{2.} \end{pmatrix}\\ &6\hat{\mu} + 3\hat{\alpha}_1 +3 \hat{\alpha}_2 = y_{..} \\ &3\hat{\mu}+ 3\hat{\alpha}_1 = y_{1.} \\ &3\hat{\mu} + 3\hat{\alpha}_2 = y_{2.} \\ \end{aligned} \]

These are only two linearly independent equations in three unknowns (the first equation is the sum of the other two), so we can set \(\hat{\mu}\) equal to an arbitrary constant and obtain

\[\pmb{\hat{\beta}} =\begin{pmatrix} \hat{\mu} \\ \hat{\alpha}_1 \\ \hat{\alpha}_2 \end{pmatrix} = \begin{pmatrix} 0 \\ \bar{y}_{1.}\\ \bar{y}_{2.} \end{pmatrix}+\hat{\mu}\begin{pmatrix} 1 \\ -1\\ -1 \end{pmatrix}\]

Finally

\[\pmb{\lambda'\hat{\beta}} = \begin{pmatrix} 0 & 1 & -1 \end{pmatrix} \begin{pmatrix} \hat{\mu} \\ \bar{y}_{1.}-\hat{\mu}\\ \bar{y}_{2.}-\hat{\mu}\end{pmatrix}=\bar{y}_{1.}-\bar{y}_{2.}\]
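Numerically, with the same made-up y as before and X, X.X, g.X from (7.2.2), both estimators reduce to \(\bar{y}_{1.}-\bar{y}_{2.}\) no matter which solution of the normal equations we use:

y=c(5, 7, 6, 10, 11, 12)    # made-up data
lam=c(0, 1, -1)
a=c(0, 1, -1)/3
b1=g.X%*%t(X)%*%y           # solution from the g-inverse in (7.2.2)
b2=ginv(X.X)%*%t(X)%*%y     # Moore-Penrose solution
c(sum(lam*b1), sum(lam*b2), sum(a*(t(X)%*%y)), mean(y[1:3])-mean(y[4:6]))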


Theorem (7.2.13)

Let \(\pmb{\lambda'\beta}\) be an estimable function. Let \(\pmb{\hat{\beta}}\) be any solution to the normal equations \(\pmb{X'X\beta}=\pmb{X'y}\) and let \(\pmb{a}\) be any solution to \(\pmb{X'Xa}=\pmb{\lambda}\). Then the variances of the two estimators \(\pmb{\lambda'\hat{\beta}}\) and \(\pmb{a'X'y}\) have the following properties:

  1. \(var(\pmb{a'X'y})=\sigma^2\pmb{a'X'Xa}=\sigma^2\pmb{a'\lambda}\)

  2. \(var(\pmb{\lambda'\hat{\beta}})=\sigma^2\pmb{\lambda}'(\pmb{X'X})^{-}\pmb{\lambda}\)

  3. \(var(\pmb{\lambda'\hat{\beta}})\) is unique, that is, it is invariant under the choice of \(\pmb{a}\) or \((\pmb{X'X})^{-}\)

proof

\[ \begin{aligned} &var(\pmb{a'X'y}) = \\ &\pmb{a'X'}cov(\pmb{y})\pmb{Xa} = \\ &\pmb{a'X'}(\sigma^2\pmb{I})\pmb{Xa} = \\ &\sigma^2\pmb{a'X'}\pmb{Xa} = \\ &\sigma^2\pmb{a'}\pmb{\lambda} \end{aligned} \]

ii and iii omitted
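For \(\pmb{\lambda}'=(0\text{ }1\text{ }-1)\) in example (7.2.2) both formulas give 2/3, so \(var(\pmb{\lambda'\hat{\beta}})=2\sigma^2/3\), which is just \(var(\bar{y}_{1.}-\bar{y}_{2.})\). A quick check, reusing g.X from (7.2.2):

lam=c(0, 1, -1)
a=c(0, 1, -1)/3
c(sum(a*lam), t(lam)%*%g.X%*%lam)   # both equal 2/3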

Theorem (7.2.14)

Let \(\pmb{\lambda_1'\beta}\) and \(\pmb{\lambda_2'\beta}\) be two estimable functions. Then

\[cov(\pmb{\lambda_1'\hat{\beta}},\pmb{\lambda_2'\hat{\beta}})=\sigma^2\pmb{\lambda}_1'(\pmb{X'X})^{-}\pmb{\lambda}_2\]

proof similar to proof of theorem above

Theorem (7.2.15)

Let \(\pmb{\lambda'\beta}\) be an estimable function. Then the two estimators \(\pmb{\lambda'\hat{\beta}}\) and \(\pmb{a'X'y}\) are the best linear unbiased estimators (BLUE) of \(\pmb{\lambda'\beta}\).

proof omitted

Estimation of \(\sigma^2\)

Again we define

\[\text{SSE}= (\pmb{y}-\pmb{X\hat{\beta}})'(\pmb{y}-\pmb{X\hat{\beta}})\]

where \(\pmb{\hat{\beta}}\) is any solution of the normal equations. As before we have alternatively

\[\text{SSE}= \pmb{y}'\pmb{y}-\pmb{\hat{\beta}}'\pmb{X'y} = \pmb{y}'\left[\pmb{I}-\pmb{X(X'X)^{-}X'}\right]\pmb{y}\]

and we define

\[s^2=\text{SSE}/(n-k)\]

Theorem (7.2.16)

  1. \(E[s^2]=\sigma^2\)

  2. \(s^2\) is invariant under the choice of \(\pmb{\hat{\beta}}\) or the choice of \(\pmb{(X'X)^{-}}\).

proof omitted
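A quick numerical illustration of ii, again with made-up data and the two solutions from (7.2.2):

y=c(5, 7, 6, 10, 11, 12)
b1=g.X%*%t(X)%*%y
b2=ginv(X.X)%*%t(X)%*%y
c(sum((y-X%*%b1)^2), sum((y-X%*%b2)^2))   # same SSE for either solution
sum((y-X%*%b1)^2)/(6-2)                   # s^2 with n=6, k=rank(X)=2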

Normal Model

Theorem (7.2.17)

If \(\pmb{y}\sim N(\pmb{X\beta}, \sigma^2\pmb{I})\), the maximum likelihood estimators are

\[ \begin{aligned} &\pmb{\hat{\beta}} = (\pmb{X'X})^{-}\pmb{X'y} \\ &\hat{\sigma}^2 = \frac1{n}(\pmb{y}-\pmb{X\hat{\beta}})'(\pmb{y}-\pmb{X\hat{\beta}}) \end{aligned} \]

proof omitted

Theorem (7.2.18)

Under the normal model

  1. \(\pmb{\hat{\beta}} \sim N_p\left[ (\pmb{X'X})^{-}\pmb{X'X\beta}, \sigma^2(\pmb{X'X})^{-}\pmb{X'X}(\pmb{X'X})^{-} \right]\)

  2. \((n-k)s^2/\sigma^2\sim \chi^2(n-k)\)

  3. \(\pmb{\hat{\beta}}\) and \(s^2\) are independent.

proof omitted

Reparametrization

We discussed before that one often can change the parameters in order to make the problem solvable. Here is a formal discussion of this issue.

A reparametrization is a transformation of the non-full-rank model \(\pmb{y}=\pmb{X\beta}+\pmb{\epsilon}\) to a full-rank model \(\pmb{y}=\pmb{Z\gamma}+\pmb{\epsilon}\), where \(\pmb{\gamma}=\pmb{U\beta}\) is a set of k linearly independent estimable functions of \(\pmb{\beta}\). So we can write

\[\pmb{Z\gamma}=\pmb{ZU\beta}=\pmb{X\beta}\]

This holds for all \(\pmb{\beta}\), and so we have \(\pmb{ZU}=\pmb{X}\). Since \(\pmb{U}\) is \(k\times p\) of rank \(k<p\), the matrix \(\pmb{UU'}\) is nonsingular and we find \(\pmb{ZUU'}=\pmb{XU'}\) or

\[\pmb{Z}=\pmb{XU'}(\pmb{UU'})^{-1}\]

It can be seen that \(\pmb{Z}\) is full-rank, and therefore the normal equations have the unique solution

\[\pmb{\hat{\gamma}} = (\pmb{Z'Z})^{-1}\pmb{Z'y}\]

Since \(\pmb{Z\gamma}=\pmb{X\beta}\), the estimators \(\pmb{Z\hat{\gamma}}\) and \(\pmb{X\hat{\beta}}\) are also equal

\[\pmb{Z\hat{\gamma}}=\pmb{X\hat{\beta}}\]

Theorem (7.2.19)

\[s^2=\frac1{n-k}(\pmb{y-Z\hat{\gamma}})'(\pmb{y-Z\hat{\gamma}})\]

\[\text{SSE}=(\pmb{y-X\hat{\beta}})'(\pmb{y-X\hat{\beta}})=(\pmb{y-Z\hat{\gamma}})'(\pmb{y-Z\hat{\gamma}})\]

proof omitted

Example (7.2.20)

Consider the model

\[\pmb{y}=\pmb{X\beta}+\pmb{\epsilon}= \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1\\ 1 & 0 & 1 \end{pmatrix}\begin{pmatrix} \mu \\ \alpha_1 \\\alpha_2 \end{pmatrix}+ \begin{pmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{21} \\ \epsilon_{22} \end{pmatrix}\]

\(\pmb{X}\) has rank 2, so there are two linearly independent estimable functions. These can be chosen in any number of ways, for example \(\mu+\alpha_1\) and \(\mu+\alpha_2\). With this choice we have

\[\pmb{\gamma} = \begin{pmatrix} \mu+\alpha_1 \\ \mu+\alpha_2\end{pmatrix}=\begin{pmatrix} 1&1&0 \\ 1&0&1 \end{pmatrix}\begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \end{pmatrix}=\pmb{U\beta}\]

Let

\[\pmb{Z}=\begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1\\ 0 & 1 \end{pmatrix}\]

then \(\pmb{Z\gamma}=\pmb{X\beta}\)
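A small numerical sketch of this example (the y values are made up, and the object names are just for this check): Z is computed from the formula above and \(\pmb{Z\hat{\gamma}}\) agrees with \(\pmb{X\hat{\beta}}\).

X=cbind(1, c(1,1,0,0), c(0,0,1,1))
U=rbind(c(1,1,0), c(1,0,1))
Z=X%*%t(U)%*%solve(U%*%t(U))       # Z = XU'(UU')^(-1), equals the Z given above
y=c(5, 7, 10, 12)                  # made-up data
gammahat=solve(t(Z)%*%Z)%*%t(Z)%*%y
betahat=ginv(t(X)%*%X)%*%t(X)%*%y
cbind(Z%*%gammahat, X%*%betahat)   # identical fitted values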

Side Conditions

Definition (7.2.21)

A side condition is a \((p-k)\times p\) matrix \(\pmb{T}\) of rank p-k such that \(\pmb{T\beta}=\pmb{0}\), where the elements of \(\pmb{T\beta}\) are nonestimable functions.

Note that if one of the elements of \(\pmb{T\beta}\) were an estimable function, the corresponding row of \(\pmb{T}\) would be a linear combination of the rows of \(\pmb{X'X}\) and would therefore not increase the rank.

Theorem (7.2.22)

If \(\pmb{y=X\beta+\epsilon}\) and \(\pmb{T}\) is a side condition, then

\[\pmb{\hat{\beta}} = \left( \pmb{X'X} + \pmb{T'T} \right)^{-1}\pmb{X'y}\]

is the unique vector \(\pmb{\hat{\beta}}\) such that \(\pmb{X'X\hat{\beta}=X'y}\) and \(\pmb{T\hat{\beta}}=0\)

proof the two equations can be combined into

\[\begin{pmatrix} \pmb{y} \\ \pmb{0} \end{pmatrix}=\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}\pmb{\beta}+\begin{pmatrix} \pmb{\epsilon} \\ \pmb{0} \end{pmatrix}\]

and by the conditions of the theorem the matrix \(\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}\) is full-rank. Therefore \(\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}'\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}\) has an inverse, and we find

\[ \begin{aligned} &\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}'\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}\pmb{\hat{\beta}} = \begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}'\begin{pmatrix} \pmb{y} \\ \pmb{0} \end{pmatrix}\\ \\ &\pmb{\hat{\beta}} = \left(\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}'\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}\right)^{-1}\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}'\begin{pmatrix} \pmb{y} \\ \pmb{0} \end{pmatrix}\\ \\ &\pmb{\hat{\beta}} = \left(\begin{pmatrix} \pmb{X}' & \pmb{T}' \end{pmatrix}\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}\right)^{-1}\begin{pmatrix} \pmb{X'} & \pmb{T'} \end{pmatrix}\begin{pmatrix} \pmb{y} \\ \pmb{0} \end{pmatrix}\\ \\ &\pmb{\hat{\beta}} = \left( \pmb{X'X} + \pmb{T'T} \right)^{-1}\left(\pmb{X'y}+\pmb{T'0}\right) = \left( \pmb{X'X} + \pmb{T'T} \right)^{-1}\pmb{X'y} \end{aligned} \]

Example (7.2.23)

Let’s return to example (7.2.20), where we used the model

\[y_{ij}=\mu+\alpha_i+\epsilon_{ij}\]

with i=1,2 and j=1,2.

Using theorem (7.2.5) we can easily see that \(\alpha_1+\alpha_2\) is not an estimable function. The side condition \(\alpha_1+\alpha_2=0\) can be written as \((0\text{ }1\text{ }1)\pmb{\beta}=0\), and so \(\pmb{T}=(0\text{ }1\text{ }1)\).

Now

\[ \begin{aligned} &\pmb{X'X+T'T} = \begin{pmatrix} 4 & 2& 2\\ 2 & 2& 0\\ 2 & 0& 2\\ \end{pmatrix}+\begin{pmatrix} 0 \\ 1 \\1 \end{pmatrix}(0\text{ }1\text{ }1) = \\ &\begin{pmatrix} 4 & 2& 2\\ 2 & 2& 0\\ 2 & 0& 2\\ \end{pmatrix} + \begin{pmatrix} 0 & 0 &0 \\ 0 &1 & 1\\ 0 &1 & 1 \end{pmatrix}= \begin{pmatrix} 4 & 2& 2\\ 2 & 3& 1\\ 2 & 1& 3\\ \end{pmatrix}\\ &(\pmb{X'X+T'T})^{-1} = \frac14\begin{pmatrix} 2 & -1& -1\\ -1 & 2& 0\\ -1 & 0& 2\\ \end{pmatrix}\\ &\pmb{\hat{\beta}} =(\pmb{X'X+T'T})^{-1}\pmb{X'y}=\\ &\frac14\begin{pmatrix} 2 & -1& -1\\ -1 & 2& 0\\ -1 & 0& 2\\ \end{pmatrix} \begin{pmatrix} 1 & 1& 1 & 1\\ 1 & 1& 0 & 0\\ 0 & 0& 1 & 1\\ \end{pmatrix} \begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22}\end{pmatrix}=\\ &\frac14\begin{pmatrix} 2 & -1& -1\\ -1 & 2& 0\\ -1 & 0& 2\\ \end{pmatrix} \begin{pmatrix} y_{..}\\ y_{1.}\\ y_{2.}\\ \end{pmatrix} = \\ &\frac14\begin{pmatrix} 2y_{..} - y_{1.}-y_{2.} \\ 2y_{1.}-y_{..} \\ 2y_{2.}-y_{..}\end{pmatrix}= \begin{pmatrix} \bar{y}_{..} \\ \bar{y}_{1.}-\bar{y}_{..}\\\bar{y}_{2.}-\bar{y}_{..} \end{pmatrix} \end{aligned} \]

because \(y_{1.}+y_{2.}=y_{..}\)
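A numerical check of this calculation (the y values are made up; T.mat is just a name for the side-condition matrix): the side-condition estimate matches \((\bar{y}_{..},\ \bar{y}_{1.}-\bar{y}_{..},\ \bar{y}_{2.}-\bar{y}_{..})'\).

X=cbind(1, c(1,1,0,0), c(0,0,1,1))
T.mat=matrix(c(0, 1, 1), 1, 3)     # side condition alpha1 + alpha2 = 0
y=c(5, 7, 10, 12)                  # made-up data
betahat=solve(t(X)%*%X + t(T.mat)%*%T.mat)%*%t(X)%*%y
cbind(betahat, c(mean(y), mean(y[1:2])-mean(y), mean(y[3:4])-mean(y)))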