Quite generally the main interest in regression is in estimating the parameter vector \(\pmb{\beta}\) and/or in prediction. In an ANOVA problem, by contrast, the interest is most often in hypothesis tests about the \(\pmb{\beta}\). However, as always in frequentist statistics, estimation and testing are closely related, and so we will discuss estimation first.
We consider the model
\[\pmb{y}=\pmb{X\beta}+\pmb{\epsilon}\]
where \(\pmb{X}\) is \(n\times p\) with rank \(k<p\le n\). We also have the assumptions \(E[\pmb{y}]=\pmb{X\beta}\), \(cov(\pmb{y})=\sigma^2\pmb{I}\).
Using least squares we need to find \(\pmb{\hat{\beta}}\) that minimizes
\[\pmb{\epsilon'\epsilon}= (\pmb{y}-\pmb{X\hat{\beta}})'(\pmb{y}-\pmb{X\hat{\beta}})\]
As before, we can expand this and differentiate to arrive at the normal equations:
\[\pmb{X'X\hat{\beta}}=\pmb{X'y}\]
but now \(\pmb{X'X}\) is singular and has no inverse. In fact this system of equations has infinitely many solutions:
If \(\pmb{X}\) is \(n\times p\) with rank \(k<p\le n\), the system of equations
\[\pmb{X'X\hat{\beta}}=\pmb{X'y}\]
is consistent (aka has solutions).
proof omitted
Because the system of equations is consistent, by (4.2.14) a solution is given by
\[\pmb{\hat{\beta}}=(\pmb{X'X})^{-}\pmb{X'y}\]
where \((\pmb{X'X})^{-}\) is a generalized inverse of \(\pmb{X'X}\).
For any generalized inverse we have
\[E[\pmb{\hat{\beta}}]=(\pmb{X'X})^{-}\pmb{X'}E[\pmb{y}] = (\pmb{X'X})^{-}\pmb{X'X\beta}\]
and so \(\pmb{\hat{\beta}}\) is an unbiased estimator of \((\pmb{X'X})^{-}\pmb{X'X\beta}\). However, since \((\pmb{X'X})^{-}\pmb{X'X}\ne\pmb{I}\), \(\pmb{\hat{\beta}}\) is not an unbiased estimator of \(\pmb{\beta}\). In fact, \(E[\pmb{\hat{\beta}}]\) depends on the particular choice of \((\pmb{X'X})^{-}\).
Is there a matrix \(\pmb{A}\) such that \(E[\pmb{Ay}]=\pmb{\beta}\)? If so, then
\[\pmb{\beta}=E[\pmb{Ay}]=E[\pmb{A(X\beta+\epsilon})]=E[\pmb{AX\beta}]+\pmb{A}E[\pmb{\epsilon}]=\pmb{AX\beta}\]
This must hold for all \(\pmb{\beta}\), and so we must have \(\pmb{AX}=\pmb{I}\). But \(rank(\pmb{AX})\le rank(\pmb{X})=k<p\), whereas \(rank(\pmb{I})=p\), and so no such matrix \(\pmb{A}\) can exist.
Let’s consider a simple oneway model with two groups and three repeated measurements:
\[y_{ij}=\mu+\alpha_i+\epsilon_{ij}\]
with i=1,2 and j=1,2,3. So
\[\pmb{\beta} = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \end{pmatrix}\] \[ \pmb{X} = \begin{pmatrix} 1 & 1& 0\\ 1 & 1& 0\\ 1 & 1& 0\\ 1 & 0& 1\\ 1 & 0& 1\\ 1 & 0& 1\\ \end{pmatrix} \]
\[ \pmb{X'X} = \begin{pmatrix} 6 & 3& 3\\ 3 & 3& 0\\ 3 & 0 & 3\\ \end{pmatrix} \]
A generalized inverse is given by
\[ (\pmb{X'X})^{-} = \begin{pmatrix} 0 & 0& 0\\ 0 & 1/3& 0\\ 0 & 0 & 1/3\\ \end{pmatrix} \] let’s check:
X=cbind(1, c(1,1,1,0,0,0), c(0,0,0,1,1,1))
X
## [,1] [,2] [,3]
## [1,] 1 1 0
## [2,] 1 1 0
## [3,] 1 1 0
## [4,] 1 0 1
## [5,] 1 0 1
## [6,] 1 0 1
X.X=t(X)%*%X
X.X
## [,1] [,2] [,3]
## [1,] 6 3 3
## [2,] 3 3 0
## [3,] 3 0 3
g.X=diag(c(0,1,1)/3)
g.X
## [,1] [,2] [,3]
## [1,] 0 0.0000000 0.0000000
## [2,] 0 0.3333333 0.0000000
## [3,] 0 0.0000000 0.3333333
X.X%*%g.X%*%X.X
## [,1] [,2] [,3]
## [1,] 6 3 3
## [2,] 3 3 0
## [3,] 3 0 3
so this is indeed a generalized inverse. So now
\[ \pmb{X'y} = \begin{pmatrix} \sum_{i,j} y_{ij}\\ \sum_{j} y_{1j} \\ \sum_{j} y_{2j} \end{pmatrix} = \begin{pmatrix} y_{..}\\ y_{1.} \\ y_{2.} \end{pmatrix}\\ \text{ }\\ \pmb{\hat{\beta}} = (\pmb{X'X})^{-}\pmb{X'y} = \\ \begin{pmatrix} 0 & 0& 0\\ 0 & 1/3& 0\\ 0 & 0 & 1/3\\ \end{pmatrix} \begin{pmatrix} \sum_{i,j} y_{ij}\\ \sum_{j} y_{1j} \\ \sum_{j} y_{2j} \end{pmatrix} = \begin{pmatrix} 0\\ \frac13 \sum_{j} y_{1j} \\ \frac13 \sum_{j} y_{2j} \end{pmatrix} = \begin{pmatrix} 0\\ \bar{y}_{1.} \\ \bar{y}_{2.} \end{pmatrix} \]
Note
\[ \begin{aligned} &E[\bar{y}_{i.}] = \\ &E[\frac13 \sum_{j} y_{ij}] = \\ &\frac13 \sum_{j} E[y_{ij}] = \\ &\frac13 \sum_{j} E[\mu+\alpha_i+\epsilon_{ij}] = \\ &\frac13 \sum_{j}\left[\mu+\alpha_i+0\right] = \\ &\frac13 3\left[\mu+\alpha_i\right] = \mu+\alpha_i \end{aligned} \]
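To illustrate the earlier point that \(\pmb{\hat{\beta}}\) depends on the choice of generalized inverse while \(\pmb{X\hat{\beta}}\) does not, here is a minimal sketch, using the matrices X and X.X computed above together with some made-up responses (the values in y are arbitrary) and the Moore-Penrose inverse from MASS as a second generalized inverse:

y=c(10, 12, 11, 15, 14, 16)          # made-up responses, three per group
g1=diag(c(0,1,1)/3)                  # the generalized inverse used above
g2=MASS::ginv(X.X)                   # the Moore-Penrose inverse as an alternative
b1=g1%*%t(X)%*%y
b2=g2%*%t(X)%*%y
cbind(b1, b2)                        # two different solutions of the normal equations
cbind(X.X%*%b1, X.X%*%b2, t(X)%*%y)  # both satisfy X'X beta-hat = X'y
cbind(X%*%b1, X%*%b2)                # but the fitted values X beta-hat agree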
Let’s return to the hearing aid data set, see (7.1.2). Here we have a model of the form
\[y_{ij}=\mu+\alpha_i+\beta_j+\epsilon_{ij}\] with i=1,..,4 and j=1,..,24.
X=make.X(4, 24)
y=as.matrix(hearingaid[, 1, drop=FALSE])
X.X=t(X)%*%X
library(MASS)
gX=ginv(X.X)
betahat=gX%*%t(X)%*%y
round(c(betahat), 3)
## [1] 9.677 2.419 2.419 2.419 2.419 -11.097 -10.097 -9.097 -8.097
## [10] -7.097 -6.097 -5.097 -4.097 -3.097 -2.097 -1.097 -0.097 0.903
## [19] 1.903 2.903 3.903 4.903 5.903 6.903 7.903 8.903 9.903
## [28] 10.903 11.903
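As a quick check, continuing with the objects above: although this \(\pmb{\hat{\beta}}\) is just one of infinitely many solutions, it does satisfy the normal equations.

round(max(abs(X.X%*%betahat - t(X)%*%y)), 10)   # essentially zero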
If we can not estimate \(\pmb{\beta}\), can we instead estimate a linear function of \(\pmb{\beta}\), say \(\pmb{\lambda'\beta}\)?
A linear function of parameters \(\pmb{\lambda'\beta}\) is said to be estimable if there exists a vector \(\pmb{a}\) such that \(E[\pmb{a'y}]=\pmb{\lambda'\beta}\).
Say \(\pmb{y}=\pmb{X\beta}+\pmb{\epsilon}\) with \(E[\pmb{y}]=\pmb{X\beta}\) and \(\pmb{X}\) is \(n\times p\) of rank \(k<p\le n\). A linear function \(\pmb{\lambda'\beta}\) is estimable if and only if one of the following (equivalent) conditions holds:

i. \(\pmb{\lambda}'=\pmb{a'X}\) for some vector \(\pmb{a}\), that is, \(\pmb{\lambda}'\) is a linear combination of the rows of \(\pmb{X}\)

ii. \(\pmb{X'Xa}=\pmb{\lambda}\) for some vector \(\pmb{a}\), that is, \(\pmb{\lambda}'\) is a linear combination of the rows of \(\pmb{X'X}\)

iii. \(\pmb{X'X}(\pmb{X'X})^{-}\pmb{\lambda}=\pmb{\lambda}\), or equivalently \(\pmb{\lambda}'(\pmb{X'X})^{-}\pmb{X'X}=\pmb{\lambda}'\)
proof (of i)
If \(\pmb{\lambda}'=\pmb{a'X}\) for some vector \(\pmb{a}\), then
\[E[\pmb{a'y}]=\pmb{a'}E[\pmb{y}]=\pmb{a'X\beta}=\pmb{\lambda'\beta}\]
and so \(\pmb{\lambda'\beta}\) is estimable.
Conversely, if \(\pmb{\lambda'\beta}\) is estimable there exists \(\pmb{a}\) such that \(E[\pmb{a'y}]=\pmb{\lambda'\beta}\). Therefore \(\pmb{a'X\beta}=\pmb{\lambda'\beta}\) for all \(\pmb{\beta}\), and therefore \(\pmb{a'X}=\pmb{\lambda'}\), and so \(\pmb{\lambda}'\) is a linear combination of the rows of \(\pmb{X}\).
proofs of ii and iii omitted.
Consider again the simple oneway model with two groups and three repeated measurements from (7.2.2). That is \(\pmb{\beta} = \begin{pmatrix} \mu & \alpha_1 &\alpha_2 \end{pmatrix}'\) and
\[ \pmb{X} = \begin{pmatrix} 1 & 1&0 \\ 1 & 1&0 \\ 1 & 1&0 \\ 1 & 0&1 \\ 1 & 0&1 \\ 1 & 0&1 \\ \end{pmatrix} \]
We want to show that \(\alpha_1-\alpha_2\) is estimable. Note that
\[\alpha_1-\alpha_2=\begin{pmatrix} 0 & 1 & -1 \end{pmatrix}\begin{pmatrix} \mu \\ \alpha_1 \\\alpha_2 \end{pmatrix}=\pmb{\lambda'\beta}\] we see that \(\pmb{\lambda}'=\begin{pmatrix} 0 & 1 & -1 \end{pmatrix}\).
Using (7.2.4i):
We need to find \(\pmb{a}\) such that \(\pmb{a'X}=\pmb{\lambda}'\). In fact, if \(\pmb{a}'= \begin{pmatrix} 0 & 0 & 1 & -1 & 0 & 0 \end{pmatrix}\), then we have \(\pmb{a'X}=\pmb{\lambda}'\).
\(\pmb{a}\) here is not unique, there are many other choices.
Using (7.2.4ii):
We have
\[ \pmb{X'X} = \begin{pmatrix} 6 & 3& 3\\ 3 & 3& 0\\ 3 & 0& 3\\ \end{pmatrix} \]
Now we need a vector \(\pmb{a}\) such that \(\pmb{X'Xa}=\begin{pmatrix} 0 & 1 & -1 \end{pmatrix}'\). One such vector is \(\pmb{a} = \begin{pmatrix} 0 &1/3 & -1/3 \end{pmatrix}'\). Again there are other possibilities.
Using (7.2.4iii):
We saw before that a generalized inverse is given by
\[ (\pmb{X'X})^{-} = \begin{pmatrix} 0 & 0& 0\\ 0 & 1/3& 0\\ 0 & 0 & 1/3\\ \end{pmatrix} \]
and we see easily that for \(\pmb{\lambda}'=\begin{pmatrix} 0 & 1 & -1 \end{pmatrix}\) we have \(\pmb{X'X}(\pmb{X'X})^{-}\pmb{\lambda}=\pmb{\lambda}\).
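All three conditions are also easy to check numerically; here is a minimal sketch, reconstructing the oneway design matrix from above:

X=cbind(1, c(1,1,1,0,0,0), c(0,0,0,1,1,1))
X.X=t(X)%*%X
g.X=diag(c(0,1,1)/3)
lambda=c(0, 1, -1)
t(c(0,0,1,-1,0,0))%*%X      # condition i: a'X equals lambda'
X.X%*%(c(0,1,-1)/3)         # condition ii: X'Xa equals lambda
X.X%*%g.X%*%lambda          # condition iii: X'X(X'X)^- lambda equals lambda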
The number of linearly independent estimable linear functions of \(\pmb{\beta}\) is equal to the rank of \(\pmb{X}\).
proof omitted
From the two theorems above it is clear that we can check the rows of \(\pmb{X}\) or of \(\pmb{X'X}\) to see which functions are estimable.
Consider a twoway model with two groups and no repeated measurements. That is \(\pmb{\beta} = \begin{pmatrix} \mu & \alpha_1 &\alpha_2 &\beta_1 &\beta_2 \end{pmatrix}'\) and
\[ \pmb{X} = \begin{pmatrix} 1 & 1&0 & 1 & 0 \\ 1 & 1&0 & 0 & 1 \\ 1 & 0&1 & 1 & 0 \\ 1 & 0&1 & 0 & 1 \\ \end{pmatrix} \]
To reduce \(\pmb{X}\) to a set of linearly independent rows we can use elementary row operations. Subtracting the first row from each of the others gives
\[ \begin{pmatrix} 1 & 1&0 & 1 & 0 \\ 0 & 0&0 & -1 & 1 \\ 0 & -1&1 & 0 & 0 \\ 0 & -1&1 & -1 & 1 \\ \end{pmatrix} \]
and subtracting the second and third rows from the fourth gives
\[ \begin{pmatrix} 1 & 1&0 & 1 & 0 \\ 0 & 0&0 & -1 & 1 \\ 0 & -1&1 & 0 & 0 \\ 0 & 0&0 & 0 & 0 \\ \end{pmatrix} \]
taking the first three rows as \(\pmb{\lambda}_1'\),\(\pmb{\lambda}_2'\) and \(\pmb{\lambda}_3'\), we find
\[ \begin{aligned} &\pmb{\lambda}_1'\pmb{\beta} = \mu+\alpha_1+\beta_1\\ &\pmb{\lambda}_2'\pmb{\beta} = \beta_2-\beta_1 \\ &\pmb{\lambda}_3'\pmb{\beta} = \alpha_2-\alpha_1\\ \end{aligned} \]
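The rank of \(\pmb{X}\), and with it the number of linearly independent estimable functions, can also be found numerically; a minimal sketch for this twoway design, where the last two lines check one estimable and one nonestimable function by asking whether adding \(\pmb{\lambda}'\) as an extra row increases the rank:

X=rbind(c(1,1,0,1,0), c(1,1,0,0,1), c(1,0,1,1,0), c(1,0,1,0,1))
qr(X)$rank                        # 3, so three linearly independent estimable functions
qr(rbind(X, c(0,-1,1,0,0)))$rank  # still 3: alpha_2 - alpha_1 is estimable
qr(rbind(X, c(0,1,1,0,0)))$rank   # 4: alpha_1 + alpha_2 is not estimable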
Let \(\pmb{a}\) be a vector such that \(\sum a_i=0\). Then \(\pmb{a'\beta}\) is called a contrast.
In the example above the second and third linear combinations are contrasts.
Let \(\pmb{\lambda'\beta}\) be an estimable function. Let \(\pmb{\hat{\beta}}\) be any solution to the normal equations \(\pmb{X'X\beta}=\pmb{X'y}\) and let \(\pmb{a}\) be any solution to \(\pmb{X'Xa}=\pmb{\lambda}\). Then the two estimators \(\pmb{\lambda'\hat{\beta}}\) and \(\pmb{a'X'y}\) have the following properties:

i. \(E[\pmb{\lambda'\hat{\beta}}]=\pmb{\lambda'\beta}\)

ii. \(E[\pmb{a'X'y}]=\pmb{\lambda'\beta}\)

that is, both are unbiased estimators of \(\pmb{\lambda'\beta}\).
proof
\[E[\pmb{\lambda'\hat{\beta}}]= \pmb{\lambda'}(\pmb{X'X})^{-}\pmb{X'X\beta}\]
by (7.2.4iii) \(\pmb{\lambda}'(\pmb{X'X})^{-}\pmb{X'X}=\pmb{\lambda}'\), and so
\[E[\pmb{\lambda'\hat{\beta}}]=\pmb{\lambda'\beta}\]
by (7.2.4ii) \(\pmb{X'Xa}=\pmb{\lambda}\), and so
\[E[\pmb{a'X'y}]=\pmb{a'X'}E[\pmb{y}]=\pmb{a'X'X\beta}=\pmb{\lambda'\beta}\]
In example (7.2.5) we saw that the linear function \(\pmb{\lambda'\beta} = \alpha_1-\alpha_2\) was estimable with \(\pmb{a} = \begin{pmatrix} 0 & 1/3 & -1/3 \end{pmatrix}'\), so now
\[ \begin{aligned} &\pmb{a'X'y} = \\ &\begin{pmatrix} 0 & 1/3 & -1/3 \end{pmatrix} \begin{pmatrix} 1 & 1& 1 & 1 & 1& 1\\ 1 & 1& 1 & 0&0&0\\ 0&0&0&1&1&1 \end{pmatrix} \begin{pmatrix} y_{11} \\ y_{12} \\y_{13} \\y_{21} \\y_{22} \\y_{23} \\ \end{pmatrix}=\\ &\begin{pmatrix} 0 & 1/3 & -1/3 \end{pmatrix} \begin{pmatrix} y_{..} \\ y_{1.}\\y_{2.} \end{pmatrix}=\\ &y_{1.}/3-y_{2.}/3 = \bar{y}_{1.}-\bar{y}_{2.} \end{aligned} \]
Alternatively, to use the estimator \(\pmb{\lambda'\hat{\beta}}\) we need a solution of the normal equations \(\pmb{X'X\hat{\beta}}=\pmb{X'y}\):
\[ \begin{aligned} &\begin{pmatrix} 6 & 3& 3\\ 3 & 3& 0\\ 3 & 0& 3\\ \end{pmatrix} \begin{pmatrix} \hat{\mu} \\ \hat{\alpha}_1 \\ \hat{\alpha}_2 \end{pmatrix}= \begin{pmatrix} y_{..} \\ y_{1.}\\y_{2.} \end{pmatrix}\\ &6\hat{\mu} + 3\hat{\alpha}_1 +3 \hat{\alpha}_2 = y_{..} \\ &3\hat{\mu}+ 3\hat{\alpha}_1 = y_{1.} \\ &3\hat{\mu} + 3\hat{\alpha}_2 = y_{2.} \\ \end{aligned} \]
The first equation is just the sum of the other two, so in effect we have two equations in three unknowns. Setting \(\hat{\mu}\) equal to an arbitrary constant we obtain
\[\pmb{\hat{\beta}} =\begin{pmatrix} \hat{\mu} \\ \hat{\alpha}_1 \\ \hat{\alpha}_2 \end{pmatrix} = \begin{pmatrix} 0 \\ \bar{y}_{1.}\\ \bar{y}_{2.} \end{pmatrix}+\hat{\mu}\begin{pmatrix} 1 \\ -1\\ -1 \end{pmatrix}\]
Finally
\[\pmb{\lambda'\hat{\beta}} = \begin{pmatrix} 0 & 1 & -1 \end{pmatrix} \begin{pmatrix} \hat{\mu} \\ \bar{y}_{1.}-\hat{\mu}\\ \bar{y}_{2.}-\hat{\mu}\end{pmatrix}=\bar{y}_{1.}-\bar{y}_{2.}\]
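As a numerical check, both estimators reduce to the difference of the group means. A short sketch with made-up responses for the oneway design:

X=cbind(1, c(1,1,1,0,0,0), c(0,0,0,1,1,1))
g.X=diag(c(0,1,1)/3)
y=c(10, 12, 11, 15, 14, 16)    # made-up responses
a=c(0, 1, -1)/3
lambda=c(0, 1, -1)
bhat=g.X%*%t(X)%*%y            # one solution of the normal equations
c(t(a)%*%t(X)%*%y, t(lambda)%*%bhat, mean(y[1:3])-mean(y[4:6]))  # all three agree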
Let \(\pmb{\lambda'\beta}\) be an estimable function. Let \(\pmb{\hat{\beta}}\) be any solution to the normal equations \(\pmb{X'X\beta}=\pmb{X'y}\) and let \(\pmb{a}\) be any solution to \(\pmb{X'Xa}=\pmb{\lambda}\). Then the variances of the two estimators \(\pmb{\lambda'\hat{\beta}}\) and \(\pmb{a'X'y}\) have the following properties:
i. \(var(\pmb{a'X'y})=\sigma^2\pmb{a'X'Xa}=\sigma^2\pmb{a'\lambda}\)

ii. \(var(\pmb{\lambda'\hat{\beta}})=\sigma^2\pmb{\lambda}'(\pmb{X'X})^{-}\pmb{\lambda}\)

iii. \(var(\pmb{\lambda'\hat{\beta}})\) is unique, that is, invariant under the choice of \(\pmb{a}\) or \((\pmb{X'X})^{-}\)
proof
\[ \begin{aligned} &var(\pmb{a'X'y}) = \\ &\pmb{a'X'}cov(\pmb{y})\pmb{Xa} = \\ &\pmb{a'X'}(\sigma^2\pmb{I})\pmb{Xa} = \\ &\sigma^2\pmb{a'X'}\pmb{Xa} = \\ &\sigma^2\pmb{a'}\pmb{\lambda} \end{aligned} \]
proofs of ii and iii omitted
Let \(\pmb{\lambda_1'\beta}\) and \(\pmb{\lambda_2'\beta}\) be two estimable functions. Then
\[cov(\pmb{\lambda_1'\hat{\beta}},\pmb{\lambda_2'\hat{\beta}})=\sigma^2\pmb{\lambda}_1'(\pmb{X'X})^{-}\pmb{\lambda}_2\]
proof similar to proof of theorem above
Let \(\pmb{\lambda'\beta}\) be an estimable function. Then the two estimators \(\pmb{\lambda'\hat{\beta}}\) and \(\pmb{a'X'y}\) are the best linear unbiased estimators (BLUE) of \(\pmb{\lambda'\beta}\).
proof omitted
Again we define
\[\text{SSE}= (\pmb{y}-\pmb{X\hat{\beta}})'(\pmb{y}-\pmb{X\hat{\beta}})\]
where \(\pmb{\hat{\beta}}\) is any solution of the normal equations. As before we have alternatively
\[\text{SSE}= \pmb{y}'\pmb{y}-\pmb{\hat{\beta}}'\pmb{X'y} = \pmb{y}'\left[\pmb{I}-\pmb{X(X'X)^{-}X'}\right]\pmb{y}\]
and we define
\[s^2=\text{SSE}/(n-k)\]
i. \(E[s^2]=\sigma^2\)

ii. \(s^2\) is invariant under the choice of \(\pmb{\hat{\beta}}\) or the choice of \(\pmb{(X'X)^{-}}\).
proof (of i)
Let \(\pmb{P}=\pmb{X(X'X)^{-}X'}\). Then \(\pmb{PX}=\pmb{X}\), \(\pmb{I-P}\) is idempotent with \(tr(\pmb{I-P})=n-k\), and so
\[E[\text{SSE}]=E\left[\pmb{y}'(\pmb{I-P})\pmb{y}\right]=tr\left[(\pmb{I-P})\sigma^2\pmb{I}\right]+(\pmb{X\beta})'(\pmb{I-P})(\pmb{X\beta})=(n-k)\sigma^2\]
proof of ii omitted
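Here is a quick numerical illustration of ii for the oneway example: the value of SSE, and therefore \(s^2\), is the same for two different generalized inverses (the responses are made up):

X=cbind(1, c(1,1,1,0,0,0), c(0,0,0,1,1,1))
y=c(10, 12, 11, 15, 14, 16)    # made-up responses
X.X=t(X)%*%X
g1=diag(c(0,1,1)/3)
g2=MASS::ginv(X.X)
sse1=t(y)%*%(diag(6) - X%*%g1%*%t(X))%*%y
sse2=t(y)%*%(diag(6) - X%*%g2%*%t(X))%*%y
c(sse1, sse2)                  # identical (up to rounding)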
If \(\pmb{y}\sim N(\pmb{X\beta}, \sigma^2\pmb{I})\), the maximum likelihood estimators are
\[ \begin{aligned} &\pmb{\hat{\beta}} = (\pmb{X'X})^{-}\pmb{X'y} \\ &\hat{\sigma}^2 = \frac1{n}(\pmb{y}-\pmb{X\hat{\beta}})'(\pmb{y}-\pmb{X\hat{\beta}}) \end{aligned} \]
proof omitted
Under the normal model
i. \(\pmb{\hat{\beta}} \sim N_p\left[ (\pmb{X'X})^{-}\pmb{X'X\beta}, \sigma^2(\pmb{X'X})^{-}\pmb{X'X}(\pmb{X'X})^{-} \right]\)

ii. \((n-k)s^2/\sigma^2\sim \chi^2(n-k)\)

iii. \(\pmb{\hat{\beta}}\) and \(s^2\) are independent.
proof omitted
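A small simulation can make ii plausible; this is only a sketch, with arbitrary parameter values and number of runs:

X=cbind(1, c(1,1,1,0,0,0), c(0,0,0,1,1,1))
P=X%*%MASS::ginv(t(X)%*%X)%*%t(X)   # X(X'X)^-X', invariant to the choice of inverse
n=6; k=2; sigma=2; beta=c(1, 2, 3)  # arbitrary choices
sims=replicate(2000, {
  y=X%*%beta + rnorm(n, 0, sigma)
  sum((y - P%*%y)^2)/sigma^2        # (n-k)s^2/sigma^2
})
c(mean(sims), n-k)                  # a chi-square(n-k) has mean n-k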
We discussed before that one often can change the parameters in order to make the problem solvable. Here is a formal discussion of this issue.
A reparametrization is a transformation of the non-full rank model \(\pmb{y}=\pmb{X\beta}+\pmb{\epsilon}\) to a full-rank model \(\pmb{y}=\pmb{Z\gamma}+\pmb{\epsilon}\), where \(\pmb{\gamma}=\pmb{U\beta}\) is a set of k linearly independent functions of \(\pmb{\beta}\). So we can write
\[\pmb{Z\gamma}=\pmb{ZU\beta}=\pmb{X\beta}\]
This holds for all \(\pmb{\beta}\), and so we have \(\pmb{ZU}=\pmb{X}\). Since \(\pmb{U}\) is \(k\times p\) of rank \(k<p\), the matrix \(\pmb{UU'}\) is nonsingular and we find \(\pmb{ZUU'}=\pmb{XU'}\) or
\[\pmb{Z}=\pmb{XU'}(\pmb{UU'})^{-1}\]
It can be seen that \(\pmb{Z}\) is full-rank and that therefore the normal equations have the solution
\[\pmb{\hat{\gamma}} = (\pmb{Z'Z})^{-1}\pmb{Z'y}\]
Since \(\pmb{Z\gamma}=\pmb{X\beta}\), the estimators \(\pmb{Z\hat{\gamma}}\) and \(\pmb{X\hat{\beta}}\) are also equal
\[\pmb{Z\hat{\gamma}}=\pmb{X\hat{\beta}}\]
\[s^2=\frac1{n-k}(\pmb{y-Z\hat{\gamma}})'(\pmb{y-Z\hat{\gamma}})\]
\[\text{SSE}=(\pmb{y-X\hat{\beta}})'(\pmb{y-X\hat{\beta}})=(\pmb{y-Z\hat{\gamma}})'(\pmb{y-Z\hat{\gamma}})\]
proof omitted
Consider the model
\[\pmb{y}=\pmb{X\beta}+\pmb{\epsilon}= \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1\\ 1 & 0 & 1 \end{pmatrix}\begin{pmatrix} \mu \\ \alpha_1 \\\alpha_2 \end{pmatrix}+ \begin{pmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{21} \\ \epsilon_{22} \end{pmatrix}\]
\(\pmb{X}\) has rank 2, so there are two linearly independent estimable functions. These can be chosen in any number of ways, for example \(\mu+\alpha_1\) and \(\mu+\alpha_2\). With this choice we have
\[\pmb{\gamma} = \begin{pmatrix} \mu+\alpha_1 \\ \mu+\alpha_2\end{pmatrix}=\begin{pmatrix} 1&1&0 \\ 1&0&1 \end{pmatrix}\begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \end{pmatrix}=\pmb{U\beta}\]
Let
\[\pmb{Z}=\begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1\\ 0 & 1 \end{pmatrix}\]
then \(\pmb{ZU}=\pmb{X}\) and therefore \(\pmb{Z\gamma}=\pmb{X\beta}\).
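We can compute \(\pmb{Z}\) from \(\pmb{X}\) and \(\pmb{U}\) with the formula above and check the example numerically; a short sketch, with made-up responses for the last step:

X=rbind(c(1,1,0), c(1,1,0), c(1,0,1), c(1,0,1))
U=rbind(c(1,1,0), c(1,0,1))
Z=X%*%t(U)%*%solve(U%*%t(U))
round(Z, 10)                    # the matrix Z given above
round(Z%*%U - X, 10)            # all zero, so ZU = X
y=c(10, 12, 15, 14)             # made-up responses
solve(t(Z)%*%Z)%*%t(Z)%*%y      # gamma-hat: the two group means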
A side condition is a \((p-k)\times p\) matrix \(\pmb{T}\) of rank \(p-k\) such that \(\pmb{T\beta}=\pmb{0}\), where the elements of \(\pmb{T\beta}\) are nonestimable functions.
Note that if one of the elements of \(\pmb{T\beta}\) were an estimable function, it would be a linear combination of the elements of \(\pmb{X'X\beta}\) and would therefore not add to the rank.
If \(\pmb{y=X\beta+\epsilon}\) and \(\pmb{T}\) is a side condition, then
\[\pmb{\hat{\beta}} = \left( \pmb{X'X} + \pmb{T'T} \right)^{-1}\pmb{X'y}\]
is the unique vector \(\pmb{\hat{\beta}}\) such that \(\pmb{X'X\hat{\beta}=X'y}\) and \(\pmb{T\hat{\beta}}=0\)
proof: the model and the side condition \(\pmb{T\beta}=\pmb{0}\) can be combined into
\[\begin{pmatrix} \pmb{y} \\ \pmb{0} \end{pmatrix}=\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}\pmb{\beta}+\begin{pmatrix} \pmb{\epsilon} \\ \pmb{0} \end{pmatrix}\]
and by the conditions of the theorem the matrix \(\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}\) is full-rank. Therefore \(\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}'\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}\) has an inverse, and we find
\[ \begin{aligned} &\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}'\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}\pmb{\hat{\beta}} = \begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}'\begin{pmatrix} \pmb{y} \\ \pmb{0} \end{pmatrix}\\ \\ &\pmb{\hat{\beta}} = \left(\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}'\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}\right)^{-1}\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}'\begin{pmatrix} \pmb{y} \\ \pmb{0} \end{pmatrix}\\ \\ &\pmb{\hat{\beta}} = \left(\begin{pmatrix} \pmb{X}' & \pmb{T}' \end{pmatrix}\begin{pmatrix} \pmb{X} \\ \pmb{T} \end{pmatrix}\right)^{-1}\begin{pmatrix} \pmb{X'} & \pmb{T'} \end{pmatrix}\begin{pmatrix} \pmb{y} \\ \pmb{0} \end{pmatrix}\\ \\ &\pmb{\hat{\beta}} = \left( \pmb{X'X} + \pmb{T'T} \right)^{-1}\pmb{X'y} \end{aligned} \]
Let’s return to example (7.2.18), where we used the model
\[y_{ij}=\mu+\alpha_i+\epsilon_{ij}\]
with i=1,2 and j=1,2.
Using theorem (7.2.4) we can easily see that \(\alpha_1+\alpha_2\) is not an estimable function. The side condition \(\alpha_1+\alpha_2=0\) can be written as \((0\text{ }1\text{ }1)\pmb{\beta}=0\), and so \(\pmb{T}=(0\text{ }1\text{ }1)\).
Now
\[ \begin{aligned} &\pmb{X'X+T'T} = \begin{pmatrix} 4 & 2& 2\\ 2 & 2& 0\\ 2 & 0& 2\\ \end{pmatrix}+\begin{pmatrix} 0 \\ 1 \\1 \end{pmatrix}(0\text{ }1\text{ }1) = \\ &\begin{pmatrix} 4 & 2& 2\\ 2 & 2& 0\\ 2 & 0& 2\\ \end{pmatrix} + \begin{pmatrix} 0 & 0 &0 \\ 0 &1 & 1\\ 0 &1 & 1 \end{pmatrix}= \begin{pmatrix} 4 & 2& 2\\ 2 & 3& 1\\ 2 & 1& 3\\ \end{pmatrix}\\ &(\pmb{X'X+T'T})^{-1} = \frac14\begin{pmatrix} 2 & -1& -1\\ -1 & 2& 0\\ -1 & 0& 2\\ \end{pmatrix}\\ &\pmb{\hat{\beta}} =(\pmb{X'X+T'T})^{-1}\pmb{X'y}=\\ &\frac14\begin{pmatrix} 2 & -1& -1\\ -1 & 2& 0\\ -1 & 0& 2\\ \end{pmatrix} \begin{pmatrix} 1 & 1& 1 & 1\\ 1 & 1& 0 & 0\\ 0 & 0& 1 & 1\\ \end{pmatrix} \begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22}\end{pmatrix}=\\ &\frac14\begin{pmatrix} 2 & -1& -1\\ -1 & 2& 0\\ -1 & 0& 2\\ \end{pmatrix} \begin{pmatrix} y_{..}\\ y_{1.}\\ y_{2.}\\ \end{pmatrix} = \\ &\frac14\begin{pmatrix} 2y_{..} - y_{1.}-y_{2.} \\ 2y_{1.}-y_{..} \\ 2y_{2.}-y_{..}\end{pmatrix}= \begin{pmatrix} \bar{y}_{..} \\ \bar{y}_{1.}-\bar{y}_{..}\\\bar{y}_{2.}-\bar{y}_{..} \end{pmatrix} \end{aligned} \]
because \(y_{1.}+y_{2.}=y_{..}\).
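Finally, a numerical check of the side-condition estimator for this example, again with made-up responses (the side-condition matrix is called Tmat below to avoid masking R's shorthand T for TRUE):

X=rbind(c(1,1,0), c(1,1,0), c(1,0,1), c(1,0,1))
Tmat=matrix(c(0,1,1), nrow=1)   # the side condition alpha_1 + alpha_2 = 0
y=c(10, 12, 15, 14)             # made-up responses
betahat=solve(t(X)%*%X + t(Tmat)%*%Tmat)%*%t(X)%*%y
c(betahat)
c(mean(y), mean(y[1:2])-mean(y), mean(y[3:4])-mean(y))   # the same three numbers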