While in a regression problem the focus is usually on estimation, in the case of ANOVA it is generally on hypothesis testing.
In this section we will consider hypotheses about the \(\beta\)’s in the model \(\pmb{y=X\beta+\epsilon}\), where \(\pmb{X}\) is \(n\times p\) of rank \(k<p\le n\), and \(\pmb{y}\sim N_n(\pmb{X\beta},\sigma^2\pmb{I})\).
There is a close connection between estimable functions and whether or not a hypothesis can be tested:
A hypothesis such as \(H_0:\beta_1=\beta_2\) is called testable if there exist linearly independent estimable functions \(\pmb{\lambda_1'\beta}\), \(\dots\), \(\pmb{\lambda_k'\beta}\) such that \(H_0\) is true iff
\[\pmb{\lambda_1'\beta}= \dots=\pmb{\lambda_k'\beta}=0\]
As an example, say we have the model
\[y_{ij}=\mu+\alpha_i+\beta_j+\epsilon_{ij}\]
with \(i=1,2,3\) and \(j=1,2,3\), and we want to test \(H_0:\alpha_1=\alpha_2=\alpha_3\). We find
\[ \pmb{X\beta} = \begin{pmatrix} 1 & 1 & 0 & 0 & 1 & 0 & 0\\ 1 & 1 & 0 & 0 & 0 & 1 & 0\\ 1 & 1 & 0 & 0 & 0 & 0 & 1\\ 1 & 0 & 1 & 0 & 1 & 0 & 0\\ 1 & 0 & 1 & 0 & 0 & 1 & 0\\ 1 & 0 & 1 & 0 & 0 & 0 & 1\\ 1 & 0 & 0 & 1 & 1 & 0 & 0\\ 1 & 0 & 0 & 1 & 0 & 1 & 0\\ 1 & 0 & 0 & 1 & 0 & 0 & 1\\ \end{pmatrix} \begin{pmatrix} \mu \\ \alpha_1\\ \alpha_2\\ \alpha_3 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} = \begin{pmatrix} \mu + \alpha_1+ \beta_1 \\ \mu + \alpha_1+ \beta_2 \\ \mu + \alpha_1+ \beta_3 \\ \mu + \alpha_2+ \beta_1 \\ \mu + \alpha_2+ \beta_2 \\ \mu + \alpha_2+ \beta_3 \\ \mu + \alpha_3+ \beta_1 \\ \mu + \alpha_3+ \beta_2 \\ \mu + \alpha_3+ \beta_3 \\ \end{pmatrix} \] Now
\[(\mu + \alpha_1+ \beta_1)-(\mu + \alpha_2+ \beta_1)=\alpha_1-\alpha_2\] so \(\alpha_1-\alpha_2\) is an estimable function. Also
\[(\mu + \alpha_1+ \beta_1)+(\mu + \alpha_2+ \beta_1)-2(\mu + \alpha_3+ \beta_1)=\\ \alpha_1+\alpha_2-2\alpha_3\]
and so \(\alpha_1+\alpha_2-2\alpha_3\) is an estimable function. But
\[ \begin{aligned} &\alpha_1-\alpha_2 = 0\\ &\alpha_1+\alpha_2-2\alpha_3 = 0\\ \end{aligned} \] iff \(\alpha_1 = \alpha_2=\alpha_3\). Therefore \(H_0\) is testable and is equivalent to
\[ H_0: \begin{pmatrix} \alpha_1-\alpha_2 \\ \alpha_1+\alpha_2-2\alpha_3 \\ \end{pmatrix}= \begin{pmatrix} 0 \\ 0 \end{pmatrix} \]
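Estimability is also easy to check numerically: \(\pmb{\lambda'\beta}\) is estimable iff \(\pmb{\lambda}'\) lies in the row space of \(\pmb{X}\), that is, iff appending \(\pmb{\lambda}'\) as an extra row does not increase the rank. Here is a minimal R sketch for the \(3\times 3\) layout above (the object name X33 is ours; it is just the design matrix written out above, built via Kronecker products):

# design matrix of the 3x3 additive model: intercept, three alpha columns, three beta columns
X33 = cbind(1, kronecker(diag(3), rep(1, 3)), kronecker(rep(1, 3), diag(3)))
qr(X33)$rank                                   # 5, so X33 is not of full rank 7
lambda1 = c(0, 1, -1,  0, 0, 0, 0)             # alpha1 - alpha2
lambda2 = c(0, 1,  1, -2, 0, 0, 0)             # alpha1 + alpha2 - 2*alpha3
# estimable iff adding the row leaves the rank unchanged: both checks return TRUE
qr(rbind(X33, lambda1))$rank == qr(X33)$rank
qr(rbind(X33, lambda2))$rank == qr(X33)$rank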
So if we want to test \(H_0:\beta_1=\dots=\beta_q\) we can find a set of linearly independent estimable functions such that \(H_0:\beta_1=\dots=\beta_q\) is equivalent to
\[
H_0:\pmb{\gamma}_1 =
\begin{pmatrix}
\pmb{\lambda_1'\beta} \\
\pmb{\lambda_2'\beta} \\
\vdots \\
\pmb{\lambda_l'\beta} \\
\end{pmatrix}=
\begin{pmatrix} 0 \\ 0 \\\vdots \\ 0 \end{pmatrix}
\]
It is also possible to find
\[ \pmb{\gamma}_2 = \begin{pmatrix} \pmb{\lambda_{l+1}'\beta} \\ \vdots \\ \pmb{\lambda_k'\beta} \end{pmatrix} \]
such that the functions \(\pmb{\lambda_i'\beta}\), \(i=1,\dots,k\), are linearly independent and estimable. Let \(\pmb{\gamma} = \begin{pmatrix} \pmb{\gamma}_1 \\ \pmb{\gamma}_2 \end{pmatrix}\), then we can reparametrize from the non-full rank model \(\pmb{y=X\beta+\epsilon}\) to the full rank model
\[\pmb{y=Z\gamma+\epsilon=Z_1\gamma_1+Z_2\gamma_2+\epsilon}\] Now under the null hypothesis we have the reduced model \(\pmb{y}=\pmb{Z}_2\pmb{\gamma}_2^{*}+\pmb{\epsilon}^{*}\). Also, we have the following result:
Consider the partitioned model \(\pmb{y=X\beta+\epsilon=X_1\beta_1+X_2\beta_2+\epsilon}\). If \(\pmb{X_2'X_1=0}\), any estimate of \(\pmb{\beta^{*}_2}\) in the reduced model \(\pmb{y=X_2\beta_2^{*}+\epsilon^{*}}\) is also an estimate of \(\pmb{\beta_2}\) in the full model.
proof follows from (6.3.4)
So we know that the estimate of \(\gamma_2^{*}\) is the same as the estimate of \(\gamma_2\) if the columns of \(\pmb{Z}_2\) are orthogonal to the columns of \(\pmb{Z}_1\), that is if \(\pmb{Z_2'Z_1=0}\). This is typically true for the balanced models considered here.
Since \(\pmb{y=Z\gamma+\epsilon}\) is a full-rank model, we can use the theorems of section 6.6. The details of the test are collected in the following ANOVA table:
\[ \begin{array}{cccc} \text{Source} & \text{df} & \text{SS} & \text{F} \\ \hline \\ \text{Due to }\gamma_1 & l & \text{SS}(\pmb{\hat{\gamma}}_1|\pmb{\hat{\gamma}}_2)=\pmb{\hat{\gamma}'Z'y}-\pmb{\hat{\gamma}_2'Z_2'y} & \frac{\text{SS}(\pmb{\hat{\gamma}}_1|\pmb{\hat{\gamma}}_2)/l}{\text{SSE}/(n-k)}\\ \text{ adjusted for } \gamma_2\\ \text{Error} & n-k & \text{SSE}=\pmb{y'y-\hat{\gamma}'Z'y} \\ \text{Total} & n-1 & \text{SST}=\pmb{y'y}-n\bar{y}^2\\ \hline \\ \end{array} \]
The difficulty with this approach is finding the matrix \(\pmb{Z}\). However, we also have \(\text{SSE}=\pmb{y'y-\hat{\beta}'X'y}\) and (7.2.18), so
\[\pmb{y'y-\hat{\beta}'X'y}=\pmb{y'y-\hat{\gamma}'Z'y}\]
or
\[\pmb{\hat{\beta}'X'y}=\pmb{\hat{\gamma}'Z'y}\]
The same is true for the reduced model obtained under the null hypothesis, and so we have the ANOVA table for testing \(H_0:\beta_1=\dots=\beta_q\) in balanced non-full-rank models:
\[ \begin{array}{cccc} \text{Source} & \text{df} & \text{SS} & \text{F} \\ \hline \\ \text{Due to }\beta_1 & l & \text{SS}(\pmb{\hat{\beta}}_1|\pmb{\hat{\beta}}_2)=\pmb{\hat{\beta}'X'y}-\pmb{\hat{\beta}_2'X_2'y} & \frac{\text{SS}(\pmb{\hat{\beta}}_1|\pmb{\hat{\beta}}_2)/l}{\text{SSE}/(n-k)}\\ \text{ adjusted for } \beta_2\\ \text{Error} & n-k & \text{SSE}=\pmb{y'y-\hat{\beta}'X'y} \\ \text{Total} & n-1 & \text{SST}=\pmb{y'y}-n\bar{y}^2\\ \hline \\ \end{array} \]
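As a computational aside, the full-versus-reduced comparison in this table is straightforward to code. The sketch below is ours (the function name ss.table and its arguments are not from the text); it uses MASS::ginv as a generalized inverse so that it also works for non-full-rank \(\pmb{X}\), and it assumes the reduced model is obtained by dropping columns of \(\pmb{X}\), as in the balanced models considered here:

ss.table = function(X, X2, y) {
  # least squares fits for the full and the reduced model via a generalized inverse
  bhat  = MASS::ginv(t(X) %*% X) %*% t(X) %*% y
  bhat2 = MASS::ginv(t(X2) %*% X2) %*% t(X2) %*% y
  n = nrow(X)
  k = qr(X)$rank
  l = k - qr(X2)$rank                       # degrees of freedom of the hypothesis
  ss.adj = t(bhat) %*% t(X) %*% y - t(bhat2) %*% t(X2) %*% y
  sse = t(y) %*% y - t(bhat) %*% t(X) %*% y
  FTS = (ss.adj/l)/(sse/(n - k))
  c(SS = ss.adj, df = l, SSE = sse, F = FTS, p = 1 - pf(FTS, l, n - k))
}

For the two-way example worked out below, ss.table(X, X[, -c(2:4)], y1) should reproduce the hand computations.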
If \(\pmb{y}\sim N_n(\pmb{X\beta}, \sigma^2\pmb{I})\), where \(\pmb{X}\) is \(n\times p\) of rank \(k<p\le n\), if \(\pmb{C}\) is \(m\times p\) of rank \(m\le k\) such that \(\pmb{C\beta}\) is a set of \(m\) linearly independent estimable functions, and if \(\pmb{\hat{\beta}}=(\pmb{X'X})^-\pmb{X'y}\), then
\(\pmb{C(X'X)^-C'}\) is nonsingular
\(\pmb{C\hat{\beta}}\sim N_m[\pmb{C\beta}, \sigma^2\pmb{C(X'X)^-C'}]\)
\(\text{SSH}/\sigma^2=(\pmb{C\hat{\beta}})'[\pmb{C(X'X)^-C'}]^{-1}\pmb{C\hat{\beta}}/\sigma^2 \sim \chi^2(m,\lambda)\), where \(\lambda=(\pmb{C\beta})'[\pmb{C(X'X)^-C'}]^{-1}\pmb{C\beta}/(2\sigma^2)\)
\(\text{SSE}/\sigma^2=\pmb{y'}[\pmb{I-X(X'X)^-X'}]\pmb{y}/\sigma^2 \sim \chi^2(n-k)\)
SSH and SSE are independent
proof omitted
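The theorem also gives a second, direct route to the test statistic. Here is a minimal sketch (the function name general.linear.test is ours; it assumes the rows of C are linearly independent estimable functions, so that \(\pmb{C(X'X)^-C'}\) is invertible, and it uses MASS::ginv):

general.linear.test = function(X, y, C) {
  n = nrow(X); k = qr(X)$rank; m = nrow(C)
  XtXg = MASS::ginv(t(X) %*% X)                       # a generalized inverse of X'X
  bhat = XtXg %*% t(X) %*% y
  Cb = C %*% bhat
  SSH = t(Cb) %*% solve(C %*% XtXg %*% t(C)) %*% Cb   # hypothesis sum of squares
  SSE = t(y) %*% y - t(bhat) %*% t(X) %*% y
  FTS = (SSH/m)/(SSE/(n - k))
  c(F = FTS, p = 1 - pf(FTS, m, n - k))
}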
As an example of all of the above, consider an additive two-way model without interaction and with one observation per cell:
\[y_{ij}=\mu+\alpha_i+\beta_j+\epsilon_{ij}\] with \(i=1,2,3\) and \(j=1,2\).
Let’s use R for some of the work:
make.X=function(I, J) {
  # build the design matrix of the additive two-way model:
  # an intercept column, I columns for the alpha's and J columns for the beta's
  X=NULL
  for(i in 1:I) {
    tmp=matrix(0, J, I)
    tmp[, i]=1
    X=rbind(X, cbind(tmp, diag(J)))
  }
  X=cbind(1, X)
  X
}
X=make.X(3, 2)
qr(X)$rank
## [1] 4
t(X)%*%X
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 6 2 2 2 3 3
## [2,] 2 2 0 0 1 1
## [3,] 2 0 2 0 1 1
## [4,] 2 0 0 2 1 1
## [5,] 3 1 1 1 3 0
## [6,] 3 1 1 1 0 3
Say we want to test \(H_0:\alpha_1=\alpha_2=\alpha_3\). This can be written as
\[H_0:\alpha_1-\alpha_2=0\text{ and } \alpha_1-\alpha_3=0\]
so \(H_0\) is testable if \(\alpha_1-\alpha_2=0\text{ and } \alpha_1-\alpha_3=0\) are estimable.
Let’s write
\[\alpha_1-\alpha_2 = \begin{pmatrix} 0 & 1 & -1& 0 & 0 & 0 \\ \end{pmatrix}\pmb{\beta}=\pmb{\lambda_1'\beta}\] then
\[ \begin{pmatrix} 0 & 1/2 & -1/2& 0 & 0 & 0 \\ \end{pmatrix} \begin{pmatrix} 6 & 2 & 2 & 2 & 3 & 3 \\ 2 & 2 & 0 & 0 & 1 & 1 \\ 2 & 0 & 2 & 0 & 1 & 1\\ 2 & 0 & 0 & 2 & 1 & 1\\ 3 & 1 & 1 & 1 & 3 & 0\\ 3 & 1 & 1 & 1 & 0 & 3 \end{pmatrix}= \begin{pmatrix} 0 & 1 & -1& 0 & 0 & 0 \\ \end{pmatrix} \]
Similarly we have
\[\alpha_1-\alpha_3 = \begin{pmatrix} 0 & 1 & 0& -1 & 0 & 0 \\ \end{pmatrix}\pmb{\beta}=\pmb{\lambda_2'\beta}\] and
\[ \begin{pmatrix} 0 & 1/2 & 0& -1/2 & 0 & 0 \\ \end{pmatrix} \pmb{X'X}= \begin{pmatrix} 0 & 1 & 0&-1 & 0 & 0 \\ \end{pmatrix} \] If we need a complete set of linearly independent estimable functions we can take rows of \(\pmb{X}\) and differences of rows of \(\pmb{X}\). One solution is
\[ \begin{pmatrix} 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 1 & -1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & -1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\\alpha_3 \\\beta_1 \\\beta_2 \end{pmatrix}= \begin{pmatrix} \mu + \alpha_1 +\beta_1\\ \beta_1-\beta_2 \\ \alpha_1-\alpha_2 \\ 0 \\ \alpha_1-\alpha_3\\ 0 \end{pmatrix} \] As we needed two estimable functions to express the null hypothesis, the sum of squares will have two degrees of freedom.
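Both estimability checks are easy to verify numerically with X = make.X(3, 2) from above:

XtX = t(X) %*% X
# a'X'X reproduces lambda' in both cases
rbind(c(0, 1/2, -1/2, 0, 0, 0)) %*% XtX    # (0, 1, -1, 0, 0, 0)
rbind(c(0, 1/2, 0, -1/2, 0, 0)) %*% XtX    # (0, 1, 0, -1, 0, 0)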
The normal equations are given by
\[ \begin{pmatrix} 6 & 2 & 2 & 2 & 3 & 3 \\ 2 & 2 & 0 & 0 & 1 & 1 \\ 2 & 0 & 2 & 0 & 1 & 1\\ 2 & 0 & 0 & 2 & 1 & 1\\ 3 & 1 & 1 & 1 & 3 & 0\\ 3 & 1 & 1 & 1 & 0 & 3 \end{pmatrix} \begin{pmatrix} \hat{\mu} \\ \hat{\alpha_1} \\ \hat{\alpha_2} \\ \hat{\alpha_3} \\ \hat{\beta_1} \\ \hat{\beta_2} \end{pmatrix}= \begin{pmatrix} y_{..} \\ y_{1.} \\ y_{2.} \\ y_{3.} \\ y_{.1} \\ y_{.2} \\ \end{pmatrix} \]
Adding the side conditions \(\hat{\alpha}_1+\hat{\alpha}_2+\hat{\alpha}_3=0\) and \(\hat{\beta}_1+\hat{\beta}_2=0\), we find the solutions
\[ \begin{aligned} &\hat{\mu} =\bar{y}_{..} \\ &\hat{\alpha}_1 =\bar{y}_{1.}-\bar{y}_{..} \\ &\hat{\alpha}_2 =\bar{y}_{2.}-\bar{y}_{..} \\ &\hat{\alpha}_3 =\bar{y}_{3.}-\bar{y}_{..} \\ &\hat{\beta}_1 =\bar{y}_{.1}-\bar{y}_{..} \\ &\hat{\beta}_2 =\bar{y}_{.2}-\bar{y}_{..} \end{aligned} \]
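These closed-form solutions can be verified numerically by appending the two side conditions as extra rows to the normal equations and solving the resulting consistent system by least squares; the response y0 below is made up purely for this check:

set.seed(111)
y0 = X %*% c(5, 1, 0, -1, -2, 2) + rnorm(6)   # any parameter values would do here
A = rbind(t(X) %*% X,
          c(0, 1, 1, 1, 0, 0),                # side condition alpha1+alpha2+alpha3 = 0
          c(0, 0, 0, 0, 1, 1))                # side condition beta1+beta2 = 0
b = c(t(X) %*% y0, 0, 0)
solved = qr.solve(A, b)                       # least squares solution of the augmented system
closed = c(mean(y0),
           mean(y0[1:2]) - mean(y0), mean(y0[3:4]) - mean(y0), mean(y0[5:6]) - mean(y0),
           mean(y0[c(1, 3, 5)]) - mean(y0), mean(y0[c(2, 4, 6)]) - mean(y0))
round(rbind(solved, closed), 3)               # the two rows agree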
For the test \(H_0:\alpha_1=\alpha_2=\alpha_3\) we need \(\pmb{\hat{\beta}'X'y}\), which we denote by
\[ \begin{aligned} &\text{SS}(\mu, \alpha, \beta) = \pmb{\hat{\beta}'X'y} =\\ &\begin{pmatrix} \hat{\mu} & \hat{\alpha}_1 & \hat{\alpha}_2 & \hat{\alpha}_3 & \hat{\beta}_1 & \hat{\beta}_2 \end{pmatrix} \begin{pmatrix} y_{..} \\ y_{1.} \\ y_{2.} \\ y_{3.} \\ y_{.1} \\ y_{.2} \\ \end{pmatrix} = \\ &\bar{y}_{..}y_{..}+\sum_{i=1}^3 (\bar{y}_{i.}-\bar{y}_{..})y_{i.}+\sum_{j=1}^2 (\bar{y}_{.j}-\bar{y}_{..})y_{.j}=\\ &y_{..}^2/6+\sum_{i=1}^3 (y_{i.}/2-y_{..}/6)y_{i.}+\sum_{j=1}^2 (y_{.j}/3-y_{..}/6)y_{.j}=\\ &y_{..}^2/6+\left( \sum_{i=1}^3 y_{i.}^2/2- y_{..}^2/6\right)+\left( \sum_{j=1}^2 y_{.j}^2/3- y_{..}^2/6\right) \end{aligned} \] where we used the fact that \(\sum_i y_{i.}=\sum_j y_{.j}=y_{..}\).
For the error sum of squares we find
\[ \begin{aligned} &\pmb{y'y-\hat{\beta}'X'y}= \\ &\sum_{i,j} y_{ij}^2- y_{..}^2/6-\left( \sum_{i=1}^3 y_{i.}^2/2- y_{..}^2/6\right)-\left( \sum_{j=1}^2 y_{.j}^2/3- y_{..}^2/6\right) \end{aligned} \]
Next we need \(\pmb{\hat{\beta}_2'X_2'y}\) for the reduced model
\[y_{ij}=\mu+\beta_j+\epsilon_{ij}\]
where \(\mu\) now plays the role of \(\mu+\alpha\), the common value of \(\mu+\alpha_i\) under the null hypothesis.
Now the normal equations are found as follows
X2=X[, -c(2:4)]  # drop the three alpha columns
X2
## [,1] [,2] [,3]
## [1,] 1 1 0
## [2,] 1 0 1
## [3,] 1 1 0
## [4,] 1 0 1
## [5,] 1 1 0
## [6,] 1 0 1
t(X2)%*%X2
## [,1] [,2] [,3]
## [1,] 6 3 3
## [2,] 3 3 0
## [3,] 3 0 3
and adding the side condition \(\hat{\beta_1}+\hat{\beta}_2=0\) we have the system of equations
\[ \begin{aligned} &6\hat{\mu} +3\hat{\beta_1}+3\hat{\beta}_2 = y_{..}\\ &3\hat{\mu} +3\hat{\beta_1} = y_{.1}\\ & \hat{\beta_1}+\hat{\beta}_2=0 \\ \end{aligned} \] which we solve:
\[ \begin{aligned} &6\hat{\mu} +3(\hat{\beta_1}+\hat{\beta}_2) = 6\hat{\mu}=y_{..}\\ &\hat{\mu}=\bar{y}_{..}\\ &\text{I-2*II}: -3\hat{\beta_1}+3\hat{\beta}_2 = y_{..}-2y_{.1} \\ &-3\hat{\beta_1}-3\hat{\beta}_1 = y_{..}-2y_{.1} \\ &\hat{\beta_1} = (2y_{.1}-y_{..})/6 = \bar{y}_{.1}-\bar{y}_{..}\\ &\hat{\beta_2} = -(\bar{y}_{.1}-\bar{y}_{..}) = \bar{y}_{.2}-\bar{y}_{..}\\ \end{aligned} \]
\[ \begin{aligned} &\text{SS}(\mu, \beta) = \pmb{\hat{\beta}_2'X_2'y} = \\ &\begin{pmatrix} \hat{\mu} & \hat{\beta}_1 & \hat{\beta}_2 \end{pmatrix} \begin{pmatrix} y_{..} \\ y_{.1} \\ y_{.2} \\ \end{pmatrix} = \\ &\bar{y}_{..}y_{..}+\sum_{j=1}^2 (\bar{y}_{.j}-\bar{y}_{..})y_{.j}=\\ &y_{..}^2/6+\sum_{j=1}^2 (y_{.j}/3-y_{..}/6)y_{.j}=\\ &y_{..}^2/6+\left( \sum_{j=1}^2 y_{.j}^2/3- y_{..}^2/6\right) \end{aligned} \] Finally
\[\text{SS}(\alpha|\mu,\beta)=\pmb{\hat{\beta}'X'y}-\pmb{\hat{\beta}_2'X_2'y}=\sum_{i=1}^3 y_{i.}^2/2- y_{..}^2/6\] and we summarize the test for \(H_0:\alpha_1=\alpha_2=\alpha_3\) in the ANOVA table
\[ \begin{array}{cccc} \text{Source} & \text{df} & \text{SS} & \text{F} \\ \hline \\ \text{Due to }\alpha & 2 & \text{SS}(\alpha|\mu,\beta)=\sum_{i} y_{i.}^2/2- y_{..}^2/6 & \frac{\text{SS}(\alpha|\mu,\beta)/2}{\text{SSE}/2}\\ \text{ adjusted for } \beta,\mu\\ \text{Error} & 2 & \text{SSE}=\pmb{y'y-\hat{\beta}'X'y} \\ \text{Total} & 5 & \text{SST}=\pmb{y'y}-n\bar{y}^2 \\ \hline \\ \end{array} \]
Let’s generate some data and see how this works. We do this so that the null hypothesis is true in y1 and false in y2:
X=make.X(3, 2)
X
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 1 0 0 1 0
## [2,] 1 1 0 0 0 1
## [3,] 1 0 1 0 1 0
## [4,] 1 0 1 0 0 1
## [5,] 1 0 0 1 1 0
## [6,] 1 0 0 1 0 1
n=nrow(X)
k=qr(X)$rank
beta1=rbind(5, 0, 0, 0, -2, 2)    # all alpha's equal: H0 is true
beta2=rbind(5, -3, -1, 4, -2, 2)  # alpha's differ: H0 is false
epsilon=rnorm(6, 0, 1)
y1=cbind(round(X%*%beta1+epsilon, 3))
y2=cbind(round(X%*%beta2+epsilon, 3))
# Find ydots and betahats
I=3;J=2
ydots1=c(t(X)%*%y1)          # (y.., y1., y2., y3., y.1, y.2)
tmp=ydots1/apply(X, 2, sum)  # divide each total by its number of observations
tmp[-1]=tmp[-1]-tmp[1]       # subtract the grand mean from the alpha's and beta's
betahat1=tmp
ydots2=c(t(X)%*%y2)
tmp=ydots2/apply(X, 2, sum)
tmp[-1]=tmp[-1]-tmp[1]
betahat2=tmp
rbind(round(betahat1, 3), round(betahat2, 3))
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 4.648 0.365 -0.129 -0.236 -1.975 1.976
## [2,] 4.648 -2.636 -1.129 3.764 -1.975 1.976
# Find sse
sse1=t(y1)%*%y1-rbind(betahat1)%*%t(X)%*%y1
sse2=t(y2)%*%y2-rbind(betahat2)%*%t(X)%*%y2
round(c(sse1, sse2), 5)
## [1] 1.12942 1.12942
# Find salpha given mu, beta
ssalpha1=sum(ydots1[1+1:I]^2/J)-ydots1[1]^2/(I*J)
ssalpha2=sum(ydots2[1+1:I]^2/J)-ydots2[1]^2/(I*J)
round(c(ssalpha1, ssalpha2), 3)
## [1] 0.410 44.784
# Find F statistics
F1=(ssalpha1/(I-1))/(sse1/(n-k))
F2=(ssalpha2/(I-1))/(sse2/(n-k))
round(c(F1, F2), 1)
## [1] 0.4 39.7
# Find p values
c(1-pf(F1, I-1, n-k), 1-pf(F2, I-1, n-k))
## [1] 0.73370245 0.02459893
and so we correctly reject the null hypothesis in the second case but not in the first.
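As a cross-check, the same F statistics can also be computed from the general linear hypothesis form of the theorem above, with \(\pmb{C}\) built from the contrasts \(\alpha_1-\alpha_2\) and \(\alpha_1-\alpha_3\) (a sketch using MASS::ginv):

C = rbind(c(0, 1, -1, 0, 0, 0),
          c(0, 1, 0, -1, 0, 0))
XtXg = MASS::ginv(t(X) %*% X)
bh1 = XtXg %*% t(X) %*% y1
SSH1 = t(C %*% bh1) %*% solve(C %*% XtXg %*% t(C)) %*% (C %*% bh1)
(SSH1/nrow(C))/(sse1/(n-k))   # reproduces F1 from above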
Let’s apply this to the hearing aid data. Instead of deriving the equations all over again, we proceed by analogy:
I=4;J=24
X=make.X(I, J)
n=nrow(X)
k=qr(X)$rank
k
## [1] 27
y=as.matrix(hearingaid[, 3, drop=FALSE])
ydots=c(t(X)%*%y)
tmp=ydots/apply(X, 2, sum)
tmp[-1]=tmp[-1]-tmp[1]
betahat=tmp
round(betahat, 3)
## [1] 28.312 4.438 1.354 -3.062 -2.729 -3.812 -4.312 -0.312 -7.812
## [10] 2.688 -0.312 -1.312 0.188 8.188 2.188 -0.312 2.688 3.688
## [19] 7.688 5.188 -1.312 0.188 -8.812 9.688 -6.812 -11.312 -6.312
## [28] -1.312 11.688
# Find sse
sse=t(y)%*%y-rbind(betahat)%*%t(X)%*%y
round(sse/c(1, n-k), 3)
## [1] 2506.542 36.327
# Find salpha given mu, beta
ssalpha=sum(ydots[1+1:I]^2/J)-ydots[1]^2/(I*J)
round(ssalpha/c(1, I-1), 3)
## [1] 920.458 306.819
# Find F statistics
FTS=(ssalpha/(I-1))/(sse/(n-k))
round(FTS, 3)
## Score
## Score 8.446
# Find p value
1-pf(FTS, I-1, n-k)
## Score
## Score 7.412012e-05
and so we reject the null hypothesis: the lists are not all equal.
How about letting R do all the work?
summary(aov(Score~as.factor(Subject)+as.factor(List), data=hearingaid))
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(Subject) 23 3232 140.51 3.868 6.96e-06
## as.factor(List) 3 920 306.82 8.446 7.41e-05
## Residuals 69 2507 36.33
and so we have found all the relevant numbers!