Matters become considerably more difficult when we consider two (or more) factors with unequal numbers of replications, perhaps even with some factor-level combinations having zero counts. In this situation there turn out to be several ways to parametrize the same experiment, which can lead to different results, and there is no mathematical way to determine which is correct; in practice one needs subject matter knowledge to decide which model to use. When planning an experiment it is therefore highly recommended to aim for an at least approximately balanced design, to avoid these difficulties.
In this section we will discuss one approach called the cell means model:
\[y_{ijk}=\mu+\alpha_i+\beta_j+(\alpha\beta)_{ij}+\epsilon_{ijk} = \mu_{ij}+\epsilon_{ijk}\]
\[i=1,\dots,I;\; j=1,\dots,J;\; k=1,\dots,n_{ij}\]
Throughout this section we will use the following design for purposes of illustration: \(I=2\), \(J=3\), \(n_{12}=n_{21}=1\), \(n_{11}=n_{13}=n_{23}=2\) and \(n_{22}=3\), so \(N=\sum n_{ij} = 11\). Therefore we have the equations
\[ \begin{aligned} &y_{111} = \mu_{11}+\epsilon_{111}\\ &y_{112} = \mu_{11}+\epsilon_{112}\\ &y_{121} = \mu_{12}+\epsilon_{121}\\ &y_{131} = \mu_{13}+\epsilon_{131}\\ &y_{132} = \mu_{13}+\epsilon_{132}\\ &y_{211} = \mu_{21}+\epsilon_{211}\\ &y_{221} = \mu_{22}+\epsilon_{221}\\ &y_{222} = \mu_{22}+\epsilon_{222}\\ &y_{223} = \mu_{22}+\epsilon_{223}\\ &y_{231} = \mu_{23}+\epsilon_{231}\\ &y_{232} = \mu_{23}+\epsilon_{232}\\ \end{aligned} \]
In matrix form the model is \(\pmb{y=W\mu+\epsilon}\) or
\[ \begin{pmatrix} y_{111} \\ y_{112} \\ y_{121} \\ y_{131} \\ y_{132} \\ y_{211} \\ y_{221} \\ y_{222} \\ y_{223} \\ y_{231} \\ y_{232} \end{pmatrix}= \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ \end{pmatrix} \begin{pmatrix} \mu_{11} \\ \mu_{12} \\ \mu_{13} \\ \mu_{21} \\ \mu_{22} \\ \mu_{23} \\ \end{pmatrix}+ \begin{pmatrix} \epsilon_{111} \\ \epsilon_{112} \\ \epsilon_{121} \\ \epsilon_{131} \\ \epsilon_{132} \\ \epsilon_{211} \\ \epsilon_{221} \\ \epsilon_{222} \\ \epsilon_{223} \\ \epsilon_{231} \\ \epsilon_{232} \end{pmatrix} \]
Here \(\pmb{y}\) and \(\pmb{\epsilon}\) are \(11\times 1\) and \(\pmb{W}\) is \(11\times 6\); in general \(\pmb{y}\) and \(\pmb{\epsilon}\) are \(N\times 1\) and \(\pmb{W}\) is \(N\times IJ\).
Since \(\pmb{W}\) has full column rank, \(\pmb{W'W}\) is invertible and the least squares estimator is
\[\hat{\mu} = (\pmb{W'W})^{-1}\pmb{W'y}\]
Let’s use R to find \(\hat{\mu}\):
W=rbind(c(1 , 0 , 0 , 0 , 0 , 0),
c(1 , 0 , 0 , 0 , 0 , 0),
c(0 , 1 , 0 , 0 , 0 , 0),
c(0 , 0 , 1 , 0 , 0 , 0),
c(0 , 0 , 1 , 0 , 0 , 0),
c(0 , 0 , 0 , 1 , 0 , 0),
c(0 , 0 , 0 , 0 , 1 , 0),
c(0 , 0 , 0 , 0 , 1 , 0),
c(0 , 0 , 0 , 0 , 1 , 0),
c(0 , 0 , 0 , 0 , 0 , 1),
c(0 , 0 , 0 , 0 , 0 , 1))
t(W)%*%W
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 2 0 0 0 0 0
## [2,] 0 1 0 0 0 0
## [3,] 0 0 2 0 0 0
## [4,] 0 0 0 1 0 0
## [5,] 0 0 0 0 3 0
## [6,] 0 0 0 0 0 2
shows that \(\pmb{W'W}\) is a diagonal matrix with the respective sample sizes on the diagonal, so in general \((\pmb{W'W})^{-1}=\text{diag}(\frac1{n_{11}},\dots,\frac1{n_{IJ}})\). Of course \(\pmb{W'y}=(y_{11.},\dots,y_{IJ.})'\) is the vector of cell totals, and so
\[\pmb{\hat{\mu}} = \begin{pmatrix} \bar{y}_{11.} \\\bar{y}_{12.}\\\vdots\\\bar{y}_{IJ.} \end{pmatrix}=\pmb{\bar{y}}\] Also
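As a quick numeric check that the least squares estimate reproduces the cell means, here is a sketch for the toy design above; the responses are simulated, since the design has no real data attached:

```r
# design matrix W for the toy example above (I=2, J=3, N=11)
W = rbind(c(1,0,0,0,0,0), c(1,0,0,0,0,0),
          c(0,1,0,0,0,0),
          c(0,0,1,0,0,0), c(0,0,1,0,0,0),
          c(0,0,0,1,0,0),
          c(0,0,0,0,1,0), c(0,0,0,0,1,0), c(0,0,0,0,1,0),
          c(0,0,0,0,0,1), c(0,0,0,0,0,1))
set.seed(111)
y = rnorm(11, 10, 2)                       # simulated responses
muhat = c(solve(t(W)%*%W) %*% t(W) %*% y)  # least squares estimate
cellmeans = c(tapply(y, apply(W, 1, which.max), mean))
rbind(muhat, cellmeans)                    # the two rows agree
```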
\[cov(\pmb{\hat{\mu}}) =\sigma^2 (\pmb{W'W})^{-1}=\text{diag}\left(\frac{\sigma^2}{n_{11}},\dots,\frac{\sigma^2}{n_{IJ}}\right)\]
An unbiased estimator of \(\sigma^2\) is given by
\[s^2=\frac{\text{SSE}}{\nu_E}=\frac{(\pmb{y-W\hat{\mu}})'(\pmb{y-W\hat{\mu}})}{N-IJ}\] Alternatively we have
\[\text{SSE}=\pmb{y'[I-W(W'W)^{-1}W']y}=\sum_{ijk}(y_{ijk}-\bar{y}_{ij})^2\]
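The equality of the two forms of SSE is easy to verify numerically. A sketch with simulated responses on the toy design (writing \(\pmb{W}\) compactly via a cell index):

```r
# toy design via a cell index: n11=2, n12=1, n13=2, n21=1, n22=3, n23=2
cell = rep(1:6, c(2,1,2,1,3,2))
W = model.matrix(~ factor(cell) - 1)      # the 11 x 6 indicator matrix
set.seed(111)
y = rnorm(11, 10, 2)                      # simulated responses
P = W %*% solve(t(W)%*%W) %*% t(W)        # projection matrix W(W'W)^{-1}W'
sse1 = c(t(y) %*% (diag(11) - P) %*% y)   # quadratic-form version
sse2 = sum((y - ave(y, cell))^2)          # sum of within-cell squared deviations
c(sse1, sse2)                             # the two forms agree
```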
If \(n_{ij}\ge 2\) for all cells we can find the within-cell variances \(s^2_{ij}=\frac1{n_{ij}-1}\sum_{k=1}^{n_{ij}} (y_{ijk}-\bar{y}_{ij} )^2\), and then \(s^2\) can be written as the pooled estimator
\[s^2=\frac{\sum_{ij}(n_{ij}-1)s_{ij}^2}{N-IJ}\]
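A quick numeric check of the pooled form. Since within-cell variances need at least two observations, this sketch uses a slightly modified simulated design in which every \(n_{ij}\ge 2\):

```r
cell = rep(1:6, c(2,3,2,2,3,2))           # modified design: every cell has n_ij >= 2
set.seed(111)
y = rnorm(length(cell), 10, 2)            # simulated responses
n.ij = c(table(cell))
s2.ij = c(tapply(y, cell, var))           # within-cell variances
pooled = sum((n.ij - 1)*s2.ij)/(length(cell) - 6)
sse = sum((y - ave(y, cell))^2)           # SSE from within-cell deviations
c(pooled, sse/(length(cell) - 6))         # identical
```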
Notice that the cell means model does not have any explicit terms for main effects and/or interactions. These are “hard-coded” in to the \(\mu_{ij}\)’s. If we want to test for them we have to express them as contrasts.
Let’s consider the main effect of factor A. In the vector
\[\pmb{\mu} = \begin{pmatrix} \mu_{11} & \mu_{12} & \mu_{13} & \mu_{21} & \mu_{22} & \mu_{23} \end{pmatrix}'\]
the first three elements correspond to the first level of factor A, the last three to the second level. So for the main effect of A we can compare the mean of \(\mu_{11},\mu_{12},\mu_{13}\) to the mean of \(\mu_{21},\mu_{22},\mu_{23}\). The difference can be expressed as a contrast:
\[ \begin{aligned} \pmb{a'\mu} &= \begin{pmatrix} 1&1&1&-1&-1&-1 \end{pmatrix}\pmb{\mu}\\ &= \mu_{11}+\mu_{12}+\mu_{13}-\mu_{21}-\mu_{22}-\mu_{23}\\ &= (\mu_{11}-\mu_{21})+(\mu_{12}-\mu_{22})+(\mu_{13}-\mu_{23}) \end{aligned} \]
So testing \(H_0:\pmb{a'\mu}=0\) is equivalent to testing \(H_0: (\mu_{11}-\mu_{21})+(\mu_{12}-\mu_{22})+(\mu_{13}-\mu_{23})=0\).
Factor B has three levels, so we will need two contrasts:
\[ \begin{aligned} &\pmb{b_1'\mu} = 2(\mu_{11}+\mu_{21})-(\mu_{12}+\mu_{22})-(\mu_{13}+\mu_{23}) =\\ &\begin{pmatrix} 2&-1&-1&2&-1&-1 \\ \end{pmatrix}\pmb{\mu} \\ &\pmb{b_2'\mu} = (\mu_{12}+\mu_{22})-(\mu_{13}+\mu_{23}) =\\ &\begin{pmatrix} 0&1&-1&0&1&-1 \\ \end{pmatrix}\pmb{\mu} \\ \end{aligned} \]
Setting \(\pmb{B} = \begin{pmatrix} \pmb{b}_1' \\ \pmb{b}_2' \end{pmatrix}\), testing \(H_0:\pmb{B\mu}=0\) is equivalent to testing \(H_0: \mu_{11}+\mu_{21}=\mu_{12}+\mu_{22}=\mu_{13}+\mu_{23}\).
Finally for the interaction we find the test \(H_0: \mu_{11}-\mu_{21}=\mu_{12}-\mu_{22}=\mu_{13}-\mu_{23}\)
which also can be written in terms of contrasts:
\[\pmb{C} = \begin{pmatrix} 2&-1&-1&-2&1&1 \\ 0&1&-1&0&-1&1\end{pmatrix}\]
Here are the corresponding tests:

- \(H_0:\pmb{a'\mu}=0\)

\[F=\frac{\pmb{(a'\hat{\mu})'[a'(W'W)^{-1}a]^{-1}(a'\hat{\mu})}}{s^2}=\frac{(\sum_{ij}a_{ij}\bar{y}_{ij.})^2}{s^2\sum_{ij}a^2_{ij}/n_{ij}}\sim F(1, N-IJ)\]

- \(H_0:\pmb{B\mu}=0\)

\[F=\frac{\pmb{(B\hat{\mu})'[B(W'W)^{-1}B']^{-1}(B\hat{\mu})}}{\nu_B s^2}\sim F(\nu_B, N-IJ)\] where \(\nu_B\) is the number of rows of \(\pmb{B}\).

- \(H_0:\pmb{C\mu}=0\)

\[F=\frac{\pmb{(C\hat{\mu})'[C(W'W)^{-1}C']^{-1}(C\hat{\mu})}}{\nu_C s^2}\sim F(\nu_C, N-IJ)\] where \(\nu_C\) is the number of rows of \(\pmb{C}\).
Remington and Schork carried out an experiment to investigate the effects of smoking on physical activity. Fifteen people were classified according to their smoking habits and then randomly assigned to one of three physical activities. The time until maximum oxygen uptake, in minutes, was recorded. The data are
df=data.frame(
Time=c(12.8,13.5,11.2,16.2,17.8,22.6,19.3,18.9,9.2,7.5,13.2,8.1,16.2,16.1,17.8),
Smoking=factor(rep(c("None", "Heavy"), c(8, 7)),
levels = c("None", "Heavy"),
ordered = TRUE),
Activity=factor(c(rep(c("Bicycle", "Treadmill", "Step"), c(3, 2, 3)),
rep(c("Bicycle", "Treadmill", "Step"), c(2,2,3))),
levels = c("Bicycle", "Treadmill", "Step"),
ordered=TRUE)
)
kable.nice(df, do.row.names = FALSE)
Time | Smoking | Activity |
---|---|---|
12.8 | None | Bicycle |
13.5 | None | Bicycle |
11.2 | None | Bicycle |
16.2 | None | Treadmill |
17.8 | None | Treadmill |
22.6 | None | Step |
19.3 | None | Step |
18.9 | None | Step |
9.2 | Heavy | Bicycle |
7.5 | Heavy | Bicycle |
13.2 | Heavy | Treadmill |
8.1 | Heavy | Treadmill |
16.2 | Heavy | Step |
16.1 | Heavy | Step |
17.8 | Heavy | Step |
The sample sizes are
n=tapply(df[, 1], df[, 2:3], length)
n
## Activity
## Smoking Bicycle Treadmill Step
## None 3 2 3
## Heavy 2 2 3
N=sum(n)
We need
I=2;J=3
baryij.=c(t(tapply(df[, 1], df[, 2:3], mean))) # cell means, in the order mu11,...,mu23
n=c(t(n)) # cell sample sizes, same order
sse=sum( (df$Time-rep(baryij., n))^2)
s2=sse/(N-I*J)
a=c(1,1,1,-1,-1,-1)
FA=sum(a*baryij.)^2/s2/sum(a^2/n)
round(c(sse, s2, FA, 1-pf(FA, 1, N-I*J)), 4) # SSE, s2, F statistic, p value
## [1] 28.5767 3.1752 24.9272 0.0007
W.Winv=diag(1/n) # (W'W)^{-1}
B=rbind(c(2, -1, -1, 2, -1, -1),
c(0, 1, -1, 0, 1, -1))
muhat=cbind(baryij.)
ssB=t(B%*%muhat)%*%solve(B%*%W.Winv%*%t(B))%*%(B%*%muhat)
FB=(ssB/2)/(sse/(N-I*J))
round(c(FB, 1-pf(FB, 2, N-I*J)), 4)
## [1] 27.8014 0.0001
C=rbind(c(2, -1, -1, -2, 1, 1),
c(0, 1, -1, 0, -1, 1))
ssC=t(C%*%muhat)%*%solve(C%*%W.Winv%*%t(C))%*%(C%*%muhat)
FC=(ssC/2)/(sse/(N-I*J))
round(c(FC, 1-pf(FC, 2, N-I*J)), 4)
## [1] 0.7678 0.4922
Let’s compare this with
summary(aov(Time~Smoking*Activity, data=df))
## Df Sum Sq Mean Sq F value Pr(>F)
## Smoking 1 58.30 58.30 18.362 0.002035
## Activity 2 180.33 90.17 28.398 0.000129
## Smoking:Activity 2 4.88 2.44 0.768 0.492173
## Residuals 9 28.58 3.18
and we see that while the error sum of squares is the same and the conclusions agree (both main effects are significant whereas the interaction is not), the F statistics for the main effects are somewhat different. This is because aov uses a different parametrization and reports sequential sums of squares, which in an unbalanced design need not coincide with the contrast-based tests above.
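One way to see this concretely: with an unbalanced design the sequential sums of squares depend on the order in which the terms are entered. A sketch, re-entering the data from above:

```r
df = data.frame(
  Time = c(12.8,13.5,11.2,16.2,17.8,22.6,19.3,18.9,
           9.2,7.5,13.2,8.1,16.2,16.1,17.8),
  Smoking = rep(c("None","Heavy"), c(8,7)),
  Activity = c(rep(c("Bicycle","Treadmill","Step"), c(3,2,3)),
               rep(c("Bicycle","Treadmill","Step"), c(2,2,3))))
# sequential sums of squares: the SS attributed to each factor changes
# with the order of the terms, although the residual SS does not
anova(lm(Time ~ Smoking * Activity, data=df))
anova(lm(Time ~ Activity * Smoking, data=df))
```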
If it is clear that there is no interaction between the factors (perhaps because of the design of the experiment) we might wish to fit an additive model. Using the cell means approach this means we want to fit the constrained model
\[\pmb{y=W\mu+\epsilon}\text{ subject to }\pmb{C\mu=0}\]
Such a model can be fit using Lagrange multipliers.
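For instance, a standard form of the Lagrange-multiplier solution of the restricted least squares problem is \(\hat{\pmb{\mu}}_c=\hat{\pmb{\mu}}-(\pmb{W'W})^{-1}\pmb{C'}[\pmb{C(W'W)^{-1}C'}]^{-1}\pmb{C\hat{\mu}}\). A sketch with simulated responses on the toy design:

```r
cell = rep(1:6, c(2,1,2,1,3,2))           # toy design from earlier
W = model.matrix(~ factor(cell) - 1)
set.seed(111)
y = rnorm(11, 10, 2)                      # simulated responses
G = solve(t(W)%*%W)
muhat = G %*% t(W) %*% y                  # unconstrained least squares
C = rbind(c(2, -1, -1, -2, 1, 1),         # interaction contrasts
          c(0, 1, -1, 0, -1, 1))
# Lagrange multiplier solution of min ||y - W mu||^2 subject to C mu = 0
muc = muhat - G %*% t(C) %*% solve(C %*% G %*% t(C)) %*% C %*% muhat
round(c(C %*% muc), 10)                   # constraints hold: both entries are 0
```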
Sometimes it happens that \(n_{ij}=0\) for one (or a few) cells. It turns out that the cell means model is still viable, with the following change: the matrix \(\pmb{W}\) now has a column of all 0's for each empty cell. Therefore \(\pmb{W}\) is no longer of full column rank and \(\pmb{W'W}\) has no inverse. However, if \((\pmb{W'W})^{-1}\) is replaced by a generalized inverse \((\pmb{W'W})^{-}\) the above derivation still holds.
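As an illustration (with simulated responses and an assumed empty cell), suppose cell (1,2) has no observations. Then the second column of \(\pmb{W}\) is all 0's, \(\pmb{W'W}=\text{diag}(n_{11},0,n_{13},\dots)\) is singular, and a generalized inverse can be built by inverting only the nonzero diagonal entries:

```r
n = c(2, 0, 2, 1, 3, 2)                   # cell (1,2) is now empty, N = 10
W = matrix(0, sum(n), 6)
W[cbind(1:sum(n), rep(1:6, n))] = 1       # indicator matrix; column 2 is all 0
set.seed(111)
y = rnorm(sum(n), 10, 2)                  # simulated responses
d = diag(t(W) %*% W)                      # the cell counts, including the 0
Ginv = diag(ifelse(d > 0, 1/d, 0))        # a generalized inverse of W'W
muhat = c(Ginv %*% t(W) %*% y)            # cell means; 0 for the empty cell
round(muhat, 2)
```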