Things become considerably more difficult when we consider two (or more) factors with unequal numbers of replications, possibly even with some factor-level combinations having zero counts. In this situation there turn out to be several ways to parametrize the same experiment, which can lead to different results, and there is no mathematical way to determine which is correct. In practice one needs to use subject matter knowledge to decide which model to use. When planning an experiment it is therefore highly recommended to aim for an at least approximately balanced design, to avoid these difficulties.

In this section we will discuss one approach called the cell means model:

\[y_{ijk}=\mu+\alpha_i+\beta_j+(\alpha\beta)_{ij}+\epsilon_{ijk} = \mu_{ij}+\epsilon_{ijk}\]

\[i=1,..,I;\; j=1,..,J;\; k=1,..,n_{ij}\]

Unconstrained Model

Throughout this section we will use the following setup for purposes of illustration: \(I=2\), \(J=3\), \(n_{12}=n_{21}=1\), \(n_{11}=n_{13}=n_{23}=2\) and \(n_{22}=3\), so \(N=\sum n_{ij} = 11\). Therefore we have the equations

\[ \begin{aligned} &y_{111} = \mu_{11}+\epsilon_{111}\\ &y_{112} = \mu_{11}+\epsilon_{112}\\ &y_{121} = \mu_{12}+\epsilon_{121}\\ &y_{131} = \mu_{13}+\epsilon_{131}\\ &y_{132} = \mu_{13}+\epsilon_{132}\\ &y_{211} = \mu_{21}+\epsilon_{211}\\ &y_{221} = \mu_{22}+\epsilon_{221}\\ &y_{222} = \mu_{22}+\epsilon_{222}\\ &y_{223} = \mu_{22}+\epsilon_{223}\\ &y_{231} = \mu_{23}+\epsilon_{231}\\ &y_{232} = \mu_{23}+\epsilon_{232}\\ \end{aligned} \]

In matrix form the model is \(\pmb{y=W\mu+\epsilon}\) or

\[ \begin{pmatrix} y_{111} \\ y_{112} \\ y_{121} \\ y_{131} \\ y_{132} \\ y_{211} \\ y_{221} \\ y_{222} \\ y_{223} \\ y_{231} \\ y_{232} \end{pmatrix}= \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ \end{pmatrix} \begin{pmatrix} \mu_{11} \\ \mu_{12} \\ \mu_{13} \\ \mu_{21} \\ \mu_{22} \\ \mu_{23} \\ \end{pmatrix}+ \begin{pmatrix} \epsilon_{111} \\ \epsilon_{112} \\ \epsilon_{121} \\ \epsilon_{131} \\ \epsilon_{132} \\ \epsilon_{211} \\ \epsilon_{221} \\ \epsilon_{222} \\ \epsilon_{223} \\ \epsilon_{231} \\ \epsilon_{232} \end{pmatrix} \]

Here \(\pmb{y}\) and \(\pmb{\epsilon}\) are \(11\times 1\) and \(\pmb{W}\) is \(11\times 6\); in general \(\pmb{y}\) and \(\pmb{\epsilon}\) are \(N\times 1\) and \(\pmb{W}\) is \(N\times IJ\).

Since \(\pmb{W}\) has full column rank we have

\[\hat{\mu} = (\pmb{W'W})^{-1}\pmb{W'y}\]

Let’s use R to find \(\pmb{W'W}\):

W=rbind(c(1 , 0 , 0 , 0 , 0 , 0),
c(1 , 0 , 0 , 0 , 0 , 0),
c(0 , 1 , 0 , 0 , 0 , 0),
c(0 , 0 , 1 , 0 , 0 , 0),
c(0 , 0 , 1 , 0 , 0 , 0),
c(0 , 0 , 0 , 1 , 0 , 0),
c(0 , 0 , 0 , 0 , 1 , 0),
c(0 , 0 , 0 , 0 , 1 , 0),
c(0 , 0 , 0 , 0 , 1 , 0),
c(0 , 0 , 0 , 0 , 0 , 1),
c(0 , 0 , 0 , 0 , 0 , 1))
t(W)%*%W
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    2    0    0    0    0    0
## [2,]    0    1    0    0    0    0
## [3,]    0    0    2    0    0    0
## [4,]    0    0    0    1    0    0
## [5,]    0    0    0    0    3    0
## [6,]    0    0    0    0    0    2

shows that \(\pmb{W'W}\) is a diagonal matrix with the respective sample sizes on the diagonal, so in general \((\pmb{W'W})^{-1}=diag(\frac1{n_{11}}, ..,\frac1{n_{IJ}})\). Of course \(\pmb{W'y}=(y_{11.}, .., y_{IJ.})'\), where \(y_{ij.}=\sum_k y_{ijk}\), and so

\[\pmb{\hat{\mu}} = \begin{pmatrix} \bar{y}_{11.} \\\bar{y}_{12.}\\\vdots\\\bar{y}_{IJ.} \end{pmatrix}=\pmb{\bar{y}}\] Also

\[ \begin{aligned} &cov(\pmb{\hat{\mu}}) =\sigma^2 (\pmb{W'W})^{-1}=\\ &diag(\frac{\sigma^2}{n_{11}}, ..,\frac{\sigma^2}{n_{IJ}}) \end{aligned} \]
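As a quick numerical check (the responses below are made up for illustration), we can verify that the least squares estimate coincides with the vector of cell means:

```r
# made-up responses for the 11 observations in the layout above
y = c(12.8, 13.1, 17.0, 20.2, 19.8, 8.4, 10.6, 11.0, 10.1, 16.5, 17.1)
cell = rep(1:6, c(2, 1, 2, 1, 3, 2))       # cell index of each observation
W = matrix(0, 11, 6); W[cbind(1:11, cell)] = 1   # the same W as above
muhat = solve(t(W) %*% W) %*% t(W) %*% y   # (W'W)^{-1} W'y
cellmeans = tapply(y, cell, mean)          # within-cell averages
all.equal(as.numeric(muhat), as.numeric(cellmeans))
```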

An unbiased estimator of \(\sigma^2\) is given by

\[s^2=\frac{\text{SSE}}{\nu_E}=\frac{(\pmb{y-W\hat{\mu}})'(\pmb{y-W\hat{\mu}})}{N-IJ}\] Alternatively we have

\[\text{SSE}=\pmb{y'[I-W(W'W)^{-1}W']y}=\sum_{ijk}(y_{ijk}-\bar{y}_{ij})^2\]

If \(n_{ij}\ge 2\) for all cells we can find the within-cell variances \(s^2_{ij}=\frac1{n_{ij}-1}\sum_{k=1}^{n_{ij}} (y_{ijk}-\bar{y}_{ij} )^2\), and then \(s^2\) can be written as the pooled estimator

\[s^2=\frac{\sum_{ij}(n_{ij}-1)s_{ij}^2}{N-IJ}\]
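The two expressions for SSE can also be checked numerically; here is a sketch with made-up data in the same layout:

```r
cell = rep(1:6, c(2, 1, 2, 1, 3, 2))        # cell index of each observation
y = c(12.8, 13.1, 17.0, 20.2, 19.8, 8.4, 10.6, 11.0, 10.1, 16.5, 17.1)
W = matrix(0, 11, 6); W[cbind(1:11, cell)] = 1
H = W %*% solve(t(W) %*% W) %*% t(W)        # projection ("hat") matrix
sse1 = c(t(y) %*% (diag(11) - H) %*% y)     # y'[I - W(W'W)^{-1}W']y
sse2 = sum((y - ave(y, cell))^2)            # sum over ijk of (y_ijk - ybar_ij)^2
all.equal(sse1, sse2)
```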


Notice that the cell means model does not have any explicit terms for main effects and/or interactions. These are “hard-coded” into the \(\mu_{ij}\)’s. If we want to test for them we have to express them as contrasts.

Let’s consider the main effect of factor A. In the vector

\[\pmb{\mu} = \begin{pmatrix} \mu_{11} & \mu_{12} & \mu_{13} & \mu_{21} & \mu_{22} & \mu_{23} \end{pmatrix}'\]

the first three elements correspond to the first level of factor A, the last three to the second level. So for the main effect of A we can compare the mean of \(\mu_{11},\mu_{12},\mu_{13}\) to the mean of \(\mu_{21},\mu_{22},\mu_{23}\). The difference can be expressed as a contrast:

\[ \begin{aligned} &\pmb{a'\mu} = \mu_{11}+\mu_{12}+\mu_{13}-\mu_{21}-\mu_{22}-\mu_{23}\\ &\begin{pmatrix} 1&1&1&-1&-1&-1 \\ \end{pmatrix}\pmb{\mu} = \\ &(\mu_{11}-\mu_{21})+(\mu_{12}-\mu_{22})+(\mu_{13}-\mu_{23}) \end{aligned} \]

So the hypothesis \(H_0:\pmb{a'\mu}=0\) is equivalent to \(H_0: (\mu_{11}-\mu_{21})+(\mu_{12}-\mu_{22})+(\mu_{13}-\mu_{23})=0\).

Factor B has three levels, so we will need two contrasts:

\[ \begin{aligned} &\pmb{b_1'\mu} = 2(\mu_{11}+\mu_{21})-(\mu_{12}+\mu_{22})-(\mu_{13}+\mu_{23}) =\\ &\begin{pmatrix} 2&-1&-1&2&-1&-1 \\ \end{pmatrix}\pmb{\mu} \\ &\pmb{b_2'\mu} = (\mu_{12}+\mu_{22})-(\mu_{13}+\mu_{23}) =\\ &\begin{pmatrix} 0&1&-1&0&1&-1 \\ \end{pmatrix}\pmb{\mu} \\ \end{aligned} \]

Setting \(\pmb{B} = \begin{pmatrix} \pmb{b}_1' \\ \pmb{b}_2' \end{pmatrix}\), the hypothesis \(H_0:\pmb{B\mu}=0\) is equivalent to \(H_0: \mu_{11}+\mu_{21}=\mu_{12}+\mu_{22}=\mu_{13}+\mu_{23}\).

Finally for the interaction we find the test \(H_0: \mu_{11}-\mu_{21}=\mu_{12}-\mu_{22}=\mu_{13}-\mu_{23}\)

which also can be written in terms of contrasts:

\[\pmb{C} = \begin{pmatrix} 2&-1&-1&-2&1&1 \\ 0&1&-1&0&-1&1\end{pmatrix}\]
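Note that the rows of \(\pmb{C}\) are just the elementwise products of the main effect contrasts, which is a convenient way to construct interaction contrasts in general:

```r
a  = c(1, 1, 1, -1, -1, -1)   # contrast for factor A
b1 = c(2, -1, -1, 2, -1, -1)  # first contrast for factor B
b2 = c(0, 1, -1, 0, 1, -1)    # second contrast for factor B
rbind(a*b1, a*b2)             # the rows of C
```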

Here are the corresponding tests:

  • \(H_0:\pmb{a'\mu}=0\)

\[F=\frac{\pmb{(a'\hat{\mu})'[a'(W'W)^{-1}a]^{-1}(a'\hat{\mu})}}{s^2}=\frac{(\sum_{ij}a_{ij}\bar{y}_{ij.})^2}{s^2\sum_{ij}a^2_{ij}/n_{ij}}\sim F(1, N-IJ)\]

  • \(H_0:\pmb{B\mu}=0\)

\[F=\frac{\pmb{(B\hat{\mu})'[B(W'W)^{-1}B']^{-1}(B\hat{\mu})}}{\nu_B s^2}\sim F(\nu_B, N-IJ)\] where \(\nu_B\) is the number of rows of \(\pmb{B}\).

  • Interaction test \(H_0:\pmb{C\mu}=0\)

\[F=\frac{\pmb{(C\hat{\mu})'[C(W'W)^{-1}C']^{-1}(C\hat{\mu})}}{\nu_C s^2}\sim F(\nu_C, N-IJ)\] where \(\nu_C\) is the number of rows of \(\pmb{C}\).
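All three tests follow the same pattern, so we can wrap them in a small helper function (hypothetical, not part of the text; `L` is the contrast matrix, `muhat` the vector of cell means, `n` the vector of cell counts):

```r
contrast.F = function(L, muhat, n, sse, N, IJ) {
  L = rbind(L)                    # make sure L is a matrix
  WtWinv = diag(1/n, length(n))   # (W'W)^{-1}
  Lmu = L %*% muhat
  ssh = c(t(Lmu) %*% solve(L %*% WtWinv %*% t(L)) %*% Lmu)
  nu = nrow(L)                    # degrees of freedom of the hypothesis
  Fstat = (ssh/nu)/(sse/(N - IJ))
  c(F = Fstat, p.value = 1 - pf(Fstat, nu, N - IJ))
}
```

For a single contrast \(\pmb{a}\) this reduces to the first formula above; for \(\pmb{B}\) and \(\pmb{C}\) it carries out the matrix version.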

Example

Remington and Schork carried out an experiment to investigate the effects of smoking on physical activity. 15 people were classified according to their smoking habits and then randomly assigned to one of three physical activities. The time until maximum oxygen uptake, in minutes, was recorded. The data is

df=data.frame(
  Time=c(12.8,13.5,11.2,16.2,17.8,22.6,19.3,18.9,9.2,7.5,13.2,8.1,16.2,16.1,17.8),
  Smoking=factor(rep(c("None", "Heavy"), c(8, 7)), 
                levels = c("None", "Heavy"),
                ordered = TRUE),
  Activity=factor(c(rep(c("Bycicle", "Treatmill", "Step"), c(3, 2, 3)),
                    rep(c("Bycicle", "Treatmill", "Step"), c(2,2,3))),
                levels = c("Bycicle", "Treatmill", "Step"),
                ordered=TRUE)
)
kable.nice(df, do.row.names = FALSE)
Time Smoking Activity
12.8 None Bycicle
13.5 None Bycicle
11.2 None Bycicle
16.2 None Treatmill
17.8 None Treatmill
22.6 None Step
19.3 None Step
18.9 None Step
9.2 Heavy Bycicle
7.5 Heavy Bycicle
13.2 Heavy Treatmill
8.1 Heavy Treatmill
16.2 Heavy Step
16.1 Heavy Step
17.8 Heavy Step

The sample sizes and within cell means are

n=tapply(df[, 1], df[, 2:3], length)
n
##        Activity
## Smoking Bycicle Treatmill Step
##   None        3         2    3
##   Heavy       2         2    3
N=sum(n)

We need

I=2;J=3
baryij.=c(t(tapply(df[, 1], df[, 2:3], mean)))
n=c(t(n))
sse=sum( (df$Time-rep(baryij., n))^2)
s2=sse/(N-I*J)
  • Test for Smoking:
a=c(1,1,1,-1,-1,-1)
FA=sum(a*baryij.)^2/s2/sum(a^2/n)
round(c(sse/c(1,N-I*J), FA, 1-pf(FA, 1, N-I*J)), 4)
## [1] 28.5767  3.1752 24.9272  0.0007
  • Test for Activity:
W.Winv=diag(1/n)
B=rbind(c(2, -1, -1, 2, -1, -1),
        c(0, 1, -1, 0, 1, -1))
muhat=cbind(baryij.)
ssB=t(B%*%muhat)%*%solve(B%*%W.Winv%*%t(B))%*%(B%*%muhat)
FB=(ssB/2)/(sse/(N-I*J))
round(c(FB, 1-pf(FB, 2, N-I*J)), 4)
## [1] 27.8014  0.0001
  • Test for Interaction:
C=rbind(c(2, -1, -1, -2, 1, 1),
        c(0, 1, -1, 0, -1, 1))
ssC=t(C%*%muhat)%*%solve(C%*%W.Winv%*%t(C))%*%(C%*%muhat)
FC=(ssC/2)/(sse/(N-I*J))
round(c(FC, 1-pf(FC, 2, N-I*J)), 4)
## [1] 0.7678 0.4922

Let’s compare this with

summary(aov(Time~Smoking*Activity, data=df))
##                  Df Sum Sq Mean Sq F value   Pr(>F)
## Smoking           1  58.30   58.30  18.362 0.002035
## Activity          2 180.33   90.17  28.398 0.000129
## Smoking:Activity  2   4.88    2.44   0.768 0.492173
## Residuals         9  28.58    3.18

and we see that, while the error sum of squares is the same and the conclusions agree (both factors are significant whereas the interaction is not), the values of the F statistics are somewhat different. This is because with an unbalanced design aov uses sequential (Type I) sums of squares, so each term is adjusted only for the terms entered before it, whereas the contrast tests above adjust each term for all the others.
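The order dependence of the sequential sums of squares is easy to see directly (sketch; the data frame is a condensed version of the df built above):

```r
# rebuild the data; factor level order does not affect the sums of squares
df = data.frame(
  Time = c(12.8, 13.5, 11.2, 16.2, 17.8, 22.6, 19.3, 18.9,
           9.2, 7.5, 13.2, 8.1, 16.2, 16.1, 17.8),
  Smoking = rep(c("None", "Heavy"), c(8, 7)),
  Activity = c(rep(c("Bycicle", "Treatmill", "Step"), c(3, 2, 3)),
               rep(c("Bycicle", "Treatmill", "Step"), c(2, 2, 3))))
s1 = summary(aov(Time ~ Smoking * Activity, data = df))[[1]]
s2 = summary(aov(Time ~ Activity * Smoking, data = df))[[1]]
# sum of squares for Smoking entered first vs entered after Activity
c(first = s1$"Sum Sq"[1], after.Activity = s2$"Sum Sq"[2])
```

With a balanced design the two numbers would agree; here they do not.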

Additive Model

If it is clear that there is no interaction between the factors (maybe because of the design of the experiment) we might wish to fit an additive model. Using the cell means approach this means we want to fit the constrained model

\[\pmb{y=W\mu+\epsilon}\text{ subject to }\pmb{C\mu=0}\]

Such a model can be fit using Lagrange multipliers.
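Here is a sketch of how this might be done (made-up data; the layout is the one used throughout this section). Minimizing \(\pmb{(y-W\mu)'(y-W\mu)}\) subject to \(\pmb{C\mu=0}\) leads to the linear system \(\pmb{W'W\mu}+\pmb{C'\lambda}=\pmb{W'y}\), \(\pmb{C\mu}=\pmb{0}\):

```r
cell = rep(1:6, c(2, 1, 2, 1, 3, 2))              # cell index of each observation
y = c(12.8, 13.1, 17.0, 20.2, 19.8, 8.4, 10.6, 11.0, 10.1, 16.5, 17.1)
W = matrix(0, 11, 6); W[cbind(1:11, cell)] = 1
C = rbind(c(2, -1, -1, -2, 1, 1),
          c(0, 1, -1, 0, -1, 1))                  # interaction contrasts
# stack the normal equations and the constraints into one linear system
A = rbind(cbind(t(W) %*% W, t(C)),
          cbind(C, matrix(0, 2, 2)))
b = c(t(W) %*% y, 0, 0)
mu.c = solve(A, b)[1:6]                           # constrained estimate of mu
round(C %*% mu.c, 10)                             # constraints are satisfied
```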

Missing Values

Sometimes it happens that \(n_{ij}=0\) for one (or a few) cells. It turns out that the cell means model is still viable, with the following change: the matrix \(\pmb{W}\) now has columns of all 0’s, one for each cell with zero count. Therefore \(\pmb{W}\) is not of full rank and \(\pmb{W'W}\) no longer has an inverse. However, if \((\pmb{W'W})^{-1}\) is replaced by a generalized inverse \((\pmb{W'W})^{-}\) the derivation above still holds.
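A quick sketch (made-up data, using MASS::ginv for the generalized inverse): say cell (1,2) is empty, so the second column of \(\pmb{W}\) is all 0’s:

```r
library(MASS)                                # for ginv
cell = c(1, 1, 3, 3, 4, 5, 5, 5, 6, 6)       # no observation falls in cell 2
W = matrix(0, 10, 6); W[cbind(1:10, cell)] = 1
y = c(12, 13, 20, 21, 9, 11, 10, 12, 16, 17) # made-up responses
muhat = ginv(t(W) %*% W) %*% t(W) %*% y      # (W'W)^- W'y
round(c(muhat), 2)  # observed cells get their means; the empty cell gets 0
```

The mean of the empty cell is of course not estimable; the generalized inverse simply sets that component to 0.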