We have data \(X_1,..,X_n\sim N(\mu_x,\sigma)\) and \(Y_1,..,Y_m\sim N(\mu_y,\sigma)\), \(\sigma\) unknown. (This is the so called two sample problem). Say we want to test
\[H_0:\mu_x=\mu_y\text{ vs. }H_0:\mu_x\ne\mu_y\]
The classical test is based on the test statistic
\[T=\frac{\bar{x}-\bar{y}}{s_p\sqrt{1/n+1/m}}\] where \(s_p^2=\frac{(n-1)s_x^2+(m-1)s_y^2}{n+m-2}\) is called the pooled standard deviation. Under the null hypothesis \(T\sim t(n+m-2)\) and the the test rejects the null hypothesis if \(|T|>qt_{\alpha/2,n+m-2}\).
Derive the two sample problem as a special case of ANOVA.
We have the model \(\pmb{y=X\beta+\epsilon}\) where
\[ \pmb{X} = \begin{pmatrix} \pmb{j_n} & \pmb{j_n}& 0 \\ \pmb{j_m} & 0 & \pmb{j_m} \\ \end{pmatrix}\\ \pmb{X'X}= \begin{pmatrix} n+m & n & m \\ n & n & 0 \\ m & 0 & m \end{pmatrix}\\ (\pmb{X'X})^{-}= \begin{pmatrix} 0 & 1/n & 0 \\ 0 & 0 & 1/m \\ \end{pmatrix}\\ \pmb{\hat{\beta}} = \begin{pmatrix} \mu\\\alpha_1\\\alpha_2 \end{pmatrix}= (\pmb{X'X})^{-}\pmb{X'y}=\\ \begin{pmatrix} 0 & 1/n & 0 \\ 0 & 0 & 1/m \\ \end{pmatrix} \begin{pmatrix} \pmb{j_n} & \pmb{j_m} \\ \pmb{j_n} & 0 \\ 0 & \pmb{j_m} \end{pmatrix} \begin{pmatrix} \pmb{x} \\ \pmb{y} \end{pmatrix}=\\ \begin{pmatrix} 0 & 1/n & 0 \\ 0 & 0 & 1/m \\ \end{pmatrix} \begin{pmatrix} \sum x_i+\sum y_j \\ \sum x_i\\ \sum y_j \end{pmatrix} = \begin{pmatrix} 0 \\ \bar{x}\\ \bar{y} \end{pmatrix} \] The null hypothesis of the two sample problem is equivalent to
\[H_0:\alpha_1-\alpha_2=0\text{ vs. }H_0:\alpha_1-\alpha_2\ne0\] which is a contrast with \(\pmb{c} = \begin{pmatrix} 0 &1 &-1 \end{pmatrix}'\). We can therefore use theorem (9.3.5). We find the test statistic
\[F = \frac{(\bar{x}-\bar{y})^2}{s^2(1/n+1/m)}\] Now
\[ \begin{aligned} &(n+m-2)s^2= \sum_{i,j} (y_{ij}-\bar{y}_{i.})^2=\\ &\sum_{i} (x_{i}-\bar{x})^2+\sum_{j} (y_{j}-\bar{y})^2 = \\ &(n-1)s_x^2+(m-1)s_y^2=s_p^2 \end{aligned} \] and so
\[F = \frac{(\bar{x}-\bar{y})^2}{s_p^2(1/n+1/m)}\]
and under the null hypothesis \(F\sim F(1, n+m-2)\). Clearly \(T^2=F\), and so
\[ \begin{aligned} &\alpha =P(\text{reject }H_0\vert H_0\text{ true}) = \\ &P(F>qf_{\alpha,1,n+m-2}) = \\ &1-P(F<qf(\alpha,1,n+m-2)) = \\ &1-P(T^2<qf_{\alpha,1,n+m-2}) = \\ &1-P\left(-\sqrt{qf_{\alpha,1,n+m-2}}<|T|<\sqrt{qf_{\alpha,1,n+m-2}}\right) = \\ &1-P\left(-qt_{\alpha/2,n+m-2}<|T|<qt_{\alpha/2,n+m-2}\right) = \\ &P\left(|T|>qt_{\alpha/2,n+m-2}\right) \end{aligned} \] and so the two tests are identical.
Note this derivation can also be done by considering a contrast \(\pmb{c} =\begin{pmatrix} 1 & -1 \end{pmatrix}\)
In an experiment an industrial engineer studied the effect of the type of coating and its thickness on the durability of a certain type of paint. There were three types of coatings labeled 1, 2 and 3, and three thickness levels labeled thin, medium and thick. The durability was measured in days. The data is
thin | medium | thick | |
---|---|---|---|
1 | 166, 154, 155, 156, 149 | 167, 171, 166, 165, 185 | 181, 185, 178, 178, 174 |
2 | 219, 241, 216, 220, 220 | 263, 241, 246, 245, 224 | 242, 258, 257, 242, 250 |
3 | 277, 276, 277, 278, 280 | 309, 281, 309, 302, 314 | 350, 348, 359, 340, 342 |
Analyze this data to see what effect(s) if any the type of coating and the thickness have on the durability. Which factor-level combination(s) is/are statistically significantly the best?
Clearly we have a balanced two way design. After entering the data into R and sorting the variable type by the mean durability we start by drawing the boxplots:
ggplot(data=df , aes(Type, Durability)) +
geom_boxplot()
ggplot(data=df , aes(Thickness, Durability)) +
geom_boxplot()
and so it seems that type has a strong and thickness a weak positive relationship with durability.
Here are the summary statistics:
sum.tbl <-
data.frame(
Type=1:3,
n=as.integer(tapply(df$Durability, df$Type, length)),
Mean=round(tapply(df$Durability, df$Type, mean), 1),
Sd=round(tapply(df$Durability, df$Type, sd), 2)
)
kable.nice(sum.tbl, do.row.names = FALSE)
Type | n | Mean | Sd |
---|---|---|---|
1 | 15 | 168.7 | 11.57 |
2 | 15 | 238.9 | 15.51 |
3 | 15 | 309.5 | 31.11 |
sum.tbl <-
data.frame(
Thickness=c("thin", "medium", "thick"),
n=as.integer(tapply(df$Durability, df$Thickness, length)),
Mean=round(tapply(df$Durability, df$Thickness, mean), 1),
Sd=round(tapply(df$Durability, df$Thickness, sd), 2)
)
kable.nice(sum.tbl, do.row.names = FALSE)
Thickness | n | Mean | Sd |
---|---|---|---|
thin | 15 | 218.9 | 51.87 |
medium | 15 | 239.2 | 57.06 |
thick | 15 | 258.9 | 71.82 |
Next we check for interaction:
mns = tapply(df$Durability, df[, -1], mean)
df1=data.frame(Type=rep(1:3, 3),
Means=c(mns),
Thickness =
factor(rep(c("thin", "medium", "thick"), each=3),
levels=c("thin", "medium", "thick"), ordered = TRUE))
ggplot(data=df1, aes(Type, Means, color=Thickness)) +
geom_point() + geom_line()
which indicates some interaction. Let’s check:
fit=aov(Durability~.^2 , data=df)
summary(fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## Type 2 148685 74342 946.37 < 2e-16
## Thickness 2 12001 6000 76.38 1.11e-13
## Type:Thickness 4 3959 990 12.60 1.65e-06
## Residuals 36 2828 79
and indeed the interaction term is significant. Therefore both factors are significant.
Let’s check the assumptions. Here is the residual vs. fits plot:
df <- data.frame(Residuals=resid(fit),
Fits = fitted(fit))
ggplot(data=df, aes(Fits, Residuals)) +
geom_point() +
geom_hline(yintercept = 0)
which shows that the equal variance assumption is justified.
The normal plot of residuals:
df <- data.frame(Residuals=resid(fit),
Fits = fitted(fit))
ggplot(data=df, aes(sample=Residuals)) +
geom_qq() + geom_qq_line()
give some hint of non-normal residuals but is not bad enough to be problematic.
Finally we use Tukey to do pairwise comparisons. Because of the interaction this is done for the factor-level combinations:
tuk <- TukeyHSD(fit)[[3]]
Now the interaction plot shows that the best combination is “3-thick” and the second best is “3-medium”, Tukey yields
tuk["3:thick-3:medium", ]
## diff lwr upr p adj
## 4.480000e+01 2.631799e+01 6.328201e+01 5.913594e-08
which shows that these two are stat. significantly different. Therefore “3-thick” is stat. significantly best.