Problem 1

We have data \(X_1,..,X_n\sim N(\mu_x,\sigma)\) and \(Y_1,..,Y_m\sim N(\mu_y,\sigma)\), \(\sigma\) unknown. (This is the so called two sample problem). Say we want to test

\[H_0:\mu_x=\mu_y\text{ vs. }H_0:\mu_x\ne\mu_y\]

The classical test is based on the test statistic

\[T=\frac{\bar{x}-\bar{y}}{s_p\sqrt{1/n+1/m}}\] where \(s_p^2=\frac{(n-1)s_x^2+(m-1)s_y^2}{n+m-2}\) is called the pooled standard deviation. Under the null hypothesis \(T\sim t(n+m-2)\) and the the test rejects the null hypothesis if \(|T|>qt_{\alpha/2,n+m-2}\).

Derive the two sample problem as a special case of ANOVA.

We have the model \(\pmb{y=X\beta+\epsilon}\) where

\[ \pmb{X} = \begin{pmatrix} \pmb{j_n} & \pmb{j_n}& 0 \\ \pmb{j_m} & 0 & \pmb{j_m} \\ \end{pmatrix}\\ \pmb{X'X}= \begin{pmatrix} n+m & n & m \\ n & n & 0 \\ m & 0 & m \end{pmatrix}\\ (\pmb{X'X})^{-}= \begin{pmatrix} 0 & 1/n & 0 \\ 0 & 0 & 1/m \\ \end{pmatrix}\\ \pmb{\hat{\beta}} = \begin{pmatrix} \mu\\\alpha_1\\\alpha_2 \end{pmatrix}= (\pmb{X'X})^{-}\pmb{X'y}=\\ \begin{pmatrix} 0 & 1/n & 0 \\ 0 & 0 & 1/m \\ \end{pmatrix} \begin{pmatrix} \pmb{j_n} & \pmb{j_m} \\ \pmb{j_n} & 0 \\ 0 & \pmb{j_m} \end{pmatrix} \begin{pmatrix} \pmb{x} \\ \pmb{y} \end{pmatrix}=\\ \begin{pmatrix} 0 & 1/n & 0 \\ 0 & 0 & 1/m \\ \end{pmatrix} \begin{pmatrix} \sum x_i+\sum y_j \\ \sum x_i\\ \sum y_j \end{pmatrix} = \begin{pmatrix} 0 \\ \bar{x}\\ \bar{y} \end{pmatrix} \] The null hypothesis of the two sample problem is equivalent to

\[H_0:\alpha_1-\alpha_2=0\text{ vs. }H_0:\alpha_1-\alpha_2\ne0\] which is a contrast with \(\pmb{c} = \begin{pmatrix} 0 &1 &-1 \end{pmatrix}'\). We can therefore use theorem (9.3.5). We find the test statistic

\[F = \frac{(\bar{x}-\bar{y})^2}{s^2(1/n+1/m)}\] Now

\[ \begin{aligned} &(n+m-2)s^2= \sum_{i,j} (y_{ij}-\bar{y}_{i.})^2=\\ &\sum_{i} (x_{i}-\bar{x})^2+\sum_{j} (y_{j}-\bar{y})^2 = \\ &(n-1)s_x^2+(m-1)s_y^2=s_p^2 \end{aligned} \] and so

\[F = \frac{(\bar{x}-\bar{y})^2}{s_p^2(1/n+1/m)}\]

and under the null hypothesis \(F\sim F(1, n+m-2)\). Clearly \(T^2=F\), and so

\[ \begin{aligned} &\alpha =P(\text{reject }H_0\vert H_0\text{ true}) = \\ &P(F>qf_{\alpha,1,n+m-2}) = \\ &1-P(F<qf(\alpha,1,n+m-2)) = \\ &1-P(T^2<qf_{\alpha,1,n+m-2}) = \\ &1-P\left(-\sqrt{qf_{\alpha,1,n+m-2}}<|T|<\sqrt{qf_{\alpha,1,n+m-2}}\right) = \\ &1-P\left(-qt_{\alpha/2,n+m-2}<|T|<qt_{\alpha/2,n+m-2}\right) = \\ &P\left(|T|>qt_{\alpha/2,n+m-2}\right) \end{aligned} \] and so the two tests are identical.

Note this derivation can also be done by considering a contrast \(\pmb{c} =\begin{pmatrix} 1 & -1 \end{pmatrix}\)

Problem 2

In an experiment an industrial engineer studied the effect of the type of coating and its thickness on the durability of a certain type of paint. There were three types of coatings labeled 1, 2 and 3, and three thickness levels labeled thin, medium and thick. The durability was measured in days. The data is

thin medium thick
1 166, 154, 155, 156, 149 167, 171, 166, 165, 185 181, 185, 178, 178, 174
2 219, 241, 216, 220, 220 263, 241, 246, 245, 224 242, 258, 257, 242, 250
3 277, 276, 277, 278, 280 309, 281, 309, 302, 314 350, 348, 359, 340, 342

Analyze this data to see what effect(s) if any the type of coating and the thickness have on the durability. Which factor-level combination(s) is/are statistically significantly the best?

Clearly we have a balanced two way design. After entering the data into R and sorting the variable type by the mean durability we start by drawing the boxplots:

ggplot(data=df , aes(Type, Durability)) +
  geom_boxplot()

ggplot(data=df , aes(Thickness, Durability)) +
  geom_boxplot()

and so it seems that type has a strong and thickness a weak positive relationship with durability.

Here are the summary statistics:

sum.tbl <-
  data.frame(
    Type=1:3,
    n=as.integer(tapply(df$Durability, df$Type, length)),
    Mean=round(tapply(df$Durability, df$Type, mean), 1),
    Sd=round(tapply(df$Durability, df$Type, sd), 2)
)
kable.nice(sum.tbl, do.row.names = FALSE)
Type n Mean Sd
1 15 168.7 11.57
2 15 238.9 15.51
3 15 309.5 31.11
sum.tbl <-
  data.frame(
    Thickness=c("thin", "medium", "thick"),
    n=as.integer(tapply(df$Durability, df$Thickness, length)),
    Mean=round(tapply(df$Durability, df$Thickness, mean), 1),
    Sd=round(tapply(df$Durability, df$Thickness, sd), 2)
)
kable.nice(sum.tbl, do.row.names = FALSE)
Thickness n Mean Sd
thin 15 218.9 51.87
medium 15 239.2 57.06
thick 15 258.9 71.82

Next we check for interaction:

mns = tapply(df$Durability, df[, -1], mean)
df1=data.frame(Type=rep(1:3, 3),
              Means=c(mns),
              Thickness = 
              factor(rep(c("thin", "medium", "thick"), each=3), 
                     levels=c("thin", "medium", "thick"), ordered = TRUE))
ggplot(data=df1, aes(Type, Means, color=Thickness)) +
  geom_point() +  geom_line() 

which indicates some interaction. Let’s check:

fit=aov(Durability~.^2 , data=df)
summary(fit)
##                Df Sum Sq Mean Sq F value   Pr(>F)
## Type            2 148685   74342  946.37  < 2e-16
## Thickness       2  12001    6000   76.38 1.11e-13
## Type:Thickness  4   3959     990   12.60 1.65e-06
## Residuals      36   2828      79

and indeed the interaction term is significant. Therefore both factors are significant.

Let’s check the assumptions. Here is the residual vs. fits plot:

df <- data.frame(Residuals=resid(fit), 
            Fits = fitted(fit))
ggplot(data=df, aes(Fits, Residuals)) +
            geom_point() +
            geom_hline(yintercept = 0)        

which shows that the equal variance assumption is justified.

The normal plot of residuals:

df <- data.frame(Residuals=resid(fit), 
            Fits = fitted(fit))
ggplot(data=df, aes(sample=Residuals)) +
           geom_qq() + geom_qq_line()       

give some hint of non-normal residuals but is not bad enough to be problematic.

Finally we use Tukey to do pairwise comparisons. Because of the interaction this is done for the factor-level combinations:

tuk <- TukeyHSD(fit)[[3]] 

Now the interaction plot shows that the best combination is “3-thick” and the second best is “3-medium”, Tukey yields

tuk["3:thick-3:medium", ]
##         diff          lwr          upr        p adj 
## 4.480000e+01 2.631799e+01 6.328201e+01 5.913594e-08

which shows that these two are stat. significantly different. Therefore “3-thick” is stat. significantly best.