Models with more than one predictor

ANOVA with more than one factor

ANOVA can be viewed as a linear model in the following way: Say we have a quantitative response variable \(y\) and a categorical predictor \(x\). Then

\[ y_{ij} = \mu + \alpha_i + \epsilon_{ij} \] so the \(ij^{th}\) observation comes from an overall mean \(\mu\), a contribution \(\alpha_i\) due to the fact that this observation comes from the \(i^{th}\) group and an error term.

With this formalism the null hypothesis is

\[ H_0: \alpha_1=\dots=\alpha_k=0 \] Now this extends naturally to more than one predictor. Say there are two, then

\[ y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk} \]

Notice the new term \(\gamma_{ij}\). It is meant to measure interaction, that is, a possible relationship (association) between the factors.

If there is no such relationship, the model simplifies to an additive model:

\[ y_{ijk} = \mu + \alpha_i + \beta_j + \epsilon_{ijk} \]
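
In R the two models simply correspond to two different model formulas. Here is a minimal sketch, where df, y, A and B are hypothetical names:

# sketch: df is a data frame with a quantitative response y and factors A and B
fit.full <- aov(y ~ A * B, data = df)  # with interaction: y ~ A + B + A:B
fit.add  <- aov(y ~ A + B, data = df)  # additive model
summary(fit.full)  # the A:B row tests H0: all gamma_ij = 0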

Example: Testing Hearing Aids

Reference: Loven, Faith. (1981). A Study of the Interlist Equivalency of the CID W-22 Word List Presented in Quiet and in Noise. Unpublished MS Thesis, University of Iowa.

Description: Percent of a Standard 50-word list heard correctly in the presence of background noise. 24 subjects with normal hearing listened to standard audiology tapes of English words at low volume with a noisy background. They repeated the words and were scored correct or incorrect in their perception of the words. The order of list presentation was randomized.

The word lists are standard audiology tools for assessing hearing. They are calibrated to be equally difficult to perceive. However, the original calibration was performed with normal-hearing subjects and no noise background. The experimenter wished to determine whether the lists were still equally difficult to understand in the presence of a noisy background.

kable.nice(head(hearingaid))
Subject List Score
1 1 28
2 1 24
3 1 32
4 1 30
5 1 34
6 1 30

Notice that the values in both Subject and List are NOT numbers but labels, so both of them are categorical!

Because there are two factors this is called a twoway ANOVA. More specifically, this is a Randomized Block design with List as the factor and Subject as a blocking variable because the factor Subject is not of interest in itself.

Let’s start by looking at the boxplots. Because neither List nor Subject has an obvious ordering, we order their levels by the size of the group means:

# reorder the levels of both factors by their mean Score
tmp <- tapply(hearingaid$Score, hearingaid$List, mean)
hearingaid$List <- factor(hearingaid$List,              
                          levels=order(tmp), 
                          ordered = TRUE)
tmp <- tapply(hearingaid$Score, hearingaid$Subject, mean)
hearingaid$Subject <- factor(hearingaid$Subject, 
                             levels=order(tmp), 
                             ordered = TRUE)
pushViewport(viewport(layout = grid.layout(1, 2)))
print(ggplot(data=hearingaid, aes(List, Score)) +
         geom_boxplot(),
  vp=viewport(layout.pos.row=1, layout.pos.col=1))
print(ggplot(data=hearingaid, aes(Subject, Score)) +
        geom_boxplot() ,
  vp=viewport(layout.pos.row=1, layout.pos.col=2))    

This shows why we should include Subject in the analysis: it has a large variation.

The summary statistics are:

sum.tbl <-
  data.frame(
    List=c(3, 4, 2, 1),
    n=as.integer(tapply(hearingaid$Score,
               hearingaid$List,length)),
    Mean=round(tapply(hearingaid$Score, 
                       hearingaid$List, mean), 1),
    Sd=round(tapply(hearingaid$Score, 
                    hearingaid$List, sd), 2)
)
rownames(sum.tbl) <- NULL
kable.nice(sum.tbl)
List n Mean Sd
3 24 25.2 8.32
4 24 25.6 7.78
2 24 29.7 8.06
1 24 32.8 7.41

Because Subject is the blocking variable one would normally not include a table of summary statistics.

Now for the test, or better, the tests, because in general we can test for both Subject and List. The routine we will use is again aov:

fit <- aov(Score~., data=hearingaid) 
summary(fit)           
##             Df Sum Sq Mean Sq F value   Pr(>F)
## Subject     23   3232  140.51   3.868 6.96e-06
## List         3    920  306.82   8.446 7.41e-05
## Residuals   69   2507   36.33

So we have two tests, one for List and one for Subject. However, only the one for List is of interest:

  1. Parameters of interest: List group means
  2. Method of analysis: ANOVA
  3. Assumptions of Method: residuals have a normal distribution, groups have equal variance
  4. Type I error probability \(\alpha=0.05\)
  5. Null hypothesis H0: \(\mu_1 = \dots = \mu_4\) (List groups have the same means)
  6. Alternative hypothesis Ha: \(\mu_i \ne \mu_j\) (at least two List groups have different means)
  7. p value = 0.000
  8. 0.000 < 0.05, so there is some evidence that the group means are not the same, that is, the List means are different.

As always we need to check the assumptions:

  • normal residuals

The normal plot of residuals looks fine.

df <- data.frame(Residuals=resid(fit), 
                 Fits = fitted(fit))
ggplot(data=df, aes(sample=Residuals)) +
  geom_qq() +geom_qq_line()

  • equal variance

In a oneway ANOVA we could just find the group standard deviations and compare them. Now (and in general when there is more than one factor) this is no longer a good idea, mainly because there are too many factor-level combinations (4*24 here) and not enough observations for each (one here). Instead we will do the same as in the regression case, namely check the residual vs. fits plot for a change in spread from left to right.

ggplot(data=df, aes(Fits, Residuals)) +
  geom_point() +
  geom_hline(yintercept = 0)

Again, everything looks fine.

Notice that the ANOVA table also has the test for the Subject means. This is not very interesting, the boxplot already makes it clear that different subjects have very different hearing abilities. If that were not so, we would eliminate Subject and run a oneway ANOVA.
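
As a sketch, this is what that simpler model would look like (with these data Subject clearly matters, so we keep it):

# hypothetical: the oneway ANOVA we would run if Subject were dropped
fit.oneway <- aov(Score ~ List, data = hearingaid)
summary(fit.oneway)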

Because we now have two factors, we need to worry about an additional problem, namely whether or not there is a relationship between the two factors. This is called interaction.

To check we can draw the interaction plot.

What is drawn here? First we find the mean response for each factor-level combination. Then plot those points vs. one of the factors and finally connect the dots that belong to the other factor.
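
Base R has a routine that carries out exactly these steps, interaction.plot from the stats package. A minimal sketch, again with df, y, A and B as hypothetical names:

# mean of y for each A-B combination, plotted vs. A, one line per level of B
interaction.plot(x.factor = df$A, trace.factor = df$B, response = df$y)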

Here the line segments are almost parallel. This implies that for any value of the factor A, going from one value of B to the next adds the same amount to the response. So if we go from B=1 to B=2 both lines move up by about 2.0, and if we go from B=2 to B=3 both lines move down by 0.75.

Because of this we call such a model additive.

Now consider the following interaction plot:

Here as we go from B=2 to B=3 the line goes up by 4 if A=1, but it goes down by 0.5 if A=2.

Deciding from the graph whether or not there is interaction is not always easy. Here are four interaction plots from a simulated data set, all guaranteed NOT to have any interaction:

This is even worse because in ANOVA problems we often have very small data sets, so there is a great amount of variation in these graphs from sample to sample.
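
To see this for yourself, here is a sketch that generates data from a purely additive model, so by construction there is no interaction; rerun it a few times and watch how much the plot changes:

# one observation per factor-level combination, additive model plus noise
df <- expand.grid(A = factor(1:2), B = factor(1:3))
df$y <- 10 + c(0, 2)[df$A] + c(0, 1, -1)[df$B] + rnorm(6, sd = 2)
interaction.plot(df$B, df$A, df$y)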

So it would be nice if we could actually test for interaction, but that requires repeated measurements.

In the hearing aid data we only have one observation for each combination of Subject and List, so we need to decide on the basis of the interaction plot:

ggplot(data = hearingaid,
       aes(Subject , Score, 
           colour = List, group=List)) +
      stat_summary(fun.y=mean, geom="point")+
      stat_summary(fun.y=mean, geom="line")

There seems to be interaction between Lists and Subjects.

Finally, it would of course be interesting to study just which lists are different, that is, we could do a multiple comparison:

TukeyHSD(fit)$List
##          diff        lwr       upr        p adj
## 4-3 0.3333333 -4.2473895  4.914056 0.9974833454
## 2-3 4.4166667 -0.1640562  8.997390 0.0628040720
## 1-3 7.5000000  2.9192771 12.080723 0.0003038968
## 2-4 4.0833333 -0.4973895  8.664056 0.0974569856
## 1-4 7.1666667  2.5859438 11.747390 0.0005910298
## 1-2 3.0833333 -1.4973895  7.664056 0.2954408670

so List 1 is statistically significantly different from Lists 3 and 4.

No other differences are statistically significant.

Because Subject is only a blocking variable we won’t do a multiple comparison for it.

Example: Gasoline Type and Mileage

In an experiment to study gas mileage four different blends of gasoline are tested in each of three makes of automobiles. The cars are driven a fixed distance to determine the mpg (miles per gallon). The experiment is repeated three times for each blend-automobile combination. (Taken from Lyman Ott)

Note that the interest here is in the different gasoline blends; automobile is a blocking variable, so this is a randomized block design.

Gasoline is recorded as numbers, but these are just codes for different blends, so it is a categorical variable or factor.

kable.nice(head(gasoline))
MPG Gasoline Automobile
22.7 1 A
22.4 1 A
22.9 1 A
21.5 2 A
21.8 2 A
21.6 2 A

Here is an interesting calculation:

table(gasoline$Gasoline, gasoline$Automobile)
##    
##     A B C
##   1 3 3 3
##   2 3 3 3
##   3 3 3 3
##   4 3 3 3

This shows us two things:

  1. we have repeated measurements (several observations per factor-level combination)

  2. we have a balanced design (the same number of repetitions in each factor-level combination)

This second feature used to be quite important because the calculations in a balanced design are much simpler. Nowadays with fast computers this is not important anymore. There are still good reasons why you want to design your experiment to have a balanced design if possible, though!
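
A quick way to verify that a design is balanced is the replications routine in base R:

# returns a simple named vector for a balanced design,
# and a list if the design is unbalanced
replications(MPG ~ Gasoline * Automobile, data = gasoline)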

tmp <- tapply(gasoline$MPG, gasoline$Gasoline, mean)
gasoline$Gasoline <- factor(gasoline$Gasoline, 
                            levels=order(tmp), 
                            ordered = TRUE)
pushViewport(viewport(layout = grid.layout(1, 2)))
print(ggplot(data=gasoline, aes(Gasoline, MPG)) +
         geom_boxplot() ,
  vp=viewport(layout.pos.row=1, layout.pos.col=1))
print(ggplot(data=gasoline, aes(Automobile, MPG)) + 
        geom_boxplot() ,
  vp=viewport(layout.pos.row=1, layout.pos.col=2))      

The boxplots suggest a difference between blends but not between automobiles.

The summary statistics are

sum.tbl <-
  data.frame(
    Gasoline=c(4, 2, 3, 1),
    n=as.integer(tapply(gasoline$MPG, 
                        gasoline$Gasoline,length)),
    Mean=round(tapply(gasoline$MPG, 
                      gasoline$Gasoline, mean), 1),
    Sd=round(tapply(gasoline$MPG, 
                    gasoline$Gasoline, sd), 2)
)
rownames(sum.tbl) <- NULL
kable.nice(sum.tbl)
Gasoline n Mean Sd
4 9 20.5 0.36
2 9 21.2 0.37
3 9 21.9 0.25
1 9 22.8 0.36

Interaction:

ggplot(data = gasoline,
       aes(Automobile , MPG, 
           colour = Gasoline, group=Gasoline)) +
      stat_summary(fun.y=mean, geom="point")+
      stat_summary(fun.y=mean, geom="line")

Lines are (almost) parallel, so there is no indication of interaction. We have repeated measurements (3 per factor-level combination), so we can test for this:

fit <- aov(MPG~Gasoline*Automobile, data=gasoline)
summary(fit)
##                     Df Sum Sq Mean Sq F value   Pr(>F)
## Gasoline             3 25.405   8.468  90.464 3.21e-13
## Automobile           2  0.527   0.263   2.813   0.0799
## Gasoline:Automobile  6  0.909   0.151   1.618   0.1854
## Residuals           24  2.247   0.094
  1. Parameters of interest: Interaction
  2. Method of analysis: ANOVA
  3. Assumptions of Method: residuals have a normal distribution, groups have equal variance
  4. Type I error probability \(\alpha\)=0.05
  5. Null hypothesis H0 : no interaction
  6. Alternative hypothesis Ha: some interaction
  7. p value = 0.1854
  8. 0.1854 > 0.05, there is no evidence of interaction.

So we will now proceed without the interaction term:

fit <- aov(MPG~., data=gasoline)
summary(fit)
##             Df Sum Sq Mean Sq F value   Pr(>F)
## Gasoline     3 25.405   8.468  80.510 1.89e-14
## Automobile   2  0.527   0.263   2.504   0.0987
## Residuals   30  3.156   0.105

Let’s check the assumptions:

df <- data.frame(Residuals=resid(fit), 
                 Fits = fitted(fit))
pushViewport(viewport(layout = grid.layout(1, 2)))
print(ggplot(data=df, aes(sample=Residuals)) +
  geom_qq() +geom_qq_line(),
  vp=viewport(layout.pos.row=1, layout.pos.col=1))
print(ggplot(data=df, aes(Fits, Residuals)) +
  geom_point() +
  geom_hline(yintercept = 0),
  vp=viewport(layout.pos.row=1, layout.pos.col=2))

The plots look fine, so there is no problem with the assumptions.

Now let’s test for the factors:

Test for Factor Gasoline:

  1. Parameters of interest: means of gasoline groups
  2. Method of analysis: ANOVA
  3. Assumptions of Method: residuals have a normal distribution, groups have equal variance
  4. Type I error probability \(\alpha\) = 0.05
  5. Null hypothesis H0 : \(\mu_1 = \dots = \mu_4\) (Gasoline groups have the same means)
  6. Alternative hypothesis Ha: \(\mu_i \ne \mu_j\) (Gasoline groups have different means)
  7. p value=0.000
  8. 0.000<0.05, there is some evidence of differences in gasoline blends

Test for Factor Automobile is not really needed because this is a blocking variable.

Notice that with the interaction included the p-value for Automobile was 0.0799; without it, it is 0.0987. Dropping an unneeded interaction pools its degrees of freedom into the residuals, which often (though, as here, not always) makes the conclusions stronger.

TukeyHSD(fit)$Gasoline
##          diff       lwr      upr        p adj
## 2-4 0.6888889 0.2731720 1.104606 5.176968e-04
## 3-4 1.3888889 0.9731720 1.804606 2.402867e-09
## 1-4 2.2666667 1.8509497 2.682384 6.139533e-14
## 3-2 0.7000000 0.2842831 1.115717 4.235469e-04
## 1-2 1.5777778 1.1620609 1.993495 1.293686e-10
## 1-3 0.8777778 0.4620609 1.293495 1.650229e-05

So all blends are stat. significantly different, with blend 1 having the highest miles per gallon.

Example: Film Thickness in Semiconductor Production

Chemical vapor deposition is a process used in the semiconductor industry to deposit thin films of silicon dioxide and photoresist on substrates of wafers as they are manufactured. The films must be as thin as possible and have a uniform thickness, which is measured by a process called infrared interference. A process engineer wants to evaluate a low-pressure chemical vapor deposition process that reduces costs and increases productivity. The engineer has set up an experiment to study the effect of chamber temperature and pressure on film thickness.

kable.nice(head(filmcoatings))
Thickness Temperature Pressure
42 Low Low
43 Low Low
39 Low Low
45 Low Mid
43 Low Mid
45 Low Mid
with(filmcoatings, table(Temperature, Pressure))
##            Pressure
## Temperature High Low Mid
##        High    3   3   3
##        Low     3   3   3
##        Mid     3   3   3

So again we have a balanced design with repeated measurements.

filmcoatings$Temperature <-
  factor(filmcoatings$Temperature, 
  levels=unique(filmcoatings$Temperature),
  ordered=TRUE)
filmcoatings$Pressure <-
  factor(filmcoatings$Pressure, 
  levels=unique(filmcoatings$Pressure),
  ordered=TRUE)
pushViewport(viewport(layout = grid.layout(1, 2)))
print(ggplot(data=filmcoatings, 
             aes(Temperature, Thickness)) + 
         geom_boxplot(),
vp=viewport(layout.pos.row=1, layout.pos.col=1))
print(ggplot(data=filmcoatings, 
             aes(Pressure, Thickness)) +
         geom_boxplot(),
 vp=viewport(layout.pos.row=1, layout.pos.col=2))       

Unlike in the hearing aid or gasoline experiments, here we are equally interested in both factors. This type of experiment is called a factorial design problem.

For us there is no practical difference between a randomized block design and a factorial design but the distinction can be important in other analyses.

sum.tbl <-
  data.frame(
    Temperature=unique(filmcoatings$Temperature),
    n=as.integer(tapply(filmcoatings$Thickness,
             filmcoatings$Temperature,length)),
    Mean=round(tapply(filmcoatings$Thickness,
             filmcoatings$Temperature, mean), 1),
    Sd=round(tapply(filmcoatings$Thickness,
             filmcoatings$Temperature, sd), 2)
)
rownames(sum.tbl) <- NULL
kable.nice(sum.tbl)
Temperature n Mean Sd
Low 9 43.7 2.29
Mid 9 38.0 2.35
High 9 37.7 2.92
sum.tbl <-
  data.frame(
    Pressure=unique(filmcoatings$Pressure),
    n=as.integer(tapply(filmcoatings$Thickness,
             filmcoatings$Pressure,length)),
    Mean=round(tapply(filmcoatings$Thickness,
             filmcoatings$Pressure, mean), 1),
    Sd=round(tapply(filmcoatings$Thickness,
             filmcoatings$Pressure, sd), 2)
)
rownames(sum.tbl) <- NULL
kable.nice(sum.tbl)
Pressure n Mean Sd
Low 9 38.1 2.85
Mid 9 39.1 4.37
High 9 42.1 2.80

Interaction

ggplot(data = filmcoatings,
       aes(Temperature, Thickness , 
           colour = Pressure, group=Pressure)) +
      stat_summary(fun.y=mean, geom="point")+
      stat_summary(fun.y=mean, geom="line")

The lines are not all parallel, so there is likely some interaction. Again we have repeated measurements (3 per factor-level combination), so we can actually test for this:

fit <- aov(Thickness~Temperature*Pressure,
           data=filmcoatings)
pushViewport(viewport(layout = grid.layout(1, 2)))
df <- data.frame(Residuals=resid(fit), 
            Fits = fitted(fit))
print(ggplot(data=df, aes(sample=Residuals)) +
           geom_qq() + geom_qq_line(),
    vp=viewport(layout.pos.row=1, layout.pos.col=1))
print(ggplot(data=df, aes(Fits, Residuals)) +
            geom_point() +
            geom_hline(yintercept = 0),
    vp=viewport(layout.pos.row=1, layout.pos.col=2))        

The graphs show that there are no problems with the assumptions.

summary(fit)
##                      Df Sum Sq Mean Sq F value   Pr(>F)
## Temperature           2 204.67  102.33  47.638 6.46e-08
## Pressure              2  78.00   39.00  18.155 4.83e-05
## Temperature:Pressure  4  37.33    9.33   4.345   0.0124
## Residuals            18  38.67    2.15

Test for Interaction:

  1. Parameters of interest: Interaction
  2. Method of analysis: ANOVA
  3. Assumptions of Method: residuals have a normal distribution, groups have equal variance
  4. Type I error probability \(\alpha = 0.05\)
  5. Null hypothesis H0 : no interaction
  6. Alternative hypothesis Ha: some interaction
  7. p value = 0.0124
  8. 0.0124<0.05, there is some evidence of interaction

Test for Factor Temperature:

  1. Parameters of interest: means of temperature groups
  2. Method of analysis: ANOVA
  3. Assumptions of Method: residuals have a normal distribution, groups have equal variance
  4. Type I error probability \(\alpha = 0.05\)
  5. Null hypothesis H0 : \(\mu_1 = \mu_2 = \mu_3\) (Temperature groups have the same means)
  6. Alternative hypothesis Ha: \(\mu_i \ne \mu_j\) (Temperature groups have different means)
  7. p value = 0.000
  8. 0.000 < 0.05, there is some evidence of differences in temperature

Test for Factor Pressure:

  1. Parameters of interest: means of pressure groups
  2. Method of analysis: ANOVA
  3. Assumptions of Method: residuals have a normal distribution, groups have equal variance
  4. Type I error probability \(\alpha = 0.05\)
  5. Null hypothesis H0 : \(\mu_1 = \mu_2 = \mu_3\) (Pressure groups have the same means)
  6. Alternative hypothesis Ha: \(\mu_i \ne \mu_j\) (Pressure groups have different means)
  7. p value = 0.000
  8. 0.000<0.05, there is some evidence of differences in pressure

Finally, what we need is to find the best combination of pressure and temperature. So what we want is a multiple comparison for Temperature and Pressure (not either of them alone!). Easily done:

out <- TukeyHSD(fit)[[3]]  # the third component holds the Temperature:Pressure comparisons
out
##                          diff         lwr        upr        p adj
## Mid:Low-Low:Low    -5.6666667  -9.8597499 -1.4735834 4.056833e-03
## High:Low-Low:Low   -4.0000000  -8.1930833  0.1930833 6.829422e-02
## Low:Mid-Low:Low     3.0000000  -1.1930833  7.1930833 2.908226e-01
## Mid:Mid-Low:Low    -3.0000000  -7.1930833  1.1930833 2.908226e-01
## High:Mid-Low:Low   -6.6666667 -10.8597499 -2.4735834 7.270485e-04
## Low:High-Low:Low    4.0000000  -0.1930833  8.1930833 6.829422e-02
## Mid:High-Low:Low   -1.3333333  -5.5264166  2.8597499 9.642909e-01
## High:High-Low:Low  -0.3333333  -4.5264166  3.8597499 9.999982e-01
## High:Low-Mid:Low    1.6666667  -2.5264166  5.8597499 8.866164e-01
## Low:Mid-Mid:Low     8.6666667   4.4735834 12.8597499 2.811245e-05
## Mid:Mid-Mid:Low     2.6666667  -1.5264166  6.8597499 4.291123e-01
## High:Mid-Mid:Low   -1.0000000  -5.1930833  3.1930833 9.938696e-01
## Low:High-Mid:Low    9.6666667   5.4735834 13.8597499 6.251132e-06
## Mid:High-Mid:Low    4.3333333   0.1402501  8.5264166 3.970300e-02
## High:High-Mid:Low   5.3333333   1.1402501  9.5264166 7.227536e-03
## Low:Mid-High:Low    7.0000000   2.8069167 11.1930833 4.142532e-04
## Mid:Mid-High:Low    1.0000000  -3.1930833  5.1930833 9.938696e-01
## High:Mid-High:Low  -2.6666667  -6.8597499  1.5264166 4.291123e-01
## Low:High-High:Low   8.0000000   3.8069167 12.1930833 8.029539e-05
## Mid:High-High:Low   2.6666667  -1.5264166  6.8597499 4.291123e-01
## High:High-High:Low  3.6666667  -0.5264166  7.8597499 1.147340e-01
## Mid:Mid-Low:Mid    -6.0000000 -10.1930833 -1.8069167 2.278849e-03
## High:Mid-Low:Mid   -9.6666667 -13.8597499 -5.4735834 6.251132e-06
## Low:High-Low:Mid    1.0000000  -3.1930833  5.1930833 9.938696e-01
## Mid:High-Low:Mid   -4.3333333  -8.5264166 -0.1402501 3.970300e-02
## High:High-Low:Mid  -3.3333333  -7.5264166  0.8597499 1.866153e-01
## High:Mid-Mid:Mid   -3.6666667  -7.8597499  0.5264166 1.147340e-01
## Low:High-Mid:Mid    7.0000000   2.8069167 11.1930833 4.142532e-04
## Mid:High-Mid:Mid    1.6666667  -2.5264166  5.8597499 8.866164e-01
## High:High-Mid:Mid   2.6666667  -1.5264166  6.8597499 4.291123e-01
## Low:High-High:Mid  10.6666667   6.4735834 14.8597499 1.512789e-06
## Mid:High-High:Mid   5.3333333   1.1402501  9.5264166 7.227536e-03
## High:High-High:Mid  6.3333333   2.1402501 10.5264166 1.284046e-03
## Mid:High-Low:High  -5.3333333  -9.5264166 -1.1402501 7.227536e-03
## High:High-Low:High -4.3333333  -8.5264166 -0.1402501 3.970300e-02
## High:High-Mid:High  1.0000000  -3.1930833  5.1930833 9.938696e-01

This is a bit hard to read. Recall that we are only interested in small values of Thickness. Let’s redo the interaction plot:

ggplot(data = filmcoatings,
       aes(Temperature, Thickness , 
           colour = Pressure, group=Pressure)) +
      stat_summary(fun.y=mean, geom="point")+
      stat_summary(fun.y=mean, geom="line")

So we see that the best combination is Temperature=High, Pressure=Mid.

The next-best is Temperature=Mid, Pressure=Low. What does Tukey say about the comparison of these?

out["High:Mid-Mid:Low", ]
##       diff        lwr        upr      p adj 
## -1.0000000 -5.1930833  3.1930833  0.9938696

so their difference is NOT statistically significant.

Let’s check the next couple of combinations:

out["High:Mid-High:Low", ]
##       diff        lwr        upr      p adj 
## -2.6666667 -6.8597499  1.5264166  0.4291123
out["High:Mid-Mid:Mid", ]
##       diff        lwr        upr      p adj 
## -3.6666667 -7.8597499  0.5264166  0.1147340
out["Mid:High-High:Mid", ]
##        diff         lwr         upr       p adj 
## 5.333333333 1.140250075 9.526416591 0.007227536

And now we have a stat. significant difference.

Notice that for the last one we had to reverse the order of the pair, because that is how it appears in the Tukey output (so the sign of the difference flips).

So any of the four combinations (High/Mid, Mid/Low, High/Low or Mid/Mid) would do; they are not statistically significantly different from one another, at least not at these sample sizes.


A simple idea for solving this problem seems to be this:

  1. Find the best temperature:
with(filmcoatings, sort(round(tapply(Thickness, Temperature, mean), 1)))
## High  Mid  Low 
## 37.7 38.0 43.7

so Temperature=High is best

  2. Find the best pressure:
with(filmcoatings, sort(round(tapply(Thickness, Pressure, mean), 1)))
##  Low  Mid High 
## 38.1 39.1 42.1

so Pressure=Low is best

  3. Take the combination: Pressure=Low, Temperature=High is best! Except it is not: we saw before that Pressure=Mid, Temperature=High is best.

This simple idea does not work because of the presence of interaction.
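
With interaction present the cell means have to be inspected jointly, not one factor at a time. A quick way to get the full table:

# mean Thickness for every Temperature-Pressure combination
with(filmcoatings, round(tapply(Thickness, list(Temperature, Pressure), mean), 1))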

Example: Water Quality and Mining

The effects of mining and rock type on water quality.

kable.nice(head(mines))
Rock Mine Iron
Sandstone Unmined 0.20
Sandstone Unmined 0.25
Sandstone Unmined 0.04
Sandstone Unmined 0.06
Sandstone Unmined 1.20
Sandstone Unmined 0.30
with(mines, table(Rock, Mine))
##            Mine
## Rock        Abandoned Reclaimed Unmined
##   Limestone        13        13      13
##   Sandstone        13        13      13
mines$Mine <- factor(mines$Mine,
        levels = c("Unmined", "Reclaimed", "Abandoned"),
        ordered = TRUE)
pushViewport(viewport(layout = grid.layout(1, 2)))
print(ggplot(data=mines, aes(Rock, Iron)) + 
         geom_boxplot(),
vp=viewport(layout.pos.row=1, layout.pos.col=1))
print(ggplot(mines, aes(Mine, Iron)) +
         geom_boxplot(),
 vp=viewport(layout.pos.row=1, layout.pos.col=2))

We have a clear problem with outliers (and therefore with the normal assumption), so we try the log transform:

mines$log.iron <- log(mines$Iron)
pushViewport(viewport(layout = grid.layout(1, 2)))
print(ggplot(data=mines, aes(Rock, log.iron)) + 
         geom_boxplot(),
vp=viewport(layout.pos.row=1, layout.pos.col=1))
print(ggplot(mines, aes(Mine, log.iron)) +
         geom_boxplot(),
 vp=viewport(layout.pos.row=1, layout.pos.col=2))

This has solved the problem, so the analysis will be based on log.iron.

Summary Statistics

Because we used a transformation we will base the tables on the Median and IQR.

sum.tbl <-
  data.frame(
    Mine=levels(mines$Mine),
    n=as.integer(tapply(mines$Iron,
             mines$Mine,length)),
    Median=round(tapply(mines$Iron,
             mines$Mine, median), 1),
    IQR=round(tapply(mines$Iron,
             mines$Mine, IQR), 2)
)
rownames(sum.tbl) <- NULL
kable.nice(sum.tbl)
Mine n Median IQR
Unmined 26 0.5 2.72
Reclaimed 26 0.7 0.83
Abandoned 26 1.6 10.24

Note that the IQRs are very different. This is because this data set has a lot of outliers, which still affect the IQR.

Interaction

ggplot(data = mines,
       aes(Rock , log.iron, 
           colour = Mine, group=Mine)) +
      stat_summary(fun.y=mean, geom="point")+
      stat_summary(fun.y=mean, geom="line")

There seems to be some interaction. To confirm this, we test for it:

fit <- aov(log.iron~Rock*Mine, data=mines)
pushViewport(viewport(layout = grid.layout(1, 2)))
df <- data.frame(Residuals=resid(fit), 
            Fits = fitted(fit))
print(ggplot(data=df, aes(sample=Residuals)) +
           geom_qq() + geom_qq_line(),
    vp=viewport(layout.pos.row=1, layout.pos.col=1))
print(ggplot(data=df, aes(Fits, Residuals)) +
            geom_point() +
            geom_hline(yintercept = 0),
    vp=viewport(layout.pos.row=1, layout.pos.col=2))        

The assumptions are ok (after the log transform!).

summary(fit)
##             Df Sum Sq Mean Sq F value   Pr(>F)
## Rock         1  10.15  10.154   4.629 0.034794
## Mine         2  45.13  22.566  10.287 0.000118
## Rock:Mine    2  43.45  21.727   9.904 0.000159
## Residuals   72 157.95   2.194

Test for Interaction:

  1. Parameters of interest: Interaction
  2. Method of analysis: ANOVA
  3. Assumptions of Method: residuals have a normal distribution, groups have equal variance
  4. Type I error probability \(\alpha = 0.05\)
  5. Null hypothesis H0 : no interaction
  6. Alternative hypothesis Ha: some interaction
  7. p value = 0.000
  8. 0.000<0.05, there is some evidence of interaction
The assumptions of ANOVA were already checked above: both plots look ok.

Test for Factor Rock:

  1. Parameters of interest: means of rock groups
  2. Method of analysis: ANOVA
  3. Assumptions of Method: residuals have a normal distribution, groups have equal variance
  4. Type I error probability \(\alpha = 0.05\)
  5. Null hypothesis H0 : \(\mu_1 = \mu_2\) (Rock groups have the same means)
  6. Alternative hypothesis Ha: \(\mu_1 \ne \mu_2\) (Rock groups have different means)
  7. p value = 0.035
  8. 0.035<0.05, there is some evidence of differences in Rock types.

Test for Factor Mine:

  1. Parameters of interest: means of mine groups
  2. Method of analysis: ANOVA
  3. Assumptions of Method: residuals have a normal distribution, groups have equal variance
  4. Type I error probability \(\alpha = 0.05\)
  5. Null hypothesis H0 : \(\mu_1 = \mu_2 = \mu_3\) (Mine groups have the same means)
  6. Alternative hypothesis Ha: \(\mu_i \ne \mu_j\) (Mine groups have different means)
  7. p value = 0.000
  8. 0.000<0.05, there is some evidence of differences in Mine types

Multiple Comparison

The main interest is in mines, so

TukeyHSD(fit)$Mine
##                            diff        lwr       upr        p adj
## Reclaimed-Unmined   -0.07955136 -1.0626180 0.9035152 0.9795435606
## Abandoned-Unmined    1.57238141  0.5893148 2.5554480 0.0007902888
## Abandoned-Reclaimed  1.65193277  0.6688662 2.6349994 0.0004103178

Interpretation: There is a stat. signif. difference between the mean iron content of abandoned mines and the others. The difference between unmined and reclaimed mines is not stat. significant, at least not at these sample sizes.
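
Because the analysis was done on the log scale, differences translate into ratios on the original scale: exp(1.572) is about 4.8, so abandoned mines have a (geometric) mean iron content almost five times that of unmined ones. The back-transformed estimates and confidence limits:

# exponentiate the diff, lwr and upr columns (the p values are unaffected)
exp(TukeyHSD(fit)$Mine[, 1:3])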

Example: Air Filters and Noise

The data are from a statement by Texaco, Inc. to the Air and Water Pollution Subcommittee of the Senate Public Works Committee on June 26, 1973. Mr. John McKinley, President of Texaco, cited the Octel filter, developed by Associated Octel Company, as effective in reducing pollution. However, questions had been raised about the effects of pollution filters on aspects of vehicle performance, including noise levels. He referred to data presented in the datafile associated with this story as evidence that the Octel filter was at least as good as a standard silencer in controlling vehicle noise levels.

kable.nice(head(airfilters))
Noise Size Filter Side
810 Small Standard Right
820 Small Standard Right
820 Small Standard Right
840 Medium Standard Right
840 Medium Standard Right
845 Medium Standard Right
airfilters$Size <- factor(airfilters$Size,
                  levels = unique(airfilters$Size),
                  ordered = TRUE)
plt1 <- ggplot(data=airfilters, aes(Size, Noise)) +
  geom_boxplot()
plt2 <- ggplot(data=airfilters, aes(Filter, Noise)) +
  geom_boxplot()
plt3 <- ggplot(data=airfilters, aes(Side, Noise)) +
  geom_boxplot()
pushViewport(viewport(layout = grid.layout(2, 2)))
print(plt1, 
  vp=viewport(layout.pos.row=1, layout.pos.col=1))
print(plt2, 
  vp=viewport(layout.pos.row=1, layout.pos.col=2))
print(plt3, 
  vp=viewport(layout.pos.row=2, layout.pos.col=1))

It seems large cars are quieter. There is not much of an effect due to either side or filter.

plt1 <- ggplot(data = airfilters,
       aes(Size , Noise, 
           colour = Filter, group=Filter)) +
      stat_summary(fun.y=mean, geom="point")+
      stat_summary(fun.y=mean, geom="line")
plt2 <- ggplot(data = airfilters,
       aes(Size , Noise, 
           colour = Side, group=Side)) +
      stat_summary(fun.y=mean, geom="point")+
      stat_summary(fun.y=mean, geom="line")
plt3 <- ggplot(data = airfilters,
       aes(Side , Noise, 
           colour = Filter, group=Filter)) +
      stat_summary(fun.y=mean, geom="point")+
      stat_summary(fun.y=mean, geom="line")
pushViewport(viewport(layout = grid.layout(2, 2)))
print(plt1, 
  vp=viewport(layout.pos.row=1, layout.pos.col=1))
print(plt2, 
  vp=viewport(layout.pos.row=1, layout.pos.col=2))
print(plt3, 
  vp=viewport(layout.pos.row=2, layout.pos.col=1))

There is a possible interaction between Filter and Side.

fit <- aov(Noise~.^3, data=airfilters)
summary(fit)
##                  Df Sum Sq Mean Sq F value   Pr(>F)
## Size              2  26051   13026 893.190  < 2e-16
## Filter            1   1056    1056  72.429 1.04e-08
## Side              1      1       1   0.048 0.829104
## Size:Filter       2    804     402  27.571 6.05e-07
## Size:Side         2   1293     647  44.333 8.73e-09
## Filter:Side       1     17      17   1.190 0.286067
## Size:Filter:Side  2    301     151  10.333 0.000579
## Residuals        24    350      15
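
By the way, the formula Noise~.^3 is shorthand for crossing all the remaining columns up to third order, so the fit above is the same as the fully spelled-out version:

# equivalent explicit model formula
fit <- aov(Noise ~ Size * Filter * Side, data = airfilters)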

The three-way interaction is significant (p=0.000579), so we cannot simplify this model.

The main question here is whether there is a difference between the filters, and the answer is yes (p=1.04e-08). Because Filter has only two values a multiple comparison is not necessary.