Reference: Loven, Faith. (1981). A Study of the Interlist Equivalency of the CID W-22 Word List Presented in Quiet and in Noise. Unpublished MS Thesis, University of Iowa.
Description: Percent of a Standard 50-word list heard correctly in the presence of background noise. 24 subjects with normal hearing listened to standard audiology tapes of English words at low volume with a noisy background. They repeated the words and were scored correct or incorrect in their perception of the words. The order of list presentation was randomized.
The word lists are standard audiology tools for assessing hearing. They are calibrated to be equally difficult to perceive. However, the original calibration was performed with normal-hearing subjects and no noise background. The experimenter wished to determine whether the lists were still equally difficult to understand in the presence of a noisy background.
head(hearingaid)
## Subject List Score
## 1 1 1 28
## 2 2 1 24
## 3 3 1 32
## 4 4 1 30
## 5 5 1 34
## 6 6 1 30
Notice that the values in both Subject and List look like numbers but are really just labels, so both of them are categorical!
As long as we have one quantitative response (Score) and all the predictors (factors) are categorical (Subject, List), this is still an ANOVA problem, now called a twoway ANOVA. More specifically, this is a randomized block design with List as the factor of interest and Subject as a blocking variable.
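The routines used in these notes (bplot, twoway, tukey, iplot) are course helper functions. As a minimal sketch of how the same randomized block model can be fit in base R with aov, here is a toy data set (hypothetical, not the actual hearingaid data):

```r
# toy randomized block data: 6 subjects (blocks), 4 lists (factor of interest),
# one observation per Subject/List combination
set.seed(1)
toy <- data.frame(
  Subject = factor(rep(1:6, times = 4)),  # blocking variable
  List    = factor(rep(1:4, each = 6)),   # factor of interest
  Score   = round(rnorm(24, mean = 28, sd = 4))
)
# additive model: with a single observation per cell no
# interaction term can be fit
fit <- aov(Score ~ List + Subject, data = toy)
summary(fit)   # F tests for List and Subject
```

The residual degrees of freedom here are 23 - 3 - 5 = 15, which is what the F tests are based on.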
attach(hearingaid)
bplot(Score, List, new_order = "Size")
bplot(Score, Subject, new_order = "Size")
The summary statistics are:
stat.table(Score, List, Sort = TRUE)
## Sample Size Mean Standard Deviation
## 3 24 25.2 8.3
## 4 24 25.6 7.8
## 2 24 29.7 8.1
## 1 24 32.8 7.4
Because Subject is the blocking variable one would normally not include a table of summary statistics for it.
Now for the test, or rather the tests, because in general we can test for both List and Subject. The routine we will use is called twoway:
twoway(Score, List, Subject)
## No repeated measurement! Interaction term can not be included
## List p = 0.000
## Subject p = 0.000
So we have two tests, one for List and one for Subject. However, only the one for List is of interest.
As always we need to check the assumptions. The normal plot of residuals looks fine.
Next the equal variance assumption. In a oneway ANOVA we could just find the group standard deviations and compare them. Now (and in general whenever there is more than one factor) this is no longer a good idea, mainly because there are too many factor-level combinations (4*24 here) and not enough observations for each (1 here). Instead we will do the same as in the regression case, namely check the residuals vs. fits plot for a change in spread from left to right.
Again, everything looks fine.
Notice that the ANOVA table also has the test for the Subject means. This is not very interesting; the boxplot already makes it clear that different subjects have very different hearing abilities. If that were not so, we would eliminate Subject and run a oneway ANOVA. Because we now have two factors, we need to worry about an additional problem, namely whether or not there is a relationship between the two factors. This is called interaction.
To check for it we will use the interaction plot.
An interaction plot looks as follows:
Here the line segments are almost parallel. This implies that for any value of factor A, going from one value of B to the next adds the same amount to the response. So if we go from B=1 to B=2 both lines move up by about 2.0, and if we go from B=2 to B=3 both lines move down by about 0.75.
Because of this we call such a model additive.
Now consider the following interaction plot:
Here as we go from B=2 to B=3 the line goes up by 4 if A=1 but it goes down by 0.5 if A=2.
Here is another way of understanding the difference: Say you are told that you have an additive model and the following information:
|     | Low | High |
|---|---|---|
| In  | 2.3 | 3.1 |
| Out | 2.7 | ?   |
Can we make a guess for the response if Factor 1 = “high” and Factor 2 = “out”? We see that at Factor 1 = “low”, going from “in” to “out” the response goes up by 0.4 (=2.7-2.3). In an additive model the response should go up by the same amount at Factor 1 = “high”, that is it should go to 3.5 (=3.1+0.4).
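The additivity calculation above can be checked directly (the values are taken from the table):

```r
# cell means from the table above
in_low  <- 2.3; in_high <- 3.1
out_low <- 2.7
# at Factor 1 = "low", going from "in" to "out" adds 0.4
shift    <- out_low - in_low   # 0.4
# additivity: the same shift applies at Factor 1 = "high"
out_high <- in_high + shift    # 3.5
out_high
```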
But if there were interaction there would be no way to make any guess at all!
Deciding from the graph whether or not there is interaction is not always easy. Here are four interaction plots from a simulated data set, all guaranteed NOT to have any interaction:
This is made worse by the fact that in ANOVA problems we often have very small data sets, so there is a great amount of variation in these graphs from sample to sample.
So it would be nice if we could actually test for interaction, but that requires repeated measurements.
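Why repeated measurements? With only one observation per factor-level combination, the full interaction model uses up every degree of freedom, leaving none to estimate the error variance, so no F test is possible. A toy illustration (hypothetical data):

```r
# one observation per A/B combination -> no error df left
set.seed(2)
d <- expand.grid(A = factor(1:3), B = factor(1:4))
d$y <- rnorm(12)
fit <- aov(y ~ A * B, data = d)
df.residual(fit)   # 0: the interaction model fits the data perfectly
```

With 12 observations the model uses 1 + 2 + 3 + 6 = 12 parameters, so the residual degrees of freedom are 0 and no test for interaction can be carried out.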
In the hearing aid data we only have one observation for each combination of Subject and List, so we need to decide on the basis of the interaction plot:
iplot(Score, List, Subject )
There seems to be some interaction between Lists and Subjects.
Finally, it would of course be interesting to study just which lists are different, that is we could do a multiple comparison:
tukey(Score, List, Subject, which="first")
## No repeated measurement!
## Interaction term can not be included
##
## Groups that are statistically significantly different:
## Groups p.value
## 1 1-3 0
## 2 1-4 0
so List 1 is statistically significantly different from Lists 3 and 4.
No other differences are statistically significant.
Because Subject is only a blocking variable we would not do a multiple comparison for it. If we wanted to, we would use the command
tukey(Score, List, Subject, which="second")
## No repeated measurement!
## Interaction term can not be included
##
## Groups that are statistically significantly different:
## Groups p.value
## 1 15-21 0.0394
## 2 14-21 0.0063
## 3 9-21 0.0042
## 4 19-21 0.0012
## 5 24-21 0.0000
## 6 14-18 0.0394
## 7 9-18 0.0279
## 8 19-18 0.0093
## 9 24-18 0.0019
## 10 19-4 0.0195
## 11 24-4 0.0042
## 12 19-20 0.0394
## 13 24-20 0.0093
## 14 24-22 0.0135
In an experiment to study gas mileage, four different blends of gasoline are tested in each of three makes of automobiles. The cars are driven a fixed distance to determine the MPG (miles per gallon). The experiment is repeated three times for each blend-automobile combination. (Taken from Lyman Ott)
Note that the interest here is in different gasoline blends; automobile is a blocking variable, so this is a randomized block design.
Gasoline is coded as numbers, but these are just labels for the different blends, so it is a categorical variable, or factor.
attach(gasoline)
head(gasoline)
## MPG Gasoline Automobile
## 1 22.7 1 A
## 2 22.4 1 A
## 3 22.9 1 A
## 4 21.5 2 A
## 5 21.8 2 A
## 6 21.6 2 A
Here is an interesting calculation:
table(Gasoline, Automobile)
## Automobile
## Gasoline A B C
## 1 3 3 3
## 2 3 3 3
## 3 3 3 3
## 4 3 3 3
This shows us two things:
we have repeated measurements (several observations per factor-level combination)
we have a balanced design (the same number of repetitions in each factor-level combination)
This second feature used to be quite important because the calculations in a balanced design are much simpler. Nowadays with fast computers this is not important anymore. There are still good reasons why you want to design your experiment to have a balanced design if possible, though!
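In base R a balanced design can be verified with table, as above, or with replications, which counts the observations available for each term. A sketch with toy data of the same shape (hypothetical, not the actual gasoline data):

```r
# toy data: 4 blends x 3 automobiles x 3 repetitions
set.seed(3)
toy <- expand.grid(Gasoline   = factor(1:4),
                   Automobile = factor(c("A", "B", "C")),
                   rep        = 1:3)
toy$MPG <- rnorm(nrow(toy), mean = 21, sd = 1)
# every Gasoline/Automobile cell has the same count -> balanced
table(toy$Gasoline, toy$Automobile)
# counts per term; for a balanced design these are plain numbers
replications(MPG ~ Gasoline * Automobile, data = toy)
```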
bplot(MPG, Gasoline, new_order = "Size")
bplot(MPG, Automobile, new_order = "Size")
The boxplots suggest a difference between blends but not between automobiles.
The summary statistics are
stat.table(MPG, Gasoline, Sort = TRUE)
## Sample Size Mean Standard Deviation
## 4 9 20.5 0.4
## 2 9 21.2 0.4
## 3 9 21.9 0.2
## 1 9 22.8 0.4
stat.table(MPG, Automobile, Sort = TRUE)
## Sample Size Mean Standard Deviation
## C 12 21.4 0.9
## B 12 21.7 1.0
## A 12 21.7 0.8
Interaction:
iplot(MPG, Gasoline, Automobile)
Lines are (almost) parallel, so there is no indication of interaction. We have repeated measurements (3 per factor-level combination), so we can test for this:
twoway(MPG, Gasoline, Automobile)
## Gasoline p = 0.000
## Automobile p = 0.0799
## Interaction p = 0.1854
The interaction is not statistically significant (p=0.1854), so we now refit the model without the interaction term:
twoway(MPG, Gasoline, Automobile, with.interaction=FALSE)
## Gasoline p = 0.000
## Automobile p = 0.0987
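In base R the two fits correspond to the formulas with and without the interaction term. A minimal sketch with toy data (hypothetical, not the actual gasoline data):

```r
# toy balanced data set: 4 blends x 3 automobiles x 3 repetitions
set.seed(4)
toy <- expand.grid(Gasoline   = factor(1:4),
                   Automobile = factor(c("A", "B", "C")),
                   rep        = 1:3)
toy$MPG <- 22 - 0.7 * as.numeric(toy$Gasoline) + rnorm(nrow(toy), 0, 0.4)
full <- aov(MPG ~ Gasoline * Automobile, data = toy)  # with interaction
addv <- aov(MPG ~ Gasoline + Automobile, data = toy)  # additive model
summary(full)
summary(addv)
```

Dropping the interaction moves its 6 degrees of freedom into the error term (24 vs 30 residual df here), which is why the p-values change between the two fits.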
The plots look fine, so there is no problem with the assumptions.
Now let’s test for the factors:
Test for Factor Gasoline:
Test for Factor Automobile is not really needed because this is a blocking variable.
Notice that with the interaction included the p-value for Automobile was 0.0799; without it, 0.0987. One advantage of being able to fit an additive model is that it often makes the conclusions stronger.
Because automobile is not significant and there is no interaction, we can drop automobile from the analysis and treat this as a oneway ANOVA problem:
tukey(MPG, Gasoline)
## Groups that are statistically significantly different:
## Groups p.value
## 1 2-4 0
## 2 3-4 0
## 3 1-4 0
## 4 3-2 0
## 5 1-2 0
## 6 1-3 0
so all blends are statistically significantly different from one another, with blend 1 having the highest miles per gallon.
Chemical vapor deposition is a process used in the semiconductor industry to deposit thin films of silicon dioxide and photoresist on wafer substrates as they are manufactured. The films must be as thin as possible and have a uniform thickness, which is measured by a process called infrared interference. A process engineer wants to evaluate a low-pressure chemical vapor deposition process that reduces costs and increases productivity. The engineer has set up an experiment to study the effect of chamber temperature and pressure on film thickness.
attach(filmcoatings)
filmcoatings
## Thickness Temperature Pressure
## 1 42 Low Low
## 2 43 Low Low
## 3 39 Low Low
## 4 45 Low Mid
## 5 43 Low Mid
## 6 45 Low Mid
## 7 45 Low High
## 8 44 Low High
## 9 47 Low High
## 10 36 Mid Low
## 11 34 Mid Low
## 12 37 Mid Low
## 13 39 Mid Mid
## 14 39 Mid Mid
## 15 37 Mid Mid
## 16 40 Mid High
## 17 42 Mid High
## 18 38 Mid High
## 19 38 High Low
## 20 37 High Low
## 21 37 High Low
## 22 35 High Mid
## 23 36 High Mid
## 24 33 High Mid
## 25 40 High High
## 26 41 High High
## 27 42 High High
table(Temperature, Pressure)
## Pressure
## Temperature High Low Mid
## High 3 3 3
## Low 3 3 3
## Mid 3 3 3
So again we have a balanced design with repeated measurements.
bplot(Thickness, Temperature)
Notice that the order of the boxes is strange: High-Low-Mid. This is because R uses alphabetical ordering unless told otherwise. Let’s change that:
Temperature <- change.order(Temperature, c("Low","Mid","High"))
Pressure <- change.order(Pressure, c("Low","Mid","High"))
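change.order is a course helper function; in base R the same thing is done by setting the factor levels explicitly:

```r
# base-R equivalent of change.order: specify the level order yourself
x <- factor(c("High", "Low", "Mid", "Low"))
levels(x)                                   # alphabetical: "High" "Low" "Mid"
x <- factor(x, levels = c("Low", "Mid", "High"))
levels(x)                                   # now "Low" "Mid" "High"
```

The data values are unchanged; only the order in which plots and tables list the levels is affected.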
bplot(Thickness, Temperature)
bplot(Thickness, Pressure)
Unlike in the hearing aid or gasoline experiments, here we are equally interested in both factors. This type of experiment is called a factorial design.
For us there is no practical difference between a randomized block design and a factorial design, but the distinction can be important in other analyses.
stat.table(Thickness, Temperature)
## Sample Size Mean Standard Deviation
## Low 9 43.7 2.3
## Mid 9 38.0 2.3
## High 9 37.7 2.9
stat.table(Thickness, Pressure)
## Sample Size Mean Standard Deviation
## Low 9 38.1 2.8
## Mid 9 39.1 4.4
## High 9 42.1 2.8
Interaction
iplot(Thickness, Temperature, Pressure)
The lines are not all parallel, so there is likely some interaction. Again we have repeated measurements (3 per factor-level combination), so we can actually test for this:
twoway(Thickness, Temperature, Pressure)
## Temperature p = 0.000
## Pressure p = 0.000
## Interaction p = 0.0124
The graphs show that there are no problems with the assumptions.
Test for Interaction:
Test for Factor Temperature:
Test for Factor Pressure:
Finally, what we need is to find the best combination of pressure and temperature. So what we want is a multiple comparison for the Temperature-Pressure combinations (not for either factor alone!). Easily done:
tukey(Thickness, Temperature, Pressure, which="interaction")
## Groups that are statistically significantly different:
## Groups p.value
## 1 Low:Mid-High:Low 0.0000
## 2 Low:High-High:Low 0.0000
## 3 Low:Low-Mid:Low 0.0041
## 4 Low:Mid-Mid:Low 0.0000
## 5 High:High-Mid:Low 0.0072
## 6 Mid:High-Mid:Low 0.0397
## 7 Low:High-Mid:Low 0.0000
## 8 High:Mid-Low:Low 0.0000
## 9 Low:Mid-High:Mid 0.0000
## 10 High:High-High:Mid 0.0013
## 11 Mid:High-High:Mid 0.0072
## 12 Low:High-High:Mid 0.0000
## 13 Low:Mid-Mid:Mid 0.0023
## 14 Low:High-Mid:Mid 0.0000
## 15 Mid:High-Low:Mid 0.0397
## 16 Low:High-High:High 0.0397
## 17 Low:High-Mid:High 0.0072
This is a bit hard to read. In the past we have used the tapply command to sort the groups by their means. We want to do the same here, but first we need to make a new variable that combines Temperature and Pressure:
TP <- paste0(Temperature,"-",Pressure)
TP
## [1] "Low-Low" "Low-Low" "Low-Low" "Low-Mid" "Low-Mid"
## [6] "Low-Mid" "Low-High" "Low-High" "Low-High" "Mid-Low"
## [11] "Mid-Low" "Mid-Low" "Mid-Mid" "Mid-Mid" "Mid-Mid"
## [16] "Mid-High" "Mid-High" "Mid-High" "High-Low" "High-Low"
## [21] "High-Low" "High-Mid" "High-Mid" "High-Mid" "High-High"
## [26] "High-High" "High-High"
sort(round(tapply(Thickness,TP,mean), 1))
## High-Mid Mid-Low High-Low Mid-Mid Mid-High High-High Low-Low
## 34.7 35.7 37.3 38.3 40.0 41.0 41.3
## Low-Mid Low-High
## 44.3 45.3
So we see that the combination Temperature=High, Pressure=Mid is best (thinnest film).
But tukey tells us that it is not statistically significantly better than any of the next three combinations (Mid-Low, High-Low or Mid-Mid), at least not at these sample sizes.
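Base R’s TukeyHSD does the same kind of all-pairwise comparison of the factor-level combinations. A sketch with toy data (hypothetical, not the actual filmcoatings data):

```r
# toy 3x3 factorial with 3 repetitions per cell
set.seed(5)
toy <- expand.grid(Temperature = factor(c("Low", "Mid", "High")),
                   Pressure    = factor(c("Low", "Mid", "High")),
                   rep         = 1:3)
toy$Thickness <- 40 + rnorm(nrow(toy), 0, 2)
fit <- aov(Thickness ~ Temperature * Pressure, data = toy)
# all pairwise comparisons of the 9 Temperature:Pressure cells
tk <- TukeyHSD(fit, which = "Temperature:Pressure")
head(tk$`Temperature:Pressure`)
```

With 9 cells there are choose(9, 2) = 36 pairwise comparisons, which is why such output is hard to read and sorting the cell means helps.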
Remember that we made some new variables. If we are sure we won’t need them anymore we should remove them:
rm(Temperature)
rm(Pressure)
rm(TP)
A seemingly simple idea for solving this problem is the following:
sort(round(tapply(Thickness, Temperature, mean), 1))
## High Mid Low
## 37.7 38.0 43.7
so Temperature=High is best
sort(round(tapply(Thickness, Pressure, mean), 1))
## Low Mid High
## 38.1 39.1 42.1
so Pressure=Low is best
This simple idea does not work because of the presence of interaction.
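The cell means computed above show this directly: the marginal winners (Temperature=High, Pressure=Low) point at the High-Low cell, but because of the interaction the actually best (thinnest) cell is a different one.

```r
# cell means from the tapply output above
cells <- c("High-Mid" = 34.7, "Mid-Low"  = 35.7, "High-Low" = 37.3,
           "Mid-Mid"  = 38.3, "Mid-High" = 40.0, "High-High" = 41.0,
           "Low-Low"  = 41.3, "Low-Mid"  = 44.3, "Low-High" = 45.3)
# the marginal means suggest Temperature=High with Pressure=Low ...
cells["High-Low"]        # 37.3, only third best
# ... but the actual best (thinnest) combination is
names(which.min(cells))  # "High-Mid"
```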
The effects of mining and rock type on water quality.
attach(mines)
head(mines)
## Rock Mine Iron
## 1 Sandstone Unmined 0.20
## 2 Sandstone Unmined 0.25
## 3 Sandstone Unmined 0.04
## 4 Sandstone Unmined 0.06
## 5 Sandstone Unmined 1.20
## 6 Sandstone Unmined 0.30
table(Rock, Mine)
## Mine
## Rock Abandoned Reclaimed Unmined
## Limestone 13 13 13
## Sandstone 13 13 13
bplot(Iron, Rock, new_order = "Size")
bplot(Iron, Mine)
We have a clear problem with the normal assumption, so we use the log transform:
bplot(log(Iron), Rock, new_order = "Size")
bplot(log(Iron), Mine)
This has solved the problem, so the analysis will be based on log(Iron).
Summary Statistics
Because we used a transformation we will base the tables on the median and IQR:
stat.table(Iron, Rock, Mean=FALSE, Sort = TRUE)
## Sample Size Median IQR
## Sandstone 39 0.4 1.4
## Limestone 39 1.3 3.6
stat.table(Iron, Mine, Mean=FALSE)
## Sample Size Median IQR
## Unmined 26 0.5 2.7
## Reclaimed 26 0.7 0.8
## Abandoned 26 1.6 10.2
Note that the IQRs are very different. This is because this data set has a lot of outliers, which still affect the IQR.
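A median/IQR summary table of this kind can be sketched in base R with tapply (toy skewed data, not the actual mines data):

```r
# toy right-skewed data, two rock types with 39 observations each
set.seed(6)
toy <- data.frame(Rock = rep(c("Sandstone", "Limestone"), each = 39),
                  Iron = rexp(78))
# one row per group, columns Median and IQR
round(cbind(Median = tapply(toy$Iron, toy$Rock, median),
            IQR    = tapply(toy$Iron, toy$Rock, IQR)), 2)
```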
Interaction
iplot(log(Iron), Rock, Mine)
There seems to be some interaction. To confirm this, we test for it:
twoway(log(Iron), Rock, Mine)
## Rock p = 0.0348
## Mine p = 0.000
## Interaction p = 0.000
The assumptions are OK (after the log transform!).
Test for Interaction:
Test for Factor Rock:
Test for Factor Mine:
Multiple Comparison: the main interest is in the mines, so:
tukey(log(Iron), Rock, Mine, which="second")
## Groups that are statistically significantly different:
## Groups p.value
## 1 Abandoned-Reclaimed 0
## 2 Abandoned-Unmined 0
Interpretation: There is a statistically significant difference between the mean iron content of abandoned mines and the others. The difference between unmined and reclaimed mines is not statistically significant, at least not at these sample sizes.