Non-Normal Residuals, No Equal Variance - Transformations

Categorical - Quantitative

Case Study: Cancer Survival

As we saw before, Survival needs a log transform for the residuals to be normal and the variances equal, and the ANOVA on the transformed data then shows a statistically significant difference between the groups. The only thing left to do is the multiple comparison. The main point here is that this also has to be done on the log-transformed data:

attach(cancersurvival)
tukey(log(Survival), Cancer)
## Groups that are statistically significantly different:
##            Groups p.value
## 1 Breast-Bronchus  0.0083
## 2  Breast-Stomach  0.0158
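
The `tukey` routine used above is not part of base R (presumably it is supplied with the course materials). As a rough sketch of the same idea using only base R, one can fit the one-way ANOVA on the log scale with `aov` and pass the fit to `TukeyHSD`. The built-in `warpbreaks` data stand in for the case-study data here, purely as an assumption for illustration, so the numbers this prints have nothing to do with cancer survival:

```r
# Illustration only: the built-in warpbreaks data stand in for the case-study data.
# Response: breaks (all counts are positive, so the log is defined)
# Factor:   tension (levels L, M, H)
fit <- aov(log(breaks) ~ tension, data = warpbreaks)

# Tukey's honest significant differences, computed on the log scale
tk <- TukeyHSD(fit)

# One row per pair of groups: difference, lower/upper limits, adjusted p-value
res <- tk$tension
print(round(res, 4))

# Keep only the pairs that differ at the 5% level
signif_pairs <- res[res[, "p adj"] < 0.05, , drop = FALSE]
print(signif_pairs)
```

The key point is the same as in the text: the `log()` goes inside the model formula, so both the ANOVA and the pairwise comparisons are carried out on the transformed response.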

Case Study: Capacity of Wells

Again, we have previously done everything except the multiple comparison:

attach(rocks)
tukey(log(Capacity), Rocks)
## Groups that are statistically significantly different:
##                 Groups p.value
## 1 Dolomite-Metamorphic  0.0105
## 2 Dolomite-Siliclastic  0.0275
## 3   Dolomite-Limestone  0.0434

Interpretation: Dolomite has a statistically significantly larger capacity than the other rocks. The other differences are not statistically significant, at least not at these sample sizes.

Case Study: Cultural Differences in Equipment Use

Previously we saw that none of the transformations worked for this dataset, so we ran the Kruskal-Wallis test and found that the MTBF differs between countries. Can we do a multiple comparison, as we did above? The answer is yes, but that goes beyond our discussion in 3102!

Quantitative - Quantitative

Case Study: Brain and Body Weight of 62 Mammals

We saw previously that we needed log transforms of both Brain and Body.

Then we find

attach(brainsize)
slr(log(brain.wt.g), log(body.wt.kg))

## The least squares regression equation is: 
##  log(brain.wt.g)  = 2.121 + 0.746 log(body.wt.kg) 
## R^2 = 91.93%

We see that all the assumptions are now ok. We find the model

\[ \log(\text{brain.wt.g}) = 2.121 + 0.746 \log(\text{body.wt.kg}) \]

If we had run the regression without the transformations, this is what the normal plot would have looked like:

slr(brain.wt.g, body.wt.kg)

## The least squares regression equation is: 
##  brain.wt.g  = 89.912 + 0.967 body.wt.kg 
## R^2 = 87.26%

Note that transforming the data also has a downside, namely that it makes the model much harder to understand:

Model in original units: \(\text{brain.wt.g} = 89.9 + 0.967 \text{body.wt.kg}\)

Model in transformed units: \(\log(\text{brain.wt.g}) = 2.121 + 0.746 \log(\text{body.wt.kg})\)

The original model tells us that each extra kg of body weight adds roughly one gram of brain weight, but what is the slope of 0.746 in the transformed model telling us?

However, sometimes with a bit of math one can rewrite the model in the original variables. Our log-log model turns out to be the same as the model

\[ \text{brain.wt.g} = 8.339 \text{ body.wt.kg}^{0.746} \]

but this is not necessarily easier to understand.
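
The bit of math behind this rewriting is short enough to spell out. Assuming the logs are natural logs (which is what R's `log` computes by default, and which agrees with the constant 8.339), exponentiating both sides of the fitted log-log model gives

\[
\begin{aligned}
\text{brain.wt.g} &= e^{2.121 + 0.746 \log(\text{body.wt.kg})} \\
&= e^{2.121} \cdot \left( e^{\log(\text{body.wt.kg})} \right)^{0.746} \\
&\approx 8.339 \text{ body.wt.kg}^{0.746}
\end{aligned}
\]

One way to read the exponent: multiplying the body weight by a factor \(c\) multiplies the predicted brain weight by \(c^{0.746}\). Doubling the body weight, for example, multiplies the predicted brain weight by \(2^{0.746} \approx 1.68\).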

Equal Variance

Sometimes a transformation of the response variable can help with unequal variance as well as with non-normal residuals. Mostly, though, a more complicated method for analysing such a dataset is needed, such as weighted regression.
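
To give a rough idea of what weighted regression looks like in R, here is a minimal sketch using the `weights` argument of base R's `lm`. The data are simulated for illustration only, and the choice of weights (1 over the variance) works here only because we know how the data were generated. In a real dataset the variance structure would have to be estimated or assumed:

```r
# Simulated data, for illustration only: the spread of y grows with x
set.seed(1)
x <- runif(100, min = 1, max = 10)
y <- 2 + 3 * x + rnorm(100, sd = 0.5 * x)

# Ordinary least squares treats every observation as equally precise
fit_ols <- lm(y ~ x)

# Weighted least squares: weights proportional to 1 / variance.
# Here the standard deviation is proportional to x, so the variance is
# proportional to x^2 (known only because we simulated the data).
fit_wls <- lm(y ~ x, weights = 1 / x^2)

# Compare the two sets of estimated coefficients
print(rbind(ols = coef(fit_ols), wls = coef(fit_wls)))
```

Observations with a small variance get a large weight, so they pull the fitted line towards themselves more strongly than the noisy observations do.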

detach(cancersurvival)
detach(rocks)
detach(brainsize)