Nonlinear Regression Models

Case Study: Fabric Wear

This data set comes from an experiment designed to determine how much the speed of a washing machine affects the wear on a new fabric. The machine was run at five different speeds (measured in rpm), with six pieces of fabric at each speed.

head(fabricwear)
##   Speed Wear
## 1   110 24.9
## 2   110 24.8
## 3   110 25.1
## 4   110 26.4
## 5   110 27.0
## 6   110 26.6

The scatterplot of wear by speed shows a strong but non-linear relationship:

attach(fabricwear)
splot(Wear, Speed, add.line=1)

How strong is a difficult question, because Pearson’s correlation coefficient only measures the strength of a linear relationship, so it won’t work here. If we try slr anyway, the residuals vs. fits plot shows that there is a problem with the assumption of a linear model:

slr(Wear, Speed)

## The least squares regression equation is: 
##  Wear  = -6.947 + 0.274 Speed 
## R^2 = 88.58%
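Here is a small sketch of what such a problem looks like, using base R's lm as a stand-in for slr (that slr works the same way internally is an assumption). The data are made up for illustration and are not the fabricwear data: y is exactly x squared, so a straight line is the wrong model even though R^2 comes out high.

```r
# Toy data, made up for illustration (NOT the fabricwear data):
# y grows exactly like x^2, so the true relationship is curved.
x <- 1:10
y <- x^2

fit <- lm(y ~ x)                 # straight-line fit
r2  <- summary(fit)$r.squared    # proportion of variation explained
res <- residuals(fit)

round(r2, 3)    # high R^2 despite the wrong model
round(res, 1)   # positive at both ends, negative in the middle

# Residuals vs. fits plot: a U-shape signals a curve the line has missed
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

A high R^2 by itself therefore does not tell us that the straight line is adequate; the pattern in the residuals does.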

So the question is: how do we fit models other than straight lines?

There are two basic things we can try. The first is something we have already done, namely the log transformation:

splot(Wear, log(Speed), add.line=1)

splot(log(Wear), log(Speed), add.line=1)

splot(log(Wear), Speed, add.line=1)

Unfortunately, none of these looks very good.

Some of these have names:

  • log(y) vs. x is called an exponential model

  • log(y) vs. log(x) is called a power model
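Where do these names come from? Undoing the log (the natural log, as computed by R's log) on both sides of the fitted equation shows it. This assumes x and y are positive, so that the logs exist:

\[ \log(y) = \beta_0 + \beta_1 x \quad\Longleftrightarrow\quad y = e^{\beta_0} \, e^{\beta_1 x} \]

\[ \log(y) = \beta_0 + \beta_1 \log(x) \quad\Longleftrightarrow\quad y = e^{\beta_0} \, x^{\beta_1} \]

In the first case x appears in the exponent, in the second case x is raised to a power.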

The other solution to our problem is to fit a Polynomial Model:

Linear \(y=\beta_0+\beta_1 x\)

Quadratic \(y=\beta_0+\beta_1 x+\beta_2 x^2\)

Cubic \(y=\beta_0+\beta_1 x+\beta_2 x^2+\beta_3 x^3\)

and so on

How do we fit such a model? We can simply use the same routine with the extra argument polydeg=…. For example, for the quadratic model we use

slr(Wear, Speed, polydeg=2)

## The least squares regression equation is: 
##  Wear  = 71.199 - 0.807 Speed +0.004Speed^2 
## R^2 = 97.17%
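For comparison, here is how a quadratic model can be fit in base R with lm. This is only a sketch: whether slr with polydeg=2 does exactly this internally is an assumption, and the data are made up so that we know the right answer in advance.

```r
# Toy data, made up for illustration: an exact quadratic with
# intercept 2, linear coefficient 3 and quadratic coefficient 0.5
x <- 1:10
y <- 2 + 3 * x + 0.5 * x^2

# I() is needed because ^ has a special meaning inside a model formula
fit2 <- lm(y ~ x + I(x^2))

coef(fit2)   # recovers 2, 3 and 0.5 (up to numerical error)
```

Note that x^2 enters as an additional term next to x, not as a replacement for it.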

What does such a curve look like? To draw the fitted line plot, that is the scatterplot with the fitted curve, just use

flplot(Wear, Speed, polydeg=2)

This routine also does the log transform models:

flplot(Wear, Speed, logy=TRUE)  

Similarly, use flplot(Wear, Speed, logx=TRUE) or flplot(Wear, Speed, logx=TRUE, logy=TRUE) for the other log transforms.

Note: There are two big differences in the way transformations and polynomial models work:

  • If we do a transformation, we replace an old variable with a new one; if we fit a polynomial model, we add a new predictor to the model.

  • We might transform the response, but a polynomial model is always a polynomial in the predictor, never the response.
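The first difference can be made visible by looking at the columns that go into the fit. The sketch below uses base R's model.matrix and a handful of hypothetical speed values (not taken from the data set):

```r
# Hypothetical speeds, for illustration only
speed <- c(100, 120, 140, 160, 180)

# Transformation: log(speed) REPLACES speed as the predictor
X_log <- model.matrix(~ log(speed))

# Polynomial: speed^2 is ADDED next to speed
X_quad <- model.matrix(~ speed + I(speed^2))

colnames(X_log)    # intercept plus one predictor column
colnames(X_quad)   # intercept plus two predictor columns
```

So the transformed model still estimates two coefficients, while the quadratic model estimates three.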

Mathematical Features of these Models

What “shapes” can we fit with these models?

  • Transformations might work if the relationship between x and y is monotone, that is, in the scatterplot the dots either go up or go down but never turn around.

  • Polynomial models usually do turn around: a quadratic model once, a cubic model up to twice, and so on. Sometimes this is not apparent because we only see the part of the graph before the turn-around happens.
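For the quadratic model we can work out where the turn-around happens by setting the slope of the curve to zero:

\[ \frac{dy}{dx} = \beta_1 + 2\beta_2 x = 0 \quad\Longrightarrow\quad x^{*} = -\frac{\beta_1}{2\beta_2} \]

With the coefficients printed for the quadratic fit of Wear on Speed this gives roughly \(0.807/(2 \cdot 0.004) \approx 101\), close to the speed of 110 rpm shown in the first rows of the data. The printed 0.004 is rounded to a single significant digit, so this location is only approximate.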

Prediction

Again we can use the slr.predict command to do prediction, but there are some things we need to be careful with:

Transformations

If we use a log transformation on the predictor, we have to apply the same log transformation to newx:

slr.predict(Wear, log(Speed), newx=log(150))
##  log(Speed)   Fit
##    5.010635 34.82

If we use a log transformation on the response, we get an estimate of the log of the response. To get back to the original scale we can do this:

slr(log(Wear), Speed) 

## The least squares regression equation is: 
##  log(Wear)  = 2.335 + 0.008 Speed 
## R^2 = 92.64%

so we have the equation

\[ \log(\text{Wear}) = 2.335 + 0.008 \text{Speed} \]

and now we can get an estimate with

exp(2.335 + 0.008*150)
## [1] 34.29501
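One thing to keep in mind when typing in coefficients by hand: the printout appears to round them to three decimals (this is an assumption based on the output shown), and the exponential magnifies small differences. The sketch below shows the range of estimates we would get if only the slope, printed as 0.008, were really anywhere between 0.0075 and 0.0085:

```r
b0    <- 2.335   # intercept as printed
speed <- 150

est_printed <- exp(b0 + 0.008  * speed)   # slope as printed
est_low     <- exp(b0 + 0.0075 * speed)   # smallest slope that rounds to 0.008
est_high    <- exp(b0 + 0.0085 * speed)   # upper limit for rounding to 0.008

round(c(low = est_low, printed = est_printed, high = est_high), 1)
```

So the hand calculation is fine as a rough estimate, but not to five decimals. The same caution applies with more force to the quadratic fit: 71.199 - 0.807*150 + 0.004*150^2 is about 40.1, quite far from what slr.predict reports for the quadratic model at Speed=150.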

All of this works ONLY for point estimation. Interval estimation is much harder and needs to be done by an expert!

In contrast, prediction using polynomials works perfectly fine as is:

slr.predict(Wear, Speed, newx=150, polydeg=2, 
            interval="PI", conf.level= 90)
##  Speed   Fit Lower Upper
##    150 31.22 28.64 33.81
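For readers who want to see the same kind of calculation in base R, here is a sketch using lm and predict. It is not a description of how slr.predict works internally, and the data are simulated, not the fabricwear data. Note that predict expects the level as a proportion (0.90), whereas the call above uses conf.level=90.

```r
set.seed(1)                                  # reproducible simulated noise
x <- rep(seq(100, 180, by = 20), each = 6)   # hypothetical design
y <- 50 - 0.5 * x + 0.003 * x^2 + rnorm(length(x), sd = 1.5)
dat <- data.frame(x = x, y = y)

fit2 <- lm(y ~ x + I(x^2), data = dat)       # quadratic model
new  <- data.frame(x = 150)                  # newx on the ORIGINAL scale

# 90% prediction interval for a single new observation at x = 150
pi90 <- predict(fit2, newdata = new, interval = "prediction", level = 0.90)
# 90% confidence interval for the mean response at x = 150
ci90 <- predict(fit2, newdata = new, interval = "confidence", level = 0.90)

pi90
ci90
```

Both results have the columns fit, lwr and upr. The prediction interval is always the wider of the two, because it also has to cover the scatter of a single new observation around the curve.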

If you are not sure that you got the right answer, here is a quick sanity check: draw the scatterplot and make a visual guess of y.

Example: say we want to use the power model and predict the Wear for Speed=150:

slr.predict(log(Wear), log(Speed), newx=150)
##  log(Speed)    Fit
##         150 165.13

Notice that I should have written newx=log(150).

Now if I draw the scatterplot:

splot(Wear, Speed)

It is clear that if x=150, y should be around 32 or so, not 165!

So I had better try again:

slr.predict(log(Wear), log(Speed), newx=log(150))
##  log(Speed)  Fit
##    5.010635 3.52

and that’s about right, because the fit is an estimate of log(Wear) and

log(32)
## [1] 3.465736
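To turn the fitted value into an estimate of Wear itself we can undo the log, just as we did for the exponential model. The 3.52 below is simply the Fit value from the output above, typed in by hand, so the result inherits its rounding:

```r
log_fit  <- 3.52           # fitted log(Wear) from the power model at Speed=150
wear_est <- exp(log_fit)   # back to the original scale

round(wear_est, 1)         # about 33.8, in line with the visual guess of 32 or so
```

As before, this back-transformation gives a point estimate only.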