Nonlinear Regression

We have previously discussed polynomial regression, which is one way to go beyond linear regression. In this section we discuss two others.

Transformations

It is often possible to turn a non-linear model into a linear one via a transformation. Say, for example, that we want to fit an exponential model of the form \(y=ae^{bx}\). Taking logarithms gives

\[\log y = \log\left(ae^{bx}\right) = \log a + bx\]

so this is equivalent to a linear model in \(\log y\).
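
Before turning to a real data set, here is a minimal sketch on simulated data (the values of a, b and the noise level are made up for illustration): we generate data from the exponential model with multiplicative noise and then recover a and b from a linear fit of \(\log y\) on x.

set.seed(111)
a <- 5; b <- 0.3   # made-up true values
x <- runif(50, 0, 10)
y <- a*exp(b*x)*exp(rnorm(50, 0, 0.1))   # multiplicative (log-scale) noise
fitsim <- lm(log(y) ~ x)
exp(coef(fitsim)[1])   # estimate of a
coef(fitsim)[2]        # estimate of b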

Example (8.4.1)

In Westchester County, north of New York City, Consolidated Edison bills residential customers for electricity on a monthly basis. The company wants to predict residential usage in order to plan fuel purchases and to budget revenue flow. The data consist of the usage (in kilowatt-hours per day) and the average monthly temperature for 55 consecutive months for an all-electric home in Westchester County, NY.

attach(elusage) 
head(elusage)
##   Month Year  Usage Temperature
## 1     8 1989 24.828          73
## 2     9 1989 24.688          67
## 3    10 1989 19.310          57
## 4    11 1989 59.706          43
## 5    12 1989 99.667          26
## 6     1 1990 49.333          41
fitlin=lm(Usage~Temperature, data=elusage)
df <- data.frame(Residuals=resid(fitlin), 
            Fits = fitted(fitlin))
ggplot(data=df, aes(Fits, Residuals)) +
            geom_point() +
            geom_hline(yintercept = 0)        

The residual vs. fits plot shows a clear pattern, so a linear model is not appropriate. Now consider fitting a linear model with \(\log(\text{Usage})\) as the response:

fitlogy=lm(log(Usage)~Temperature, data=elusage)
df <- data.frame(Residuals=resid(fitlogy), 
            Fits = fitted(fitlogy))
ggplot(data=df, aes(Fits, Residuals)) +
            geom_point() +
            geom_hline(yintercept = 0)        

The residual vs. fits plot of this model looks much better. Transforming the coefficients back to the original scale

exp(coef(fitlogy)[1])
## (Intercept) 
##     207.513
coef(fitlogy)[2]
## Temperature 
## -0.03187381

shows that the model is

\[y=207.5e^{-0.0319x}\]

which looks like this:

x=seq(min(elusage$Temperature), max(elusage$Temperature), length=100)
y=exp(coef(fitlogy)[1])*exp(coef(fitlogy)[2]*x)
df=data.frame(x=x,y=y)
ggplot(data=elusage, aes(Temperature, Usage)) +
  geom_point() +
  geom_line(aes(x, y), data=df, col="blue", size=1.2)

There are some issues with finding confidence intervals and doing hypothesis tests after transforming back to the original scale, but we won't discuss these here.
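
Point predictions on the original scale are straightforward, though: predict on the log scale and exponentiate. Here is a minimal sketch, using an illustrative temperature of 50 degrees; note that exponentiating the point prediction gives an estimate of the median usage at that temperature rather than the mean, which is one of the issues alluded to above.

# predicted usage at a temperature of 50 degrees (value chosen for illustration)
out <- predict(fitlogy, newdata = data.frame(Temperature = 50),
               interval = "prediction")
exp(out)   # back to the original (kilowatt-hours per day) scale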

Nonlinear Parametric Models

Sometimes the model we wish to fit is known, up to parameters. Generally that is the case if there is a scientific theory that predicts the shape of the relationship. For example, radioactive decay is known to be exponential: \(y=\alpha e^{-\beta t}\)

Example (8.4.2)

Growth of Lobsters

The data come from an experiment to raise Florida lobster in a controlled environment. They show the overall length and the age of lobsters of a certain species.

kable.nice(lobster[1:10, ])
     Time  Length
1      14      59
2      22      92
3      28     131
4      35     175
5      40     215
6      50     275
7      56     289
8      63     269
9      71     395
10     77     434
ggplot(data=lobster, aes(Time, Length)) +
  geom_point()

Now biology suggests that the relationship should be of the form

\[ y=\frac{\beta_2}{1+\left[(\beta_2 - \beta_0)/\beta_0\right] e^{\beta_1 t}} + \epsilon \]

where

  • \(\beta_0\) is the expected value of y at time t=0
  • \(\beta_1\) is a measure of the growth rate
  • \(\beta_2\) is the expected limit of y as \(t\rightarrow \infty\)

This is often called the logistic or autocatalytic model.
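
To get a feel for the roles of the three parameters, here is a quick sketch of the curve for some made-up parameter values (chosen purely for illustration): the curve starts near \(\beta_0\), rises at a rate controlled by \(\beta_1\), and levels off at \(\beta_2\).

# logistic growth curve for illustrative parameter values
logistic.growth <- function(t, beta0, beta1, beta2)
  beta2/(1 + ((beta2 - beta0)/beta0)*exp(beta1*t))
t <- seq(0, 150, length=250)
df <- data.frame(t=c(t, t),
      y=c(logistic.growth(t, 30, -0.05, 450),
          logistic.growth(t, 30, -0.10, 450)),
      beta1=rep(c("-0.05", "-0.10"), each=250))
ggplot(df, aes(t, y, color=beta1)) +
  geom_line()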

Previously we have always used the method of least squares to estimate the parameters in our models; that is, we minimized the “figure of merit”

\[ \text{RSS} = \sum (y_i - \beta_0 - \beta_1 x_i)^2 \]

the natural extension of this is to use

\[ \text{RSS} = \sum (y_i - f(x_i; \boldsymbol{\beta} ))^2 \]

For a linear model this minimization can be done exactly, which is what lm does. For a general nonlinear \(f\) it is still a minimization problem, but it has to be solved numerically, and in R this is done with nls:

fit <- nls(Length ~ beta[3]/(1 + ((beta[3] -
          beta[1])/beta[1]) *  exp(beta[2] * Time)), 
          start = list(beta = c(10, -0.1, 500)), 
          data = lobster)
summary(fit)
## 
## Formula: Length ~ beta[3]/(1 + ((beta[3] - beta[1])/beta[1]) * exp(beta[2] * 
##     Time))
## 
## Parameters:
##         Estimate Std. Error t value Pr(>|t|)
## beta1  32.008757   6.755720   4.738 0.000164
## beta2  -0.057557   0.004957 -11.612 8.55e-10
## beta3 465.884778   8.340739  55.857  < 2e-16
## 
## Residual standard error: 21.63 on 18 degrees of freedom
## 
## Number of iterations to convergence: 7 
## Achieved convergence tolerance: 7.722e-06
x <- seq(10, 160, 1)
df <- data.frame(x=x, 
                 y = predict(fit, 
                  newdata = data.frame(Time = x)))
 ggplot(data=lobster, aes(Time, Length)) +
  geom_point() +
  geom_line(data=df, aes(x, y), color="blue")
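
As a quick usage example (the age of 100 is an arbitrary choice for illustration), we can predict the length at a given age with predict, and the estimated limiting length can be read off directly from the third coefficient, which is \(\beta_2\) in the notation above:

# predicted length at age 100 (arbitrary illustrative value)
predict(fit, newdata = data.frame(Time = 100))
coef(fit)[3]   # estimated limiting length as t goes to infinity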

Example (8.4.3)

Prime Number Theorem

That there are infinitely many prime numbers was first proven by the Greek mathematician Euclid around 300 BC. A serious study of how fast they grow was first begun by Adrien-Marie Legendre. He studied the function N(k), which gives the number of primes less than or equal to k. We can do the same. The primes up to 1,000,000 are available in a file:

primes <- scan("C://Users//Wolfgang//dropbox//teaching//Esma-6616//primes.txt")
primes <- as.integer(primes)
length(primes)
## [1] 78498
matrix(primes[1:100], ncol=10, byrow = TRUE)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    2    3    5    7   11   13   17   19   23    29
##  [2,]   31   37   41   43   47   53   59   61   67    71
##  [3,]   73   79   83   89   97  101  103  107  109   113
##  [4,]  127  131  137  139  149  151  157  163  167   173
##  [5,]  179  181  191  193  197  199  211  223  227   229
##  [6,]  233  239  241  251  257  263  269  271  277   281
##  [7,]  283  293  307  311  313  317  331  337  347   349
##  [8,]  353  359  367  373  379  383  389  397  401   409
##  [9,]  419  421  431  433  439  443  449  457  461   463
## [10,]  467  479  487  491  499  503  509  521  523   541

A detailed study of these primes led Legendre in 1798 to propose the function

\[ N(k)=k/(\log k - \alpha) \]

Here is what that looks like for several values of \(\alpha\):

N <- function(k, alpha) {
  k/(log(k)-alpha)
}
k <- seq(1000, 1e6, length=250)
exact.counts <- k
for(i in 1:250) 
  exact.counts[i] <- sum(primes<k[i])
df <- data.frame(N=c(k, k, k, k), 
      Counts=c(exact.counts, N(k, 0), N(k, 1), N(k, 2)),
      Method=rep(c("Counts", "a=0", "a=1", "a=2"),
                 each=250))
ggplot(df, aes(N, Counts, color=Method)) +
  geom_line()

and so it seems a value of \(\alpha=1\) is good.

Legendre, however, was not satisfied with that; he wanted to find the optimal value. So he found the least squares solution!

fit <- nls(exact.counts ~ k/(log(k) - alpha), 
          start = list(alpha = 0))
coef(fit)
##    alpha 
## 1.082001

and so he claimed that

\[ N(k)=k/(\log k - 1.08) \]

Around the same time the German mathematician Carl Friedrich Gauss also looked at this problem, and he made a different conjecture. He said

\[ N(k)=k/\log k \]

That was a rather strange guess, because it looks like this:

N <- function(k, alpha) {
  k/(log(k)-alpha)
}
k <- seq(1000, 1e6, length=250)
exact.counts <- k
for(i in 1:250) 
  exact.counts[i] <- sum(primes<k[i])
df <- data.frame(N=c(k, k), 
      Counts=c(exact.counts, N(k, 0)),
      Method=rep(c("Counts", "Gauss"),   each=250))
ggplot(df, aes(N, Counts, color=Method)) +
  geom_line()

and it surely looks like the two curves are growing further apart. However, almost 100 years later, in 1896, the French mathematician Jacques Hadamard and the Belgian mathematician Charles de la Vallée Poussin independently showed that Gauss was right, in the sense that \(N(k)/(k/\log k)\rightarrow 1\) as \(k\rightarrow\infty\). This is the famous prime number theorem.
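
Here is a quick numeric check at \(k=10^6\), using the exact count of 78498 primes found above:

# compare the two approximations to the exact count of primes below one million
c(exact = 78498,
  legendre = 1e6/(log(1e6) - 1.08),
  gauss = 1e6/log(1e6))
# Legendre's fitted constant is much closer on the observed range (about 78500
# versus about 72400), yet by the prime number theorem N(k)/(k/log(k)) -> 1,
# so Gauss's simpler form describes the true asymptotic behavior.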

From our modern point of view we might say Legendre was guilty of over-fitting!