Confidence Intervals for \(\beta_j\)’s

In (6.6.14) we found a test for \(H_0:\beta_j=0\). Subtracting \(\beta_j\) in the numerator of the test statistic and inverting the test, we find

Theorem (6.8.1)

A \((1-\alpha)100\%\) confidence interval for \(\beta_j\) is given by

\[\hat{\beta}_j\pm t_{\alpha/2, n-k-1}s\sqrt{g_{jj}}\]

proof From (6.6.14) we have

\[P\left[ -t_{\alpha/2, n-k-1}< \frac{\hat{\beta}_j-\beta_j}{s\sqrt{g_{jj}}} <t_{\alpha/2, n-k-1} \right]=1-\alpha\]

Solving the two inequalities for \(\beta_j\) yields the result.


Example (6.8.2)

Let’s find 90% confidence intervals for the houseprice data:

A=as.matrix(houseprice)
n=nrow(A)  # number of observations
k=ncol(A)-1  # number of predictors; the first column is the response
X=cbind(1, A[ ,-1])  # design matrix, with a column of 1's added
y=A[, 1, drop=FALSE]
G=solve(t(X)%*%X)
betahat=G%*%t(X)%*%y  # least squares estimator
sse=c(t(y)%*%(diag(n)-X%*%G%*%t(X))%*%y)/(n-k-1)  # s^2=SSE/(n-k-1); c() turns the 1x1 matrix into a scalar
crit=qt(1-(1-0.9)/2, n-k-1)
Low=c(betahat-crit*sqrt(sse*diag(G)))
High=c(betahat+crit*sqrt(sse*diag(G)))
CI=round(cbind(Low, High), 3)
rownames(CI)=c("Intercept", colnames(A)[-1])
CI
##               Low    High
## Intercept -97.969 -37.270
## Sqfeet      0.067   0.104
## Floors    -42.757 -10.229
## Bedrooms  -20.992   2.419
## Baths      16.361  58.400
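
The same intervals can also be found with R's built-in confint command, which should reproduce the table above up to rounding:

fit=lm(Price~., data=houseprice)
round(confint(fit, level=0.9), 3)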

Warning Again we have the issue of simultaneous inference: the 90% coverage applies to each interval individually, not to the whole collection of intervals simultaneously. One common remedy is sketched below.
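
A simple (if conservative) remedy is the Bonferroni correction: using \(\alpha/m\) instead of \(\alpha\) for each of \(m\) intervals guarantees simultaneous coverage of at least \(1-\alpha\). A minimal sketch, reusing the objects from (6.8.2):

m=k+1  # one interval per coefficient
crit.bonf=qt(1-(1-0.9)/(2*m), n-k-1)  # alpha/m in place of alpha
Low.bonf=c(betahat-crit.bonf*sqrt(sse*diag(G)))
High.bonf=c(betahat+crit.bonf*sqrt(sse*diag(G)))
round(cbind(Low.bonf, High.bonf), 3)

These intervals are wider than the individual ones, but they cover all five coefficients simultaneously with probability at least 90%.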

Theorem (6.8.3)

A \((1-\alpha)100\%\) confidence interval for \(\pmb{a'\beta}\), \(\pmb{a}\ne \pmb{0}\), is given by

\[\pmb{a'\hat{\beta}}\pm t_{\alpha/2, n-k-1}s\sqrt{\pmb{a'(X'X)^{-1}a}}\]

proof Similar to the proof of (6.8.1).
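
For example (the particular combination is chosen purely for illustration), a 90% confidence interval for \(\beta_{Bedrooms}-\beta_{Baths}\) in the houseprice data can be computed with the objects from (6.8.2):

a=c(0, 0, 0, 1, -1)  # picks out Bedrooms minus Baths
crit=qt(1-(1-0.9)/2, n-k-1)
tmp=crit*sqrt(sse*c(t(a)%*%G%*%a))
round(c(t(a)%*%betahat)+c(-1, 1)*tmp, 3)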

Confidence Intervals for \(E[\pmb{y}]\)

Let \(\pmb{x}_0=(1\text{ } x_{01}\text{ }...\text{ }x_{0k})'\) be some point, not necessarily one of the rows of \(\pmb{X}\). A prediction of the response at that point is given by

\[\hat{y}_0 = \pmb{x_0'\hat{\beta}}\]

Corollary (6.8.4)

A \((1-\alpha)100\%\) confidence interval for \(E[y_0]=\pmb{x_0'\beta}\) is given by

\[\pmb{x_0'\hat{\beta}}\pm t_{\alpha/2, n-k-1}s\sqrt{\pmb{x_0'(X'X)^{-1}x_0}}\]

Example (6.8.5)

Let’s find a 95% confidence interval for the average price of a two-story house with 2500 sqfeet, 3 bedrooms and 2 baths.

x0=rbind(1, 2500, 2, 3, 2)  # point at which to estimate E[y]
crit=qt(1-(1-0.95)/2, n-k-1)
tmp=crit*sqrt(sse*c(t(x0)%*%solve(t(X)%*%X)%*%x0))
round(c(t(x0)%*%betahat)+c(-1, 1)*tmp, 1)
## [1] 124.4 156.8

We can also use R's lm and predict commands:

fit=lm(Price~., data=houseprice)
newx=data.frame(Sqfeet=2500, Floors=2, Bedrooms=3, Baths=2)
predict(fit, newdata=newx, interval="confidence")
##        fit      lwr      upr
## 1 140.5673 124.3619 156.7727

Example (6.8.6)

Let’s consider the case of simple regression, where we previously found

\[(\pmb{X'X})^{-1} = \frac1{n\sum_i (x_{i}-\bar{x})^2} \begin{pmatrix} \sum_i x_{i}^2 & -\sum_i x_{i} \\ -\sum_i x_{i} & n \end{pmatrix}\]

and so if \(\pmb{x}_0 = (1\text{ }x_0)'\)

\[ \begin{aligned} &\pmb{x_0'}(\pmb{X'X})^{-1}\pmb{x_0} =\\ & \begin{pmatrix} 1 & x_0 \end{pmatrix} \frac1{n\sum_i (x_{i}-\bar{x})^2} \begin{pmatrix} \sum_i x_{i}^2 & -\sum_i x_{i} \\ -\sum_i x_{i} & n \end{pmatrix} \begin{pmatrix} 1 \\ x_0 \end{pmatrix} = \\ &\frac1{n\sum_i (x_{i}-\bar{x})^2} \begin{pmatrix} 1 & x_0 \end{pmatrix} \begin{pmatrix} \sum_i x_{i}^2 - x_0\sum_i x_{i} \\ -\sum_i x_{i} + nx_0 \end{pmatrix} = \\ &\frac1{n\sum_i (x_{i}-\bar{x})^2} \left[\sum_i x_{i}^2 - x_0\sum_i x_{i} + nx_0^2 -x_0\sum_i x_{i}\right]=\\ &\frac1{n\sum_i x_{i}^2-(\sum_i x_{i})^2} \left[\sum_i x_{i}^2 - 2x_0\sum_i x_{i} + nx_0^2 \right]=\\ &\frac1{n\sum_i (x_{i}-\bar{x})^2} \sum_i\left[ x_{i}^2 - 2x_0 x_{i} + x_0^2 \right]=\\ &\frac1{n\sum_i (x_{i}-\bar{x})^2} \sum_i\left( x_{i} - x_0 \right)^2=\\ &\frac1{n\sum_i (x_{i}-\bar{x})^2} \sum_i\left(x_0-\bar{x}+\bar{x}- x_{i} \right)^2=\\ &\frac1{n\sum_i (x_{i}-\bar{x})^2} \left(n(x_0-\bar{x})^2+ \sum_i(\bar{x}- x_{i})^2 \right)=\\ &\frac1n+\frac{(x_0-\bar{x})^2}{\sum_i (x_{i}-\bar{x})^2} \end{aligned} \]

This shows that the width of the interval increases with the distance of \(x_0\) from \(\bar{x}\).
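
We can verify this identity numerically; here is a minimal sketch using simulated x values (the numbers are made up for illustration):

set.seed(111)
x=runif(20, 0, 10)
Xs=cbind(1, x)  # design matrix for simple regression
x0=c(1, 7.5)
c(t(x0)%*%solve(t(Xs)%*%Xs)%*%x0)  # direct computation
1/length(x)+(x0[2]-mean(x))^2/sum((x-mean(x))^2)  # formula derived above

Both lines should print the same number.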

Prediction Intervals for Future Observations

Another type of problem is to predict a future observation that is not part of the current data set. This is called a prediction problem, and we use the term prediction interval. Because the future observation \(y_0=\pmb{x_0'\beta}+\epsilon_0\) is independent of the data used to fit the model, we find

\[ \begin{aligned} &var(y_0-\hat{y}_0) = \\ &var(\pmb{x_0'\beta}+\epsilon_0-\pmb{x_0'\hat{\beta}}) = \\ &var(\epsilon_0-\pmb{x_0'\hat{\beta}}) = \\ &var(\epsilon_0)+var(\pmb{x_0'\hat{\beta}}) = \\ &\sigma^2+\sigma^2\pmb{x_0'}(\pmb{X'X})^{-1}\pmb{x_0} = \\ &\sigma^2\left[1+\pmb{x_0'}(\pmb{X'X})^{-1}\pmb{x_0}\right] \end{aligned} \]

Theorem (6.8.7)

A \((1-\alpha)100\%\) prediction interval for a future observation \(y_0\) at the point \(\pmb{x}_0\) is given by

\[\pmb{x_0'\hat{\beta}}\pm t_{\alpha/2, n-k-1}s\sqrt{1+\pmb{x_0'(X'X)^{-1}x_0}}\]

proof omitted; it follows as in (6.8.1), using the variance derived above.

Example (6.8.8)

Let’s find a 95% prediction interval for the price of a two-story house with 2500 sqfeet, 3 bedrooms and 2 baths.

x0=rbind(1, 2500, 2, 3, 2)
crit=qt(1-(1-0.95)/2, n-k-1)
tmp=crit*sqrt(sse*(1+c(t(x0)%*%solve(t(X)%*%X)%*%x0)))  # note the extra 1+ term
round(c(t(x0)%*%betahat)+c(-1, 1)*tmp, 1)
## [1] 107.9 173.2

or, using R's predict:

fit=lm(Price~., data=houseprice)
newx=data.frame(Sqfeet=2500, Floors=2, Bedrooms=3, Baths=2)
predict(fit, newdata=newx, interval="prediction")
##        fit      lwr      upr
## 1 140.5673 107.9064 173.2282
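
As it must be because of the extra \(1+\) term, the prediction interval is wider than the confidence interval from (6.8.5): the widths here are 65.3 vs 32.4. A quick check:

predCI=predict(fit, newdata=newx, interval="confidence")
predPI=predict(fit, newdata=newx, interval="prediction")
c(Confidence=predCI[,"upr"]-predCI[,"lwr"], Prediction=predPI[,"upr"]-predPI[,"lwr"])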