Consider the following artificial data set and the summary of the lm command:
set.seed(111)
x1=round(sort(runif(100)), 2)
x2=round(sort(runif(100)), 2)
x3=round(sort(runif(100)), 2)
y=round(1+2*x1+3*x2+4*x3+rnorm(100),1)
summary(lm(y~x1+x2+x3))
##
## Call:
## lm(formula = y ~ x1 + x2 + x3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.35695 -0.51916 -0.05192 0.70102 2.33240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.6556 0.1810 3.622 0.00047
## x1 1.8731 2.1465 0.873 0.38505
## x2 -4.8388 4.0227 -1.203 0.23199
## x3 12.4945 4.3210 2.892 0.00474
##
## Residual standard error: 0.8695 on 96 degrees of freedom
## Multiple R-squared: 0.9148, Adjusted R-squared: 0.9121
## F-statistic: 343.6 on 3 and 96 DF, p-value: < 2.2e-16
Use R, but not the lm command, to find the numbers in the printout:
X=cbind(1, x1, x2, x3)
y=cbind(y)
XX=t(X)%*%X
XXinv=solve(XX)
betahat=XXinv%*%t(X)%*%y
#coefficients
round(c(betahat), 4)
## [1] 0.6556 1.8731 -4.8388 12.4945
yhat=X%*%betahat
eps=y-yhat
#Residuals:
tmp=quantile(eps, c(0, 0.25, 0.5, 0.75, 1))
tmp[3]=median(eps)
names(tmp)=c("Min", "1Q", "Median", "3Q", "Max")
round(tmp, 5)
## Min 1Q Median 3Q Max
## -2.35695 -0.51916 -0.05192 0.70102 2.33240
sse=t(eps)%*%eps
#Residual standard error
s=round(c(sqrt(sse/96)),4)
s
## [1] 0.8695
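#standard errors of the coefficients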
round(sqrt(diag(s^2*XXinv)), 4)
## x1 x2 x3
## 0.1810 2.1464 4.0226 4.3208
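The remaining numbers in the printout (the t values and p values, R-squared, adjusted R-squared and the F statistic) can be found from the same objects. Here is a quick sketch; the values should agree with the summary above:
#t values and p values
se=sqrt(diag(s^2*XXinv))
tval=c(betahat)/se
pval=2*pt(-abs(tval), 96)
#R-squared and adjusted R-squared
sst=sum((y-mean(y))^2)
r2=1-c(sse)/sst
adj.r2=1-(1-r2)*99/96
#F statistic and its p value
fstat=((sst-c(sse))/3)/(c(sse)/96)
pf(fstat, 3, 96, lower.tail=FALSE)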
Say we just found \(\pmb{\hat{\beta}}\) with k predictor variables. Now we would like to use another predictor, together with those already used. If k is large, this means we have to invert a \((k+1)\times (k+1)\) matrix. But we just inverted a very similar \(k\times k\) matrix. Show how we can make use of that.
There are several ways to do this. We could use theorem (6.2.4) or:
Let’s denote by \((\pmb{X'X})_k\) and \((\pmb{X'X})_{k+1}\) the matrices without and with the new variable. If \(\pmb{X}_k\) is the old design matrix and \(\pmb{x}_{k+1}\) is the column of values of the new predictor, we can write

\[
(\pmb{X'X})_{k+1} = \begin{pmatrix} (\pmb{X'X})_k & \pmb{a}_{12} \\ \pmb{a}_{12}' & a_{22} \end{pmatrix}
\]

where \(\pmb{a}_{12}=\pmb{X}_k'\pmb{x}_{k+1}\) and \(a_{22}=\pmb{x}_{k+1}'\pmb{x}_{k+1}\) is a scalar. Then by example (4.2.5) \(b=a_{22}-\pmb{a}_{12}'(\pmb{X'X})^{-1}_k\pmb{a}_{12}\) is a scalar and

\[ (\pmb{X'X})^{-1}_{k+1} = \frac1{b}\begin{pmatrix} b(\pmb{X'X})^{-1}_k+(\pmb{X'X})^{-1}_k\pmb{a}_{12}\pmb{a}_{12}'(\pmb{X'X})^{-1}_k & -(\pmb{X'X})^{-1}_k\pmb{a}_{12} \\ -\pmb{a}_{12}'(\pmb{X'X})^{-1}_k & 1 \\ \end{pmatrix} \]
Let’s check with R:
k=4
X=cbind(1, matrix(rnorm(1000*k), 1000, k))
XX=t(X)%*%X
XXinv=solve(XX)
x=rnorm(1000)
X1=cbind(X,x)
XX1=t(X1)%*%X1
f=function() {
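#a12 and a22: last column and last diagonal entry of X1'X1, as in the formula above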
a12=XX1[-(k+2) ,k+2, drop=FALSE]
a22=XX1[k+2 ,k+2]
b=c(a22-t(a12)%*%XXinv%*%a12)
A11=b*XXinv+XXinv%*%a12%*%t(a12)%*%XXinv
A12=-XXinv%*%a12
A21=cbind(t(A12),1)
A=cbind(A11, A12)
rbind(A, A21)/b
}
round(solve(XX1)-f(), 10)
## x
## 0 0 0 0 0 0
## 0 0 0 0 0 0
## 0 0 0 0 0 0
## 0 0 0 0 0 0
## 0 0 0 0 0 0
## x 0 0 0 0 0 0
so this works.
How much faster is it? Let’s see:
library(microbenchmark)
k=100
X=cbind(1, matrix(rnorm(1000*k), 1000, k))
XX=t(X)%*%X
XXinv=solve(XX)
x=rnorm(1000)
X1=cbind(X,x)
XX1=t(X1)%*%X1
microbenchmark(solve(XX1), f())
## Unit: microseconds
## expr min lq mean median uq max neval cld
## solve(XX1) 774.601 840.701 904.5861 848.551 893.8515 4513.202 100 a
## f() 741.201 896.801 1017.4540 907.451 962.5510 8851.101 100 a
and so our function does not actually seem to be faster. However, the R routine solve calls compiled code, whereas our function f is interpreted R. If we were to rewrite f in C++ (using the Rcpp library), it would in fact be much faster.
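Here is a sketch of what that might look like, using the Rcpp and RcppArmadillo packages (which would have to be installed); the function name updateInv and its interface are our own choices here, and the timing would again have to be checked with microbenchmark:
library(Rcpp)
cppFunction(depends="RcppArmadillo", code='
arma::mat updateInv(const arma::mat& XXinv, const arma::vec& a12, double a22) {
  arma::vec v = XXinv * a12;           // (XtX)^{-1}_k a12
  double b = a22 - arma::dot(a12, v);  // the scalar b from the formula
  int k = XXinv.n_rows;
  arma::mat A(k + 1, k + 1);
  A.submat(0, 0, k - 1, k - 1) = XXinv + v * v.t() / b;  // top-left block
  A.submat(0, k, k - 1, k) = -v / b;                     // last column
  A.submat(k, 0, k, k - 1) = -v.t() / b;                 // last row
  A(k, k) = 1.0 / b;
  return A;
}')
microbenchmark(solve(XX1), f(), updateInv(XXinv, XX1[-(k+2), k+2], XX1[k+2, k+2]))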
Consider the following data set:
set.seed(1111)
x1=1:10
x2=sort(round(runif(10,0,10), 1))
y=round(2+3*x1+4*x2+rnorm(10),1)
cbind(y,x1,x2)
## y x1 x2
## [1,] 6.9 1 1.2
## [2,] 14.3 2 1.4
## [3,] 17.7 3 1.4
## [4,] 31.8 4 4.1
## [5,] 37.1 5 4.7
## [6,] 41.0 6 5.5
## [7,] 53.6 7 7.4
## [8,] 59.7 8 8.8
## [9,] 63.6 9 9.1
## [10,] 71.6 10 9.8
Use the Gram-Schmidt method to find a transformation \(\pmb{U=AX}\) such that \(\pmb{U}\) is orthogonal. Use this to verify theorem (6.3.4).
Gram-Schmidt works as follows:
\[ \begin{aligned} &\pmb{u}_1 =\pmb{x}_1 = \pmb{j} \\ &\pmb{u}_2 =\pmb{x}_2-\frac{\pmb{x_2'u_1}}{\pmb{u_1'u_1}}\pmb{u}_1 \\ &\pmb{u}_3 =\pmb{x}_3-\frac{\pmb{x_3'u_1}}{\pmb{u_1'u_1}}\pmb{u}_1-\frac{\pmb{x_3'u_2}}{\pmb{u_2'u_2}}\pmb{u}_2 \end{aligned} \]
Note that the implementation below also normalizes the vectors, which is not actually necessary for our problem.
vx1=matrix(1,length(x1),1)
vx2=cbind(x1)
vx3=cbind(x2)
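#subtract the projections onto the previous u's, then normalize each column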
u1=vx1/sqrt(c(t(vx1)%*%vx1))
u2=vx2 - c(t(vx2)%*%u1)/c(t(u1)%*%u1)*u1
u2=u2/sqrt(c(t(u2)%*%u2))
u3=vx3 - c(t(vx3)%*%u1)/c(t(u1)%*%u1)*u1 - c(t(vx3)%*%u2)/c(t(u2)%*%u2)*u2
u3=u3/sqrt(c(t(u3)%*%u3))
U=cbind(u1, u2, u3)
colnames(U)=c("Intercept", "x1", "x2")
Let's check whether U is orthogonal:
dim(U)
## [1] 10 3
round(t(U)%*%U, 10)
## Intercept x1 x2
## Intercept 1 0 0
## x1 0 1 0
## x2 0 0 1
Now
coef(lm(y~U-1))
## UIntercept Ux1 Ux2
## 125.63729 66.42665 5.30114
coef(lm(y~U[,-2]-1))
## U[, -2]Intercept U[, -2]x2
## 125.63729 5.30114
coef(lm(y~U[,-3]-1))
## U[, -3]Intercept U[, -3]x1
## 125.63729 66.42665
and so the coefficients are the same, even though we dropped a predictor in the second and third cases: because the columns of U are orthogonal, leaving out a column does not change the remaining estimates, which is exactly what theorem (6.3.4) states.
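For comparison, with the original predictors x1 and x2, which are far from orthogonal, dropping one of them does change the remaining estimates:
coef(lm(y~x1+x2))
coef(lm(y~x1))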