Consider the following artificial data set and the summary of the lm command:
set.seed(111)
x1=round(sort(runif(100)), 2)
x2=round(sort(runif(100)), 2)
x3=round(sort(runif(100)), 2)
y=round(1+2*x1+3*x2+4*x3+rnorm(100),1)
summary(lm(y~x1+x2+x3))
##
## Call:
## lm(formula = y ~ x1 + x2 + x3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.35695 -0.51916 -0.05192 0.70102 2.33240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.6556 0.1810 3.622 0.00047
## x1 1.8731 2.1465 0.873 0.38505
## x2 -4.8388 4.0227 -1.203 0.23199
## x3 12.4945 4.3210 2.892 0.00474
##
## Residual standard error: 0.8695 on 96 degrees of freedom
## Multiple R-squared: 0.9148, Adjusted R-squared: 0.9121
## F-statistic: 343.6 on 3 and 96 DF, p-value: < 2.2e-16
Use R, but not the lm command, to find the numbers in the printout:
X=cbind(1, x1, x2, x3)
y=cbind(y)
XX=t(X)%*%X
XXinv=solve(XX)
betahat=XXinv%*%t(X)%*%y
#coefficients
round(c(betahat), 4)
## [1] 0.6556 1.8731 -4.8388 12.4945
yhat=X%*%betahat
eps=y-yhat
#Residuals:
tmp=quantile(eps, c(0, 0.25, 0.5, 0.75, 1))
tmp[3]=median(eps)
names(tmp)=c("Min", "1Q", "Median", "3Q", "Max")
round(tmp, 5)
## Min 1Q Median 3Q Max
## -2.35695 -0.51916 -0.05192 0.70102 2.33240
sse=t(eps)%*%eps
#Residual standard error
s=round(c(sqrt(sse/96)),4)
s
## [1] 0.8695
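#standard errors of the coefficients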
round(sqrt(diag(s^2*XXinv)), 4)
## x1 x2 x3
## 0.1810 2.1464 4.0226 4.3208
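The remaining numbers in the printout (the t values and p values, R-squared, adjusted R-squared and the F statistic) can be found from the same objects. Here is a quick sketch; the values should agree with the summary above:
#t values and p values
se=sqrt(diag(s^2*XXinv))
tval=c(betahat)/se
pval=2*pt(-abs(tval), 96)
#R-squared and adjusted R-squared
sst=sum((y-mean(y))^2)
r2=1-c(sse)/sst
adj.r2=1-(1-r2)*99/96
#F statistic and its p value
fstat=((sst-c(sse))/3)/(c(sse)/96)
pf(fstat, 3, 96, lower.tail=FALSE)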
Say we just found \(\pmb{\hat{\beta}}\) with k predictor variables. Now we would like to use another predictor, together with those already used. If k is large, this means we have to invert a \((k+1)\times (k+1)\) matrix. But we just inverted a very similar \(k\times k\) matrix. Show how we can make use of that.
There are several ways to do this. We could use theorem (6.2.4) or:
Let’s denote by \((\pmb{X'X})_k\) and \((\pmb{X'X})_{k+1}\) the matrices without and with the new variable. If \(\pmb{X}_k\) is the old design matrix and \(\pmb{x}_{k+1}\) is the column of values of the new predictor, we can write

\[
(\pmb{X'X})_{k+1} = \begin{pmatrix} (\pmb{X'X})_k & \pmb{a}_{12} \\ \pmb{a}_{12}' & a_{22} \end{pmatrix}
\]

where \(\pmb{a}_{12}=\pmb{X}_k'\pmb{x}_{k+1}\) and \(a_{22}=\pmb{x}_{k+1}'\pmb{x}_{k+1}\) is a scalar. Then by example (4.2.5) \(b=a_{22}-\pmb{a}_{12}'(\pmb{X'X})^{-1}_k\pmb{a}_{12}\) is a scalar and

\[ (\pmb{X'X})^{-1}_{k+1} = \frac1{b}\begin{pmatrix} b(\pmb{X'X})^{-1}_k+(\pmb{X'X})^{-1}_k\pmb{a}_{12}\pmb{a}_{12}'(\pmb{X'X})^{-1}_k & -(\pmb{X'X})^{-1}_k\pmb{a}_{12} \\ -\pmb{a}_{12}'(\pmb{X'X})^{-1}_k & 1 \\ \end{pmatrix} \]
Let’s check with R:
k=4
X=cbind(1, matrix(rnorm(1000*k), 1000, k))
XX=t(X)%*%X
XXinv=solve(XX)
x=rnorm(1000)
X1=cbind(X,x)
XX1=t(X1)%*%X1
f=function() {
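#a12 and a22: last column and last diagonal entry of X1'X1, as in the formula above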
a12=XX1[-(k+2) ,k+2, drop=FALSE]
a22=XX1[k+2 ,k+2]
b=c(a22-t(a12)%*%XXinv%*%a12)
A11=b*XXinv+XXinv%*%a12%*%t(a12)%*%XXinv
A12=-XXinv%*%a12
A21=cbind(t(A12),1)
A=cbind(A11, A12)
rbind(A, A21)/b
}
round(solve(XX1)-f(), 10)
## x
## 0 0 0 0 0 0
## 0 0 0 0 0 0
## 0 0 0 0 0 0
## 0 0 0 0 0 0
## 0 0 0 0 0 0
## x 0 0 0 0 0 0
so this works.
How much faster is it? Let’s see:
library(microbenchmark)
k=100
X=cbind(1, matrix(rnorm(1000*k), 1000, k))
XX=t(X)%*%X
XXinv=solve(XX)
x=rnorm(1000)
X1=cbind(X,x)
XX1=t(X1)%*%X1
microbenchmark(solve(XX1), f())
## Unit: microseconds
## expr min lq mean median uq max neval cld
## solve(XX1) 774.601 840.701 904.5861 848.551 893.8515 4513.202 100 a
## f() 741.201 896.801 1017.4540 907.451 962.5510 8851.101 100 a
and so our function does not actually seem to be faster. However, the R routine solve calls compiled code, whereas our function f is interpreted R. If we were to rewrite f in C++ (using the Rcpp library), it would in fact be much faster.
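Here is a sketch of what that might look like, using the Rcpp and RcppArmadillo packages (which would have to be installed); the function name updateInv and its interface are our own choices here, and the timing would again have to be checked with microbenchmark:
library(Rcpp)
cppFunction(depends="RcppArmadillo", code='
arma::mat updateInv(const arma::mat& XXinv, const arma::vec& a12, double a22) {
  arma::vec v = XXinv * a12;           // (XtX)^{-1}_k a12
  double b = a22 - arma::dot(a12, v);  // the scalar b from the formula
  int k = XXinv.n_rows;
  arma::mat A(k + 1, k + 1);
  A.submat(0, 0, k - 1, k - 1) = XXinv + v * v.t() / b;  // top-left block
  A.submat(0, k, k - 1, k) = -v / b;                     // last column
  A.submat(k, 0, k, k - 1) = -v.t() / b;                 // last row
  A(k, k) = 1.0 / b;
  return A;
}')
microbenchmark(solve(XX1), f(), updateInv(XXinv, XX1[-(k+2), k+2], XX1[k+2, k+2]))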
Consider the following data set:
set.seed(1111)
x1=1:10
x2=sort(round(runif(10,0,10), 1))
y=round(2+3*x1+4*x2+rnorm(10),1)
cbind(y,x1,x2)
## y x1 x2
## [1,] 6.9 1 1.2
## [2,] 14.3 2 1.4
## [3,] 17.7 3 1.4
## [4,] 31.8 4 4.1
## [5,] 37.1 5 4.7
## [6,] 41.0 6 5.5
## [7,] 53.6 7 7.4
## [8,] 59.7 8 8.8
## [9,] 63.6 9 9.1
## [10,] 71.6 10 9.8
Use the Gram-Schmidt method to find a transformation \(\pmb{U=AX}\) such that \(\pmb{U}\) is orthogonal. Use this to verify theorem (6.3.4).
Gram-Schmidt works as follows:
\[ \begin{aligned} &\pmb{u}_1 =\pmb{x}_1 = \pmb{j} \\ &\pmb{u}_2 =\pmb{x}_2-\frac{\pmb{x_2'u_1}}{\pmb{u_1'u_1}}\pmb{u}_1 \\ &\pmb{u}_3 =\pmb{x}_3-\frac{\pmb{x_3'u_1}}{\pmb{u_1'u_1}}\pmb{u}_1-\frac{\pmb{x_3'u_2}}{\pmb{u_2'u_2}}\pmb{u}_2 \end{aligned} \]
Note that the implementation below also normalizes the vectors, which is not actually necessary for our problem.
vx1=matrix(1,length(x1),1)
vx2=cbind(x1)
vx3=cbind(x2)
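#subtract the projections onto the previous u's, then normalize each column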
u1=vx1/sqrt(c(t(vx1)%*%vx1))
u2=vx2 - c(t(vx2)%*%u1)/c(t(u1)%*%u1)*u1
u2=u2/sqrt(c(t(u2)%*%u2))
u3=vx3 - c(t(vx3)%*%u1)/c(t(u1)%*%u1)*u1 - c(t(vx3)%*%u2)/c(t(u2)%*%u2)*u2
u3=u3/sqrt(c(t(u3)%*%u3))
U=cbind(u1, u2, u3)
colnames(U)=c("Intercept", "x1", "x2")
Let's check whether U is orthogonal:
dim(U)
## [1] 10 3
round(t(U)%*%U, 10)
## Intercept x1 x2
## Intercept 1 0 0
## x1 0 1 0
## x2 0 0 1
Now
coef(lm(y~U-1))
## UIntercept Ux1 Ux2
## 125.63729 66.42665 5.30114
coef(lm(y~U[,-2]-1))
## U[, -2]Intercept U[, -2]x2
## 125.63729 5.30114
coef(lm(y~U[,-3]-1))
## U[, -3]Intercept U[, -3]x1
## 125.63729 66.42665
and so the coefficients are the same, even though we dropped a predictor in the second and third cases: because the columns of U are orthogonal, leaving out a column does not change the remaining estimates, which is exactly what theorem (6.3.4) states.
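For comparison, with the original predictors x1 and x2, which are far from orthogonal, dropping one of them does change the remaining estimates:
coef(lm(y~x1+x2))
coef(lm(y~x1))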