Notation, Formulas

Notation

Throughout this course we will use the following notational conventions, unless otherwise indicated:

  • small letters a, b, c, …: numbers (or scalars)

  • small letters i, j, k, n, m: integers

  • greek letters \(\alpha, \beta, \gamma, \dots\): parameters

  • small letters x, y, z, u, v, …: variables

  • bold face small letters \(\pmb{a}, \pmb{b}, \dots\): column vectors of numbers

  • bold face large letters \(\pmb{A}, \pmb{B}, \dots\): matrices of numbers

  • bold face greek letters \(\pmb{\alpha}, \pmb{\beta}, \dots\): column vectors of parameters

  • bold face large letters \(\pmb{X}, \pmb{Y}, \dots\): column vectors or matrices of random variables

  • exception 1: we will also use the bold face small letter \(\pmb{y}\) for a vector of random variables, not only \(\pmb{Y}\)

  • exception 2: the bold face large letter \(\pmb{X}\) is used for the design matrix, a matrix of constants

Formulas (2.2.1)

Here are some formulas that we will eventually derive and then use extensively. As a numerical example we use the data below. (Because runif and rnorm are called without setting a seed, rerunning this code will produce numbers different from the output shown.)

n=100
x1=1:n/10                              # first predictor: 0.1, 0.2, ..., 10
x2=sort(round(runif(n, 0, 10), 1))     # second predictor: sorted uniforms on [0, 10]
y=round(2+x1+3*x2+rnorm(n), 1)         # response with true coefficients 2, 1 and 3
ybar=mean(y)
X=cbind(1, x1, x2)                     # design matrix, including a column of 1's
Xc=cbind(x1-mean(x1), x2-mean(x2))     # centered design matrix, no intercept column
yvec=cbind(y)                          # y as a column vector
  • least squares estimator of \(\pmb{\beta}\):

\[\pmb{\hat{\beta}} = \pmb{(X'X)}^{-1}\pmb{X'y}\]

betahat=solve(t(X)%*%X)%*%t(X)%*%yvec      # full model: intercept and slopes
beta1hat=solve(t(Xc)%*%Xc)%*%t(Xc)%*%yvec  # centered model: slopes only
round(c(betahat), 4)
## [1] 1.9228 0.8055 3.2170
round(c(beta1hat), 4)
## [1] 0.8055 3.2170
yhat = X%*%betahat   # fitted values
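As a sanity check (not part of the derivation), the normal-equations formula should reproduce the coefficients from R's built-in lm(). The sketch below is self-contained: it regenerates the example data with set.seed added (an assumption, not in the original example), so its numbers differ from the output shown above.

```r
# Standalone check: least squares via (X'X)^{-1} X'y vs. lm().
set.seed(1)                          # added for reproducibility
n <- 100
x1 <- 1:n/10
x2 <- sort(round(runif(n, 0, 10), 1))
y <- round(2 + x1 + 3*x2 + rnorm(n), 1)
X <- cbind(1, x1, x2)
betahat <- solve(t(X) %*% X) %*% t(X) %*% cbind(y)  # normal equations
fit <- lm(y ~ x1 + x2)                              # built-in least squares
all.equal(c(betahat), unname(coef(fit)))            # TRUE
```

lm() uses a QR decomposition internally rather than inverting \(\pmb{X'X}\), but for a well-conditioned problem like this the two agree to machine precision.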
  • SST, the total sum of squares

\[ \begin{aligned} &\text{SST}=\sum_{i=1}^n (y_i-\bar{y})^2 \end{aligned} \]

sst=round(sum((y-mean(y))^2), 4)
sst
## [1] 12281.89
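Since the sample variance is \(s^2=\frac{1}{n-1}\sum_i (y_i-\bar{y})^2\), SST can equivalently be computed as \((n-1)\,\text{var}(y)\). A self-contained check (set.seed added here so the snippet runs on its own; the value differs from the output above):

```r
# SST equals (n-1) times the sample variance of y.
set.seed(1)                          # added for reproducibility
n <- 100
x1 <- 1:n/10
x2 <- sort(round(runif(n, 0, 10), 1))
y <- round(2 + x1 + 3*x2 + rnorm(n), 1)
sst <- sum((y - mean(y))^2)
all.equal(sst, (n - 1)*var(y))       # TRUE
```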
  • SSE, the error sum of squares

\[ \begin{aligned} &\text{SSE}=\sum_{i=1}^n (y_i-\hat{y}_i)^2=\\ &\pmb{y'y}-\pmb{\hat{\beta}'X'y} = \\ &\pmb{y'y}-\pmb{\hat{\beta}'X'X\hat{\beta}} = \\ &\text{SST} -\pmb{\hat{\beta}_1'X_c'X_c\hat{\beta}_1} =\\ &\text{SST} -\pmb{\hat{\beta}_1'X_c'y} \end{aligned} \]

c(sum((y-yhat)^2),
t(y)%*%y-t(betahat)%*%t(X)%*%yvec,
t(yvec)%*%yvec-t(betahat)%*%t(X)%*%X%*%betahat,
sst-t(beta1hat)%*%t(Xc)%*%Xc%*%beta1hat,
sst-t(beta1hat)%*%t(Xc)%*%yvec)
## [1] 72.51688 72.51688 72.51688 72.51688 72.51688
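SSE can also be read off an lm() fit: it is the sum of the squared residuals, which deviance() returns for a linear model. A self-contained sketch (set.seed added so it is reproducible on its own; the value differs from the output above):

```r
# SSE as the residual sum of squares of an lm() fit.
set.seed(1)                          # added for reproducibility
n <- 100
x1 <- 1:n/10
x2 <- sort(round(runif(n, 0, 10), 1))
y <- round(2 + x1 + 3*x2 + rnorm(n), 1)
fit <- lm(y ~ x1 + x2)
sse <- sum(resid(fit)^2)             # sum of squared residuals
all.equal(sse, deviance(fit))        # TRUE: deviance() of an lm fit is the RSS
```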
  • SSR, the regression sum of squares

\[ \begin{aligned} &\text{SSR}=\sum_i(\hat{y}_i-\bar{y})^2=\\ &\pmb{\hat{\beta}_1'X_c'y} = \\ &(\pmb{X_c\hat{\beta}_1})'(\pmb{X_c\hat{\beta}_1})=\\ &\pmb{\hat{\beta}_1'X_c'}\pmb{X_c\hat{\beta}_1}=\\ &\pmb{\hat{\beta}'X'y}-n\bar{y}^2 \end{aligned} \]

ssr=sum( (yhat-ybar)^2 )
c(
ssr,
t(beta1hat)%*%t(Xc)%*%yvec,
t(beta1hat)%*%t(Xc)%*%Xc%*%beta1hat,
t(betahat)%*%t(X)%*%yvec-n*ybar^2
)
## [1] 12209.37 12209.37 12209.37 12209.37
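The three sums of squares are tied together by the partition SST = SSE + SSR, which also follows from the identities above: \(\text{SSE}=\text{SST}-\pmb{\hat{\beta}_1'X_c'y}\) and \(\text{SSR}=\pmb{\hat{\beta}_1'X_c'y}\). A self-contained check (set.seed added for reproducibility, so the values differ from the output above):

```r
# The decomposition SST = SSE + SSR for the example data.
set.seed(1)                          # added for reproducibility
n <- 100
x1 <- 1:n/10
x2 <- sort(round(runif(n, 0, 10), 1))
y <- round(2 + x1 + 3*x2 + rnorm(n), 1)
X <- cbind(1, x1, x2)
betahat <- solve(t(X) %*% X) %*% t(X) %*% cbind(y)
yhat <- X %*% betahat
sst <- sum((y - mean(y))^2)
sse <- sum((y - yhat)^2)
ssr <- sum((yhat - mean(y))^2)
all.equal(sst, sse + ssr)            # TRUE
```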
  • Coefficient of Determination \(R^2\)

\[ \begin{aligned} &R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{\sum_i(\hat{y}_i-\bar{y})^2}{\sum_i(y_i-\bar{y})^2} =\\ &\frac{(\pmb{X_c\hat{\beta}_1})'(\pmb{X_c\hat{\beta}_1})}{\pmb{y'y}-n\bar{y}^2} = \\ &\frac{\pmb{\hat{\beta}'X'y}-n\bar{y}^2}{\pmb{y'y}-n\bar{y}^2} = \\ &\text{cor}(\pmb{y},\pmb{\hat{y}})^2 \end{aligned} \]

round( c(ssr/sst,  cor(y, yhat)^2), 4)
## [1] 0.9941 0.9941
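The chunk above checks only the first and last expressions for \(R^2\); the two matrix forms can be verified the same way, and summary(lm()) reports \(R^2\) directly. A self-contained sketch (set.seed added so it runs on its own; the value differs from the output above):

```r
# The matrix forms of R^2, compared with lm()'s reported R-squared.
set.seed(1)                          # added for reproducibility
n <- 100
x1 <- 1:n/10
x2 <- sort(round(runif(n, 0, 10), 1))
y <- round(2 + x1 + 3*x2 + rnorm(n), 1)
X  <- cbind(1, x1, x2)
Xc <- cbind(x1 - mean(x1), x2 - mean(x2))
yvec <- cbind(y)
betahat  <- solve(t(X) %*% X) %*% t(X) %*% yvec
beta1hat <- solve(t(Xc) %*% Xc) %*% t(Xc) %*% yvec
denom <- c(t(yvec) %*% yvec - n*mean(y)^2)             # y'y - n*ybar^2 = SST
r2a <- c(t(Xc %*% beta1hat) %*% (Xc %*% beta1hat))/denom
r2b <- c(t(betahat) %*% t(X) %*% yvec - n*mean(y)^2)/denom
c(r2a, r2b, summary(lm(y ~ x1 + x2))$r.squared)        # all three agree
```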