We now add the assumption
\[\pmb{y}\sim N_n(\pmb{X\beta}, \sigma^2\pmb{I})\]
Under this assumption it is now possible to use maximum likelihood for estimation. We denote the likelihood function by \(L(\pmb{\beta}, \sigma^2)\) and the log likelihood function by \(l(\pmb{\beta}, \sigma^2)= \log L(\pmb{\beta}, \sigma^2)\).
If \(\pmb{y}\sim N_n(\pmb{X\beta}, \sigma^2\pmb{I})\), where \(\pmb{X}\) is \(n\times (k+1)\) of full rank \(k+1<n\), the maximum likelihood estimators are
\[ \begin{aligned} \pmb{\hat{\beta}} &= (\pmb{X}'\pmb{X})^{-1}\pmb{X}'\pmb{y} \\ \hat{\sigma}^2 &=\frac1n (\pmb{y}-\pmb{X\hat{\beta}})'(\pmb{y}-\pmb{X\hat{\beta}}) \end{aligned} \]
so the MLE of \(\pmb{\beta}\) is the least squares estimator. Note that the MLE of \(\sigma^2\) uses the divisor \(n\) rather than \(n-k-1\), so it is not the unbiased estimator \(s^2\).
proof (sketch)
\[l(\pmb{\beta}, \sigma^2) = -\frac{n}2\log(2\pi)-\frac{n}2\log\sigma^2-\frac1{2\sigma^2}(\pmb{y}-\pmb{X\beta})'(\pmb{y}-\pmb{X\beta})\]
Taking partial derivatives with respect to \(\pmb{\beta}\) and \(\sigma^2\) and setting them equal to 0 yields the desired result.
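Explicitly, the score equations are
\[ \begin{aligned} \frac{\partial l}{\partial \pmb{\beta}} &= \frac1{\sigma^2}\pmb{X}'(\pmb{y}-\pmb{X\beta}) = \pmb{0}\\ \frac{\partial l}{\partial \sigma^2} &= -\frac{n}{2\sigma^2}+\frac1{2\sigma^4}(\pmb{y}-\pmb{X\beta})'(\pmb{y}-\pmb{X\beta}) = 0 \end{aligned} \]
The first is just the system of normal equations \(\pmb{X}'\pmb{X}\pmb{\beta}=\pmb{X}'\pmb{y}\), whose solution is \(\pmb{\hat{\beta}}\), and substituting \(\pmb{\hat{\beta}}\) into the second equation and solving for \(\sigma^2\) gives \(\hat{\sigma}^2\).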
Under the assumptions of theorem (6.4.1) we find
i. \(\pmb{\hat{\beta}}\sim N_{k+1}(\pmb{\beta}, \sigma^2(\pmb{X}'\pmb{X})^{-1})\)
ii. \(n\hat{\sigma}^2 /\sigma^2\sim \chi^2(n-k-1)\)
iii. \(\pmb{\hat{\beta}}\) and \(\hat{\sigma}^2\) are independent
proof
i. follows from (5.2.8), and ii. and iii. have been shown before for the least squares estimators.
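As a quick illustration of ii., here is a small simulation sketch with a made-up design matrix, \(n=20\), \(k=2\) and arbitrary parameter values: since a \(\chi^2(n-k-1)\) random variable has mean \(n-k-1\), the average of the simulated values of \(n\hat{\sigma}^2/\sigma^2\) should be close to \(20-2-1=17\).
n=20
X=cbind(1, runif(n), runif(n))
beta=c(1, 2, -1)
sigma=0.5
B=10000
z=rep(0, B)
for(i in 1:B) {
y=X%*%beta+rnorm(n, 0, sigma)
betahat=solve(t(X)%*%X)%*%t(X)%*%y
sigma2hat=sum((y-X%*%betahat)^2)/n   # MLE of sigma^2
z[i]=n*sigma2hat/sigma^2
}
round(c(mean(z), n-2-1), 2)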
Under the assumptions of theorem (6.4.1), \(\pmb{\hat{\beta}}\) and \(\hat{\sigma}^2\) are jointly sufficient statistics for \((\pmb{\beta}, \sigma^2)\).
proof omitted
Under the assumptions of theorem (6.4.1), \(\pmb{\hat{\beta}}\) and \(s^2=\frac{n}{n-k-1}\hat{\sigma}^2\) have minimum variance among all unbiased estimators.
proof omitted
Note that this is a much stronger result than the Gauss-Markov theorem, which was about linear unbiased estimators.
We can partition the total sum of squares \(\text{SST}=\sum_i (y_i-\bar{y})^2\) as \(\text{SST}=\text{SSR}+\text{SSE}\), where
\[\text{SSR} = \pmb{\hat{\beta}}_1'\pmb{X}_c'\pmb{y}\]
is called the regression sum of squares.
\[R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{\pmb{\hat{\beta}}_1'\pmb{X}_c'\pmb{X}_c\pmb{\hat{\beta}}_1}{\sum (y_i-\bar{y})^2} =\frac{\pmb{\hat{\beta}}'\pmb{X}'\pmb{y}-n\bar{y}^2}{\pmb{y'y}-n\bar{y}^2}=\frac{\sum_i(\hat{y}_i-\bar{y})^2}{\sum_i(y_i-\bar{y})^2} \]
is called the coefficient of determination. It provides a measure of how well the model fits the data.
For the houseprice data we find
A=as.matrix(houseprice)
n=nrow(A)
y=A[, 1, drop=FALSE]  # response: Price
X=cbind(1, A[, -1])   # design matrix with a column of 1's for the intercept
betahat= (solve(t(X)%*%X)%*%t(X))%*%y
(t(betahat)%*%t(X)%*%y-n*mean(y)^2)/(t(y)%*%y-n*mean(y)^2)  # R^2
## Price
## Price 0.8862443
so this model has an \(R^2\) of 88.6%.
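As a check, we can also verify the partition SST = SSR + SSE numerically, continuing with the objects computed above; the two numbers printed should agree:
yhat=X%*%betahat
SST=sum((y-mean(y))^2)
SSR=sum((yhat-mean(y))^2)
SSE=sum((y-yhat)^2)
round(c(SST, SSR+SSE), 2)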
Of course we can also use R:
summary(lm(Price~., data=houseprice))
##
## Call:
## lm(formula = Price ~ ., data = houseprice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.018 -5.943 1.860 5.947 30.955
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -67.61984 17.70818 -3.819 0.000882
## Sqfeet 0.08571 0.01076 7.966 4.62e-08
## Floors -26.49306 9.48952 -2.792 0.010363
## Bedrooms -9.28622 6.82985 -1.360 0.187121
## Baths 37.38067 12.26436 3.048 0.005709
##
## Residual standard error: 13.71 on 23 degrees of freedom
## Multiple R-squared: 0.8862, Adjusted R-squared: 0.8665
## F-statistic: 44.8 on 4 and 23 DF, p-value: 1.558e-10
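Note that the residual standard error of 13.71 reported by R is \(s=\sqrt{\text{SSE}/(n-k-1)}\), based on the unbiased estimator, not on the MLE \(\hat{\sigma}^2=\text{SSE}/n\) from theorem (6.4.1). A quick check, reusing the SSE computed above (here \(k+1=5\)):
c(sqrt(SSE/(n-5)), sqrt(SSE/n))  # s vs the MLE sigma-hat
The first number should reproduce the 13.71 from the R output, and the second is the slightly smaller MLE.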
Here are some properties of \(R^2\):
\(0\le R^2\le 1\)
if \(\pmb{\hat{\beta}}_1=\pmb{0}\), then \(R^2=0\)
\(R^2 = \text{cor}(\pmb{y},\pmb{\hat{y}})^2\), that is, \(R^2\) is the square of the correlation between the observed and the predicted y's. For the houseprice data:
fit=lm(Price~., data=houseprice)
summary(fit)
##
## Call:
## lm(formula = Price ~ ., data = houseprice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.018 -5.943 1.860 5.947 30.955
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -67.61984 17.70818 -3.819 0.000882
## Sqfeet 0.08571 0.01076 7.966 4.62e-08
## Floors -26.49306 9.48952 -2.792 0.010363
## Bedrooms -9.28622 6.82985 -1.360 0.187121
## Baths 37.38067 12.26436 3.048 0.005709
##
## Residual standard error: 13.71 on 23 degrees of freedom
## Multiple R-squared: 0.8862, Adjusted R-squared: 0.8665
## F-statistic: 44.8 on 4 and 23 DF, p-value: 1.558e-10
yhat=predict(fit)
cor(houseprice$Price, yhat)^2
## [1] 0.8862443
Some more properties of \(R^2\), each illustrated below:
Adding a variable to the predictors never decreases \(R^2\).
if \(\pmb{\beta}_1=\pmb{0}\), then \(E[R^2]=k/(n-1)\)
\(R^2\) is invariant under a full-rank linear transformation of the x's and under a scale transformation of the y's, but not under a linear transformation of the x's and the y's simultaneously.
\(R^2 = \pmb{s}_{yx}'\pmb{S}_{xx}^{-1}\pmb{s}_{yx}/s_y^2\), where \(\pmb{S}_{xx}\) is the sample covariance matrix of the x's, \(\pmb{s}_{yx}\) the vector of sample covariances between y and the x's, and \(s_y^2\) the sample variance of y (see the check at the end of this section).
x1=sort(runif(100))
x2=sort(runif(100))
y= cbind(1+2*x1+rnorm(100))  # y depends on x1 only
X=cbind(1, x1)
betahat= (solve(t(X)%*%X)%*%t(X))%*%y
yhat=X%*%betahat
cor(y,yhat)^2
## [,1]
## [1,] 0.2402542
X=cbind(1, x1, x2)
betahat= (solve(t(X)%*%X)%*%t(X))%*%y
yhat=X%*%betahat
cor(y,yhat)^2
## [,1]
## [1,] 0.2516723
and so we see that although y was generated without any use of x2, including x2 in the model still increases \(R^2\), at least by a little bit.
Next we check the claim about \(E[R^2]\): with \(n=10\), \(k=2\) and all slopes equal to 0, the average \(R^2\) over many simulation runs should be close to \(k/(n-1)=2/9\).
R2=rep(0, 1000)
for(i in 1:1000) {
x1=sort(runif(10))
x2=sort(runif(10))
y= cbind(1+rnorm(10))  # y does not depend on x1 or x2
X=cbind(1, x1, x2)
betahat= (solve(t(X)%*%X)%*%t(X))%*%y
yhat=X%*%betahat
R2[i]=cor(y,yhat)^2
}
round(c(mean(R2), 2/9), 3)
## [1] 0.218 0.222
Finally, to illustrate the invariance property we generate data that depends on both x1 and x2:
x1=sort(runif(100))
x2=sort(runif(100))
y= cbind(1+2*x1+3*x2+rnorm(100))
X=cbind(1, x1, x2)
betahat= (solve(t(X)%*%X)%*%t(X))%*%y
yhat=X%*%betahat
cor(y,yhat)^2
## [,1]
## [1,] 0.6903922
#scale transformation of y
y1=rnorm(1)*y
betahat= (solve(t(X)%*%X)%*%t(X))%*%y1
yhat=X%*%betahat
cor(y1,yhat)^2
## [,1]
## [1,] 0.6903922
# full-rank (with probability 1) linear transformation of the x's
z1=rnorm(1)*x1+rnorm(1)*x2
z2=rnorm(1)*x1+rnorm(1)*x2
Z=cbind(1, z1, z2)
betahat= (solve(t(Z)%*%Z)%*%t(Z))%*%y
yhat=Z%*%betahat
cor(y,yhat)^2
## [,1]
## [1,] 0.6903922
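Lastly, the covariance form of \(R^2\) from the list above can be checked directly for the houseprice data (a sketch, assuming as before that Price is the first column of houseprice); this should reproduce the value 0.8862443 found earlier:
S=cov(houseprice)
s_yx=S[-1, 1]    # sample covariances between Price and the x's
S_xx=S[-1, -1]   # sample covariance matrix of the x's
s_y2=S[1, 1]     # sample variance of Price
t(s_yx)%*%solve(S_xx)%*%s_yx/s_y2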