We start with a test of whether any of the predictor variables are useful. Let \(\pmb{\beta}_1 = (\beta_1\text{ }...\text{ }\beta_k)'\); we wish to test
\[H_0: \pmb{\beta}_1 = \pmb{0}\]
To find a test we will use the centered model
\[\pmb{y} = \begin{pmatrix} \pmb{j} & \pmb{X}_c \end{pmatrix} \begin{pmatrix} \alpha \\ \pmb{\beta}_1\end{pmatrix}+\pmb{\epsilon}\]
where \(\pmb{X}_c = [\pmb{I}-(1/n)\pmb{J}]\pmb{X}_1\) is the centered matrix and \(\pmb{X}_1\) is \(\pmb{X}\) without the column of 1’s. The corrected total sum of squares is given by
\[ \begin{aligned} &\sum (y_i-\bar{y})^2 = \\ &\pmb{\hat{\beta}}_1'\pmb{X}_c'\pmb{y} +\left[\sum (y_i-\bar{y})^2 - \pmb{\hat{\beta}}_1'\pmb{X}_c'\pmb{y}\right]= \\ &\pmb{\hat{\beta}}_1'\pmb{X}_c'\pmb{X}_c\pmb{\hat{\beta}}_1 + \text{SSE} = \\ &\text{SSR}+\text{SSE} \end{aligned} \]
using (6.4.5).
Recall the following formulas:
\[ \begin{aligned} &\sum (y_i-\bar{y})^2 = \pmb{y}'[\pmb{I}-(1/n)\pmb{J}]\pmb{y}\\ &\pmb{\hat{\beta}}_1 = (\pmb{X}_c'\pmb{X}_c)^{-1}\pmb{X}_c'\pmb{y}\\ &\text{SSE} =\sum (y_i-\bar{y})^2- \pmb{\hat{\beta}}_1'\pmb{X}_c'\pmb{y}\\ \end{aligned} \]
so we have
\[ \begin{aligned} &\text{SSR}+\text{SSE} =\sum (y_i-\bar{y})^2 = \pmb{y}'[\pmb{I}-(1/n)\pmb{J}]\pmb{y} =\\ &\pmb{y'X}_c(\pmb{X}_c'\pmb{X}_c)^{-1}\pmb{X}_c'\pmb{y} + \pmb{y}'[\pmb{I}-(1/n)\pmb{J}]\pmb{y} -\pmb{y'X}_c(\pmb{X}_c'\pmb{X}_c)^{-1}\pmb{X}_c'\pmb{y}= \\ &\pmb{y}'\pmb{H}_c\pmb{y} +\pmb{y}'[\pmb{I}-(1/n)\pmb{J}-\pmb{H}_c]\pmb{y} \end{aligned} \]
where \(\pmb{H}_c=\pmb{X}_c(\pmb{X}_c'\pmb{X}_c)^{-1}\pmb{X}_c'\).
The matrices above have the following properties: \(\pmb{H}_c\) and \(\pmb{I}-(1/n)\pmb{J}-\pmb{H}_c\) are symmetric and idempotent, with ranks \(k\) and \(n-k-1\) respectively, and \(\pmb{H}_c[\pmb{I}-(1/n)\pmb{J}-\pmb{H}_c]=\pmb{0}\).
proof follows from direct calculation
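As a quick numerical check of this decomposition, here is a minimal sketch on simulated data (the objects below are hypothetical and separate from the houseprice example used later); the two quadratic forms add up to the corrected total sum of squares:
n=50
X1=matrix(runif(3*n), n, 3)
y=cbind(X1%*%c(1, 2, -1)+rnorm(n))
Xc=X1
for(j in 1:3) Xc[, j]=Xc[, j]-mean(Xc[, j])   # center each column
Hc=Xc%*%solve(t(Xc)%*%Xc)%*%t(Xc)
J=matrix(1, n, n)
# SSR + SSE versus the corrected total sum of squares
c(t(y)%*%Hc%*%y + t(y)%*%(diag(n)-J/n-Hc)%*%y, sum((y-mean(y))^2))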
If \(\pmb{y}\sim N_n(\pmb{X\beta}, \sigma^2\pmb{I})\) then
i. \(\text{SSR}/\sigma^2=\pmb{\hat{\beta}}_1'\pmb{X}_c'\pmb{X}_c\pmb{\hat{\beta}}_1/\sigma^2 \sim \chi^2(k, \lambda_1)\), where \(\lambda_1=\pmb{\beta}_1'\pmb{X}_c'\pmb{X}_c\pmb{\beta}_1/(2\sigma^2)\)
ii. \(\text{SSE}/\sigma^2=[\sum (y_i-\bar{y})^2-\pmb{\hat{\beta}}_1'\pmb{X}_c'\pmb{X}_c\pmb{\hat{\beta}}_1]/\sigma^2 \sim \chi^2(n-k-1)\)
iii. SSR and SSE are independent
proof i. and ii. follow from the calculation above and (5.2.2). The proof of iii. is omitted.
Under the conditions of theorem (6.6.2) let
\[F=\frac{\text{SSR}/k}{\text{SSE}/(n-k-1)} \]
then
if \(H_0:\pmb{\beta}_1=\pmb{0}\) is false \(F\sim F(k, n-k-1, \lambda_1)\), where \(\lambda_1=\pmb{\beta}_1'\pmb{X}_c'\pmb{X}_c\pmb{\beta}_1/(2\sigma^2)\)
if \(H_0:\pmb{\beta}_1=\pmb{0}\) is true \(F\sim F(k, n-k-1)\)
proof follows from theorem (6.6.2) and the definition of the F distribution as the ratio of two independent chi-square random variables, each divided by its degrees of freedom
The test \(H_0:\pmb{\beta}_1=\pmb{0}\) is done as follows: Reject \(H_0\) if \(F\ge F_{\alpha, k, n-k-1}\), where \(F_{\alpha, k, n-k-1}\) is the upper \(\alpha\) percentile of a (central) F distribution with k and n-k-1 degrees of freedom.
The p value of the test is given by \(P(X>F)\), where \(X\sim F(k, n-k-1)\).
The result of such a test is usually presented in the form of an ANOVA table, which looks as follows:
\[ \begin{array}{ccccc} \text{Source} & \text{df} & \text{SS} & \text{MS} & \text{F}\\ \hline \text{Due to }\beta_1 & k & \text{SSR} & \text{SSR}/k & F\\ \text{Error} & n-k-1 & \text{SSE} & \text{SSE}/(n-k-1) & \\ \text{Total} & n-1 & \text{SST} & & \\ \hline \end{array} \]
We run this test for the houseprice data:
A=as.matrix(houseprice)
n=nrow(A);k=ncol(A)-1
c(n, k, n-k-1)
## [1] 28 4 23
y=A[, 1, drop=FALSE]
Xc=A[, -1]
xbar=apply(Xc, 2, mean)
for(j in 1:4) Xc[ ,j]=Xc[ ,j]-xbar[j]
beta1hat= (solve(t(Xc)%*%Xc)%*%t(Xc))%*%y
round(c(beta1hat), 4)
## [1] 0.0857 -26.4931 -9.2862 37.3807
\[SSR=\pmb{\hat{\beta}}_1'\pmb{X}_c'\pmb{y}\]
ssr=t(beta1hat)%*%t(Xc)%*%y
ssr/c(1, k)
## [1] 33670.654 8417.663
\[SSE=\sum(y_i-\bar{y})^2-SSR\]
sst=sum((y-mean(y))^2)
sst
## [1] 37992.52
sse=sst-ssr
sse/c(1, n-k-1)
## [1] 4321.8636 187.9071
FTS = (ssr/k)/(sse/(n-k-1))
FTS
## Price
## Price 44.79694
\[ \begin{array}{ccccc} \text{Source} & \text{df} & \text{SS} & \text{MS} & \text{F}\\ \hline \text{Due to }\beta_1 & 4 & 33670 & 8418 & 44.8\\ \text{Error} & 23 & 4322 & 187.9 & \\ \text{Total} & 27 & 37992 & & \\ \hline \end{array} \]
If we test at the 5% level the critical value is
qf(0.95, k, n-k-1)
## [1] 2.795539
\(F=44.8>2.8\), and so we reject the null hypothesis; at least some of the variables are useful for predicting the house prices.
The p value of the test is
1-pf(FTS, k, n-k-1)
## Price
## Price 1.55808e-10
Here is the printout of the lm command:
summary(lm(Price~., data=houseprice))
##
## Call:
## lm(formula = Price ~ ., data = houseprice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.018 -5.943 1.860 5.947 30.955
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -67.61984 17.70818 -3.819 0.000882
## Sqfeet 0.08571 0.01076 7.966 4.62e-08
## Floors -26.49306 9.48952 -2.792 0.010363
## Bedrooms -9.28622 6.82985 -1.360 0.187121
## Baths 37.38067 12.26436 3.048 0.005709
##
## Residual standard error: 13.71 on 23 degrees of freedom
## Multiple R-squared: 0.8862, Adjusted R-squared: 0.8665
## F-statistic: 44.8 on 4 and 23 DF, p-value: 1.558e-10
and we find the F statistic, degrees of freedom and p value in the last line.
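If we want these numbers programmatically, summary.lm stores them in its fstatistic component (the F value and the two degrees of freedom), and the p value can then be recomputed as before:
fit=lm(Price~., data=houseprice)
fs=summary(fit)$fstatistic
fs
1-pf(fs[1], fs[2], fs[3])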
Say we have split \(\pmb{\beta} = (\pmb{\beta}_1\text{ }\pmb{\beta}_2)'\) and we wish to test \(H_0:\pmb{\beta}_2=0\). Then we can partition \(\pmb{X}\) accordingly, so the model becomes
\[\pmb{y}=\pmb{X_1\beta_1}+\pmb{X_2\beta_2}+\pmb{\epsilon}\]
where the intercept \(\beta_0\) is included in \(\pmb{\beta}_1\).
We define matrices
\[\pmb{H}=\pmb{X}(\pmb{X}'\pmb{X})^{-1}\pmb{X}'\] \[\pmb{H}_1=\pmb{X}_1(\pmb{X}_1'\pmb{X}_1)^{-1}\pmb{X}_1'\]
so that \(\pmb{Hy}\) is the vector of fitted values under the full model and \(\pmb{H_1y}\) is the vector of fitted values under the reduced model that holds when the null hypothesis is true.
\(\pmb{H}-\pmb{H}_1\) is idempotent with rank h, where h is the number of elements in \(\pmb{\beta}_2\)
proof omitted
If \(\pmb{y}\sim N_n(\pmb{X\beta}, \sigma^2\pmb{I})\) then
\(\pmb{y'(I-H)y}/\sigma^2\sim \chi^2(n-k-1)\)
\(\pmb{y'(H-H_1)y}/\sigma^2\sim \chi^2(h, \lambda_1)\), where
\[\lambda_1=\pmb{\beta}_2'[\pmb{X}_2'\pmb{X}_2-\pmb{X}_2'\pmb{X}_1(\pmb{X}_1'\pmb{X}_1)^{-1}\pmb{X}_1'\pmb{X}_2]\pmb{\beta}_2/(2\sigma^2)\]
proof omitted
If \(\pmb{y}\sim N_n(\pmb{X\beta}, \sigma^2\pmb{I})\) and we define
\[F=\frac{\pmb{y'(H-H_1)y}/h}{\pmb{y'(I-H)y}/(n-k-1)}\]
then
if \(H_0:\pmb{\beta_2}=0\) is false \(F\sim F(h, n-k-1,\lambda_1)\)
if \(H_0:\pmb{\beta_2}=0\) is true \(F\sim F(h, n-k-1)\)
proof same as above
The test therefore rejects the null hypothesis if \(F\ge F_{\alpha, h, n-k-1}\).
Let’s check this on some simulated data:
n=100
x1=runif(n)
x2=runif(n)
x3=runif(n)
x4=runif(n)
y=x1+x2+rnorm(n,0,0.1)
round(c(cor(y, x1), cor(y, x2), cor(y, x3), cor(y, x4)), 3)
## [1] 0.634 0.612 -0.093 0.094
X=cbind(1,x1,x2,x3,x4)
X1=cbind(1,x1,x2)
k=ncol(X)-1
h=ncol(X1)-1
H=X%*%solve(t(X)%*%X)%*%t(X)
H1=X1%*%solve(t(X1)%*%X1)%*%t(X1)
num=t(y)%*%(H-H1)%*%y/h
denom=t(y)%*%(diag(nrow(H))-H)%*%y/(n-k-1)
FTS=num/denom
c(FTS, qf(0.95, h, n-k-1), 1-pf(FTS, h, n-k-1))
## [1] 0.8161282 3.0922174 0.4452157
X=cbind(1,x1,x3,x2,x4)
X1=cbind(1,x1,x3)
k=ncol(X)-1
h=ncol(X1)-1
H=X%*%solve(t(X)%*%X)%*%t(X)
H1=X1%*%solve(t(X1)%*%X1)%*%t(X1)
num=t(y)%*%(H-H1)%*%y/h
denom=t(y)%*%(diag(nrow(H))-H)%*%y/(n-k-1)
FTS=num/denom
c(FTS, qf(0.95, h, n-k-1), 1-pf(FTS, h, n-k-1))
## [1] 248.023486 3.092217 0.000000
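The same partial F test can also be carried out by fitting the reduced and the full model with lm and comparing them with anova; for this second test, as a quick check, the F value should reproduce the 248.02 computed by hand:
reduced=lm(y~x1+x3)
full=lm(y~x1+x2+x3+x4)
anova(reduced, full)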
Let’s consider the houseprice data again. First note the correlations of the predictors with Price:
round(cor(houseprice)[1, -1], 3)
## Sqfeet Floors Bedrooms Baths
## 0.915 0.291 0.605 0.653
The variable with the smallest correlation with Price is Floors, so one might want to test whether Floors is a useful predictor variable. Let’s see:
A=as.matrix(houseprice)
n=nrow(A)
k=ncol(A)-1
X=cbind(1, A[, -1])
y=A[, 1, drop=FALSE]
X1=X[, -3]
H=X%*%solve(t(X)%*%X)%*%t(X)
H1=X1%*%solve(t(X1)%*%X1)%*%t(X1)
h=1
num=t(y)%*%(H-H1)%*%y/h
denom=t(y)%*%(diag(nrow(H))-H)%*%y/(n-k-1)
FTS=num/denom
c(FTS, qf(0.95, h, n-k-1))
## [1] 7.794275 4.279344
\(7.79>4.28\), so we reject the null hypothesis; Floors is a (borderline) useful predictor.
The p-value is
1-pf(FTS, 1, n-k-1)
## Price
## Price 0.01036282
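Note that this agrees with the Floors row of the summary(lm()) output above: \((-2.792)^2\approx 7.79\), and the p values match. As a quick check, the same test can also be run by comparing the two nested models directly:
fit.full=lm(Price~., data=houseprice)
fit.red=lm(Price~.-Floors, data=houseprice)
anova(fit.red, fit.full)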
Comment: using the correlations with the response to decide whether a variable is useful in a multiple regression problem turns out to be a bad idea. We will return to this important topic soon.
We have the results from an experiment designed to determine how much the speed of a washing machine affects the wear on a new fabric. The machine was run at 5 different speeds (measured in rpm), with six pieces of fabric at each speed.
ggplot(data=fabricwear, aes(Speed, Wear)) +
geom_point()
The scatterplot makes it clear that a linear model is not going to work, so we will try a polynomial model. Because the predictor variable takes large values we first rescale it:
n=length(fabricwear$Speed)
x=(fabricwear$Speed-110)/(190-110)
X=cbind(1, x, x^2)
y=cbind(fabricwear$Wear)
k=ncol(X)
X1=X[, -k]
H=X%*%solve(t(X)%*%X)%*%t(X)
H1=X1%*%solve(t(X1)%*%X1)%*%t(X1)
num=t(y)%*%(H-H1)%*%y
denom=t(y)%*%(diag(nrow(H))-H)%*%y/(n-k-1)
round(c(num/denom, qf(0.95, 1, n-k-1)), 2)
## [1] 78.91 4.23
\(78.91\ge 4.23\), and so we reject the null hypothesis; the quadratic term improves the fit significantly.
How about a cubic term?
X=cbind(1, x, x^2,x^3)
k=ncol(X)
X1=X[, -k]
H=X%*%solve(t(X)%*%X)%*%t(X)
H1=X1%*%solve(t(X1)%*%X1)%*%t(X1)
num=t(y)%*%(H-H1)%*%y
denom=t(y)%*%(diag(nrow(H))-H)%*%y/(n-k-1)
round(c(num/denom, qf(0.95, 1, n-k-1)), 2)
## [1] 0.78 4.24
\(0.78<4.24\), and so a cubic term is not needed.
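We could also have let lm do the work, fitting the quadratic and the cubic model and comparing them with anova; as a quick check, the F value should agree with the 0.78 above:
quad=lm(fabricwear$Wear~x+I(x^2))
cubic=lm(fabricwear$Wear~x+I(x^2)+I(x^3))
anova(quad, cubic)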
Recall the definition of \(R^2\):
\[R^2=\frac{\pmb{\hat{\beta}'X'y}-n\bar{y}^2}{\pmb{y'y}-n\bar{y}^2}\]
The F statistics derived above can be written in terms of \(R^2\). For the test of \(H_0:\pmb{\beta}_1=\pmb{0}\) we have
\[F=\frac{R^2/k}{(1-R^2)/(n-k-1)}\]
and for the test of \(H_0:\pmb{\beta}_2=\pmb{0}\)
\[F=\frac{(R^2-R_r^2)/h}{(1-R^2)/(n-k-1)}\]
where \(R_r^2\) is the \(R^2\) of the reduced model.
proof direct calculation
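For the houseprice data we can check the first formula with the numbers from the summary output (\(R^2=0.8862\), \(k=4\), \(n=28\)); it reproduces \(F\approx 44.8\) up to rounding:
R2=0.8862
n=28; k=4
(R2/k)/((1-R2)/(n-k-1))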
Let \(\pmb{C}\) be a \(q\times (k+1)\) matrix of constants. Then a hypothesis of the form
\[H_0: \pmb{C\beta}=\pmb{0}\]
is called a general linear hypothesis.
The test for the full model corresponds to \(\pmb{C}= \begin{pmatrix} \pmb{0} & \pmb{I}_k \end{pmatrix}\).
The test for a subset of predictors corresponds to \(\pmb{C}= \begin{pmatrix} \pmb{0} & \pmb{I}_h \end{pmatrix}\).
We can also use this to test many other hypotheses.
Say we have \(\pmb{\beta}=(\beta_0, \beta_1, ..., \beta_4)'\) and we want to test
\[H_0: \beta_2-2\beta_3=\beta_2+\beta_4=0\]
then we can write this as follows
\[ H_0: \begin{pmatrix} 0 & 0 & 1 & -2 & 0 \\ 0 & 0 & 1 & 0 & 1\\ \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \end{pmatrix}=\pmb{0} \]
If \(\pmb{y}\sim N_n(\pmb{X\beta}, \sigma^2\pmb{I})\) and \(\pmb{C}\) is a \(q\times (k+1)\) matrix of constants, then
\(\pmb{C\hat{\beta}}\sim N_q\left( \pmb{C\beta}, \sigma^2\pmb{C(X'X)^{-1}C'} \right)\)
SSH/\(\sigma^2\) = \((\pmb{C\hat{\beta}})'[\pmb{C(X'X)^{-1}C'}]^{-1} \pmb{C\hat{\beta}}/\sigma^2\sim \chi^2(q,\lambda)\)
where \(\lambda=(\pmb{C\beta})'[\pmb{C(X'X)^{-1}C'}]^{-1}\pmb{C\beta}/(2\sigma^2)\)
SSE/\(\sigma^2\) = \(\pmb{y'[I-X(X'X)^{-1}X']y}/\sigma^2\sim \chi^2(n-k-1)\)
SSH and SSE are independent
proof omitted
Say \(\pmb{y}\sim N_n(\pmb{X\beta}, \sigma^2\pmb{I})\) and let \(\pmb{C}\) be a \(q\times (k+1)\) matrix of constants.
Let
\[F=\frac{\text{SSH}/q}{\text{SSE}/(n-k-1)}\]
If \(H_0: \pmb{C\beta}=0\) is false, then \(F\sim F(q, n-k-1, \lambda )\), where \(\lambda=(\pmb{C\beta)})'[\pmb{C(X'X)^{-1}C'}]^{-1}\pmb{C\beta}/(2\sigma^2)\)
If \(H_0: \pmb{C\beta}=0\) is true, then \(F\sim F(q, n-k-1)\),
proof same as theorems above
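As an illustration, here is a sketch of this test for the houseprice data, using the matrix \(\pmb{C}\) from the example above; the hypothesis itself is chosen purely for illustration and has no particular practical meaning here. The code simply follows the SSH formula of the theorem:
A=as.matrix(houseprice)
y=A[, 1, drop=FALSE]
X=cbind(1, A[, -1])
n=nrow(X); k=ncol(X)-1
C=rbind(c(0, 0, 1, -2, 0), c(0, 0, 1, 0, 1))
q=nrow(C)
G=solve(t(X)%*%X)                 # (X'X)^{-1}
betahat=G%*%t(X)%*%y
sse=sum((y-X%*%betahat)^2)
ssh=t(C%*%betahat)%*%solve(C%*%G%*%t(C))%*%(C%*%betahat)
FTS=(ssh/q)/(sse/(n-k-1))
c(FTS, qf(0.95, q, n-k-1), 1-pf(FTS, q, n-k-1))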
If we want to test a single condition, for example \(H_0: \beta_1=\beta_2\), we can use the test above but the calculations simplify. In this case \(\pmb{C}\) is a matrix of one row, so we can write the null hypothesis as \(H_0: \pmb{a'\beta}=0\), and the test statistic becomes
\[F=\frac{(\pmb{a'\hat{\beta}})^2}{s^2\pmb{a'(X'X)^{-1}a}}\]
where \(s^2=\text{SSE}/(n-k-1)\). Under the null hypothesis \(F\) has an F distribution with 1 and \(n-k-1\) degrees of freedom.
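For a single constraint the computation becomes one line. Here is a sketch, reusing betahat, G, sse, n and k from the chunk above and testing the (again purely illustrative) hypothesis \(H_0:\beta_1=\beta_2\), that is \(\pmb{a}=(0,1,-1,0,0)'\):
a=cbind(c(0, 1, -1, 0, 0))
FTS=c(t(a)%*%betahat)^2/c(sse/(n-k-1)*t(a)%*%G%*%a)
c(FTS, qf(0.95, 1, n-k-1), 1-pf(FTS, 1, n-k-1))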
Also consider the test for a single variable, that is \(H_0:\beta_j=0\). This can be done with the above and \(\pmb{a}=(0,...,1,...,0)'\), with the 1 in the position corresponding to \(\beta_j\), which leads to
\[F=\frac{\hat{\beta}_j^2}{s^2g_{jj}}\]
where \(g_{jj}\) is the jth diagonal element of \(\pmb{(X'X)}^{-1}\).
Since the square of a random variable with a t distribution with n degrees of freedom has an F distribution with 1 and n degrees of freedom, we also have the following test: Let
\[T=\frac{\hat{\beta}_j}{s\sqrt{g_{jj}}}\]
and reject the null hypothesis if \(|T|\ge t(\alpha/2,n-k-1)\).
We can now make our own summary(lm(..)) routine:
mysummarylm=function(formula, data) {
cl <- match.call()
cat("Call:\n")
print(cl)
mf <- match.call(expand.dots = FALSE)
m <- match(c("formula", "data", "subset",
"weights", "na.action", "offset"),
names(mf), 0L)
mf <- mf[c(1L, m)]
mf$drop.unused.levels <- TRUE
mf[[1L]] <- quote(stats::model.frame)
mf <- eval(mf, parent.frame())
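# the first column of the model frame is the response, the remaining columns are the predictors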
y=mf[, 1]
X=mf[, -1]
if(is.data.frame(X)) X=as.matrix(X)
lbls=c("(Intercept)", colnames(X))
n=length(y)
k=ncol(X)
X=cbind(1, X)
y=cbind(y)
ybar=mean(y)
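# G = (X'X)^{-1}; its diagonal entries g_jj are needed for the coefficient standard errors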
G=solve(t(X)%*%X)
betahat= (solve(t(X)%*%X)%*%t(X))%*%y
yhat=X%*%betahat
residuals=y-yhat
sse = sum((y-yhat)^2)
ssr=sum( (yhat-ybar)^2 )
qres=c(round(quantile(residuals, c(0, 0.25, 0.5, 0.75, 1)), 3))
names(qres)=c("Min", "1Q", "Median", "3Q", "Max")
cat("\nResiduals:\n")
print(qres)
out=as.data.frame(matrix(0, k+1, 4))
colnames(out)=c("Estimate", " Std.Error", "t value", "p-value")
rownames(out)=lbls
for(i in 1:(k+1)) {
TS=betahat[i]/sqrt(sse/(n-k-1)*G[i, i])
out[i, ]=c(round(betahat[i], 5),
round(sqrt(sse/(n-k-1)*G[i, i]), 5),
round(TS, 3),
round(2*(1-pt(abs(TS), n-k-1)), 6)
)
}
cat("\nCoefficients:\n")
print(out)
cat("\nResidual standard error: ", round(sqrt(sse/(n-k-1)), 2), " on ", n-k-1, " degrees of freedom\n")
R2=cor(y,yhat)^2
R2adj=1-((n-1)*(1-R2)/(n-k-1))
cat("Multiple R-squared : ", round(R2, 4), ", Adjusted R-squared: ", round(R2adj, 4), "\n")
FTS= (ssr/k)/(sse/(n-k-1))
cat("F-statistic: ", round(FTS, 1), " on ", k, " and ", n-k-1, " DF, p-value: ", 1-pf(FTS, k, n-k-1),"\n")
}
summary(lm(Price~., data=houseprice))
##
## Call:
## lm(formula = Price ~ ., data = houseprice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.018 -5.943 1.860 5.947 30.955
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -67.61984 17.70818 -3.819 0.000882
## Sqfeet 0.08571 0.01076 7.966 4.62e-08
## Floors -26.49306 9.48952 -2.792 0.010363
## Bedrooms -9.28622 6.82985 -1.360 0.187121
## Baths 37.38067 12.26436 3.048 0.005709
##
## Residual standard error: 13.71 on 23 degrees of freedom
## Multiple R-squared: 0.8862, Adjusted R-squared: 0.8665
## F-statistic: 44.8 on 4 and 23 DF, p-value: 1.558e-10
mysummarylm(Price~., data=houseprice)
## Call:
## mysummarylm(formula = Price ~ ., data = houseprice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.018 -5.943 1.860 5.947 30.955
##
## Coefficients:
## Estimate Std.Error t value p-value
## (Intercept) -67.61984 17.70818 -3.819 0.000882
## Sqfeet 0.08571 0.01076 7.966 0.000000
## Floors -26.49306 9.48952 -2.792 0.010363
## Bedrooms -9.28622 6.82985 -1.360 0.187121
## Baths 37.38067 12.26436 3.048 0.005709
##
## Residual standard error: 13.71 on 23 degrees of freedom
## Multiple R-squared : 0.8862 , Adjusted R-squared: 0.8665
## F-statistic: 44.8 on 4 and 23 DF, p-value: 1.55808e-10
The one item in the printout we have not discussed, and will not derive, is the adjusted \(R^2\):
\[R^2_{adj}=1-\left[\frac{(n-1)(1-R^2)}{n-k-1}\right]\]
The idea here is to penalize a model for the number of predictors. As k increases \(R^2_{adj}\) (might) go down, whereas \(R^2\) never does. In this way one could use \(R^2_{adj}\) for model selection. However, there are better ways to do this, as we will see shortly.
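For the houseprice fit the formula reproduces, up to rounding, the adjusted R-squared of 0.8665 printed by summary:
R2=0.8862; n=28; k=4
1-(n-1)*(1-R2)/(n-k-1)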