Random Predictors

Up to now we have assumed that the design matrix \(\pmb{X}\) was fixed. Often, however, the values of the predictor variables are themselves random; in fact, that was the case in the wine and houseprice examples. It turns out that treating the random-X case as if the X’s were fixed is acceptable in many situations, and we will see that most of the results we have obtained so far still hold. Moreover, it often makes sense to analyze a regression problem conditional on the X’s, in which case the predictors are treated as fixed even though they originated in some random fashion.

The usual theoretical justification for treating a random predictor as fixed is that, in terms of the parameter vector \(\pmb{\beta}\), the predictor random vector is an ancillary statistic; that is, the distribution of \(\pmb{X}\) does not depend on \(\pmb{\beta}\).

If the x’s are to be treated as random, one clearly needs to specify the joint distribution of \((y, \pmb{x})\). The most common choice is a multivariate normal distribution, in which case we need to study

\[ cov\begin{pmatrix} y \\ x_1 \\ \vdots \\ x_k \end{pmatrix} = \pmb{\Sigma}\]

Multivariate Normal Regression Model

We will assume that \(\pmb{y}\) and \(\pmb{X}\) have a joint multivariate normal distribution with mean vector

\[\pmb{\mu} = \begin{pmatrix} \mu_y \\ \pmb{\mu}_x \end{pmatrix}\]

and covariance matrix

\[\pmb{\Sigma} = \begin{pmatrix}\sigma_{yy} & \pmb{\sigma}_{yx}' \\ \pmb{\sigma}_{yx} & \pmb{\Sigma}_{xx} \end{pmatrix}\]

By (5.2.13) we have

\[E[y|\pmb{x}] = \mu_y+\pmb{\sigma}_{yx}'\pmb{\Sigma}_{xx}^{-1}(\pmb{x}-\pmb{\mu}_x)=\beta_0+\pmb{\beta}_1'\pmb{x}\]

where

\[\beta_0=\mu_y-\pmb{\sigma}_{yx}'\pmb{\Sigma}_{xx}^{-1}\pmb{\mu}_x\]

\[\pmb{\beta}_1=\pmb{\Sigma}_{xx}^{-1}\pmb{\sigma}_{yx}\]

Also from (5.2.13) we have

\[var(y|\pmb{x}) = \sigma_{yy}-\pmb{\sigma}_{yx}'\pmb{\Sigma}_{xx}^{-1}\pmb{\sigma}_{yx}=\sigma^2\]

Note that under this model \(E[y|\pmb{x}]\) is not only linear in \(\pmb{\beta}\) but also linear in the x’s, so this setup does not allow for a model that is (say) quadratic in x.
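As a small numerical illustration, here is an R sketch that computes \(\beta_0\), \(\pmb{\beta}_1\) and \(\sigma^2\) from a mean vector and covariance matrix of \((y, x_1, x_2)\); the values of mu and Sigma below are made up purely for the illustration:

mu = c(10, 2, 5)    # hypothetical mean of (y, x1, x2)
Sigma = matrix(c(4.0, 1.5, 1.0,
                 1.5, 2.0, 0.5,
                 1.0, 0.5, 1.0), 3, 3)   # hypothetical covariance of (y, x1, x2)
syx = Sigma[-1, 1]                       # sigma_yx
Sxx = Sigma[-1, -1]                      # Sigma_xx
beta1 = solve(Sxx) %*% syx               # beta_1 = Sigma_xx^{-1} sigma_yx
beta0 = mu[1] - sum(beta1*mu[-1])        # beta_0 = mu_y - sigma_yx' Sigma_xx^{-1} mu_x
sigma2 = Sigma[1, 1] - c(rbind(syx) %*% solve(Sxx) %*% cbind(syx))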

Estimation and Testing

Theorem (6.10.1)

Under the multivariate normal model the maximum likelihood estimators are given by

\[\pmb{\hat{\mu}} = \begin{pmatrix} \bar{y} \\ \pmb{\bar{x}} \end{pmatrix}\]

and

\[\pmb{\hat{\Sigma}} = \frac{n-1}{n}\begin{pmatrix}s_{yy} & \pmb{s}_{yx}' \\ \pmb{s}_{yx} & \pmb{s}_{xx} \end{pmatrix}\]

proof omitted
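For instance, with the houseprice data used in the examples below (assuming the data frame is available), the mle’s can be computed as

A = as.matrix(houseprice)
n = nrow(A)
muhat = colMeans(A)          # (ybar, xbar')
Sigmahat = (n-1)/n*cov(A)    # cov() divides by n-1, so rescale by (n-1)/n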

Theorem (6.10.2)

(Invariance of MLEs)

Let g be some function. Under mild conditions on g, if \(\pmb{\hat{\theta}}\) is the mle of \(\pmb{\theta}\), then \(g(\pmb{\hat{\theta}})\) is the mle of \(g(\pmb{\theta})\). For example, since \(\hat{\sigma}^2\) is the mle of \(\sigma^2\), \(\sqrt{\hat{\sigma}^2}\) is the mle of \(\sigma\).

proof see any book on the theory of statistics

Theorem (6.10.3)

The mle’s of \(\beta_0\), \(\pmb{\beta}_1\) and \(\sigma^2\) are given by

\[\hat{\beta}_0=\bar{y}-\pmb{s}_{yx}'\pmb{S}_{xx}^{-1}\pmb{\bar{x}}\]

\[\pmb{\hat{\beta}}_1=\pmb{S}_{xx}^{-1}\pmb{s}_{yx}\] and

\[\hat{\sigma}^2 = \frac{n-1}{n}\left(s_{yy}-\pmb{s}_{yx}'\pmb{S}_{xx}^{-1} \pmb{s}_{yx}\right)\]

proof follows from the invariance of mle’s by applying the functions for the corresponding parameters above.


Notice that these estimators are the same as the least-squares estimators in the fixed-x case. However, their distributions are no longer multivariate normal but multivariate t.

The F tests discussed in section 6.6 work equally well in the random x case since they are based on the conditional distributions.

Standardized Regression Coefficients

The sample correlation matrix can be written as

\[ \pmb{R} = \begin{pmatrix} 1 & r_{y1}& ... & r_{yk}\\ r_{1y} & 1 & ... & r_{1k}\\ \vdots & \vdots& \ddots & \vdots\\ r_{ky} & r_{k1} & ... & 1 \end{pmatrix}= \begin{pmatrix} 1 & \pmb{r}'_{yx} \\ \pmb{r}_{yx} & \pmb{R}_{xx} \end{pmatrix} \]

where, for example,

\[r_{y1}=\frac{s_{y1}}{\sqrt{s^2_ys^2_1}}=\frac{\sum(y_i-\bar{y})(x_{i1}-\bar{x}_1)}{\sqrt{\sum(y_i-\bar{y})^2\sum(x_{i1}-\bar{x}_1)^2}}\]

and

\[r_{12}=\frac{s_{12}}{\sqrt{s^2_1s^2_2}}=\frac{\sum(x_{i1}-\bar{x}_1)(x_{i2}-\bar{x}_2)}{\sqrt{\sum(x_{i1}-\bar{x}_1)^2\sum(x_{i2}-\bar{x}_2)^2}}\]

we have \(\pmb{S=DRD}\), where \(\pmb{D}=[\text{diag}(\pmb{S})]^{1/2}\):

\[ \pmb{D} = \begin{pmatrix} s_y & 0& ... & 0\\ 0 & \sqrt{s_{11}} & ... & 0\\ \vdots & \vdots& \ddots & \vdots\\ 0 & 0 & ... & \sqrt{s_{kk}} \end{pmatrix}= \begin{pmatrix} s_y & \pmb{0}' \\ \pmb{0} & \pmb{D}_{xx} \end{pmatrix} \]

and we can write

\[\pmb{\hat{\beta}}_1=s_y\pmb{D}_{xx}^{-1}\pmb{R}_{xx}^{-1}\pmb{r}_{yx}\]
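The following sketch checks \(\pmb{S=DRD}\) and this identity numerically, again assuming the houseprice data from the examples below:

A = as.matrix(houseprice)
S = cov(A)
R = cor(A)
D = diag(sqrt(diag(S)))
all.equal(S, D%*%R%*%D, check.attributes=FALSE)   # S = DRD
sy = sqrt(S[1, 1])
# s_y D_xx^{-1} R_xx^{-1} r_yx reproduces S_xx^{-1} s_yx
sy*solve(D[-1, -1])%*%solve(R[-1, -1])%*%R[-1, 1]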

Definition (6.10.4)

Let \(\pmb{x}\) be a sample. Then the z scores are defined as

\[\pmb{z}=\frac{\pmb{x}-\bar{x}}{s_x}\]
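In R these z scores are what the built-in scale function returns; a minimal check:

x = c(2, 4, 6, 8)
z = (x-mean(x))/sd(x)       # z scores as defined above
all.equal(z, c(scale(x)))   # scale() centers and divides by the standard deviation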

Recall that the model in centered form is

\[\hat{y}_i=\bar{y}+\sum_{j=1}^k \hat{\beta}_j(x_{ij}-\bar{x}_j)\]

and so

\[\frac{\hat{y}_i-\bar{y}}{s_y}=\sum_{j=1}^k \frac{s_j}{s_y}\hat{\beta}_j\left(\frac{x_{ij}-\bar{x}_j}{s_j}\right)\]

Definition (6.10.5)

The coefficients \(\hat{\beta}_j^*=\frac{s_j}{s_y}\hat{\beta}_j\) are called the beta weights or beta coefficients. They can also be found as

\[\pmb{\hat{\beta}}_1^*=\frac{1}{s_y}\pmb{D}_{xx}\pmb{\hat{\beta}}_1=\pmb{R}_{xx}^{-1}\pmb{r}_{yx}\]
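For instance, the beta weights for the houseprice data can be computed directly from the sample correlation matrix (a sketch, assuming houseprice is available as in the next example):

A = as.matrix(houseprice)
R = cor(A)
betastar = solve(R[-1, -1])%*%R[-1, 1]   # R_xx^{-1} r_yx
round(c(betastar), 3)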

Example (6.10.6)

For the houseprice data we find

A=as.matrix(houseprice)
y=A[, 1, drop=FALSE]                # response (Price)
ybar=mean(y)
X=cbind(1, A[, -1])                 # design matrix with intercept column
xbar=apply(X[, -1], 2, mean)
sxx=cov(A[, -1])                    # S_xx
syx=cov(A)[-1, 1]                   # s_yx
betahat=solve(t(X)%*%X)%*%t(X)%*%y  # least-squares estimate of beta
round(c(betahat), 3)
## [1] -67.620   0.086 -26.493  -9.286  37.381
round(c(solve(sxx)%*%cbind(syx)), 3)
## [1]   0.086 -26.493  -9.286  37.381
round(ybar-rbind(syx)%*%solve(sxx)%*%xbar, 3)
##       [,1]
## syx -67.62

\(R^2\)

Definition (6.10.7)

The sample coefficient of determination is defined by

\[R^2=\frac{\pmb{s}'_{yx}\pmb{S}^{-1}_{xx}\pmb{s}_{yx}}{s_{yy}}\]

Example (6.10.8)

For the houseprice data we find

A=as.matrix(houseprice)
y=A[, 1, drop=FALSE]                # response (Price)
sxx=cov(A[, -1])                    # S_xx
tmp=cov(A)[, 1]
syy=tmp[1]                          # s_yy
syx=tmp[-1]                         # s_yx
round(rbind(syx)%*%solve(sxx)%*%cbind(syx)/syy, 3)  # R^2
##       syx
## syx 0.886
summary(lm(Price~., data=houseprice))
## 
## Call:
## lm(formula = Price ~ ., data = houseprice)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.018  -5.943   1.860   5.947  30.955 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -67.61984   17.70818  -3.819 0.000882
## Sqfeet        0.08571    0.01076   7.966 4.62e-08
## Floors      -26.49306    9.48952  -2.792 0.010363
## Bedrooms     -9.28622    6.82985  -1.360 0.187121
## Baths        37.38067   12.26436   3.048 0.005709
## 
## Residual standard error: 13.71 on 23 degrees of freedom
## Multiple R-squared:  0.8862, Adjusted R-squared:  0.8665 
## F-statistic:  44.8 on 4 and 23 DF,  p-value: 1.558e-10
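
As a final check, \(R^2\) equals the squared sample correlation between \(y\) and the fitted values \(\hat{y}\), so the following should reproduce the Multiple R-squared above:

fit = lm(Price~., data=houseprice)
cor(houseprice$Price, fitted(fit))^2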