Bayesian Inference for Regression

We will now analyze a multiple regression problem using Bayesian methods. Bayesian models are often expressed in terms of the precision \(\tau\) instead of the variance, where

\[\tau=\frac1{\sigma^2}\]

A multiple regression model with k predictor variables has k+2 parameters (the \(k+1\) regression coefficients and the precision), all of which need a prior distribution. As an example it might look like this:

\[ \begin{aligned} &\pmb{y\vert \beta,\tau} \sim N_n(\pmb{X\beta}, \frac1\tau \pmb{I}) \\ &\pmb{\beta\vert\tau} \sim N_{k+1}(\pmb{\phi}, \frac1\tau \pmb{V}) \\ &\tau \sim Gamma(\alpha, \delta) \\ \end{aligned} \]

We will assume that \(\pmb{\phi}, \pmb{V}, \alpha, \delta\) are known, but in real life they are often treated as hyperparameters with their own distributions, in what are called hierarchical models. \(\pmb{\phi}, \pmb{V}, \alpha\) and \(\delta\) are chosen so as to express some prior knowledge of the experiment.
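To make the hierarchy concrete, here is a minimal sketch that draws one data set from this model; the sample size, number of predictors, design matrix and hyperparameter values are all made up for illustration:

```r
# sketch: one draw from the prior hierarchy above
# (n, k, X, alpha, delta, phi, V are invented for this example)
set.seed(1)
n <- 20; k <- 2
X <- cbind(1, matrix(rnorm(n*k), n, k))   # design matrix with intercept column
alpha <- 2; delta <- 1
phi <- rep(0, k+1); V <- diag(k+1)
tau <- rgamma(1, alpha, delta)                       # tau ~ Gamma(alpha, delta)
beta <- phi + sqrt(1/tau) * t(chol(V)) %*% rnorm(k+1)  # beta|tau ~ N(phi, V/tau)
y <- X %*% beta + rnorm(n, 0, sqrt(1/tau))             # y|beta,tau ~ N(X beta, I/tau)
```

Each run first picks a precision, then regression coefficients given that precision, and finally the responses.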

Theorem (6.11.1)

The priors in the model above are conjugate priors, that is, the posterior distribution belongs to the same family of distributions as the prior.

proof

Ignoring all constants the joint density can be written as

\[ \begin{aligned} &f(\pmb{y}, \pmb{\beta}, \tau) = \\ &K\tau^{n/2}\exp \left\{ -\frac{\tau}2 (\pmb{y}-\pmb{X\beta})'(\pmb{y}-\pmb{X\beta}) \right\} \tau^{(k+1)/2}\exp \left\{ -\frac{\tau}2 (\pmb{\beta}-\pmb{\phi})'\pmb{V}^{-1}(\pmb{\beta}-\pmb{\phi}) \right\} \tau^{\alpha-1}\exp \left\{-\delta \tau \right\}=\\ &K\tau^{n/2+(k+1)/2+\alpha-1}\exp \left\{ -\frac{\tau}2 \left[ (\pmb{y}-\pmb{X\beta})'(\pmb{y}-\pmb{X\beta}) +(\pmb{\beta}-\pmb{\phi})'\pmb{V}^{-1}(\pmb{\beta}-\pmb{\phi})+2\delta \right]\right\} \end{aligned} \]

Completing the square in \(\pmb{\beta}\) we find

\[ g(\pmb{\beta}, \tau|\pmb{y}) = K_1 \tau^{n/2+(k+1)/2+\alpha-1}\exp \left\{ -\frac\tau2 \left[ (\pmb{\beta}-\pmb{\phi}_1)'\pmb{V}_1^{-1}(\pmb{\beta}-\pmb{\phi}_1)+\delta_1\right] \right\} \]

where

\[ \begin{aligned} &\pmb{V}_1 = (\pmb{V}^{-1}+\pmb{X'X})^{-1}\\ &\pmb{\phi}_1 = \pmb{V}_1(\pmb{V}^{-1}\pmb{\phi}+\pmb{X'y})\\ &\delta_1 =-\pmb{\phi_1'V}_1^{-1}\pmb{\phi}_1+\pmb{\phi'V}^{-1}\pmb{\phi}+\pmb{y'y}+2\delta \end{aligned} \]

and so, conditional on \(\tau\), the posterior distribution of \(\pmb{\beta}\) is again a normal distribution, with mean \(\pmb{\phi}_1\) and variance-covariance matrix \(\frac1\tau\pmb{V}_1\).
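The completing-the-square step can be checked numerically: for any value of \(\pmb{\beta}\), the bracketed expression in the joint density should equal \((\pmb{\beta}-\pmb{\phi}_1)'\pmb{V}_1^{-1}(\pmb{\beta}-\pmb{\phi}_1)+\delta_1\). A small sketch with made-up data and hyperparameters:

```r
# numerical check of the completing-the-square identity in the proof
# (data and hyperparameters are made up for illustration)
set.seed(2)
n <- 10; k <- 2
X <- cbind(1, matrix(rnorm(n*k), n, k))
y <- rnorm(n)
phi <- rep(0, k+1); V <- diag(k+1); delta <- 0.5
V1 <- solve(solve(V) + t(X) %*% X)
phi1 <- V1 %*% (solve(V) %*% phi + t(X) %*% y)
delta1 <- -t(phi1) %*% solve(V1) %*% phi1 + t(phi) %*% solve(V) %*% phi +
          t(y) %*% y + 2*delta
beta <- rnorm(k+1)   # an arbitrary value of beta; the identity holds for all
lhs <- t(y - X %*% beta) %*% (y - X %*% beta) +
       t(beta - phi) %*% solve(V) %*% (beta - phi) + 2*delta
rhs <- t(beta - phi1) %*% solve(V1) %*% (beta - phi1) + delta1
all.equal(c(lhs), c(rhs))   # TRUE
```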

Theorem (6.11.2)

With the Bayesian model above we have

\[\pmb{\beta\vert y}\sim t(n+2\alpha, \pmb{\phi}_1, \pmb{W}_1)\]

where \(t\) is a multivariate t distribution and

\[ \begin{aligned} &\pmb{\phi}_1 = (\pmb{V}^{-1}+\pmb{X'X})^{-1}(\pmb{V}^{-1}\pmb{\phi}+\pmb{X'y})\\ &\pmb{W}_1 = \left( (\pmb{y-X\phi})'(\pmb{I}+\pmb{XVX'})^{-1} (\pmb{y-X\phi})+2\delta \right)(\pmb{V}^{-1}+\pmb{X'X})^{-1}/(n+2\alpha) \end{aligned} \]

proof omitted

Bayesian Inference for \(\pmb{\beta}\)

Theorem (6.11.3)

  1. Based on the above model a reasonable estimator is the mean of the posterior distribution:

\[\pmb{\hat{\beta}}=\pmb{\phi}_1 = (\pmb{V}^{-1}+\pmb{X'X})^{-1}(\pmb{V}^{-1}\pmb{\phi}+\pmb{X'y})\]

  2. for any vector \(\pmb{a}\) we have

\[\frac{\pmb{a'\beta}-\pmb{a'\phi}_1}{\sqrt{\pmb{a'W_1a}}}\sim t(n+2\alpha)\]

and in particular, for each coefficient,

\[\frac{\beta_i-\phi_{1i}}{\sqrt{W_{1,ii}}}\sim t(n+2\alpha)\]

  3. A \((1-w)100\%\) credible interval for \(\beta_i\) is given by

\[\phi_{1i}\pm t_{w/2, n+2\alpha}\sqrt{W_{1,ii}}\]

proof omitted
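The formulas in Theorems (6.11.2) and (6.11.3) translate directly into code. Here is a sketch computing 95% credible intervals for the coefficients; since this comes before the houseprice example, it uses simulated data, and the true coefficients, sample size and hyperparameter values are invented:

```r
# sketch: 95% credible intervals via phi1 and W1 (simulated data)
set.seed(3)
n <- 30; k <- 2
X <- cbind(1, matrix(rnorm(n*k), n, k))
beta.true <- c(1, 2, -1)                  # made-up true coefficients
y <- X %*% beta.true + rnorm(n)
phi <- rep(0, k+1); V <- 1000*diag(k+1); alpha <- 0.001; delta <- 0.001
V1 <- solve(solve(V) + t(X) %*% X)
phi1 <- V1 %*% (solve(V) %*% phi + t(X) %*% y)     # posterior mean
M <- diag(n) + X %*% V %*% t(X)
W1 <- c(t(y - X %*% phi) %*% solve(M) %*% (y - X %*% phi) + 2*delta) *
      V1 / (n + 2*alpha)                            # scale matrix of the t
crit <- qt(1 - 0.05/2, n + 2*alpha)
cbind(lower = phi1 - crit*sqrt(diag(W1)),
      upper = phi1 + crit*sqrt(diag(W1)))
```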

Example (6.11.4)

If we choose \(\pmb{V} =K\pmb{I}\) where \(K\) is some large constant, and \(\pmb{\phi=0}\), then

\[ \begin{aligned} &\pmb{\hat{\beta}}= (\pmb{V}^{-1}+\pmb{X'X})^{-1}(\pmb{V}^{-1}\pmb{\phi}+\pmb{X'y}) = \\ &\left(\frac1{K}\pmb{I}+\pmb{X'X}\right)^{-1}\left(\frac1{K}\pmb{I}\,\pmb{0}+\pmb{X'y}\right) \approx \\ &(\pmb{X'X})^{-1}\pmb{X'y} \end{aligned} \]

and the Bayesian and frequentist estimators are almost the same.
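This convergence is easy to see numerically; the following sketch (with simulated data and made-up true coefficients) compares the Bayes estimator to least squares as \(K\) grows:

```r
# sketch: as K grows, the Bayes estimator approaches least squares
# (simulated data for illustration)
set.seed(4)
n <- 25; k <- 2
X <- cbind(1, matrix(rnorm(n*k), n, k))
y <- X %*% c(1, 0.5, -0.5) + rnorm(n)
ols <- solve(t(X) %*% X) %*% t(X) %*% y   # least squares estimate
for(K in c(1, 100, 10000)) {
  # with phi = 0 the Bayes estimator is (I/K + X'X)^{-1} X'y
  bhat <- solve(diag(k+1)/K + t(X) %*% X) %*% (t(X) %*% y)
  cat("K =", K, " max difference from least squares:", max(abs(bhat - ols)), "\n")
}
```

The maximum difference shrinks toward zero as \(K\) increases.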

Example (6.11.5)

Let’s apply this method to our houseprice data. To do so we need values for \(\pmb{\phi, V}, \alpha, \delta\). We will use non-informative priors, so sensible choices are

\[ \begin{aligned} &\pmb{\phi} = (0,0,0,0,0)' \\ &\pmb{V} =1000\pmb{I} \\ &\alpha =0.001; \delta =0.001 \\ \end{aligned} \]

then

# prior parameters: phi = 0, V = 1000*I, alpha = delta = 0.001
phi=cbind(rep(0, 5))
V=1000*diag(5)
alp=0.001;del=0.001
# response (Price) and design matrix with an intercept column
A=as.matrix(houseprice)
y=A[, 1, drop=FALSE]
n=length(y)
X=cbind(1, A[, -1])
# posterior mean of beta, shown next to the least squares estimate
phi1 = solve(solve(V)+t(X)%*%X)%*%(solve(V)%*%phi+t(X)%*%y)
round(cbind(phi1, solve(t(X)%*%X)%*%t(X)%*%y), 3)
##            Price   Price
##          -67.503 -67.620
## Sqfeet     0.086   0.086
## Floors   -26.463 -26.493
## Bedrooms  -9.284  -9.286
## Baths     37.326  37.381

and we see that the estimators are quite similar to the frequentist ones, a typical result when non-informative priors are used.

Modern Bayesian Analysis

Calculating the posterior distribution analytically as we did above is generally not possible. Instead one uses so-called Markov Chain Monte Carlo (MCMC) simulation to generate data from the posterior distribution. For more on this take my course ESMA 5015 Simulation.
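For this particular model the full conditionals follow from the joint density above: \(\pmb{\beta}|\tau,\pmb{y}\) is normal and \(\tau|\pmb{\beta},\pmb{y}\) is gamma, so a Gibbs sampler is straightforward. A minimal sketch, with simulated data and the non-informative hyperparameters from Example (6.11.5):

```r
# minimal Gibbs sampler sketch for the regression model above
# (simulated data; hyperparameters as in Example (6.11.5))
set.seed(5)
n <- 30; k <- 2
X <- cbind(1, matrix(rnorm(n*k), n, k))
y <- X %*% c(1, 2, -1) + rnorm(n)
phi <- rep(0, k+1); V <- 1000*diag(k+1); alpha <- 0.001; delta <- 0.001
V1 <- solve(solve(V) + t(X) %*% X)                 # does not depend on tau
phi1 <- V1 %*% (solve(V) %*% phi + t(X) %*% y)
B <- 2000
beta <- matrix(0, B, k+1); tau <- numeric(B)
beta[1, ] <- phi1; tau[1] <- 1
for(i in 2:B) {
  # beta | tau, y ~ N(phi1, V1/tau)
  beta[i, ] <- phi1 + t(chol(V1/tau[i-1])) %*% rnorm(k+1)
  # tau | beta, y ~ Gamma(alpha + (n+k+1)/2, delta + [SSE + prior quadratic]/2)
  r <- y - X %*% beta[i, ]; d <- beta[i, ] - phi
  tau[i] <- rgamma(1, alpha + (n+k+1)/2,
                   delta + (sum(r^2) + c(t(d) %*% solve(V) %*% d))/2)
}
# posterior means of the coefficients, after discarding a burn-in
round(colMeans(beta[-(1:500), ]), 2)
```

After a burn-in the sample means of the `beta` draws agree closely with the analytic posterior mean `phi1`.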