We will now analyze a multiple regression problem using Bayesian analysis. The models are often expressed in terms of the precision \(\tau\) instead of the variance, where
\[\tau=\frac1{\sigma^2}\]
A multiple regression model with k predictor variables has k+2 parameters (k+1 regression coefficients and the precision \(\tau\)), all of which need a prior distribution. A standard choice looks like this:
\[ \begin{aligned} &\pmb{y\vert \beta,\tau} \sim N_n(\pmb{X\beta}, \frac1\tau \pmb{I}) \\ &\pmb{\beta\vert\tau} \sim N_{k+1}(\pmb{\phi}, \frac1\tau \pmb{V}) \\ &\tau \sim Gamma(\alpha, \delta) \\ \end{aligned} \]
We will assume that \(\pmb{\phi, V, \alpha, \delta}\) are known, but in real life they are often treated as hyperparameters with their own distributions, in what are called hierarchical models. \(\pmb{\phi, V, \alpha, \delta}\) are chosen so as to express some prior knowledge of the experiment.
The priors in the model above are conjugate priors, that is, the posterior distribution belongs to the same family (here normal-gamma) as the prior.
proof
Ignoring all multiplicative constants, the joint density of \(\pmb{y}, \pmb{\beta}\) and \(\tau\) can be written as
\[ \begin{aligned} f(\pmb{y}, \pmb{\beta}, \tau) &= K\tau^{\frac{n+k+1}2+\alpha-1}\exp \left\{ -\frac{\tau}2 (\pmb{y}-\pmb{X\beta})'(\pmb{y}-\pmb{X\beta}) \right\} \exp \left\{ -\frac{\tau}2 (\pmb{\beta}-\pmb{\phi})'\pmb{V}^{-1}(\pmb{\beta}-\pmb{\phi}) \right\} \exp \left\{-\delta \tau \right\}\\ &= K\tau^{\frac{n+k+1}2+\alpha-1}\exp \left\{ -\frac{\tau}2 \left[ (\pmb{y}-\pmb{X\beta})'(\pmb{y}-\pmb{X\beta}) +(\pmb{\beta}-\pmb{\phi})'\pmb{V}^{-1}(\pmb{\beta}-\pmb{\phi})+2\delta \right]\right\} \end{aligned} \]
Completing the square in \(\pmb{\beta}\) gives
\[ g(\pmb{\beta}, \tau|\pmb{y}) = K_1 \tau^{\frac{n+k+1}2+\alpha-1} \exp \left\{ -\frac\tau2 \left[(\pmb{\beta}-\pmb{\phi}_1)'\pmb{V}_1^{-1}(\pmb{\beta}-\pmb{\phi}_1)+\delta_1\right]\right\} \]
where
\[ \begin{aligned} &\pmb{V}_1 = (\pmb{V}^{-1}+\pmb{X'X})^{-1}\\ &\pmb{\phi}_1 = \pmb{V}_1(\pmb{V}^{-1}\pmb{\phi}+\pmb{X'y})\\ &\delta_1 =-\pmb{\phi_1'V}_1^{-1}\pmb{\phi}_1+\pmb{\phi'V}^{-1}\pmb{\phi}+\pmb{y'y}+2\delta \end{aligned} \]
and so the posterior distribution is again of normal-gamma form: given \(\tau\), \(\pmb{\beta}\) is again multivariate normal.
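From the normal-gamma form one can read off the conditional and marginal posteriors, a step the proof above leaves implicit:
\[ \begin{aligned} &\pmb{\beta}\vert\tau,\pmb{y} \sim N_{k+1}(\pmb{\phi}_1, \frac1\tau \pmb{V}_1) \\ &\tau\vert\pmb{y} \sim Gamma(\alpha+\frac{n}2, \frac{\delta_1}2) \end{aligned} \]
Integrating \(\tau\) out of the joint posterior is what produces the multivariate t distribution for \(\pmb{\beta\vert y}\), with \(2(\alpha+\frac{n}2)=n+2\alpha\) degrees of freedom.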
With the Bayesian model above we have
\[\pmb{\beta\vert y}\sim t(n+2\alpha, \pmb{\phi}_1, \pmb{W}_1)\]
where \(t\) is a multivariate t distribution and
\[ \begin{aligned} &\pmb{\phi}_1 = (\pmb{V}^{-1}+\pmb{X'X})^{-1}(\pmb{V}^{-1}\pmb{\phi}+\pmb{X'y})\\ &\pmb{W}_1 = \left( (\pmb{y-X\phi})'(\pmb{I}+\pmb{XVX'})^{-1} (\pmb{y-X\phi})+2\delta \right)(\pmb{V}^{-1}+\pmb{X'X})^{-1}/(n+2\alpha) \end{aligned} \]
proof omitted
The Bayes estimator (the posterior mean) is
\[\pmb{\hat{\beta}}=\pmb{\phi}_1 = (\pmb{V}^{-1}+\pmb{X'X})^{-1}(\pmb{V}^{-1}\pmb{\phi}+\pmb{X'y})\]
For any vector \(\pmb{a}\) we have
\[\frac{\pmb{a'\beta}-\pmb{a'\phi}_1}{\sqrt{\pmb{a'W_1a}}}\sim t(n+2\alpha)\]
and in particular for the individual coefficients
\[\frac{\beta_i-\phi_{1i}}{\sqrt{W_{1,ii}}}\sim t(n+2\alpha)\]
so a \((1-w)100\%\) credible interval for \(\beta_i\) is given by
\[\phi_{1i}\pm t_{w/2, n+2\alpha}\sqrt{W_{1,ii}}\]
proof omitted
If we choose \(\pmb{V} =K\pmb{I}\) where \(K\) is some large constant, and \(\pmb{\phi=0}\), then
\[ \begin{aligned} \pmb{\hat{\beta}} &= (\pmb{V}^{-1}+\pmb{X'X})^{-1}(\pmb{V}^{-1}\pmb{\phi}+\pmb{X'y}) \\ &= (\frac1{K}\pmb{I}+\pmb{X'X})^{-1}(\frac1{K}\pmb{0}+\pmb{X'y}) \\ &\approx (\pmb{X'X})^{-1}\pmb{X'y} \end{aligned} \] and the Bayesian and frequentist estimators are almost the same.
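This convergence is easy to check numerically. A small sketch on simulated data (the data and the values of \(K\) here are made up for the illustration):

```r
# As K grows, the Bayesian estimator with phi = 0, V = K*I
# approaches the least squares estimator.
set.seed(111)
n <- 50
X <- cbind(1, runif(n, 0, 10))        # design matrix: intercept + one predictor
y <- X %*% c(2, 3) + rnorm(n, 0, 1)   # data generated with coefficients (2, 3)
ols <- solve(t(X) %*% X) %*% t(X) %*% y
for (K in c(1, 100, 10000)) {
  V <- K * diag(2)
  phi <- cbind(c(0, 0))
  phi1 <- solve(solve(V) + t(X) %*% X) %*% (solve(V) %*% phi + t(X) %*% y)
  cat("K =", K, " max difference from least squares:", max(abs(phi1 - ols)), "\n")
}
```

The printed differences shrink toward zero as \(K\) increases.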
Let’s apply this method to our houseprice data. To do so we need values for \(\pmb{\phi, V}, \alpha, \delta\). We will use non-informative priors, so sensible choices are
\[ \begin{aligned} &\pmb{\phi} = (0,0,0,0,0)' \\ &\pmb{V} =1000\pmb{I} \\ &\alpha =0.001; \delta =0.001 \\ \end{aligned} \]
(five coefficients: the intercept plus the four predictors).
then
phi=cbind(rep(0, 5))   # prior mean: 5 coefficients (intercept + 4 predictors)
V=1000*diag(5)         # vague prior covariance
alp=0.001;del=0.001    # vague gamma prior for tau
A=as.matrix(houseprice)
y=A[, 1, drop=FALSE]   # response: Price
n=length(y)
X=cbind(1, A[, -1])    # design matrix with a column of 1's for the intercept
# posterior mean phi1, shown next to the least squares estimator
phi1 = solve(solve(V)+t(X)%*%X)%*%(solve(V)%*%phi+t(X)%*%y)
round(cbind(phi1, solve(t(X)%*%X)%*%t(X)%*%y), 3)
## Price Price
## -67.503 -67.620
## Sqfeet 0.086 0.086
## Floors -26.463 -26.493
## Bedrooms -9.284 -9.286
## Baths 37.326 37.381
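The credible intervals from the t result above can be computed the same way. Since the houseprice data are not reproduced here, the following sketch uses simulated data; the formulas for \(\pmb{\phi}_1\) and \(\pmb{W}_1\) are exactly those given above:

```r
# Sketch: 95% credible intervals from the posterior t distribution.
set.seed(222)
n <- 40
X <- cbind(1, rnorm(n), rnorm(n))       # simulated design: intercept + 2 predictors
y <- X %*% c(1, 2, -1) + rnorm(n)       # simulated response
k1 <- ncol(X)                           # k + 1 coefficients
phi <- cbind(rep(0, k1)); V <- 1000 * diag(k1)
alp <- 0.001; del <- 0.001
V1inv <- solve(V) + t(X) %*% X
phi1 <- solve(V1inv) %*% (solve(V) %*% phi + t(X) %*% y)
r <- y - X %*% phi
W1 <- c(t(r) %*% solve(diag(n) + X %*% V %*% t(X)) %*% r + 2 * del) *
      solve(V1inv) / (n + 2 * alp)
crit <- qt(1 - 0.05 / 2, n + 2 * alp)   # t quantile with n + 2*alpha df
cbind(Lower = phi1 - crit * sqrt(diag(W1)),
      Upper = phi1 + crit * sqrt(diag(W1)))
```

Each interval is centered at the corresponding entry of \(\pmb{\phi}_1\), with half-width \(t_{w/2, n+2\alpha}\sqrt{W_{1,ii}}\).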
and we see that the estimators are quite similar to the frequentist ones, a typical result when non-informative priors are used.
Calculating the posterior distribution analytically as we did above is generally not possible. Instead one uses so-called Markov chain Monte Carlo (MCMC) simulation to generate data from the posterior distribution. For more on this take my course ESMA 5015 Simulation.
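For the model above MCMC is particularly simple, because both full conditionals are standard distributions: \(\pmb{\beta}\vert\tau,\pmb{y}\) is multivariate normal and \(\tau\vert\pmb{\beta},\pmb{y}\) is gamma, so a Gibbs sampler can be written in a few lines. A minimal sketch on simulated data (the data, chain length, and burn-in are choices made for the illustration, not part of the notes):

```r
# Gibbs sampler sketch for the Bayesian regression model above.
library(MASS)   # for mvrnorm
set.seed(333)
n <- 60
X <- cbind(1, rnorm(n))                  # simulated design matrix
y <- X %*% c(1, 2) + rnorm(n)            # simulated response
k1 <- ncol(X)
phi <- rep(0, k1); V <- 1000 * diag(k1)  # vague priors as before
alp <- 0.001; del <- 0.001
Vinv <- solve(V)
V1 <- solve(Vinv + t(X) %*% X)
phi1 <- V1 %*% (Vinv %*% phi + t(X) %*% y)
B <- 2000
beta <- matrix(0, B, k1); tau <- numeric(B); tau[1] <- 1
for (i in 2:B) {
  # beta | tau, y  ~  N(phi1, V1/tau)
  beta[i, ] <- mvrnorm(1, phi1, V1 / tau[i - 1])
  # tau | beta, y  ~  Gamma(alpha + (n+k+1)/2, delta + [SSE + prior quadratic]/2)
  r1 <- y - X %*% beta[i, ]
  r2 <- beta[i, ] - phi
  tau[i] <- rgamma(1, alp + (n + k1) / 2,
                   del + (sum(r1^2) + c(t(r2) %*% Vinv %*% r2)) / 2)
}
colMeans(beta[-(1:500), ])   # posterior means after discarding a burn-in
```

With these vague priors the posterior means come out close to the least squares estimates, matching the analytic result.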