OpenBugs is a high-level language for specifying priors and simulating from the prior distribution using the Gibbs sampler.
Start by downloading the program from http://openbugs.net/w/Downloads and the R package R2Openbugs.
We start with a simple example. The data set survey of the MASS library has information on smoking of University students:
library(MASS)
tbl <- table(survey$Smoke)
tbl
##
## Heavy Never Occas Regul
## 11 189 19 17
so there are 189 students who never smoked. We want to find a 95% credible interval for the true proportion using a Beta prior. To sart we have to define a model:
model <- function() {
# Prior
p ~ dbeta(1, 1)
# Likelihood
y ~ dbin(p, N)
}
Notice that this is Openbugs notation, the tilde is not the usual tilde from R.
This model needs to be written to a file on disk:
library(R2OpenBUGS)
model.file <- file.path(tempdir(), "model.txt")
write.model(model, model.file)
Next we need to define some objects:
N <- as.numeric(sum(tbl)) # Number of Students
y <- N - as.numeric(tbl["Never"]) #Number of Smokers
data <- list("N", "y") #Names of Variables
params <- c("p") #Names of Parameters
inits <- function() { list(p=0.5) } #Starting Value
Next we let Openbugs do the works:
out <- bugs(data, inits, params, model.file, n.iter=10000)
As a check whether 10000 iterations was enough consider th \(\hat{R}\). They should all be less than 1.1:
all(out$summary[,"Rhat"] < 1.1)
## [1] TRUE
We can get the posterior mean and standard deviation of p from the output.
c(out$mean["p"], out$sd["p"] )
## $p
## [1] 0.2014755
##
## $p
## [1] 0.02574684
The full info is at
print(out, digits=5)
## Inference for Bugs model at "C:\Users\Wolfgang\AppData\Local\Temp\RtmpojhR52/model.txt",
## Current: 3 chains, each with 10000 iterations (first 5000 discarded)
## Cumulative: n.sims = 15000 iterations saved
## mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff
## p 0.20148 0.02575 0.1531 0.1838 0.2009 0.2184 0.25450 1.00129 5100
## deviance 6.45097 1.38373 5.4710 5.5700 5.9070 6.7770 10.47025 1.00119 6900
##
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
##
## DIC info (using the rule, pD = Dbar-Dhat)
## pD = 0.97340 and DIC = 7.42400
## DIC is an estimate of expected predictive error (lower deviance is better).
For a more detailed analysis with can use the coda library:
out <- bugs(data, inits, params, model.file,
n.iter=10000, codaPkg = TRUE)
out.coda <- read.bugs(out)
## Abstracting deviance ... 5000 valid values
## Abstracting p ... 5000 valid values
## Abstracting deviance ... 5000 valid values
## Abstracting p ... 5000 valid values
## Abstracting deviance ... 5000 valid values
## Abstracting p ... 5000 valid values
library(coda)
library(lattice)
xyplot(out.coda)
densityplot(out.coda)
acfplot(out.coda)
Finally the 95% credible interval:
ci <- summary(out.coda, q=c(0.025, 0.975))$q["p", ]
ci
## 2.5% 97.5%
## 0.1531 0.2545
Another variable in survey is the height of the students:
y <- as.numeric(na.omit(survey$Height))
n <- length(y)
bw <- diff(range(y))/50
ggplot(data.frame(y=y), aes(y)) +
geom_histogram(aes(y = ..density..),
color = "black",
fill = "white",
binwidth = bw) +
labs(x = "x", y = "Density")
We want to find a credible interval for the mean height. We will use a U[-1010, -1010] as the prior for \(\mu\) (essentially a flat prior) and a \(\Gamma(0.001, 0.001)\) prior for \(1/\sigma^2\). Here is the model specification:
model <- function() {
# Prior
mu ~ dunif(-1e10, 1e10)
tau ~ dgamma(0.001, 0.001)
# Likelihood
for (i in 1:n) {
y[i] ~ dnorm(mu, tau)
}
# Derived
sigma <- sqrt(1/tau)
}
model.file <- file.path(tempdir(), "model.txt")
write.model(model, model.file)
data <- list("y", "n")
params <- c("mu", "sigma")
inits <- function() {list(mu=0, tau=1)}
out <- bugs(data, inits, params, model.file,
n.iter=10000)
Again we check for convergence:
all(out$summary[,"Rhat"] < 1.1)
## [1] TRUE
and get the point estimates
round(unlist(out$mean[params]), 1)
## mu sigma
## 172.4 9.9
round(unlist(out$sd[params]), 3)
## mu sigma
## 0.687 0.487
out <- bugs(data, inits, params, model.file,
n.iter=10000, codaPkg = TRUE)
out.coda <- read.bugs(out)
## Abstracting deviance ... 5000 valid values
## Abstracting mu ... 5000 valid values
## Abstracting sigma ... 5000 valid values
## Abstracting deviance ... 5000 valid values
## Abstracting mu ... 5000 valid values
## Abstracting sigma ... 5000 valid values
## Abstracting deviance ... 5000 valid values
## Abstracting mu ... 5000 valid values
## Abstracting sigma ... 5000 valid values
ci <- summary(out.coda, q=c(0.025, 0.975))$q["mu", ]
ci
## 2.5% 97.5%
## 171.0 173.7
We want to find a regression model of Waiting time on Eruptions for the Old Faithful data. So the model is
\[y=\alpha+\beta x+\epsilon;\epsilon\sim N(0,\sigma)\] We will use normal priors for \(\alpha\) and \(\beta\) and as above a \(\Gamma(0.001, 0.001)\) prior for \(1/\sigma^2\)
model <- function() {
# Priors
alpha ~ dnorm(0, 0.001)
beta ~ dnorm(0, 0.001)
tau ~ dgamma(0.001, 0.001)
# Likelihood
for (i in 1:n) {
y[i] ~ dnorm(mu[i], tau)
mu[i] <- alpha + beta*x[i]
}
}
model.file <- file.path(tempdir(), "model.txt")
write.model(model, model.file)
x <- faithful$Eruptions
y <- faithful$Waiting.Time
n <- length(x)
data <- list("x", "y", "n")
params <- c("alpha", "beta", "mu")
inits <- function() {list(alpha=0, beta=0, tau=1)}
out <- bugs(data, inits, params, model.file, n.iter=5000)
all(out$summary[,"Rhat"] < 1.1)
## [1] TRUE
95% credible intervals for the coefficients are:
out$summary[c("alpha", "beta"), c("2.5%", "97.5%")]
## 2.5% 97.5%
## alpha 31.08475 35.54
## beta 10.16000 11.38
Here is the fitted line plot with both the least squares and the Bayesian line:
cf <- cbind(unlist(out$mean[c("alpha", "beta")]))
ggplot(data=data.frame(x=x, y=y), aes(x, y)) +
geom_point() +
geom_smooth(method = "lm", se=FALSE) +
geom_abline(intercept = cf[1] ,slope = cf[2],
color="red")
and we see they are just about the same.