Monitoring Convergence

The most difficult part of using an MCMC method for generating data is to make sure that the chain has reached its stationary distribution. In this chapter we will discuss a number of ways to do so.

The first one we have already used before, namely a graphical check of the estimator against the run number.

Example

We want to generate data from the density \(f(x)=1+\sin(2\pi x)\), \(0<x<1\). Of course we could just use accept-reject here, but let’s instead use Metropolis-Hastings with a random walk proposal:

B <- 1e5
rexample1 <- function(eps, B=1e5, start) {
  # unnormalized target density, 0 outside [0, 1]
  f <- function(x) {
    (1+sin(2*pi*x))*ifelse(x<0 | x>1, 0, 1)
  }
  x <- rep(0, B)
  # start from a given point, or from a random one if none is given
  x[1] <- ifelse(missing(start), runif(1), start) 
  for(i in 2:B) {
    # symmetric random walk proposal, so the acceptance
    # probability is simply f(u)/f(x[i-1])
    u <- x[i-1]+runif(1, -eps, eps)
    if(runif(1) < f(u)/f(x[i-1]))
      x[i] <- u
    else
      x[i] <- x[i-1]
  }
  x
}

How can we monitor the convergence? The first thing to do is to plot \(n\) vs \(T(n)=\frac1n \sum_{i=1}^n x_i\). By the law of large numbers for Markov chains (the ergodic theorem) \(T(n)\rightarrow E[X]\), so this should settle down at some point.

library(ggplot2)
x <- rexample1(0.1)
z <- cumsum(x)/1:B
which <- seq(10, B, by=10)  # keep only every 10th point for plotting
df <- data.frame(x=(1:B)[which], y=z[which]) 
ggplot(data=df, aes(x, y)) +
  geom_line()

Notice that ggplot2 graphs are a bit slow when there are so many points, and it is better to “thin” the data frame a bit.

Notice how the average seems to settle down for a while, but then moves again. It can in fact be quite tricky to be sure that it really has settled.

It is often a good idea to run the chain several times, each time starting it from a different randomly chosen point. On modern computers this is made easier by running the chains on several processors in parallel:

library(parallel)
num_cores <- detectCores()-1  # what follows assumes this gives 5 worker processes
cl <- makeCluster(num_cores)
y0.1 <- clusterCall(cl, rexample1, eps=0.1)  # one run of the chain per worker

clusterCall returns a list, so for the graph we do

which <- seq(10, 5*B, by=10)  # again keep only every 10th point
df <- data.frame(
   x=rep(1:B, 5)[which],
   Tn=unlist(lapply(y0.1, function(x) cumsum(x)/1:length(x)))[which],
   Run=factor(rep(1:5, each=B))[which])
ggplot(data=df, aes(x, Tn, color=Run)) +
  geom_line() 

Now let’s try a bigger eps:

y0.3 <- clusterCall(cl, rexample1, eps=0.3)

Again, clusterCall returns a list, so for the graph we do

which <- seq(10, 5*B, by=10)
df <- data.frame(
   x=rep(1:B, 5)[which],
   Tn=unlist(lapply(y0.3, function(x) cumsum(x)/1:length(x)))[which],
   Run=factor(rep(1:5, each=B))[which])
ggplot(data=df, aes(x, Tn, color=Run)) +
  geom_line() 

and we can see that convergence has occurred much faster.
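
Since we will not use the cluster again, the worker processes can be shut down:

stopCluster(cl)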

There is an R package called coda that has a number of routines designed for this purpose:

library(coda)
mc <- mcmc.list(mcmc(y0.1[[1]], thin=10),
               mcmc(y0.1[[2]], thin=10),
               mcmc(y0.1[[3]], thin=10),
               mcmc(y0.1[[4]], thin=10),
               mcmc(y0.1[[5]], thin=10))
summary(mc)
## 
## Iterations = 1:999991
## Thinning interval = 10 
## Number of chains = 5 
## Sample size per chain = 1e+05 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##           Mean             SD       Naive SE Time-series SE 
##      0.3536858      0.2537129      0.0003588      0.0068203 
## 
## 2. Quantiles for each variable:
## 
##    2.5%     25%     50%     75%   97.5% 
## 0.02403 0.17136 0.30122 0.45390 0.97789
plot(mc)

Note, however, that the thin argument of mcmc only records the thinning interval in the object; it does not subsample the data, so all \(10^5\) values of each chain are used here (actual thinning can be done with, for example, window(mc, thin=10)).
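
coda also has a number of formal convergence diagnostics built in. One of them is the Gelman-Rubin diagnostic, which compares the variation within each chain to the variation between the chains; a potential scale reduction factor close to 1 suggests that the chains have converged. A minimal sketch, applied to the mcmc.list from above:

gelman.diag(mc)
gelman.plot(mc)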

As a formal test we can use a variant of the Kolmogorov-Smirnov test for the so-called two-sample problem. Let’s say we have \(X_1,..,X_n \sim F\) and \(Y_1,..,Y_m\sim G\) and we want to test \(H_0:F=G\). Then the KS test uses the test statistic

\[D=\max_x |F_n(x)-G_m(x)|\]

where \(F_n, G_m\) are the empirical distribution functions. We can use this to test whether a sequence from early in the run has the same distribution as one generated at the end. (Strictly speaking the KS test assumes independent observations, which a Markov chain does not provide, so the p-values should be taken as a rough guide only.) Say the graph above for eps=0.3 suggests that the stationary distribution was reached after about 10000 iterations; then

x1 <- y0.3[[1]][1:10000]
x2 <- y0.3[[1]][10001:11000]
y <- y0.3[[1]][99001:100000]
ks.test(x1, y)
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  x1 and y
## D = 0.0485, p-value = 0.02777
## alternative hypothesis: two-sided
ks.test(x2, y)
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  x2 and y
## D = 0.035, p-value = 0.5727
## alternative hypothesis: two-sided

and so it seems that indeed after about 10000 iterations the stationary distribution is reached.
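
As an aside, the statistic \(D\) is easy to compute directly from the empirical distribution functions. A small sketch, using the same x2 and y as above; without ties the maximum is attained at one of the pooled data points:

Fn <- ecdf(x2)
Gm <- ecdf(y)
pooled <- sort(c(x2, y))  # candidate points for the maximum
max(abs(Fn(pooled) - Gm(pooled)))  # should agree with the D from ks.test(x2, y)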

There is an interesting question concerning the sample size. It appears that after having run the chain 5 times with B=1e5 and discarding the first 10000 of each run, we should have a sample size of 450,000. Recall, however, that our observations are dependent. In the most extreme case all the observations could be the same, and the actual sample size might be just 1! coda has a way to estimate the effective sample size:

mc1 <- mcmc.list(mcmc(y0.3[[1]][-c(1:10000)]),
               mcmc(y0.3[[2]][-c(1:10000)]),
               mcmc(y0.3[[3]][-c(1:10000)]),
               mcmc(y0.3[[4]][-c(1:10000)]),
               mcmc(y0.3[[5]][-c(1:10000)]))
effectiveSize(mc1)
##     var1 
## 10917.45

and we see it is only about 10,900! The fact that it is so much smaller than 450,000 indicates that the values of our chain are highly correlated.
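
The effective sample size corrects the raw number of draws for the autocorrelation in the chain; coda bases its estimate on the spectral density of the chain at frequency zero. As a rough cross-check we can use the common approximation \(n/(1+2\sum_k \rho_k)\), here as a sketch with the autocorrelations cut off at lag 100 (an arbitrary choice) and summed over the five chains:

ess_approx <- function(x) {
  rho <- acf(x, lag.max=100, plot=FALSE)$acf[-1]  # rho_1, ..., rho_100
  length(x)/(1 + 2*sum(rho))
}
sum(sapply(y0.3, function(x) ess_approx(x[-c(1:10000)])))

This will not reproduce coda's number exactly, but it should be of the same order of magnitude.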

Example

Say we have a random vector \(\mathbf{X}=(X_1,..,X_5)\) with joint density proportional to \(g(x_1,..,x_5)=1+\sum_{i=1}^5 ix_i\), \(0<x_1<..<x_5<1\). We want to find an estimate of \(\text{cor}(X_1,X_5)\).

We begin by writing an MCMC routine:

B <- 1e5
rexample2 <- function(B=1e5) {
  # unnormalized target density; 0 unless 0 < x1 < ... < x5 < 1
  f <- function(x) {
    (1+sum(c(1:5)*x))*ifelse(min(x)<0, 0, 1)*
      ifelse(max(x)>1, 0, 1)*prod(ifelse(diff(x)<0, 0, 1))
  }
  x <- matrix(0, B, 5)
  x[1, ] <- sort(runif(5))  # an ordered starting point
  u <- numeric(5)
  for(i in 2:B) {
    # propose a new ordered vector, one coordinate at a time
    u[1] <- runif(1, 0, x[i-1, 2])
    for(j in 2:4) u[j] <- runif(1, u[j-1], x[i-1, j+1])
    u[5] <- runif(1, u[4], 1)
    if(runif(1) < f(u)/f(x[i-1, ]))
      x[i, ] <- u
    else
      x[i, ] <- x[i-1, ]
  }
  x
}

In this case it makes the most sense to do the monitoring based on the quantity of interest, namely \(\text{cor}(X_1, X_5)\):

x <- rexample2()
cr <- rep(0, B/10)
# correlation of X1 and X5 based on the first 10*i observations
for(i in seq_along(cr))
  cr[i] <- cor(x[1:(10*i), c(1, 5)])[1, 2]
df <- data.frame(x=10*1:(B/10), Cor=cr)
ggplot(data=df, aes(x, Cor)) +
  geom_line() 

It seems the chain reached its stationary distribution after about 25000 iterations. Then

cor(x[-c(1:25000), c(1, 5)])[1, 2]
## [1] 0.1863119
mc <- mcmc(x[-c(1:25000), c(1, 5)])
effectiveSize(mc)
##     var1     var2 
## 19834.21 19579.77
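
To get a rough idea of the precision of this estimate we can treat the effective sample size as if it were the size of an independent sample. A sketch, using the usual large-sample standard error \((1-r^2)/\sqrt{n}\) of a sample correlation coefficient and, to be conservative, the smaller of the two effective sizes:

r <- cor(x[-c(1:25000), c(1, 5)])[1, 2]
(1 - r^2)/sqrt(min(effectiveSize(mc)))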