Let’s take another look at the accept-reject algorithm. Say we have a density f with finite support on some interval [A,B], and we want to generate data from f. We draw \(Y\sim U[A,B]\) and \(U\sim U[0,M]\), where M is an upper bound on f, and we accept the observation Y if \(U < f(Y)\). If we drew a graph similar to the accrej.ill routines above, but with many runs, it would look like this
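Here is a small sketch that draws such a graph; the Beta(2,3) density standing in for f and all names here are mine, chosen just for illustration:

f.ill <- function(x) dbeta(x, 2, 3)  # stand-in for f, support [0,1]
M <- dbeta(1/3, 2, 3)                # upper bound on f.ill
y <- runif(2000)                     # Y ~ U[A,B]
u <- runif(2000, 0, M)               # U ~ U[0,M]
plot(y, u, pch=20, cex=0.5,
     col=ifelse(u < f.ill(y), "grey50", "red"))  # gray = accepted
curve(f.ill, 0, 1, lwd=2, add=TRUE)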

There is another way to think of this: consider the rectangle \([A,B]\times[0,M]\) and generate a pair of uniform random variables on it. A point is accepted if it falls into the gray region under the curve of f. This idea leads to the following:

First we write \[ f(x) = \int_0^{f(x)} du \]

Next consider the random vector (X, U), which has a joint distribution that is uniform on the set {(x,u): 0 < u < f(x)}. Then f is the marginal density of X!

It does not seem that we have gained much by this rewriting of the accept-reject algorithm, but in fact this way of thinking has a great many consequences. First of all it is much more general than it seems; for example, f need not be a univariate density but can be the density of a random vector on an arbitrary space!

Also it is clear from this description that in general the auxiliary variable U can be chosen to be uniform, although in practice some transformations might be needed. That this is more important than it seems at first is clear from the following:

Theorem (Fundamental Theorem of Simulation)

Simulating

\(X\sim F\)

is equivalent to simulating

\((X, U)\sim U\{ (x,u): 0 < u < f(x) \}\)

Proof: trivial, because f is the marginal density of X under this joint distribution.

One thing that is made clear by this theorem is that we can generate X in three ways:

- directly from f, if we know how to do that;
- as the pair (X, U), uniform on the set \(\{(x,u): 0 < u < f(x)\}\);
- as the pair (X, U), uniform on some larger set that contains \(\{(x,u): 0 < u < f(x)\}\), keeping only the pairs with \(u < f(x)\).

The full generality of this approach will have to wait a bit, but here is a simple case: say X has support [A, B] and \(f(x) \le M\) for all x. Then we generate a pair \((Y, U)\sim U\{ (y,u): A<y<B,\ 0 < u < M \}\) by simulating \(Y\sim U[A,B]\) and \(U|Y=y\sim U[0,M]\), and we accept the pair iff \(u<f(y)\).

This works because

\[ \begin{aligned} P \left( X\le x \right) &= P(Y \le x | U<f(Y)) = \frac{P(Y \le x , U<f(Y))}{P(U<f(Y))} \\ &= \frac{\int_A^x \int_0^{f(y)} du\,dy}{\int_A^B \int_0^{f(y)} du\,dy} = \frac{\int_A^x f(y)\, dy}{\int_A^B f(y)\, dy} = \int_A^x f(y)\, dy \end{aligned} \]

Example

Say \(X\sim \text{Beta}( \alpha, \beta)\) with \(\alpha, \beta > 1\).

Now take \(Y\sim U[0,1]\), and we find M by maximizing the density:

\[ \frac{d}{dx} \log f(x) = \frac{\alpha-1}{x} - \frac{\beta-1}{1-x} = 0 \Leftrightarrow x = \frac{\alpha-1}{\alpha+\beta-2} \]

so \(M = f\left( \frac{\alpha-1}{\alpha+\beta-2} \right)\) and \(U\sim U[0, M]\)
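As a quick numerical sanity check (mine, not part of the derivation), for \(\alpha=2, \beta=3\):

optimize(dbeta, c(0, 1), maximum=TRUE,
         shape1=2, shape2=3)$maximum  # should be close to (2-1)/(2+3-2) = 1/3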

This is implemented in genbeta:

genbeta <- function(n=10000, alpha=2, beta=3) {
  x <- (alpha-1)/(alpha+beta-2)   # mode of the Beta density
  M <- dbeta(x, alpha, beta)      # maximum of the density
  xu <- matrix(0, n, 2)
  for(i in 1:n) {
    repeat {  
      Y <- runif(1)        # proposal, uniform on [0,1]
      U <- runif(1, 0, M)  # auxiliary variable, uniform on [0,M]
      if(U<dbeta(Y, alpha, beta)) 
        {xu[i, ] <- c(Y, U); break}  # accept if the point falls under f
    }  
  }
  xu[, 1]
}
hist(genbeta(), 100, freq=FALSE, main="")
curve(dbeta(x, 2, 3), 
      lwd=2, col="blue", add=TRUE)

The biggest restriction on the usefulness of this idea is that we need (X, U) to be in a box, so we can simulate from uniforms. This can be overcome if we allow simulating from a larger set on which uniform simulation is possible. Let’s say this larger set is of the form \[ L = \left\{ (y, u): 0<u<m(y) \right\} \] then the constraint is of the form \(f(x) \le m(x)\).

Obviously, because \(f(x)\le m(x)\) and \(\int f(x)dx=1\), m will not be a density (except if m(x)=f(x), where we are back at the fundamental theorem), but we do require that

\[ \int m(x) dx = M < \infty \]

(if \(\int m(x) dx = \infty\) uniform simulation from L would not be possible)

and so we can define \(g(x) = m(x)/M\) and then g is a density.

If uniform simulation on L is possible, we can then use the third bullet above, that is, generate \(Y \sim g\) and then \(U|Y=y \sim U(0, m(y))\). Now if we only accept y’s with \(u<f(y)\) we have

\[ \begin{aligned} P(X\le x) &= P(Y \le x | U<f(Y)) = \frac{P(Y \le x , U<f(Y))}{P(U<f(Y))} \\ &= \frac{\int_{-\infty}^x \frac{f(y)}{m(y)}\, g(y)\, dy}{\int_{-\infty}^\infty \frac{f(y)}{m(y)}\, g(y)\, dy} = \frac{\int_{-\infty}^x \frac{f(y)}{m(y)} \frac{m(y)}{M}\, dy}{\int_{-\infty}^\infty \frac{f(y)}{m(y)} \frac{m(y)}{M}\, dy} = \int_{-\infty}^x f(y)\, dy \end{aligned} \]
and so we have a generalization of the fundamental theorem:

Corollary Let \(X\sim F\) with density f, and let g be a density that satisfies

\[ f(x) < Mg(x) \] for all x and some constant \(M\ge 1\). Then, to simulate from f it is sufficient to generate from

\(Y\sim g\)

and

\(U|Y=y\sim U(0,Mg(y))\)

until \(0 < u < f(y)\)


As with the basic accept-reject algorithm, it can be shown that the number of trials until acceptance has a geometric distribution with success probability 1/M, so the expected number of trials per observation is M; the smaller M can be chosen, the quicker the sample is generated.
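For the genbeta example above we can check this with a quick simulation (a sketch of my own; there \(M = \max f \approx 1.78\) for the Beta(2,3) density):

ntrials <- function(B=1e4) {
  # count the proposals until acceptance, B times
  M <- dbeta(1/3, 2, 3)
  k <- rep(0, B)
  for(i in 1:B) {
    repeat {
      k[i] <- k[i] + 1
      if(runif(1, 0, M) < dbeta(runif(1), 2, 3)) break
    }
  }
  k
}
mean(ntrials())  # should be close to M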

Let’s start with a simple

Example

say we want to simulate from this density:

f <- function(x) 1.87243*(1-x^3)^4
curve(f, 0, 1, lwd=2)

We can of course just use the box method with U[0,1]’s here. But let’s try to do this a little faster. So we need a density g that we can generate from easily, that satisfies \(f(x)<Mg(x)\), and that “matches” f a bit better.

How about a linear function, say

g=function(x) 2*(1-x)  # the Beta(1,2) density
curve(f, 0, 1, ylim=c(0,3))
curve(g, 0, 1, add=TRUE, col="blue")
x=seq(0, 0.9999, length=1000)  # stop just short of 1, where g=0
M=max(f(x)/g(x))
lines(x, M*g(x), col="red")

Here the blue line is g and the red one is Mg, so clearly we have f<Mg.

Generating from g is easy because it is the density of a Beta(1, 2)!
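Indeed (a quick check of mine):

x <- seq(0, 1, length=100)
all.equal(g(x), dbeta(x, 1, 2))  # should be TRUE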

Putting it all together we have

simex <- function () {
    # uses f, g and M as defined above
    n <- 1e+05
    x <- rep(0, n)
    for (i in 1:n) {
        repeat {
            y <- rbeta(1, 1, 2)         # Y ~ g
            u <- runif(1, 0, M * g(y))  # U|Y=y ~ U(0, Mg(y))
            if (u < f(y)) {
                  x[i] <- y
                  break
            }
        }
    }
    hist(x, breaks = 250, freq = FALSE, main="")
    curve(f, 0, 1, add=TRUE, col="blue", lwd=2)
}
simex()

Example

say we want to simulate from this density:

f <- function(x) exp(-x^2/2)*(sin(6*x)^2 + 3*cos(x)^2*sin(4*x)^2 + 1)
curve(f, -4, 4)

Notice that in order to use the basic accept-reject algorithm we would need to find \(\int f(x)dx\), already a non-trivial problem even with numerical methods.

In order to use the corollary, we need a g with \(f(x)<Mg(x)\) for some \(M\ge 1\). Obviously \[ \sin (6x)^2 + 3\cos (x)^2 \sin (4x)^2 + 1 \le 5 \]

so if we use the standard normal for g we have

\[ f(x) \le 5e^{-x^2/2} = 5\sqrt{2\pi}\,\varphi(x) \]

where \(\varphi\) is the standard normal density, and so \(M = 5\sqrt{2\pi} = 12.53\).
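We can verify the bound numerically on a grid (a check of my own, separate from the fundex routine below):

f <- function(x) exp(-x^2/2)*(sin(6*x)^2 + 3*cos(x)^2*sin(4*x)^2 + 1)
x <- seq(-10, 10, length=1e4)
max(f(x)/dnorm(x))  # should not exceed M = 5*sqrt(2*pi) = 12.53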

fundex <- function (which = 1) {
    f <- function(x) 
      exp(-x^2/2)*(sin(6*x)^2+3*cos(x)^2*
                      sin(4*x)^2+1)
    if(which==1)   # draw f
        curve(f, -4, 4, lwd = 2, n = 500)
    M <- 5*sqrt(2*pi)
    if(which==2) {  # draw f together with M*g
        curve(f, -4, 4, lwd=2, n=500, ylim=c(0, 5))
        x <- seq(-4, 4, length=200)
        lines(x, M*dnorm(x), lwd = 2, col = "blue")
    }
    if(which == 3) {  # run the simulation
        n <- 1e+05
        x <- rep(0, n)
        for (i in 1:n) {
            repeat {
                y <- rnorm(1)                   # Y ~ g
                u <- runif(1, 0, M * dnorm(y))  # U|Y=y ~ U(0, Mg(y))
                if (u < f(y)) {
                  x[i] <- y
                  break
                }
            }
        }
        hist(x, breaks = 250, freq = FALSE, main="")
    }
}

Here are f and g:

fundex(2)

and here is the simulation:

fundex(3)

Example

say we want to generate standard normal variates. As g we will use the double exponential distribution, that is \[ g(x; \lambda) = \frac{1}{2} \lambda \exp(-\lambda |x|) \] which, as we will see shortly, can be generated with

sample(c(-1, 1), size=n, replace=TRUE)*log(runif(n))/lambda
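As a quick check of this generator (a sketch of my own, with \(\lambda=1\)):

n <- 1e4; lambda <- 1
z <- sample(c(-1, 1), size=n, replace=TRUE)*log(runif(n))/lambda
hist(z, 100, freq=FALSE, main="")
curve(0.5*lambda*exp(-lambda*abs(x)), add=TRUE, col="blue", lwd=2)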

Now

\[ \frac{f(x)}{g(x;\lambda)} = \frac{e^{-x^2/2}/\sqrt{2\pi}}{\frac12 \lambda e^{-\lambda|x|}} = \sqrt{2/\pi}\,\frac{1}{\lambda}\exp\left(\lambda|x|-x^2/2\right) \]

\(\lambda |x|-x^2/2\) is symmetric in x and has its maximum at \(|x|=\lambda\), so we have

\[ f(x)/g(x; \lambda) \le \sqrt{2/\pi } \frac{1}{\lambda} \exp(\lambda^2/2 ) \]

We are free to choose any \(\lambda\), so it makes sense to choose it to minimize M, that is, to minimize \(\exp(\lambda^2/2)/\lambda\):

\[ \frac{d}{d\lambda}\left( \lambda^2/2 - \log \lambda \right) = \lambda - \frac{1}{\lambda} = 0 \]

and we find \(\lambda=1\) is optimal. With it we have

\(M = \sqrt{2/\pi}\exp(1/2) = \sqrt{2e/\pi} \approx 1.32\)
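We can confirm this choice numerically (my own check):

# M as a function of lambda; minimize over lambda
Mf <- function(lambda) sqrt(2/pi)*exp(lambda^2/2)/lambda
optimize(Mf, c(0.1, 5))  # minimum should be at lambda=1, with M about 1.32
curve(Mf, 0.5, 3, ylab="M")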

fundex1() draws the curve for f and M*g and does the simulation:

fundex1 <- function (n=1e+05) {
    # draw f (standard normal) and M*g (scaled double exponential)
    curve(dnorm(x), -3, 3, 
          ylim=c(0, 0.65), lwd=2, ylab="")
    curve(1.3*0.5*exp(-abs(x)), -3, 3, 
          add = TRUE, col = "blue", lwd=2)
    x <- rep(0, n)
    M <- sqrt(2*exp(1)/pi)
    g <- function(x) 0.5*exp(-abs(x))
    for (i in 1:n) {
        repeat {
            y <- sample(c(-1, 1), 1)*log(runif(1))  # double exponential proposal
            u <- runif(1, 0, M*g(y))
            if (u < dnorm(y)) {
                x[i] <- y
                break
            }
        }
    }
    hist(x, breaks = 250, freq = FALSE, main = "")
    z <- seq(-3, 3, length = 250)
    lines(z, dnorm(z), lwd = 2)
}
fundex1()

Example

We want to generate data from the random vector (X,Y) with joint density

\[f(x,y)=\frac1{12}\exp\{-\sqrt{x+y}\}\]
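As a quick check (mine, not in the original text) that the constant 1/12 makes this a density, we can integrate numerically:

# check that exp(-sqrt(x+y))/12 integrates to 1 over x,y>0
inner <- function(y)
  sapply(y, function(t)
    integrate(function(x) exp(-sqrt(x+t))/12, 0, Inf)$value)
integrate(inner, 0, Inf)$value  # should be close to 1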

Quite often when trying to generate a random vector, the proposal distribution can be chosen to have independent coordinates, which simplifies things quite a bit. We also need a density that goes to 0 more slowly than f. Something like \(1/x^2\) would work for the tail, but \(\int_0^\infty 1/x^2\, dx =\infty\) because of the pole at 0, so let’s try this:

\[ g(x)= \begin{cases} \frac12 & 0<x<1 \\ \frac1{2x^2} & x>1 \end{cases} \]

How can we find M? Let’s do it numerically:

f=function(x, y) exp(-sqrt(x+y))/12*ifelse(x>0, 1, 0)*ifelse(y>0, 1, 0)
g=function(x) ifelse(x<1, 1, 1/x^2)/2*ifelse(x>0, 1, 0)
f.g=function(x) -exp(-sqrt(x[1]+x[2]))/12/g(x[1])/g(x[2])  # -f/(g*g), to be minimized
M=-optim(c(10,10), f.g, method="L-BFGS-B", lower = c(0,0))$value
M
## [1] 117.2527
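A crude grid search (a cross-check of my own; the grid is chosen to cover the maximum, which sits near x=y=32) should give a similar value:

xg <- seq(0.01, 50, length=500)
ratio <- outer(xg, xg, function(x, y) f(x, y)/(g(x)*g(y)))
max(ratio)  # should be close to the M found by optim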

How do we generate data from g? Say \(Z\sim G\). Note that \(P(Z<1) = 1/2\) and \(Z|Z<1\sim U[0,1]\). For the part above 1 I will use a method called the inverse method, which will be discussed in the next section. Conditional on \(Z>1\), Z has density \(\frac{1/(2x^2)}{1/2}=\frac{1}{x^2}\), and so

\[ \begin{aligned} &\int_1^x \frac{1}{t^2}\, dt = -\frac{1}{t}\Big\vert_1^x = 1-\frac1x\\ &y =1-\frac1x \\ &x = \frac1{1-y} \\ \end{aligned} \]

so here is a routine that generates data from g:

gen.g <- function(n=1e4) {
  x=rep(0, n)
  for(i in 1:n) {
    if(runif(1)<0.5) x[i]=runif(1)  # Z|Z<1 ~ U[0,1]
    else x[i] = 1/(1-runif(1))      # Z|Z>1 via the inverse method
  }
  x
}
x=gen.g()
hist(x[x<10], 100, freq=FALSE, main="")
curve(g, 0, 10, add=TRUE, lwd=2, col="blue")

and so now

simex <- function (n=1e4) {
    # uses f, g, M and gen.g from above
    xy <- matrix(0, n, 2)
    for (i in 1:n) {
        repeat {
            uv <- gen.g(2)  # proposal: two independent draws from g
            u <- runif(1, 0, M*g(uv[1])*g(uv[2]))
            if (u < f(uv[1], uv[2])) {
                xy[i, ] <- uv
                break
            }
        }
    }
    xy
}
xy = simex()
hist(xy[, 1], breaks = 250, freq = FALSE, main = "", xlab="x")

If we want to add the marginal densities we have to find them numerically. Because of the symmetry of the density in x and y the two marginals are the same. I will also truncate the histogram to make it easier to read:

y=seq(0, 50, length=250)
a=y
for(i in 1:250) a[i]=integrate(f, 0, Inf, y=y[i])$value
par(mfrow=c(1, 2))
hist(xy[, 1], breaks = 250, freq = FALSE, 
     xlab="x", main = "", xlim=c(0, 50))
lines(y, a, col="blue", lwd=2)
hist(xy[, 2], breaks = 250, freq = FALSE, 
     xlab="y", main = "", xlim=c(0, 50))
lines(y, a, col="blue", lwd=2)

Example

We previously discussed the case of a 10-dimensional rv with joint pdf

\[f(x_1, ..,x_{10}) = K \prod x_i, 0<x_1<x_2< ... < x_{10}<1\]

where we had the problem of not being able to find the constant K.

Let’s consider the so-called order statistic of the uniform distribution: say \(U_1,..,U_n\) are iid U[0,1] and set \(X_i=U_k\) if \(U_k\) is the ith smallest of the U’s. In other words, the \(X_i\)’s are the U’s sorted in increasing order.

It is easy to show that

\[g(x_1,..,x_n)=n!,\ 0<x_1<x_2< ... < x_n<1\]

that is, the joint distribution is uniform on the set with \(0<x_1<x_2< ... < x_n<1\) (here \(n=10\)). Since K is unknown we work with the unnormalized density \(\tilde f(x)=\prod x_i\). Now

\[ \begin{aligned} &\frac{\tilde f(x)}{g(x)} = \frac{\prod x_i}{n!}\le \frac1{n!}\\ &M=\frac1{n!}\\ &Mg(x)=1 \end{aligned} \]

so we can generate data with

gen.prod <- function(n=1e4) {
    f <- function(x) prod(x)      # unnormalized target
    x <- matrix(0, n, 10)
    for (i in 1:n) {
        repeat {
            y <- sort(runif(10))  # order statistic: uniform on the ordered set
            u <- runif(1, 0, 1)   # Mg(y) = 1
            if (u < f(y)) {
              x[i, ] <- y
              break
            }
        }
    }
    x
}

One problem is that we can’t really check this anymore. Any check really needs the constant K, which we don’t know. However:

\[ \begin{aligned} &P\left(X_i\in \left(\frac{i}{11}-0.1,\frac{i}{11}+0.1\right),\ i=1,..,10\right) = \\ &\int_{\frac{1}{11}-0.1}^{\frac{1}{11}+0.1}\cdots\int_{\frac{10}{11}-0.1}^{\frac{10}{11}+0.1}K\prod x_i\, dx_{10}..dx_1 \approx \\ &K\cdot 0.2^{10} \prod_{i=1}^{10} \frac{i}{11} = 1.4\times 10^{-11}K\\ \end{aligned} \]

now we can use simulation to estimate the same probability:

B <- 1e4
prod.x <- gen.prod(B)
x <- prod.x
for(i in 1:10) {
  x <- x[x[, i]>i/11-0.1 & x[, i]<i/11+0.1, ]
}
nrow(x)/B
## [1] 0.0083

and so

K <- (nrow(x)/1e4)/(1.4*10^(-11))
K
## [1] 592857143

so it seems \(K\approx 600\) million.

So we got something. There is still a problem with this approach, though: it is very slow! The acceptance probability is \(P(U<\prod U_i) = E\left[\prod U_i\right] = 2^{-10} \approx 0.001\), so on average we need about 1000 tries for each accepted observation.
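We can see this directly by counting the proposals (a quick check of my own; note the small n, because the routine is slow):

gen.count <- function(n=100) {
  # average number of proposals needed per accepted draw
  k <- 0
  for (i in 1:n) {
    repeat {
      k <- k + 1
      if (runif(1) < prod(sort(runif(10)))) break
    }
  }
  k/n
}
gen.count()  # should come out near 2^10 = 1024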

Here we tried to estimate a probability. Another approach is to use a non-parametric density estimate; however, in higher dimensions this is also a tricky problem.