Say \(U\sim U[0, 1]\) and let \(\lambda > 0\). Set \(Y = -\lambda \log(U)\). Then
\[
\begin{aligned}
F_Y(y) &= P(Y<y) = P(-\lambda \log U < y) \\
&= P(\log U > -y/\lambda) \\
&= P\left(U > \exp\left\{-y/\lambda\right\}\right) \\
&= 1 - P\left(U < \exp\left\{-y/\lambda\right\}\right) \\
&= 1 - \exp\left\{-y/\lambda\right\}, \quad y>0 \\
f_Y(y) &= \frac{d}{dy}F_Y(y) = \frac{1}{\lambda}\exp\left\{-y/\lambda\right\}
\end{aligned}
\]
so \(Y\sim \text{Exp}(1/\lambda)\).
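As a quick check (a minimal sketch; the choice \(\lambda = 2\) and the sample size are arbitrary) we can compare a histogram of simulated values to the \(\text{Exp}(1/\lambda)\) density:

```r
lambda <- 2
y <- -lambda*log(runif(1e4))
hist(y, 50, freq=FALSE, main="")
curve(dexp(x, rate=1/lambda), 0, max(y), add=TRUE, lwd=2, col="blue")
```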
This is actually a special case of a general method called the *inverse method*:
let \(X\) be a continuous r.v. with cdf \(F\). Let \(F^{-1}\) be the generalized inverse of \(F\), that is
\(F^{-1}(y) = \inf \{x: F(x) \ge y\}\)
Note that if \(F\) is strictly increasing the generalized inverse is just the regular inverse, and that \(F(F^{-1}(x)) = x\).
Now say we want to generate a r.v. \(X\) with cdf \(F\). Let \(U\sim U[0,1]\); then \(X = F^{-1}(U)\sim F\) because

\[P(X \le x) = P\left(F^{-1}(U) \le x\right) = P\left(U \le F(x)\right) = F(x)\]
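As an illustration (a sketch only: qnorm is R's numerically computed normal quantile function, so this is not a closed-form inverse), applying the method with \(F^{-1} =\) qnorm should produce standard normal data:

```r
x <- qnorm(runif(1e4))  # inverse method with the normal quantile function
hist(x, 50, freq=FALSE, main="")
curve(dnorm(x), -4, 4, add=TRUE, lwd=2, col="blue")
```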
Unfortunately the exponential is one of just a few applications of this method, because it is one of the few r.v.'s whose cdf (and its inverse) has an explicit formula. We saw another at the end of the last section:

We want to generate data from the density \(g(x)=1/x^2, x>1\). To do so using the inverse method we need the distribution function and its inverse:
\[ \begin{aligned} &G(x) = \int_1^x 1/t^2 dt = -1/t\vert_1^x = 1-1/x\\ &y = G(x) = 1-1/x \\ &G^{-1}(y) = 1/(1-y) \\ \end{aligned} \]
So here is a routine that generates data from \(g\), where we also make use of the fact that if \(X\sim U[0,1]\), then \(1-X\sim U[0,1]\), so instead of \(1/(1-U)\) we can simply use \(1/U\):
```r
x <- 1/runif(1e4)
hist(x[x<10], 100, freq=FALSE, main="")
curve(1/x^2, 1, 10, add=TRUE, lwd=2, col="blue")
```
This example is useful when we need a density with a very heavy tail, that is, one that goes to 0 slowly.

It is, however, possible to write a general routine to generate data from a continuous univariate distribution using this method, as follows: say we want to generate data from a density \(f(x)\) on a finite interval \([A, B]\). First we need to find the cdf \(F\), that is

\[ F(x) = \int_A^x f(t)\, dt \]

Because this cannot (in general) be done analytically, we will find \(F\) on a fine grid of points numerically. We could use the R function integrate for that:
```r
m <- 1000
x <- seq(A, B, length = m)
y <- rep(0, m)
for (i in 1:m)
  y[i] <- integrate(f, A, x[i])$value
```
Alternatively (and much faster) we can use our own numerical integration routine:
```r
y <- f(x)
ymid <- f((x[-m] + x[-1])/2)  # f at the midpoints of the grid intervals
F <- c(0, (x[2]-x[1])/6*cumsum(y[-m] + 4*ymid + y[-1]))
```
which uses Simpson's rule on each grid interval.
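As a quick sanity check (a sketch; the test density \(f(x)=6x(1-x)\) on \([0, 1]\) and the grid size are arbitrary choices), the Simpson-based cdf should match integrate at the right endpoint:

```r
f <- function(x) 6*x*(1-x)
A <- 0; B <- 1; m <- 1000
x <- seq(A, B, length = m)
y <- f(x)
ymid <- f((x[-m] + x[-1])/2)
F <- c(0, (x[2]-x[1])/6*cumsum(y[-m] + 4*ymid + y[-1]))
# both should be very close to 1
c(F[m], integrate(f, A, B)$value)
```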
If \(f\) is not a proper density, that is if \(\int_A^B f(t)\, dt \ne 1\), we can normalize it now very easily:

```r
F <- F/F[m]
```
If we need to evaluate \(F\) at an intermediate point we can use the R function approx:

```r
approx(x, F, xout = ...)$y
```

but to get the inverse function all we have to do is exchange x and F:

```r
approx(F, x, xout = ...)$y
```

and finally the generation of a random variate is done with

```r
approx(F, x, xout = runif(1))$y
```
All of this is done in the routine rpit:
```r
rpit <- function(n, fun, A, B, New = TRUE) {
  if (New) {
    # set up a fine grid and evaluate the (possibly unnormalized) density
    m <- max(2*n, 1000)
    x <- seq(A, B, length = m)
    y <- fun(x)
    # cdf via Simpson's rule on each grid interval
    ymid <- fun((x[-m] + x[-1])/2)
    z <- (x[2]-x[1])/6*cumsum(y[-m] + 4*ymid + y[-1])
    z <- z/max(z)  # normalize so the cdf reaches 1 at B
    y <- c(0, z)
    xyTmp <- cbind(x, y)
    assign("xyTmp", xyTmp, pos = 1)  # store the grid for later reuse
  }
  # invert the cdf by interpolation: exchange the roles of x and F
  approx(xyTmp[, 2], xyTmp[, 1], runif(n))$y
}
```
Here are some examples:
```r
f <- function(x) 6*x*(1-x)
x <- rpit(1e4, f, 0, 1)
hist(x, 50, freq=FALSE, main="")
curve(f, 0, 1, col="blue", lwd=2, add=TRUE)
```
```r
f <- function(x) 1 + sin(2*pi*x)
x <- rpit(1e4, f, 0, 1)
hist(x, 50, freq=FALSE, main="")
I <- integrate(f, 0, 1)$value  # normalizing constant
curve(f(x)/I, 0, 1, col="blue", lwd=2, add=TRUE)
```
This routine has a lot of overhead: to generate even a single variate we first need a couple of thousand density evaluations to build the cdf grid. On the other hand, once we have found the \(F\) values we can store them (this is what the New argument and the xyTmp object are for), and from then on all we need is the last line of the routine, so we get one variate for each call to runif!
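For example (a small sketch of the intended usage):

```r
f <- function(x) 6*x*(1-x)
x <- rpit(1e4, f, 0, 1)            # builds and stores the cdf grid xyTmp
y <- rpit(25, f, 0, 1, New=FALSE)  # reuses xyTmp; no new evaluations of f
```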
Relationships among the r.v.'s we have discussed (such as the one between the exponential and the uniform above) can also be used to generate some of them. For example:
Say we want to generate \(X\sim Bin(n,p)\). Now we know that if \(Y_1, \dots, Y_n\) are iid Ber(p), then

\[Y_1 + \dots + Y_n \sim Bin(n,p)\]

so let \(U_i\sim U[0,1]\), \(Y_i = I_{(0,p)}(U_i)\) and \(X = Y_1 + \dots + Y_n\); then \(X\sim Bin(n, p)\).
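In R this takes one line (a minimal sketch; in practice one would of course use the built-in rbinom):

```r
n <- 10; p <- 0.4; B <- 1e4
# each column holds n uniforms; count how many fall below p
x <- colSums(matrix(runif(n*B), n, B) < p)
table(x)/B  # compare with dbinom(0:n, n, p)
```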
Say \(U_1\) and \(U_2\) are iid \(U[0,1]\) and set

\[X = \sqrt{-2\log U_1}\,\cos(2\pi U_2), \quad Y = \sqrt{-2\log U_1}\,\sin(2\pi U_2)\]

then \(X\) and \(Y\) are independent standard normal r.v.'s (this is the Box-Muller algorithm). To see why, invert the transform: \(u_1 = \exp\left\{-(x^2+y^2)/2\right\}\) and \(u_2 = \frac{1}{2\pi}\arctan(y/x)\), so the Jacobian of this transform is

\[\left|\det \frac{\partial(u_1, u_2)}{\partial(x, y)}\right| = \frac{1}{2\pi}\exp\left\{-(x^2+y^2)/2\right\} = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\cdot \frac{1}{\sqrt{2\pi}}e^{-y^2/2}\]

and the joint density of \((X, Y)\) factors into the product of two standard normal densities.
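A direct implementation (a sketch; the function name rnorm.bm is just for illustration):

```r
# Box-Muller: turn pairs of uniforms into pairs of independent normals
rnorm.bm <- function(n) {
  k <- ceiling(n/2)
  u1 <- runif(k); u2 <- runif(k)
  r <- sqrt(-2*log(u1))
  c(r*cos(2*pi*u2), r*sin(2*pi*u2))[1:n]
}
x <- rnorm.bm(1e4)
hist(x, 50, freq=FALSE, main="")
curve(dnorm(x), -4, 4, add=TRUE, lwd=2, col="blue")
```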
The problem with this algorithm is that it requires the computation of the sine and cosine functions. Here is a similar but much faster algorithm:
Repeat: generate \(U_1, U_2\) iid \(U[-1,1]\) until \(S = U_1^2 + U_2^2 < 1\). Then set

\[X = U_1\sqrt{-2\log(S)/S}, \quad Y = U_2\sqrt{-2\log(S)/S}\]

then \(X\) and \(Y\) are iid standard normal. (This is called the polar method.)
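A sketch of an implementation (the name rnorm.polar is again just for illustration); since the probability that \((U_1, U_2)\) lands inside the unit disk is \(\pi/4\), on average only \(4/\pi \approx 1.27\) attempts are needed per accepted pair:

```r
# Marsaglia's polar method: accept pairs inside the unit disk, no sin/cos needed
rnorm.polar <- function() {
  repeat {
    u1 <- runif(1, -1, 1); u2 <- runif(1, -1, 1)
    s <- u1^2 + u2^2
    if (s > 0 && s < 1) break
  }
  fac <- sqrt(-2*log(s)/s)
  c(u1*fac, u2*fac)  # two iid standard normals
}
x <- as.vector(replicate(5000, rnorm.polar()))
hist(x, 50, freq=FALSE, main="")
curve(dnorm(x), -4, 4, add=TRUE, lwd=2, col="blue")
```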