Say \(U\sim U[0, 1]\) and let \(\lambda > 0\). Set \(Y = -\lambda \log(U)\). Then
\[
\begin{aligned}
F_Y(y) &= P(Y<y) = P(-\lambda \log U < y) \\
&= P(\log U > -y/\lambda) \\
&= P\left(U > \exp\left\{-y/\lambda\right\}\right) \\
&= 1 - P\left(U < \exp\left\{-y/\lambda\right\}\right) \\
&= 1 - \exp\left\{-y/\lambda\right\}, \quad y>0 \\
f_Y(y) &= \frac{d}{dy}F_Y(y) = \frac{1}{\lambda}\exp\left\{-y/\lambda\right\}
\end{aligned}
\]
so \(Y\sim \text{Exp}(1/\lambda)\).
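As a quick check (a minimal sketch; the choice \(\lambda = 2\) and the sample size are arbitrary) we can compare a histogram of simulated values to the \(\text{Exp}(1/\lambda)\) density:

```r
lambda <- 2
y <- -lambda*log(runif(1e4))
hist(y, 50, freq=FALSE, main="")
curve(dexp(x, rate=1/lambda), 0, max(y), add=TRUE, lwd=2, col="blue")
```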
This is actually a special case of a general method called the *inverse method*:
let \(X\) be a continuous r.v. with cdf \(F\). Let \(F^{-1}\) be the generalized inverse of \(F\), that is
\(F^{-1}(y) = \inf \{x: F(x) \ge y\}\)
Note that if \(F\) is strictly increasing the generalized inverse is just the regular inverse, and that \(F(F^{-1}(x)) = x\).
Now say we want to generate a r.v. \(X\) with cdf \(F\). Let \(U\sim U[0,1]\); then \(X = F^{-1}(U)\sim F\) because

\[P(X \le x) = P\left(F^{-1}(U) \le x\right) = P\left(U \le F(x)\right) = F(x)\]
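As an illustration (a sketch only: qnorm is R's numerically computed normal quantile function, so this is not a closed-form inverse), applying the method with \(F^{-1} =\) qnorm should produce standard normal data:

```r
x <- qnorm(runif(1e4))  # inverse method with the normal quantile function
hist(x, 50, freq=FALSE, main="")
curve(dnorm(x), -4, 4, add=TRUE, lwd=2, col="blue")
```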
Unfortunately the exponential is one of just a few applications of this method, because it is one of the few r.v.'s whose cdf (and its inverse) has an explicit formula. We saw another at the end of the last section:

We want to generate data from the density \(g(x)=1/x^2, x>1\). To do so using the inverse method we need the distribution function and its inverse:
\[ \begin{aligned} &G(x) = \int_1^x 1/t^2 dt = -1/t\vert_1^x = 1-1/x\\ &y = G(x) = 1-1/x \\ &G^{-1}(y) = 1/(1-y) \\ \end{aligned} \]
So here is a routine that generates data from \(g\), where we also make use of the fact that if \(X\sim U[0,1]\), then \(1-X\sim U[0,1]\), so instead of \(1/(1-U)\) we can simply use \(1/U\):
```r
x <- 1/runif(1e4)
hist(x[x<10], 100, freq=FALSE, main="")
curve(1/x^2, 1, 10, add=TRUE, lwd=2, col="blue")
```
This example is useful when we need a density with a very heavy tail, that is, one that goes to 0 slowly.

It is, however, possible to write a general routine to generate data from a continuous univariate distribution using this method, as follows: say we want to generate data from a density \(f(x)\) on a finite interval \([A, B]\). First we need to find the cdf \(F\), that is

\[ F(x) = \int_A^x f(t)\, dt \]

Because this cannot (in general) be done analytically, we will find \(F\) on a fine grid of points numerically. We could use the R function integrate for that:
```r
m <- 1000
x <- seq(A, B, length = m)
y <- rep(0, m)
for (i in 1:m)
  y[i] <- integrate(f, A, x[i])$value
```
Alternatively (and much faster) we can use our own numerical integration routine:
```r
y <- f(x)
ymid <- f((x[-m] + x[-1])/2)  # f at the midpoints of the grid intervals
F <- c(0, (x[2]-x[1])/6*cumsum(y[-m] + 4*ymid + y[-1]))
```
which uses Simpson's rule on each grid interval.
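As a quick sanity check (a sketch; the test density \(f(x)=6x(1-x)\) on \([0, 1]\) and the grid size are arbitrary choices), the Simpson-based cdf should match integrate at the right endpoint:

```r
f <- function(x) 6*x*(1-x)
A <- 0; B <- 1; m <- 1000
x <- seq(A, B, length = m)
y <- f(x)
ymid <- f((x[-m] + x[-1])/2)
F <- c(0, (x[2]-x[1])/6*cumsum(y[-m] + 4*ymid + y[-1]))
# both should be very close to 1
c(F[m], integrate(f, A, B)$value)
```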
If \(f\) is not a proper density, that is if \(\int_A^B f(t)\, dt \ne 1\), we can normalize it now very easily:

```r
F <- F/F[m]
```
If we need to evaluate \(F\) at an intermediate point we can use the R function approx:

```r
approx(x, F, xout = ...)$y
```

but to get the inverse function all we have to do is exchange x and F:

```r
approx(F, x, xout = ...)$y
```

and finally the generation of a random variate is done with

```r
approx(F, x, xout = runif(1))$y
```
All of this is done in the routine rpit:
```r
rpit <- function(n, fun, A, B, New = TRUE) {
  if (New) {
    # set up a fine grid and evaluate the (possibly unnormalized) density
    m <- max(2*n, 1000)
    x <- seq(A, B, length = m)
    y <- fun(x)
    # cdf via Simpson's rule on each grid interval
    ymid <- fun((x[-m] + x[-1])/2)
    z <- (x[2]-x[1])/6*cumsum(y[-m] + 4*ymid + y[-1])
    z <- z/max(z)  # normalize so the cdf reaches 1 at B
    y <- c(0, z)
    xyTmp <- cbind(x, y)
    assign("xyTmp", xyTmp, pos = 1)  # store the grid for later reuse
  }
  # invert the cdf by interpolation: exchange the roles of x and F
  approx(xyTmp[, 2], xyTmp[, 1], runif(n))$y
}
```
Here are some examples:
```r
f <- function(x) 6*x*(1-x)
x <- rpit(1e4, f, 0, 1)
hist(x, 50, freq=FALSE, main="")
curve(f, 0, 1, col="blue", lwd=2, add=TRUE)
```
```r
f <- function(x) 1 + sin(2*pi*x)
x <- rpit(1e4, f, 0, 1)
hist(x, 50, freq=FALSE, main="")
I <- integrate(f, 0, 1)$value  # normalizing constant
curve(f(x)/I, 0, 1, col="blue", lwd=2, add=TRUE)
```
This routine has a lot of overhead: to generate even a single variate we first need a couple of thousand density evaluations to build the cdf grid. On the other hand, once we have found the \(F\) values we can store them (this is what the New argument and the xyTmp object are for), and from then on all we need is the last line of the routine, so we get one variate for each call to runif!
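For example (a small sketch of the intended usage):

```r
f <- function(x) 6*x*(1-x)
x <- rpit(1e4, f, 0, 1)            # builds and stores the cdf grid xyTmp
y <- rpit(25, f, 0, 1, New=FALSE)  # reuses xyTmp; no new evaluations of f
```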
Relationships among the r.v.'s we have discussed (such as the one between the exponential and the uniform above) can also be used to generate some of them. For example:
Say we want to generate \(X\sim Bin(n,p)\). Now we know that if \(Y_1, \dots, Y_n\) are iid Ber(p), then

\[Y_1 + \dots + Y_n \sim Bin(n,p)\]

so let \(U_i\sim U[0,1]\), \(Y_i = I_{(0,p)}(U_i)\) and \(X = Y_1 + \dots + Y_n\); then \(X\sim Bin(n, p)\).
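In R this takes one line (a minimal sketch; in practice one would of course use the built-in rbinom):

```r
n <- 10; p <- 0.4; B <- 1e4
# each column holds n uniforms; count how many fall below p
x <- colSums(matrix(runif(n*B), n, B) < p)
table(x)/B  # compare with dbinom(0:n, n, p)
```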
Say \(U_1\) and \(U_2\) are iid \(U[0,1]\) and set

\[X = \sqrt{-2\log U_1}\,\cos(2\pi U_2), \quad Y = \sqrt{-2\log U_1}\,\sin(2\pi U_2)\]

then \(X\) and \(Y\) are independent standard normal r.v.'s (this is the Box-Muller algorithm). To see why, invert the transform: \(u_1 = \exp\left\{-(x^2+y^2)/2\right\}\) and \(u_2 = \frac{1}{2\pi}\arctan(y/x)\), so the Jacobian of this transform is

\[\left|\det \frac{\partial(u_1, u_2)}{\partial(x, y)}\right| = \frac{1}{2\pi}\exp\left\{-(x^2+y^2)/2\right\} = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\cdot \frac{1}{\sqrt{2\pi}}e^{-y^2/2}\]

and the joint density of \((X, Y)\) factors into the product of two standard normal densities.
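A direct implementation (a sketch; the function name rnorm.bm is just for illustration):

```r
# Box-Muller: turn pairs of uniforms into pairs of independent normals
rnorm.bm <- function(n) {
  k <- ceiling(n/2)
  u1 <- runif(k); u2 <- runif(k)
  r <- sqrt(-2*log(u1))
  c(r*cos(2*pi*u2), r*sin(2*pi*u2))[1:n]
}
x <- rnorm.bm(1e4)
hist(x, 50, freq=FALSE, main="")
curve(dnorm(x), -4, 4, add=TRUE, lwd=2, col="blue")
```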
The problem with this algorithm is that it requires the computation of the sine and cosine functions. Here is a similar but much faster algorithm:
Repeat: generate \(U_1, U_2\) iid \(U[-1,1]\) until \(S = U_1^2 + U_2^2 < 1\). Then set

\[X = U_1\sqrt{-2\log(S)/S}, \quad Y = U_2\sqrt{-2\log(S)/S}\]

then \(X\) and \(Y\) are iid standard normal. (This is called the polar method.)
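A sketch of an implementation (the name rnorm.polar is again just for illustration); since the probability that \((U_1, U_2)\) lands inside the unit disk is \(\pi/4\), on average only \(4/\pi \approx 1.27\) attempts are needed per accepted pair:

```r
# Marsaglia's polar method: accept pairs inside the unit disk, no sin/cos needed
rnorm.polar <- function() {
  repeat {
    u1 <- runif(1, -1, 1); u2 <- runif(1, -1, 1)
    s <- u1^2 + u2^2
    if (s > 0 && s < 1) break
  }
  fac <- sqrt(-2*log(s)/s)
  c(u1*fac, u2*fac)  # two iid standard normals
}
x <- as.vector(replicate(5000, rnorm.polar()))
hist(x, 50, freq=FALSE, main="")
curve(dnorm(x), -4, 4, add=TRUE, lwd=2, col="blue")
```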