Generating Random Variables

General Methods

Random Numbers

Everything starts with generating X1, X2, .. iid U[0,1]. These are simply called random numbers. There are several ways to get these:

  • random number tables

  • numbers taken from things like the exact (computer) time

  • quantum random number generators

The R package random has the routine randomNumbers, which gets random numbers from a web site that generates them based on (truly random) atmospheric phenomena.

require(random)
randomNumbers(20, 0, 100)
##      V1 V2 V3 V4 V5
## [1,] 83 32 59 51 36
## [2,]  7 66 96 53 26
## [3,] 83 80 88 79 26
## [4,]  2  4 73 47 80

Pseudo-Random Numbers

These are numbers that look random, smell random …

Of course a computer cannot do anything truly random, so all we can do is generate X1, X2, .. that appear to be iid U[0,1], so-called pseudo-random numbers.

Luckily, some people are really good at that!

Example

A linear congruential generator works as follows:

start with a seed \(X_0\) and calculate

\[X_{n+1}=(aX_n+c) \text{ mod } m\]

where a, c and m are chosen such that

  • c and m are relatively prime

  • a-1 is divisible by all prime factors of m

  • a-1 is a multiple of 4 if m is a multiple of 4

A well-known algorithm in Numerical Recipes in C uses a=1664525, c=1013904223 and \(m=2^{32}\).
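Here is a minimal implementation of this generator in R (the name lcg is ours). Note that R's doubles represent integers exactly up to \(2^{53}\), and here \(aX_n+c<2^{53}\), so the modular arithmetic stays exact:

lcg <- function(n, seed=1) {
  a <- 1664525; cc <- 1013904223; m <- 2^32
  x <- rep(0, n)
  for(i in 1:n) {
    seed <- (a*seed + cc) %% m # the LCG recursion
    x[i] <- seed/m # scale to [0, 1)
  }
  x
}
round(lcg(5), 4)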

Some computer programs (like R) have them already built in; most general computer languages (like C) do not. There are many excellent ones available on the Internet.

Some issues to be aware of:

  • All pseudo-random number generators are cyclic, that is, there is an N such that \(X_1=X_N\), \(X_2=X_{N+1}\) and so on. For any decent method N is in the billions. For the one above \(N=m=2^{32}\).

  • All pseudo-random number generators have a SEED, usually an integer. If you want to generate the same sequence twice you can do so by specifying this SEED.

In R, if you want to generate the same sequence again, use the command set.seed(SEED), where SEED is an integer.

sample(1:3, size=10, replace=TRUE)
##  [1] 3 1 1 2 1 2 1 3 1 2
sample(1:3, size=10, replace=TRUE)
##  [1] 2 1 3 3 2 1 2 1 2 3
set.seed(1111)
sample(1:3, size=10, replace=TRUE)
##  [1] 2 2 3 1 3 3 3 1 2 1
set.seed(1111)
sample(1:3, size=10, replace=TRUE)
##  [1] 2 2 3 1 3 3 3 1 2 1

This can be very useful when writing a simulation and getting an error every 10000 or so runs!

  • There are often subtle differences between compilers so don’t expect the same program to generate the same sequence on different computers.

The Inverse Transform Method

Say we want to generate a random variable X from a distribution with \[ f(x_j)=P(X=x_j)=p_j \] j = 1, 2, 3, ..

Here is a simple algorithm to do this:

Step 1: generate \(U\sim U[0,1]\), set j = 1, p = p1
Step 2: if U < p, set X = xj, done
Step 3: set j = j+1, p = p+pj, goto Step 2

Why this works: say, for instance, \(p_1 < U < p_1 + p_2\); then the algorithm fails the test for j = 1, passes it for j = 2, and so we set X = x2.

The routine gendisc1 runs this algorithm:

gendisc1 <- function (n, x, p) {
  y <- rep(0, n)
  m <- length(x)
  p <- p/sum(p) # make sure the probabilities add up to 1
  cdf <- cumsum(p)
  for(i in 1:n) {
    U <- runif(1)
    for(j in 1:m) { # find the first j with U < cdf[j]
        if(U < cdf[j]) {
            y[i] <- x[j]
            break
        }
    }
  }
  y
}
table(gendisc1(n=1000, x=letters[1:5], p=1:5))/1000
## 
##     a     b     c     d     e 
## 0.063 0.134 0.189 0.265 0.349

How this works:

generate data from the following rv:

P(X = 0) = 0.1, P(X = 1) = 0.3, P(X = 2) = 0.5 and P(X = 4) = 0.1

Now we generate \(U\sim U[0,1]\) and we get:

Case 1: U = 0.0512, then U < p1 = 0.1, and we set X = x1 = 0

Case 2: U = 0.3502, then p1 < U < p1+p2 = 0.4, and we set X = x2 = 1

Case 3: U = 0.9542, then U > p1+p2+p3 = 0.9, and we set X = x4 = 4
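In R the break points are just the cumulative sums:

p <- c(0.1, 0.3, 0.5, 0.1)
cumsum(p)
## [1] 0.1 0.4 0.9 1.0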


Notice that the values of X (x1,..) are almost irrelevant, in fact we can just generate data with values 1, 2, .., and in the end change the “labels”: “1”→x1, “2”→x2 etc.
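For instance, here is this relabeling for the example above, reusing gendisc1:

idx <- gendisc1(n=10, x=1:4, p=c(0.1, 0.3, 0.5, 0.1)) # indices 1-4
c(0, 1, 2, 4)[idx] # relabel 1->0, 2->1, 3->2, 4->4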

Theorem the above algorithm generates the required rv.
proof
Remember that \(U\sim U[0,1]\), so P(U < x) = x if 0 < x < 1. Therefore

\[P(X = x_1) = P(U < p_1) = p_1\]

and for k > 1

\[
P(X = x_k) = P(p_1+..+p_{k-1} < U < p_1+..+p_k) =
(p_1+..+p_k) - (p_1+..+p_{k-1}) = p_k
\]

Example

Generate 100000 observations from \(X\sim Bin(10, 0.65)\)

Of course there is a routine in R built in to do this:

rbinom(100000, 10, 0.65)

You can check that the correct data was generated by comparing the observed counts to the expected ones:

tmp <- proc.time()
x <- table(rbinom(100000, 10, 0.65))
proc.time() - tmp
##    user  system elapsed 
##    0.02    0.00    0.01
y <- round(100000*dbinom(0:10, 10, 0.65))
rbind(x, y)
##   0  1   2    3    4     5     6     7     8    9   10
## x 2 51 461 2196 7070 15130 23679 25173 17619 7253 1366
## y 3 51 428 2120 6891 15357 23767 25222 17565 7249 1346

Or you can use our routine above:

tmp <- proc.time() 
x <- table(gendisc1(100000, 0:10, dbinom(0:10, 10, 0.65)))
proc.time() - tmp
##    user  system elapsed 
##    0.25    0.03    0.28
rbind(x, y)
##   0  1   2    3    4     5     6     7     8    9   10
## x 2 46 447 2140 6949 15370 23840 25138 17530 7217 1321
## y 3 51 428 2120 6891 15357 23767 25222 17565 7249 1346

So this works but is a bit slow. Can we speed it up? Consider this:

\(\text{dbinom}(0,10,0.65) = 2.76 \times 10^{-5}\), so we almost never choose 0, but we check it in the computer program every single time.

\(\text{dbinom}(7,10,0.65) = 0.2522\) is the largest of the probabilities, so about 25% of the time U falls into the corresponding interval, but in order to get there our program first needs to check 0, 1, 2, 3, 4, 5 and 6.

This gives us an idea for a slight improvement:

order p and x by decreasing size of p. Here is the table of x and p:

      x     p
 1    0 0.000
 2    1 0.001
 3    2 0.004
 4    3 0.021
 5    4 0.069
 6    5 0.154
 7    6 0.238
 8    7 0.252
 9    8 0.176
10    9 0.072
11   10 0.013

Here is the same table, ordered by the p’s:

      x     p
 1    7 0.252
 2    6 0.238
 3    8 0.176
 4    5 0.154
 5    9 0.072
 6    4 0.069
 7    3 0.021
 8   10 0.013
 9    2 0.004
10    1 0.001
11    0 0.000

so now if U < 0.252 we set x = 7, if 0.252 < U < 0.252+0.238 = 0.49 we set x = 6, and so on.

The routine gendisc2 does this.

gendisc2 <- function (n, x, p) {
  y <- rep(0, n)
  m <- length(x)
  x <- x[order(p, decreasing = TRUE)] # sort the values by
  p <- sort(p, decreasing = TRUE)     # decreasing probability
  print(rbind(x, p))
  p <- cumsum(p)
  for (i in 1:n) {
      U <- runif(1)
      for (j in 1:m) {
          if (U < p[j]) {
              y[i] <- x[j]
              break
          }
      }
  }
  y
}

Check

tmp <- proc.time() 
x <- table(gendisc2(100000, 0:10, 
                    dbinom(0:10, 10, 0.65)))
##     [,1]   [,2]   [,3]   [,4]    [,5]    [,6]   [,7]     [,8]     [,9]
## x 7.0000 6.0000 8.0000 5.0000 9.00000 4.00000 3.0000 10.00000 2.000000
## p 0.2522 0.2377 0.1757 0.1536 0.07249 0.06891 0.0212  0.01346 0.004281
##       [,10]     [,11]
## x 1.0000000 0.000e+00
## p 0.0005123 2.759e-05
proc.time() - tmp
##    user  system elapsed 
##    0.29    0.00    0.28
rbind(x, y)
##   0  1   2    3    4     5     6     7     8    9   10
## x 3 51 417 2120 6904 15334 23775 25327 17487 7180 1402
## y 3 51 428 2120 6891 15357 23767 25222 17565 7249 1346

rbinom is still much faster. Another way in R to speed things up is to “vectorize” the program.
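For example, here is a vectorized sketch of the inverse transform method (the name gendisc.vec is ours): findInterval counts for each U how many cumulative sums lie below it, in a single call, so both loops disappear:

gendisc.vec <- function(n, x, p) {
  p <- p/sum(p)
  # findInterval(U, cdf) is the number of cdf values <= U, so
  # adding 1 gives the index the inverse transform method picks
  x[findInterval(runif(n), cumsum(p)) + 1]
}
table(gendisc.vec(1e5, 0:10, dbinom(0:10, 10, 0.65)))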

Example

Generate observations from \(X\sim G(0.01)\).

Here we have the additional problem that the vector p is infinite.

Computers cannot handle infinitely large objects, so we need to “truncate” p. Here is one way to do this:

  1. Find x1 and x2 such that P(x1 < X < x2) = 0.999999
qgeom(c(0.0000005, 0.9999995), 0.01) + 1
## [1]    1 1444

If U < 0.0000005 + P(X = 1) = 0.0100005, set x = 1

If U > 0.9999995, set x = 1444

otherwise do as above.
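Here is a sketch putting this together (the name gengeom.trunc is ours; recall that R's dgeom counts failures, so the support is shifted by 1):

gengeom.trunc <- function(n) {
  x <- 1:1444 # truncation point found above
  cdf <- cumsum(dgeom(x-1, 0.01)) # P(X <= x) on support 1, 2, ..
  y <- rep(0, n)
  for(i in 1:n) {
    U <- runif(1)
    if(U > cdf[1444]) y[i] <- 1444 # U beyond the cutoff
    else y[i] <- x[which(U < cdf)[1]]
  }
  y
}
mean(gengeom.trunc(1e4)) # should be close to 1/0.01 = 100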

The Accept-Reject Algorithm

Suppose we have a method for generating a random variable Y having pmf {qj, j = 1, 2, ..}, and we want to generate a r.v. from a distribution with pmf {pj, j = 1, 2, ..}. We can do this by first simulating a r.v. Y from {qj} and then “accepting” this simulated value with a probability proportional to pY/qY.

Specifically, let c be a constant such that \(p_j/q_j \le c\) for all j such that pj > 0. Then

Step 1: generate Y from {qj}

Step 2: generate \(U\sim U[0,1]\)

Step 3: If U < pY/(cqY), set X = Y and stop. Otherwise go back to 1

Notation: Y is often called an auxiliary variable. In a somewhat different context later it will also be called a proposal, and sometimes either of these terms is used.

Example

Say we want to generate a r.v X with values x in {1, 3, 5, 7} and probabilities p = (0.1, 0.5, 0.1, 0.3).

First we need a r.v. Y which is easy to generate and takes 4 values (not necessarily the same as in X though!). We can use for this the r.v. that chooses a number from 1 to 4 at random, using the sample(1:4, 1) command. This has q = (1/4, 1/4, 1/4, 1/4), so

p/q = (0.4, 2, 0.4, 1.2)

and if we set c = 2 we have

pj/qj \(\le\) c for all j.

With this, the accept-reject algorithm for this problem is implemented in gendisc3:

gendisc3 <- function (n, x, p) {
  z <- rep(0, n)
  m <- length(x)
  q <- rep(1/m, m) # Uniform density
  f.c <- function(x) (p[x]/(2*q[x])) # p/(cq)
  for (i in 1:n) {
      for (j in 1:100) {
          Y <- sample(1:m, 1) # proposal rv
          if (runif(1) < f.c(Y)) {
              z[i] <- x[Y]
              break
          }
      }
  }
  z
}
tmp <- rbind(
  "True" = c(0.1, 0.5, 0.1, 0.3),
  "Simulation"=table(gendisc3(n=10000, 
      x=c(1, 3, 5, 7), 
      p=c(0.1, 0.5, 0.1, 0.3)))/10000)
kable.nice(tmp)
                 1      3      5      7
True        0.1000 0.5000 0.1000 0.3000
Simulation  0.0996 0.4955 0.0997 0.3052

Why this works: if our “candidate” r.v Y picks a value y which has a high probability in p, it will often be accepted and we get many of these. If on the other hand Y picks a value which has a low probability in p, it will rarely be accepted and we get only a few of those. The method is illustrated in accrej.ill.

accrej.ill <- function (n) {
  x <- c(1, 3, 5, 7)
  p <- c(0.1, 0.5, 0.1, 0.3)
  q <- rep(1/4, 4)
  X <- rep(0, n)
  plot(c(0, 8), c(0, 1), type = "n", 
       xlab = "x", ylab = "P/(cq)")
  segments(x - 0.2, 2 * p, x + 0.2, 2 * p, lwd = 3)
  for (i in 1:n) {
      for (j in 1:100) {
          Y <- sample(1:4, 1)
          U <- runif(1)
          if (U < p[Y]/(2 * q[Y])) {
              X[i] <- x[Y]
              points(x[Y], U, pch = "x", col="blue")
              break
          }
          else {
              points(x[Y], U, pch = "o", col="red")
          }
      }
  }
  X
}
accrej.ill(20)

##  [1] 7 7 3 3 3 3 3 7 3 5 1 7 3 3 1 1 3 3 3 1

Theorem

The accept-reject algorithm generates a r.v. X such that

P(X = xj) = pj

In addition, the number of iterations of the algorithm needed until X is found is a geometric r.v. with mean c.

proof

\[ \begin{aligned} &P(Y=j, Y \text{ is accepted}) = \\ &P(Y=j)P(Y \text{ is accepted}|Y=j) = \\ &q_j\frac{p_j}{cq_j} = \frac{p_j}{c}\\ &P(Y \text{ is accepted}) = \\ &\sum_{j=1}^\infty P(Y=j, Y \text{ is accepted}) = \\ &\sum_{j=1}^\infty \frac{p_j}{c} =\frac1{c} \end{aligned} \]

Now each iteration is a Bernoulli trial with success probability 1/c, and successive trials are independent. Therefore the number of trials needed until the first success is a geometric r.v. with mean c. Also

\[ \begin{aligned} &P(X=x_j) = \\ &\sum_n P(j \text{ is accepted at the }n^{th} \text{ trial}) = \\ &\sum_n (1-1/c)^{n-1}\frac{p_j}{c} = p_j\\ \end{aligned} \]

Example

say we want to generate a rv X with \(P(X = k) \propto a_k\), k = 1, 2, .., N, for some fixed and known N > 1. (Of course the sample command with the argument prob = .. will do just that, but let’s write our own routine).

Note that here we know the pj’s only up to a constant.

For the rv Y we will simply use U[1, 2, .., N], that is

\[P(Y = k) = 1/N\]

Now \[ \begin{aligned} &P(X=j)\propto a_{j}, \; j=1,..,N; \; a_{j}\geq 0 \\ &p_{j}=\frac{a_{j}}{\sum a_{i}} \\ &q_j = P(Y=j)=\frac{1}{N}, \; j=1,..,N \\ &\frac{p_{j}}{q_{j}}=\frac{Na_{j}}{\sum a_{i}} \\ &c = \max \left\{\frac{p_{j}}{q_{j}}\right\}=\frac{N\max \{a_{j}\}}{\sum a_{i}} \\ &\frac{p_{j}}{cq_{j}}=\left( \frac{Na_{j}}{\sum a_{i}}\right) \Big/ \left( \frac{ N\max \{a_{j}\}}{\sum a_{i}}\right) = \frac{a_{j}}{\max \{a_{j}\}} \end{aligned} \] so we get that we don’t actually need to find \(\sum a_i\)! In general we don’t need to have the probabilities “normalized”, that is, sum to 1.

Now for the routine:

N <- 6
a <- c(1, 3, 1, 6, 10, 4)
n <- 10000
x <- rep(0, n)
counter <- 0
for (i in 1:n) {
  repeat {
    counter <- counter+1
    y <- sample( 1:N, 1)
    if (runif(1) <= a[y]/max(a)) {
       x[i] <- y
       break
    }
  }  
}
out <- data.frame( 
  "Simulation"=c(table(x)),
  "Exact"=a/sum(a)*n)/n
rownames(out) <- 1:N
kable.nice(out)
   Simulation  Exact
1      0.0400   0.04
2      0.1177   0.12
3      0.0364   0.04
4      0.2411   0.24
5      0.3999   0.40
6      0.1649   0.16

Also, the theorem says that a new X is generated on average every c tries, and the simulation confirms this:

c(counter/n, N*max(a)/sum(a))
## [1] 2.391 2.400

Example

say we want to generate r.v.’s X such that

\(P(X = k) = (\frac12)^k, k=1,2, ...\)

We have the same issue as before, namely that we need a cut-off, an x so large that \(P(X>x)<0.0001\) (say):

\[ \begin{aligned} 0.0001 &= P(X>x) = \sum_{i=x+1}^\infty \left(\frac12\right)^i =\\ &1-\sum_{i=1}^{x} \left(\frac12\right)^i = 1-\left(\sum_{i=0}^{x} \left(\frac12\right)^i-1\right)=\\ &2- \frac{1-(\frac12)^{x+1}}{1-\frac12}= 2-\left(2-2\left(\frac12\right)^{x+1}\right) =\left(\frac12\right)^{x} \\ x&=\frac{\log 0.0001}{\log \frac12} = 13.3 \end{aligned} \]

and so we can generate this with

N <- 14
a <- (1/2)^(1:N)
n <- 10000
x <- rep(0, n)
counter <- 0
for (i in 1:n) {
  repeat {
    counter <- counter+1
    y <- sample(1:N, 1)
    if (runif(1) <= a[y]/max(a)) {
       x[i] <- y
       break
    }
  }  
}
tbl.x <- c(table(x))/n
out <- data.frame( 
  "Simulation"=tbl.x,
  "Exact"=(a/sum(a))[1:length(tbl.x)])
rownames(out) <- (1:N)[1:length(tbl.x)]
kable.nice(out)
   Simulation  Exact
1      0.4982 0.5000
2      0.2544 0.2500
3      0.1243 0.1250
4      0.0625 0.0625
5      0.0296 0.0313
6      0.0150 0.0156
7      0.0069 0.0078
8      0.0042 0.0039
9      0.0030 0.0020
10     0.0014 0.0010
11     0.0002 0.0005
12     0.0003 0.0002

Notice that this is very inefficient, though. More than half the time Y > 6, but the values above 6 together carry a total probability of only

1-sum((a/sum(a))[1:6])
## [1] 0.01556

and the average number of iterations needed to generate one rv is fairly large:

counter/n
## [1] 6.932

so we can speed things up a lot. For example let’s use sample(1:N, 1, prob=N:1). Now

\[ \begin{aligned} &\sum_1^N (N+1-j) = \sum_1^N j = \frac{N(N+1)}{2}, \text{ so } q_j=\frac{N+1-j}{N(N+1)/2}\\ &\frac{p_j}{q_j} = \frac{0.5^j}{(N+1-j)/(N(N+1)/2)} = \frac{N(N+1)\,0.5^{j+1}}{N+1-j}\\ &c=\max\{p_j/q_j\}=p_1/q_1=\frac{N+1}{4}\\ &\frac{p_j}{cq_j} = \frac{N\,0.5^{j-1}}{N+1-j} \end{aligned} \]

N <- 14
f.cg <- function(x) N*0.5^(x-1)/(N+1-x)
a <- (1/2)^(1:N)
n <- 10000
x <- rep(0, n)
counter <- 0
for (i in 1:n) {
  repeat {
    counter <- counter+1
    Y <- sample(1:N, 1, prob=N:1)
    if (runif(1) <= f.cg(Y)) {
       x[i] <- Y
       break
    }
  }  
}
tbl.x <- c(table(x))/n
out <- data.frame( 
  "Simulation"=tbl.x,
  "Exact"=(a/sum(a))[1:length(tbl.x)])
rownames(out) <- (1:N)[1:length(tbl.x)]
kable.nice(out)
   Simulation  Exact
1      0.4981 0.5000
2      0.2546 0.2500
3      0.1190 0.1250
4      0.0681 0.0625
5      0.0272 0.0313
6      0.0157 0.0156
7      0.0084 0.0078
8      0.0044 0.0039
9      0.0025 0.0020
10     0.0008 0.0010
11     0.0003 0.0005
12     0.0005 0.0002
13     0.0003 0.0001
14     0.0001 0.0001

and now the average is

counter/n
## [1] 3.746

Example

say we want to generate r.v.’s X such that

\(P(X = k) = 6/(\pi^2 k^2), k=1,2, ...\)

Recall

\(\sum_{k=1}^\infty \frac1{k^2} = \frac{\pi^2}6\)
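which we can quickly check numerically (a partial sum, so the agreement is only approximate):

c(sum(1/(1:1e6)^2), pi^2/6)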

We need a r.v Y on {1,2,..} which we can generate. There are two discrete rv’s we know which have infinitely many values, the geometric and the Poisson. But there is a problem with those two:

  • Poisson:

\[\frac{p_j}{q_j}=\frac{6j^2/\pi^2}{e^{-\lambda}\lambda^j/j!}\rightarrow \infty\]

  • Geometric

\[\frac{p_j}{q_j}=\frac{6j^2/\pi^2}{pq^j}\rightarrow \infty\] so \(c=\infty\) in either case.

We do know, though, a continuous r.v. that goes to 0 very slowly, namely the Cauchy. Actually, its \(f(x)=1/(\pi (1+x^2))\) already has the right “size” for our problem. We can do this, then by “discretizing” the Cauchy: Let \(Z\sim \text{Cauchy}\) and define

\[Y = \sum_{i=1}^\infty i\left(I_{[-i,-i+1]}(Z)+I_{[i-1,i]}(Z)\right)\]

So

if |Z|<1 → Y=1

if 1<|Z|<2 → Y=2

and so on. Now

\[ \begin{aligned} &q_i = P(Y=i) =\\ &P(-i<Z<-i+1 \text{ or } i-1<Z<i) = \\ &2P(i-1<Z<i) = \\ &2\int_{i-1}^i \frac1{\pi(1+z^2)}dz=\\ &\frac2\pi \left(\arctan(i)-\arctan(i-1) \right) \end{aligned} \]

Now we need max{pj/qj}. Doing this via calculus would be quite difficult because we would end up with a nonlinear equation which we would need to solve numerically. A different approach is to just plot j vs. pj/qj and see what it looks like:

gendisc4 <- function (n, findc = FALSE) {
  if (findc) {
      i <- 1:n
      p <- 6/pi^2/i^2
      q <- 2/pi * (atan(i) - atan(i - 1))
      plot(i, p/q)
      return(max(p/q))
  }
  x <- rep(0, n)
  const <- 3/pi/1.216 # p/(cq) = const/(y^2*(atan(y)-atan(y-1)))
  for (i in 1:n) {
      for (j in 1:100) {
          z <- rcauchy(1)
          y <- floor(abs(z)) + 1
          u <- runif(1)
          if (u <= const/(y^2*(atan(y)-atan(y-1)))) {
              x[i] <- y
              break
          }
      }
  }
  table(x)/n
}
gendisc4(100, TRUE)

## [1] 1.216

We see that pj/qj appears to approach a limit a little below 1 as j goes to infinity, but it has a value of 1.216 at j=1, so we can reasonably guess that c = 1.216 should work for us. (Of course we can verify all that by taking derivatives and using de L’Hospital’s rule).

With this we can generate these r.v.’s, again using gendisc4:

tmp <- gendisc4(10000)
head(tmp)
## x
##      1      2      3      4      5      6 
## 0.6091 0.1531 0.0675 0.0377 0.0238 0.0175
tail(tmp)
## x
##  2148  2299  2708  3054  6774 13860 
## 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04

Alternatively we could of course have used the inverse transform method, but notice that this r.v. occasionally takes on very large values, and so the inverse transform method would be quite slow.

The theorem above says that the number of trials until the first success, which is the number of “tries” until we find an acceptable candidate, is a geometric r.v. with mean c. Obviously the smaller c is, the faster we find a Y that is accepted, and the faster we generate our observations.

Example

say we want to generate data from a discrete rv X with \(P(X = x) = K\exp (- x^{3/2} )\), x = 1, 2, …

First we need the constant K. To find it we will need to use R. Note that \(\exp (- x^{3/2} )\) goes to 0 very fast, so we don’t need very many terms:

x <- 1:25
K <- 1/sum(exp(-x^(3/2)))
K
## [1] 2.31

Let’s assume we know how to generate data from a Geom(0.5). Then

\[ \begin{aligned} &\frac{\exp(-x^{3/2})}{(1/2)^{x}} =\\ &\frac{\exp(-x^{3/2})}{\exp (-x\log (2))}=\\ &\exp \left( x\log (2)-x^{3/2} \right) \\ & \frac{d}{dx} \left\{ \exp \left( x\log (2)-x^{3/2} \right) \right\}= \\ &\exp \left( x\log (2)-x^{3/2} \right) \left( \log(2) - \frac32 \sqrt{x}\right)<0 \end{aligned} \]

so max{ pj/qj } is attained at j = 1. As before, the normalizing constant K cancels out of \(p_j/(cq_j)\), so we can work with the unnormalized probabilities \(a_x=\exp(-x^{3/2})\), for which \(\max\{a_j/q_j\} = a_1/q_1 = e^{-1}/0.5 = 0.7357589\).

So

gendisc5 <- function(n=1e4) {
  K1 <- 1/(exp(-1)/0.5) # 1/max{a/q}; the constant K cancels
  f.c <- function(x) 
    K1*exp(-x^(3/2))/0.5^x
  x <- rep(0, n)
  for(i in 1:n) {
    repeat {
      y <- rgeom(1, 0.5)+1
      if(runif(1)<f.c(y)) {
        x[i] <- y
        break
      }
    }
  }
  x
}
x <- table(gendisc5())/1e4
p <- K*exp(-(seq_along(x))^(3/2))
tmp <- data.frame("Theory"=round(p, 4),
                  "Simulation"=round(as.numeric(x), 4))
kable.nice(tmp, do.row.names = FALSE)
Theory Simulation
0.8499 0.8456
0.1365 0.1417
0.0128 0.0121
0.0008 0.0006

Continuous Distributions

This is very similar to (actually the same as) the method for discrete r.v.’s. Assume we want to generate a r.v. X with density f, and we have a way to generate a r.v. Y with density g. Let c be a constant such that

f(x)/g(x) \(\le\) c for all x.

Then the accept-reject algorithm is as follows:

Step 1: generate Y from pdf g
Step 2: generate \(U\sim U[0,1]\)
Step 3: If U < f(Y)/(cg(Y)), set X = Y and stop. Otherwise go back to 1

We have the same theorem as for the discrete case:

Theorem

The accept-reject algorithm generates a r.v. X with density f.

In addition, the number of iterations of the algorithm needed until X is found is a geometric r.v. with mean c.

proof same as above.

Note: we do have to be careful here because of course g(x) can be 0.

This is ok as long as f(x) is 0 as well, but not if f(x) > 0.

Basically we need Y to live on the same set as X (we say X and Y have the same support).

Example

generate a r.v. X with f(x) = 6x(1-x), 0 < x < 1.

Here we can use \(Y\sim U[0,1]\), with g(x) = 1.

Let’s find max{f(x)/g(x) | 0 < x < 1}. Taking derivatives we find

d/dx {6x(1-x)} = 6 - 12x = 0, so x = 1/2.

This is a maximum (the second derivative is -12 < 0), so we have

\[ c=\max \left\{ \frac{f(x)}{g(x)};0<x<1\right\} =f(\frac{1}{2})=\frac{3}{2} \\ \frac{f(x)}{cg(x)} = \frac{6x(1-x)}{3/2} = 4x(1-x) \]

So here is the routine:

gencont1 <- function(n, findc = FALSE, Show = FALSE) {
  if (findc) {
      x <- seq(0, 1, length = 100)
      plot(x, 6 * x * (1 - x), type = "l")
      return(max(6 * x * (1 - x)))
  }
  counter <- 0
  X <- rep(0, n)
  for (i in 1:n) {
      repeat {
          counter <- counter+1
          Y <- runif(1)
          if (runif(1) <= 4*Y*(1-Y)) {
              X[i] <- Y
              break
          }
      }
  }
  if (Show) {
      hist(X, 50, freq = FALSE, main="")
      x <- seq(0, 1, length=250)
      lines(x, 6*x*(1-x),
            lwd=2, col="blue")
      cat("Average number of tries: ", counter/n)
  }
  X
}
tmp <- gencont1(10000, Show=TRUE)

## Average number of tries:  1.501

Why the accept-reject algorithm works is illustrated in accrej1.ill:

accrej1.ill <- function(n) {
  f <- function(x) 6*x*(1-x)
  g <- function(x) 1
  c <- 3/2
  x <- seq(0, 1, length = 100)
  plot(x, f(x)/(c * g(x)), 
       xlab = "x", ylab = "f/(cg)", type = "l",
        xlim = c(0, 1), ylim = c(0, 1))
  for(i in 1:n) {
    Y <- runif(1)
    U <- runif(1)
    if (U <= f(Y)/(c * g(Y))) {
      points(Y, U, pch = "X", col = 3)
    }
    else {
      points(Y, U, pch = "0", col = 2)
    }
  }  
    
}
accrej1.ill(50)

Example

we want to generate data from \(f(x)=6x^5,0<x<1\). If we use \(Y\sim U[0,1]\) we find

\[c=\max\{\frac{f(x)}{g(x)}\}=6\]

and so

gencont2 <- function(n=1e4, Show = FALSE) {
  counter <- 0
  X <- rep(0, n)
  for (i in 1:n) {
      repeat {
          counter <- counter+1
          Y <- runif(1)
          if (runif(1) <= Y^5) {
              X[i] <- Y
              break
          }
      }
  }
  if (Show) {
      hist(X, 50, freq = FALSE, main="")
      x <- seq(0, 1, length=250)
      lines(x, 6*x^5,
            lwd=2, col="blue")
      cat("Average number of tries: ", counter/n)
  }
  X
}
tmp <- gencont2(Show=TRUE)

## Average number of tries:  6.034

As we see this works but is quite slow: we need on average six tries to find an x. We can try to speed things up by using a g that “matches” f a bit better. Let’s consider the following g: \(g(x)=1/5\) if \(0<x<0.5\) and \(g(x)=9/5\) if \(0.5<x<1\). Here is how this looks:

f <- function(x) 6*x^5
g <- function(x) {
  ifelse(x<0.5, 1/5, 9/5)
}
integrate(g, 0, 1)$value
## [1] 1
curve(f(x), 0, 1, ylim=c(0, 2))
curve(g(x), 0, 1, add=TRUE)

Y’s with this density can be generated as follows:

rstep <- function(n) {
  # pick an interval: sample() normalizes the weights 1/5 and 9/5,
  # so the two halves are chosen with probabilities 1/10 and 9/10
  z <- sample(0:1, size=n, replace = TRUE,
          prob=c(1/5, 9/5))
  # then draw a uniform on the chosen half
  y <- ifelse(z==0,
         runif(n, 0, 0.5),
         runif(n, 0.5, 1))
  y
}
hist(rstep(1e4), 50 ,freq=FALSE,main="")
curve(g(x), 0, 1, lwd=2, col="blue",add=TRUE)

Now

curve(f(x)/g(x), 0, 1)

shows that the maximum is at x=1, or

f(1)/g(1)
## [1] 3.333

so we find

gencont2alt <- function(n=1e4, Show=FALSE) {
  f.cg <- function(x) f(x)/(3.334*g(x))
  X <- rep(0, n)
  counter <- 0
  for (i in 1:n) {
      repeat {
          counter <- counter+1
          Y <- rstep(1)
          if(runif(1) <= f.cg(Y)) {
              X[i] <- Y
              break
          }
      }
  }
  if (Show) {
      hist(X, 50, freq = FALSE, main="")
      x <- seq(0, 1, length=250)
      lines(x, 6*x^5,
            lwd=2, col="blue")
      cat("Average number of tries: ", counter/n)
  }
  X
}
tmp <- gencont2alt(10000, Show=TRUE)

## Average number of tries:  3.364

and so we have cut the number of tries in half!

There is nothing special about this g, we could use any step function with other step sizes and improve this even further.

Example

We want to generate a r.v. \(X\sim \text{Gamma}(3/2, 1)\).

Now f(x) > 0 for x > 0, so we need a Y that “lives” on \((0, \infty)\) and that we already know how to generate. Let’s say we know how to generate \(Y\sim \text{Exp} (1/\lambda)\).

What value of \(\lambda\) should we use? Note that

EX = 3/2

EY = \(\lambda\)

so maybe \(\lambda=3/2\) is a good idea.

With this we need to find c. Again we will try to find max{f(x)/g(x)}. We have (using \(\Gamma(3/2)=\sqrt{\pi}/2\))

\[ \frac{f(x)}{g(x)} = \frac{\sqrt{x}e^{-x}/\Gamma(3/2)}{\frac{2}{3}e^{-2x/3}} = \frac{3}{\sqrt{\pi}}\sqrt{x}\,e^{-x/3} \]

Setting the derivative equal to 0 shows that the maximum is at x = 3/2, and so

\[ c = \frac{3}{\sqrt{\pi}}\sqrt{\frac{3}{2}}\,e^{-1/2} = \frac{3^{3/2}}{\sqrt{2\pi e}} \approx 1.2573, \quad \frac{f(x)}{cg(x)} = 1.34617\sqrt{x}\,e^{-x/3} \]

The routine is implemented in gencont2:

gencont2 <- function (n, findc = F, Show = FALSE) {
  if (findc) {
    x <- seq(0, 10, length = 100)
    plot(x, 3/sqrt(pi)*sqrt(x)*exp(-x/3), type = "l")
    return(max(3/sqrt(pi) * sqrt(x) * exp(-x/3)))
  }
  X <- rep(0, n)
  for (i in 1:n) {
    for (j in 1:100) {
      Y <- rexp(1, 2/3)
      if(runif(1)<=1.34617*sqrt(Y)*exp(-Y/3)) {
          X[i] <- Y
          break
      }
   }
  }
  if (Show) {
        hist(X, breaks=100, freq=FALSE, main="")
        x <- seq(0, 10, length=250)
        lines(x, dgamma(x, 3/2, 1),
            lwd=2, col="blue")
    }
    X
}
tmp <- gencont2(10000, Show = TRUE)

Why the accept-reject algorithm works for this example is illustrated in accrej2.ill:

accrej2.ill <- function (n) {
  f <- function(x) dgamma(x, 3/2, 1)
  g <- function(x) 2/3*exp(-2/3*x)
  c <- 3^(3/2)/sqrt(2*pi*exp(1))
  x <- seq(0, 10, length = 100)
  par(mfrow = c(1, 2))
  plot(x, f(x), 
       xlab = "x", ylab = "", type = "l", 
       ylim = c(0, 2/3), col = 2, lwd = 2)
  lines(x, g(x), col = 3, lwd = 2)
  legend(3, 2/3, c("f", "g"), lty=c(1, 1), col=c(2, 3))
  plot(x, f(x)/(c * g(x)), 
       xlab = "x", ylab = "f/(cg)", type = "l")
  for (i in 1:n) {
      Y <- rexp(1, 2/3)
      U <- runif(1)
      if (U <= f(Y)/(c * g(Y))) {
          points(Y, U, pch = "X", col = 3)
      }
      else {
          points(Y, U, pch = "0", col = 2)
      }
  }

}
accrej2.ill(50)

Above we picked \(\lambda = 3/2\) because this matched up the means of X and Y, a reasonable choice. But is it an optimal choice? Let’s see:

Optimal here means a choice of \(\lambda\) that minimizes c. Now if we repeat the above calculation with \(\lambda\) instead of 3/2, the maximum of f(x)/g(x) is at \(x=\frac{\lambda}{2(\lambda-1)}\) (for \(\lambda>1\)), so that

\[ c(\lambda) = \frac{2\lambda}{\sqrt{\pi}}\sqrt{\frac{\lambda}{2(\lambda-1)}}\,e^{-1/2} \]

Setting \(c'(\lambda)=0\) yields \(\lambda=3/2\). This is indeed a minimum, and so our choice was in fact optimal.
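We can also verify this numerically (a quick sketch of our own):

c.lambda <- function(lambda)
  2*lambda/sqrt(pi)*sqrt(lambda/(2*(lambda-1)))*exp(-1/2)
curve(c.lambda, 1.01, 4, xlab="lambda", ylab="c") # minimum visible at 1.5
optimize(c.lambda, c(1.01, 4))$minimum # about 1.5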

Generating Random Vectors

As one might expect, generating data from random vectors is generally harder than for one-dimensional random variables. To begin with, though, at least for the case of finite rv’s there is nothing new:

say we have X = (X1, .., Xn) where Xj takes values

\[ x_{j1}\quad ... \quad x_{j n_{j}} \]

Then all we need to do is generate data from the univariate random variable X’ whose values are the \(n_1 n_2 \cdots n_n\) combinations

\[ (x_{1 i_1}, .., x_{n i_n}) \]

with their respective probabilities.

Example

generate data from the random vector (X,Y) with

        y=0  y=1
x=0     0.1  0.5
x=1     0.2  0.2

Instead generate data from X’ with values 1-4 and probabilities (0.1, 0.5, 0.2, 0.2). Then

if X’ = 1 set X = 0, Y = 0
if X’ = 2 set X = 0, Y = 1
if X’ = 3 set X = 1, Y = 0
if X’ = 4 set X = 1, Y = 1

n <- 1e6
xprime <- sample(1:4, size=n, 
        replace=TRUE, prob=c(0.1, 0.5, 0.2, 0.2))
round(table(xprime)/n, 4)
## xprime
##      1      2      3      4 
## 0.0996 0.4999 0.2003 0.2002
x <- rep(0, n)
y <- x
x[xprime == 3 | xprime == 4] <- 1
y[xprime == 2 | xprime == 4] <- 1
round(table(x,y)/n, 4)
##    y
## x        0      1
##   0 0.0996 0.4999
##   1 0.2003 0.2002

Example

generate data from the random vector (X,Y,Z) with

\(f(x, y, z) = (x+y+z)/162, x, y, z \in \{1, 2, 3\}\)

Note that there are 27 different combinations of values of (x, y, z), so we begin by generating data from a random variable X’ with values 1 - 27 and probabilities as above.

gendisc5 <- function(i, j, k, n=10000) {
    xyz <- matrix(0, 27, 3) # all 27 combinations of (x, y, z)
    xyz[, 1] <- rep(1:3, 9)
    xyz[, 2] <- rep(1:3, each = 3)
    xyz[, 3] <- rep(1:3, rep(9, 3))
    p <- apply(xyz, 1, sum)/162
    xprime <- sample(1:27, size=n, 
                 replace=TRUE, prob=p)
    x <- xyz[, 1][xprime]
    y <- xyz[, 2][xprime]
    z <- xyz[, 3][xprime]
    # find the runs where simultaneously x = i, y = j and z = k
    a <- 1:n
    a1 <- a[x == i]
    a2 <- a[y == j]
    a3 <- a[z == k]
    a <- table(c(a1, a2, a3))
    a <- as.numeric(names(a[a == 3])) # run index appears in all three
    c(length(a)/n, (i + j + k)/162)
}
gendisc5(1, 1, 2)
## [1] 0.02430 0.02469

For infinite discrete and for continuous random vectors we still have the accept-reject algorithm:

Example

say we want to generate data from a continuous rv (X, Y, Z) with

\(f(x, y, z) = 4xy; 0 \le x,y,z \le 1\).

Here we can generate data from \((U_1, U_2, U_3)\), which has density \(g \equiv 1\) on \([0,1]^3\). Now

c = max{f(x,y,z)/g(x,y,z)} = max{4xy/1} = 4

One problem in the multivariate case is to make sure that our program generates the correct data. One idea is to check the histograms of the marginals, but of course this is not sufficient proof that there is no mistake. Here the marginals are given by

\[ f_X(x)=2x, \quad f_Y(y)=2y, \quad f_Z(z)=1 \]

The algorithm is in gen_xyz(1).

gen_xyz <- function (which = 1, n = 10000) {
  xyz <- matrix(0, n, 3)
  if(which==1) {
      for (i in 1:n) {
         repeat {
            u <- runif(3)
            if (runif(1) <= u[1] * u[2]) 
               break
          }
          xyz[i, ] <- u
      }
  }
  if(which==2) {
    for (i in 1:n) {
      repeat {
        u <- runif(1)
        if (runif(1) <= u) 
          break
      }
      xyz[i, 1] <- u
      repeat {
        u <- runif(1)
        if(runif(1)<=u) 
          break
      }
      xyz[i, 2] <- u
    }
    xyz[, 3] <- runif(n)
  }
  par(mfrow = c(2, 2))
  hist(xyz[, 1], breaks = 100, 
       xlab = "X", ylab = "", freq = FALSE, 
        main = "")
  abline(0, 2)
  hist(xyz[, 2], breaks = 100, 
       xlab = "Y", ylab = "", freq = FALSE, 
        main = "")
  abline(0, 2)
  hist(xyz[, 3], breaks = 100, 
       xlab = "Z", ylab = "", freq = FALSE, 
        main = "")
  abline(h = 1)
  
}
gen_xyz(1)

In this case of course X,Y and Z are independent because

f(x, y, z) = fX(x) fY(y) fZ(z)

so we can just generate data from the marginals, see gen_xyz(2).

Another idea is to generate the data from conditional distributions:

Example

Say (X, Y) is a discrete rv with joint pmf

\(f(x, y) = (1-p)^2p^x\), \(x, y \in \{0, 1, ..\}\), \(y \le x\), and 0 < p < 1.

Note that

\[ \begin{aligned} &f_Y(y) = \sum_{x=y}^\infty (1-p)^2p^x = (1-p)p^y, \; y=0,1,..\\ &f_{X|Y=y}(x|y) = \frac{f(x,y)}{f_Y(y)} = (1-p)p^{x-y}, \; x=y,y+1,.. \end{aligned} \]

so we have that

Y = G - 1

and

X|Y=y = G + y - 1

where \(G\sim G(1-p)\). The method is implemented in gen_px.

gen_px <- function(p, n = 10000) {
  y <- rgeom(n, 1 - p)
  x <- rgeom(n, 1 - p) + y
  z <- table(x, y)/n
  xvals <- sort(unique(x))
  yvals <- sort(unique(y))
  z1 <- matrix(0, length(xvals), length(yvals))
  dimnames(z1) <- list(xvals, yvals)
  for(i in 1:length(xvals)) 
    for (j in 1:length(yvals)) 
      if(yvals[j]<= xvals[i]) 
        z1[i, j] <- (1-p)^2*p^xvals[i]
  print(round(z1, 3))
  print(round(z, 3))
  z
}
gen_px(p=0.2)
##       0     1     2     3     4 5 6
## 0 0.640 0.000 0.000 0.000 0.000 0 0
## 1 0.128 0.128 0.000 0.000 0.000 0 0
## 2 0.026 0.026 0.026 0.000 0.000 0 0
## 3 0.005 0.005 0.005 0.005 0.000 0 0
## 4 0.001 0.001 0.001 0.001 0.001 0 0
## 5 0.000 0.000 0.000 0.000 0.000 0 0
## 6 0.000 0.000 0.000 0.000 0.000 0 0
##    y
## x       0     1     2     3     4     5     6
##   0 0.643 0.000 0.000 0.000 0.000 0.000 0.000
##   1 0.125 0.130 0.000 0.000 0.000 0.000 0.000
##   2 0.025 0.024 0.026 0.000 0.000 0.000 0.000
##   3 0.004 0.005 0.005 0.005 0.000 0.000 0.000
##   4 0.002 0.001 0.002 0.001 0.001 0.000 0.000
##   5 0.000 0.000 0.000 0.000 0.000 0.000 0.000
##   6 0.000 0.000 0.000 0.000 0.000 0.000 0.000
##    y
## x        0      1      2      3      4      5      6
##   0 0.6430 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
##   1 0.1246 0.1300 0.0000 0.0000 0.0000 0.0000 0.0000
##   2 0.0253 0.0243 0.0260 0.0000 0.0000 0.0000 0.0000
##   3 0.0043 0.0048 0.0051 0.0052 0.0000 0.0000 0.0000
##   4 0.0017 0.0006 0.0015 0.0010 0.0010 0.0000 0.0000
##   5 0.0004 0.0001 0.0003 0.0000 0.0002 0.0002 0.0000
##   6 0.0001 0.0001 0.0000 0.0000 0.0000 0.0001 0.0001

Example

we want to generate data from \(f(x,y)=8xy,0<x<y<1\). We can use a uniform on \([0,1]^2\) as proposal density. Then

\[ \begin{aligned} &\frac{f(x,y)}{g(x,y)} = \frac{8xy}{1}=8xy\\ &c = \max\{\frac{f(x,y)}{g(x,y)};0<x<y<1\} = f(1,1)=8\\ &\frac{f(x,y)}{cg(x,y)} = xy\text{, }0<x<y<1\\ \end{aligned} \]

and so

gencont3a <- function(n=1e4) {
  f.c <- function(x) ifelse(x[1]<x[2], x[1]*x[2], 0)
  x <- matrix(0, n, 2)
  for(i in 1:n) {
    repeat {
      u <- runif(2)
      if(runif(1)<f.c(u)) {
        x[i, ] <- u
        break
      }
    }
  }
  x
}

We saw before that the marginals are given by

\[ \begin{aligned} &f_X(x) = 4x(1-x^2),0<x<1\\ &f_Y(x) = 4x^3,0<x<1 \end{aligned} \]

so we can check

x <- gencont3a()
plot(x)

hist(x[,1], 50, freq=FALSE)
curve(4*x*(1-x^2), 0, 1, 
      col="blue", lwd=2, add=TRUE)

hist(x[,2], 50, freq=FALSE)
curve(4*x^3, 0, 1, 
      col="blue", lwd=2, add=TRUE)

Notice however that half of the time u will be rejected right away because \(u[1]>u[2]\), so we can improve the routine by generating points in the upper triangle only. These are uniform there, with density \(g(u,v)=2\):

gencont3b <- function(n=1e4) {
  f.c <- function(x) x[1]*x[2] # f/(cg) = 8uv/(4*2) = uv
  x <- matrix(0, n, 2)
  for(i in 1:n) {
    repeat {
      u <- runif(2)
      if(u[1]>u[2]) u <- u[2:1] # reflect into the upper triangle
      if(runif(1)<f.c(u)) {
        x[i, ] <- u
        break
      }
    }
  }
  x
}
x <- gencont3b()
hist(x[,1], 50, freq=FALSE)
curve(4*x*(1-x^2), 0, 1, 
      col="blue", lwd=2, add=TRUE)

hist(x[,2], 50, freq=FALSE)
curve(4*x^3, 0, 1, 
      col="blue", lwd=2, add=TRUE)

Example

say we have a 10-dimensional rv with joint pdf

\[f(x_1, ..,x_{10}) = c \prod x_i, 0<x_1<x_2< ... < x_{10}<1\]

For the methods we know so far we need the constant c, that is we need to compute

\[ 1/c=\int_0^1\int_{x_1}^1 \cdots \int_{x_9}^1 x_1x_2\cdots x_{10}\,dx_{10}\cdots dx_2\,dx_1 \]

so this is ugly. Doable, but ugly. Of course, if we needed this for 100 dimensions instead of 10… It turns out, though, that we can actually generate such data even without knowing c, but this discussion has to wait a bit.