Solution to Homework 8

Problem 1

Generate data from the random vector (X,Y) with joint density \(f(x,y)=\frac14(2+x+y),-1<x<y<1\) using two different methods.

We start by calculating the marginals, which we will need anyway for the graphs.

\[ \begin{aligned} &f_X(x) =\int_{x}^1 \frac14(2+x+y) dy= \\ &\frac18(2+x + y)^2\vert_{x}^1 = \\ &\frac{1}{8}(3+x)^2-\frac12(1+x)^2;-1<x<1 \\ &f_Y(x) =\int_{-1}^y \frac14(2+x+y) dx= \\ &\frac18(2+x + y)^2\vert_{-1}^y = \\ &\frac12(1+y)^2-\frac18(1 + y)^2=\\ &\frac38(1+y)^2;-1<y<1 \end{aligned} \]

Accept-reject with U,V distributed U[-1,1] and independent

\[ \begin{aligned} &\frac{f(x,y)}{g(x)g(y)} = \frac{\frac{1}{4}(2+x+y)I_{x<y}(x,y)}{\frac12\frac12} \\ &(2+x+y)^2I_{x<y}(x,y) \le (2+1+1)^2I_{x<y}(x,y) = 16 \\ &\frac{f(x,y)}{cg(x)g(y)}=\frac1{16}(2+x+y)I_{x<y}(x,y) \\ \end{aligned} \]

hw8a <- function(n=1e4) {
   f.cg=function(x) (2+x[1]+x[2])/16*ifelse(x[1]<x[2], 1, 0)
   xy=matrix(0, n, 2)
   for(i in 1:n) {
     repeat {
       uv=runif(2, -1, 1)
       if(runif(1)<f.cg(uv)) {
         xy[i, ]=uv
         break
       } 
     }   
   }
   xy
}
xy <- hw8a()
par(mfrow=c(1, 2))
hist(xy[, 1], 100, freq=FALSE, main="", xlab="x")
curve((3+x)^2/8-(1+x)^2/2, -1, 1, add=TRUE, lwd=2, col="blue")
hist(xy[, 2], 100, freq=FALSE, main="", xlab="y")
curve(3*(1+x)^2/8, -1, 1, add=TRUE, lwd=2, col="blue")

Now I will use the marginal and the conditional distributions to generate the data. We already have everything:

\[ \begin{aligned} &f_Y(y)=\frac38(1+y)^2;-1<y<1\\ &f_{X|Y=y}(x|y) =\frac{\frac{1}{4}(2+x+y)I_{x<y}(x,y)}{3(1+y)^2/8} =\\ &\frac{2(x+2+y)}{3(1+y)^2};-1<x<y\\ \end{aligned} \]

We can generate both of these using the inverse method. First Y:

\[ \begin{aligned} &F_Y(y)=\int_{-1}^y \frac38(1+t)^2 dt = \\ &\frac18(1+t)^3|_{-1}^y = \frac18(1+y)^3 \\ &x= \frac18(1+y)^3 \\ &y=2x^{1/3}-1 \end{aligned} \]

and so this command generates y’s:

y=2*runif(n)^(1/3)-1

Now for the conditional rv:

\[ \begin{aligned} &F_{X|Y=y}(x|y) = \int_{-1}^x\frac{2(t+2+y)}{3(1+y)^2} dt =\\ &\frac{(t+2+y)^2}{3(1+y)^2}|_{-1}^x = \\ &\frac{(x+2+y)^2}{3(1+y)^2}-\frac{(1+y)^2}{3(1+y)^2} = \\ &\frac{(x+2+y)^2}{3(1+y)^2}-1/3;-1<x<y \\ &z=\frac{(x+2+y)^2}{3(1+y)^2}-1/3 \\ &(z+1/3)(3(1+y)^2)=(x+2+y)^2 \\ &x=\sqrt{(z+1/3)(3(1+y)^2)}-2-y \\ \end{aligned} \]

n=1e4
y=2*runif(n)^(1/3)-1
x=sqrt(3*(1+y)^2*(runif(n)+1/3))-2-y
par(mfrow=c(1, 2))
hist(x, 100, freq=FALSE, main="", xlab="x")
curve((3+x)^2/8-(1+x)^2/2, -1, 1, add=TRUE, lwd=2, col="blue")
hist(y, 100, freq=FALSE, main="", xlab="y")
curve(3*(1+x)^2/8, -1, 1, add=TRUE, lwd=2, col="blue")

Problem 2

Consider the random vector (X, Y, Z) with joint density proportional to \(g(x,y,z)=yxz+x+z^2, 0<x,z<1;y=0,1\). Write a routine that generates data from this rv. Do several checks to verify that your routine works. This should include both graphical and numerical checks as well as some formal hypothesis tests. It should also include a check of the three-dimensional joint distribution, not just the marginals.

Let’s do some math first:

\[ \begin{aligned} &f_{X,Z}(x,z)=\sum_y Kg(x,y,z) = K\left(x+z^2+ xz+x+z^2\right) =\\ &K\left(2x+2z^2+ xz\right)\\ &f_X(x)=\int_0^1 K\left(2x+2z^2+ xz\right)dz=\\ &K\left(2xz+\frac23z^3+ \frac12xz^2\right)|_0^1=K\left(\frac52x+\frac23\right)\\ &\int_0^1K\left(\frac52x+\frac23\right)dx=\\ &K\left(\frac54x^2+\frac23x\right)|0^1=K\frac{23}{12}\text{, so }K=\frac{12}{23} \end{aligned} \] \[ \begin{aligned} &f_Z(z)=K\int_0^1 2x+2z^2+ xzdx=\\ &K\left(x^2+2z^2x+ x^2z/2\right)|_0^1=\frac{12}{23}(1+2z^2+z/2)\text{; 0<z<1} \end{aligned} \]

also

\[ \begin{aligned} &f_{X,Y, Z}(x, 0, z)= K(x+z^2)\\ &f_Y(0)=\int_0^1 \int_0^1 K\left(x+z^2 \right)dxdz= \\ &K\int_0^1 x^2/2+xz^2|_0^1 dz= K\int_0^1 1/2+z^2 dz = \\ &K(z/2+z^3/3)|_0^1=K(1/2+1/3)=\frac{12}{23}\frac56=\frac{10}{23}\\ &f_Y(1)=1-f_Y(0)=\frac{13}{23} \end{aligned} \]

Now we can generate data using accept-reject with (U,V,W) where \(U,W\sim U[0,1]\) and \(V\sim U\{0,1\}\). So

\[ \frac{f(x,y,z)}{g(x,y,z)} = \frac{yxz+x+z^2}{1\times\frac12\times1}=2(yxz+x+z^2)\le2(1+1+1^2)=6\\ \]

and so we have

rhw7 <- function(n=1e4) {
  f.cg <- function(x) (prod(x)+x[1]+x[3]^2)/3
  x <- matrix(0, n, 3)
  for(i in 1:n) {
    repeat {
      u <- c(runif(1), sample(0:1, 1), runif(1))
      if(runif(1)<=f.cg(u)) {
        x[i, ] <- u
        break
      }
    }
  }
  x
}
xyz <- rhw7()

Let’s first check Y:

df <-  data.frame("y"= 0:1,
          "Theory"=round(c(10, 13)/23, 3),
          "Simulation"=as.numeric(round(table(xyz[, 2])/1e4, 3)))
kable.nice(df, do.row.names = FALSE)

y	Theory	Simulation
0	0.435	0.43
1	0.565	0.57

and the marginals:

fx <- function(x) 12/23*(5*x/2+2/3)
hist(xyz[, 1], 50, freq=FALSE, main="")
curve(fx, 0, 1, add=TRUE, col="blue", lwd=2)

fz <- function(x) 12/23*(1+2*x^2+x/2)
hist(xyz[, 3], 50, freq=FALSE, main="")
curve(fz, 0, 1, add=TRUE, col="blue", lwd=2)

Next some numeric checks. Y is binomial p=13/23, so we can do a standard hypothesis test for a binomial:

binom.test(sum(xyz[, 2]==1), 1e4, p=13/23)

## 
##  Exact binomial test
## 
## data:  sum(xyz[, 2] == 1) and 10000
## number of successes = 5699, number of trials = 10000, p-value = 0.3482
## alternative hypothesis: true probability of success is not equal to 0.5652174
## 95 percent confidence interval:
##  0.5601272 0.5796320
## sample estimates:
## probability of success 
##                 0.5699

so it passes this test.

How about Kolmogorov-Smirnoff tests for the marginals? For that we need the cdfs:

\[ \begin{aligned} &F_X(x) = \int_0^t \frac{12}{23}(5t/2+2/3)dt = (15x^2+8x)/23\\ &F_Z(z) = \int_0^z \frac{12}{23}(1+2t^2+t/2)dt = (12z+8z^3+3z^2)/23 \end{aligned} \]

Fx <- function(x) (15*x^2+8*x)/23
ks.test(xyz[, 1], "Fx")

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  xyz[, 1]
## D = 0.0061136, p-value = 0.8489
## alternative hypothesis: two-sided

Fz <- function(x) (12*x+8*x^3+3*x^2)/23
ks.test(xyz[, 3], "Fz")

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  xyz[, 3]
## D = 0.0060251, p-value = 0.8609
## alternative hypothesis: two-sided

Finally a test for the joint distribution. For this I will compare probabilities of the form \(P(X<x, Y=0, Z<z)\) with the simulated numbers:

\[ \begin{aligned} &P(X<x, Y=0, Z<z) = \int_0^x \int_0^z f(u, 0, v) dv du = \\ & \int_0^x \int_0^z K(u+v^2) dv du = \\ & K\int_0^x uv+v^3/3|_0^z du = \\ & K\int_0^x uz+z^3/3 du = \\ &K(zu^2/2+z^3u/3|_0^x = \frac{12}{23}(x^2z/2+xz^3/3) \end{aligned} \]

Fxz <- function(x) (x[, 1]^2*x[, 2]/2+x[, 1]*x[, 2]^3/3)*12/23
x <- 1:3/4
z <- 1:3/4
xz <- expand.grid(x, z)
colnames(xz) <- c("x", "z")
y.theory <- Fxz(xz)
y.sim <- rep(0, 9)
tmp <- as.data.frame(xyz[xyz[, 2]==0, c(1, 3)])
colnames(tmp) <- c("x", "z")
for(i in 1:9) {
  tmp1 <- tmp[tmp$x<xz$x[i] & 
              tmp$z<xz$z[i], 1]
    y.sim[i] <- length(tmp1)/1e4
}    
df <- data.frame(x=xz$x, z=xz$z,
      Theory=round(y.theory, 3),
      Simulation=round(y.sim, 3))
kable.nice(df, do.row.names = FALSE)

x	z	Theory	Simulation
0.25	0.25	0.005	0.005
0.50	0.25	0.018	0.019
0.75	0.25	0.039	0.039
0.25	0.50	0.014	0.013
0.50	0.50	0.043	0.044
0.75	0.50	0.090	0.088
0.25	0.75	0.031	0.028
0.50	0.75	0.086	0.086
0.75	0.75	0.165	0.166