Solution to Midterm Esma 5015

Problem 1

Consider the discrete random vector (X,Y) with density P(X=k,Y=j)=c(k+j), k=1,..5 and j=1,..,7. Find the covariance of X and Y.

analytically

I will do it a bit more general, with N and M instead of 5 and 7:

\[ \begin{aligned} &1= \sum_{k,j} c(k+j) = \\ &c\sum_{k=1}^N \left(\sum_{j=1}^M k+j \right) = \\ &c\sum_{k=1}^N \left(k\sum_{j=1}^M 1+ \sum_{j=1}^M j \right) = \\ &c\sum_{k=1}^N \left(kM+ \frac{M(M+1)}2 \right) = \\ &c\left(M\frac{N(N+1)}2 +N\frac{M(M+1)}2 \right) = \\ &cNM\left(N(N+1)+M(M+1) \right)/2\\ &c=2/\left[NM\left(N+M+2 \right)\right] \end{aligned} \]

Now

\[ \begin{aligned} &f_X(k) =\sum_{j=1}^M c(k+j) =c\left(kM+\frac{M(M+1)}2 \right) \\ &E[X] = \sum_{k=1}^N k c\left(kM+\frac{M(M+1)}2 \right) = \\ &cM \sum_{k=1}^N k^2 + cM(M+1)/2 \sum_{k=1}^N k = \\ &cM N(N+1)(2N+1)/6 + cNM(N+1)(M+1)/4 = \\ &\frac{NM\left[2(N+1)(2N+1)+3(N+1)(M+1)\right]}{12NM\left(N+M+2 \right)}=\\ &\frac{2(N+1)(2N+1)+3(N+1)(M+1)}{6\left(N+M+2 \right)} \end{aligned} \]

By symmetry we immediately get

\[E[Y]=\frac{2(M+1)(2M+1)+3(N+1)(M+1)}{6\left(N+M+2 \right)}\]

\[ \begin{aligned} &E[XY] = \sum_{k,j} kj c(k+j) = \\ &c\left( \sum_{k=1}^N k^2 \sum_{j=1}^M j+ \sum_{j=1}^M j^2 \sum_{k=1}^N k \right) = \\ &c\left(N(N+1)(2N+1)/6\times M(M+1)/2 + M(M+1)(2M+1)/6\times N(N+1)/2 \right) = \\ &cNM(N+1)(M+1)(N+M+1)/6 =\\ &\frac{(N+1)(M+1)(N+M+1)}{3(N+M+2)}\\ \end{aligned} \]

and so finally:

\[ \begin{aligned} &cov(X,Y) = E[XY]-E[X]E[Y] = \\ &\frac{(N+1)(M+1)(N+M+1)}{3(N+M+2)}- \\ &\frac{2(N+1)(2N+1)+3(N+1)(M+1)}{6\left(N+M+2 \right)} \frac{2(M+1)(2M+1)+3(N+1)(M+1)}{6\left(N+M+2 \right)} = \\ &\frac{2(N+1)(M+1)(N+M+1)- 2(N+1)(2N+1)+3(N+1)(M+1)2(M+1)(2M+1)+3(N+1)(M+1) }{6(N+M+2)} \\ \end{aligned} \]

using simulation

We already found the marginal distribution of X. Let’s next find the conditional distribution of Y|X=k and use this for the simulation:

\[ \begin{aligned} &f_{Y|X=k}(j\vert k) =\frac{f(x,y)}{f_X(k)} = \\ &\frac{c(k+j)}{c\left(kM+\frac{M(M+1)}2 \right)} = \\ &\frac{k+j}{kM+M(M+1)/2} \\ \end{aligned} \]

Before we do the simulation, let’s find the probabilities using R:

N=5;M=7
p=matrix(0, N, M)
i=0
for(k in 1:N)
  for(j in 1:M) {
    i=i+1
    p[i]=k+j
  }
# Verify constant:
c(N*M*(N+M+2)/2, sum(p))

## [1] 245 245

p=p/sum(p)

rmidp1<-function(B=1e6, N=5, M=7) {
  px=1:N*M+M*(M+1)/2
  x=sample(1:N, size=B, replace=TRUE, prob = px)
  y=rep(0, B)
  for(k in 1:N)
    y[x==k]=sample(1:M, size=length(y[x==k]), replace = TRUE,
                   prob = (k+1:M)/(k*M+M*(M+1)/2))
  cbind(x, y)
}
xy=rmidp1()

# Verify joint distribution:
round(table(xy[, 1], xy[, 2])/1e6-p, 2)

##    
##         1     2     3     4     5     6     7
##   1  0.00 -0.02 -0.01  0.00 -0.02 -0.01  0.00
##   2  0.00 -0.02 -0.01  0.00  0.01 -0.01  0.00
##   3  0.00  0.01 -0.01  0.00  0.01 -0.01  0.00
##   4  0.00  0.01 -0.01  0.00  0.01  0.02  0.00
##   5  0.00  0.01  0.02  0.00  0.01  0.02  0.00

# Verify marginals of X 
round(table(xy[, 1])/1e6, 4)

## 
##      1      2      3      4      5 
## 0.1424 0.1708 0.2010 0.2279 0.2579

round((1:N*M+M*(M+1)/2)/(N*M*(N+M+1)/2) , 4)

## [1] 0.1538 0.1846 0.2154 0.2462 0.2769

# Verify Means
EX=(2*(N+1)*(2*N+1)+3*(N+1)*(M+1))/(6*(N+M+2))
round(c(mean(xy[, 1]), EX), 2)

## [1] 3.29 3.29

EY=(2*(M+1)*(2*M+1)+3*(N+1)*(M+1))/(6*(N+M+2))
round(c(mean(xy[, 2]), EY), 2)

## [1] 4.57 4.57

EXY=((N+1)*(M+1)*(N+M+1))/(3*(N+M+2))
round(c(mean(xy[, 1]*xy[, 2]), EXY), 2)

## [1] 14.87 14.86

# Verify covariance:
round(c(cov(xy)[2, 1], EXY-EX*EY), 3)

## [1] -0.164 -0.163

Problem 2

Consider the random variable with density shown here:

It has density

\[ f(x) = \left\{ \begin{array}[lll] 00 &\text{ for } &x<0, x>1\\ 4x &\text{ for } &0<x<0.5\\ 4(1-x) &\text{ for } &0.5<x<1\\ \end{array} \right. \]

Generate data from this density using the runif command only. I can do this in a way that generates one x for each runif. If you can do so as well you get bonus points.

Generating one x for each runif is only possible using the generalize inverse method, so I need the cdf:

\[ \begin{aligned} &F(x) =\int_0^x 4t dt =2t|_0^x =2x^2\text{ ; 0<x<0.5} \\ &F(x) =\frac12+\int_{1/2}^x 4(1-t) dt =\frac12+[-2(1-t)^2|_{1/2}^x =\\ &\frac12+ (-2(1-x)^2-(-2(1-\frac12)^2) = \\ &1-2(1-x)^2 \text{ ; 0.5<x<1} \\ \end{aligned} \]

and it’s inverse: \[ \begin{aligned} &y=2x^2 \rightarrow x=\sqrt{y/2} \\ &y=1-2(1-x)^2 \rightarrow x= 1-\sqrt{(1-y)/2} \end{aligned} \]

midp2=function(n=1e4) {
   y=runif(n)
   ifelse(y<0.5, sqrt(y/2), 1-sqrt((1-y)/2))
}
hist(midp2(), 50, freq=FALSE)
curve(f, 0,1, add=TRUE, col="blue", lwd=2)

Problem 3

Consider the random variable with density shown here:

It has density

\[ f(x) = \frac{16}3\left\{ \begin{array}[lll] 00 &\text{ for } &x<0, x>1\\ 2x &\text{ for } &0<x<0.25\\ 1-2x &\text{ for } &0.25<x<0.5\\ x-0.5 &\text{ for } &0.5<x<0.75\\ 1-x &\text{ for } &0.75<x<1\\ \end{array} \right. \]

Generate data from this density using the accept-reject algorithm and using the runif command only. Draw the histogram and overlay it with the density to show your method works.

Fist I will use just uniform [0,1] proposals. Note that \(c=\max\{f(x)/g(x)\}=8/3\), so

midp1a=function(n=1e4) {
  x=rep(0, n)
  counter=0
  for(i in 1:n) {
    repeat {
      counter=counter+2
      y=runif(1)
      if(runif(1)<3*f(y)/8) {
        x[i]=y
        break
      }
    }  
  }
  cat("Number of runif needed per 1 obs: ", round(counter/n, 3),"\n")
  x
} 
hist(midp1a(), 50, freq=FALSE)

## Number of runif needed per 1 obs:  5.326

curve(f, 0,1, add=TRUE, col="blue", lwd=2)

I will speed up generating data as follows: Notice that the left pyramid is twice as high the right one, so it has twice the area. I will generate data as follows: if runif(1)<2/3, I generate an observation for the left pyramid, otherwise for the right one.

midp1b=function(n=1e4) {
  x=rep(0, n)
  counter=0
  for(i in 1:n) {
    counter=counter+1
    if(runif(1)<2/3) {  
      repeat {
         counter=counter+2
         y=runif(1)/2
         if(runif(1)<3/8*f(y)) {
            x[i]=y
            break
         }
      }
    }  
    else {
        repeat {
         counter=counter+2
         y=(runif(1)+1)/2
         if(runif(1)<3*f(y)/4) {
           x[i]=y
           break
         }
      }
    }  
  }
  cat("Number of runif needed per 1 obs: ", round(counter/n, 3),"\n")
  x
} 
hist(midp1b(), 50, freq=FALSE)

## Number of runif needed per 1 obs:  5.04

curve(f, 0,1, add=TRUE, col="blue", lwd=2)

Problem 4

Consider the random vector (X, Y) with density

\[ \begin{aligned} &f(x,y)=\frac{c}{(1+x+y)^5} \end{aligned} \] for \(x>0,y>0\), 0 otherwise.

Write a routine that generates data from this random vector. Draw the histograms of the marginals with the marginal densities to check your routine. Find P(1<X<2, 0.5<Y<1) both analytically and using your simulated data.

\[ \begin{aligned} &F(x,y) =\int_0^x \int_0^y \frac{c}{(1+u+v)^5} du dv\\ &c\int_0^x \frac{-1}{4(1+u+v)^4}\vert_0^y dv\\ &c\int_0^x \frac{1}{4(1+v)^4} -\frac{1}{4(1+v+y)^4} dv = \\ &c\left(-\frac{1}{12(1+v)^3} +\frac{1}{12(1+v+y)^3}\vert_0^x \right.=\\ &\frac{c}{12}\left(1+\frac{1}{(1+x+y)^3}-\frac{1}{(1+x)^3} -\frac{1}{(1+y)^3}\right)\\ &1=\lim_{x\rightarrow \infty} F(x,y)=c/12 \\ &c=12 \end{aligned} \]

\[ \begin{aligned} &f_X(x) = \int_0^\infty \frac{12}{(1+x+y)^5} dy = \\ &-\frac{3}{(1+x+y)^4}\vert_0^\infty = \frac{3}{(1+x)^4}\\ \end{aligned} \] \[ \begin{aligned} &F_X(x) =\int_0^x \frac3{(1+t)^4} dt = 1-1/(1+x)^3\\ &y=1-1/(1+x)^3 \\ &x =1/(1-y)^{1/3}-1 \\ \end{aligned} \]

so we can use the inverse method to generate X. Also notice that the joint density of X and Y is symmetric in x and y, so we have \(f_Y(x)=f_X(x)\). However, X and Y are not independent, so we can not simply generate Y’s the same way. We can try to use the conditional density, though:

\[f_{Y|X=x}(y|x) =\frac{12/(1+x+y)^5}{3/(1+x)^4} = \frac{4(1+x)^4}{(1+x+y)^5}\]

\[ \begin{aligned} &F_{Y|X=x}(y|x) =\int_0^y \frac{4(1+x)^4}{(1+x+t)^5} dt =\\ &-\frac{(1+x)^4}{(1+x+t)^4}\vert_0^y = \\ &\frac{(1+x)^4}{(1+x)^4}-\frac{(1+x)^4}{(1+x+y)^4} = \\ &1-\left(\frac{1+x}{1+x+y}\right)^4 \\ &t=1-\left(\frac{1+x}{1+x+y}\right)^4 \\ &\frac{1+x}{1+x+y} =(1-t)^{1/4} \\ &y=\frac{1+x}{(1-t)^{1/4}} -1-x \end{aligned} \]

and here is the simulation:

set.seed(11)
u=runif(1e5)
x=1/(1-u)^(1/3)-1
v=runif(1e5)
y=(1+x)/(1-v)^(1/4)-1-x
plot(x, y, pch=".", xlim=c(0,5), ylim=c(0,5))

par(mfrow=c(1,2))
hist(x[x<3], 50, freq=FALSE, main="fx")
curve(3/(1+x)^4, 0, 5, add=TRUE, lwd=2)
hist(y[y<3], 50, freq=FALSE, main="fy")
curve(3/(1+x)^4, 0, 5, add=TRUE, lwd=2)

round(c(sum(x<1&y<1)/1e5, 1+1/3^3-2*1/2^3), 3)

## [1] 0.787 0.787

Finally for P(1<X<2, 0.5<Y<1):

\[ \begin{aligned} &P(x_1<X<x_2,y_1<Y<y_2) =\\ &P(X<x_2,Y<y_2) - P(X<x_1,Y<y_2) + P(X<x_2,Y<y_1) - P(X<x_2,Y<y_2)= \\ &F(x_2, y_2)-F(x_2, y_1) -F(x_1, y_2) +F(x_1, y_1) \end{aligned} \]

This is easiest to see in a picture:

and so

cdf <- function(x, y) 1+1/(1+x+y)^3-1/(1+x)^3-1/(1+y)^3
p1=cdf(2, 1)-cdf(2, 0.5)-cdf(1, 1)+cdf(1, 0.5)  
p2=sum(x>1&x<2&y>0.5&y<1)/length(x)
round(c(p1,p2), 3)

## [1] 0.019 0.019