Consider the discrete random vector (X,Y) with density P(X=k,Y=j)=c(k+j), k=1,..5 and j=1,..,7. Find the covariance of X and Y.
I will do it a bit more general, with N and M instead of 5 and 7:
\[ \begin{aligned} &1= \sum_{k,j} c(k+j) = \\ &c\sum_{k=1}^N \left(\sum_{j=1}^M k+j \right) = \\ &c\sum_{k=1}^N \left(k\sum_{j=1}^M 1+ \sum_{j=1}^M j \right) = \\ &c\sum_{k=1}^N \left(kM+ \frac{M(M+1)}2 \right) = \\ &c\left(M\frac{N(N+1)}2 +N\frac{M(M+1)}2 \right) = \\ &cNM\left(N(N+1)+M(M+1) \right)/2\\ &c=2/\left[NM\left(N+M+2 \right)\right] \end{aligned} \]
Now
\[ \begin{aligned} &f_X(k) =\sum_{j=1}^M c(k+j) =c\left(kM+\frac{M(M+1)}2 \right) \\ &E[X] = \sum_{k=1}^N k c\left(kM+\frac{M(M+1)}2 \right) = \\ &cM \sum_{k=1}^N k^2 + cM(M+1)/2 \sum_{k=1}^N k = \\ &cM N(N+1)(2N+1)/6 + cNM(N+1)(M+1)/4 = \\ &\frac{NM\left[2(N+1)(2N+1)+3(N+1)(M+1)\right]}{12NM\left(N+M+2 \right)}=\\ &\frac{2(N+1)(2N+1)+3(N+1)(M+1)}{6\left(N+M+2 \right)} \end{aligned} \]
By symmetry we immediately get
\[E[Y]=\frac{2(M+1)(2M+1)+3(N+1)(M+1)}{6\left(N+M+2 \right)}\]
Next
\[ \begin{aligned} &E[XY] = \sum_{k,j} kj c(k+j) = \\ &c\left( \sum_{k=1}^N k^2 \sum_{j=1}^M j+ \sum_{j=1}^M j^2 \sum_{k=1}^N k \right) = \\ &c\left(N(N+1)(2N+1)/6\times M(M+1)/2 + M(M+1)(2M+1)/6\times N(N+1)/2 \right) = \\ &cNM(N+1)(M+1)(N+M+1)/6 =\\ &\frac{(N+1)(M+1)(N+M+1)}{3(N+M+2)}\\ \end{aligned} \]
and so finally:
\[ \begin{aligned} &cov(X,Y) = E[XY]-E[X]E[Y] = \\ &\frac{(N+1)(M+1)(N+M+1)}{3(N+M+2)}- \\ &\frac{2(N+1)(2N+1)+3(N+1)(M+1)}{6\left(N+M+2 \right)} \frac{2(M+1)(2M+1)+3(N+1)(M+1)}{6\left(N+M+2 \right)} = \\ &\frac{2(N+1)(M+1)(N+M+1)- 2(N+1)(2N+1)+3(N+1)(M+1)2(M+1)(2M+1)+3(N+1)(M+1) }{6(N+M+2)} \\ \end{aligned} \]
We already found the marginal distribution of X. Let’s next find the conditional distribution of Y|X=k and use this for the simulation:
\[ \begin{aligned} &f_{Y|X=k}(j\vert k) =\frac{f(x,y)}{f_X(k)} = \\ &\frac{c(k+j)}{c\left(kM+\frac{M(M+1)}2 \right)} = \\ &\frac{k+j}{kM+M(M+1)/2} \\ \end{aligned} \]
Before we do the simulation, let’s find the probabilities using R:
N=5;M=7
p=matrix(0, N, M)
i=0
for(k in 1:N)
for(j in 1:M) {
i=i+1
p[i]=k+j
}
# Verify constant:
c(N*M*(N+M+2)/2, sum(p))
## [1] 245 245
p=p/sum(p)
rmidp1<-function(B=1e6, N=5, M=7) {
px=1:N*M+M*(M+1)/2
x=sample(1:N, size=B, replace=TRUE, prob = px)
y=rep(0, B)
for(k in 1:N)
y[x==k]=sample(1:M, size=length(y[x==k]), replace = TRUE,
prob = (k+1:M)/(k*M+M*(M+1)/2))
cbind(x, y)
}
xy=rmidp1()
# Verify joint distribution:
round(table(xy[, 1], xy[, 2])/1e6-p, 2)
##
## 1 2 3 4 5 6 7
## 1 0.00 -0.02 -0.01 0.00 -0.02 -0.01 0.00
## 2 0.00 -0.02 -0.01 0.00 0.01 -0.01 0.00
## 3 0.00 0.01 -0.01 0.00 0.01 -0.01 0.00
## 4 0.00 0.01 -0.01 0.00 0.01 0.02 0.00
## 5 0.00 0.01 0.02 0.00 0.01 0.02 0.00
# Verify marginals of X
round(table(xy[, 1])/1e6, 4)
##
## 1 2 3 4 5
## 0.1424 0.1708 0.2010 0.2279 0.2579
round((1:N*M+M*(M+1)/2)/(N*M*(N+M+1)/2) , 4)
## [1] 0.1538 0.1846 0.2154 0.2462 0.2769
# Verify Means
EX=(2*(N+1)*(2*N+1)+3*(N+1)*(M+1))/(6*(N+M+2))
round(c(mean(xy[, 1]), EX), 2)
## [1] 3.29 3.29
EY=(2*(M+1)*(2*M+1)+3*(N+1)*(M+1))/(6*(N+M+2))
round(c(mean(xy[, 2]), EY), 2)
## [1] 4.57 4.57
EXY=((N+1)*(M+1)*(N+M+1))/(3*(N+M+2))
round(c(mean(xy[, 1]*xy[, 2]), EXY), 2)
## [1] 14.87 14.86
# Verify covariance:
round(c(cov(xy)[2, 1], EXY-EX*EY), 3)
## [1] -0.164 -0.163
Consider the random variable with density shown here:
It has density
\[ f(x) = \left\{ \begin{array}[lll] 00 &\text{ for } &x<0, x>1\\ 4x &\text{ for } &0<x<0.5\\ 4(1-x) &\text{ for } &0.5<x<1\\ \end{array} \right. \]
Generate data from this density using the runif command only. I can do this in a way that generates one x for each runif. If you can do so as well you get bonus points.
Generating one x for each runif is only possible using the generalize inverse method, so I need the cdf:
\[ \begin{aligned} &F(x) =\int_0^x 4t dt =2t|_0^x =2x^2\text{ ; 0<x<0.5} \\ &F(x) =\frac12+\int_{1/2}^x 4(1-t) dt =\frac12+[-2(1-t)^2|_{1/2}^x =\\ &\frac12+ (-2(1-x)^2-(-2(1-\frac12)^2) = \\ &1-2(1-x)^2 \text{ ; 0.5<x<1} \\ \end{aligned} \]
and it’s inverse: \[ \begin{aligned} &y=2x^2 \rightarrow x=\sqrt{y/2} \\ &y=1-2(1-x)^2 \rightarrow x= 1-\sqrt{(1-y)/2} \end{aligned} \]
midp2=function(n=1e4) {
y=runif(n)
ifelse(y<0.5, sqrt(y/2), 1-sqrt((1-y)/2))
}
hist(midp2(), 50, freq=FALSE)
curve(f, 0,1, add=TRUE, col="blue", lwd=2)
Consider the random variable with density shown here:
It has density
\[ f(x) = \frac{16}3\left\{ \begin{array}[lll] 00 &\text{ for } &x<0, x>1\\ 2x &\text{ for } &0<x<0.25\\ 1-2x &\text{ for } &0.25<x<0.5\\ x-0.5 &\text{ for } &0.5<x<0.75\\ 1-x &\text{ for } &0.75<x<1\\ \end{array} \right. \]
Generate data from this density using the accept-reject algorithm and using the runif command only. Draw the histogram and overlay it with the density to show your method works.
Fist I will use just uniform [0,1] proposals. Note that \(c=\max\{f(x)/g(x)\}=8/3\), so
midp1a=function(n=1e4) {
x=rep(0, n)
counter=0
for(i in 1:n) {
repeat {
counter=counter+2
y=runif(1)
if(runif(1)<3*f(y)/8) {
x[i]=y
break
}
}
}
cat("Number of runif needed per 1 obs: ", round(counter/n, 3),"\n")
x
}
hist(midp1a(), 50, freq=FALSE)
## Number of runif needed per 1 obs: 5.326
curve(f, 0,1, add=TRUE, col="blue", lwd=2)
I will speed up generating data as follows: Notice that the left pyramid is twice as high the right one, so it has twice the area. I will generate data as follows: if runif(1)<2/3, I generate an observation for the left pyramid, otherwise for the right one.
midp1b=function(n=1e4) {
x=rep(0, n)
counter=0
for(i in 1:n) {
counter=counter+1
if(runif(1)<2/3) {
repeat {
counter=counter+2
y=runif(1)/2
if(runif(1)<3/8*f(y)) {
x[i]=y
break
}
}
}
else {
repeat {
counter=counter+2
y=(runif(1)+1)/2
if(runif(1)<3*f(y)/4) {
x[i]=y
break
}
}
}
}
cat("Number of runif needed per 1 obs: ", round(counter/n, 3),"\n")
x
}
hist(midp1b(), 50, freq=FALSE)
## Number of runif needed per 1 obs: 5.04
curve(f, 0,1, add=TRUE, col="blue", lwd=2)
Consider the random vector (X, Y) with density
\[ \begin{aligned} &f(x,y)=\frac{c}{(1+x+y)^5} \end{aligned} \] for \(x>0,y>0\), 0 otherwise.
Write a routine that generates data from this random vector. Draw the histograms of the marginals with the marginal densities to check your routine. Find P(1<X<2, 0.5<Y<1) both analytically and using your simulated data.
\[ \begin{aligned} &F(x,y) =\int_0^x \int_0^y \frac{c}{(1+u+v)^5} du dv\\ &c\int_0^x \frac{-1}{4(1+u+v)^4}\vert_0^y dv\\ &c\int_0^x \frac{1}{4(1+v)^4} -\frac{1}{4(1+v+y)^4} dv = \\ &c\left(-\frac{1}{12(1+v)^3} +\frac{1}{12(1+v+y)^3}\vert_0^x \right.=\\ &\frac{c}{12}\left(1+\frac{1}{(1+x+y)^3}-\frac{1}{(1+x)^3} -\frac{1}{(1+y)^3}\right)\\ &1=\lim_{x\rightarrow \infty} F(x,y)=c/12 \\ &c=12 \end{aligned} \]
\[ \begin{aligned} &f_X(x) = \int_0^\infty \frac{12}{(1+x+y)^5} dy = \\ &-\frac{3}{(1+x+y)^4}\vert_0^\infty = \frac{3}{(1+x)^4}\\ \end{aligned} \] \[ \begin{aligned} &F_X(x) =\int_0^x \frac3{(1+t)^4} dt = 1-1/(1+x)^3\\ &y=1-1/(1+x)^3 \\ &x =1/(1-y)^{1/3}-1 \\ \end{aligned} \]
so we can use the inverse method to generate X. Also notice that the joint density of X and Y is symmetric in x and y, so we have \(f_Y(x)=f_X(x)\). However, X and Y are not independent, so we can not simply generate Y’s the same way. We can try to use the conditional density, though:
\[f_{Y|X=x}(y|x) =\frac{12/(1+x+y)^5}{3/(1+x)^4} = \frac{4(1+x)^4}{(1+x+y)^5}\]
\[ \begin{aligned} &F_{Y|X=x}(y|x) =\int_0^y \frac{4(1+x)^4}{(1+x+t)^5} dt =\\ &-\frac{(1+x)^4}{(1+x+t)^4}\vert_0^y = \\ &\frac{(1+x)^4}{(1+x)^4}-\frac{(1+x)^4}{(1+x+y)^4} = \\ &1-\left(\frac{1+x}{1+x+y}\right)^4 \\ &t=1-\left(\frac{1+x}{1+x+y}\right)^4 \\ &\frac{1+x}{1+x+y} =(1-t)^{1/4} \\ &y=\frac{1+x}{(1-t)^{1/4}} -1-x \end{aligned} \]
and here is the simulation:
set.seed(11)
u=runif(1e5)
x=1/(1-u)^(1/3)-1
v=runif(1e5)
y=(1+x)/(1-v)^(1/4)-1-x
plot(x, y, pch=".", xlim=c(0,5), ylim=c(0,5))
par(mfrow=c(1,2))
hist(x[x<3], 50, freq=FALSE, main="fx")
curve(3/(1+x)^4, 0, 5, add=TRUE, lwd=2)
hist(y[y<3], 50, freq=FALSE, main="fy")
curve(3/(1+x)^4, 0, 5, add=TRUE, lwd=2)
round(c(sum(x<1&y<1)/1e5, 1+1/3^3-2*1/2^3), 3)
## [1] 0.787 0.787
Finally for P(1<X<2, 0.5<Y<1):
\[ \begin{aligned} &P(x_1<X<x_2,y_1<Y<y_2) =\\ &P(X<x_2,Y<y_2) - P(X<x_1,Y<y_2) + P(X<x_2,Y<y_1) - P(X<x_2,Y<y_2)= \\ &F(x_2, y_2)-F(x_2, y_1) -F(x_1, y_2) +F(x_1, y_1) \end{aligned} \]
This is easiest to see in a picture:
and so
cdf <- function(x, y) 1+1/(1+x+y)^3-1/(1+x)^3-1/(1+y)^3
p1=cdf(2, 1)-cdf(2, 0.5)-cdf(1, 1)+cdf(1, 0.5)
p2=sum(x>1&x<2&y>0.5&y<1)/length(x)
round(c(p1,p2), 3)
## [1] 0.019 0.019