Solution to Homework 5

Problem 1

Say the random variable X has pmf \(f(x)=c(1+x)\) for \(x=1,2,\dots,N\), with \(N>1\) fixed and known. Find E[X].

\[ \begin{aligned} &\sum_{x=1}^N f(x) = \sum_{x=1}^N c(1+x) = \\ &c\sum_{x=1}^N 1 + c\sum_{x=1}^N x = \\ &cN + c \frac{N(N+1)}2 = \\ &\frac{cN}2 \left(N+3 \right) =1\\ &c=\frac2{N(N+3)} \end{aligned} \]

\[ \begin{aligned} &EX = \sum_{x=1}^N xf(x) = \sum_{x=1}^N xc(1+x) = \\ &c\sum_{x=1}^N x + c\sum_{x=1}^N x^2 = \\ &c \frac{N(N+1)}2 + c \frac{N(N+1)(2N+1)}6 = \\ &\frac{cN(N+1)}6 \left(2N+4 \right) = \\ &\frac2{N(N+3)}\frac{N(N+1)}6 \left(2N+4 \right) = \\ &\frac{2(N+1)(N+2)}{3(N+3)} \end{aligned} \]

Here is a routine that generates data from this rv, which I will use to verify the answers:

rhw5p1 <- function(n=1e5, N=2) {
  # weights proportional to 1+x; sample() rescales prob to sum to 1
  sample(1:N, n, replace = TRUE, prob = 1 + 1:N)
}

N <- 2
x <- rhw5p1(N=N)
table(x)/1e5

## x
##       1       2 
## 0.39894 0.60106

2/N/(N+3)*(1+ 1:N)

## [1] 0.4 0.6

mean(x)

## [1] 1.60106

2*(N+1)*(N+2)/3/(N+3)

## [1] 1.6

N <- 5
x <- rhw5p1(N=N)
table(x)/1e5

## x
##       1       2       3       4       5 
## 0.09995 0.14760 0.20003 0.25173 0.30069

2/N/(N+3)*(1+ 1:N)

## [1] 0.10 0.15 0.20 0.25 0.30

mean(x)

## [1] 3.50561

2*(N+1)*(N+2)/3/(N+3)

## [1] 3.5
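
The simulation agrees with the formulas only up to sampling error. As a complement, here is a small sketch of an exact check that sums the pmf directly. The helper name `check_hw5p1` is just an illustrative choice.

```r
# Exact check of the normalizing constant and the mean for a given N
check_hw5p1 <- function(N) {
  x <- 1:N
  f <- 2 / (N * (N + 3)) * (1 + x)          # pmf with c = 2/(N(N+3))
  c(total = sum(f),                          # should be 1
    mean_direct = sum(x * f),                # E[X] from the definition
    mean_formula = 2 * (N + 1) * (N + 2) / (3 * (N + 3)))
}
sapply(c(2, 5, 10, 100), check_hw5p1)
```

In each column the first entry should be 1 and the last two entries should agree up to rounding.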

Problem 2

Say the random vector \((X,Y)\) has density \(f(x,y) = c \cdot (x+y^2)\) for \(0<x<y<1\), 0 otherwise.

Find Cov(X,Y).

\[ \begin{aligned} &f_X(x)=\int_x^1 c (x+y^2)dy=\\ &c (xy+\frac{1}{3}y^3)|_x^1=\\ &c (x+\frac{1}{3})-c (x^2+\frac{1}{3}x^3)=\\ &\frac{c}{3} \left( 1+3x-3x^2-x^3 \right)\text{, }0<x<1 \end{aligned} \]

\[ \begin{aligned} &\int_0^1 f_X(x)dx=\\ &\int_0^1 \frac{c}{3} \left( 1+3x-3x^2-x^3 \right)dx=\\ &\frac{c}{3} \left( x+\frac32 x^2-x^3-\frac14 x^4 \right)|_0^1=\\ &\frac{c}{3} \left( 1+\frac32-1-\frac14 \right)=\frac{5c}{12}=1\\ &c=\frac{12}{5} \end{aligned} \]

\[ \begin{aligned} &E[X]=\int_0^1 xf_X(x)dx=\\ &\int_0^1 x\frac{4}{5}(1+3x-3x^2-x^3)dx=\\ &\frac{4}{5}(\frac12 x^2+x^3-\frac34 x^4-\frac15 x^5)|_0^1=\\ &\frac{4}{5}(\frac12+1-\frac34-\frac15)=\frac{11}{25} \end{aligned} \]

\[ \begin{aligned} &f_Y(y)=\int_0^y \frac{12}{5} (x+y^2)dx=\\ &\frac{12}{5} (\frac12 x^2+xy^2)|_0^y=\\ &\frac{12}{5} (\frac12 y^2+y^3)=\\ &\frac{12}{10} (y^2+2y^3)\text{, }0<y<1 \end{aligned} \]

\[ \begin{aligned} &E[Y]=\int_0^1yf_Y(y)dy=\\ &\int_0^1 y\frac{12}{10} (y^2+2y^3)dy=\\ &\frac{12}{10} (\frac14 y^4+\frac25 y^5)|_0^1=\\ &\frac{12}{10} (\frac14+\frac25)=\frac{39}{50} \end{aligned} \]

\[ \begin{aligned} &E[XY]=\int_0^1 \int_0^y xy\frac{12}{5} (x+y^2)dxdy=\\ &\frac{12}{5}\int_0^1 y \int_0^y (x^2+xy^2)dxdy=\\ &\frac{12}{5}\int_0^1 y \left( \frac13 x^3+\frac12 x^2y^2 \right)|_0^ydy=\\ &\frac{12}{5}\int_0^1 y \left( \frac13 y^3+\frac12 y^4 \right)dy=\\ &\frac{12}{30}\int_0^1 (2y^4+3y^5)dy=\\ &\frac{12}{30} \left( \frac25 y^5+\frac12 y^6 \right)|_0^1=\\ &\frac{12}{30} \left( \frac25+\frac12 \right)=\frac{9}{25} \end{aligned} \]

\[ \begin{aligned} &Cov(X,Y)=E[XY]-E[X]E[Y]=\\ &\frac{9}{25}-\frac{39}{50}\frac{11}{25}=\frac{21}{2 \cdot5^4}=0.0168 \end{aligned} \]
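
The simulation check that follows calls `rhw5p2`, whose definition is not included in this write-up. Below is a minimal sketch of how such a sampler could look, using rejection sampling from the unit square. The name `rhw5p2_sketch` and the batch size are my own assumptions and not the original routine.

```r
# Hypothetical stand-in for rhw5p2 (the original definition is not shown).
# Rejection sampling from f(x,y) = 12/5 (x + y^2) on 0 < x < y < 1.
rhw5p2_sketch <- function(n = 1e5) {
  out <- matrix(0, 0, 2)
  while (nrow(out) < n) {
    m <- 6 * n                        # propose in batches; about 21% are accepted
    x <- runif(m)
    y <- runif(m)
    u <- runif(m)
    # x + y^2 <= 2 on the unit square, so 2*u is a valid envelope height
    keep <- (x < y) & (2 * u < x + y^2)
    out <- rbind(out, cbind(x, y)[keep, , drop = FALSE])
  }
  out[1:n, , drop = FALSE]            # n-by-2 matrix, columns are x and y
}
```

The result is a matrix with the x values in the first column and the y values in the second, which is the layout the checks in this problem assume.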

xy <- rhw5p2()
round(c(21/2/5^4, cov(xy)[1,2]), 3)

## [1] 0.017 0.017

Find E[X|Y=y].

\[ \begin{aligned} &f_{X|Y=y}(x|y)=\frac {f(x,y)}{f_Y(y)}=\\ &\frac{\frac{12}{5} (x+y^2)}{\frac{12}{10} (y^2+2y^3)}=\\ &\frac{2(x+y^2)}{y^2+2y^3}\text{, }0<x<y \end{aligned} \]

\[ \begin{aligned} &E[X|Y=y]=\int_0^y x\frac{2(x+y^2)}{y^2+2y^3}dx=\\ &\frac{2(\frac13 x^3+\frac12 x^2y^2)}{y^2+2y^3}|_0^y=\\ &\frac{2y^3+3y^4}{3(y^2+2y^3)}=\\ &\frac{2y+3y^2}{3+6y} \end{aligned} \]

y <- 0:9/10+0.05
EX.Y <- rep(0, 10)
for(i in 1:10) {
  xy1 <- xy[xy[,2]>y[i]-0.025, ]
  xy2 <- xy1[xy1[,2]<y[i]+0.025, ]  
  EX.Y[i] <- mean(xy2[,1])
}
plot(y, EX.Y, col="blue")
curve( (2*x+3*x^2)/(3+6*x), 0, 1, add=TRUE)
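
The binned averages depend on the simulated data. As a deterministic cross-check, the conditional mean can also be computed by numerical integration of the conditional density. This is only a sketch, and the names `cond_mean_numeric` and `y0` are illustrative.

```r
# Numerical check of E[X|Y=y] = (2y + 3y^2)/(3 + 6y) by direct integration
cond_mean_numeric <- function(y) {
  # conditional density of X given Y = y, valid on 0 < x < y
  fxy <- function(x) 2 * (x + y^2) / (y^2 + 2 * y^3)
  integrate(function(x) x * fxy(x), 0, y)$value
}
y0 <- c(0.1, 0.5, 0.9)
cbind(y0,
      numeric = sapply(y0, cond_mean_numeric),
      formula = (2 * y0 + 3 * y0^2) / (3 + 6 * y0))
```

The last two columns should agree to several decimal places.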

Let \(Z=XY\). Find the density of Z.

\[ \begin{aligned} &F_{XY}(z)=P(XY<z)=\\ &\int_{-\infty}^{ \infty}P(XY<z|Y=y)f_Y(y)dy=\\ &\int_{-\infty}^{ \infty}P(X<z/y|Y=y)f_Y(y)dy=\\ &\int_{-\infty}^{ \infty}F_{X|Y=y}(z/y|y)f_Y(y)dy\\ &f_{XY}(z)=\frac{d}{dz}F_{XY}(z)=\\ &\frac{d}{dz}\int_{-\infty}^{ \infty}F_{X|Y=y}(z/y|y)f_Y(y)dy=\\ &\int_{-\infty}^{ \infty}\frac{d}{dz}F_{X|Y=y}(z/y|y)f_Y(y)dy=\\ &\int_{-\infty}^{ \infty}f_{X|Y=y}(z/y|y)\frac{1}{y}f_Y(y)dy \end{aligned} \]

The integrand is nonzero only where

\[ 0<x<y<1 \rightarrow 0<z/y<y<1\rightarrow 0<z<y^2 \rightarrow \sqrt z<y<1 \]

so

\[ \begin{aligned} &f_{XY}(z)=\int_{\sqrt z}^1\frac{2(z/y+y^2)}{y^2+2y^3}\frac{12}{10} \frac{1}{y}(y^2+2y^3)dy=\\ &\frac{12}{5}\int_{\sqrt z}^1 (\frac{z}{y^2}+y)dy=\\ &\frac{12}{5} (-\frac{z}{y}+\frac12 y^2)|_{\sqrt z}^1=\\ &\frac{12}{5} (-z+\frac12+\sqrt z-\frac12 z)=\\ &\frac{6}{5} (1-3z+2\sqrt z)\text{, }0<z<1 \end{aligned} \]

hist(xy[,1]*xy[,2], 50, freq=FALSE, main="")
curve(6/5*(1-3*x+2*sqrt(x)), 0, 1, add=TRUE, lwd=2, col="blue")
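
Two quick sanity checks on this density do not need any simulated data. It should integrate to 1, and its mean should equal \(E[XY]=\frac{9}{25}\) found earlier. A short sketch (the name `fz` is just for illustration):

```r
# Density of Z = XY derived above, valid on 0 < z < 1
fz <- function(z) 6 / 5 * (1 - 3 * z + 2 * sqrt(z))
integrate(fz, 0, 1)$value                       # total probability, should be 1
integrate(function(z) z * fz(z), 0, 1)$value    # E[Z], compare with 9/25 = 0.36
```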

Problem 3

Let \(X\sim Ber(0.5)\) and \(Y|X=x \sim Ber(\frac1{1+x})\). Let \(U=XY\). Find \(Var(U)\)

a. via simulation

B <- 1e5
x <- sample(0:1, B, replace = TRUE)
y <- rep(0,B)
for(i in 1:B) y[i] <- sample(0:1, 1, prob=c(1-1/(1+x[i]), 1/(1+x[i])))
var(x*y)

## [1] 0.1875769
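
The loop draws one value at a time, which is slow for large B. Since `rbinom` accepts a vector of success probabilities, the same simulation can be written without a loop. This is a sketch of an alternative, with `sim_var_u` as an illustrative name.

```r
# Vectorized version of the simulation: Y_i ~ Ber(1/(1 + x_i))
sim_var_u <- function(B = 1e5) {
  x <- rbinom(B, 1, 0.5)
  y <- rbinom(B, 1, 1 / (1 + x))   # prob is matched elementwise to x
  var(x * y)
}
sim_var_u()
```

The result should again be close to the value from the loop version, up to simulation error.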

b. via approximation

We have

\[ \begin{aligned} &EX = \frac12 \\ &VarX= 0.5(1-0.5) = \frac14\\ \end{aligned} \]

\[ \begin{aligned} &f_{Y|X=x}(y|x) = (1-\frac1{1+x})^{1-y}(\frac1{1+x})^{y} \\ &f(x,y) = f_{Y|X=x}(y|x)f_X(x) =\\ &(1-\frac{1}{1+x})^{1-y}(\frac1{1+x})^{y}(\frac12)^{1-x}(\frac12)^x =\\ &\frac{x^{1-y}}{2(1+x)} \\ \end{aligned} \]

\[ \begin{aligned} &E[XY] = 0\times0\times f(0,0) + 0\times1\times f(0,1)+\\ &1\times0\times f(1,0) + 1\times1\times f(1,1) = \frac14 \end{aligned} \]

\[ \begin{aligned} &f_Y(0) = f(0,0) + f(1,0) = \frac14\\ &f_Y(1) = f(0,1) + f(1,1) = \frac34\\ &Y\sim Ber(\frac34)\\ &E[Y] = \frac34\\ &Var[Y] = (1-\frac34)\frac34 = \frac3{16}\\ &Cov(X,Y) = E[XY]-EXEY=\frac14-\frac12\frac34=-\frac18 \end{aligned} \]

Finally, a first-order Taylor (delta method) approximation gives

\[ \begin{aligned} &h(x,y) = xy\\ &\frac{\partial h(x,y)}{\partial x} = y\text{, }\frac{\partial h(x,y)}{\partial y} = x\\ &Var(XY) \approx (\frac{\partial h(\mu_x,\mu_y)}{\partial x})^2Var[X]+(\frac{\partial h(\mu_x,\mu_y)}{\partial y})^2Var[Y]+\\ &2\frac{\partial h(\mu_x,\mu_y)}{\partial x}\frac{\partial h(\mu_x,\mu_y)}{\partial y}Cov[X,Y] = \\ &\mu_y^2Var[X]+\mu_x^2Var[Y]+2\mu_x\mu_y Cov(X,Y)=\\ &(\frac34)^2\frac14+(\frac12)^2\frac3{16}+2\frac12\frac34(-\frac18) = \\ &(9+3-6)/2^6=3/2^5=0.09375 \end{aligned} \]

c. via analytic calculation

Note that \(XY\) is either 0 or 1. Now

\[ \begin{aligned} &P(XY=1) = P(X=1,Y=1) = f(1, 1) = \frac14\\ &XY\sim Ber(\frac14) \\ &Var[XY] = (1-\frac14)\frac14=3/16=0.1875\\ \end{aligned} \]
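
Because (X,Y) takes only four values, both the exact variance and the approximation can be recomputed by enumerating the joint pmf. The sketch below does this. The variable names are illustrative, and the matrix `f` simply holds the values of \(f(x,y)\) derived above.

```r
# Joint pmf f(x,y): rows are x = 0,1 and columns are y = 0,1
f <- matrix(c(0, 1/4, 1/2, 1/4), 2, 2, dimnames = list(x = 0:1, y = 0:1))
u <- outer(0:1, 0:1)                      # values of xy on the same grid

# exact moments of U = XY
EU <- sum(u * f)
VarU <- sum(u^2 * f) - EU^2

# ingredients of the first-order approximation
mx <- sum(0:1 * rowSums(f))               # E[X]
my <- sum(0:1 * colSums(f))               # E[Y]
vx <- mx * (1 - mx)                       # Var[X], X is Bernoulli
vy <- my * (1 - my)                       # Var[Y], Y is Bernoulli
cxy <- EU - mx * my                       # Cov(X,Y)
var_delta <- my^2 * vx + mx^2 * vy + 2 * mx * my * cxy

c(exact = VarU, delta = var_delta)
```

The exact value matches the simulation, while the approximation is off by about a factor of two, so the first-order expansion is not reliable for this example.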

Problem 4

Let \(X_i\), \(i=1,2,\dots\), be a sequence of independent and identically distributed random variables with \(\mu=E[X_1]\) and \(\sigma^2=Var[X_1]<\infty\). Let \(Z_n=\frac{\sum_{i=1}^n X_i-n\mu}{\sqrt{n}\sigma}\); then according to the central limit theorem \(Z_n\) converges in distribution to a standard normal random variable. So we would expect \(P(Z_n\le x)\approx \Phi(x)\), where \(\Phi\) is the cdf of a standard normal. But how large does n have to be? Let’s look at an example:

Say \(X_1\sim Ber(p)\). Find N such that for any n>N

\[\lvert P(Z_n\le x)-\Phi(x)\rvert<0.01\]

for each of

p=0.5, x=0.3
p=0.1, x=0.3

Doing this analytically is quite difficult, so let’s use R. First note that \(\mu=p\) and \(\sigma^2=p(1-p)\) and \(\sum_{i=1}^n X_i\sim Bin(n,p)\). Now

\[ \begin{aligned} &P(Z_n\le x) = P(\frac{\sum_{i=1}^n X_i-n\mu}{\sqrt{n}\sigma}\le x) = \\ &P(\frac{\sum_{i=1}^n X_i-np}{\sqrt{np(1-p)}}\le x) = \\ &P(\sum_{i=1}^n X_i \le np+\sqrt{np(1-p)}x) = \\ &\text{pbinom}(np+\sqrt{np(1-p)}x;n,p) \end{aligned} \]

so:

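hw5p4 <- function(p,x=0.3) {
   phi=pnorm(x)                              # limiting value Phi(x)
   n=100:10000                               # sample sizes to check
   pz=pbinom(n*p+sqrt(n*p*(1-p))*x, n, p)    # exact P(Z_n <= x)
   y=abs(pz-phi)
   plot(n, y, pch=".")
   abline(h=0.01)
   max(n[y>0.01])                            # largest n in the range that still exceeds 0.01
}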
hw5p4(0.5)

## [1] 1351

hw5p4(0.1)

## [1] 6050

So N=1351 if p=0.5 and N=6050 if p=0.1.

Comments:

As you can see in the graph, for many much smaller n we already have \(\lvert P(Z_n\le x)-\Phi(x)\rvert<0.01\), purely by accident. What we want, however, is the n such that \(\lvert P(Z_m\le x)-\Phi(x)\rvert<0.01\) for all \(m\ge n\).
My solution only checks n up to 10000, so there is no absolute guarantee that there is no much larger n for which we suddenly again have \(\lvert P(Z_n\le x)-\Phi(x)\rvert>0.01\), but it does seem unlikely!
There is actually an analytic result for this question, the so-called Berry-Esseen theorem.
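
To see what such a bound gives here, recall that the Berry-Esseen theorem states \(\sup_x \lvert P(Z_n\le x)-\Phi(x)\rvert \le \frac{C\rho}{\sigma^3\sqrt n}\) with \(\rho=E\lvert X_1-\mu\rvert^3\). For a Bernoulli variable \(\rho=p(1-p)(p^2+(1-p)^2)\). The sketch below solves for the smallest n at which the bound drops below 0.01. The function name `be_n` is illustrative, and the value C = 0.4748 is an assumption on my part (a published upper bound for the constant in the iid case), so treat the numbers as indicative only.

```r
# Smallest n for which the Berry-Esseen bound C*rho/(sigma^3*sqrt(n)) <= eps,
# for X ~ Ber(p). C = 0.4748 is an assumed value of the constant.
be_n <- function(p, eps = 0.01, C = 0.4748) {
  sigma <- sqrt(p * (1 - p))
  rho <- p * (1 - p) * (p^2 + (1 - p)^2)   # E|X - p|^3 for a Bernoulli
  ceiling((C * rho / (sigma^3 * eps))^2)
}
c(be_n(0.5), be_n(0.1))
```

Because the bound holds for every x at once, it is conservative: the values it returns are larger than the N found numerically for x=0.3, but they do guarantee that the error stays below 0.01 for all larger n.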