Probability Theory

Introduction

For a detailed discussion of probability theory go to http://academic.uprm.edu/wrolke/esma6600

We are not going to do a detailed review of the theory of probability. Instead we are going to go through several examples that illustrate the kinds of calculations you should know how to do.

Example (2.1.1)

Say we have a random variable X with density \(f(x)=c/x^{a+1}, x>1, a>0\)

  1. Find c

\[1=\int_{-\infty}^{\infty} f(x)dx = \int_1^{\infty} \frac{c}{x^{a+1}}dx=\frac{-c}{a x^{a}}\vert_1^{\infty}=\frac{c}{a}\]

so \(c=a\).

  2. Find E[X] and var(X)

\[ \begin{aligned} &E[X^k] = \int_{-\infty}^{\infty}x^k f(x)dx =\\ &\int_1^{\infty} x^k\frac{a}{x^{a+1}}dx=\\ &\int_1^{\infty} \frac{a}{x^{a-k+1}}dx = \\ &\frac{-a}{(a-k) x^{a-k}}\vert_1^{\infty}= \\ &\frac{a}{a-k} \\ \end{aligned} \]

if \(a>k\) and \(\infty\) otherwise.

So

\[ \begin{aligned} &E[X] = \left\{ \begin{array}{cc} \frac{a}{a-1} & a>1\\\infty&a\le 1\end{array}\right.\\ &var(X) =E[X^2]-(E[X])^2 = \\ &\left\{ \begin{array}{cc} \frac{a}{a-2}-(\frac{a}{a-1})^2 & a>2\\\infty & a\le 2\end{array}\right.=\\ &\left\{ \begin{array}{cc} \frac{a}{(a-2)(a-1)^2} & a>2\\\infty & a\le 2\end{array}\right. \end{aligned} \]
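These formulas are easy to check numerically. A quick sanity check in R, for the (arbitrarily chosen) value a = 3, where both mean and variance are finite:

```r
# density f(x) = a/x^(a+1), x > 1, here with a = 3
a <- 3
f <- function(x) a/x^(a+1)
# first two moments by numerical integration
EX  <- integrate(function(x) x*f(x),   1, Inf)$value
EX2 <- integrate(function(x) x^2*f(x), 1, Inf)$value
c(EX, a/(a-1))                     # both 1.5
c(EX2 - EX^2, a/((a-2)*(a-1)^2))   # both 0.75
```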

  3. Let \(Y=a\log X\). Find the density of Y.

Notice if \(x>1\), \(y=a\log x>0\), so

\[ \begin{aligned} &F_Y(y) =P(Y<y) =P(a\log X<y)=\\ &P(X<e^{y/a}) =\int_1^{e^{y/a}} a/x^{a+1} dx = \\ &-x^{-a}|_1^{e^{y/a}} = 1-e^{-y}\\ \end{aligned} \]

so \(Y\sim Exp(1)\)
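This is also easy to verify by simulation: since \(F(x)=1-x^{-a}\), X can be generated by the inverse transform method. A sketch in base R, with an arbitrary choice a = 2.5:

```r
set.seed(1)
a <- 2.5
u <- runif(1e5)
x <- u^(-1/a)   # inverse transform: F(x) = 1 - x^(-a); U and 1-U have the same distribution
y <- a*log(x)
# mean and sd should both be close to 1, the mean and sd of Exp(1)
c(mean(y), sd(y))
```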

Example (2.1.2)

Say we have a discrete random vector (X,Y) with joint pmf

x\y    0    1    2
  0  0.1  0.1  0.2
  1  0.0  0.3  0.3
  1. Find Cor(X,Y)

Cor(X,Y) = Cov(X,Y)/(sd(X)sd(Y))

Cov(X,Y) = E[XY]-E[X]E[Y]

E[XY] = 0*0*0.1+0*1*0.1+…+1*2*0.3 = 0.3+2*0.3 = 0.9

x\y    0    1    2   fX
  0  0.1  0.1  0.2  0.4
  1  0.0  0.3  0.3  0.6
 fY  0.1  0.4  0.5  1.0

E[X] = 0*0.4+1*0.6 = 0.6
E[Y] = 0*0.1+1*0.4+2*0.5 = 1.4

cov(X,Y) = E[XY]-E[X]E[Y] = 0.9-0.6*1.4 = 0.06

\(E[X^2] = 0^2*0.4+1^2*0.6 = 0.6\)
\(var(X) = E[X^2]-(E[X])^2 = 0.6-0.6^2 = 0.24\)
sd(X)=\(\sqrt{var(X)} = \sqrt{0.24} = 0.489\)

\(E[Y^2] = 0^2*0.1+1^2*0.4+2^2*0.5 = 2.4\)
\(var(Y) = E[Y^2]-(E[Y])^2 = 2.4-1.4^2 = 0.44\)
sd(Y)=\(\sqrt{var(Y)} = \sqrt{0.44} = 0.663\)

cor(X,Y) = cov(X,Y)/(sd(X)sd(Y)) = 0.06/(0.489*0.663) = 0.185
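All of these computations can also be done in R directly from the joint table:

```r
# joint pmf of (X, Y): rows are x = 0,1, columns are y = 0,1,2
f <- rbind(c(0.1, 0.1, 0.2),
           c(0.0, 0.3, 0.3))
x <- 0:1; y <- 0:2
fx <- rowSums(f); fy <- colSums(f)          # marginals
EX <- sum(x*fx); EY <- sum(y*fy)
EXY <- sum(outer(x, y)*f)                   # E[XY] over the whole table
covXY <- EXY - EX*EY
sdX <- sqrt(sum(x^2*fx) - EX^2)
sdY <- sqrt(sum(y^2*fy) - EY^2)
covXY/(sdX*sdY)   # about 0.185
```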

  2. Are X and Y independent?

No, because \(cov(X,Y)\ne 0\)

or

\(f(0,0) = 0.1 \ne f_X(0)f_Y(0)= 0.4\times0.1=0.04\)

  3. Find E[X|Y=2]

\[ \begin{aligned} &E[X|Y=2] = \sum_x x f_{X|Y=2}(x|2) \\ &f_{X|Y=2}(x|2) = f(x,2)/f_Y(2) \\ &f_{X|Y=2}(0|2) = f(0,2)/f_Y(2) = 0.2/0.5 = 0.4 \\ &f_{X|Y=2}(1|2) = f(1,2)/f_Y(2) = 0.3/0.5 = 0.6 \end{aligned} \]

x  P(X=x|Y=2)
0  0.4
1  0.6

E[X|Y=2] = 0*0.4+1*0.6 = 0.6

Example (2.1.3)

Say we have independent random variables \(X,Y\sim U[0,1]\). Find the density of Z=X+Y

Solution 1:

\[ \begin{aligned} &F_{X+Y}(z) = P(X+Y\le z) =\\ &\int_{-\infty}^\infty P(X+Y\le z|Y=y)f_Y(y)dy = \\ &\int_{-\infty}^\infty P(X\le z-y|Y=y)f_Y(y)dy = \\ &\int_{-\infty}^\infty F_{X|Y=y}(z-y|y)f_Y(y)dy \\ &f_Z(z) = \frac{d}{dz} F_Z(z) =\\ &\frac{d}{dz} \int_{-\infty}^\infty F_{X|Y=y}(z-y|y)f_Y(y)dy =\\ &\int_{-\infty}^\infty \frac{d}{dz} F_{X|Y=y}(z-y|y)f_Y(y)dy = \\ &\int_{-\infty}^\infty f_{X|Y=y}(z-y|y)f_Y(y)dy = \\ &\int_{-\infty}^\infty f_{X}(z-y)f_Y(y)dy \\ \end{aligned} \] Now \(f_X(x)=0\) if \(x<0\) or \(x>1\), so \(f_X(z-y)=0\) if \(y>z\) or \(y<z-1\), so

\[ \begin{aligned} &\int_{-\infty}^\infty f_{X}(z-y)f_Y(y)dy = \\ &\left\{\begin{array}{cc} \int_0^z 1dy & 0<z<1 \\ \int_{z-1}^1 1dy & 1<z<2 \end{array}\right. = \\ &\left\{\begin{array}{cc} z & 0<z<1 \\ 2-z & 1<z<2 \end{array}\right. \end{aligned} \]

Solution 2:

We will consider the transformation \(u=g_1(x,y)=x+y\), \(v=g_2(x,y)=x-y\). The inverse transform is given by \(x=h_1(u,v)=(u+v)/2\), \(y=h_2(u,v)=(u-v)/2\). The Jacobian is

\[ J= \begin{vmatrix} \frac{dx}{du} & \frac{dx}{dv} \\ \frac{dy}{du} & \frac{dy}{dv} \end{vmatrix}= \begin{vmatrix} \frac12 & \frac12 \\ \frac12 & -\frac12 \\ \end{vmatrix} =\frac12\left(-\frac12\right)-\frac12\cdot\frac12=-\frac12 \] and the change of variable formula from calculus yields

\[f_{uv}(u,v)=f_{xy}(h_1(u,v), h_2(u,v))\vert J\vert=1\cdot 1\cdot\frac12=\frac12\] for u, v with

\[ \begin{aligned} &0<u<2\\ &-1<v<1 \\ &0<(u+v)/2<1\rightarrow -u<v<2-u \\ &0<(u-v)/2<1\rightarrow u-2<v<u \\ \end{aligned} \]

Here is a figure of this area:

For the density of U=X+Y we need to find the marginal:

\[ \begin{aligned} &f_{X+Y}(u) =\int_{-\infty}^\infty f_{uv}(u, v)dv =\\ &\left\{ \begin{array}{ccc} \int_{-u}^u\frac12 dv&\text{if}&0<u<1\\ \int_{u-2}^{2-u}\frac12 dv&\text{if}&1<u<2\\ \end{array} \right. = \\ &\text{ }\\ &\left\{ \begin{array}{ccc} \frac12v\vert_{-u}^u &\text{if}&0<u<1\\ \frac12v\vert_{u-2}^{2-u}&\text{if}&1<u<2\\ \end{array} \right. = \\ &\text{ }\\ &\left\{ \begin{array}{ccc} u &\text{if}&0<u<1\\ 2-u&\text{if}&1<u<2\\ \end{array} \right. \end{aligned} \]
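A quick simulation confirms the triangular density:

```r
set.seed(1)
z <- runif(1e5) + runif(1e5)
# under the triangular density, P(Z <= 0.5) = 0.5^2/2 = 0.125
c(mean(z <= 0.5), 0.125)
# histogram with the triangular density overlaid
hist(z, breaks = 50, freq = FALSE, main = "sum of two U[0,1]")
curve(ifelse(x < 1, x, 2 - x), 0, 2, add = TRUE, lwd = 2)
```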

Example (2.1.4)

Let \(X_1\),..,\(X_n\) be a random sample from \(U\{1,..,N\}\) for some \(N>1\). Let \(M=\max\{X_i\}\). Show that \(M\rightarrow N\) in probability.

\(M\rightarrow N\)

iff

for all \(\epsilon>0\) \(P(|M-N|>\epsilon)\rightarrow 0\)

Now

\[ \begin{aligned} &P(M\le m) =P(X_1\le m, .., X_n\le m)= \\ &P(X_1\le m)^n =(\frac{m}{N})^n, 1\le m \le N \\ &P(|M-N|>\epsilon) =1-P(N-\epsilon\le M\le N+\epsilon)= \\ &1-P(N-\epsilon\le M)\text{ (since }M\le N\text{)}= P(M<N-\epsilon) = \\ &P(M\le \lfloor N-\epsilon\rfloor) = \\ &(\frac{\lfloor N-\epsilon\rfloor}{N})^n\rightarrow 0 \end{aligned} \]

because \(\lfloor N-\epsilon\rfloor<N\).
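We can illustrate the convergence with a small simulation, here with the arbitrary choice N = 20 (note that for \(0<\epsilon<1\), \(P(|M-N|>\epsilon)=P(M<N)\)):

```r
set.seed(1)
N <- 20
p.not.N <- function(n, B = 1e4) {
  # estimate P(M < N) for a sample of size n from U{1,..,N}
  m <- replicate(B, max(sample(1:N, n, replace = TRUE)))
  mean(m < N)
}
# exact values are ((N-1)/N)^n, going to 0 as n grows
round(sapply(c(10, 50, 100, 200), p.not.N), 4)
```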

Example (2.1.5)

Say (X, Y) has joint density proportional to \(g(x,y)=x+y, 0<x,y<1\). Find the distribution function of Z=E[Y|X].

Here are all the definitions we will need. Note that X and Y are continuous random variables, so

\[ \begin{aligned} &E[Y|X=x] =\int yf_{Y|X=x}(y|x)dy \\ &f_{Y|X=x}(y|x) =\frac{f(x,y)}{f_X(x)} \\ &f_X(x) = \int f(x,y) dy \\ &f_Y(y) = \int f(x,y) dx \end{aligned} \]

Now

\[ \begin{aligned} &f_X(x) =\int_0^1 c(x+y) dy = c(xy+y^2/2)|_0^1 = c(x+1/2) \\ &1=\int f_X(x)dx =\int_0^1 c(x+1/2) dx =\\ &c(x^2/2+x/2)|_0^1 =c(1/2+1/2);c=1 \\ &f_X(x) =x+1/2, 0<x<1 \\ &f_Y(y) =y+1/2,0<y<1\text{ by symmetry}\\ &f_{Y|X=x} =\frac{f(x,y)}{f_X(x)} =\frac{x+y}{x+1/2} \\ &E[Y|X=x] =\int yf_{Y|X=x}(y|x)dy =\\ &\int_0^1 y\frac{x+y}{x+1/2}dy = \frac{xy^2/2+y^3/3}{x+1/2}|_0^1 = \\ &\frac{x/2+1/3}{x+1/2}\\ &Z=E[Y|X]=\frac{X/2+1/3}{X+1/2} \end{aligned} \]

What values does Z take? Let’s see:

curve((x/2+1/3)/(x+1/2), 0, 1)

so \(5/9<z<2/3\). Now

\[ \begin{aligned} &F_Z(z)=P(Z<z)=P(\frac{X/2+1/3}{X+1/2}<z) = \\ &P(X/2+1/3<zX+z/2) = \\ &P(X(1/2-z)<z/2-1/3) =\\ &P(X>\frac{z-2/3}{1-2z}) = \text{ (because } z>1/2\text{)}\\ &1-\int_0^{\frac{z-2/3}{1-2z}} (x+1/2) dx =\\ &1-(x^2/2+x/2)\vert^{\frac{z-2/3}{1-2z}}_0=\\ &1-\left(\left(\frac{z-2/3}{1-2z}\right)^2/2+\left(\frac{z-2/3}{1-2z}\right)/2\right) = \\ &1-\frac{(z-2/3)^2+(z-2/3)(1-2z)}{2(1-2z)^2}=\\ &1+\frac{z^2-z+2/9}{2(1-2z)^2}\\ &\end{aligned} \]
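We can check this by simulation: X can be generated by inverting \(F_X(x)=(x^2+x)/2\), which gives \(x=(-1+\sqrt{1+8u})/2\):

```r
set.seed(1)
u <- runif(1e5)
x <- (-1 + sqrt(1 + 8*u))/2        # solves (x^2 + x)/2 = u
z <- (x/2 + 1/3)/(x + 1/2)
# cdf of Z written via a = (z-2/3)/(1-2z): F_Z(z) = 1 - (a^2 + a)/2
Fz <- function(z) {a <- (z - 2/3)/(1 - 2*z); 1 - (a^2 + a)/2}
c(mean(z <= 0.6), Fz(0.6))         # both should be close to 7/9
```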

Example (2.1.6)

Say \(X\sim Pois(\lambda)\), \(N=X+1\) and \(Y|N=n\sim Beta(n,1)\)

  1. Find E[Y]

\[ \begin{aligned} &Y\vert X=n \sim Beta(n+1, 1) \\ &E[Y] =E\{E[Y|X]\} \\ &E[Y|X=n] =\frac{n+1}{n+2} \\ &E\{E[Y|X]\}=\sum_{n=0}^\infty \frac{n+1}{n+2}\frac{\lambda^n}{n!}e^{-\lambda}=\\ &\sum_{n=0}^\infty \frac{(n+1)^2}{\lambda^2}\frac{\lambda^{n+2}}{(n+2)!}e^{-\lambda}=\\ &\frac1{\lambda^2}\sum_{k=2}^\infty (k-1)^2\frac{\lambda^{k}}{k!}e^{-\lambda}=\\ &\frac1{\lambda^2}\left[\sum_{k=0}^\infty (k-1)^2\frac{\lambda^{k}}{k!}e^{-\lambda}-e^{-\lambda}\right]=\\ &\frac1{\lambda^2}\left[\sum_{k=0}^\infty (k^2-2k+1)\frac{\lambda^{k}}{k!}e^{-\lambda}-e^{-\lambda}\right]=\\ &\frac1{\lambda^2}\left[var(X)+E[X]^2-2E[X]+1-e^{-\lambda}\right]=\\ &\frac1{\lambda^2}\left[\lambda+\lambda^2-2\lambda+1-e^{-\lambda}\right]=\\ &\frac1{\lambda^2}\left[\lambda^2-\lambda+1-e^{-\lambda}\right] \end{aligned} \]
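A simulation check of this formula, with the arbitrary choice \(\lambda=2\):

```r
set.seed(1)
lambda <- 2
x <- rpois(1e5, lambda)
y <- rbeta(1e5, x + 1, 1)   # Y | X = n ~ Beta(n+1, 1)
# compare the sample mean with (lambda^2 - lambda + 1 - exp(-lambda))/lambda^2
c(mean(y), (lambda^2 - lambda + 1 - exp(-lambda))/lambda^2)
```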

  2. Find E[X|Y=y]

\[ \begin{aligned} &P(N=n) =P(X=n-1) =\frac{\lambda^{n-1}}{(n-1)!}e^{-\lambda};n=1,2,.. \\ &f(y,n) = f_N(n)f_{Y|N=n}(y|n) = \frac{\lambda^{n-1}}{(n-1)!}e^{-\lambda} ny^{n-1}=\\ &\frac{n(\lambda y)^{n-1}}{(n-1)!}e^{-\lambda};0<y<1;n=1,2.. \end{aligned} \]

\[ \begin{aligned} &f_Y(y) =\sum_{n=1}^\infty \frac{n(\lambda y)^{n-1}}{(n-1)!}e^{-\lambda} =\\ &e^{\lambda y-\lambda}\sum_{n=1}^\infty [(n-1)+1]\frac{(\lambda y)^{n-1}}{(n-1)!}e^{-\lambda y} = \\ &e^{\lambda y-\lambda}\sum_{k=0}^\infty [k+1]\frac{(\lambda y)^{k}}{k!}e^{-\lambda y} = \\ &e^{\lambda (y-1)}\left[E[A]+1\right] =e^{\lambda y-\lambda}[\lambda y+1] \\ \end{aligned} \] where \(A\sim Pois(\lambda y)\).

Now

\[ \begin{aligned} &f_{N|Y=y}(n|y) = \frac{\frac{n(\lambda y)^{n-1}}{(n-1)!}e^{-\lambda}}{e^{\lambda y-\lambda}[\lambda y+1] } = \\ &\frac{n(\lambda y)^{n-1}}{(n-1)!(\lambda y+1)}e^{-\lambda y} \\ &E[N|Y=y] =\sum_{n=1}^\infty n \frac{n(\lambda y)^{n-1}}{(n-1)!(\lambda y+1)}e^{-\lambda y} =\\ &\frac1{\lambda y+1}\sum_{n=1}^\infty n^2 \frac{(\lambda y)^{n-1}}{(n-1)!}e^{-\lambda y} =\\ &\frac1{\lambda y+1}\sum_{k=0}^\infty (k+1)^2 \frac{(\lambda y)^{k}}{k!}e^{-\lambda y} =\\ &\frac1{\lambda y+1}E\left[(A+1)^2\right] =\\ &\frac1{\lambda y+1}\left[E[A^2]+2E[A]+1\right] =\\ &\frac1{\lambda y+1}\left[var(A)+E[A]^2+2E[A]+1\right] =\\ &\frac1{\lambda y+1}\left[\lambda y+(\lambda y)^2+2\lambda y+1\right] =\\ &\frac{\lambda^2 y^2+3\lambda y+1}{\lambda y+1} \end{aligned} \] and finally

\[E[X|Y=y] = E[N-1|Y=y] = E[N|Y=y] - 1=\lambda y\frac{\lambda y+2}{\lambda y+1}\]
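The conditional pmf and conditional mean can be checked numerically by truncating the sum over n, here for the arbitrary values \(\lambda=2\), y = 0.7:

```r
lambda <- 2; y <- 0.7   # arbitrary values for the check
n <- 1:100
# conditional pmf f_{N|Y=y}(n|y)
fn <- n*(lambda*y)^(n - 1)/factorial(n - 1)/(lambda*y + 1)*exp(-lambda*y)
sum(fn)   # should be 1
# E[X|Y=y] = E[N-1|Y=y], compared with lambda*y*(lambda*y + 2)/(lambda*y + 1)
c(sum((n - 1)*fn), lambda*y*(lambda*y + 2)/(lambda*y + 1))
```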

Example (2.1.7)

Say \(X_1,..,X_{10}\) are iid \(N(10, 3)\), that is, with mean 10 and standard deviation 3. Find \(P(\sum X_i>110)\).

The sum of independent normal random variables is again normal. Also

\[ \begin{aligned} &E[\sum X_i] = \sum E[X_i] = 10\times 10 = 100\\ &Var[\sum X_i] = \sum Var[X_i] = 10\times 3^2 = 90 \\ &\frac{\sum X_i-100}{\sqrt{90}} \sim N(0,1) \\ &P(\sum X_i>110) = \\ &P(\frac{\sum X_i-100}{\sqrt{90}}>\frac{110-100}{\sqrt{90}}) =\\ &P(Z>\frac{10}{\sqrt{90}}) = 1- \Phi(\frac{\sqrt{10}}{3}) \end{aligned} \]

1-pnorm(sqrt(10)/3)
## [1] 0.1459203

Example (2.1.8)

Say \(X_1,..,X_{10}\) are iid with \(E[X_1]=10\) and \(sd(X_1)=3\). Find \(P(\sum X_i>110)\).

By the central limit theorem \(\sum X_i\) is again approximately \(N(100, \sqrt{90})\), and so again \(P(\sum X_i>110)\approx 0.146\).

Example (2.1.9)

Say \(X_1,..,X_n\sim Geom(p)\), iid. Let \(T= \sum X_i\). Are the population mean and median of T the same?

First the population mean of T:

\[E[T]=E[\sum_{i=1}^n X_i]=\sum_{i=1}^n E[X_i]=n/p\]

Now the population median is defined as follows: say \(T\sim F\); then M is such that \(P(T \le M)=0.5\) (for a discrete distribution one takes the smallest M with \(P(T\le M)\ge 0.5\)).

Let’s try first the case n=1:

\[ \begin{aligned} &\frac12=P(X_1\le M) = \sum_{k=1}^M p(1-p)^{k-1} = \\ &p\sum_{j=0}^{M-1} (1-p)^{j} = p\frac{1-(1-p)^M}{1-(1-p)} = 1-(1-p)^M\\ &M =\frac{\log(1/2)}{\log(1-p)} = - \frac{\log2}{\log(1-p)}\\ \end{aligned} \]

for example if p=0.1 we find \(M=6.58 < 10=1/p\), so the median is not equal to the mean.
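In R (note that R's geometric distribution counts the failures before the first success, so it starts at 0 rather than 1, hence the +1):

```r
p <- 0.1
log(0.5)/log(1 - p)   # continuous solution M = 6.58
qgeom(0.5, p) + 1     # integer median of X1: 7
1/p                   # the mean: 10
```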

How about the case n=2? First we need to find the distribution of \(X_1+X_2\):

\[ \begin{aligned} &P(X_1+X_2=k) = \sum_{i=1}^{k-1} P(X_1+X_2=k\vert X_2=i)P(X_2=i)=\\ &\sum_{i=1}^{k-1} P(X_1=k-i)P(X_2=i) = \\ &\sum_{i=1}^{k-1} p(1-p)^{k-i-1}p(1-p)^{i-1} =(k-1)p^2(1-p)^{k-2} \\ &k=2,3,.. \end{aligned} \] and we can use R to find M:

find.median <- function (p)
{
  dgeom2 <- function(k,p) {(k-1)*p^2*(1-p)^(k-2)}
  M <- 1
  F <- 0
  repeat {
      M <- M+1
      F <- F+dgeom2(M,p)
      if(F>=0.5) break
  }
  M
}
find.median(0.1)
## [1] 17

What about n=3?

\[ \begin{aligned} &P(X_1+X_2+X_3=k) = \sum_{i=1}^{k-1} P(X_1+X_2+X_3=k\vert X_3=i)P(X_3=i)=\\ &\sum_{i=1}^{k-1} P(X_1+X_2=k-i)P(X_3=i) = \\ &\sum_{i=1}^{k-1} (k-i-1)p^2(1-p)^{k-i-2}p(1-p)^{i-1} =\\ &p^3(1-p)^{k-3}\sum_{i=1}^{k-1} (k-1-i) =\\ &p^3(1-p)^{k-3}\sum_{j=0}^{k-2} j =\\ &p^3(1-p)^{k-3}\frac{(k-2)(k-1)}{2}\\ &k=3,4,.. \end{aligned} \] and we can use R to find M:
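As a check, the sum of three iid geometrics is negative binomial, which R provides (again in the shifted, failure-counting parameterization):

```r
p <- 0.1; k <- 3:50
f3 <- (k - 1)*(k - 2)/2*p^3*(1 - p)^(k - 3)
# dnbinom(x, size, prob) counts the failures before the size-th success,
# so T = X1 + X2 + X3 corresponds to x = k - 3
max(abs(f3 - dnbinom(k - 3, 3, p)))   # essentially 0
```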

This still works but gets worse for more terms. Alternatively we can use simulation:

find.median <- function (n, p, B=10000) 
{
    x<-rep(0,B)
    for(i in 1:n) x <- x+rgeom(B,p)+1 # R's rgeom starts at 0, so add 1
    return(quantile(x,0.5))
}
find.median(20, 0.1)
## 50% 
## 197

and this works for any reasonably small number n. 

Finally we have another solution if n is very large: let's use the Central Limit Theorem. Recall that \(var(X_1)=(1-p)/p^2\), so

\[ \begin{aligned} &\frac12=P(T\le M) = \\ &P\left(\sqrt{n}\frac{T/n-E[X_1]}{sd(X_1)}\le \sqrt{n}\frac{M/n-1/p}{\sqrt{(1-p)/p^2}} \right) \approx \Phi(\sqrt{n}\frac{pM/n-1}{\sqrt{1-p}})\\ &\sqrt{n}\frac{pM/n-1}{\sqrt{1-p}} =\Phi^{-1}(1/2)=0\\ &pM/n-1 = 0\\ &M=n/p \end{aligned} \]

and we see that for large n they are indeed the same. Of course that raises the question how large n has to be for this to work.
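One way to get a feel for this is to compare simulated medians with n/p for several n (a sketch with p = 0.1; again rgeom counts failures, hence the +1):

```r
set.seed(1)
p <- 0.1; B <- 5e3
for(n in c(10, 100, 1000)) {
  # B simulated values of T = sum of n geometrics
  x <- colSums(matrix(rgeom(n*B, p) + 1, nrow = n))
  cat("n =", n, " median =", median(x), " n/p =", n/p, "\n")
}
```

For small n the median falls noticeably below n/p; by n = 1000 the two essentially agree.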