Expectation

Expectation of a Random Variable

Definition (3.4.1)

The expectation (or expected value) of \(g(X)\), where X is a random variable with density (or pmf) \(f\), is defined by

\[ \begin{aligned} &E[g(X)]=\sum_x g(x)f(x) \text{ if X is discrete} \\ &E[g(X)]=\int_{- \infty}^\infty g(x)f(x) dx \text{ if X is continuous} \\ \end{aligned} \]

Example (3.4.2)

We roll a fair die until the first time we get a six. What is the expected number of rolls?

We saw that \(f(x) = \frac16 \left(\frac56\right)^{x-1}\) for \(x=1,2,...\)

Here we just have g(x)=x, so

\[ E[X]=\sum_{i=1}^\infty g(x_i)f(x_i) = \sum_{i=1}^\infty i \frac16 (\frac56)^{i-1} \]

How do we compute this sum? Here is a “standard” trick:

\[ \begin{aligned} &\sum_{k=1}^\infty kt^{k-1} = \\ &\sum_{k=1}^\infty \frac{d t^{k}}{dt} = \\ &\frac{d}{dt}\sum_{k=1}^\infty t^k = \\ &\frac{d}{dt}\left[\sum_{k=0}^\infty t^k -1\right] = \\ &\frac{d}{dt}\left[\frac1{1-t} -1\right] = \\ &\frac1{(1-t)^2} \end{aligned} \]

and so we find

\[E[X]=\frac16\frac1{(1-5/6)^2}=6\]
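Both the series trick and the final result are easy to check numerically. Here is a quick sketch: the truncated sum should be close to \(1/(1-t)^2\), and simulated waiting times should have mean close to 6 (note that rgeom counts the failures before the first success, hence the +1):

k <- 1:1000; t <- 5/6
c(sum(k*t^(k-1)), 1/(1-t)^2)     # truncated series vs closed form; both about 36
x <- rgeom(1e5, 1/6) + 1         # simulated number of rolls until the first six
mean(x)                          # should be close to 6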

Example (3.4.3)

X is said to have a uniform [A,B] distribution if f(x)=1/(B-A) for A<x<B, 0 otherwise.

Find \(E[X^k]\) (this is called the k-th moment of X).

\[ \begin{aligned} &E[X^k] =\int_{-\infty}^\infty x^kf(x) dx= \\ &\int_A^B x^k\frac1{B-A}dx = \\ &\frac1{B-A} \frac{x^{k+1}}{k+1}\vert_A^B = \\ &\frac{B^{k+1}-A^{k+1}}{(k+1)(B-A)} = \\ &\frac{(B-A)\sum_{i=0}^k A^iB^{k-i}}{(k+1)(B-A)} = \\ &\frac{\sum_{i=0}^k A^iB^{k-i}}{k+1} \\ \end{aligned} \]
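As a quick numerical sanity check of this formula, here is a sketch for the (arbitrarily chosen) case A=1, B=3, k=2:

A <- 1; B <- 3; k <- 2
c(integrate(function(x) x^k/(B-A), A, B)$value,   # E[X^k] by numerical integration
  sum(A^(0:k)*B^(k:0))/(k+1))                     # the formula above; both equal 13/3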


Some special expectations are the mean of X, defined by \(\mu=E[X]\), and the variance, defined by \(\sigma^2=var(X)=E[(X-\mu)^2]\). Related to the variance is the standard deviation \(\sigma\), the square root of the variance.

Theorem (3.4.4)

  1. \(E[aX+b]=aE[X]+b\)
  2. \(E[X+Y]=E[X]+E[Y]\)
  3. \(var(aX+b)=a^2var(X)\)
  4. \(var(X)=E[X^2]-(E[X])^2\)

The last one is a useful formula for finding the variance and/or the standard deviation.
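Here is a small numerical illustration of items 3 and 4, using a fair die (just a sketch, with a=2 and b=3 chosen arbitrarily):

x <- 1:6; p <- rep(1/6, 6)                    # a fair die
EX <- sum(x*p); EX2 <- sum(x^2*p)
c(EX2 - EX^2, sum((x - EX)^2*p))              # item 4: both equal var(X) = 35/12
y <- 2*x + 3                                  # Y = 2X+3
c(sum(y^2*p) - sum(y*p)^2, 4*(EX2 - EX^2))    # item 3: var(2X+3) = 2^2 var(X)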

Example (3.4.5)

Find the mean and the standard deviation of a uniform [A,B] r.v.

\[ \begin{aligned} &E[X] = \frac{\sum_{i=0}^1 A^iB^{1-i}}{1+1} = \frac{A+B}2\\ &E[X^2] = \frac{\sum_{i=0}^2 A^iB^{2-i}}{2+1} = \frac{A^2+AB+B^2}3\\ &var(X) = \frac{A^2+AB+B^2}3 - \left( \frac{A+B}2\right)^2= \frac{(B-A)^2}{12}\\ \end{aligned} \]

and so \(\sigma=(B-A)/\sqrt{12}\)
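A quick simulation check, say for A=0 and B=2 (values chosen just for illustration):

x <- runif(1e5, 0, 2)
round(c(mean(x), 1), 3)            # mean should be close to (A+B)/2 = 1
round(c(sd(x), 2/sqrt(12)), 3)     # sd should be close to (B-A)/sqrt(12)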

Example (3.4.6)

Find the mean and the standard deviation of an exponential rv with rate \(\lambda\).

\[ \begin{aligned} &E[X^k] =\int_{-\infty}^\infty x^kf(x) dx= \\ &\int_{0}^\infty x^k \lambda e^{-\lambda x} dx = \\ &-x^k e^{-\lambda x}\vert_0^\infty-\int_{0}^\infty -kx^{k-1} e^{-\lambda x} dx = \\ &k\int_{0}^\infty x^{k-1} e^{-\lambda x} dx = \\ &\frac{k}{\lambda}\int_{0}^\infty x^{k-1} \lambda e^{-\lambda x} dx = \\ &\frac{k}{\lambda}E[X^{k-1}] \\ &\mu=E[X]=\frac{1}{\lambda}E[X^{0}]=\frac{1}{\lambda} \\ &E[X^2]=\frac{2}{\lambda}E[X]=\frac{2}{\lambda^2} \\ &var(X) = E[X^2]-(E[X])^2 = \\ &\frac{2}{\lambda^2} - (\frac{1}{\lambda})^2 = \frac{1}{\lambda^2} \end{aligned} \]

and so \(\sigma=\sqrt{var(X)}=1/\lambda\).
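Let's check by simulation, say with \(\lambda=2\) (an arbitrary illustration value):

x <- rexp(1e5, 2)
round(c(mean(x), sd(x)), 3)     # both should be close to 1/lambda = 0.5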


One way to “link” probabilities and expectations is via the indicator function \(I_A(x)\) defined as

\[ I_A(x)=\left\{ \begin{array}{ccc} 1 & \text{if } &x\in A \\ 0 & \text{if }&x\notin A \end{array} \right. \]

because with this we have, for a continuous r.v. X with density f:

\[E[I_A(X)]=\int_{-\infty}^\infty I_A(x)f(x)dx=\int_A f(x)dx=P(X \in A)\]
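Here is a short illustration (a sketch; the exponential rv and the set \(A=(1,\infty)\) are just example choices): for \(X\sim Exp(1)\) we have \(P(X>1)=e^{-1}\), and this probability is exactly the expectation of the indicator \(I_{(1,\infty)}(X)\):

x <- rexp(1e5, 1)
c(mean(x > 1), exp(-1))     # mean of the indicator vs the exact probability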

Expectations of Random Vectors

The definition of expectation easily generalizes to random vectors: for example, in the continuous bivariate case \(E[g(X,Y)]=\int\int g(x,y)f(x,y)\,dxdy\).

Example (3.4.7)

Let (X,Y) be a continuous random vector with joint density

\(f(x,y) = 8xy\), \(0\le x \le y \le 1\)

Find \(E[XY]\)

\[ \begin{aligned} &E[XY] = \int_{-\infty}^\infty \int_{-\infty}^\infty xyf(x,y) dxdy = \\ &\int_0^1 \int_0^y8xy xy dx dy = \\ &\int_0^1 \int_0^y8x^2y^2 dx dy = \\ &\int_0^1 8y^2 \left[x^3/3\vert_0^y\right] dy = \\ &\int_0^1 8y^2 \left[y^3/3\right] dy = \\ &\int_0^1 8y^5/3 dy = \\ &4y^6/9\vert_0^1 = 4/9 \end{aligned} \]
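We can also verify this by simulation. One option (a sketch, not part of the original notes) is accept-reject sampling: propose (x, y) uniformly on the unit square and accept with probability \(f(x,y)/8 = xy\) on the region \(x\le y\):

B <- 1e6
x <- runif(B); y <- runif(B)
keep <- (x <= y) & (runif(B) < x*y)    # accept with probability f(x,y)/8
x <- x[keep]; y <- y[keep]
c(mean(x*y), 4/9)                      # should be close to 4/9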

Covariance and Correlation

Definition (3.4.8)

The covariance of two r.v. X and Y is defined by

\[cov(X,Y)=E[(X-\mu_X)(Y-\mu_Y)]\]

The correlation of X and Y is defined by

\[cor(X,Y)=\frac{cov(X,Y)}{\sigma_X \sigma_Y}\]

Note cov(X,X) = var(X)

As with the variance we have a simpler formula for actual calculations:

Theorem (3.4.9)

\[cov(X,Y) = E(XY) - (EX)(EY)\]

Example (3.4.10)

Take the earlier example where X is the sum and Y the absolute value of the difference of two rolls of a die. What is the covariance of X and Y?

So we have

\[\mu_X = E[X] = 2*1/36 + 3*2/36 + ... + 12*1/36 = 7.0\\ \mu_Y = E[Y] = 0*6/36 + 1*10/36 + ... + 5*2/36 = 70/36\\ E[XY] = 0*2*1/36 + 1*2*0/36 + 2*2*0/36 + ... + 5*12*0/36 = 490/36\\ cov(X,Y) = E[XY]-E[X]E[Y] = 490/36 - 7.0*70/36 = 0\]
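A quick simulation check (a sketch):

B <- 1e5
d1 <- sample(1:6, B, replace = TRUE)
d2 <- sample(1:6, B, replace = TRUE)
x <- d1 + d2; y <- abs(d1 - d2)
round(mean(x*y) - mean(x)*mean(y), 3)    # should be close to 0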

Note that we previously saw that X and Y are not independent, so here we have an example showing that a covariance of 0 does not imply independence! It does work the other way around, though:

Theorem (3.4.11)

If X and Y are independent, then \(cov(X,Y) = cor(X,Y)=0\).

proof (in the case of X and Y continuous):

\[ \begin{aligned} &E[XY] = \iint_{R^2} xyf(x,y) d(x,y) = \\ &\int_{-\infty}^\infty \int_{-\infty}^\infty xy f(x,y)dxdy = \\ &\int_{-\infty}^\infty \int_{-\infty}^\infty xy f_X(x)f_Y(y)dxdy = \\ &\int_{-\infty}^\infty yf_Y(y) \left(\int_{-\infty}^\infty x f_X(x)dx\right)dy = \\ & \left(\int_{-\infty}^\infty x f_X(x)dx \right)\left(\int_{-\infty}^\infty y f_Y(y)dy \right)= \\ &E[X]E[Y] \end{aligned} \]

and so cov(X,Y) = EXY-EXEY = EXEY - EXEY = 0

Example (3.4.12)

We have continuous rv’s X and Y with joint density

\[f(x,y)=8xy\text{, } 0 \le x<y \le 1\]

Find the covariance and the correlation of X and Y.

\[ \begin{aligned} &E[X^kY^j] = \int_{-\infty}^\infty \int_{-\infty}^\infty x^ky^jf(x,y) dxdy = \\ &\int_0^1 \int_0^y8x^ky^j xy dx dy = \\ &\int_0^1 \int_0^y8x^{k+1}y^{j+1} dx dy = \\ &\int_0^1 8y^{j+1} \left[x^{k+2}/(k+2)\vert_0^y\right] dy = \\ &\int_0^1 8y^{j+1} \left[y^{k+2}/(k+2)\right] dy = \\ &\int_0^1 8y^{j+k+3}/(k+2) dy = \\ &8y^{j+k+4}/(k+2)/(j+k+4)\vert_0^1 = \\ &\frac8{(k+2)(j+k+4)} \end{aligned} \]

therefore

\[ \begin{aligned} &E[X] = E[X^1Y^0]=\frac8{(1+2)(1+0+4)} = \frac{8}{15}\\ &E[X^2] = E[X^2Y^0]=\frac8{(2+2)(2+0+4)} = \frac{1}{3}\\ &var(X) = E[X^2]-(E[X])^2 = \frac{1}{3} - \left(\frac{8}{15}\right)^2 = \frac{11}{225}\\ &E[Y] = E[X^0Y^1]=\frac8{(0+2)(0+1+4)} = \frac{4}{5}\\ &E[Y^2] = E[X^0Y^2]=\frac8{(0+2)(0+2+4)} = \frac{2}{3}\\ &var(Y) = E[Y^2]-(E[Y])^2 = \frac{2}{3} - \left(\frac{4}{5}\right)^2 = \frac{2}{75}\\ &E[XY] = E[X^1Y^1]=\frac8{(1+2)(1+1+4)} = \frac{4}{9}\\ &cov(X,Y) = E[XY]-E[X]E[Y] = \frac{4}{9}-\frac{8}{15}\cdot\frac{4}{5}=\frac{4}{225} \\ &cor(X,Y) = \frac{cov(X,Y)}{\sqrt{var(X)var(Y)}} = 0.492 \end{aligned} \]
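Checking by simulation, reusing the accept-reject sampler sketched in Example (3.4.7):

B <- 1e6
x <- runif(B); y <- runif(B)
keep <- (x <= y) & (runif(B) < x*y)      # accept with probability f(x,y)/8
x <- x[keep]; y <- y[keep]
round(c(cov(x, y), 4/225), 3)            # both close to 0.018
round(c(cor(x, y), 0.492), 3)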


We saw above that E[X+Y] = E[X] + E[Y]. How about var(X+Y)?

Theorem (3.4.13)

\[var(X+Y)=var(X)+var(Y)+2cov(X,Y)\]

If \(X \perp Y\), we have var(X+Y) = var(X) + var(Y), because then cov(X,Y) = 0 by theorem (3.4.11).

proof

\[ \begin{aligned} &var(X+Y) = E[(X+Y)^2]-(E[X+Y])^2 = \\ &E[X^2+2XY+Y^2]-(E[X]^2+2E[X]E[Y] +E[Y]^2) = \\ &E[X^2]+2E[XY]+E[Y^2]-E[X]^2-2E[X]E[Y] -E[Y]^2 = \\ &(E[X^2]-E[X]^2)+(E[Y^2]-E[Y]^2)+ 2\left(E[XY]-E[X]E[Y]\right) = \\ &var(X)+var(Y)+2cov(X,Y) \end{aligned} \]
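A quick numerical illustration with the dice example from above (a sketch; the identity also holds exactly for the sample versions var() and cov(), since they use the same denominator):

B <- 1e5
d1 <- sample(1:6, B, replace = TRUE); d2 <- sample(1:6, B, replace = TRUE)
x <- d1 + d2; y <- abs(d1 - d2)
c(var(x + y), var(x) + var(y) + 2*cov(x, y))    # the two numbers agree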

Conditional Expectation and Variance

Definition (3.4.14)

Say X|Y=y is a conditional r.v. with density (or pmf) \(f_{X|Y=y}\). Then the conditional expectation of g(X) given Y=y is defined by

\[ \begin{aligned} &E[g(X)|Y=y]=\sum_x g(x)f_{X|Y=y}(x|y) \text{ if X discrete} \\ &E[g(X)|Y=y]=\int_{- \infty}^\infty g(x)f_{X|Y=y}(x|y) dx \text{ if X continuous} \\ \end{aligned} \]

Let E[X|Y] denote the function of the random variable Y whose value at Y=y is given by E[X|Y=y]. Note then Z=E[X|Y] is itself a random variable.

Example (3.4.15)

An urn contains 2 white and 3 black balls. We pick two balls from the urn. Let X denote the number of white balls chosen. An additional ball is drawn from the remaining three. Let Y equal 1 if this additional ball is white and 0 otherwise.

For example

\(f(0,0) = P(X=0,Y=0) = 3/5*2/4*1/3 = 1/10\).

The complete joint pdf is given by:

       x=0   x=1   x=2
y=0    0.1   0.4   0.1
y=1    0.2   0.2   0.0

The marginals are given by

x    P(X=x)
0    0.3
1    0.6
2    0.1

y    P(Y=y)
0    0.6
1    0.4

The conditional distribution of X|Y=0 is

x    P(X=x|Y=0)
0    1/6
1    2/3
2    1/6

and so \(E[X|Y=0] = 0*1/6+1*2/3+2*1/6 = 1.0\).

The conditional distribution of X|Y=1 is

x    P(X=x|Y=1)
0    1/2
1    1/2
2    0

and so \(E[X|Y=1] = 0*1/2+1*1/2+2*0 = 1/2\).

Finally, the r.v. Z = E[X|Y] has distribution

z     P(Z=z)
1     3/5
1/2   2/5

With this we can find \(E[Z] = E[E[X|Y]] = 1*3/5+1/2*2/5 = 4/5\). Note that this equals \(E[X] = 0*0.3+1*0.6+2*0.1 = 4/5\); this is no coincidence, as we will see in theorem (3.4.17).

How about using simulation to do these calculations? - program urn1

urn1 <- function (n = 2, m = 3, draws = 2, B = 10000) {
    # urn with n white and m black balls
    u <- c(rep("w", n), rep("b", m))
    x <- rep(0, B)
    y <- x
    for (i in 1:B) {
        # draw 'draws' balls plus one additional ball, all without replacement
        z <- sample(u, draws + 1)
        # Y = 1 if the additional ball is white, 0 otherwise
        y[i] <- ifelse(z[draws + 1] == "w", 1, 0)
        # X = number of white balls among the first 'draws' balls
        for (j in 1:draws) 
          x[i] <- x[i] + ifelse(z[j] == "w", 1, 0)
    }
    print("Joint pdf:")
    print(round(table(y, x)/B, 3))
    print("pdf of X:")
    print(round(table(x)/B, 3))
    print("pdf of Y:")
    print(round(table(y)/B, 3))
    print("pdf of X|Y=0:")
    x0 <- table(x[y == 0])/length(y[y == 0])
    print(round(x0, 3))
    print("E[X|Y=0]:")
    print(sum(as.numeric(names(x0)) * x0))   # sum of x*P(X=x|Y=0)
    print("pdf of X|Y=1:")
    x1 <- table(x[y == 1])/length(y[y == 1])
    print(round(x1, 3))
    print("E[X|Y=1]:")
    print(sum(as.numeric(names(x1)) * x1))   # sum of x*P(X=x|Y=1)
}
urn1()
## [1] "Joint pdf:"
##    x
## y       0     1     2
##   0 0.098 0.401 0.103
##   1 0.197 0.202 0.000
## [1] "pdf of X:"
## x
##     0     1     2 
## 0.294 0.603 0.103 
## [1] "pdf of Y:"
## y
##     0     1 
## 0.601 0.399 
## [1] "pdf of X|Y=0:"
## 
##     0     1     2 
## 0.163 0.666 0.171 
## [1] "E[X|Y=0]:"
## [1] 1.008314
## [1] "pdf of X|Y=1:"
## 
##     0     1 
## 0.493 0.507 
## [1] "E[X|Y=1]:"
## [1] 0.5067737

Example (3.4.16)

We have continuous rv’s X and Y with joint \(f(x,y)=8xy, 0 \le x<y \le 1\). We have found \(f_Y(y) = 4y^3, 0<y<1\), and \(f_{X|Y=y}(x|y) = 2x/y^2, 0 \le x \le y\). So

\[ \begin{aligned} &f_Y(y) =\int_0^y 8xy dx = 4x^2y\vert_0^y = 4y^3;\ 0<y<1\\ &f_{X|Y=y}(x|y) =\frac{f(x,y)}{f_Y(y)} = \frac{8xy}{4y^3} = \frac{2x}{y^2};\ 0<x<y \\ &E[X|Y=y] =\int_{-\infty}^\infty x f_{X|Y=y}(x|y)dx = \int_0^y x \frac{2x}{y^2}dx = \\ &\frac{2}{y^2}\int_0^y x^2dx = \frac{2}{y^2}\left[x^3/3\right]\vert_0^y = \frac{2y^3}{3y^2} = \frac{2y}3 \end{aligned} \]

Throughout this calculation we treated y as a constant. Now, though, we can change our point of view and consider \(E[X|Y=y] = 2y/3\) as a function of y:

\[g(y)=E[X|Y=y]=2y/3\]

What are the values of y? Well, they are the observations we might get from the rv Y, so we can also write

\[g(Y)=E[X|Y]=2Y/3\]

But since Y is a rv, so is 2Y/3, and we see that we can define a rv

\[Z=g(Y)=E[X|Y]\]

Recall that the expression \(f_{X|Y}\) does not make sense. Now we see that on the other hand the expression \(E[X|Y]\) makes perfectly good sense!
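We can illustrate the rv \(E[X|Y]=2Y/3\) by simulation (a sketch using the inverse-cdf method: \(F_Y(y)=y^4\) gives \(Y=U^{1/4}\), and \(F_{X|Y=y}(x)=x^2/y^2\) gives \(X=y\sqrt{U}\)). Note that the mean of the simulated \(2Y/3\) values agrees with the mean of the simulated values of X; this is no coincidence, as the next theorem shows:

B <- 1e5
y <- runif(B)^(1/4)          # Y has density 4y^3
x <- y*sqrt(runif(B))        # given Y=y, X has density 2x/y^2
c(mean(x), mean(2*y/3))      # both should be close to E[X] = 8/15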

There are very useful formulas for the expectation and variance of conditional r.v.s:

Theorem (3.4.17)

We have

  1. \(E[X] = E\{E[X|Y]\}\)

  2. \(var(X) = E[var(X|Y)] + var(E[X|Y])\)

Example (3.4.18)

Say \(Y\sim U[0,1]\) and \(X|Y=y\sim Exp(y+1)\), then

\[ \begin{aligned} &E[X|Y=y] = \frac1{y+1} \\ &E[X] = E\{E[X|Y]\} = E[\frac1{Y+1}] =\\ &\int_0^1 \frac1{y+1} dy = \log(y+1)\vert_0^1=\log2\\ &var(X|Y=y) = \frac1{(y+1)^2} \\ &var(X) = E[var(X|Y)] + var(E[X|Y]) =\\ &E[\frac1{(Y+1)^2}]+var(\frac1{Y+1}) = \\ &E[\frac1{(Y+1)^2}]+E[(\frac1{Y+1})^2]-(E[\frac1{Y+1}])^2 = \\ &2E[\frac1{(Y+1)^2}]-(\log 2)^2 = \\ &2\int_0^1 \frac1{(y+1)^2} dy-(\log 2)^2 = \\ &2\left[-\frac1{y+1}\right]\vert_0^1 -(\log 2)^2 = \\ &2[1-\frac1{2}] -(\log 2)^2 = \\ &1-(\log 2)^2 \approx 0.52 \end{aligned} \]

Let’s check by simulation:

y=runif(1e5)
x=rexp(1e5, y+1)
round(c(log(2), mean(x)), 3)
## [1] 0.693 0.692
round(c(1-log(2)^2,var(x)), 3)
## [1] 0.520 0.514