Expectations of Random Variables and Random Vectors

Expectation and Variance

Definition

The expectation (or expected value) of a random variable g(X) is defined by

E[g(X)] = \(\sum_x g(x)f(x)\) if X is discrete, and E[g(X)] = \(\int_{-\infty}^{\infty} g(x)f(x)\,dx\) if X is continuous, where f is the density of X.

Example

Say X is the sum of two dice. What is EX? What is EX²?

we have

x 2 3 4 5 6 7 8 9 10 11 12
P(X=x) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36

so

EX = 2×1/36 + 3×2/36 + 4×3/36 + 5×4/36 + 6×5/36 + 7×6/36 + 8×5/36 + 9×4/36 + 10×3/36 + 11×2/36 + 12×1/36 = 7

EX² = 2²×1/36 + 3²×2/36 + 4²×3/36 + 5²×4/36 + 6²×5/36 + 7²×6/36 + 8²×5/36 + 9²×4/36 + 10²×3/36 + 11²×2/36 + 12²×1/36 = 54.83
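
A quick way to double-check such hand computations is to enumerate all 36 equally likely outcomes. Here is a small Python sketch (not part of the original notes) that does this with exact fractions:

```python
from itertools import product
from fractions import Fraction

# enumerate the 36 equally likely outcomes of two fair dice
outcomes = list(product(range(1, 7), repeat=2))

EX  = sum(Fraction(a + b, 36) for a, b in outcomes)
EX2 = sum(Fraction((a + b) ** 2, 36) for a, b in outcomes)

print(EX, float(EX))    # 7
print(EX2, float(EX2))  # 329/6 ≈ 54.83
```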

Example

We roll a fair die until the first time we get a six. What is the expected number of rolls?
We saw that f(x) = 1/6 × (5/6)^(x-1) for x \(\in\) {1,2,..}. Here we just have g(x)=x, so

EX = \(\sum_{x=1}^{\infty} x\,\tfrac{1}{6}\left(\tfrac{5}{6}\right)^{x-1}\)

How do we compute this sum? Here is a “standard” trick: for |q| < 1,

\(\sum_{x=1}^{\infty} x q^{x-1} = \sum_{x=1}^{\infty} \frac{d}{dq} q^x = \frac{d}{dq} \sum_{x=1}^{\infty} q^x = \frac{d}{dq}\,\frac{q}{1-q} = \frac{1}{(1-q)^2}\)

and so we find

EX = 1/6 × 1/(1-5/6)² = 1/6 × 36 = 6

This is a special example of a geometric rv, that is, a discrete rv X with density f(x) = p(1-p)^(x-1), x=1,2,... Note that if we replace 1/6 above with p, we can show that EX = 1/p.
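
If you prefer a simulation check of EX = 1/p, the following Python sketch (an illustration, not from the original notes) rolls a fair die until the first six and averages the number of rolls; the result should be close to 6:

```python
import random

# roll a fair die until the first six; return the number of rolls needed
def rolls_until_six():
    n = 0
    while True:
        n += 1
        if random.randint(1, 6) == 6:
            return n

random.seed(1)
N = 100_000
print(sum(rolls_until_six() for _ in range(N)) / N)  # ≈ 6 = 1/p with p = 1/6
```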

Example

X is said to have a uniform [A,B] distribution if f(x) = 1/(B-A) for A<x<B, 0 otherwise. We denote a uniform [A,B] rv by X~U[A,B].
Find EX^k (this is called the kth moment of X). We have

EX^k = \(\int_A^B \frac{x^k}{B-A}\,dx = \frac{B^{k+1}-A^{k+1}}{(k+1)(B-A)}\)
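
As a sanity check of the moment formula, this short Python sketch (illustrative only; the values A=1, B=3, k=2 are arbitrary choices) compares numerical integration against the closed form:

```python
from scipy.integrate import quad

# k-th moment of a U[A,B] random variable: numeric integral vs closed form
A, B, k = 1.0, 3.0, 2

numeric, _ = quad(lambda x: x**k / (B - A), A, B)
closed_form = (B**(k + 1) - A**(k + 1)) / ((k + 1) * (B - A))

print(numeric, closed_form)  # both ≈ 4.3333
```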

Some special expectations are the mean of X, defined by

μ=EX

and the variance defined by

σ² = Var(X) = E(X-μ)²

Related to the variance is the standard deviation σ, the square root of the variance.

Proposition

Var(X) = EX² - (EX)²

Proposition

Let X and Y be rv’s and g and h functions on \(\mathbb{R}\). Then if X\(\perp\)Y we have

Eg(X)h(Y) = Eg(X)×Eh(Y)

A useful way to “link” probabilities and expectations is via the indicator function IA, defined as

IA(x) = 1 if x \(\in\) A, and IA(x) = 0 otherwise,

because with this we have, for a continuous r.v. X with density f,

E[IA(X)] = \(\int_{-\infty}^{\infty}\) IA(x)f(x) dx = \(\int_A\) f(x) dx = P(X \(\in\) A)

Covariance and Correlation

Definition

The covariance of two r.v. X and Y is defined by

cov(X,Y)=E[(X-μX)(Y-μY)]

The correlation of X and Y is defined by

cor(X,Y)=cov(X,Y)/(σXσY)

Note cov(X,X) = Var(X)

Proposition

cov(X,Y) = E(XY) - (EX)(EY)

Example

Take the example of the sum X and the absolute value of the difference Y of two rolls of a die. What is the covariance of X and Y?
We have

X.Y 0 1 2 3 4 5
2 1 0 0 0 0 0
3 0 2 0 0 0 0
4 1 0 2 0 0 0
5 0 2 0 2 0 0
6 1 0 2 0 2 0
7 0 2 0 2 0 2
8 1 0 2 0 2 0
9 0 2 0 2 0 0
10 1 0 2 0 0 0
11 0 2 0 0 0 0
12 1 0 0 0 0 0

so

μX = EX = 2*1/36 + 3*2/36 + … + 12*1/36 = 7.0
μY = EY = 0*6/36 + 1*10/36 + … + 5*2/36 = 70/36
EXY = 0*2*1/36 + 1*2*0/36 + 2*2*0/36 + … + 5*12*0/36 = 490/36
and so cov(X,Y) = EXY - EXEY = 490/36 - 7.0*70/36 = 0
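
The same covariance can be checked by letting Python enumerate all 36 outcomes exactly (a sketch added here for illustration):

```python
from itertools import product
from fractions import Fraction

# X = sum, Y = |difference| of two fair dice; check that cov(X, Y) = 0
pairs = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

EX  = sum((a + b) * p for a, b in pairs)
EY  = sum(abs(a - b) * p for a, b in pairs)
EXY = sum((a + b) * abs(a - b) * p for a, b in pairs)

print(EXY - EX * EY)  # 0, even though X and Y are not independent
```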

Note that we previously saw that X and Y are not independent, so here we have an example showing that a covariance of 0 does not imply independence! It does work the other way around, though:

Proposition

If X and Y are independent, then cov(X,Y) = 0

Example

Say the rv (X,Y) has joint density f(x,y)=c if 0<x<y<1, 0 otherwise. Find the correlation of X and Y.

We have previously done a more general problem (with 0<x<y^p<1) and saw there that c = p+1 = 2 and fY(y) = 2y, 0<y<1. Now

EX = \(\int_0^1\int_0^y 2x\,dx\,dy = \int_0^1 y^2\,dy\) = 1/3 and EY = \(\int_0^1 y\cdot 2y\,dy\) = 2/3

EX² = \(\int_0^1\int_0^y 2x^2\,dx\,dy\) = 1/6, so Var(X) = 1/6 - (1/3)² = 1/18

EY² = \(\int_0^1 y^2\cdot 2y\,dy\) = 1/2, so Var(Y) = 1/2 - (2/3)² = 1/18

EXY = \(\int_0^1\int_0^y 2xy\,dx\,dy = \int_0^1 y^3\,dy\) = 1/4, so cov(X,Y) = 1/4 - (1/3)(2/3) = 1/36

and therefore cor(X,Y) = (1/36)/\(\sqrt{(1/18)(1/18)}\) = (1/36)/(1/18) = 1/2
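
A Monte Carlo check (not in the original notes): if U1, U2 are iid U(0,1), then (min(U1,U2), max(U1,U2)) is uniform on the triangle 0<x<y<1 with density 2, so its sample correlation should be close to 1/2.

```python
import numpy as np

# sample (X, Y) uniform on 0 < x < y < 1 as (min, max) of two iid U(0,1) draws
rng = np.random.default_rng(0)
u = rng.random((1_000_000, 2))
x, y = u.min(axis=1), u.max(axis=1)

print(np.corrcoef(x, y)[0, 1])  # ≈ 0.5
```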

Proposition

Var(X+Y) = VarX + VarY + 2Cov(X,Y)

and if X\(\perp\)Y then

Var(X+Y) = VarX + VarY


This formula is the basis of what are called variance-reduction methods. If we can find a rv Y which is negatively correlated with X then the variance of X+Y might be smaller than the variance of X alone.
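
As an illustration of the idea (a sketch, not part of the original notes): to estimate E[g(U)] with g(u)=u² and U~U(0,1), we can average g(U) and g(1-U), which are negatively correlated (“antithetic variates”), and compare with averaging two independent draws:

```python
import numpy as np

# antithetic variates: g(U) and g(1-U) are negatively correlated, so their
# average has smaller variance than the average of two independent draws
rng = np.random.default_rng(42)
g = lambda u: u ** 2              # E[g(U)] = 1/3 for U ~ U(0,1)
n = 100_000

u = rng.random(n)
plain      = 0.5 * (g(rng.random(n)) + g(rng.random(n)))  # two independent draws
antithetic = 0.5 * (g(u) + g(1.0 - u))                    # negatively correlated pair

print(plain.mean(), antithetic.mean())  # both ≈ 1/3
print(plain.var(), antithetic.var())    # antithetic variance is much smaller
```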

The above formulas generalize easily to more than two random variables

Proposition

Let X1, .., Xn be rv's, then

E[X1+..+Xn] = EX1+..+EXn

and

Var(X1+..+Xn) = \(\sum_{i=1}^{n}\) Var(Xi) + 2\(\sum_{i<j}\) Cov(Xi,Xj)

Example (The Matching Problem)

At a party n people put their hats in the center of the room, where the hats are mixed together. Each person then randomly selects a hat. We are interested in the mean and the variance of the number of people who get their own hat.
Let this number be X, and let’s write X = X1+..+Xn, where Xi is 1 if the ith person selects their own hat and 0 if they do not.
Now the ith person is equally likely to select any of the n hats, so P(Xi=1)=1/n, and so

EXi = 0×(n-1)/n +1×1/n =1/n

There is an even simpler way of doing this: Xi is an indicator rv, and so EXi = P(Xi=1) = 1/n
For the variance we have

EXi² = 0²×(n-1)/n + 1²×1/n = 1/n

and so

VarXi = EXi² - (EXi)² = 1/n - (1/n)² = (n-1)/n².

Also

EXiXj = P(Xi×Xj=1) =
P(Xi = 1, Xj=1) =
P(Xi = 1) × P(Xj=1|Xi=1) =
1/n × 1/(n-1)

again because XiXj is an indicator rv. So

Cov(Xi,Xj) =

EXiXj - EXi×EXj =

1/(n(n-1)) - (1/n)² =

1/(n²(n-1))

Finally

EX = E(X1+..+Xn) = EX1+..+EXn = n×1/n =1

and

Var(X) = n×(n-1)/n² + n(n-1)×1/(n²(n-1)) = (n-1)/n + 1/n = 1

It is interesting to see that E[X] = Var(X) = 1, independent of n! Let’s make sure we got this right and check a few simple cases:

n=1: there is just one person and one hat, so P(X=1)=1, so E[X]=1, but Var(X) = E[(X-1)²] = 0, so actually something is wrong

What is it?

How about n=2? Now there are two people and they either both get their hats or neither does (they get each other's hats). So

P(X1=0,X2=0) = P(X1=1,X2=1) = 1/2

P(X1=0,X2=1) = P(X1=1,X2=0) = 0

so

E[X1+X2] = 2*P(X1=1,X2=1) = 2*1/2 = 1

E[(X1+X2)²] = 2²*P(X1=1,X2=1) = 4*1/2 = 2

Var(X1+X2) = E[(X1+X2)²] - (E[X1+X2])² = 2 - 1² = 1
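
A simulation of the matching problem (added here as an illustration) confirms that both the mean and the variance of the number of matches are close to 1, for any n ≥ 2:

```python
import random

# matching problem: n people pick hats at random; count how many get their own
def matches(n):
    hats = list(range(n))
    random.shuffle(hats)
    return sum(i == h for i, h in enumerate(hats))

random.seed(3)
n, N = 10, 100_000
xs = [matches(n) for _ in range(N)]
mean = sum(xs) / N
var = sum((x - mean) ** 2 for x in xs) / N

print(mean, var)  # both ≈ 1
```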


Conditional Expectation

Say X|Y=y is a conditional r.v. with density (pdf) fX|Y=y. As was stated earlier, conditional rv’s are also just rv’s, so they have expectations as well, given by

E[X|Y=y] = \(\sum_x\) x fX|Y=y(x|y) in the discrete case and E[X|Y=y] = \(\int_{-\infty}^{\infty}\) x fX|Y=y(x|y) dx in the continuous case.

We can think of π(y) = E[X|Y=y] as a function of y, that is if we know the joint density f(x,y) then for a fixed y we can compute π(y). But y is the realization of the random variable Y, so Z = π(Y) = E[X|Y] is a random variable as well.

Remember we do not have an object “X|Y”, only “X|Y=y”, but now we do have an object E[X|Y]

Example

An urn contains 2 white and 3 black balls. A random sample of size 2 is chosen. Let X denote the number of white balls in the sample. An additional ball is drawn from the remaining three. Let Y equal 1 if the ball is white and 0 otherwise.
For example f(0,0) = P(X=0,Y=0) = 3/5*2/4*1/3 = 1/10. The complete density is given by:

y.x 0 1 2
0 1/10 2/5 1/10
1 1/5 1/5 0

The marginals are given by

x 0 1 2
fX(x) 3/10 3/5 1/10

and

y 0 1
fY(y) 3/5 2/5

The conditional distribution of X|Y=0 is

x 0 1 2
fX|Y=0(x|0) 1/6 2/3 1/6

and so E[X|Y=0] = 0*1/6+1*2/3+2*1/6 = 1.0

The conditional distribution of X|Y=1 is
x 0 1 2
fX|Y=1(x|1) 1/2 1/2 0

and so E[X|Y=1] = 0*1/2+1*1/2+2*0 = 1/2

Finally the r.v. Z = E[X|Y] has density
z 1/2 1
fZ(z) 2/5 3/5

With this we can find E[Z] = E[E[X|Y]] = 1*3/5 + 1/2*2/5 = 4/5

Theorem

E[X] = E[E[X|Y]]

and

Var(X) = E[Var(X|Y)] + Var[E(X|Y)]

Example above

We found EZ = E[E[X|Y]] = 4/5. Now E[X] = 0*3/10 + 1*3/5 + 2*1/10 = 4/5
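
This check can also be done in a few lines of Python (an illustrative sketch using exact fractions), computing both E[E[X|Y]] and E[X] from the joint density table:

```python
from fractions import Fraction as F

# joint density of the urn example: keys are (x, y) pairs
f = {(0, 0): F(1, 10), (1, 0): F(2, 5), (2, 0): F(1, 10),
     (0, 1): F(1, 5),  (1, 1): F(1, 5), (2, 1): F(0)}

fY = {y: sum(f[x, y] for x in (0, 1, 2)) for y in (0, 1)}
EX_given_Y = {y: sum(x * f[x, y] for x in (0, 1, 2)) / fY[y] for y in (0, 1)}

EZ = sum(EX_given_Y[y] * fY[y] for y in (0, 1))  # E[E[X|Y]]
EX = sum(x * f[x, y] for (x, y) in f)            # E[X]
print(EZ, EX)                                    # both 4/5
```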

Example

Let’s go back to the example above, where we had a continuous rv with joint pdf f(x,y) = 6x, 0≤x≤y≤1, 0 otherwise

Now

fY(y) = \(\int_0^y\) 6x dx = 3y², so fX|Y=y(x|y) = 6x/(3y²) = 2x/y² for 0≤x≤y, and E[X|Y=y] = \(\int_0^y\) x·2x/y² dx = 2y/3

and

E[E[X|Y]] = E[2Y/3] = 2/3 × \(\int_0^1\) y·3y² dy = 2/3 × 3/4 = 1/2, which agrees with E[X] = \(\int_0^1\int_0^y\) 6x² dx dy = \(\int_0^1\) 2y³ dy = 1/2

Example

Let’s have another look at the hat matching problem. Suppose that those choosing their own hats leave, while the others put their hats back into the center and do the exercise again. This process continues until everybody has his or her own hat. Find E[Rn], where Rn is the number of rounds needed.

Given that in each round on average one person gets their own hat and then leaves, we might suspect that E[Rn] = n. Let’s prove this by induction on n.

Let n=1, that is, there is just one person, and clearly they pick up their own hat, so E[R1] = 1

Assume E[Rk]=k \(\forall\) k<n. Let M be the number of matches that occur in the first round. Clearly M \(\in\) {0,1, .. ,n}. Conditioning on M we get

E[Rn] = \(\sum_{m=0}^{n}\) E[Rn | M=m] P(M=m)

If M = m ≥ 1, the m matched people leave and the remaining n-m people repeat the exercise, so E[Rn | M=m] = 1 + E[Rn-m] = 1 + (n-m) by the induction hypothesis. If M = 0, nobody leaves and we are back where we started after one wasted round, so E[Rn | M=0] = 1 + E[Rn]. Therefore

E[Rn] = (1 + E[Rn])P(M=0) + \(\sum_{m=1}^{n}\) (1 + n - m)P(M=m)
= 1 + E[Rn]P(M=0) + n(1 - P(M=0)) - E[M]
= E[Rn]P(M=0) + n(1 - P(M=0))

since E[M] = 1. Solving for E[Rn] (using P(M=0) < 1) gives E[Rn] = n, and we are done.

Note that we solved this problem without ever finding P(M=0), which is also a non-trivial problem. Here is how it is done:

Of course the probability depends on n, so let’s use the notation

pn =P(M=0 | there are n people)

We will condition on whether or not the first person selects their own hat. Call the event “first person selects their own hat” E. Then

pn = P(M=0) = P(M=0|E)P(E) + P(M=0|Ec)P(Ec)

Now

P(M=0|E)=0

because E means at least one person got their own hat, and so

pn = P(M=0|Ec)(n-1)/n

Ec means the first person selects a hat that does not belong to them. So now there are n-1 hats in the center, one of which belongs to the first person. There are still n-1 people to pick hats, one of which has no hat in the center because the first person took it.

So P(M=0|Ec) is the probability of no matches when n-1 people select from a set of n-1 hats that does not contain the hat of one of them. This can happen in either of two mutually exclusive ways:

• there are no matches and the extra person does not select the extra hat. This has probability pn-1 (think of the extra hat as belonging to the extra person).

• there are no matches and the extra person does select the extra hat. This has probability 1/(n-1) × pn-2, because the extra person needs to choose the extra hat (probability 1/(n-1)), and then there are n-2 people and their n-2 hats left.

So now we have

P(M=0|Ec) = pn-1 + 1/(n-1) × pn-2 and
pn = (n-1)/n × pn-1 + 1/n × pn-2

This is called a recursive relationship. We can solve it via induction as follows: the recursion can be rewritten as

pn - pn-1 = -1/n × (pn-1 - pn-2)

and starting from p1 = 0 and p2 = 1/2 an easy induction shows pn - pn-1 = (-1)^n/n!, so

pn = 1/2! - 1/3! + 1/4! - … + (-1)^n/n! → 1/e ≈ 0.368 as n → ∞

so no matter how many people are present, there is always roughly a 63% chance that somebody gets their own hat.
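
The recursion is also easy to iterate numerically; the Python sketch below (added for illustration) shows how quickly pn approaches 1/e ≈ 0.368:

```python
import math

# p_n = P(no matches with n people), from p_n = (n-1)/n*p_{n-1} + 1/n*p_{n-2}
p = {1: 0.0, 2: 0.5}
for n in range(3, 21):
    p[n] = (n - 1) / n * p[n - 1] + 1 / n * p[n - 2]

print(p[20], math.exp(-1))  # both ≈ 0.3679
print(1 - p[20])            # ≈ 0.632, the chance of at least one match
```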


Moment Generating and Characteristic Functions

Definition

  1. The moment generating function of a rv X is defined by ψ(t) = E[e^(tX)]
  2. The characteristic function of a rv X is defined by Φ(t) = E[e^(itX)]

Example

Let X be 1 with probability p and 0 with probability q=1-p. Find ψ(t)

ψ(t) = E[e^(tX)] = e^(t·0)q + e^(t·1)p = q + pe^t

X is called a Bernoulli rv. with success parameter p, denoted by X~Ber(p)

Example

Say X~Exp(λ). Find ψ(t)

ψ(t) = E[e^(tX)] = \(\int_0^{\infty} e^{tx}\lambda e^{-\lambda x}\,dx = \frac{\lambda}{\lambda-t}\) for t < λ

Differentiating we find ψ'(0) = 1/λ = EX, ψ''(0) = 2/λ² = EX², and in general the kth derivative of ψ at t = 0 equals EX^k,

which explains the name moment generating function, although actually finding moments in this way is like killing a fly with a cannon! The main usefulness of moment generating functions is as a tool for proving theorems, and this usually relies on the proposition below.
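
But first, as a quick illustration of what “generating moments” looks like in practice, here is a small sympy sketch (an illustration, not from the original notes, using the Exp(λ) mgf λ/(λ-t) found above): differentiating ψ k times and setting t = 0 gives EX^k = k!/λ^k.

```python
import sympy as sp

# moments of X ~ Exp(lambda) by differentiating the mgf psi(t) = lambda/(lambda - t)
t, lam = sp.symbols('t lambda', positive=True)
psi = lam / (lam - t)

for k in (1, 2, 3):
    moment = sp.simplify(sp.diff(psi, t, k).subs(t, 0))
    print(k, moment)  # 1/lambda, 2/lambda**2, 6/lambda**3
```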

Proposition

  1. Let X1, .., Xn be independent random variables with pdfs (densities) f1(x), .., fn(x). Let X = ∑Xi. Then

ψX(t) = \(\prod_{i=1}^{n}\) ψXi(t)

  2. If in addition all the distributions are the same, then

ψX(t) = [ψX1(t)]^n

If X1, .., Xn are independent random variables with the same pdf (density) f we say X1, .., Xn are iid.

Example

Say X1, .., Xn are iid Ber(p). Let X = ∑Xi, then

ψX(t) = [ψX1(t)]^n = (q + pe^t)^n

Theorem

If ψ(t) exists in an open neighborhood of 0, then it determines f.

Proof: much too complicated for us.

Example

A discrete rv. X is said to have a Poisson distribution with rate λ if it has density f(x) = (λ^x/x!)e^(-λ), x=0, 1, 2, … Now

ψ(t) = E[e^(tX)] = \(\sum_{x=0}^{\infty} e^{tx}\frac{\lambda^x}{x!}e^{-\lambda} = e^{-\lambda}\sum_{x=0}^{\infty}\frac{(\lambda e^t)^x}{x!} = e^{-\lambda}e^{\lambda e^t}\) = exp(λ(e^t - 1))

Say X1, .., Xn are iid Poisson rate λ, then

ψ∑Xi(t) = [exp(λ(e^t - 1))]^n = exp(nλ(e^t - 1))

but this is again the moment generating function of a Poisson rv, this one with rate nλ. So by the uniqueness theorem we have shown that ∑Xi has a Poisson distribution with rate nλ.
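
A quick simulation check of this fact (added for illustration, with n = 5 and λ = 2 chosen arbitrarily): the sum of n iid Poisson(λ) draws should have the same mean and variance, nλ, as a single Poisson(nλ) draw.

```python
import numpy as np

# sum of n iid Poisson(lam) vs a single Poisson(n*lam)
rng = np.random.default_rng(7)
n, lam, N = 5, 2.0, 1_000_000

sums = rng.poisson(lam, size=(N, n)).sum(axis=1)
direct = rng.poisson(n * lam, size=N)

print(sums.mean(), direct.mean())  # both ≈ 10
print(sums.var(), direct.var())    # both ≈ 10
```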

Proposition

Let X be a rv with mgf ψ(t). Then Y = aX+b has mgf ψY(t) = e^(bt)ψ(at)

Proof

ψY(t) = E[e^(tY)] = E[e^(t(aX+b))] = e^(bt)E[e^((at)X)] = e^(bt)ψ(at)