Definition
The expectation (or expected value) of a random variable g(X) is defined by \(Eg(X)=\sum_x g(x)f(x)\) if X is discrete and \(Eg(X)=\int_{-\infty}^{\infty} g(x)f(x)\,dx\) if X is continuous, where f is the density of X.
Say X is the sum of two dice. What is EX? What is EX²?
we have
x | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|
P(X=x) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |
so
EX = 2×1/36 + 3×2/36 + 4×3/36 + 5×4/36 + 6×5/36 + 7×6/36 + 8×5/36 + 9×4/36 + 10×3/36 + 11×2/36 + 12×1/36 = 7
EX² = 2²×1/36 + 3²×2/36 + 4²×3/36 + 5²×4/36 + 6²×5/36 + 7²×6/36 + 8²×5/36 + 9²×4/36 + 10²×3/36 + 11²×2/36 + 12²×1/36 = 54.83
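As a quick sanity check (not part of the original notes), here is a short Python sketch that enumerates all 36 equally likely outcomes and computes both expectations exactly:

```python
from fractions import Fraction

# enumerate all 36 equally likely outcomes of two fair dice
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]

EX = sum(Fraction(i + j, 36) for i, j in outcomes)          # E[X]
EX2 = sum(Fraction((i + j) ** 2, 36) for i, j in outcomes)  # E[X^2]

print(EX, float(EX))    # 7 7.0
print(EX2, float(EX2))  # 329/6, about 54.83
```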
We roll a fair die until the first time we get a six. What is the expected number of rolls?
We saw that \(f(x) = \frac{1}{6}\left(\frac{5}{6}\right)^{x-1}\) if x \(\in\) {1,2,..}. Here we just have g(x)=x, so
$$EX = \sum_{x=1}^{\infty} x\,\frac{1}{6}\left(\frac{5}{6}\right)^{x-1}$$
How do we compute this sum? Here is a “standard” trick:
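A sketch of the trick (differentiate the geometric series term by term, for 0&lt;q&lt;1):

$$\sum_{x=1}^{\infty} x\,q^{x-1}=\sum_{x=1}^{\infty}\frac{d}{dq}\,q^{x}=\frac{d}{dq}\,\frac{q}{1-q}=\frac{1}{(1-q)^{2}}$$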
and so we find \(EX = \frac{1}{6}\cdot\frac{1}{(1-5/6)^2} = \frac{1}{6}\cdot 36 = 6\).
This is a special example of a geometric rv, that is a discrete rv X with density \(f(x)=p(1-p)^{x-1}\), x = 1, 2, …. Note that if we replace 1/6 above with p, we can show that EX = 1/p.
X is said to have a uniform [A,B] distribution if f(x)=1/(B-A) for A&lt;x&lt;B, 0 otherwise. We denote a uniform [A,B] rv by X~U[A,B].
Find \(EX^k\) (this is called the k-th moment of X).
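A sketch of the computation, integrating the definition directly:

$$E[X^k]=\int_A^B \frac{x^k}{B-A}\,dx=\frac{B^{k+1}-A^{k+1}}{(k+1)(B-A)}$$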
Some special expectations are the mean of X, defined by
μ=EX
and the variance defined by
σ² = Var(X) = E[(X-μ)²]
Related to the variance is the standard deviation σ, the square root of the variance.
Proposition

For any random variables X and Y and any numbers a and b we have E[aX+bY] = aEX + bEY.
Proposition
Let X and Y be rv’s and g and h functions on \(\mathbb{R}\). Then if X\(\perp\)Y we have
E[g(X)h(Y)] = E[g(X)]×E[h(Y)]
There is a useful way to “link” probabilities and expectations via the indicator function \(I_A\), defined as \(I_A(x)=1\) if \(x\in A\) and \(I_A(x)=0\) otherwise,
because with this we have for a continuous r.v. X with density f:
$$E[I_A(X)] = \int_{-\infty}^{\infty} I_A(x)f(x)\,dx = \int_A f(x)\,dx = P(X\in A)$$
Definition
The covariance of two r.v. X and Y is defined by
\(cov(X,Y)=E[(X-\mu_X)(Y-\mu_Y)]\)
The correlation of X and Y is defined by
\(cor(X,Y)=cov(X,Y)/(\sigma_X\sigma_Y)\)
Note cov(X,X) = Var(X)
Proposition
cov(X,Y) = E(XY) - (EX)(EY)
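One way to see this is to expand the definition and use linearity of expectation:

$$cov(X,Y)=E[(X-\mu_X)(Y-\mu_Y)]=E[XY]-\mu_X E[Y]-\mu_Y E[X]+\mu_X\mu_Y=E[XY]-(EX)(EY)$$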
Take the example of the sum (X) and the absolute value of the difference (Y) of two rolls of a die. What is the covariance of X and Y?
We have (table entries are counts out of the 36 equally likely outcomes):
x\y | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
2 | 1 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 2 | 0 | 0 | 0 | 0 |
4 | 1 | 0 | 2 | 0 | 0 | 0 |
5 | 0 | 2 | 0 | 2 | 0 | 0 |
6 | 1 | 0 | 2 | 0 | 2 | 0 |
7 | 0 | 2 | 0 | 2 | 0 | 2 |
8 | 1 | 0 | 2 | 0 | 2 | 0 |
9 | 0 | 2 | 0 | 2 | 0 | 0 |
10 | 1 | 0 | 2 | 0 | 0 | 0 |
11 | 0 | 2 | 0 | 0 | 0 | 0 |
12 | 1 | 0 | 0 | 0 | 0 | 0 |
so
μX = EX = 2*1/36 + 3*2/36 + … + 12*1/36 = 7
μY = EY = 0*6/36 + 1*10/36 + … + 5*2/36 = 70/36
EXY = 2*0*1/36 + 2*1*0/36 + … + 12*5*0/36 = 490/36
and so cov(X,Y) = EXY - EX*EY = 490/36 - 7*70/36 = 0
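Here is a short Python check (not in the original notes) that enumerates the 36 outcomes directly:

```python
from fractions import Fraction

# X = sum of the two dice, Y = absolute value of the difference
pairs = [(i + j, abs(i - j)) for i in range(1, 7) for j in range(1, 7)]

EX = sum(Fraction(x, 36) for x, y in pairs)
EY = sum(Fraction(y, 36) for x, y in pairs)
EXY = sum(Fraction(x * y, 36) for x, y in pairs)

print(EX, EY, EXY)    # 7, 35/18 (= 70/36), 245/18 (= 490/36)
print(EXY - EX * EY)  # 0, so cov(X,Y) = 0
```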
Note that we previously saw that X and Y are not independent, so here we have an example showing that a covariance of 0 does not imply independence! It does work the other way around, though:
Proposition
If X and Y are independent, then cov(X,Y) = 0
Say the rv (X,Y) has joint density f(x,y)=c if 0&lt;x&lt;y&lt;1, 0 otherwise. Find the correlation of X and Y.
We have previously done a more general problem (with \(0<x<y^p<1\)) and saw there that c = p+1 = 2 and \(f_Y(y)=2y\), 0&lt;y&lt;1.
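The needed moments can then be computed directly (a sketch of the calculation):

$$E[X]=\int_0^1\!\!\int_0^y 2x\,dx\,dy=\int_0^1 y^2\,dy=\frac13,\qquad E[Y]=\int_0^1 y\cdot 2y\,dy=\frac23$$

$$E[X^2]=\int_0^1\!\!\int_0^y 2x^2\,dx\,dy=\frac16,\qquad E[Y^2]=\int_0^1 2y^3\,dy=\frac12,\qquad E[XY]=\int_0^1\!\!\int_0^y 2xy\,dx\,dy=\frac14$$

so \(Var(X)=\frac16-\frac19=\frac1{18}\), \(Var(Y)=\frac12-\frac49=\frac1{18}\), \(cov(X,Y)=\frac14-\frac13\cdot\frac23=\frac1{36}\), and therefore \(cor(X,Y)=\frac{1/36}{\sqrt{1/18}\sqrt{1/18}}=\frac12\).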
Proposition
Var(X+Y) = VarX + VarY + 2Cov(X,Y)
and if X\(\perp\)Y then
Var(X+Y) = VarX + VarY
This formula is the basis of what are called variance-reduction methods. If we can find a rv Y which is negatively correlated with X then the variance of X+Y might be smaller than the variance of X alone.
The above formulas generalize easily to more than two random variables
Proposition
Let \(X_1, \dots, X_n\) be rv's; then
\(E\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n E[X_i]\) and \(Var\left(\sum_{i=1}^n X_i\right)=\sum_{i=1}^n Var(X_i)+2\sum_{i<j} Cov(X_i,X_j)\)
At a party n people put their hats in the center of the room where the hats are mixed together. Each person then randomly selects a hat. We are interested in the mean and the variance of the number of people who get their own hat.
Let this number be X, and let’s write X = X1+..+Xn, where Xi is 1 if the ith person selects their own hat and 0 if they do not.
Now the ith person is equally likely to select any of the n hats, so P(Xi=1)=1/n, and so
EXi = 0×(n-1)/n +1×1/n =1/n
There is an even simpler way of doing this: Xi is an indicator rv, and so EXi = P(Xi=1) = 1/n
For the variance we have
EXi² = 0²×(n-1)/n + 1²×1/n = 1/n
and so
VarXi = EXi² - (EXi)² = 1/n - (1/n)² = (n-1)/n².
Also
EXiXj = P(Xi×Xj=1) =
P(Xi = 1, Xj=1) =
P(Xi = 1) × P(Xj=1|Xi=1) =
1/n × 1/(n-1)
again because XiXj is an indicator rv. So
Cov(Xi, Xj) =
EXiXj - EXi×EXj =
1/[n(n-1)] - (1/n)² =
1/[n²(n-1)]
Finally
EX = E(X1+..+Xn) = EX1+..+EXn = n×1/n =1
and
\(Var(X) = \sum_{i=1}^n Var(X_i) + 2\sum_{i<j} Cov(X_i,X_j) = n\cdot\frac{n-1}{n^2} + 2\binom{n}{2}\cdot\frac{1}{n^2(n-1)} = \frac{n-1}{n} + \frac{1}{n} = 1\)
It is interesting to see that E[X] = Var(X) =1, independent of n! Let’s make sure we got this right and check a few simple cases:
n=1: there is just one person and one hat, so P(X=1)=1, so E[X]=1, but Var(X) = E[(X-1)²] = 0, so actually something is wrong
What is it?
How about n=2? now there are two people and they either both get their hats or neither does (they get each others hats). So
P(X1=0,X2=0) = P(X1=1,X2=1) = 1/2
P(X1=0,X2=1) = P(X1=1,X2=0) = 0
so
E[X1+X2] = 2*P(X1=1,X2=1) = 2*1/2 = 1
E[(X1+X2)²] = 2²*P(X1=1,X2=1) = 4*1/2 = 2
Var(X1+X2) = E[(X1+X2)²] - (E[X1+X2])² = 2 - 1² = 1
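A quick Monte Carlo check of E[X] = Var(X) = 1 (a sketch, not part of the original notes; the function name `hat_matches` is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def hat_matches(n, reps=200_000):
    """Simulate reps rounds of n people picking hats at random and
    return the sample mean and variance of the number of matches."""
    counts = np.array([(rng.permutation(n) == np.arange(n)).sum()
                       for _ in range(reps)])
    return counts.mean(), counts.var()

for n in (2, 3, 10):
    print(n, hat_matches(n))   # both numbers should be close to 1
```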
Say X|Y=y is a conditional r.v. with density (pdf) fX|Y=y. As was stated earlier, conditional rv’s are also just rv’s, so they have expectations as well, given by \(E[X|Y=y]=\sum_x x\,f_{X|Y=y}(x|y)\) in the discrete case and \(E[X|Y=y]=\int x\,f_{X|Y=y}(x|y)\,dx\) in the continuous case.
We can think of π(y) = E[X|Y=y] as a function of y, that is if we know the joint density f(x,y) then for a fixed y we can compute π(y). But y is the realization of the random variable Y, so Z = π(Y) = E[X|Y] is a random variable as well.
Remember we do not have an object “X|Y”, only “X|Y=y”, but now we do have an object E[X|Y]
An urn contains 2 white and 3 black balls. A random sample of size 2 is chosen. Let X denote the number of white balls in the sample. An additional ball is then drawn from the remaining three. Let Y equal 1 if that ball is white and 0 otherwise.
For example f(0,0) = P(X=0,Y=0) = 3/5*2/4*1/3 = 1/10. The complete density is given by:
y\x | 0 | 1 | 2 |
---|---|---|---|
0 | 1/10 | 2/5 | 1/10 |
1 | 1/5 | 1/5 | 0 |
The marginals are given by
x | 0 | 1 | 2 |
---|---|---|---|
fX(x) | 3/10 | 3/5 | 1/10 |
and
y | 0 | 1 |
---|---|---|
fY(y) | 3/5 | 2/5 |
The conditional distribution of X|Y=0 is
x | 0 | 1 | 2 |
---|---|---|---|
fX|Y=0(x|0) | 1/6 | 2/3 | 1/6 |
and so E[X|Y=0] = 0*1/6+1*2/3+2*1/6 = 1.0
The conditional distribution of X|Y=1 is
x | 0 | 1 | 2 |
---|---|---|---|
fX|Y=1(x|1) | 1/2 | 1/2 | 0 |
and so E[X|Y=1] = 0*1/2+1*1/2+2*0 = 1/2
Finally the r.v. Z = E[X|Y] has density
z | 1/2 | 1 |
---|---|---|
fZ(z) | 2/5 | 3/5 |
with this we can find E[Z] = E[E[X|Y]] = 1*3/5+1/2*2/5 = 4/5
Theorem
E[X] = E[E[X|Y]]
and
Var(X) = E[Var(X|Y)] + Var[E(X|Y)]
We found EZ = E[E[X|Y]] = 4/5. Now E[X] = 0*3/10 + 1*3/5 + 2*1/10 = 4/5, so the two agree.
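Here is a small Python check of the urn example (not part of the original notes), enumerating the joint density from the table above and verifying E[X] = E[E[X|Y]]:

```python
from fractions import Fraction as F

# joint density f(x,y) of the urn example, taken from the table above
f = {(0, 0): F(1, 10), (1, 0): F(2, 5), (2, 0): F(1, 10),
     (0, 1): F(1, 5),  (1, 1): F(1, 5), (2, 1): F(0)}

EX = sum(x * p for (x, y), p in f.items())                        # E[X] directly
fY = {y: sum(p for (x, yy), p in f.items() if yy == y) for y in (0, 1)}
EXgivenY = {y: sum(x * p for (x, yy), p in f.items() if yy == y) / fY[y]
            for y in (0, 1)}                                      # E[X|Y=y]
EEXY = sum(EXgivenY[y] * fY[y] for y in (0, 1))                   # E[E[X|Y]]

print(EX, EXgivenY, EEXY)   # 4/5, {0: 1, 1: 1/2}, 4/5
```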
Let’s go back to the example above, where we had a continuous rv with joint pdf f(x,y) = 6x, 0≤x≤y≤1, 0 otherwise
Now \(f_Y(y)=\int_0^y 6x\,dx=3y^2\) for 0≤y≤1, so \(f_{X|Y=y}(x|y)=\frac{6x}{3y^2}=\frac{2x}{y^2}\) for 0≤x≤y, and
$$E[X|Y=y]=\int_0^y x\,\frac{2x}{y^2}\,dx=\frac{2y}{3}$$
and therefore
$$E[E[X|Y]]=E\left[\tfrac{2}{3}Y\right]=\tfrac{2}{3}\int_0^1 y\cdot 3y^2\,dy=\tfrac{2}{3}\cdot\tfrac{3}{4}=\tfrac{1}{2}$$
which agrees with the direct calculation \(E[X]=\int_0^1\int_0^y x\cdot 6x\,dx\,dy=\tfrac12\).
Let’s have another look at the hat matching problem. Suppose that those choosing their own hats leave, while the others put their hats back into the center and do the exercise again. This process continues until everybody has his or her own hat. Find E[Rn], where Rn is the number of rounds needed.
Given that in each round on average one person gets their own hat and then leaves, we might suspect that E[Rn]=n. Let’s prove this by induction on n.
Let n=1, that is, there is just one person, and clearly they pick up their own hat, so E[R1]=1.
Assume E[Rk]=k \(\forall\) k<n. Let M be the number of matches that occur in the first round. Clearly M\(\in\) {0,1, .. ,n}
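Here is a sketch of the induction step, conditioning on M and using E[M]=1 from above. For m ≥ 1 the remaining n-m people simply start over, so by the induction hypothesis \(E[R_n\mid M=m]=1+E[R_{n-m}]=1+n-m\), while \(E[R_n\mid M=0]=1+E[R_n]\). Then

$$\begin{aligned}
E[R_n] &= \sum_{m=0}^{n} E[R_n\mid M=m]\,P(M=m)\\
&= (1+E[R_n])\,P(M=0)+\sum_{m\ge 1}(1+n-m)\,P(M=m)\\
&= 1+E[R_n]P(M=0)+n\,(1-P(M=0))-E[M]\\
&= E[R_n]P(M=0)+n\,(1-P(M=0))
\end{aligned}$$

Since P(M=0) &lt; 1 we can cancel the factor 1-P(M=0) and get \(E[R_n]=n\),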
and we are done.
Note that we solved this problem without ever finding P(M=0), which is also a non-trivial problem. Here is how it is done:
Of course the probability depends on n, so let’s use the notation
pn =P(M=0 | there are n people)
We will condition on whether or not the first person selects their own hat. Call the event “first person selects their own hat” E. Then
\(p_n = P(M=0) = P(M=0|E)P(E) + P(M=0|E^c)P(E^c)\)
Now
P(M=0|E)=0
because E means at least one person got their own hat, and so
\(p_n = P(M=0|E^c)\cdot\frac{n-1}{n}\)
Ec means the first person selects a hat that does not belong to them. So now there are n-1 hats in the center, one of which belongs to the first person. There are still n-1 people to pick hats, one of which has no hat in the center because the first person took it.
So P(M=0|Ec) is the probability of no matches when n-1 people select from a set of n-1 hats that does not contain the hat of one of them. This can happen in either of two mutually exclusive ways:
• there are no matches and the extra person does not select the extra hat. This has probability \(p_{n-1}\). (Think of the extra hat as belonging to the extra person.)
• there are no matches and the extra person does select the extra hat. This has probability \(\frac{1}{n-1}p_{n-2}\), because the extra person needs to choose the extra hat (probability \(\frac{1}{n-1}\)), and then there are n-2 people and their n-2 hats left.
So now we have
\(P(M=0|E^c) = p_{n-1} + \frac{1}{n-1}p_{n-2}\) and
\(p_n = \frac{n-1}{n}p_{n-1} + \frac{1}{n}p_{n-2}\)
This is called a recursive relationship. We can solve it via induction as follows:
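A sketch of the argument: rewrite the recursion as a telescoping difference and induct on n. From the recursion,

$$p_n-p_{n-1}=-\frac{1}{n}\,(p_{n-1}-p_{n-2})$$

With \(p_1=0\) and \(p_2=1/2\) this gives \(p_n-p_{n-1}=\frac{(-1)^n}{n!}\), and summing up

$$p_n=\sum_{k=2}^{n}\frac{(-1)^k}{k!}=\sum_{k=0}^{n}\frac{(-1)^k}{k!}\;\longrightarrow\;e^{-1}\approx 0.368$$

so the probability of at least one match is \(1-p_n\approx 1-e^{-1}\approx 0.632\) for any n that is not very small,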
so no matter how many people are present, there is always about a 63% chance that somebody gets their own hat.
Definition

The moment generating function (mgf) of a rv X is defined by \(\psi(t) = E[e^{tX}]\).
Let X be 1 with probability p and 0 with probability q=1-p. Find ψ(t)
\(\psi(t) = E[e^{tX}] = e^{t\cdot 0}q + e^{t\cdot 1}p = q + pe^t\)
X is called a Bernoulli rv. with success parameter p, denoted by X~Ber(p)
Say X~Exp(λ). Find ψ(t)
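A sketch of the computation, and of how the moments appear by differentiating at t = 0:

$$\psi(t)=\int_0^\infty e^{tx}\,\lambda e^{-\lambda x}\,dx=\frac{\lambda}{\lambda-t},\qquad t<\lambda$$

$$\psi'(0)=\frac{\lambda}{(\lambda-t)^2}\Big|_{t=0}=\frac{1}{\lambda}=E[X],\qquad \psi''(0)=\frac{2\lambda}{(\lambda-t)^3}\Big|_{t=0}=\frac{2}{\lambda^2}=E[X^2]$$

and in general \(\psi^{(k)}(0)=E[X^k]\),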
which explains the name moment generating function, although actually finding moments in this way is like killing a fly with a cannon! The main usefulness of moment generating functions is as a tool for proving theorems, and usually uses the following
Proposition

If \(X_1, \dots, X_n\) are independent, then \(\psi_{X_1+\dots+X_n}(t) = \prod_{i=1}^n \psi_{X_i}(t)\).
If X1, .., Xn are independent random variables with the same pdf (density) f we say X1, .., Xn are iid.
Say X1, .., Xn are iid Ber(p). Let X=∑Xi, then \(\psi_X(t) = \prod_{i=1}^n (q+pe^t) = (q+pe^t)^n\)
Theorem
If ψ(t) exists in an open neighborhood of 0, then it determines f.
The proof is much too complicated for us.
A discrete rv X is said to have a Poisson distribution with rate λ if it has density \(f(x) = \frac{\lambda^x}{x!}e^{-\lambda}\), x = 0, 1, 2, … Now
$$\psi(t) = \sum_{x=0}^{\infty} e^{tx}\,\frac{\lambda^x}{x!}e^{-\lambda} = e^{-\lambda}\sum_{x=0}^{\infty}\frac{(\lambda e^t)^x}{x!} = e^{-\lambda}e^{\lambda e^t} = e^{\lambda(e^t-1)}$$
Say X1, .., Xn are iid Poisson with rate λ, then
$$\psi_{\sum X_i}(t) = \prod_{i=1}^n e^{\lambda(e^t-1)} = e^{n\lambda(e^t-1)}$$
but this is again the moment generating function of a Poisson rv, this one with rate nλ. So by the uniqueness theorem we have shown that ∑Xi has a Poisson distribution with rate nλ.
Proposition
Let X be a rv with mgf ψ(t). Then Y = aX+b has mgf \(\psi_Y(t) = e^{bt}\psi(at)\)
Proof

\(\psi_Y(t) = E[e^{t(aX+b)}] = e^{bt}E[e^{(at)X}] = e^{bt}\psi(at)\)