Theorem (Markov’s Inequality)
If X takes on only nonnegative values, then for any a>0
\(P(X \geq a) \leq \frac{E[X]}{a}\)
Theorem (Chebyshev’s Inequality)
Let X be a rv with mean μ and standard deviation σ, and let k>0. Then
\(P(|X-\mu| \geq k\sigma) \leq \frac{1}{k^2}\)
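As a quick numerical check, here is a small simulation sketch (assuming Python with numpy is available; the exponential distribution, the seed, and the thresholds a=k=3 are arbitrary choices, not part of the notes) comparing empirical tail probabilities with the Markov and Chebyshev bounds.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1.0, size=1_000_000)   # nonnegative rv with mean 1 and sd 1

    a = 3.0   # threshold for Markov
    k = 3.0   # number of standard deviations for Chebyshev

    print("P(X >= a)           ~", np.mean(x >= a))                      # true tail, about e^{-3}
    print("Markov bound E[X]/a :", np.mean(x) / a)
    print("P(|X-mu| >= k*sd)   ~", np.mean(np.abs(x - 1.0) >= k * 1.0))  # uses the known mean and sd
    print("Chebyshev bound 1/k²:", 1 / k**2)

Both bounds hold but are quite loose here, which is typical: they use only the mean (Markov) or the mean and variance (Chebyshev) of X.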
Theorem (Chernoff Bounds)
Let X be a rv with moment generating function \(\psi(t) = E[e^{tX}]\). Then by Markov's inequality, for any a>0
\(P(X \geq a) \leq e^{-ta}\psi(t)\) \(\forall\) t>0
\(P(X \leq a) \leq e^{-ta}\psi(t)\) \(\forall\) t<0
Proof For t>0 the function \(x \mapsto e^{tx}\) is increasing and nonnegative, so by Markov's inequality
\(P(X \geq a) = P(e^{tX} \geq e^{ta}) \leq \frac{E[e^{tX}]}{e^{ta}} = e^{-ta}\psi(t)\)
The proof for t<0 is similar.
say Z~N(0,1), then \(\psi(t) = e^{t^2/2}\), so for a>0 the Chernoff bound gives \(P(Z \geq a) \leq e^{-ta + t^2/2}\) for all t>0. Minimizing the right-hand side over t (the minimum is at t=a) yields \(P(Z \geq a) \leq e^{-a^2/2}\).
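A small sketch (assuming Python with numpy and scipy; the values of a are arbitrary) comparing the optimized Chernoff bound \(e^{-a^2/2}\) with the exact normal tail probability:

    import numpy as np
    from scipy.stats import norm

    for a in [1.0, 2.0, 3.0]:
        exact = norm.sf(a)            # P(Z >= a), exact upper tail
        chernoff = np.exp(-a**2 / 2)  # optimized Chernoff bound
        print(f"a={a}: exact={exact:.5f}, Chernoff bound={chernoff:.5f}")

The bound has the right order of magnitude in the exponent, which is why Chernoff bounds are so useful for large deviations even though they are not tight.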
Theorem (Jensen’s Inequality)
If f is a convex function then \(E[f(X)] \geq f(E[X])\).
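For instance, with the convex function \(f(x) = x^2\), Jensen's inequality says \(E[X^2] \geq (E[X])^2\), which is just \(Var[X] \geq 0\). A quick numerical illustration (a sketch assuming Python with numpy; the gamma distribution and seed are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.gamma(shape=2.0, scale=1.0, size=1_000_000)   # E[X] = 2, E[X^2] = 6

    print("E[X^2]   ~", np.mean(x**2))    # left-hand side of Jensen
    print("(E[X])^2 ~", np.mean(x)**2)    # right-hand side, always smaller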
Definition
Let \(X, X_1, X_2, \ldots\) be random variables. We say
\(X_n \to X\) in mean iff \(E[X_n] \to E[X]\)
\(X_n \to X\) in quadratic mean iff \(E[X_n] \to E[X]\) and \(Var[X_n] \to 0\)
(note that this implies that the limit has to be a constant)
\(X_n \to X\) in distribution (or in law) iff \(F_{X_n}(x) \to F_X(x)\) \(\forall\) x where \(F_X\) is continuous
\(X_n \to X\) in probability iff \(\forall\) ε>0 \(P(|X_n - X| > \varepsilon) \to 0\)
\(X_n \to X\) almost surely iff there exists a set N such that \(X_n(\omega) \to X(\omega)\) \(\forall\) \(\omega \in S \setminus N\) and P(N)=0 (\(X_n(\omega)\) converges to \(X(\omega)\) for all ω's except maybe for a set of probability 0)
say \(X_1, X_2, \ldots\) are independent with \(P(X_n=0) = P(X_n=2) = \tfrac{1}{2}(1-1/n)\), \(P(X_n=1) = 1/n\). Let X be a rv with P(X=0)=P(X=2)=1/2. Now \(E[X_n] = 1/n + (1-1/n) = 1 = E[X]\) and \(F_{X_n}(x) \to F_X(x)\) at every point where \(F_X\) is continuous, so \(X_n \to X\) in mean and in distribution. What about convergence in probability? For ε<1 we have \(P(|X_n - X| > \varepsilon) = P(X_n \neq X)\),
and this last probability depends on the joint distribution of \((X_n, X)\). Note that if \(X_n\) is independent of X we have \(P(X_n = X) = P(X_n=0)P(X=0) + P(X_n=2)P(X=2) = \tfrac{1}{2}(1-1/n) \to \tfrac{1}{2}\),
and so \(P(|X_n - X| > \varepsilon) \to \tfrac{1}{2} \neq 0\): we don't have convergence in probability. This is always true: if \(X_n\) is independent of X and X is not a constant, then \(X_n\) cannot converge to X in probability.
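A simulation sketch of this independent case (assuming Python with numpy; the choices n=1000, the seed, and the number of replications are arbitrary): the marginal distribution of \(X_n\) looks like that of X, yet the two disagree about half the time.

    import numpy as np

    rng = np.random.default_rng(2)
    n, reps = 1000, 100_000

    # X_n takes values 0, 2 each with prob (1-1/n)/2 and 1 with prob 1/n; X takes 0, 2 with prob 1/2
    p_n = [0.5 * (1 - 1/n), 1/n, 0.5 * (1 - 1/n)]
    xn = rng.choice([0, 1, 2], size=reps, p=p_n)
    x  = rng.choice([0, 2], size=reps)              # drawn independently of xn

    print("P(Xn=0), P(Xn=2)  ~", np.mean(xn == 0), np.mean(xn == 2))   # both about 0.5, like X
    print("P(|Xn - X| > 0.5) ~", np.mean(np.abs(xn - x) > 0.5))        # about 0.5, does not go to 0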
Let’s say the joint density is given by
Then
The last one is a bit vague. Generally, showing almost sure convergence is much harder because it requires some measure theory (here we would have to go back and “invent” a sample space).
say \(X_1, X_2, \ldots\) are iid U[0,1]. Let \(M_n = \max\{X_1, \ldots, X_n\}\). Let \(\delta_x\) be the point mass at x, that is, the random variable with \(P(\delta_x = x) = 1\). Then for any ε>0, \(P(|M_n - 1| > \varepsilon) = P(M_n < 1-\varepsilon) = (1-\varepsilon)^n \to 0\), so \(M_n \to \delta_1\) in probability (and hence also in distribution).
How about almost sure convergence? It does in fact hold here as well.
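A quick simulation sketch (assuming Python with numpy; ε=0.05, the seed, and the sample sizes are arbitrary choices) of how \(M_n\) concentrates near 1 as n grows:

    import numpy as np

    rng = np.random.default_rng(3)
    eps, reps = 0.05, 10_000

    for n in [10, 100, 1000]:
        mn = rng.uniform(size=(reps, n)).max(axis=1)   # one M_n per replication
        print(f"n={n}: P(|Mn - 1| > {eps}) ~", np.mean(np.abs(mn - 1) > eps))

The empirical probabilities should track \((1-\varepsilon)^n\), which is about 0.60 for n=10, 0.006 for n=100, and essentially 0 for n=1000.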
Theorem
convergence in quadratic mean implies convergence in probability.
convergence in probability implies convergence in distribution. The reverse is true if the limit is a constant.
almost sure convergence implies convergence in probability, but not vice versa
Theorem (Weak Law of Large Numbers)
Let \(X_1, X_2, \ldots\) be a sequence of independent and identically distributed (iid) r.v.’s having mean μ. Let \(Z_n = (X_1 + \ldots + X_n)/n\). Then \(Z_n \to \mu\) in probability.
A proof can be found in any probability textbook.
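A simulation sketch of the weak law (assuming Python with numpy; the exponential distribution with mean 1, ε=0.1, the seed, and the sample sizes are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(4)
    eps, reps = 0.1, 10_000

    for n in [10, 100, 1000]:
        zn = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)   # sample means, true mean mu = 1
        print(f"n={n}: P(|Zn - mu| > {eps}) ~", np.mean(np.abs(zn - 1.0) > eps))

The probability of the sample mean being more than ε away from μ shrinks toward 0 as n grows, which is exactly what convergence in probability says.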
Theorem (Central Limit Theorem)
Let \(X_1, X_2, \ldots\) be an iid sequence of r.v.’s with mean μ and standard deviation σ. Define the sample mean by
\(\bar{X}_n = (X_1 + \ldots + X_n)/n\)
Then
\(\sqrt{n}(\bar{X}_n - \mu)/\sigma \to Z\) in distribution, where Z~N(0,1)
let’s study the CLT on a very simple example: say \(X_1, X_2, \ldots\) are iid Ber(p). Now \(\mu = p\), \(\sigma = \sqrt{p(1-p)}\), and \(n\bar{X}_n = X_1 + \ldots + X_n \sim Bin(n,p)\), so the CLT says \(\sqrt{n}(\bar{X}_n - p)/\sqrt{p(1-p)} \to Z\) in distribution; in other words, for large n the binomial distribution is well approximated by a normal.
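A simulation sketch for the Bernoulli case (assuming Python with numpy and scipy; p=0.3, the seed, the sample sizes, and the cutoff z=1 are arbitrary choices) comparing the standardized sample mean with the standard normal:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(5)
    p, reps = 0.3, 20_000

    for n in [10, 100, 1000]:
        xbar = rng.binomial(1, p, size=(reps, n)).mean(axis=1)
        z = np.sqrt(n) * (xbar - p) / np.sqrt(p * (1 - p))   # standardized sample mean
        # empirical P(standardized mean <= 1) vs the exact standard normal value
        print(f"n={n}: empirical {np.mean(z <= 1):.4f}   normal {norm.cdf(1):.4f}")

As n grows the empirical probabilities settle near the normal value of about 0.841, even though each \(X_i\) is as far from normal as can be.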
The CLT is not so much a theorem as a family of theorems. Any of the conditions (the same means, the same standard deviations, independence) can be relaxed considerably, and still the result holds. Unfortunately there is no single set of necessary conditions, so there are many theorems for different situations!