Inequalities and Limit Theorems

Inequalities

Theorem (Markov’s Inequality)

If X takes on only nonnegative values, then for any a>0

\(P(X \ge a) \le \frac{E[X]}{a}\)
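As a quick numerical illustration (a sketch; the Exponential(1) distribution, the sample size, and the values of a are my own choices, not from the notes), we can compare the true tail probability with the Markov bound:

```python
import numpy as np

# Sketch: check Markov's inequality P(X >= a) <= E[X]/a for X ~ Exponential(1).
# The distribution and the values of a are arbitrary illustration choices.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # E[X] = 1

for a in [1, 2, 5]:
    tail = (x >= a).mean()      # Monte Carlo estimate of P(X >= a)
    bound = x.mean() / a        # Markov bound E[X]/a
    print(f"a={a}: P(X>=a) ~ {tail:.4f} <= bound {bound:.4f}")
```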

Theorem (Chebyshev’s Inequality)

Let X be a rv with mean μ and standard deviation σ, let k>0, then

\(P(|X-\mu| \ge k\sigma) \le \frac{1}{k^2}\)
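One standard way to see this, sketched here, is to apply Markov’s inequality to the nonnegative random variable (X-μ)²:

\(P(|X-\mu| \ge k\sigma) = P((X-\mu)^2 \ge k^2\sigma^2) \le \frac{E[(X-\mu)^2]}{k^2\sigma^2} = \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2}\)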

Theorem (Chernoff Bounds)

Let X be a rv with moment generating function \(\psi(t) = E[e^{tX}]\). Then by Markov’s inequality, for any a>0
\(P(X \ge a) \le e^{-ta}\psi(t)\) \(\forall\) t>0
\(P(X \le a) \le e^{-ta}\psi(t)\) \(\forall\) t<0

Proof For t>0, the event X≥a is the same as the event \(e^{tX} \ge e^{ta}\), so by Markov’s inequality

\(P(X \ge a) = P(e^{tX} \ge e^{ta}) \le \frac{E[e^{tX}]}{e^{ta}} = e^{-ta}\psi(t)\)

The proof for t<0 is similar

Example

say Z~N(0,1), then \(\psi(t) = E[e^{tZ}] = e^{t^2/2}\), so for any a>0 and t>0

\(P(Z \ge a) \le e^{-ta + t^2/2}\)

Minimizing the right-hand side over t (the minimum is at t=a) gives

\(P(Z \ge a) \le e^{-a^2/2}\)
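We can check how tight this is numerically (a sketch; scipy is used only for the exact normal tail, and the values of a are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

# Sketch: compare the optimized Chernoff bound exp(-a^2/2) for Z ~ N(0,1)
# with the exact tail probability P(Z >= a). The values of a are arbitrary.
for a in [1.0, 2.0, 3.0]:
    bound = np.exp(-a**2 / 2)   # Chernoff bound with the optimal t = a
    exact = norm.sf(a)          # exact tail P(Z >= a)
    print(f"a={a}: P(Z>=a) = {exact:.5f} <= bound {bound:.5f}")
```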

Theorem (Jensen’s Inequality)

If f is a convex function then E[f(X)]≥f(EX).
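A simple illustration (my own example, not from the notes): taking the convex function f(x)=x² gives \(E[X^2] \ge (E[X])^2\), which is just the statement that \(\mathrm{Var}(X) = E[X^2]-(E[X])^2 \ge 0\).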

Limit Theorems

Definition

Let X, X1, X2, .. be random variables. We say

Xn → X in mean iff E[Xn] → E[X]

Xn → X in quadratic mean iff E[Xn] → E[X] and Var[Xn] → 0

(note that this implies that the limit has to be a constant)

Xn → X in distribution (or in law) iff \(F_{X_n}(x) \rightarrow F_X(x)\) \(\forall\) x where \(F_X\) is continuous

Xn → X in probability iff \(\forall\) ε>0 P(|Xn-X|>ε) → 0

Xn → X almost surely iff there exists a set N such that Xn(ω)→X(ω) \(\forall\) ω\(\in\) S\N and P(N)=0 (Xn(ω) converges to X(ω) for all ω’s except maybe for a set of probability 0)

Example

say X1,X2,.. are independent with P(Xn=0) = P(Xn=2) = ½(1-1/n), P(Xn=1) = 1/n. Let X be a rv with P(X=0)=P(X=2)=1/2. Now \(F_{X_n}(x) \rightarrow F_X(x)\) for every x, so Xn → X in distribution. For convergence in probability we need to look at P(|Xn-X|>ε),

and this last probability depends on the joint distribution of (Xn,X). Note that if Xn is independent of X we have, for any 0<ε<1,

\(P(|X_n-X|>\epsilon) = 1 - P(X_n = X) = 1 - \tfrac{1}{2}(1-\tfrac{1}{n}) \rightarrow \tfrac{1}{2}\)

and so we don’t have convergence in probability. This is always the case when Xn is independent of X and X is not constant.

Let’s say the joint density is given by

Then

The last one is a bit vague. In general, showing almost sure convergence is much harder because it requires some measure theory (here we would have to go back and “invent” a sample space).
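Going back to the case where Xn is independent of X, a small simulation (a sketch; the sample size and the threshold ε=0.5 are arbitrary choices) shows the two modes of convergence behaving differently:

```python
import numpy as np

# Sketch: Xn takes values 0,2 with prob (1-1/n)/2 each and 1 with prob 1/n;
# X takes values 0,2 with prob 1/2 each, drawn INDEPENDENTLY of Xn.
rng = np.random.default_rng(1)
N = 1_000_000

for n in [10, 100, 10_000]:
    xn = rng.choice([0, 1, 2], size=N, p=[(1 - 1/n)/2, 1/n, (1 - 1/n)/2])
    x = rng.choice([0, 2], size=N)   # independent of xn
    print(f"n={n}: P(Xn=1) ~ {(xn == 1).mean():.4f}, "
          f"P(|Xn-X|>0.5) ~ {(np.abs(xn - x) > 0.5).mean():.4f}")
# P(Xn=1) -> 0, so the distribution of Xn approaches that of X,
# but P(|Xn-X|>0.5) -> 1/2, so there is no convergence in probability.
```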

Example

say X1,X2,.. are iid U[0,1]. Let Mn=max{X1,..,Xn}. Let δx be the point mass at x, that is the random variable with P(δx=x)=1. Then for any 0<ε<1

\(P(|M_n-1|>\epsilon) = P(M_n < 1-\epsilon) = (1-\epsilon)^n \rightarrow 0\)

so Mn → 1 in probability, and therefore Mn → δ1 in distribution as well.

How about almost sure convergence? It does in fact hold here as well.
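A quick simulation of Mn (a sketch; ε and the sample sizes are arbitrary choices) agrees with the computation above:

```python
import numpy as np

# Sketch: Mn = max of n iid U[0,1] rvs, so P(|Mn-1| > eps) = (1-eps)^n -> 0.
rng = np.random.default_rng(2)
eps, reps = 0.05, 10_000

for n in [10, 100, 1000]:
    m = rng.uniform(size=(reps, n)).max(axis=1)   # reps independent copies of Mn
    print(f"n={n}: P(|Mn-1|>{eps}) ~ {(np.abs(m - 1) > eps).mean():.4f}, "
          f"(1-eps)^n = {(1 - eps)**n:.6f}")
```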

Theorem

  1. convergence in quadratic mean implies convergence in probability.

  2. convergence in probability implies convergence in distribution. The reverse is true if the limit is a constant.

  3. almost sure convergence implies convergence in probability, but not vice versa
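For item 1, a one-line sketch (assuming the definition of quadratic mean convergence given above, so the limit is a constant c = lim E[Xn]): Markov’s inequality applied to (Xn-c)² gives

\(P(|X_n-c| > \epsilon) \le \frac{E[(X_n-c)^2]}{\epsilon^2} = \frac{\mathrm{Var}(X_n) + (E[X_n]-c)^2}{\epsilon^2} \rightarrow 0\)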

Theorem (Weak Law of Large Numbers)

Let X1, X2, … be a sequence of independent and identically distributed (iid) r.v.’s having mean μ. Let Zn= (X1+..+Xn)/n. Then Zn→μ in probability.

The proof can be found in any probability textbook.
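A simulation illustrating the WLLN (a sketch; the Exponential(1) distribution, ε, and the sample sizes are my own choices, not from the notes):

```python
import numpy as np

# Sketch: WLLN for iid Exponential(1) rvs, so mu = 1.
# Estimate P(|Zn - mu| > eps) from many independent replications of Zn.
rng = np.random.default_rng(3)
mu, eps, reps = 1.0, 0.05, 10_000

for n in [10, 100, 1000]:
    zn = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    print(f"n={n}: P(|Zn-mu|>{eps}) ~ {(np.abs(zn - mu) > eps).mean():.4f}")
```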

Theorem (Central Limit Theorem)

Let X1, X2, .. be an iid sequence of r.v.’s with mean μ and standard deviation σ. Define the sample mean by

\(\bar{X}_n = (X_1 + \dots + X_n)/n\)

Then

\(\sqrt{n}(\bar{X}_n-\mu)/\sigma \rightarrow Z\) in distribution

where Z ~ N(0,1).

Example

let’s study the CLT on a very simple example: say X1,X2,.. are iid Ber(p). Now μ = p and \(\sigma = \sqrt{p(1-p)}\), so the CLT says

\(\sqrt{n}(\bar{X}_n-p)/\sqrt{p(1-p)} \rightarrow Z\) in distribution

Since \(n\bar{X}_n = X_1 + \dots + X_n \sim \text{Bin}(n,p)\), this is just the normal approximation to the binomial distribution.
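A simulation of the Bernoulli case (a sketch; the values of p, n, and the number of replications are arbitrary choices) compares the distribution of the standardized sample mean with the standard normal cdf:

```python
import numpy as np
from scipy.stats import norm

# Sketch: CLT for iid Ber(p): sqrt(n)*(Xbar - p)/sqrt(p(1-p)) is approx N(0,1).
rng = np.random.default_rng(4)
p, n, reps = 0.3, 1_000, 100_000

xbar = rng.binomial(n, p, size=reps) / n               # reps sample means
z = np.sqrt(n) * (xbar - p) / np.sqrt(p * (1 - p))     # standardized means

for c in [-1.0, 0.0, 1.0]:
    print(f"P(Z<={c}): simulated {(z <= c).mean():.4f}, N(0,1) cdf {norm.cdf(c):.4f}")
```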

The CLT is not so much a theorem as a family of theorems. Any of the conditions (the same means, the same standard deviations, independence) can be relaxed considerably, and still the result holds. Unfortunately there is no single set of necessary conditions, so there are many theorems for different situations!