Theorem (Markov’s Inequality)
If X takes on only nonnegative values, then for any a>0
\(P(X \geq a) \leq \frac{E[X]}{a}\)
Theorem (Chebyshev’s Inequality)
Let X be a rv with mean μ and standard deviation σ, and let k>0. Then
\(P(|X-\mu| \geq k\sigma) \leq \frac{1}{k^2}\)
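As a quick numerical check, here is a small simulation sketch (assuming Python with numpy is available; the exponential distribution, the seed, and the thresholds a=k=3 are arbitrary choices, not part of the notes) comparing empirical tail probabilities with the Markov and Chebyshev bounds.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1.0, size=1_000_000)   # nonnegative rv with mean 1 and sd 1

    a = 3.0   # threshold for Markov
    k = 3.0   # number of standard deviations for Chebyshev

    print("P(X >= a)           ~", np.mean(x >= a))                      # true tail, about e^{-3}
    print("Markov bound E[X]/a :", np.mean(x) / a)
    print("P(|X-mu| >= k*sd)   ~", np.mean(np.abs(x - 1.0) >= k * 1.0))  # uses the known mean and sd
    print("Chebyshev bound 1/k²:", 1 / k**2)

Both bounds hold but are quite loose here, which is typical: they use only the mean (Markov) or the mean and variance (Chebyshev) of X.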
Theorem (Chernoff Bounds)
Let X be a rv with moment generating function \(\psi(t) = E[e^{tX}]\). Then by Markov's inequality, for any a>0
\(P(X \geq a) \leq e^{-ta}\psi(t)\) \(\forall\) t>0
\(P(X \leq a) \leq e^{-ta}\psi(t)\) \(\forall\) t<0
Proof For t>0 the function \(x \mapsto e^{tx}\) is increasing and nonnegative, so by Markov's inequality
\(P(X \geq a) = P(e^{tX} \geq e^{ta}) \leq \frac{E[e^{tX}]}{e^{ta}} = e^{-ta}\psi(t)\)
The proof for t<0 is similar.
say Z~N(0,1), then \(\psi(t) = e^{t^2/2}\), so for a>0 the Chernoff bound gives \(P(Z \geq a) \leq e^{-ta + t^2/2}\) for all t>0. Minimizing the right-hand side over t (the minimum is at t=a) yields \(P(Z \geq a) \leq e^{-a^2/2}\).
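A small sketch (assuming Python with numpy and scipy; the values of a are arbitrary) comparing the optimized Chernoff bound \(e^{-a^2/2}\) with the exact normal tail probability:

    import numpy as np
    from scipy.stats import norm

    for a in [1.0, 2.0, 3.0]:
        exact = norm.sf(a)            # P(Z >= a), exact upper tail
        chernoff = np.exp(-a**2 / 2)  # optimized Chernoff bound
        print(f"a={a}: exact={exact:.5f}, Chernoff bound={chernoff:.5f}")

The bound has the right order of magnitude in the exponent, which is why Chernoff bounds are so useful for large deviations even though they are not tight.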
Theorem (Jensen’s Inequality)
If f is a convex function then \(E[f(X)] \geq f(E[X])\).
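For instance, with the convex function \(f(x) = x^2\), Jensen's inequality says \(E[X^2] \geq (E[X])^2\), which is just \(Var[X] \geq 0\). A quick numerical illustration (a sketch assuming Python with numpy; the gamma distribution and seed are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.gamma(shape=2.0, scale=1.0, size=1_000_000)   # E[X] = 2, E[X^2] = 6

    print("E[X^2]   ~", np.mean(x**2))    # left-hand side of Jensen
    print("(E[X])^2 ~", np.mean(x)**2)    # right-hand side, always smaller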
Definition
Let \(X, X_1, X_2, \ldots\) be random variables. We say
\(X_n \to X\) in mean iff \(E[X_n] \to E[X]\)
\(X_n \to X\) in quadratic mean iff \(E[X_n] \to E[X]\) and \(Var[X_n] \to 0\)
(note that this implies that the limit has to be a constant)
\(X_n \to X\) in distribution (or in law) iff \(F_{X_n}(x) \to F_X(x)\) \(\forall\) x where \(F_X\) is continuous
\(X_n \to X\) in probability iff \(\forall\) ε>0 \(P(|X_n - X| > \varepsilon) \to 0\)
\(X_n \to X\) almost surely iff there exists a set N such that \(X_n(\omega) \to X(\omega)\) \(\forall\) \(\omega \in S \setminus N\) and P(N)=0 (\(X_n(\omega)\) converges to \(X(\omega)\) for all ω's except maybe for a set of probability 0)
say \(X_1, X_2, \ldots\) are independent with \(P(X_n=0) = P(X_n=2) = \tfrac{1}{2}(1-1/n)\), \(P(X_n=1) = 1/n\). Let X be a rv with P(X=0)=P(X=2)=1/2. Now \(E[X_n] = 1/n + (1-1/n) = 1 = E[X]\) and \(F_{X_n}(x) \to F_X(x)\) at every point where \(F_X\) is continuous, so \(X_n \to X\) in mean and in distribution. What about convergence in probability? For ε<1 we have \(P(|X_n - X| > \varepsilon) = P(X_n \neq X)\),
and this last probability depends on the joint distribution of \((X_n, X)\). Note that if \(X_n\) is independent of X we have \(P(X_n = X) = P(X_n=0)P(X=0) + P(X_n=2)P(X=2) = \tfrac{1}{2}(1-1/n) \to \tfrac{1}{2}\),
and so \(P(|X_n - X| > \varepsilon) \to \tfrac{1}{2} \neq 0\): we don't have convergence in probability. This is always true: if \(X_n\) is independent of X and X is not a constant, then \(X_n\) cannot converge to X in probability.
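A simulation sketch of this independent case (assuming Python with numpy; the choices n=1000, the seed, and the number of replications are arbitrary): the marginal distribution of \(X_n\) looks like that of X, yet the two disagree about half the time.

    import numpy as np

    rng = np.random.default_rng(2)
    n, reps = 1000, 100_000

    # X_n takes values 0, 2 each with prob (1-1/n)/2 and 1 with prob 1/n; X takes 0, 2 with prob 1/2
    p_n = [0.5 * (1 - 1/n), 1/n, 0.5 * (1 - 1/n)]
    xn = rng.choice([0, 1, 2], size=reps, p=p_n)
    x  = rng.choice([0, 2], size=reps)              # drawn independently of xn

    print("P(Xn=0), P(Xn=2)  ~", np.mean(xn == 0), np.mean(xn == 2))   # both about 0.5, like X
    print("P(|Xn - X| > 0.5) ~", np.mean(np.abs(xn - x) > 0.5))        # about 0.5, does not go to 0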
Let’s say the joint density is given by
Then
The last one is a bit vague. Generally, showing almost sure convergence is much harder because it requires some measure theory (here we would have to go back and “invent” a sample space).
say \(X_1, X_2, \ldots\) are iid U[0,1]. Let \(M_n = \max\{X_1, \ldots, X_n\}\). Let \(\delta_x\) be the point mass at x, that is, the random variable with \(P(\delta_x = x) = 1\). Then for any ε>0, \(P(|M_n - 1| > \varepsilon) = P(M_n < 1-\varepsilon) = (1-\varepsilon)^n \to 0\), so \(M_n \to \delta_1\) in probability (and hence also in distribution).
How about almost sure convergence? It does in fact hold here as well.
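A quick simulation sketch (assuming Python with numpy; ε=0.05, the seed, and the sample sizes are arbitrary choices) of how \(M_n\) concentrates near 1 as n grows:

    import numpy as np

    rng = np.random.default_rng(3)
    eps, reps = 0.05, 10_000

    for n in [10, 100, 1000]:
        mn = rng.uniform(size=(reps, n)).max(axis=1)   # one M_n per replication
        print(f"n={n}: P(|Mn - 1| > {eps}) ~", np.mean(np.abs(mn - 1) > eps))

The empirical probabilities should track \((1-\varepsilon)^n\), which is about 0.60 for n=10, 0.006 for n=100, and essentially 0 for n=1000.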
Theorem
convergence in quadratic mean implies convergence in probability.
convergence in probability implies convergence in distribution. The reverse is true if the limit is a constant.
almost sure convergence implies convergence in probability, but not vice versa
Theorem (Weak Law of Large Numbers)
Let \(X_1, X_2, \ldots\) be a sequence of independent and identically distributed (iid) r.v.’s having mean μ. Let \(Z_n = (X_1 + \ldots + X_n)/n\). Then \(Z_n \to \mu\) in probability.
A proof can be found in any probability textbook.
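A simulation sketch of the weak law (assuming Python with numpy; the exponential distribution with mean 1, ε=0.1, the seed, and the sample sizes are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(4)
    eps, reps = 0.1, 10_000

    for n in [10, 100, 1000]:
        zn = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)   # sample means, true mean mu = 1
        print(f"n={n}: P(|Zn - mu| > {eps}) ~", np.mean(np.abs(zn - 1.0) > eps))

The probability of the sample mean being more than ε away from μ shrinks toward 0 as n grows, which is exactly what convergence in probability says.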
Theorem (Central Limit Theorem)
Let \(X_1, X_2, \ldots\) be an iid sequence of r.v.’s with mean μ and standard deviation σ. Define the sample mean by
\(\bar{X}_n = (X_1 + \ldots + X_n)/n\)
Then
\(\sqrt{n}(\bar{X}_n - \mu)/\sigma \to Z\) in distribution, where Z~N(0,1)
let’s study the CLT on a very simple example: say \(X_1, X_2, \ldots\) are iid Ber(p). Now \(\mu = p\), \(\sigma = \sqrt{p(1-p)}\), and \(n\bar{X}_n = X_1 + \ldots + X_n \sim Bin(n,p)\), so the CLT says \(\sqrt{n}(\bar{X}_n - p)/\sqrt{p(1-p)} \to Z\) in distribution; in other words, for large n the binomial distribution is well approximated by a normal.
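A simulation sketch for the Bernoulli case (assuming Python with numpy and scipy; p=0.3, the seed, the sample sizes, and the cutoff z=1 are arbitrary choices) comparing the standardized sample mean with the standard normal:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(5)
    p, reps = 0.3, 20_000

    for n in [10, 100, 1000]:
        xbar = rng.binomial(1, p, size=(reps, n)).mean(axis=1)
        z = np.sqrt(n) * (xbar - p) / np.sqrt(p * (1 - p))   # standardized sample mean
        # empirical P(standardized mean <= 1) vs the exact standard normal value
        print(f"n={n}: empirical {np.mean(z <= 1):.4f}   normal {norm.cdf(1):.4f}")

As n grows the empirical probabilities settle near the normal value of about 0.841, even though each \(X_i\) is as far from normal as can be.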
The CLT is not so much a theorem as a family of theorems. Any of the conditions (the same means, the same standard deviations, independence) can be relaxed considerably, and still the result holds. Unfortunately there is no single set of necessary conditions, so there are many theorems for different situations!