Limit Theorems

Convergence Concepts

Say we have a sequence of numbers {a_n}. Then there is just one definition of a "limit", namely

a_n → a iff

for every ε > 0 there exists an n_ε such that |a_n - a| < ε for all n > n_ε

Example: say a_n = (1+1/n)^n. Show that a_n → e.

Fix n, and let t be such that 1 ≤ t ≤ 1+1/n. Then 1/(1+1/n) ≤ 1/t ≤ 1, and integrating over [1, 1+1/n] gives

$$\frac{1}{n+1} \le \log\left(1+\frac{1}{n}\right) \le \frac{1}{n}$$

Multiplying by n and exponentiating, e^{n/(n+1)} ≤ (1+1/n)^n ≤ e, and then using 1 - e^{-x} ≤ x

$$0 \le e - \left(1+\frac{1}{n}\right)^n \le e\left(1 - e^{-1/(n+1)}\right) \le \frac{e}{n+1}$$

so fix an ε > 0. Then if n > e/ε - 1 we have |(1+1/n)^n - e| < ε, and therefore

(1+1/n)^n → e
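
As a quick numerical sanity check (a minimal Python sketch, not part of the original argument), we can compare (1+1/n)^n to e and verify the bound e/(n+1) derived above:

    import math

    # check |(1 + 1/n)^n - e| against the bound e/(n + 1) derived above
    for n in [10, 100, 1000, 10000]:
        err = abs((1 + 1/n)**n - math.e)
        print(n, err, math.e / (n + 1), err <= math.e / (n + 1))

Each printed error is indeed below e/(n+1), so n > e/ε - 1 suffices for any given ε.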


Things already get a little more complicated if we go to sequences of functions. Here there are two ways in which they can converge:

Pointwise Convergence: f_n(x) → f(x) pointwise for all x ∈ S iff for every x ∈ S and every ε > 0 there exists an n_{ε,x} such that |f_n(x) - f(x)| < ε for all n > n_{ε,x}

Uniform Convergence: f_n(x) → f(x) uniformly for all x ∈ S iff for every ε > 0 there exists an n_ε (independent of x!) such that |f_n(x) - f(x)| < ε for all x ∈ S and all n > n_ε

and there is a simple hierarchy: uniform convergence implies pointwise convergence, but not vice versa.

Example: say f_n(x) = 1 + x/n, x ∈ ℝ, S = [A,B] where A < B, f(x) = 1; then f_n(x) → f(x) uniformly.

|f_n(x) - f(x)| = |1 + x/n - 1| = |x|/n ≤ max(|A|,|B|)/n < ε if n > max(|A|,|B|)/ε
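
To see this bound in action, here is a small sketch (the endpoints A = -2 and B = 3 are just assumed for illustration) that evaluates the supremum of |f_n(x) - f(x)| over a grid of S:

    import numpy as np

    A, B = -2.0, 3.0                             # hypothetical interval S = [A, B]
    x = np.linspace(A, B, 1001)
    for n in [10, 100, 1000]:
        sup = np.max(np.abs((1 + x/n) - 1.0))    # sup over the grid of |f_n(x) - f(x)|
        print(n, sup, max(abs(A), abs(B)) / n)   # matches the bound max(|A|,|B|)/n

The supremum shrinks like 1/n over the whole interval at once, which is exactly what uniform convergence requires.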

Example: say f_n(x) = x^n, S = [0,1], f(x) = I_{1}(x) (the indicator of the point 1); then f_n(x) → f(x) pointwise but not uniformly.

say 0 < x < 1: then |f_n(x) - f(x)| = x^n < ε for all n > n_{ε,x} = log(ε)/log(x) (and for x = 0 the difference is 0 for every n)
say x = 1: then |f_n(x) - f(x)| = 0 < ε for all n > n_{ε,x} = 1

but

$$\sup_{0 \le x \le 1} |f_n(x) - f(x)| = \sup_{0 \le x < 1} x^n = 1 \quad\text{for all } n$$

so no n_ε independent of x can work; indeed n_{ε,x} = log(ε)/log(x) → ∞ as x → 1⁻.
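
A short sketch makes the failure visible: we evaluate the supremum on a fine grid of [0,1) (the grid only gives a lower bound for the true supremum, which is exactly 1):

    import numpy as np

    x = np.linspace(0, 1, 100001)[:-1]   # fine grid on [0, 1)
    for n in [10, 100, 1000]:
        print(n, np.max(x**n))           # stays near 1 however large n gets

No matter how large n is, there are x close to 1 where x^n is still close to 1, so no single n_ε can serve every x.
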
Now when we go to probabilities it gets a bit more complicated. Say we have a sequence of rv's X_n with means μ_n and cdf's F_n, and a rv X with mean μ and cdf F. Then we have:

Definition

Convergence in Mean: X_n → X in mean iff μ_n → μ

Convergence in Quadratic Mean: X_n → μ in quadratic mean iff E[X_n] → μ and Var(X_n) → 0

Convergence in Distribution (Law): X_n → X in law iff F_n(x) → F(x) pointwise for all x where F is continuous

Convergence in Probability: X_n → X in probability iff for every ε > 0, lim_{n→∞} P(|X_n - X| ≥ ε) = 0

Almost Sure Convergence: X_n → X almost surely iff for every ε > 0, P(lim_{n→∞} |X_n - X| < ε) = 1

Example: Let X_n have density f_n(x) = n x^{n-1}, 0 < x < 1 (X_n ~ Beta(n,1)) and let X be such that P(X = 1) = 1. Then for any 0 < ε < 1

$$P(|X_n - X| \ge \varepsilon) = P(X_n \le 1-\varepsilon) = \int_0^{1-\varepsilon} n x^{n-1}\,dx = (1-\varepsilon)^n \to 0$$

so X_n → X in probability. Moreover E[X_n] = n/(n+1) → 1 and Var(X_n) = n/[(n+1)²(n+2)] → 0, so X_n → 1 in quadratic mean as well.
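
We can check (1-ε)^n by simulation; a minimal sketch using the fact that if U ~ Uniform(0,1) then U^{1/n} has cdf x^n on (0,1), i.e. it is Beta(n,1) (the values ε = 0.1 and n = 50 are just illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(1)
    eps, n = 0.1, 50
    xn = rng.uniform(size=100_000) ** (1 / n)   # X_n ~ Beta(n, 1) via the inverse cdf
    print(np.mean(np.abs(xn - 1) >= eps))       # simulated P(|X_n - X| >= eps)
    print((1 - eps) ** n)                       # exact value, about 0.005 here

Both numbers agree, and both go to 0 as n grows, exactly as convergence in probability requires.
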
Unfortunately there is no simple hierarchy between the different modes of convergence. Here are some relationships:

Theorem

a) convergence in quadratic mean implies convergence in probability.

b) convergence in probability implies convergence in distribution. The reverse is true if the limit is a constant.

c) almost sure convergence implies convergence in probability, but not vice versa.
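
For part c), a classic counterexample is a sequence of "sliding" indicators on Ω = [0,1] with the uniform distribution: group k consists of the 2^k intervals of length 2^{-k}, and X_n is the indicator of the n-th interval in this enumeration. Then P(|X_n| ≥ ε) = 2^{-k} → 0, so X_n → 0 in probability, yet every ω falls in exactly one interval of each group, so X_n(ω) = 1 infinitely often and X_n does not converge almost surely. A minimal sketch (this particular enumeration is just one standard choice):

    import numpy as np

    rng = np.random.default_rng(0)
    omega = rng.uniform()                  # one fixed sample point

    def X(n, omega):
        # group k = floor(log2(n+1)) holds 2^k intervals of length 2^-k
        k = int(np.log2(n + 1))
        j = n - (2**k - 1)                 # position of interval n within group k
        return 1 if j / 2**k <= omega < (j + 1) / 2**k else 0

    hits = [n for n in range(2**12 - 1) if X(n, omega) == 1]
    print(hits)   # one hit per group: X_n(omega) = 1 over and over again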

Laws of Large Numbers

Theorem (Weak Law of Large Numbers WLLN)

Let X_1, X_2, ... be a sequence of independent and identically distributed (iid) r.v.'s having mean μ. Then X̅ converges to μ in probability.

proof (assuming in addition that Var(X_i) = σ² < ∞)

$$E[\bar{X}] = \mu \qquad Var(\bar{X}) = \frac{\sigma^2}{n} \to 0$$

so X̅ → μ in quadratic mean and therefore in probability.
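
A quick simulation illustrates the conclusion (a minimal sketch; the Exp(1) distribution, which has μ = 1, is just an assumed example):

    import numpy as np

    rng = np.random.default_rng(2)
    eps = 0.1
    for n in [10, 100, 1000]:
        # 10,000 replications of the mean of n iid Exp(1) observations
        xbar = rng.exponential(1.0, size=(10_000, n)).mean(axis=1)
        print(n, np.mean(np.abs(xbar - 1.0) >= eps))   # -> 0, as the WLLN predicts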


It is best to think of this (and other) limit theorems not as one theorem but as a family of theorems, all with the same conclusion but with different conditions. For example there are weak laws even if the X_n's are not independent, don't have the same mean, and don't even have finite standard deviations.

Theorem (Strong Law of Large Numbers SLLN)

Let X_1, X_2, ... be a sequence of independent and identically distributed (iid) r.v.'s having mean μ. Then X̅ converges to μ almost surely.
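
The strong law is a statement about a single realization of the whole sequence: along one sample path the running mean settles down to μ and stays there. A small sketch, again with assumed Exp(1) observations:

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.exponential(1.0, size=100_000)             # one long sample path
    running = np.cumsum(x) / np.arange(1, x.size + 1)  # X-bar_n along this path
    print(running[[9, 99, 999, 9_999, 99_999]])        # settles near mu = 1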

Central Limit Theorems

Recall: a random variable X is said to be normally distributed with mean μ and variance σ² if it has density:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$$
If μ=0 and σ=1 we say X has a standard normal distribution.

We use the symbol Φ for the distribution function of a standard normal r.v., so

$$\Phi(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt$$
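
Φ has no closed form, but it is easy to evaluate numerically, for instance via the error function (a small sketch; the value Φ(0.5) ≈ 0.6915 reappears in the Gamma example at the end of this section):

    from math import erf, sqrt

    def Phi(x):
        # standard normal cdf: Phi(x) = (1 + erf(x / sqrt(2))) / 2
        return 0.5 * (1 + erf(x / sqrt(2)))

    print(Phi(0.0), Phi(0.5), Phi(1.96))   # 0.5, ~0.6915, ~0.975
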
Let X_1, X_2, ... be a sequence of r.v.'s with means E[X_i] = μ_i and sd(X_i) = σ_i. Let X̅_n be the sample mean of the first n observations. Then a central limit theorem would assert that

$$P\left(\frac{\sum_{i=1}^n X_i - \sum_{i=1}^n \mu_i}{\sqrt{\sum_{i=1}^n \sigma_i^2}} \le x\right) \to \Phi(x)$$

for all x, or that this standardized sum converges to a standard normal in distribution.

Note the plural "s" in the title. As with the laws of large numbers there are many central limit theorems, all with different conditions on

a) the dependence between the X_i's

b) the μ_i's

c) the σ_i's

as a rough guide we have to have some combination of

a) not too strong a dependence

b) μ_i → μ finite (the sketch below shows a case where even this fails)

c) σ_i going neither to 0 nor to ∞ too fast
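
As a sketch of what can go wrong when b) fails, take iid standard Cauchy observations: the mean does not exist, and in fact the sample mean X̅ of n standard Cauchy r.v.'s is again standard Cauchy for every n, so it never concentrates:

    import numpy as np

    rng = np.random.default_rng(4)
    for n in [10, 1000]:
        # sample means of n iid standard Cauchy observations, 10,000 replications
        xbar = rng.standard_cauchy(size=(10_000, n)).mean(axis=1)
        print(n, np.mean(np.abs(xbar) > 2))   # stays near 0.295, does not shrink

P(|X̅| > 2) is the same for every n, so neither a law of large numbers nor a central limit theorem applies here.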

Example: Say X_i ~ Exp(1) independent, then we know that

E[X_i] = 1 and Var(X_i) = 1

also

S_n = X_1 + ... + X_n ~ Γ(n,1)

So

$$P\left(\frac{S_n - n}{\sqrt{n}} \le x\right) = P\left(S_n \le n + x\sqrt{n}\right) = \int_0^{n + x\sqrt{n}} \frac{t^{n-1} e^{-t}}{\Gamma(n)}\,dt \to \Phi(x)$$

In the following graph we have these probabilities for n from 1 to 1000 for the case x = 0.5, together with the CLT approximation Φ(0.5) ≈ 0.69:
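
The probabilities in that graph can be reproduced directly from the Γ(n,1) cdf; a minimal sketch using scipy:

    from scipy.stats import gamma, norm

    # P((S_n - n)/sqrt(n) <= 0.5) for S_n ~ Gamma(n, 1), versus Phi(0.5)
    for n in [1, 10, 100, 1000]:
        print(n, gamma.cdf(n + 0.5 * n**0.5, a=n), "vs", norm.cdf(0.5))

For n = 1 the exact probability is 1 - e^{-1.5} ≈ 0.777, and as n grows it comes down toward the CLT value Φ(0.5) ≈ 0.6915.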