Markov’s Inequality
If X takes on only nonnegative values, then for any \(a>0\)
\[P(X \ge a) \le \frac{E[X]}{a}\]
(proof omitted)
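As a quick numerical sanity check (a sketch; the exponential r.v. with rate 1 and the cutoff a = 2 are just illustrative choices), we can compare a simulated tail probability with the Markov bound:
x <- rexp(1e5, rate = 1)   # a nonnegative r.v. with E[X] = 1
a <- 2
mean(x >= a)               # estimates P(X >= a), here exp(-2), about 0.135
1/a                        # Markov bound E[X]/a = 0.5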
Chebyshev’s Inequality:
If X is a r.v. with mean \(\mu\) and variance \(\sigma^2\), then for any k>0:
\[P(|X-\mu|\ge k \sigma)\le 1/k^2\]
proof (apply Markov's inequality to the nonnegative r.v. \((X-\mu)^2\)):
\[ \begin{aligned} P(|X-\mu|\ge k \sigma) &= P\left((X-\mu)^2\ge k^2 \sigma^2\right) \\ &\le \frac{E(X-\mu)^2}{k^2\sigma^2} = \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2} \end{aligned} \]
Consider the uniform random variable with f(x) = 1 if \(0<x<1\), 0 otherwise. We already know that \(\mu=0.5\) and \(\sigma=1/\sqrt{12} \approx 0.2887\). Now Chebyshev says
\(P(|X-0.5|>0.2887k) \le 1/k^2\)
For example
\(P(|X-0.5|>0.2887) \le 1\) (rather boring!)
or
\(P(|X-0.5|>3\times0.2887) \le 1/9\)
Actually \(P(|X-0.5|>0.866) = 0\), since X can never be more than 0.5 away from its mean, so this is not a very good upper bound.
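For comparison (a small sketch, using the exact formula \(P(|X-0.5|>c) = \max(0, 1-2c)\) for this uniform), here are the exact tail probabilities next to the Chebyshev bound for a few illustrative values of k:
k <- c(1, 1.5, 2, 3)
sigma <- 1/sqrt(12)
exact <- pmax(0, 1 - 2*k*sigma)   # exact P(|X - 0.5| > k*sigma) for U(0,1)
bound <- 1/k^2                    # Chebyshev bound
round(cbind(k, exact, bound), 3)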
(Weak) Law of Large Numbers
Let \(X_1, X_2, ...\) be a sequence of independent and identically distributed (iid) r.v.’s having mean \(\mu\). Then for all \(\epsilon>0\)
\[P\left(\left|\frac1n \sum X_i -\mu\right|>\epsilon\right)\rightarrow 0 \quad \text{as } n \rightarrow \infty\]
proof (assuming in addition that \(V(X_i)=\sigma^2 < \infty\)):
\[ \begin{aligned} &E\left[\frac1n \sum X_i\right] = \frac1n \sum E[X_i] = \mu\\ &V\left[\frac1n \sum X_i\right] = \frac1{n^2} \sum V[X_i] = \frac{\sigma^2}n\\ &P\left(\left|\frac1n \sum X_i -\mu\right|>\epsilon\right) = P\left(\left|\frac1n \sum X_i -\mu\right|>\frac{\epsilon}{\sigma/\sqrt{n}}\cdot\frac{\sigma}{\sqrt{n}}\right) \le\\ &\left(\frac{\sigma/\sqrt{n}}{\epsilon}\right)^2 = \frac{\sigma^2}{n\epsilon^2}\rightarrow 0 \end{aligned} \]
where the inequality is Chebyshev's with \(k=\frac{\epsilon}{\sigma/\sqrt{n}}\).
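To see the theorem in action (a minimal sketch; the fair die and the sample size \(10^4\) are arbitrary choices), we can watch the running mean of die rolls settle down near \(\mu = 3.5\):
x <- sample(1:6, 1e4, replace = TRUE)     # iid rolls of a fair die, mu = 3.5
running_mean <- cumsum(x)/(1:length(x))   # mean of the first n rolls
running_mean[c(10, 100, 1000, 10000)]     # drifts toward 3.5 as n grows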
This theorem forms the basis of (almost) all simulation studies: say we want to find a parameter \(\theta\) of a population. We can generate data from a random variable X with pdf \(f(x|\theta)\) and choose a function h such that \(E[h(X)] = \theta\). Then by the law of large numbers
\[\frac1n \sum h(X_i) \rightarrow \theta\]
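Here is a minimal instance of this idea (a sketch, with X uniform on (0,1) and \(h(x)=x^2\), so that \(\theta = E[X^2] = 1/3\)):
x <- runif(1e5)   # simulated values of X
mean(x^2)         # average of h(X_i), should be close to 1/3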
In a game a player rolls 5 fair dice. He then moves his game piece along k fields on a board, where k is the smallest number showing on the dice plus the largest number showing. For example, if his dice show 2, 2, 3, 5, 5 he moves 2+5 = 7 fields. What is the mean number of fields \(\theta\) a player will move?
To do this analytically would be quite an exercise. To do it via simulation is easy:
Let X be a random vector of length 5 with independent components, where \(X[j] \in \{1,..,6\}\) and \(P(X[j]=k)=1/6\) for each k. Let \(h(x) = \min(x)+\max(x)\), then \(E[h(X)] = \theta\).
Let \(X_1, X_2, ...\) be iid copies of X; then by the law of large numbers \(\frac1n \sum h(X_i) \rightarrow \theta\), which we can approximate by simulation:
B <- 1e5                # number of simulation runs
z <- rep(0, B)
for (i in 1:B) {
  x <- sample(1:6, size = 5, replace = TRUE)   # roll 5 fair dice
  z[i] <- min(x) + max(x)                      # h(x) = min + max
}
mean(z)                 # estimate of theta
## [1] 6.98824
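From the variance calculation in the proof above, \(V[\frac1n \sum h(X_i)] = \sigma^2/n\), so we can attach a rough simulation error to this estimate (a sketch, run after the loop above):
sd(z)/sqrt(B)   # estimated standard error of mean(z)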
Central Limit Theorem
This is one of the most famous theorems in all of mathematics and statistics. Without it, statistics as a science would not have existed until very recently:
We first need the definition of a normal (or Gaussian) r.v.:
A random variable X is said to be normally distributed with mean \(\mu\) and standard deviation \(\sigma\) if it has density:
\[f(x) = \frac1{\sqrt{2\pi \sigma^2}} \exp\left\{-\frac1{2\sigma^2}\left(x-\mu\right)^2 \right\}\]
If \(\mu =0\) and \(\sigma =1\) we say X has a standard normal distribution.
We use the symbol \(\Phi\) for the distribution function of a standard normal r.v.
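In R the standard normal density and \(\Phi\) are available as dnorm and pnorm; a quick check of some familiar values (a sketch):
dnorm(0)      # f(0) = 1/sqrt(2*pi), about 0.3989
pnorm(0)      # Phi(0) = 0.5
pnorm(1.96)   # Phi(1.96), about 0.975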
Let \(X_1, X_2, ...\) be an iid sequence of r.v.'s with mean \(\mu\) and standard deviation \(\sigma\). Let \(\bar{X}=\frac1n \sum X_i\). Then for all z
\[P\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le z\right) \rightarrow \Phi(z)\]
Let's do a simulation to illustrate the CLT: we will use the most basic r.v. of all, called a Bernoulli r.v., which has \(P(X=0)=1-p\) and \(P(X=1)=p\). (Think of the indicator function for a coin toss.) So we sample n Bernoulli r.v.'s with success parameter p and find their sample mean. Note that
\[ \begin{aligned} &E(X) = p\\ &var(X) = p(1-p) \end{aligned} \]
cltexample1 <- function (p, n, B=1000) {
  # simulate B sample means of n Bernoulli(p) r.v.'s, standardize them
  # and compare their histogram to the standard normal density
  # (needs the ggplot2 package)
  xbar <- rep(0, B)
  for (i in 1:B) {
    xbar[i] <- mean(sample(c(0, 1), n, TRUE, prob=c(1-p, p)))
  }
  df <- data.frame(x=sqrt(n)*(xbar-p)/sqrt(p*(1-p)))
  bw <- diff(range(df$x))/50
  ggplot(df, aes(x)) +
    geom_histogram(aes(y = ..density..),
                   color = "black",
                   fill = "white",
                   binwidth = bw) +
    labs(x = "x", y = "") +
    stat_function(fun = dnorm, colour = "blue",
                  args = list(mean = 0, sd = 1))
}
cltexample1(0.5, 500)
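How good the normal approximation is depends on both n and p; as an illustrative follow-up one might compare a skewed case with a small and a large sample size:
cltexample1(0.1, 20)    # p far from 0.5 and small n: histogram still visibly skewed
cltexample1(0.1, 500)   # larger n: much closer to the standard normal curve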