Let \(X_1, X_2, \ldots\) be iid random variables with mean 0 and standard deviation 1, and let \(S_n = X_1 + \cdots + X_n\). As we saw before, by the strong law of large numbers \(S_n/n \rightarrow 0\) almost surely, and in the last section we saw that by the central limit theorem \(S_n/\sqrt{n}\) converges in distribution to a standard normal random variable. So in some sense \(S_n/n\) “squeezes” down to one point whereas \(S_n/\sqrt{n}\) “spreads out”, roughly between \(-3\) and \(3\). A reasonable question, then, is whether there is an in-between case, namely a sequence \(\{a_n\}\) such that
\[\sqrt{n} < a_n < n\]
and \(S_n /a_n\) converges to something between a constant and a distribution. The answer is given by
Law of the Iterated Logarithm, Kolmogorov 1929
\[\limsup_{n\rightarrow \infty} \frac{S_n}{\sqrt{n\log\log n}}=\sqrt 2 \text{ almost surely}\] \[\liminf_{n\rightarrow \infty} \frac{S_n}{\sqrt{n\log\log n}}=-\sqrt 2 \text{ almost surely}\]
so this sequence oscillates between \(\pm \sqrt{2}\).
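Note that the normalizing sequence in the theorem really does sit strictly between \(\sqrt{n}\) and \(n\): with natural logarithms,

\[\sqrt{n} < \sqrt{2n\log\log n} < n \quad \text{for all } n \ge 6,\]

since \(2\log\log n > 1\) once \(n > e^{e^{1/2}} \approx 5.2\), while \(2\log\log n < n\) for every \(n \ge 3\).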
proof (outline)
If the \(X_i\)’s take the values \(\pm 1\) with probability 1/2 each, \(S_n\) is a simple symmetric random walk, and Stirling’s formula shows that for even \(n\)
\[P(S_n =0) \approx \sqrt{\frac{2}{\pi n}}\]
A similar argument (again starting from Stirling’s formula) can be used to show that if \(n+k\) is even
\[P(S_n=k)\approx \sqrt{\frac{2}{\pi n}}\exp \left\{ -\frac{k^2}{2n}\right\}\]
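As a quick sanity check on this approximation (a sketch; the exact probability comes from `dbinom`, since \(S_n = k\) exactly when \((n+k)/2\) of the steps equal \(+1\)):

```r
# Compare the exact probability P(S_n = k) with the Stirling approximation.
n <- 1000
k <- 20  # n + k must be even
exact  <- dbinom((n + k) / 2, size = n, prob = 0.5)  # (n+k)/2 steps are +1
approx <- sqrt(2 / (pi * n)) * exp(-k^2 / (2 * n))
round(c(exact = exact, approx = approx), 5)
```

For these values the two agree to within a fraction of a percent.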
next with some arithmetic one can show that there exists a constant \(c>0\) such that
\[P(S_n\ge k)\approx c\frac{\sqrt{n}}{k}\exp \left\{ -\frac{k^2}{2n}\right\}\]
and finally one applies the Borel–Cantelli lemma to show that for any \(\epsilon >0\), almost surely
\[S_n\le (1+\epsilon)\sqrt{2n\log\log n}\text{ for all but finitely many } n\]
The matching lower bound (that \(S_n\) exceeds \((1-\epsilon)\sqrt{2n\log\log n}\) infinitely often) uses the second Borel–Cantelli lemma along a sparse subsequence, exploiting the independence of the increments.
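To see how the tail estimate and Borel–Cantelli fit together, substitute \(k = (1+\epsilon)\sqrt{2n\log\log n}\) into the estimate above (a sketch of the standard computation):

\[P\left(S_n \ge (1+\epsilon)\sqrt{2n\log\log n}\right) \approx \frac{c}{(1+\epsilon)\sqrt{2\log\log n}}\,(\log n)^{-(1+\epsilon)^2}\]

because \(k^2/(2n) = (1+\epsilon)^2\log\log n\) and \(\sqrt{n}/k = 1/\big((1+\epsilon)\sqrt{2\log\log n}\big)\). Along a geometric subsequence \(n_j = \lceil\theta^j\rceil\) the right-hand side behaves like \(j^{-(1+\epsilon)^2}\), which is summable, so by Borel–Cantelli only finitely many of these events occur almost surely; a maximal inequality then extends the bound from the subsequence to all \(n\).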
Let \(Y_1, Y_2, \ldots\) be iid Bernoulli(1/2), so that \(E[Y_i]=1/2\) and \(\text{var}(Y_i)=1/4\). Let
\[X_i = (Y_i -1/2)/(1/2) = 2Y_i-1\]
then \(E[X_i]=0\) and \(\text{var}(X_i)=1\). Let \(S_n = X_1 + \cdots + X_n\).
The following graphs show 25 simulated sequences with \(n=100{,}000\) under four different “normalizations”:
library(ggplot2)

set.seed(1111)
K <- 25   # number of simulated paths
n <- 1e5  # steps per path
df <- data.frame(n = rep(1:n, K),
                 y = rep(0, K * n),
                 z = rep("1", K * n))
for (i in 1:K) {
  # each path is the cumulative sum of iid +/-1 steps
  df$y[((i - 1) * n + 1):(i * n)] <-
    cumsum(2 * sample(0:1, size = n, replace = TRUE) - 1)
  df$z[((i - 1) * n + 1):(i * n)] <- rep(paste(i), n)
}
df$z <- factor(df$z)
ggplot(data=df, aes(n, y, color=z)) +
geom_line(size=0.5) +
theme(legend.position = "none") +
stat_function(fun = function(x) 2*sqrt(x),
size=1.2, color="blue") +
stat_function(fun = function(x) -2*sqrt(x),
size=1.2, color="blue")
\(S_n\) itself does not converge: its limsup is \(+\infty\) and its liminf is \(-\infty\), so the paths keep wandering out toward \(\pm\infty\) (the blue curves are \(\pm 2\sqrt{n}\), roughly two standard deviations).
ggplot(data=df, aes(n, y/n, color=z)) +
geom_line() + theme(legend.position = "none")
\(S_n/n\rightarrow 0\), as predicted by the law of large numbers.
ggplot(data=df, aes(n, y/sqrt(n), color=z)) +
geom_line(size=0.5) +
theme(legend.position = "none")
Here if we were to collect the values for \(n=100000\) (the last values of each sequence), these would follow a standard normal distribution, as predicted by the central limit theorem:
x <- df$y[df$n == n] / sqrt(n)
boxplot(x)
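With only 25 paths the boxplot is coarse. A larger fresh simulation (hypothetical parameters `m` and `steps`, separate from the paths above) makes the normal approximation visible in a quantile plot:

```r
# Simulate m independent copies of S_n / sqrt(n) and compare to N(0, 1).
set.seed(2222)
m <- 2000      # number of independent runs
steps <- 1e4   # length of each run
endpoints <- replicate(m,
  sum(2 * sample(0:1, size = steps, replace = TRUE) - 1)) / sqrt(steps)
c(mean = mean(endpoints), sd = sd(endpoints))  # should be near 0 and 1
qqnorm(endpoints)
qqline(endpoints)
```

The points should hug the reference line, as the central limit theorem predicts.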
ggplot(data=df, aes(n, y/sqrt(log(log(n))*n), color=z)) +
geom_line(size=0.5) +
theme(legend.position = "none") +
stat_function(fun = function(x) sqrt(2),
size=1.2, color="blue") +
stat_function(fun = function(x) -sqrt(2),
size=1.2, color="blue")
Here \(S_n /\sqrt{n\log\log n}\) oscillates within \(\pm \sqrt 2\): its limsup is \(\sqrt{2}\) and its liminf is \(-\sqrt{2}\), as predicted by the law of the iterated logarithm.