Let \(X_1, X_2, \ldots\) be iid random variables with mean 0 and standard deviation 1, and let \(S_n = X_1 + \cdots + X_n\). As we saw before, by the strong law of large numbers \(S_n/n \rightarrow 0\) almost surely, and in the last section we saw that by the central limit theorem \(S_n/\sqrt{n}\) converges in distribution to a standard normal random variable. So in some sense \(S_n/n\) “squeezes” down to one point whereas \(S_n/\sqrt{n}\) “spreads out”, roughly between \(-3\) and \(3\). A reasonable question, then, is whether there is an in-between case, namely a sequence \(\{a_n\}\) such that
\[\sqrt{n} < a_n < n\]
and \(S_n /a_n\) converges to something between a constant and a distribution. The answer is given by
Law of the Iterated Logarithm, Kolmogorov 1929
\[\limsup_{n\rightarrow \infty} \frac{S_n}{\sqrt{n\log\log n}}=\sqrt 2 \text{ almost surely}\] \[\liminf_{n\rightarrow \infty} \frac{S_n}{\sqrt{n\log\log n}}=-\sqrt 2 \text{ almost surely}\]
so this sequence oscillates between \(\pm \sqrt{2}\).
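Note that the normalizing sequence in the theorem really does sit strictly between \(\sqrt{n}\) and \(n\): with natural logarithms,

\[\sqrt{n} < \sqrt{2n\log\log n} < n \quad \text{for all } n \ge 6,\]

since \(2\log\log n > 1\) once \(n > e^{e^{1/2}} \approx 5.2\), while \(2\log\log n < n\) for every \(n \ge 3\).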
proof (outline)
If the \(X_i\)’s take the values \(\pm 1\) with probability 1/2 each, \(S_n\) is a simple symmetric random walk, and Stirling’s formula shows that for even \(n\)
\[P(S_n =0) \approx \sqrt{\frac{2}{\pi n}}\]
A similar argument (again starting from Stirling’s formula) can be used to show that if \(n+k\) is even
\[P(S_n=k)\approx \sqrt{\frac{2}{\pi n}}\exp \left\{ -\frac{k^2}{2n}\right\}\]
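As a quick sanity check on this approximation (a sketch; the exact probability comes from `dbinom`, since \(S_n = k\) exactly when \((n+k)/2\) of the steps equal \(+1\)):

```r
# Compare the exact probability P(S_n = k) with the Stirling approximation.
n <- 1000
k <- 20  # n + k must be even
exact  <- dbinom((n + k) / 2, size = n, prob = 0.5)  # (n+k)/2 steps are +1
approx <- sqrt(2 / (pi * n)) * exp(-k^2 / (2 * n))
round(c(exact = exact, approx = approx), 5)
```

For these values the two agree to within a fraction of a percent.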
next with some arithmetic one can show that there exists a constant \(c>0\) such that
\[P(S_n\ge k)\approx c\frac{\sqrt{n}}{k}\exp \left\{ -\frac{k^2}{2n}\right\}\]
and finally one applies the Borel–Cantelli lemma to show that for any \(\epsilon >0\), almost surely
\[S_n\le (1+\epsilon)\sqrt{2n\log\log n}\text{ for all but finitely many } n\]
The matching lower bound (that \(S_n\) exceeds \((1-\epsilon)\sqrt{2n\log\log n}\) infinitely often) uses the second Borel–Cantelli lemma along a sparse subsequence, exploiting the independence of the increments.
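To see how the tail estimate and Borel–Cantelli fit together, substitute \(k = (1+\epsilon)\sqrt{2n\log\log n}\) into the estimate above (a sketch of the standard computation):

\[P\left(S_n \ge (1+\epsilon)\sqrt{2n\log\log n}\right) \approx \frac{c}{(1+\epsilon)\sqrt{2\log\log n}}\,(\log n)^{-(1+\epsilon)^2}\]

because \(k^2/(2n) = (1+\epsilon)^2\log\log n\) and \(\sqrt{n}/k = 1/\big((1+\epsilon)\sqrt{2\log\log n}\big)\). Along a geometric subsequence \(n_j = \lceil\theta^j\rceil\) the right-hand side behaves like \(j^{-(1+\epsilon)^2}\), which is summable, so by Borel–Cantelli only finitely many of these events occur almost surely; a maximal inequality then extends the bound from the subsequence to all \(n\).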
Let \(Y_1, Y_2, \ldots\) be iid Bernoulli(1/2), so that \(E[Y_i]=1/2\) and \(\text{var}(Y_i)=1/4\). Let
\[X_i = (Y_i -1/2)/(1/2) = 2Y_i-1\]
then \(E[X_i]=0\) and \(\text{var}(X_i)=1\). Let \(S_n = X_1 + \cdots + X_n\).
The following graphs show 25 simulated sequences with \(n=100{,}000\) under four different “normalizations”:
library(ggplot2)

set.seed(1111)
K <- 25   # number of simulated paths
n <- 1e5  # steps per path
df <- data.frame(n = rep(1:n, K),
                 y = rep(0, K * n),
                 z = rep("1", K * n))
for (i in 1:K) {
  # each path is the cumulative sum of iid +/-1 steps
  df$y[((i - 1) * n + 1):(i * n)] <-
    cumsum(2 * sample(0:1, size = n, replace = TRUE) - 1)
  df$z[((i - 1) * n + 1):(i * n)] <- rep(paste(i), n)
}
df$z <- factor(df$z)
ggplot(data=df, aes(n, y, color=z)) +
geom_line(size=0.5) +
theme(legend.position = "none") +
stat_function(fun = function(x) 2*sqrt(x),
size=1.2, color="blue") +
stat_function(fun = function(x) -2*sqrt(x),
size=1.2, color="blue")
\(S_n\) itself does not converge: its limsup is \(+\infty\) and its liminf is \(-\infty\), so the paths keep wandering out toward \(\pm\infty\) (the blue curves are \(\pm 2\sqrt{n}\), roughly two standard deviations).
ggplot(data=df, aes(n, y/n, color=z)) +
geom_line() + theme(legend.position = "none")
\(S_n/n\rightarrow 0\), as predicted by the law of large numbers.
ggplot(data=df, aes(n, y/sqrt(n), color=z)) +
geom_line(size=0.5) +
theme(legend.position = "none")
Here if we were to collect the values for \(n=100000\) (the last values of each sequence), these would follow a standard normal distribution, as predicted by the central limit theorem:
x <- df$y[df$n == n] / sqrt(n)
boxplot(x)
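With only 25 paths the boxplot is coarse. A larger fresh simulation (hypothetical parameters `m` and `steps`, separate from the paths above) makes the normal approximation visible in a quantile plot:

```r
# Simulate m independent copies of S_n / sqrt(n) and compare to N(0, 1).
set.seed(2222)
m <- 2000      # number of independent runs
steps <- 1e4   # length of each run
endpoints <- replicate(m,
  sum(2 * sample(0:1, size = steps, replace = TRUE) - 1)) / sqrt(steps)
c(mean = mean(endpoints), sd = sd(endpoints))  # should be near 0 and 1
qqnorm(endpoints)
qqline(endpoints)
```

The points should hug the reference line, as the central limit theorem predicts.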
ggplot(data=df, aes(n, y/sqrt(log(log(n))*n), color=z)) +
geom_line(size=0.5) +
theme(legend.position = "none") +
stat_function(fun = function(x) sqrt(2),
size=1.2, color="blue") +
stat_function(fun = function(x) -sqrt(2),
size=1.2, color="blue")
Here \(S_n /\sqrt{n\log\log n}\) oscillates within \(\pm \sqrt 2\): its limsup is \(\sqrt{2}\) and its liminf is \(-\sqrt{2}\), as predicted by the law of the iterated logarithm.