The sequence of r.v. \(X_1, X_2, \dots\) is said to be a Markov chain if for any event A we have
\[P(X_n \in A \mid X_1 = x_1, \dots, X_{n-1}=x_{n-1}) = P(X_n \in A \mid X_{n-1}=x_{n-1})\]
that is, \(X_n\) depends only on \(X_{n-1}\) but not on any of the r.v. before it.
Example (Random Walk I)
Say we flip a fair coin repeatedly. Let the random variable \(Y_i\) be 1 if the ith flip is heads, -1 otherwise. Now let
\[X_n = \sum_{i=1}^n Y_i\]
Clearly we have
\[P(X_n \in A \mid X_1 = x_1, \dots, X_{n-1}=x_{n-1}) = P(X_n \in A \mid X_{n-1}=x_{n-1})\]
because \(X_n = X_{n-1}+Y_n\) and \(Y_n\) is independent of everything that came before.
If we think of the index n as a time variable, then all that matters for the state of the system at time n is where it was at time n-1, not how it got to that state.
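As a quick illustration, here is a minimal R sketch (the seed and the number of steps are arbitrary choices) that simulates this random walk:

```r
# Simulate the coin-flip random walk X_n = Y_1 + ... + Y_n
set.seed(111)                                    # arbitrary seed
n <- 1000
y <- sample(c(-1, 1), size = n, replace = TRUE)  # the Y_i: +1 for heads, -1 for tails
x <- cumsum(y)                                   # the partial sums X_n
plot(x, type = "l", xlab = "n", ylab = expression(X[n]))
```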
For a Markov chain (that is, a stochastic process with a discrete state space and the above property) all the relevant probability information is contained in the probabilities of getting from state i to state j in k steps. For k=1 these are collected in the transition matrix \(P=(p_{ij})\), and in fact, as we shall see, P is all we need.
Here we have \(p_{ij}=1/2\) if \(|i-j|=1\), and \(p_{ij}=0\) otherwise.
Example (Random Walk II)
As above the state space is the integers, but now we go from i to i+1 with probability p, to i-1 with probability q, and stay at i with probability \(1-p-q\).
Example (Ehrenfest chain)
Say we have two boxes, box 1 with k balls and box 2 with r-k balls. We pick one of the r balls at random and move it to the other box. Let \(X_n\) be the number of balls in box 1 at time n. We have \(X_n \in \{0,1,\dots,r\}\).
(Figure: the two boxes, illustrated with r=7 balls.)
Now
\(p_{k,k-1}=k/r\) (pick a ball from box 1, which has k of the r balls)
\(p_{k,k+1}=(r-k)/r\) (pick a ball from box 2, which has r-k of the r balls)
and
\(p_{i,j}=0\) otherwise.
Note:
\(p_{0,1}=(r-0)/r=1\)
\(p_{r,r-1}=r/r=1\)
Ehrenfest used this model to study the exchange of air molecules in two chambers connected by a small hole.
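As a small sketch, here is one way to build the Ehrenfest transition matrix in R (the helper name ehrenfest is just a choice made here; we will reuse it below):

```r
# Build the (r+1) x (r+1) Ehrenfest transition matrix; state k is row/column k+1
ehrenfest <- function(r) {
  P <- matrix(0, r + 1, r + 1)
  for (k in 0:r) {
    if (k > 0) P[k + 1, k] <- k / r           # p(k, k-1): ball leaves box 1
    if (k < r) P[k + 1, k + 2] <- (r - k) / r # p(k, k+1): ball enters box 1
  }
  P
}
ehrenfest(3)
```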
Say we have a Markov chain \(X_n\), \(n=1,2,\dots\) with transition matrix P. Define the n-step transition matrix \(P^{(n)}=(p^{(n)}_{ij})\) by \(p^{(n)}_{ij}=P(X_{n+1}=j\mid X_1=i)\). Of course \(P^{(1)}=P\). Moreover \(P^{(n)}=P^n\), the nth matrix power of P; this follows from the Chapman-Kolmogorov equation below.
Let’s find the 2-step transition matrix for the Ehrenfest chain with r=3. The transition matrix is given by
\[P=\begin{bmatrix}0&1&0&0\\1/3&0&2/3&0\\0&2/3&0&1/3\\0&0&1&0\end{bmatrix}\]
and so the 2-step transition matrix is
\[P^{(2)}=P^2=\begin{bmatrix}1/3&0&2/3&0\\0&7/9&0&2/9\\2/9&0&7/9&0\\0&2/3&0&1/3\end{bmatrix}\]
For example, \(P(X_3=2\mid X_1=2)=7/9\) because if \(X_1=2\) the chain could have gone to 1 (with probability 2/3) and then back to 2 (with probability 2/3), or it could have gone to 3 (with probability 1/3) and then back to 2 (with probability 1), so we have
\[P(X_3=2\mid X_1=2)=\frac23\cdot\frac23+\frac13\cdot1=\frac79\]
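We can check the calculation in R by simply squaring the matrix (using the ehrenfest helper from the sketch above; remember that state k sits in row/column k+1):

```r
P <- ehrenfest(3)
P2 <- P %*% P   # the 2-step transition matrix
P2[3, 3]        # P(X_3 = 2 | X_1 = 2) = 7/9
```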
Proposition
Let \(P=(p_{ij})\) be a transition matrix. Then \(\sum_jp_{ij}=1\) for all i.
proof
\[\sum_jp_{ij}=\sum_jP(X_2=j\mid X_1=i)=P(X_2\in S\mid X_1=i)=1\]
where S is the state space.
In order to find \(P^{(n)}\) we could just compute \(P\cdot P\cdots P\), n times. With a little linear algebra this becomes easier:
For many matrices P there exist an invertible matrix U and a diagonal matrix D such that \(P=UDU^{-1}\). Here is how to find U and D:
First we need to find the eigenvalues of the matrix P, that is, the solutions of the equation \(Px=\lambda x\). This is equivalent to \((P-\lambda I)x=0\), which has a nonzero solution iff \(\det(P-\lambda I)=0\). For the Ehrenfest chain with r=3 this yields the eigenvalues
\[\lambda_1=1,\;\lambda_2=\tfrac13,\;\lambda_3=-\tfrac13,\;\lambda_4=-1\]
The matrix D now has the eigenvalues on the diagonal. The columns of the matrix U are the corresponding eigenvectors (with Euclidean length 1). For example, for \(\lambda_4=-1\) we have to solve
\[(P+I)x=\begin{bmatrix}1&1&0&0\\1/3&1&2/3&0\\0&2/3&1&1/3\\0&0&1&1\end{bmatrix}x=0\]
Of course \(\det(P-\lambda I)=0\), so this system does not have a unique solution. Setting \(x_1=1\) we can then easily find the solution \(x=(1,-1,1,-1)\).
This vector has Euclidean length \(\sqrt{1^2+(-1)^2+1^2+(-1)^2}=2\), so the normalized eigenvector is \(x=(1/2,-1/2,1/2,-1/2)\).
Similarly we can find eigenvectors for the other eigenvalues. Alternatively (and a lot easier!) we can use the R function eigen to do the calculations for us.
Why does this help in computing \(P^{(n)}\)? It turns out that we have
\[P^{(2)}=PP=UDU^{-1}UDU^{-1}=UDDU^{-1}=UD^2U^{-1}\]
and in general
\[P^{(n)}=UD^nU^{-1}\]
where \(D^n\) is trivial to compute because D is diagonal.
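A sketch of this in R, using eigen on the Ehrenfest matrix P from above:

```r
# Diagonalize P and compute P^(n) = U D^n U^(-1)
E <- eigen(P)                        # eigenvalues 1, 1/3, -1/3, -1 (in some order)
U <- E$vectors                       # columns are the (normalized) eigenvectors
n <- 2
U %*% diag(E$values^n) %*% solve(U)  # agrees with P %*% P up to rounding
```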
Notice that this result allows us to define functions whose arguments are matrices: Say A is some \(d\times d\) matrix with \(A=UDU^{-1}\). Let p be a polynomial of the form
\[p(x)=\sum_{i=1}^n a_ix^i\]
Then we define \(p(A)\) to be
\[ \begin{aligned} &p(A) = \\ &\sum_{i=1}^n a_iA^i = \\ &\sum_{i=1}^n a_i(UDU^{-1})^i = \\ &U\left(\sum_{i=1}^n a_iD^i\right)U^{-1} \\ \end{aligned} \]
and finally, for any function f that has a Taylor series expansion \(f(x)=\sum_{i=0}^\infty a_ix^i\) we can define
\[f(A)=U\left(\sum_{i=0}^\infty a_iD^i\right)U^{-1}\]
Say
\[ A = \begin{bmatrix} 1 & 2\\ 2 & 1 \\ \end{bmatrix} \]
what would be \(f(A)=\exp(A)\)?
\[ \begin{aligned} &\det(A-\lambda I) = (1-\lambda)^2-4=0\\ &\lambda =1\pm2 \\ \end{aligned} \] so \(\lambda_1=-1\), \(\lambda_2=3\) and
\[ D= \begin{bmatrix} -1 & 0\\ 0 & 3\\ \end{bmatrix} \]
\[ (A-\lambda_1I)x = \begin{bmatrix} 2 & 2\\ 2 & 2 \\ \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \\ \end{bmatrix}= \begin{bmatrix} 0 \\ 0 \\ \end{bmatrix} \]
so \(2x_1+2x_2=0\), or \(x_2=-x_1\). Say \(x_1=1\); then the vector \((1,-1)\) has length \(\sqrt{1^2+(-1)^2}=\sqrt 2\).
\[ (A-\lambda_2I)x = \begin{bmatrix} -2 & 2\\ 2 & -2 \\ \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \\ \end{bmatrix}= \begin{bmatrix} 0 \\ 0 \\ \end{bmatrix} \]
so \(-x_1+x_2=0\), or \(x_1=x_2\). Say \(x_1=1\); then the vector \((1,1)\) has length \(\sqrt{1^2+1^2}=\sqrt 2\), and we have (putting the eigenvector for \(\lambda_1=-1\) in the first column and the one for \(\lambda_2=3\) in the second, matching the order in D)
\[ U = \frac1{\sqrt 2} \begin{bmatrix} 1 & 1\\ -1 & 1 \\ \end{bmatrix} \] Also
\[ U^{-1} = \frac1{\sqrt 2} \begin{bmatrix} 1 & -1\\ 1 & 1 \\ \end{bmatrix} \]
Now we have \(f(x)=\exp(x)=\sum_{n=0}^\infty x^n/n!\), and so
\[\exp(A) = U\left(\sum_{n=0}^\infty D^n/n! \right)U^{-1}\]
\[\frac{D^n}{n!}=\frac1{n!}\begin{bmatrix} -1 & 0\\ 0 & 3\\ \end{bmatrix}^n=\begin{bmatrix} (-1)^{n}/n! & 0\\ 0 & 3^{n}/n! \\ \end{bmatrix}\]
\[\sum_{n=0}^\infty \frac{D^n}{n!}= \begin{bmatrix} \sum_{n=0}^\infty (-1)^n/n! & 0\\ 0 & \sum_{n=0}^\infty 3^n/n! \\ \end{bmatrix}= \begin{bmatrix} e^{-1} & 0\\ 0 & e^3 \\ \end{bmatrix}\]
and finally
\[ \exp(A)= U \begin{bmatrix} e^{-1} & 0\\ 0 & e^3 \\ \end{bmatrix} U^{-1}= \frac1{\sqrt 2} \begin{bmatrix} 1 & 1\\ -1 & 1 \\ \end{bmatrix} \begin{bmatrix} e^{-1} & 0\\ 0 & e^3 \\ \end{bmatrix} \frac1{\sqrt 2} \begin{bmatrix} 1 & -1\\ 1 & 1 \\ \end{bmatrix} \]
\[ =\frac1{2} \begin{bmatrix} 1 & 1\\ -1 & 1 \\ \end{bmatrix} \begin{bmatrix} e^{-1} & -e^{-1}\\ e^3 & e^3 \\ \end{bmatrix} = \frac1{2} \begin{bmatrix} e^{-1}+e^3 & e^3-e^{-1}\\ e^3-e^{-1} & e^{-1}+e^3 \\ \end{bmatrix} \]
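We can verify this numerically in R. Because A is symmetric, the eigenvectors returned by eigen are orthonormal, so \(U^{-1}=U^T\):

```r
A <- rbind(c(1, 2),
           c(2, 1))
E <- eigen(A)
U <- E$vectors                      # orthonormal for a symmetric matrix
U %*% diag(exp(E$values)) %*% t(U)  # exp(A); compare with the matrix above
```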
Proposition:
\(\lambda=1\) is always an eigenvalue of a transition matrix, with eigenvector \((1/\sqrt d,\dots,1/\sqrt d)^T\), where d is the dimension of P. This is because the rows of P sum to 1 (see the proposition above), so \(P\mathbf 1=\mathbf 1\).
Theorem
An important consequence of the Markov property is the fact that given the present, the past and the future are independent. This is formalized in the Chapman-Kolmogorov equation:
\[p^{(n+m)}_{ij}=\sum_kp^{(n)}_{ik}p^{(m)}_{kj}\]
proof
follows directly from the law of total probability, conditioning on the state of the chain after the first n steps.
Definition
State j is said to be accessible from state i if there exists \(n\ge0\) such that \(p^{(n)}_{ij}>0\), and we write \(i\to j\). If i is accessible from j and j is accessible from i, then i and j are said to communicate, and we write \(i\leftrightarrow j\).
Proposition
Communication is an equivalence relation. That is,
\(i\leftrightarrow i\)
if \(i\leftrightarrow j\) then \(j\leftrightarrow i\)
if \(i\leftrightarrow j\) and \(j\leftrightarrow k\) then \(i\leftrightarrow k\)
proof
Reflexivity holds because \(p^{(0)}_{ii}=1>0\), and symmetry is immediate from the definition. For transitivity, say \(i\leftrightarrow j\) and \(j\leftrightarrow k\); then there are n and m with
\(p^{(n)}_{ij}>0\) and \(p^{(m)}_{jk}>0\)
But then
\[p^{(n+m)}_{ik}=\sum_lp^{(n)}_{il}p^{(m)}_{lk}\ge p^{(n)}_{ij}p^{(m)}_{jk}>0\]
so \(i\to k\), and the same argument with the roles reversed gives \(k\to i\).
Definition
Two states that communicate are said to be in the same class. A Markov chain is said to be irreducible if it consists of only one class.
A state i is said to have period d if \(p^{(n)}_{ii}=0\) whenever n is not divisible by d, and d is the largest integer with this property. A state with period 1 is said to be aperiodic. Let d(i) denote the period of state i.
The two random walks and the Ehrenfest chain are all irreducible. Random Walk I and the Ehrenfest chain have period 2; Random Walk II is aperiodic (provided \(1-p-q>0\)).
Consider the following chain: you go to the casino with $10. You play blackjack, always betting $5. Let \(X_n\) be your “wealth” after the nth game. Then \(X_n\in\{0,5,10,15,\dots\}\) and \(P(X_{n+k}=j\mid X_k=0)=0\) for all \(n\ge1\) and all \(j\ne0\) (“0” is called a coffin, or absorbing, state). So this chain is not irreducible.
Proposition
if \(i\leftrightarrow j\) then \(d(i)=d(j)\)
proof
Let n and m be such that \(p^{(n)}_{ji}p^{(m)}_{ij}>0\), and suppose that \(p^{(s)}_{ii}>0\), so d(i) divides s. Then
\[p^{(n+m)}_{jj}\ge p^{(n)}_{ji}p^{(m)}_{ij}>0\]
\[p^{(n+m+s)}_{jj}\ge p^{(n)}_{ji}p^{(s)}_{ii}p^{(m)}_{ij}>0\]
therefore d(j) divides both n+m and n+m+s, and so it also divides \((n+m+s)-(n+m)=s\). So d(j) divides every s with \(p^{(s)}_{ii}>0\), and therefore \(d(j)\le d(i)\). The argument is symmetric in i and j, so \(d(i)=d(j)\).
Definition
Let
\[f^{(0)}_{ij}=0\]
\[f^{(n)}_{ij}=P(X_n=j,\,X_k\ne j\text{ for }k=1,\dots,n-1\mid X_0=i)\]
\[f_{ij}=\sum_{n=1}^\infty f^{(n)}_{ij}\]
so \(f^{(n)}_{ij}\) is the probability that the chain starting at i visits j for the first time at time n, and \(f_{ij}\) is the probability that the chain starting at i ever visits j.
Definition
A state i of a Markov chain is said to be recurrent if \(f_{ii}=1\). A state that is not recurrent is called transient.
A recurrent state i is said to be positive recurrent if, starting at i, the expected time until the return to i is finite; otherwise it is called null recurrent.
Positive recurrent aperiodic states are called ergodic.
The Ehrenfest chain is clearly recurrent. In the Casino example “0” is a recurrent state, the others are not.
Are the random walks recurrent? Good question! We will answer it using the following theorem:
Theorem
A state i is recurrent if and only if
\[\sum_{n=1}^\infty p^{(n)}_{ii}=\infty\]
proof
Because of the Markov property, once the chain returns to i everything starts all over again, and so the probability that the chain returns to i at least twice is \((f_{ii})^2\), at least three times \((f_{ii})^3\), and so on.
Let M be the total number of returns to state i; then we have just shown that M has a geometric distribution with
\[P(M\ge k\mid X_0=i)=(f_{ii})^k\quad\text{and}\quad E[M\mid X_0=i]=\frac{f_{ii}}{1-f_{ii}}\]
Note that we have
state i is transient iff \(f_{ii}<1\) iff \(E[M\mid X_0=i]<\infty\)
We can write M as
\[M=\sum_{n=1}^\infty I[X_n=i]\quad\text{so}\quad E[M\mid X_0=i]=\sum_{n=1}^\infty P(X_n=i\mid X_0=i)=\sum_{n=1}^\infty p^{(n)}_{ii}\]
and so we are done.
The next corollary shows that, like periodicity, recurrence is a class property:
if state i is recurrent and \(i\leftrightarrow j\), then j is recurrent
proof
say n and m are such that
\(p^{(n)}_{ij}>0\) and \(p^{(m)}_{ji}>0\)
now for any \(s\ge0\)
\[p^{(n+m+s)}_{jj}\ge p^{(m)}_{ji}p^{(s)}_{ii}p^{(n)}_{ij}\]
and so
\[\sum_sp^{(n+m+s)}_{jj}\ge\sum_sp^{(m)}_{ji}p^{(s)}_{ii}p^{(n)}_{ij}=p^{(m)}_{ji}p^{(n)}_{ij}\sum_sp^{(s)}_{ii}=\infty\]
if \(i\leftrightarrow j\) and state j is recurrent, then \(f_{ij}=1\)
proof
Suppose \(X_0=i\), and let n be such that \(p^{(n)}_{ij}>0\). Now \(p^{(n)}_{ij}\) may be less than 1, so it is possible that \(X_n\ne j\). If that happens, let’s say we missed opportunity 1, and let \(T_1\) be the next time the chain visits i. Note that
\[P(T_1<\infty)=1\]
by the corollary above (i is recurrent as well). Say we missed opportunity 2 if \(X_{T_1+n}\ne j\), and let \(T_2\) be the time of the first visit to i after that, and so on.
Because at each visit to i the chain starts all over again, the number of opportunities needed until the first success has a geometric distribution with success probability \(p^{(n)}_{ij}\), so the expected number of opportunities is
\[1/p^{(n)}_{ij}<\infty\]
and because there are infinitely many \(T_k\)’s, sooner or later there will be a success, that is, the chain will visit j. So \(f_{ij}=1\).
Let \(N_j(t)\) denote the number of transitions into j by time t. If j is recurrent and \(X_0=j\), then the process probabilistically starts all over at each return, and so \(\{N_j(t),t\ge0\}\) is a renewal process with interarrival time distribution \(\{f^{(n)}_{jj},n\ge0\}\).
Back to the random walks. For simplicity we assume \(p+q=1\), that is, the chain always jumps. Then returns to 0 are possible only at even times, and
\[p^{(2n)}_{00}=\binom{2n}np^nq^n\]
By Stirling’s formula \(n!\approx\sqrt{2\pi n}(n/e)^n\) we get \(\binom{2n}n\approx4^n/\sqrt{\pi n}\), and so
\[p^{(2n)}_{00}\approx\frac{(4pq)^n}{\sqrt{\pi n}}\]
Now \(4pq\le1\) with equality iff \(p=q=1/2\), so \(\sum_np^{(2n)}_{00}=\infty\) iff \(p=1/2\). So the state 0 is recurrent iff \(p=1/2\). But we can get to 0 from any other state and back, so for \(p=1/2\) all states are recurrent.
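A quick numerical illustration in R (a sketch; lchoose is used to avoid overflow in the binomial coefficients): the partial sums of \(p^{(2n)}_{00}\) keep growing for p=1/2 but have essentially converged for p≠1/2:

```r
# Partial sums of p_00^(2n) = choose(2n, n) (pq)^n, computed on the log scale
psum <- function(p, N) {
  n <- 1:N
  sum(exp(lchoose(2 * n, n) + n * log(p * (1 - p))))
}
psum(0.5, 10^4)  # still growing (the series diverges like sqrt(N))
psum(0.4, 10^4)  # already at its finite limit
```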
Let’s consider a more general problem:
let the state space be the lattice of integers in \(\mathbb R^d\), that is \(X_n=(i_1,\dots,i_d)\) with integer coordinates. The chain goes from one point of the lattice to any of the 2d points that differ by one in exactly one coordinate, with probability \(1/(2d)\) each.
Consider first \(d=2\) and let \(0=(0,0)\); since the chain is irreducible, if 0 is recurrent every state is recurrent. Again returns to 0 are only possible in 2n steps, of which i must go left and i right, n-i up and n-i down. Now the probabilities are multinomial, and we have
\[p^{(2n)}_{00}=\frac1{4^{2n}}\sum_{i=0}^n\frac{(2n)!}{i!\,i!\,(n-i)!\,(n-i)!}=\frac1{4^{2n}}\binom{2n}n\sum_{i=0}^n\binom ni^2=\left[\binom{2n}n\frac1{4^n}\right]^2\approx\frac1{\pi n}\]
and \(\sum_n\frac1{\pi n}=\infty\), and so all states are recurrent.
Notice that
\[\frac1{\pi n}=\left(\frac1{\sqrt{\pi n}}\right)^2\]
is the square of the corresponding one-dimensional probability. This is in effect true because we can think of the random walk in \(\mathbb R^d\) as d independent random walks, one in each of the “dimensions”. Therefore
\[p^{(2n)}_{00}\approx\left(\frac1{\sqrt{\pi n}}\right)^d\]
which is summable if and only if \(d\ge3\): the random walk is recurrent in dimensions 1 and 2 and transient in dimensions 3 and higher.
The first proof of this was given by George Pólya in 1921; the result was described very nicely by Kakutani, who said
A drunk man will find his way home but a drunk bird may get lost.
Let’s return to one dimension and consider the following question: if \(P(X_1=1)=p=1/2\) we know that state 0 (like all others) is recurrent, so
\[P(\text{chain returns to }0\mid\text{it starts at }0)=1\]
But what if \(0<p<1/2\)? First of all
\[P(\text{chain returns to }0\mid\text{it starts at }0)\ge P(X_2=0,X_1=-1\mid X_0=0)=qp>0\]
But what is P(chain returns to 0 | it starts at 0)? Well, first of all, conditioning on the first step,
\[P(\text{return to }0)=P(\text{return to }0\mid X_1=-1)(1-p)+P(\text{return to }0\mid X_1=1)p\]
We know the chain is transient for \(p\ne1/2\), so P(chain visits 0 infinitely often)=0. Moreover \(p<1/2\), so \(X_n\to-\infty\) with probability 1; a chain whose first step goes up must therefore cross 0 again on its way down, so
\[P(\text{return to }0\mid X_1=1)=1\]
Now let \(\alpha=P(\text{return to }0\mid X_1=-1)\), the probability that the chain ever moves one step up from its current position. Then
\[\alpha=p+q\alpha^2\]
because α is the probability to “ever reach the state to the right”: either the first step already goes right (probability p), or it goes left (probability q) and now we have to do that twice, each time with probability α.
Finally, solving the quadratic gives \(\alpha=1\) or \(\alpha=p/q\); for \(p<1/2\) the chain drifts to \(-\infty\) and the correct root is \(\alpha=p/q\). Therefore
\[P(\text{return to }0)=(1-p)\frac pq+p\cdot1=p+p=2p\]
In general we have
\[P(\text{chain returns to }i\mid\text{chain starts at }i)=2\min(p,1-p)\]
As a nice aside: let \(X_1,X_2,\dots\) be independent Pois(1) r.v.’s and let \(S_n=X_1+\dots+X_n\), so \(S_n\sim\text{Pois}(n)\). Therefore
\[P(S_n=n)=e^{-n}\frac{n^n}{n!}\]
On the other hand, by the central limit theorem \(S_n\) is approximately \(N(n,n)\), so
\[P(S_n=n)\approx\frac1{\sqrt{2\pi n}}\]
Setting the two expressions equal and solving for n! gives \(n!\approx\sqrt{2\pi n}(n/e)^n\), and so we have a proof of Stirling’s formula!
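This is easy to check numerically in R (n=100 is an arbitrary choice):

```r
n <- 100
dpois(n, lambda = n)  # exact: P(S_n = n) = e^(-n) n^n / n!
1 / sqrt(2 * pi * n)  # the CLT / Stirling approximation
```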
Say we have a Markov chain with states {0,1,2} and transition matrix
Find P(X3=0)
A little linear algebra gives the 3-step transition matrix \(P^{(3)}=P^3\). Now, by the law of total probability,
\[P(X_3=0)=\sum_iP(X_0=i)p^{(3)}_{i0}\]
so to calculate \(P(X_3=0)\) we need to specify the distribution of \(X_0\). Two common choices are to start in a fixed state (say 0), in which case \(P(X_3=0)=p^{(3)}_{00}\), or to choose the initial state at random, in which case \(P(X_3=0)=\frac13\sum_ip^{(3)}_{i0}\).
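Since the transition matrix of this example is not reproduced here, the following R sketch uses a made-up 3×3 stochastic matrix just to show the mechanics:

```r
# An illustrative transition matrix (NOT the one from this example)
P <- rbind(c(0.2, 0.5, 0.3),
           c(0.4, 0.1, 0.5),
           c(0.3, 0.3, 0.4))
P3 <- P %*% P %*% P  # the 3-step transition matrix
P3[1, 1]             # P(X_3 = 0 | X_0 = 0); state 0 is row/column 1
mean(P3[, 1])        # P(X_3 = 0) if X_0 is chosen uniformly at random
```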
Soon we will see another common choice
Consider again the Ehrenfest chain with r=3 and compute \(P^{(n)}\) for larger and larger n. You will notice that the even powers \(P^{(2n)}\) and the odd powers \(P^{(2n+1)}\) each seem to converge to a limit (the chain has period 2, so the powers themselves keep oscillating). We will now study this limiting behavior.
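In R, again with the ehrenfest helper from the earlier sketch:

```r
P <- ehrenfest(3)
Pn <- diag(4)                   # P^0 = identity
for (i in 1:50) Pn <- Pn %*% P  # P^50
round(Pn, 4)                    # the even powers have settled down
```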
Let S be the state space of a Markov chain \(\{X_n,n\ge0\}\) with transition matrix P. Let π be a “measure” on S. Then π is called a stationary measure of \(\{X_n,n\ge0\}\) if
\[\pi^TP=\pi^T\]
We won’t discuss exactly what it means for π to be a “measure”. You can think of it in the same way as a probability distribution, only that we don’t require \(\sum\pi_i=1\).
Note:
\(\pi^TP=\pi^T\) iff \((\pi^TP)^T=\pi\) iff \(P^T\pi=\pi\) iff \((P^T-I)\pi=0\)
so again the system of equations is singular: π is an eigenvector of \(P^T\) for the eigenvalue 1.
Say we have a Markov chain with states {0,1,2} and transition matrix
Find P(X3=0) if the chain starts in its stationary distribution.
Carrying out this calculation one finds \(P(X_3=0)=0.3125=5/16\), which is exactly \(\pi_0\), so
\[P(X_3=0\mid X_0\text{ has distribution }\pi)=\pi_0\]
And so we find: if the chain starts in its stationary distribution, it stays in it! As it turns out, this is true in general.
To find a (the?) stationary measure we have to solve the system of equations
\[\pi_j=\sum_i\pi_ip_{ij}\quad j=0,1,\dots,r\]
Often we can get a unique solution by also requiring that π be a proper probability distribution, that is, \(\sum_i\pi_i=1\).
For the Ehrenfest chain with r=3 this means the system
\[\begin{aligned}
&\pi_0=\tfrac13\pi_1\\
&\pi_1=\pi_0+\tfrac23\pi_2\\
&\pi_2=\tfrac23\pi_1+\pi_3\\
&\pi_3=\tfrac13\pi_2\\
&\pi_0+\pi_1+\pi_2+\pi_3=1
\end{aligned}\]
which has the solution \(\pi=(1/8,3/8,3/8,1/8)\).
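A sketch of the same computation in R: a stationary π is an eigenvector of t(P) for the eigenvalue 1 (see the Note above), which we then normalize:

```r
P <- ehrenfest(3)                                   # helper from the earlier sketch
E <- eigen(t(P))
v <- Re(E$vectors[, which.min(abs(E$values - 1))])  # eigenvector for eigenvalue 1
v / sum(v)                                          # normalized: 1/8, 3/8, 3/8, 1/8
```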
For the Ehrenfest chain with general r we find the stationary measure \(\pi_k=\binom rk2^{-r}\), \(k=0,\dots,r\), that is, the Binomial(r,1/2) distribution.
The interpretation is the following: say we choose the initial state \(X_0\) according to π, that is
\[P(X_0=i)=\pi_i\]
Then \(\pi_i\) is the long-run proportion of time the chain spends at state i, that is
\[\pi_i=\lim_{N\to\infty}\frac1N\sum_{n=1}^NI[X_n=i]\]
Let S be the integers and define a Markov chain by \(p_{i,i+1}=p\) and \(p_{i,i-1}=q=1-p\). A stationary measure is given by \(\pi_i=1\) for all i, because \((\pi^TP)_i=\pi_{i-1}p+\pi_{i+1}q=p+q=1\).
Now assume \(p\ne q\) and define \(\pi_i=(p/q)^i\). Then
\[(\pi^TP)_i=\pi_{i-1}p+\pi_{i+1}q=\left(\frac pq\right)^{i-1}p+\left(\frac pq\right)^{i+1}q=\left(\frac pq\right)^iq+\left(\frac pq\right)^ip=\left(\frac pq\right)^i=\pi_i\]
Note that this shows that stationary measures need not be unique. (Neither of these two measures has finite total mass, so neither can be normalized to a probability distribution.)
One use of the stationary distribution is an extension of the weak law of large numbers to Markov chains. That is, say h is a function on the state space; then
\[\frac1N\sum_{n=1}^Nh(X_n)\to E[h(Z)]\]
where Z is a r.v. with distribution π.
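A simulation sketch for the Ehrenfest chain with r=3 (the test function h and the run length are arbitrary choices):

```r
set.seed(111)
P <- ehrenfest(3)  # helper from the earlier sketch
N <- 1e5
x <- numeric(N)
x[1] <- 1          # start the chain in state 1 (say)
for (n in 2:N) x[n] <- sample(0:3, 1, prob = P[x[n - 1] + 1, ])
h <- function(k) k^2             # an arbitrary function on the state space
mean(h(x))                       # time average of h(X_n)
sum(c(1, 3, 3, 1) / 8 * h(0:3))  # E[h(Z)] = 3 for Z ~ pi
```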
One of the main results for Markov chains is the following:
If the Markov chain \(\{X_n,n\ge0\}\) is irreducible and ergodic, then
\[\lim_{n\to\infty}p^{(n)}_{ij}=\pi_j\]
for all states i and j, where π is the unique stationary distribution.
Of course this result does not apply to the Ehrenfest chain, which is not aperiodic: there the powers \(P^{(n)}\) oscillate between two limits. Still, the stationary distribution describes the long-run behavior of the chain, as we have seen.
Here is another property of Markov chains:
Definition:
A Markov chain is said to be time-reversible if
\[\pi_ip_{ij}=\pi_jp_{ji}\quad\text{for all }i\ne j\]
Theorem
Let the Markov chain \(\{Y_n,n\ge0\}\) be time-reversible and start in its stationary distribution π. Fix a large time T and run the chain backwards, that is, let \(X_n=Y_{T-n}\). Then the transition matrix of the chain \(\{X_n,n\ge0\}\) is the same as the transition matrix of \(\{Y_n,n\ge0\}\).
In other words, for a time reversible Markov chain if the chain is started from π and run backwards in time it again has transition matrix P.
The Ehrenfest chain is time-reversible. We will show this for the case \(i=k\), \(j=k+1\):
\[\pi_kp_{k,k+1}=\binom rk2^{-r}\frac{r-k}r=\frac{r!}{k!(r-k)!}\cdot\frac{r-k}r\cdot2^{-r}=\frac{(r-1)!}{k!(r-k-1)!}2^{-r}\]
\[\pi_{k+1}p_{k+1,k}=\binom r{k+1}2^{-r}\frac{k+1}r=\frac{r!}{(k+1)!(r-k-1)!}\cdot\frac{k+1}r\cdot2^{-r}=\frac{(r-1)!}{k!(r-k-1)!}2^{-r}\]
so the two sides agree.
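Numerically, time-reversibility says that the matrix diag(π)P is symmetric, which is easy to check in R:

```r
P <- ehrenfest(3)           # helper from the earlier sketch
pr <- choose(3, 0:3) / 2^3  # the stationary distribution (1/8, 3/8, 3/8, 1/8)
B <- diag(pr) %*% P
max(abs(B - t(B)))          # 0 up to rounding: pi_i p_ij = pi_j p_ji
```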