Discrete-Time Markov Chains

Often in this course we started an example with "Let X1, .., Xn be an iid sequence ...". This independence assumption makes sense in many statistics problems, mainly when the data come from a random sample. But what happens if the data are not independent? In that case the most important task is to describe the dependence structure of the data, that is, to say what the relationship between Xi and Xk is. There are many such structures. In this section we will study one of the most common, called a Markov chain.

The sequence of r.v. X1, X2, .. is said to be a Markov chain if for any event A we have P(Xn ∈ A | X1 = x1, .., Xn-1 = xn-1) = P(Xn ∈ A | Xn-1 = xn-1), that is, Xn depends on the past only through Xn-1, not on any of the r.v. before it.

Example : (Random Walk) Say we flip a fair coin repeatedly. Let the random variable Yi be 1 if the ith flip is heads and -1 otherwise. Now let Xn = Y1 + Y2 + .. + Yn.
Clearly we have
P(Xn = j | X1 = x1, .., Xn-1 = xn-1) = P(Xn = j | Xn-1 = xn-1) = 1/2 if |j - xn-1| = 1 and 0 otherwise,
so {Xn} is a Markov chain.

If we think of the index n as a time variable, then all that matters for the state of the system at time n is where it was at time n-1, not how it got to that state.
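To make this concrete, here is a minimal R sketch (an illustration added here, not one of the course routines) that simulates a path of this random walk:

set.seed(111)
n <- 1000
Y <- sample(c(-1, 1), n, replace = TRUE)  # Y_i = 1 (heads) or -1 (tails), each with probability 1/2
X <- cumsum(Y)                            # X_n = Y_1 + .. + Y_n
plot(1:n, X, type = "l", xlab = "n", ylab = "X_n")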

The random walk above is a discrete-time stochastic process because the time variable n takes discrete values (1, 2, etc.). There are also continuous-time stochastic processes {X(t), t≥0}. Our random walk example is also a discrete state space stochastic process because the r.v. Xn can take only countably many values (0, ±1, ±2, etc.). Other stochastic processes might have a continuous state space.
For a Markov chain (that is, a stochastic process with the Markov property and a discrete state space) all the relevant probability information is contained in the probabilities of getting from state i to state j in k steps. For k=1 these are collected in the transition matrix P = (pij), and in fact, as we shall see, P is all we need.

Example : (Random Walk, cont)
Here we have pij = 1/2 if |i-j|=1, 0 otherwise

Example : (Asymmetric Random Walk) As above the state space is the integers, but now we go from i to i+1 with probability p, to i-1 with probability q, and stay at i with probability 1-p-q.

Example : (Ehrenfest chain)
Say we have two boxes, box 1 with k balls and box 2 with r-k balls. We pick one of the r balls at random and move it to the other box. Let Xn be the number of balls in box 1 at time n. We have
Xn ∈ {0,1,..,r},
pk,k+1 = (r-k)/r, pk,k-1 = k/r and pi,j = 0 otherwise.

Ehrenfest used this model to study the exchange of air molecules in two chambers connected by a small hole, so small that only one molecule at a time can pass through.
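Here is a small R sketch that builds this transition matrix for a given r; the helper name ehrenfest_P is just a label chosen for this illustration, and the later snippets reuse it:

ehrenfest_P <- function(r) {
  # rows and columns 1..r+1 correspond to states 0..r
  P <- matrix(0, r + 1, r + 1)
  for (k in 0:r) {
    if (k < r) P[k + 1, k + 2] <- (r - k) / r  # p(k,k+1) = (r-k)/r
    if (k > 0) P[k + 1, k] <- k / r            # p(k,k-1) = k/r
  }
  P
}
ehrenfest_P(3)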

Say we have a Markov chain Xn, n=1,2,.., with transition matrix P. Define the n-step transition matrix P(n) = (p(n)ij) by p(n)ij = P(Xn+1 = j | X1 = i). Of course P(1) = P. Moreover, it turns out that P(n) = Pⁿ, the nth matrix power of P.

Example : (Ehrenfest chain)
Let's find the 2-step transition matrix for the Ehrenfest chain with r=3. With rows and columns corresponding to the states 0,1,2,3, the transition matrix is given by

        0    1    0    0
P  =   1/3   0   2/3   0
        0   2/3   0   1/3
        0    0    1    0

and so the 2-step transition matrix is

        1/3   0   2/3   0
P(2) =   0   7/9   0   2/9
        2/9   0   7/9   0
         0   2/3   0   1/3
For example, P(X3=2|X1=2) = 7/9: if X1 = 2 the chain could have gone to 1 (with probability 2/3) and then back to 2 (with probability 2/3), or it could have gone to 3 (with probability 1/3) and then back to 2 (with probability 1), so we have P(X3=2|X1=2) = 2/3·2/3 + 1/3·1 = 7/9.
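We can verify this computation numerically (using the ehrenfest_P helper from above):

P <- ehrenfest_P(3)
P2 <- P %*% P  # 2-step transition matrix
P2[3, 3]       # row and column 3 correspond to state 2; gives 0.7778 = 7/9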
In order to find P(n) we could just multiply P by itself n times. With a little linear algebra this becomes easier: for many matrices P there exist an invertible matrix U and a diagonal matrix D such that P = UDU⁻¹. Here is how to find U and D:
First we need to find the eigenvalues of the matrix P, that is, we need to find the solutions of the equation Px = λx. This is equivalent to (P-λI)x = 0, which has a nontrivial solution iff det(P-λI) = 0. For our matrix we have:
det(P-λI) = λ⁴ - (10/9)λ² + 1/9 = 0
with solutions λ = 1, 1/3, -1/3 and -1.
The matrix D is the diagonal matrix with the eigenvalues on the diagonal. The columns of the matrix U are the corresponding eigenvectors (with Euclidean length 1). For example, for λ = -1 we need to solve (P+I)x = 0, that is the system
x1 + x2 = 0
(1/3)x1 + x2 + (2/3)x3 = 0
(2/3)x2 + x3 + (1/3)x4 = 0
x3 + x4 = 0
Of course det(P-λI) = 0, so this system does not have a unique solution. Setting x1 = 1 we can then easily find the solution x = (1,-1,1,-1).

This vector has Euclidean length √(1² + (-1)² + 1² + (-1)²) = 2, so the normalized eigenvector is x = (1/2, -1/2, 1/2, -1/2).
Similarly we can find eigenvectors for the other eigenvalues. Alternatively (and a lot easier!) we can use the R function eigen to do the calculations for us.
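For the Ehrenfest chain with r=3 this looks as follows:

e <- eigen(ehrenfest_P(3))
e$values   # 1, 1/3, -1/3, -1 (possibly in a different order)
e$vectors  # the corresponding unit-length eigenvectors as columns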
Why does this help in computing P(n)? It turns out that we have P(2) = PP = UDU⁻¹UDU⁻¹ = UD²U⁻¹, and in general P(n) = UDⁿU⁻¹. Since D is diagonal, Dⁿ is computed by simply raising each diagonal entry to the nth power.
The routine ehrenfest with which=1 computes P(n) for the Ehrenfest chain.
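That routine is not reproduced here, but a minimal sketch of what such a computation might do is:

Pn <- function(P, n) {
  # n-step transition matrix via P(n) = U D^n U^(-1)
  e <- eigen(P)
  U <- e$vectors
  Re(U %*% diag(e$values^n) %*% solve(U))  # Re() drops numerical round-off
}
round(Pn(ehrenfest_P(3), 2), 4)  # agrees with P %*% P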

An important consequence of the Markov property is the fact that, given the present, the past and the future are independent. This is formalized in the Chapman-Kolmogorov equation:
p(n+m)ij = ∑k p(n)ik p(m)kj, or in matrix form P(n+m) = P(n)P(m).
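A quick numerical check for the Ehrenfest chain, using the helpers from above:

P <- ehrenfest_P(3)
all.equal(Pn(P, 4), Pn(P, 2) %*% Pn(P, 2))  # TRUE: P(4) = P(2)P(2)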
There are a number of properties of Markov chains. Here are some:

1) A Markov chain is said to be irreducible if for each pair of states i and j there is a positive probability that, starting in state i, the chain will eventually move to state j.
The two random walks above and the Ehrenfest chain are all irreducible.

Example : (Casino) Consider the following chain: you go to the casino with $10. You play blackjack, always betting $5. Let Xn be your "wealth" after the nth game. Then Xn ∈ {0,5,10,15,..} and P(Xn+k = j | Xk = 0) = 0 for all n ≥ 1 and all j ≠ 0. ("0" is called a coffin or absorbing state.) So this chain is not irreducible.

2) A Markov chain is said to be aperiodic if for some n ≥ 0 and some state j we have
P(Xn = j | X1 = j) > 0 and
P(Xn+1 = j | X1 = j) > 0.
In other words, there should be a chance to return to state j in n steps as well as in n+1 steps.
The simple random walk and the Ehrenfest chain are not aperiodic because it is only possible to return to the same state in an even number of steps, never in an odd number. The asymmetric random walk is aperiodic (provided 1-p-q > 0).

3) A state x of a Markov chain is said to be recurrent if P(the chain returns to x infinitely often) = 1. A Markov chain is called recurrent if all its states are recurrent. A state that is not recurrent is called transient.
A recurrent state i is said to be positive recurrent if starting at i the expected time until the return to i is finite, otherwise it is called null recurrent.
It can be shown that in a finite-state chain all recurrent states are positive recurrent.

Example : The Ehrenfest chain is clearly recurrent. In the Casino example "0" is a recurrent state; the others are not. Are the random walks recurrent? Good question! It seems clear that the asymmetric random walk is not (if p ≠ q), because eventually one expects the walk to run off to +∞ (or -∞). How about the simple random walk? Let's consider a more general problem.

Example : (Random Walk III) Let the state space be the lattice of integers in Rd, that is, Xn = (i1, .., id) with each ij an integer. The chain goes from one point on the lattice to any of the 2d points that differ by one in exactly one coordinate, each with probability 1/(2d).
One of the great results in probability states:
The simple random walk is recurrent if d ≤ 2 and transient otherwise, or, as Kakutani once said, "A drunk man will find his way home, but a drunk bird may get lost."

4) Positive recurrent aperiodic states are called ergodic.

Stationary Distribution

Consider again the Ehrenfest chain and compute P(n) for large n. You notice that the entries of P(n) settle down (separately along the even and the odd steps, since the chain has period 2). We will now study this limiting behavior.
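Using the helpers from above we can look at large powers directly:

P <- ehrenfest_P(3)
round(Pn(P, 20), 4)                    # limit along even steps
round(Pn(P, 21), 4)                    # limit along odd steps
round((Pn(P, 20) + Pn(P, 21)) / 2, 4)  # every row of the average is (1/8, 3/8, 3/8, 1/8)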

Let S be the state space of a Markov chain X with transition matrix P. Let π be a "measure" on S. Then π is called a stationary measure of X if πᵀP = πᵀ.
We won't discuss exactly what it means for π to be a "measure". You can think of it in the same way as a probability distribution, except that we don't require ∑πi = 1.
Note: πᵀP = πᵀ iff (πᵀP)ᵀ = π iff Pᵀπ = π iff (Pᵀ-I)π = 0, so again the system of equations is singular: π is an eigenvector of Pᵀ for the eigenvalue 1.

Example : (Ehrenfest chain) To find a stationary measure we have to solve the system of equations
πj = ∑i πi pij, j = 0,1,..,r.
Often we can get a unique solution by requiring in addition that π be a proper probability distribution, that is, ∑πi = 1.
Here this means the system
π0 = (1/3)π1
π1 = π0 + (2/3)π2
π2 = (2/3)π1 + π3
π3 = (1/3)π2
π0 + π1 + π2 + π3 = 1

which has the solution π = (1/8,3/8,3/8,1/8)
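Numerically, π can be found as the eigenvector of Pᵀ for the eigenvalue 1, rescaled to sum to 1; a short sketch in R:

P <- ehrenfest_P(3)
e <- eigen(t(P))
v <- Re(e$vectors[, which.min(abs(e$values - 1))])  # eigenvector for eigenvalue 1
v / sum(v)  # 0.125 0.375 0.375 0.125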
In the general case we find the stationary measure
πi = (r choose i)·(1/2)^r, i = 0,1,..,r,
that is, the Binomial(r, 1/2) distribution.
The interpretation is the following: say we choose the initial state X0 according to π, that is, P(X0=i) = πi. Then πi is the long-run proportion of time the chain spends in state i, that is, πi = limN→∞ (1/N)∑k=1..N I[Xk=i].

Example : For the Ehrenfest chain this is illustrated in ehrenfest with which=2.
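The actual routine is not reproduced here, but a sketch of such a simulation might look like this:

set.seed(111)
r <- 3
N <- 1e5
x <- numeric(N)  # x[1] = 0: start with box 1 empty
for (n in 2:N) {
  # from state k move up with probability (r-k)/r, down with probability k/r
  x[n] <- x[n - 1] + ifelse(runif(1) < (r - x[n - 1]) / r, 1, -1)
}
table(x) / N  # close to (1/8, 3/8, 3/8, 1/8)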

Example : (Random Walk) Let S be the integers and define a Markov chain by pi,i+1 = p and pi,i-1 = q = 1-p. A stationary measure is given by πi = 1 for all i, because (πᵀP)i = 1·p + 1·q = 1.
Now assume p ≠ q and define πi = (p/q)^i. Then
(πᵀP)i = πi-1 pi-1,i + πi+1 pi+1,i = (p/q)^(i-1)·p + (p/q)^(i+1)·q = (p/q)^i·(q+p) = (p/q)^i = πi,
so this π is a stationary measure as well.
Note that this shows that stationary measures are not unique.

One use of the stationary distribution is an extension of the WLLN to Markov chains. That is, say h is a function on the state space; then
(1/N) ∑n=1..N h(Xn) → E[h(Z)]
where Z is a r.v. with pmf π.
This is illustrated in ehrenfest with which=3 for h(x)=x, h(x)=x² and h(x)=log(x+1).
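Using the simulated path x from the sketch above, the time averages can be compared with the corresponding expectations under π:

pi_vec <- c(1, 3, 3, 1) / 8
mean(x)          # compare with sum((0:3) * pi_vec)   = 1.5
mean(x^2)        # compare with sum((0:3)^2 * pi_vec) = 3
mean(log(x + 1)) # compare with sum(log(1:4) * pi_vec)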

One of the main results for Markov chains is the following:
If the Markov chain {Xn} is irreducible and ergodic, then
limn→∞ p(n)ij = πj for all states i and j.
Example : This result does not apply to the Ehrenfest chain, which is not aperiodic; indeed P(n) itself does not converge here but oscillates between two limits (even vs. odd steps). The long-run proportions of time spent in each state still converge to π, however, as we have seen.

Here is another property of Markov chains: a Markov chain is said to be time-reversible if πi pij = πj pji for all i ≠ j. It can be shown that if a time-reversible Markov chain is started from π and run backwards in time, it again has transition matrix P.

Example : The Ehrenfest chain is time-reversible. We will show this for the case i=k, j=k+1:
πk pk,k+1 = [(r choose k)/2^r]·[(r-k)/r] = r!(r-k)/(k!(r-k)! r 2^r) = (r-1)!/(k!(r-k-1)! 2^r)
πk+1 pk+1,k = [(r choose k+1)/2^r]·[(k+1)/r] = r!(k+1)/((k+1)!(r-k-1)! r 2^r) = (r-1)!/(k!(r-k-1)! 2^r)
so the two sides agree.
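In R, detailed balance can be checked for all pairs i, j at once by verifying that the matrix with entries πi pij is symmetric:

P <- ehrenfest_P(3)
pi_vec <- c(1, 3, 3, 1) / 8
M <- diag(pi_vec) %*% P  # M[i, j] = pi_i * p_ij
all.equal(M, t(M))       # TRUE: the chain is time-reversible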
Example : Consider the following Markov chain on the states {1,2,3,..}: if at i, it next moves to i+1 with probability p or back to 1 with probability 1-p. Let's find its stationary distribution. The equations πj = ∑i πi pij here read
π1 = (1-p)∑i πi = 1-p and πj = p·πj-1 for j ≥ 2,
which gives πj = (1-p)p^(j-1), j = 1,2,..
so the stationary distribution is a geometric!

Let's do a simulation to check; this is done in mark1().
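The routine mark1() is not reproduced here, but a sketch of such a simulation (with p = 0.6, a value chosen for illustration) is:

set.seed(111)
p <- 0.6
N <- 1e5
x <- numeric(N)
x[1] <- 1
for (n in 2:N) x[n] <- ifelse(runif(1) < p, x[n - 1] + 1, 1)
# empirical distribution of the chain vs. the geometric pmf (1-p)p^(j-1), j = 1..8
rbind(observed = as.numeric(table(factor(x, levels = 1:8))) / N,
      geometric = (1 - p) * p^(0:7))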