The sequence of r.v. X_{1}, X_{2}, .. is said to be a Markov chain if for any event A we have P(X_{n} A | X_{1} = x_{1}, .., X_{n-1}=x_{n-1}) = P(X_{n} A | X_{n-1} = x_{n-1}), that is X_{n} depends only on X_{n-1} but not on any of the r.v. before it.

** Example :** (Random Walk) Say we flip a coin repeatedly. Let the random variable Y_{i} be 1 if the i^{th} flip is heads, -1 otherwise. Now let X_{n} = ∑^{n}_{i=1} Y_{i}

Clearly we have

If we think of the index n as a time variable, then all that matters for the state of the system at time n is where it was at time n-1, but not on how it got to that state.

The random walk above is a discrete-time stochastic process because the time variable n takes discrete values (1,2, etc.) There are also continuous time stochastic processes {X(t),t≥0} . Our random walk example is also a discrete state space stochastic process because the r.v. X_{n} can only take countably many values (0, ±1,±2 etc.). Other stochastic processes might have a continuous state space.

For a Markov chain (that is a stochastic process with a discrete state space) all the relevant (probability) information is contained in the probability to get from state i to state j in k steps. For k=1 this is contained in the transition matrix P = (p_{ij}), and in fact as we shall see that P is all we need.

** Example :** (Random Walk, cont)

Here we have p_{ij} = 1/2 if |i-j|=1, 0 otherwise

** Example :** (Asymmetric Random Walk) As above the state space are the integers but now we go from i to i+1 with probability p, to i-1 with probability q and stay at i with probability 1-p-q.

** Example :** (Ehrenfest chain)

Say we have two boxes, box 1 with k balls and box 2 with r-k balls. We pick one of the balls at random and move it to the other box. Let X_{n}be the number of balls in box 1 at time n. We have

X_{n} {0,1,..,r},

p_{k,k+1} = (r-k)/r, p_{k,k-1} = k/r and p_{i,j} = 0 otherwise.

Ehrenfest used this model to study the exchange of air molecules in two chambers connected by a small hole, so small that only one molecule at a time can pass through.

Say we have a Markov chain X_{n}, n=1,2,.. with transition matrix P. Define the n-step transition matrix P^{(n)} = (p^{(n)}_{ij}) by p^{(n)}_{ij} = P(X_{n}=j|X_{1}=i). Of course P^{(1)} = P. Now

** Example :** (Ehrenfest chain)

Let's find the 2-step transition matrix for the Ehrenfest chain with r=3. The transition matrix is given by

and so the 2-step transition matrix is

For example , P(X_{3}=2|X_{1}=2) = 7/9 because if X_{1} = 2 the chain could have gone to 1 (with probability 2/3) and then back to 2 (with probability 2/3) or it could have gone to 3 (with probability 1/3) and then back to 2 (with probability 1), so we have P(X_{3}=2|X_{1}=2) = 2/3*2/3 + 1/3*1 = 7/9.

In order to find P^{(n)} we could just find PPP..P n-times. With a little linear algebra this becomes easier: For many matrices P there exists a matrix U and a diagonal matrix D such that P=UDU^{-1}. Here is how to find U and D:

First we need to find the eigenvalues of the matrix P, that is we need to find the solutions of the equations Px=λx. This is equivalent to (P-λI)x=0 or to det(P-λI)=0. So we have:

The D above now is the matrix with the eigenvalues on the diagonal. The columns of the matrix U are the corresponding eigenvectors (with Euclidean length 1), so For example

Of course we have det(P-λI)=0, so this system is does not have a unique solution. Setting x_{1}=1 we can then easily find a solution x=(1,-1,1,-1).

This vector has Euclidean length √(1^{2}+(-1)^{2}+1^{2}+(-1)^{2}) = 2, so the normalized eigenvector is x=(1/2,-1/2,1/2,-1/2)

Similarly we can find eigenvectors for the other eigenvalues. Alternatively (and a lot easier!) we can use the R function **eigen** to do the calculations for us.

Why does this help in computing P^{(n)}? It turns out that we have P^{(2)} = PP = UDU^{-1}UDU = UDDU^{-1} = UD^{2}U^{-1}, and

and in general we have P^{(n)} = UD^{n}U^{-1}

The routine **ehrenfest** with which=1 computes P^{(n)} for the Ehrenfest chain

An important consequence of the Markov property is the fact that given the present the past and the future are independent. This is formalized in the Chapman-Kolmogorov equation:

There are a number of properties of Markov chains. Here are some:

1) A Markov chain is said to be *irreducible* if for each pair of states i and j there is a positive probability that starting in state i the chain will eventually move to state j.

Both the two random walks and the Ehrenfest chain are irreducible.

** Example :** (Casino) Consider the following chain: you go to the Casino with $10. You play Blackjack, always betting $5. Let X_{n} be your "wealth" after the n^{th} game. Then X_{n} {0,5,10,15,..} and P(X_{n+k} = j|X_{k}=0) = 0 for all n > 1. ("0" is called a coffin or absorbing state). So this chain is not irreducible.

2) A Markov chain is said to be *aperiodic* if for some n ≥0 and some state j we have

P(X_{n} = j|X_{1}=j) > 0 and

P(X_{n+1} = j|X_{1}=j) > 0

In other words there should be a chance to return to state j in either n steps or in n+1 steps.

Random walk I and the Ehrenfest chain are not aperiodic because it is only possible to return to the same state in an even number of steps, but not an odd number. Random Walk II is aperiodic.

3) A state x of a Markov chain is said to be *recurrent* if P(the chain returns to x infinitely often) = 1. A Markov chain is called recurrent if all its states are recurrent. A state that is not recurrent is called *transient*.

A recurrent state i is said to be *positive recurrent* if starting at i the expected time until the return to i is finite, otherwise it is called *null recurrent*.

It can be shown that in a finite-state chain all recurrent states are positive recurrent.

** Example :** The Ehrenfest chain is clearly recurrent. In the Casino Example "0" is a recurrent state, the others are not. Are the random walks recurrent? Good question! It seems clear that the asymmetric r.v. is not (if p≠0.5), because eventually one expects the walk to run off to infinity (or - infinity). How about Random Walk I? Actually let's consider a more general problem

** Example :** (Random Walk III) let the state space be the lattice of integers on R^{d}, that is X_{n} =(i_{1}, .., i_{d}) for i_{j} any integer. Then the chain goes from one point on the lattice to any of the 2d points that differ by one in one coordinate with probability 1/2d.

One of the great results in probability states:

The simple random walk is recurrent if d≤2, transient otherwise, or as Kakutani once said "A drunk man will find his way home but a drunk bird may get lost".

4) Positive recurrent aperiodic states are called *ergodic*.

Let S be the state space of a Markov Chain X with transition matrix P. Let π be a "measure" on S. Then π is called a stationary measure of X if π^{T}P=π^{T}.

We won't discuss exactly what it means for π to be a "measure". You can think of it in the same way as a probability distribution, only that we don't have ∑π_{i}=1.

Note: π^{T}P=π^{T} iff (P^{T}π)^{T}=π^{T} iff P^{T}π=π iff (P^{T}-I)π=0, so again the system of equations is singular.

** Example :** (Ehrenfest chain) To find a (?) stationary measure we have to solve the system of equations

π_{j} = ∑_{i} π_{i}P_{ij} i=0,1..,r

often we can get unique solution by requiring that π be a proper probability distribution, that is that ∑π_{i} = 1

Here this means the system

π_{0} = 1/3π_{1} |

π_{1} = π_{0}+2/3π_{2} |

π_{2} = 2/3π_{1} + π_{3} |

π_{3} = 1/3π_{2} |

π_{0} + π_{1} + π_{2} + π_{3} = 1 |

which has the solution π = (1/8,3/8,3/8,1/8)

In the general case we find the stationary measure

The interpretation is the following: Say we choose the initial state X_{0} according to π, that is P(X_{0}=i) = π_{i}. Then π_{i} is the long-run proportion of time the chain spends at state i, that is π_{i} = lim ∑^{N}_{k=1} I[X_{n}=i]/N.

** Example :** For the Ehrenfest chain this is illustrated in **ehrenfest** with which=2.

** Example :** (Random Walk) Let S be the integers and define a Markov chain by p_{i,i+1} = p and p_{i,i-1} = q = 1-p. A stationary measure is given by π_{i}=1 for all i because (πP)_{i} = 1p+1q = 1.

Now assume p≠q and define π_{i} =(p/q)^{i}. Then

Note that this shows that stationary measure are not unique.

One use of the stationary distribution is an extension of the WLLN to Markov chains. That is, say h is a function on the state space, then

where Z is a r.v. with pmf π.

This is illustrated in **ehrenfest** with which=3 for h(x)=x, h(x)=x^{2} and h(x)=log(x+1)

One of the main results for Markov chains is the following:

If the Markov chain {X_{n}} is irreducible and ergodic, thenn

** Example:** Of course this result does not apply to the Ehrenfest chain, which is not aperiodic, but the result holds anyway as we have seen.

Here is another property of Markov chains: A Markov chain is said to be time-reversible if π_{i}P_{ij}=π_{j}P_{ji} for all i≠j. It can be shown that for a time reversible Markov chain if the chain is started from π and run backwards in time it again has transition matrix P.

** Example :** The Ehrenfest chain is time-reversible. We will show this for the case i=k, j=k+1:

**Example:** consider the following Markov chain: if at i it next moves to i+1 with probability p or to 1 with probability 1-p. Let's find its stationay distribution:

so the stationary distribution is a geometric!

Let's do a simulation and check, done in **mark1()**