Discrete-Time Markov Chains

Often in this course we started an example with "Let X1, .., Xn be an iid sequence ...". This independence assumption makes a lot of sense in many statistics problems, mainly when the data comes from a random sample. But what happens if the data is not independent? In that case the most important task is to describe the dependence structure of the data, that is, to say what the relationship between Xi and Xk is. There are many such structures. In this section we will study one of the most common, called a Markov chain.

The sequence of r.v. X1, X2, .. is said to be a Markov chain if for any event A we have P(Xn ∈ A | X1 = x1, .., Xn-1 = xn-1) = P(Xn ∈ A | Xn-1 = xn-1), that is, Xn depends only on Xn-1 but not on any of the r.v. before it.

Example : (Random Walk) Say we flip a coin repeatedly. Let the random variable Yi be 1 if the ith flip is heads and -1 otherwise. Now let Xn = Y1 + .. + Yn.
Clearly we have
P(Xn+1 = j | X1 = x1, .., Xn = xn) = P(Xn+1 = j | Xn = xn)
because Xn+1 = Xn + Yn+1, and Yn+1 is independent of Y1, .., Yn.
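The random walk is easy to simulate; here is a minimal sketch in Python (the course routines are in R, but the idea is the same):

```python
import numpy as np

rng = np.random.default_rng(0)

# Y_i = +1 if the ith flip is heads, -1 otherwise
y = rng.choice([-1, 1], size=1000)

# X_n = Y_1 + .. + Y_n is the random walk
x = np.cumsum(y)

# X_{n+1} = X_n + Y_{n+1}: the next state depends on the past
# only through the current state X_n
print(x[:5])
```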

If we think of the index n as a time variable, then all that matters for the state of the system at time n is where it was at time n-1, not how it got to that state.

The random walk above is a discrete-time stochastic process because the time variable n takes discrete values (1, 2, etc.). There are also continuous-time stochastic processes {X(t), t ≥ 0}. Our random walk example is also a discrete state space stochastic process because the r.v. Xn can only take countably many values (0, ±1, ±2, etc.). Other stochastic processes might have a continuous state space.
For a Markov chain (that is, a Markov process with a discrete state space) all the relevant probability information is contained in the probabilities of getting from state i to state j in k steps. For k=1 these are collected in the transition matrix P = (pij), and in fact, as we shall see, P is all we need.

Example : (Random Walk, cont)
Here we have pij = 1/2 if |i-j| = 1 and pij = 0 otherwise.

Example : (Asymmetric Random Walk) As above the state space is the integers, but now we go from i to i+1 with probability p, to i-1 with probability q, and stay at i with probability 1-p-q.

Example : (Ehrenfest chain)
Say we have two boxes, box 1 with k balls and box 2 with r-k balls. We pick one of the r balls at random and move it to the other box. Let Xn be the number of balls in box 1 at time n. We have
Xn ∈ {0,1,..,r},
pk,k+1 = (r-k)/r, pk,k-1 = k/r and pi,j = 0 otherwise.

Ehrenfest used this model to study the exchange of air molecules in two chambers connected by a small hole, so small that only one molecule at a time can pass through.
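Here is a minimal simulation of the Ehrenfest dynamics (a Python sketch; the course routine ehrenfest is in R):

```python
import numpy as np

rng = np.random.default_rng(1)
r = 3                    # total number of balls
x = 0                    # balls in box 1 at the start
path = [x]
for _ in range(10_000):
    # pick one of the r balls at random; it sits in box 1 with probability x/r
    if rng.random() < x / r:
        x -= 1           # the chosen ball moves from box 1 to box 2
    else:
        x += 1           # the chosen ball moves from box 2 to box 1
    path.append(x)
path = np.array(path)
print(path[:10])
```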

Say we have a Markov chain Xn, n=1,2,.. with transition matrix P. Define the n-step transition matrix P(n) = (p(n)ij) by p(n)ij = P(Xn+1 = j | X1 = i). Of course P(1) = P. Now it turns out that P(n) = P^n, the n-th matrix power of P.

Example : (Ehrenfest chain)
Let's find the 2-step transition matrix for the Ehrenfest chain with r=3. With the states ordered 0,1,2,3 the transition matrix is given by

P =
| 0    1    0    0   |
| 1/3  0    2/3  0   |
| 0    2/3  0    1/3 |
| 0    0    1    0   |

and so the 2-step transition matrix is

P(2) = P·P =
| 1/3  0    2/3  0   |
| 0    7/9  0    2/9 |
| 2/9  0    7/9  0   |
| 0    2/3  0    1/3 |
For example, P(X3=2|X1=2) = 7/9 because if X1 = 2 the chain could have gone to 1 (with probability 2/3) and then back to 2 (with probability 2/3), or it could have gone to 3 (with probability 1/3) and then back to 2 (with probability 1), so we have P(X3=2|X1=2) = 2/3·2/3 + 1/3·1 = 7/9.
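We can check the 7/9 directly with a couple of lines (a Python sketch, with numpy standing in for the matrix arithmetic):

```python
import numpy as np

# transition matrix of the Ehrenfest chain with r=3, states 0,1,2,3
P = np.array([[0,   1,   0,   0  ],
              [1/3, 0,   2/3, 0  ],
              [0,   2/3, 0,   1/3],
              [0,   0,   1,   0  ]])

P2 = P @ P                    # the 2-step transition matrix
print(P2[2, 2])               # P(X3=2 | X1=2) -> 7/9
```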
In order to find P(n) we could just compute P·P· .. ·P, n times. With a little linear algebra this becomes easier: for many matrices P there exist a matrix U and a diagonal matrix D such that P = UDU^(-1). Here is how to find U and D:
First we need to find the eigenvalues of the matrix P, that is, the solutions of the equation Px = λx. This is equivalent to (P-λI)x = 0, which has a nonzero solution iff det(P-λI) = 0. For the Ehrenfest chain with r=3 we get
det(P-λI) = λ^4 - (10/9)λ^2 + 1/9 = 0
with solutions λ = 1, 1/3, -1/3 and -1.

The D above is the diagonal matrix with the eigenvalues on the diagonal. The columns of the matrix U are the corresponding eigenvectors (with Euclidean length 1). For example, for λ = -1 we need to solve the system (P+I)x = 0.
Of course det(P+I) = 0, so this system does not have a unique solution. Setting x1 = 1 we can then easily find a solution x = (1,-1,1,-1).

This vector has Euclidean length √(1^2+(-1)^2+1^2+(-1)^2) = 2, so the normalized eigenvector is x = (1/2,-1/2,1/2,-1/2).
Similarly we can find eigenvectors for the other eigenvalues. Alternatively (and a lot easier!) we can use the R function eigen to do the calculations for us.
Why does this help in computing P(n)? It turns out that we have P(2) = P·P = UDU^(-1)·UDU^(-1) = UD·DU^(-1) = UD^2U^(-1), and D^2 is again diagonal, with entries λi^2 on the diagonal,
and in general we have P(n) = UD^nU^(-1), where D^n simply has the entries λi^n on the diagonal.
The routine ehrenfest with which=1 computes P(n) for the Ehrenfest chain
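A Python sketch of the same computation (numpy's eig plays the role of R's eigen; this is not the course routine):

```python
import numpy as np

P = np.array([[0,   1,   0,   0  ],
              [1/3, 0,   2/3, 0  ],
              [0,   2/3, 0,   1/3],
              [0,   0,   1,   0  ]])

lam, U = np.linalg.eig(P)     # eigenvalues and eigenvectors, P = U D U^-1
D = np.diag(lam)
Uinv = np.linalg.inv(U)

n = 7
Pn = U @ np.linalg.matrix_power(D, n) @ Uinv   # P(n) = U D^n U^-1
# same answer as multiplying P by itself n times
print(np.allclose(Pn, np.linalg.matrix_power(P, n)))  # True
```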

An important consequence of the Markov property is that, given the present, the past and the future are independent. This is formalized in the Chapman-Kolmogorov equation:
p(n+m)ij = ∑k p(n)ik p(m)kj, or in matrix form P(n+m) = P(n)P(m).
There are a number of properties of Markov chains. Here are some:

1) A Markov chain is said to be irreducible if for each pair of states i and j there is a positive probability that starting in state i the chain will eventually move to state j.
Both the two random walks and the Ehrenfest chain are irreducible.

Example : (Casino) Consider the following chain: you go to the Casino with $10. You play Blackjack, always betting $5. Let Xn be your "wealth" after the nth game. Then Xn ∈ {0,5,10,15,..} and P(Xn+k = j | Xk = 0) = 0 for all j > 0 and all n ≥ 1. ("0" is called a coffin or absorbing state.) So this chain is not irreducible.

2) A Markov chain is said to be aperiodic if for some n ≥ 0 and some state j we have
P(Xn = j | X1 = j) > 0 and
P(Xn+1 = j | X1 = j) > 0.
In other words, there should be a chance to return to state j in n steps as well as in n+1 steps.
Random Walk I and the Ehrenfest chain are not aperiodic because it is only possible to return to the same state in an even number of steps, never in an odd number. Random Walk II is aperiodic (provided 1-p-q > 0, so the chain can stay put).

3) A state x of a Markov chain is said to be recurrent if P(the chain returns to x infinitely often) = 1. A Markov chain is called recurrent if all its states are recurrent. A state that is not recurrent is called transient.
A recurrent state i is said to be positive recurrent if starting at i the expected time until the return to i is finite, otherwise it is called null recurrent.
It can be shown that in a finite-state chain all recurrent states are positive recurrent.

Example : The Ehrenfest chain is clearly recurrent. In the Casino Example "0" is a recurrent state, the others are not. Are the random walks recurrent? Good question! It seems clear that the asymmetric random walk is not (if p ≠ q), because eventually one expects the walk to run off to +∞ (or -∞). How about Random Walk I? Actually, let's consider a more general problem.

Example : (Random Walk III) Let the state space be the lattice of integers in Rd, that is, Xn = (i1, .., id) with each ij an integer. The chain moves from a point on the lattice to any of the 2d points that differ by one in exactly one coordinate, each with probability 1/(2d).
One of the great results in probability states:
The simple random walk is recurrent if d ≤ 2 and transient otherwise, or, as Kakutani once said, "A drunk man will find his way home but a drunk bird may get lost".
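A simulation cannot prove recurrence, but it can illustrate the contrast: the fraction of walks that revisit the origin within a fixed horizon drops noticeably with the dimension. A Python sketch (function name and parameters are ours):

```python
import numpy as np

rng = np.random.default_rng(4)

def return_frequency(d, steps=100, trials=1000):
    """Fraction of d-dimensional simple random walks that revisit the
    origin within `steps` steps (a crude, finite-horizon stand-in
    for recurrence)."""
    hits = 0
    for _ in range(trials):
        pos = np.zeros(d, dtype=int)
        for _ in range(steps):
            axis = rng.integers(d)             # pick a coordinate
            pos[axis] += rng.choice([-1, 1])   # move +-1 along it
            if not pos.any():                  # back at the origin
                hits += 1
                break
    return hits / trials

for d in (1, 2, 3):
    print(d, return_frequency(d))
```

The estimates for d=1 and d=2 are much larger than for d=3, in line with the theorem.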

4) Positive recurrent aperiodic states are called ergodic.

Stationary Distribution

Consider again the Ehrenfest chain and compute P(n) for larger and larger n. You will notice that P(n) seems to converge to a limit. We will now study this limit.

Let S be the state space of a Markov chain X with transition matrix P. Let π be a "measure" on S. Then π is called a stationary measure of X if π^T P = π^T.
We won't discuss exactly what it means for π to be a "measure". You can think of it in the same way as a probability distribution, except that we need not have ∑πi = 1.
Note: π^T P = π^T iff (π^T P)^T = π iff P^T π = π iff (P^T - I)π = 0, so again the system of equations is singular.

Example : (Ehrenfest chain) To find a stationary measure we have to solve the system of equations
πj = ∑i πi pij, j = 0,1,..,r.
Often we can get a unique solution by requiring that π be a proper probability distribution, that is, that ∑πi = 1.
Here this means the system
π0 = (1/3)π1
π1 = π0 + (2/3)π2
π2 = (2/3)π1 + π3
π3 = (1/3)π2
π0 + π1 + π2 + π3 = 1

which has the solution π = (1/8,3/8,3/8,1/8)
In the general case we find the stationary measure πk = (r choose k)·(1/2)^r, k = 0,1,..,r, that is, the Binomial(r, 1/2) distribution.
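Numerically, a stationary distribution can be found by solving the singular system together with the constraint ∑πi = 1; a Python sketch for r=3:

```python
import numpy as np

P = np.array([[0,   1,   0,   0  ],
              [1/3, 0,   2/3, 0  ],
              [0,   2/3, 0,   1/3],
              [0,   0,   1,   0  ]])

# stack (P^T - I) pi = 0 with the extra equation sum(pi) = 1
A = np.vstack([P.T - np.eye(4), np.ones(4)])
b = np.array([0.0, 0, 0, 0, 1])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)                     # close to [0.125 0.375 0.375 0.125]
```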

The interpretation is the following: say we choose the initial state X0 according to π, that is P(X0 = i) = πi. Then πi is the long-run proportion of time the chain spends in state i, that is
πi = lim N→∞ (1/N) ∑n=1..N I[Xn = i].

Example : For the Ehrenfest chain this is illustrated in ehrenfest with which=2.
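A Python analog of that illustration, simply counting visits along one long run of the chain (a sketch, not the course routine):

```python
import numpy as np

rng = np.random.default_rng(2)
P = np.array([[0,   1,   0,   0  ],
              [1/3, 0,   2/3, 0  ],
              [0,   2/3, 0,   1/3],
              [0,   0,   1,   0  ]])

N = 100_000
x = 0
counts = np.zeros(4)
for _ in range(N):
    x = rng.choice(4, p=P[x])     # one step of the chain
    counts[x] += 1

print(counts / N)                 # close to (1/8, 3/8, 3/8, 1/8)
```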

Example : (Random Walk) Let S be the integers and define a Markov chain by pi,i+1 = p and pi,i-1 = q = 1-p. A stationary measure is given by πi = 1 for all i, because (π^T P)i = πi-1·p + πi+1·q = p + q = 1.
Now assume p ≠ q and define πi = (p/q)^i. Then
(π^T P)i = πi-1·p + πi+1·q = (p/q)^(i-1)·p + (p/q)^(i+1)·q = (p/q)^i·q + (p/q)^i·p = (p/q)^i = πi.
Note that this shows that stationary measures are not unique.

One use of the stationary distribution is an extension of the WLLN to Markov chains. That is, if h is a function on the state space, then
(1/N) ∑n=1..N h(Xn) → E[h(Z)] = ∑i h(i)πi
where Z is a r.v. with pmf π.
This is illustrated in ehrenfest with which=3 for h(x)=x, h(x)=x^2 and h(x)=log(x+1).
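A Python sketch of the same idea for h(x) = x^2 on the Ehrenfest chain with r=3:

```python
import numpy as np

rng = np.random.default_rng(3)
P = np.array([[0,   1,   0,   0  ],
              [1/3, 0,   2/3, 0  ],
              [0,   2/3, 0,   1/3],
              [0,   0,   1,   0  ]])
pi = np.array([1/8, 3/8, 3/8, 1/8])

def h(x):
    return x ** 2

N = 100_000
x, total = 0, 0.0
for _ in range(N):
    x = rng.choice(4, p=P[x])   # one step of the chain
    total += h(x)

print(total / N)                      # time average along the chain
print(np.sum(h(np.arange(4)) * pi))   # E[h(Z)] = (0 + 3 + 12 + 9)/8 = 3.0
```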

One of the main results for Markov chains is the following:
If the Markov chain {Xn} is irreducible and ergodic, then
p(n)ij → πj as n → ∞, for all states i and j.
Example : This result does not apply to the Ehrenfest chain, which is not aperiodic: p(n)ij is 0 whenever n and j-i have different parity. The time averages above, however, do converge to π, as we have seen.

Here is another property of Markov chains: a Markov chain is said to be time-reversible if πi pij = πj pji for all i ≠ j. It can be shown that if a time-reversible Markov chain is started from π and run backwards in time, it again has transition matrix P.

Example : The Ehrenfest chain is time-reversible. We will show this for the case i=k, j=k+1:
πk pk,k+1 = (r choose k)·(1/2)^r·(r-k)/r and πk+1 pk+1,k = (r choose k+1)·(1/2)^r·(k+1)/r
and these agree because (r choose k)·(r-k) = r!/(k!(r-k-1)!) = (r choose k+1)·(k+1).
The Gambler's Ruin Problem

Suppose you go to the casino and repeatedly play a game where you win and double your "bet" with probability p and lose it with probability q = 1-p. For example, if you play roulette and always place your bet on "red", we have p = 18/38.
Suppose you go in with the following plan: you have $i to start, you always bet $1 in each round, and you stay until you have either lost all your money or reached $N. What is the probability of reaching $N before going broke?

If we let Xn denote your "fortune" after n rounds, {Xn} is a Markov chain on {0,1,..,N} with transition probabilities
pi,i+1 = p and pi,i-1 = q for i in {1,..,N-1}, and p0,0 = pN,N = 1 (0 and N are absorbing).
Also X0 = i.

Let Pi denote the probability that, starting at i, the fortune will eventually reach N. Conditioning on the outcome of the first round we get
Pi = p·Pi+1 + q·Pi-1, i = 1,..,N-1, with P0 = 0.
Since p + q = 1 this can be rewritten as Pi+1 - Pi = (q/p)(Pi - Pi-1), so iterating gives Pi+1 - Pi = (q/p)^i·P1, and summing these differences yields
Pi = P1·(1 - (q/p)^i)/(1 - q/p) if p ≠ 1/2, and Pi = i·P1 if p = 1/2.
Note that PN = 1 and that the formula above also holds for i = N, so we have
Pi = (1 - (q/p)^i)/(1 - (q/p)^N) if p ≠ 1/2
Pi = i/N if p = 1/2

The main "trick" in this calculation was to condition on the "right" event (here X1). This is often the case when doing math with Markov chains.
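The closed form is easy to code; a Python sketch (the function name is ours):

```python
def ruin_prob(i, N, p):
    """Probability of reaching N before 0, starting from i,
    winning each $1 round with probability p (gambler's ruin formula)."""
    q = 1 - p
    if p == 0.5:
        return i / N
    return (1 - (q / p) ** i) / (1 - (q / p) ** N)

# fair game: the chance is proportional to the starting fortune
print(ruin_prob(10, 20, 0.5))        # -> 0.5
# roulette (p = 18/38): noticeably worse than fair
print(ruin_prob(10, 20, 18 / 38))
```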

Say in our example of playing roulette you start with $100. In gambler with which=1 we plot the probability of reaching N before going broke.

Is the probability of reaching $110 starting with $100 the same as reaching $220 starting with $200? The answer is no; use gambler with which=2 to verify.

Instead of playing roulette you play Blackjack. You are really good at it, so now p = 0.5 (almost). How do the answers to the questions above change?

When I go to the Casino I don't expect to win money. I go for the entertainment of gambling. For me a successful evening is one where I don't lose my money too fast, that is, I want to maximize the time I spend in the Casino. So how much time can I expect to spend in the Casino if I start with $100 and leave when I have either won $100 or gone broke? In gambler with which=3 and p=0.5 we run a simulation to estimate the mean time until I have to leave. Note that each game of Blackjack costs $5, so we use i=20 and N=40.
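A Python sketch of such a simulation (not the course routine; for a fair game the known expected duration is i·(N-i)):

```python
import numpy as np

rng = np.random.default_rng(5)

def mean_duration(i, N, p, trials=1000):
    """Estimated average number of rounds until the fortune hits 0 or N."""
    total = 0
    for _ in range(trials):
        x, t = i, 0
        while 0 < x < N:
            x += 1 if rng.random() < p else -1
            t += 1
        total += t
    return total / trials

# $100 in $5 bets, leave at $200 or broke; for p = 1/2 the
# theoretical expected duration is i*(N - i) = 20*20 = 400 rounds
print(mean_duration(20, 40, 0.5))
```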

Continuous-time Markov Chains

Say {X(t), t ≥ 0} is a continuous-time stochastic process taking values in the nonnegative integers. Then X(t) is a Markov chain if for all s,t ≥ 0, all nonnegative integers i, j and all x(u), 0 ≤ u < s, we have
P(X(t+s) = j | X(s) = i, X(u) = x(u) for 0 ≤ u < s) = P(X(t+s) = j | X(s) = i).

Say the process enters state i at time 0. Let Ti be the time the process stays in state i. Then P(Ti > s+t | Ti > s) = P(Ti > t) by the Markov property. But this means that Ti is memoryless! Since Ti is also non-negative and continuous, Ti must have an exponential distribution.
With this we have the following characterization of a continuous-time Markov chain:
1) the amount of time spent in state i is exponentially distributed with rate vi (so the mean is 1/vi).
2) when the process leaves state i it next enters state j with some probability, say Pij.

So a continuous-time Markov chain is a process that moves from state to state according to a discrete-time Markov chain, but spends an exponentially distributed amount of time in each state.

Example : Birth and Death Processes:
Consider a system whose state at any time is the number of "people" in the system. Suppose that when there are n people in the system
(i) new arrivals enter the system at an exponential rate λn ("births"),
(ii) people leave the system at an exponential rate μn ("deaths"), and
(iii) births and deaths occur independently of each other.

Thus a birth and death process is a Markov chain with state-space {0,1,..} and
1) v0 = λ0
2) vi = λi + μi
3) P01 = 1
4) Pi,i+1 = λi/(λi + μi)
5) Pi,i-1 = μi/(λi + μi)

where 4) is because we go from i to i+1 if a birth occurs before a death. Let X ~ Exp(λ), Y ~ Exp(μ), with X and Y independent. Now
P(X < Y) = ∫0∞ P(X < Y | X = x) λe^(-λx) dx = ∫0∞ e^(-μx) λe^(-λx) dx = λ/(λ+μ)
and with λ = λi, μ = μi this gives 4); 5) follows in the same way.
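The identity P(X < Y) = λ/(λ+μ) is easy to check by simulation; a Python sketch with arbitrary rates:

```python
import numpy as np

rng = np.random.default_rng(6)
lam, mu = 2.0, 3.0              # birth rate and death rate (arbitrary values)

# X ~ Exp(lam): time to the next birth, Y ~ Exp(mu): time to the next death
# (numpy's exponential takes the scale, i.e. the mean 1/rate)
x = rng.exponential(1 / lam, size=100_000)
y = rng.exponential(1 / mu, size=100_000)

# fraction of times the birth comes first
print(np.mean(x < y))           # close to lam/(lam+mu) = 0.4
```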