Markov Chains and Markov Processes

Discrete-Time Markov Chains

The sequence of r.v.'s \(X_1, X_2,\dots\) is said to be a Markov chain if for any event A we have

\[P(X_n \in A | X_1 = x_1,\dots, X_{n-1}=x_{n-1}) = P(X_n \in A | X_{n-1} = x_{n-1})\]

that is, \(X_n\) depends only on \(X_{n-1}\), not on any of the r.v.'s before it.

Example (Random Walk)

Say we flip a fair coin repeatedly. Let the random variable \(Y_i\) be 1 if the i-th flip is heads and -1 otherwise.

Now let

\[X_n = \sum_{i=1}^n Y_i\]

Clearly we have

\[P(X_n = x_{n-1}\pm1 | X_1=x_1,\dots,X_{n-1}=x_{n-1}) = P(Y_n = \pm1) = \frac12\]

so the conditional distribution of \(X_n\) depends only on \(x_{n-1}\), and this is a Markov chain.

If we think of the index n as a time variable, then all that matters for the state of the system at time n is where it was at time n-1, not how it got to that state.
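
This chain is easy to simulate in R; here is a minimal sketch (the seed, the number of steps and the plotting details are arbitrary choices):

```r
# simulate the symmetric random walk: Y_i = +-1 with probability 1/2 each,
# X_n = Y_1 + ... + Y_n
set.seed(111)
n <- 1000
Y <- sample(c(-1, 1), size = n, replace = TRUE)
X <- cumsum(Y)
plot(1:n, X, type = "l", xlab = "n", ylab = expression(X[n]))
```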

For a Markov chain (that is, a stochastic process with a discrete state space) all the relevant (probability) information is contained in the probabilities to get from state i to state j in k steps. For k=1 these are collected in the transition matrix \(P = (p_{ij})\), and in fact, as we shall see, P is all we need.

Example (Random Walk, cont)

Here we have \(p_{ij} = 1/2\) if \(|i-j|=1\), and \(p_{ij}=0\) otherwise.

Example (Asymmetric Random Walk)

As above the state space is the integers, but now we go from i to i+1 with probability p, to i-1 with probability q, and stay at i with probability 1-p-q.

Example (Ehrenfest chain)

Say we have two boxes, box 1 with k balls and box 2 with r-k balls. We pick one of the balls at random and move it to the other box. Let \(X_n\) be the number of balls in box 1 at time n. We have
\(X_n \in \{0,1,\dots,r\}\).

[Illustration: the Ehrenfest chain with r=7]

Now

\(p_{k,k-1} = k/r\) (pick a ball from box 1, which has k of the r balls)
\(p_{k,k+1} = (r-k)/r\) (pick a ball from box 2, which has r-k of the r balls)
and
\(p_{i,j} = 0\) otherwise.

Note:

\(p_{0,1} = (r-0)/r = 1\)
\(p_{r,r-1} = r/r = 1\)

Ehrenfest used this model to study the exchange of air molecules in two chambers connected by a small hole.
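
Here is a minimal R sketch of this chain (the function name and the defaults are just for illustration):

```r
# Ehrenfest chain: at each step a ball is picked at random; it is in
# box 1 with probability X[i-1]/r, in which case X decreases by 1,
# otherwise it increases by 1
ehrenfest <- function(n, r = 7, start = r) {
  X <- numeric(n)
  X[1] <- start
  for (i in 2:n) {
    if (runif(1) < X[i - 1] / r) X[i] <- X[i - 1] - 1
    else X[i] <- X[i - 1] + 1
  }
  X
}
set.seed(111)
plot(ehrenfest(200), type = "l", xlab = "n", ylab = "balls in box 1")
```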

Say we have a Markov chain \(X_n\), n=1,2,…, with transition matrix P. Define the n-step transition matrix \(P^{(n)} = (p_{ij}^{(n)})\) by \(p_{ij}^{(n)} = P(X_{n+1}=j|X_1=i)\). Of course \(P^{(1)} = P\). Now it turns out (by the Chapman-Kolmogorov equation below) that \(P^{(n)} = P^n\), the n-th power of the matrix P.

Example (Ehrenfest chain)

Let’s find the 2-step transition matrix for the Ehrenfest chain with r=3. The transition matrix is given by

\[ P= \begin{bmatrix} 0 & 1 & 0 & 0\\ 1/3 & 0 & 2/3 & 0\\ 0 & 2/3 & 0 & 1/3\\ 0 & 0 & 1 & 0\\ \end{bmatrix} \]

and so the 2-step transition matrix is

\[ P^{(2)}=P^2= \begin{bmatrix} 1/3 & 0 & 2/3 & 0\\ 0 & 7/9 & 0 & 2/9\\ 2/9 & 0 & 7/9 & 0\\ 0 & 2/3 & 0 & 1/3\\ \end{bmatrix} \]

For example, \(P(X_3=2|X_1=2) = 7/9\) because if \(X_1 = 2\) the chain could have gone to 1 (with probability 2/3) and then back to 2 (with probability 2/3), or it could have gone to 3 (with probability 1/3) and then back to 2 (with probability 1), so we have

\[P(X_3=2|X_1=2) = \frac23\cdot\frac23 + \frac13\cdot1 = \frac79\]
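
We can verify this with one matrix multiplication in R (the matrix entries are the ones from this example):

```r
# transition matrix of the Ehrenfest chain with r=3;
# row/column k corresponds to state k-1
P <- rbind(c(  0,   1,   0,   0),
           c(1/3,   0, 2/3,   0),
           c(  0, 2/3,   0, 1/3),
           c(  0,   0,   1,   0))
P2 <- P %*% P
P2[3, 3]   # P(X3=2 | X1=2) = 7/9
```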

Proposition
let \(P=(p_{ij})\) be a transition matrix. Then \(\sum_j p_{ij}=1\) for all i.

proof

\[\sum_j p_{ij} = \sum_j P(X_2=j|X_1=i) = P(X_2\in S|X_1=i) = 1\]

because from state i the chain has to jump to some state in S.

In order to find \(P^{(n)}\) we could just compute \(PP\cdots P\) (n times). With a little linear algebra this becomes easier:

For many matrices P there exist an invertible matrix U and a diagonal matrix D such that \(P=UDU^{-1}\). Here is how to find U and D:
First we need to find the eigenvalues of the matrix P, that is, the solutions of the equation \(Px=\lambda x\). This is equivalent to \((P-\lambda I)x=0\), which has a nontrivial solution iff \(\det(P-\lambda I)=0\). For the Ehrenfest chain with r=3 this yields the eigenvalues \(\lambda = 1, 1/3, -1/3, -1\).

The matrix D now is the matrix with the eigenvalues on the diagonal. The columns of the matrix U are the corresponding eigenvectors (with Euclidean length 1), so for example for \(\lambda=-1\) we need to solve \((P+I)x=0\), that is

\[ \begin{aligned} x_1+x_2&=0\\ \tfrac13x_1+x_2+\tfrac23x_3&=0\\ \tfrac23x_2+x_3+\tfrac13x_4&=0\\ x_3+x_4&=0\\ \end{aligned} \]

Of course \(\det(P+I)=0\), so this system does not have a unique solution. Setting \(x_1=1\) we can then easily find a solution \(x=(1,-1,1,-1)\).

This vector has Euclidean length \(\sqrt{1^2+(-1)^2+1^2+(-1)^2} = 2\), so the normalized eigenvector is \(x=(1/2,-1/2,1/2,-1/2)\).

Similarly we can find eigenvectors for the other eigenvalues. Alternatively (and a lot easier!) we can use the R function eigen to do the calculations for us.

Why does this help in computing \(P^{(n)}\)? It turns out that we have

\[P^{(2)} = PP = UDU^{-1}UDU^{-1} = UDDU^{-1} = UD^2U^{-1}\]

and \(D^2\) is again diagonal, with entries \(\lambda_i^2\). In general we have \(P^{(n)} = UD^nU^{-1}\), where \(D^n\) is simply the diagonal matrix with entries \(\lambda_i^n\).
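
In R this might look as follows, using the eigen function mentioned above and the Ehrenfest matrix P from the chunk above:

```r
# P^(n) = U D^n U^{-1}; D^n only requires raising the
# eigenvalues to the n-th power
E <- eigen(P)          # P as defined in the previous chunk
U <- E$vectors         # columns are the eigenvectors
Pn <- function(n) U %*% diag(E$values^n) %*% solve(U)
round(Pn(2), 4)        # agrees with P %*% P from above
```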


Notice that this result allows us to define functions whose arguments are matrices: Say A is some \(d\times d\) matrix with \(A=UDU^{-1}\). Let p be a polynomial of the form

\[p(x)=\sum_{i=1}^n a_ix^i\]

Then we define \(p(A)\) to be

\[ \begin{aligned} &p(A) = \\ &\sum_{i=1}^n a_iA^i = \\ &\sum_{i=1}^n a_i(UDU^{-1})^i = \\ &U\left(\sum_{i=1}^n a_iD^i\right)U^{-1} \\ \end{aligned} \]

and finally for any function f that has a Taylor series expansion \(f(x)=\sum_{i=1}^\infty a_ix^i\) we can define

\[f(A)=U\left(\sum_{i=1}^\infty a_iD^i\right)U^{-1}\]

Example

Say

\[ A = \begin{bmatrix} 1 & 2\\ 2 & 1 \\ \end{bmatrix} \]

what would be \(f(A)=\exp(A)\)?

\[ \begin{aligned} &\det(A-\lambda I) = (1-\lambda)^2-4=0\\ &\lambda =1\pm2 \\ \end{aligned} \] so

\[ D= \begin{bmatrix} -1 & 0\\ 0 & 3\\ \end{bmatrix} \]

\[ (A-\lambda_1)x = \begin{bmatrix} 2 & 2\\ 2 & 2 \\ \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \\ \end{bmatrix}= \begin{bmatrix} 0 \\ 0 \\ \end{bmatrix} \]

so \(2x_1+2x_2=0\), or \(x_1=-x_2\). Say \(x_1=1\), then \(x=(1,-1)\), which has length \(\sqrt{1^2+(-1)^2}=\sqrt 2\), so the normalized eigenvector is \((1/\sqrt2,-1/\sqrt2)\).

\[ (A-\lambda_2)x = \begin{bmatrix} -2 & 2\\ 2 & -2 \\ \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \\ \end{bmatrix}= \begin{bmatrix} 0 \\ 0 \\ \end{bmatrix} \]

so \(-2x_1+2x_2=0\), or \(x_1=x_2\). Say \(x_1=1\), then \(x=(1,1)\), which has length \(\sqrt{1^2+1^2}=\sqrt 2\), and we have

\[ U = \frac1{\sqrt 2} \begin{bmatrix} 1 & 1\\ -1 & 1 \\ \end{bmatrix} \] (the first column belongs to \(\lambda_1=-1\), the second to \(\lambda_2=3\), matching the order in D). Also

\[ U^{-1} = \frac1{\sqrt 2} \begin{bmatrix} 1 & -1\\ 1 & 1 \\ \end{bmatrix} \]

Now we have \(f(x)=\exp(x)=\sum_{n=0}^\infty x^n/n!\), and so

\[ \exp(A) = U\left(\sum_{n=0}^\infty D^n/n! \right)U^{-1}\] \[ D^n/n!= \begin{bmatrix} -1 & 0\\ 0 & 3\\ \end{bmatrix}^n/n!= \begin{bmatrix} (-1)^{n}/n! & 0\\ 0 & 3^{n}/n! \\ \end{bmatrix} \]

\[ \sum_{n=0}^\infty D^n/n!= \begin{bmatrix} \sum_{n=0}^\infty (-1)^n/n! & 0\\ 0 & \sum_{n=0}^\infty 3^n/n! \\ \end{bmatrix}=\\ \begin{bmatrix} e^{-1} & 0\\ 0 & e^3 \\ \end{bmatrix} \]

and finally

\[ \exp(A)= U \begin{bmatrix} e^{-1} & 0\\ 0 & e^3 \\ \end{bmatrix} U^{-1}= \]

\[ \frac1{\sqrt 2} \begin{bmatrix} 1 & 1\\ -1 & 1 \\ \end{bmatrix} \begin{bmatrix} e^{-1} & 0\\ 0 & e^3 \\ \end{bmatrix} \frac1{\sqrt 2} \begin{bmatrix} 1 & -1\\ 1 & 1 \\ \end{bmatrix} = \] \[ \frac1{2} \begin{bmatrix} 1 & 1\\ -1 & 1 \\ \end{bmatrix} \begin{bmatrix} e^{-1} & -e^{-1}\\ e^3 & e^3 \\ \end{bmatrix} \]

\[ \exp(A)= \frac1{2} \begin{bmatrix} e^{-1}+e^3 & e^3-e^{-1}\\ e^3-e^{-1} & e^{-1}+e^3 \\ \end{bmatrix} \]
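
A quick numerical check in R; because A is symmetric, eigen returns orthonormal eigenvectors, so \(U^{-1}=U^T\):

```r
A <- matrix(c(1, 2, 2, 1), 2, 2)
E <- eigen(A)
expA <- E$vectors %*% diag(exp(E$values)) %*% t(E$vectors)
expA
(exp(-1) + exp(3)) / 2   # the diagonal entries
(exp(3) - exp(-1)) / 2   # the off-diagonal entries
```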

Proposition:

λ=1 is always an eigenvalue of a transition matrix, with eigenvector \((1/\sqrt d,\dots,1/\sqrt d)^T\), where d is the dimension of P. This is just the proposition above: the rows of P sum to 1, so \(P(1,\dots,1)^T=(1,\dots,1)^T\).

Theorem

An important consequence of the Markov property is the fact that, given the present, the past and the future are independent. This is formalized in the Chapman-Kolmogorov equation:

\[p_{ij}^{(n+m)} = \sum_k p_{ik}^{(n)}p_{kj}^{(m)}\]

proof

follows directly from the law of total probability, conditioning on the state of the chain after the first n steps
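
In matrix form the equation says \(P^{(n+m)}=P^{(n)}P^{(m)}\); here is a quick numerical check with the Ehrenfest matrix P from above:

```r
# matrix power by repeated multiplication
matpow <- function(P, n) Reduce(`%*%`, replicate(n, P, simplify = FALSE))
all.equal(matpow(P, 5), matpow(P, 2) %*% matpow(P, 3))   # TRUE
```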


Classification of States and Chains

Definition

State j is said to be accessible from state i if there exists n≥0 such that \(p_{ij}^{(n)}>0\), and we write i→j. If i is accessible from j and j is accessible from i, then i and j are said to communicate, and we write i↔j.

Proposition

Communication is an equivalence relation. That is

  1. i↔i

  2. if i↔j then j↔i

  3. if i↔j and j ↔k then i↔k

proof

  1. and 2. follow from the definition of communication. For 3. suppose i↔j and j↔k, so there exist n and m such that

\(p_{ij}^{(n)}>0\) and \(p_{jk}^{(m)}>0\)

But then

\[p_{ik}^{(n+m)} = \sum_l p_{il}^{(n)}p_{lk}^{(m)} \ge p_{ij}^{(n)}p_{jk}^{(m)} > 0\]

Definition

  1. Two states that communicate are said to be in the same class. A Markov chain is said to be irreducible if it consists of only one class

  2. A state i is said to have period d if \(p_{ii}^{(n)}=0\) whenever n is not divisible by d, and d is the largest integer with this property. A state with period 1 is said to be aperiodic. Let d(i) denote the period of state i.

Example

The two random walks and the Ehrenfest chain are all irreducible. The simple random walk and the Ehrenfest chain have period 2. The asymmetric random walk is aperiodic as long as 1-p-q>0, because then \(p_{ii}>0\).

Example (Casino)

Consider the following chain: you go to the Casino with $10. You play Blackjack, always betting $5. Let \(X_n\) be your “wealth” after the nth game. Then \(X_n \in \{0,5,10,15,\dots\}\) and \(P(X_{n+k} = j|X_k=0) = 0\) for all \(j\ne0\) and all \(n \ge 1\). (“0” is called a coffin, or absorbing, state.) So this chain is not irreducible.

Proposition

if i↔j then d(i)=d(j)

proof

Let n and m be such that \(p_{ji}^{(n)}p_{ij}^{(m)} > 0\) and suppose that \(p_{ii}^{(s)}> 0\), so d(i) divides s. Then

\[p_{jj}^{(n+m)} \ge p_{ji}^{(n)}p_{ij}^{(m)} > 0\]

\[p_{jj}^{(n+s+m)} \ge p_{ji}^{(n)}p_{ii}^{(s)}p_{ij}^{(m)} > 0\]

therefore d(j) divides both n+m and n+m+s, and so it also divides n+m+s-(n+m) = s. So d(j) divides every such s, and hence it divides their greatest common divisor, which is d(i). The argument is symmetric in i and j, so d(i)=d(j).


Definition

  1. For any states i and j let \(f_{ij}^n\) be the probability that, starting at i, the chain visits j for the first time after n steps. Formally

\(f_{ij}^0 = 0\)

\(f_{ij}^n = P(X_n =j, X_k\ne j \text{ for } k=1,\dots,n-1|X_0=i)\)

  2. Let

\(f_{ij} = \sum_n f_{ij}^n\)

be the probability that the chain starting at i ever visits j

Definition

  1. A state i of a Markov chain is said to be recurrent if \(f_{ii} =1\). A state that is not recurrent is called transient.

  2. A recurrent state i is said to be positive recurrent if starting at i the expected time until the return to i is finite, otherwise it is called null recurrent.

  3. Positive recurrent aperiodic states are called ergodic.

Example

The Ehrenfest chain is clearly recurrent. In the Casino example “0” is a recurrent state, the others are not.

Are the random walks recurrent? Good question! We will answer it using the following theorem:

Theorem

A state i is recurrent if and only if

\[\sum_{n=1}^\infty p_{ii}^{(n)} = \infty\]

proof

Because of the Markov property, once the chain returns to i everything starts all over again, and so the probability that the chain returns to i at least twice is \((f_{ii})^2\), at least three times \((f_{ii})^3\), and so on.

Let M be the total number of returns to state i; then we have just shown that

\[P(M\ge m|X_0=i) = (f_{ii})^m\]

so that

\[E[M|X_0=i] = \sum_{m=1}^\infty P(M\ge m|X_0=i) = \frac{f_{ii}}{1-f_{ii}}\]

Note that we have

state i is transient iff \(f_{ii}<1\) iff \(E[M|X_0=i]<\infty\)

We can write M as

\[M = \sum_{n=1}^\infty I[X_n=i]\quad\text{and so}\quad E[M|X_0=i] = \sum_{n=1}^\infty P(X_n=i|X_0=i)=\sum_{n=1}^\infty p_{ii}^{(n)}\]

and so we are done.

The next corollary shows that, like periodicity, recurrence is a class property:

Corollary

if state i is recurrent and i↔j, then j is recurrent

proof

say n and m are such that

\(p_{ij}^{(n)}>0\) and \(p_{ji}^{(m)}>0\)

now for any s≥0

\(p_{jj}^{(m+s+n)} \ge p_{ji}^{(m)} p_{ii}^{(s)} p_{ij}^{(n)}\)

and so

\[\sum_s p_{jj}^{(m+s+n)} \ge \sum_s p_{ji}^{(m)} p_{ii}^{(s)} p_{ij}^{(n)} = p_{ji}^{(m)}p_{ij}^{(n)}\sum_s p_{ii}^{(s)} = \infty\]

Corollary

if i↔j and state j is recurrent, then \(f_{ij} = 1\)

proof

Suppose \(X_0=i\), and let n be such that \(p_{ij}^{(n)}>0\). Now \(p_{ij}^{(n)}\) may be less than 1, so it is possible that \(X_n\ne j\). If that happens, let's say we missed opportunity 1. If so, let \(T_1\) be the next time the chain visits i. Note that

\(P(T_1<\infty)=1\)

by the corollary above (i is recurrent because j is and i↔j). Say we missed opportunity 2 if \(X_{T_1+n}\ne j\), and let \(T_2\) be the time of the first visit to i after \(T_1+n\), and so on.

Because at each visit to i the chain starts all over again, the number of the first successful opportunity has a geometric distribution with success probability \(p_{ij}^{(n)}\), and so its expectation is \(1/p_{ij}^{(n)} < \infty\). In particular, sooner or later there will be a success, that is a visit to j, and so \(f_{ij}=1\)


Let \(N_j(t)\) denote the number of transitions into j by time t. If j is recurrent and \(X_0=j\), then the process probabilistically starts all over at each return, and so \(\{N_j(t),t\ge0\}\) is a renewal process with interarrival time distribution \(\{f_{jj}^n, n\ge0\}\).

Example Random Walk

for simplicity we assume p+q=1, that is, the chain always jumps. Then a return to 0 is possible only after an even number of steps, and

\[p_{00}^{(2n)} = \binom{2n}{n}p^nq^n \approx \frac{(4pq)^n}{\sqrt{\pi n}}\]

using Stirling's formula. Now \(4pq\le1\), with equality iff p=1/2, and so \(\sum_n p_{00}^{(2n)}=\infty\) iff p=1/2.

So the state 0 is recurrent iff p=1/2. But we can get from 0 to any other state and back, so for p=1/2 all states are recurrent.
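
We can also see this numerically by accumulating the return probabilities in log space to avoid overflow (a sketch; the cutoff N is arbitrary):

```r
# partial sum of p_00^(2n) = choose(2n,n) (pq)^n for n = 1..N
return.sum <- function(p, N = 1e4) {
  n <- 1:N
  sum(exp(lchoose(2 * n, n) + n * log(p * (1 - p))))
}
return.sum(0.5)   # keeps growing as N increases: divergence
return.sum(0.6)   # stabilizes at a finite value: transience
```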

Let’s consider a more general problem:

Example (Random Walk III)

let the state space be the lattice of integers in \(\mathbb R^d\), that is, \(X_n =(i_1,\dots,i_d)\) with each \(i_j\) an integer. The chain goes from one point of the lattice to any of the 2d points that differ by one in exactly one coordinate, each with probability 1/(2d).

Example Random Walk in \(\mathbb R^2\):

let 0=(0,0). Since the chain is irreducible, if 0 is recurrent every state is recurrent. Again returns to 0 are only possible in 2n steps, of which, for some i, i have to go left and i right, n-i up and n-i down. Now the probabilities are multinomial, and we have

\[p_{00}^{(2n)} = \left(\frac14\right)^{2n}\sum_{i=0}^n \frac{(2n)!}{i!\,i!\,(n-i)!\,(n-i)!} = \left(\frac14\right)^{2n}\binom{2n}{n}\sum_{i=0}^n\binom ni^2=\left[\binom{2n}{n}\left(\frac12\right)^{2n}\right]^2\approx\frac1{\pi n}\]

so \(\sum_n p_{00}^{(2n)}=\infty\), and all states are recurrent.

Example Random Walk in \(\mathbb R^d\)

Notice that

\[p_{00}^{(2n)} \approx \left(\frac1{\sqrt{\pi n}}\right)^d = \frac{\pi^{-d/2}}{n^{d/2}}\]

This is in effect true because we can think of the random walk in \(\mathbb R^d\) as d (nearly) independent one-dimensional random walks, one in each of the “dimensions”. Therefore

\[\sum_n p_{00}^{(2n)} < \infty \quad\text{for } d\ge3\]

and so the random walk in three or more dimensions is transient.
The first proof of this was given by George Pólya in 1921, and the result was described very nicely by Kakutani, who said

A drunk man will find his way home but a drunk bird may get lost.

Example

Let’s return to 1 dimension and consider the following question: if \(P(X_1=1|X_0=0)=p=1/2\) we know that state 0 (like all others) is recurrent, so

P(chain returns to 0 | it starts at 0) = 1

But what about if 0<p<1/2? First of all

P(chain returns to 0 | it starts at 0) ≥ \(P(X_2=0,X_1=-1|X_0=0) = pq > 0\)

But what is P(chain returns to 0 | it starts at 0)? Well first of all

P(chain returns to 0) = P(chain returns to 0|X1=-1)(1-p) + P(chain returns to 0|X1=1)p

We know the chain is transient, so P(Xn=0 infinitely often)=0. Moreover, p<1/2, so \(X_n\to-\infty\) with probability 1. Since the chain moves in steps of size 1, starting from 1 it has to pass through 0 on its way down, therefore

P(chain returns to 0|X1=1)=1

Now let α = P(chain ever visits 0 | X_1=-1), the probability to “ever reach the state one to the right”. Conditioning on the next step we find

\[\alpha = p + q\alpha^2\]

because either the chain moves right immediately (probability p), or it moves left (probability q) and then, since α is the probability to “ever reach the state to the right”, it has to do that twice. The solutions of this quadratic equation are α=1 and α=p/q, and since the chain drifts to -∞ we must have α<1, so α=p/q.

Finally

\[P(\text{chain returns to } 0) = q\alpha + p\cdot1 = q\cdot\frac pq + p = 2p\]


In general we have

\[P(\text{chain returns to } i \mid \text{chain starts at } i) = 2\min(p,1-p)\]
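
Here is a Monte Carlo check of this formula (a sketch: each walk can only be run for finitely many steps, so the estimate is slightly too small):

```r
# estimate P(return to 0) by simulating many finite walks
p.return <- function(p, runs = 1e4, steps = 1e3) {
  hits <- 0
  for (i in 1:runs) {
    X <- cumsum(sample(c(1, -1), steps, replace = TRUE, prob = c(p, 1 - p)))
    if (any(X == 0)) hits <- hits + 1
  }
  hits / runs
}
p.return(0.4)   # should be close to 2*min(p, 1-p) = 0.8
```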

Example

let \(X_1, X_2,\dots\) be independent Pois(1). Let \(S_n= X_1+\dots+X_n\), so \(S_n\sim\) Pois(n). Therefore

\[P(S_n=n) = e^{-n}\frac{n^n}{n!}\]

On the other hand, \(S_n\) has mean n and variance n, and a local central limit argument gives \(P(S_n=n)\approx1/\sqrt{2\pi n}\). Equating the two expressions yields

\[n!\approx\sqrt{2\pi n}\left(\frac ne\right)^n\]

and so we have a proof of Stirling’s formula!
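
A quick numerical comparison in R:

```r
# n! next to Stirling's approximation sqrt(2*pi*n) (n/e)^n
n <- c(5, 10, 20)
cbind(factorial(n), sqrt(2 * pi * n) * (n / exp(1))^n)
```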

Example

Say we have a Markov chain with states {0,1,2} and transition matrix

Find P(X3=0)

A little linear algebra shows

Now

so to calculate P(X3=0) we need to specify the distribution of X0. Two common ways are to specify a specific state (say 0), or to choose one at random:
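
Here is the general recipe in R; the transition matrix of this example is not reproduced above, so the matrix Q below is purely hypothetical:

```r
# a hypothetical transition matrix on the states {0,1,2}
Q <- rbind(c(0.2, 0.5, 0.3),
           c(0.4, 0.4, 0.2),
           c(0.1, 0.6, 0.3))
Q3 <- Q %*% Q %*% Q
Q3[1, 1]                    # P(X3=0 | X0=0): start in state 0
sum(rep(1/3, 3) * Q3[, 1])  # P(X3=0) if X0 is chosen uniformly at random
```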

Soon we will see another common choice.

Stationary Distribution and Limit Theorems

Example

Consider again the Ehrenfest chain with r=3, and compute P(n):


You notice that \(P^{(n)}\) seems to converge to a limit, at least if we look at even and odd n separately (remember the chain has period 2). We will now study this limit.
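
One way to inspect \(P^{(n)}\) numerically, with the Ehrenfest matrix P from above:

```r
matpow <- function(P, n) Reduce(`%*%`, replicate(n, P, simplify = FALSE))
round(matpow(P, 20), 4)   # even number of steps
round(matpow(P, 21), 4)   # odd number of steps
```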

Let S be the state space of a Markov chain {Xn,n≥0} with transition matrix P. Let π be a “measure” on S. Then π is called a stationary measure of {Xn,n≥0} if

\[\pi^TP=\pi^T\]

We won’t discuss exactly what it means for π to be a “measure”. You can think of it in the same way as a probability distribution, except that we don’t require \(\sum\pi_i=1\).

Note:

\(\pi^TP=\pi^T\) iff \((P^T\pi)^T=\pi^T\) iff \(P^T\pi=\pi\) iff \((P^T-I)\pi=0\)

so again the system of equations is singular.
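
This suggests finding π numerically as the eigenvector of \(P^T\) for the eigenvalue λ=1, rescaled to sum to 1; a sketch for the Ehrenfest matrix P from above:

```r
E <- eigen(t(P))
v <- Re(E$vectors[, which.min(abs(E$values - 1))])
v / sum(v)   # should give (1/8, 3/8, 3/8, 1/8)
```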

Example

Say we have a Markov chain with states {0,1,2} and transition matrix

Find P(X3=0) if the chain starts in its stationary distribution.

but 0.3125 = 5/16, so we find

P(X3=0|X0 has distribution π) = \(\pi_0\)

And so we find that if the chain starts in its stationary distribution, it stays there! As it turns out, this is true in general.

Example (Ehrenfest chain)

To find a (?) stationary measure we have to solve the system of equations

\[\pi_j = \sum_i \pi_ip_{ij},\quad j=0,1,\dots,r\]

Often we can get a unique solution by requiring that π be a proper probability distribution, that is, that \(\sum\pi_i = 1\).
Here this means the system

\(\pi_0 = \frac13\pi_1\)
\(\pi_1 = \pi_0+\frac23\pi_2\)
\(\pi_2 = \frac23\pi_1 + \pi_3\)
\(\pi_3 = \frac13\pi_2\)
\(\pi_0 + \pi_1 + \pi_2 + \pi_3 = 1\)

which has the solution π = (1/8,3/8,3/8,1/8).
In the general case we find the stationary measure

\[\pi_j = \binom rj\left(\frac12\right)^r,\quad j=0,\dots,r\]

that is, a Binomial(r,1/2) distribution.

The interpretation is the following: say we choose the initial state \(X_0\) according to π, that is

\(P(X_0=i) = \pi_i\)

Then \(\pi_i\) is the long-run proportion of time the chain spends at state i, that is

\[\pi_i = \lim_{N\to\infty}\frac1N\sum_{n=1}^N I[X_n=i]\]
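
We can check this by simulation, reusing the ehrenfest function from above (the run length is arbitrary):

```r
set.seed(111)
x <- ehrenfest(1e5, r = 3)
table(x) / length(x)   # should be close to (1/8, 3/8, 3/8, 1/8)
```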

Example (Random Walk)

Let S be the integers and define a Markov chain by \(p_{i,i+1} = p\) and \(p_{i,i-1} = q = 1-p\). A stationary measure is given by \(\pi_i=1\) for all i, because \((\pi^TP)_i = 1\cdot p+1\cdot q = 1\).
Now assume p ≠ q and define \(\pi_i =(p/q)^i\). Then

\[(\pi^TP)_i = \pi_{i-1}p + \pi_{i+1}q = \left(\frac pq\right)^{i-1}p + \left(\frac pq\right)^{i+1}q = \left(\frac pq\right)^{i}(q+p) = \pi_i\]

Note that this shows that stationary measures are not unique. (Neither of these two measures can be normalized to have \(\sum\pi_i=1\); both sums diverge.)

One use of the stationary distribution is an extension of the WLLN to Markov chains. That is, say h is a function on the state space; then

\[\frac1N\sum_{n=1}^N h(X_n) \to E[h(Z)]=\sum_i h(i)\pi_i\]

where Z is a r.v. with density π.

One of the main results for Markov chains is the following:

If the Markov chain {Xn,n≥0} is irreducible and ergodic, then it has a unique stationary distribution π, and

\[\lim_{n\to\infty} p_{ij}^{(n)} = \pi_j \quad\text{for all } i, j\]

Example

Of course this result does not apply to the Ehrenfest chain, which is not aperiodic; but as we have seen, the chain nevertheless has a stationary distribution, and the long-run proportions of time spent in each state still converge to it.

Here is another property of Markov chains:

Definition:

A Markov chain is said to be time-reversible if

\[\pi_ip_{ij}=\pi_jp_{ji}\quad\text{for all } i\ne j\]

Theorem

Let the Markov chain {Yn,n≥0} be time-reversible and start it from its stationary distribution π. Fix a large time T and run the chain backwards, that is, let \(X_n=Y_{T-n}\). Then the transition matrix of the chain {Xn,n≥0} is the same as the transition matrix of {Yn,n≥0}.

In other words, for a time-reversible Markov chain, if the chain is started from π and run backwards in time it again has transition matrix P.

Example

The Ehrenfest chain is time-reversible. We will show this for the case i=k, j=k+1: with \(\pi_k=\binom rk(1/2)^r\) we have

\[\pi_kp_{k,k+1} = \binom rk\left(\frac12\right)^r\frac{r-k}r = \frac{(r-1)!}{k!(r-k-1)!}\left(\frac12\right)^r\]

\[\pi_{k+1}p_{k+1,k} = \binom r{k+1}\left(\frac12\right)^r\frac{k+1}r = \frac{(r-1)!}{k!(r-k-1)!}\left(\frac12\right)^r\]

so indeed \(\pi_kp_{k,k+1}=\pi_{k+1}p_{k+1,k}\).
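
A quick numerical check for r=3: detailed balance says the matrix with entries \(\pi_ip_{ij}\) is symmetric.

```r
pi.stat <- c(1, 3, 3, 1) / 8
B <- diag(pi.stat) %*% P   # entries pi_i * p_ij; P from above
all.equal(B, t(B))         # TRUE
```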