Examples for Markov Chains

Umbrellas

Say you own r umbrellas, which are kept either at home or in your office. In the morning, if it rains, you take an umbrella from home to the office, if there is one at home; likewise in the evening you take one from the office home, if there is one there. Say it rains in the morning or in the evening independently, each time with probability p, and let q = 1-p. Analyze this as a Markov chain and find the transition matrix.

We are going to present two different solutions, that is, two transition matrices that both describe this process:

Solution 1: Say \(X_n\) is the number of umbrellas at home in the morning of the nth day, so \(X_n\in\{0,1,..,r\}\). Now

\(P(X_n=i\mid X_{n-1}=i)\) = P(it rains both morning and evening, or it rains neither time) = \(p^2+q^2\), \(1\le i\le r-1\)

\(P(X_n=i-1\mid X_{n-1}=i)\) = P(it rains in the morning but not in the evening) = \(pq\), \(1\le i\le r\)

\(P(X_n=i+1\mid X_{n-1}=i)\) = P(it does not rain in the morning but rains in the evening) = \(qp\), \(1\le i\le r-1\)

\(P(X_n=0\mid X_{n-1}=0)\) = P(it is not raining in the evening) = \(q\)

\(P(X_n=1\mid X_{n-1}=0)\) = P(it is raining in the evening) = \(p\)

\(P(X_n=r\mid X_{n-1}=r)\) = P(it does not rain in the morning, or it rains both times) = \(q+p^2\)

so the transition matrix is

\[P=\begin{pmatrix} q & p & 0 & \cdots & 0 & 0\\ pq & p^2+q^2 & qp & \cdots & 0 & 0\\ \vdots & & \ddots & & & \vdots\\ 0 & 0 & \cdots & pq & p^2+q^2 & qp\\ 0 & 0 & \cdots & 0 & pq & q+p^2 \end{pmatrix}\]

Solution 2: Say \(X_n\) is the number of umbrellas at your present location (home or work), so \(X_n\in\{0,1,..,r\}\). Now

\(P(X_n=r\mid X_{n-1}=0)\) = P(no umbrellas where you were last) = 1

\(P(X_n=r-i\mid X_{n-1}=i)\) = P(it is not raining) = \(q\), \(1\le i\le r\)

\(P(X_n=r-i+1\mid X_{n-1}=i)\) = P(it is raining) = \(p\), \(1\le i\le r\)

Let’s find the n-step transition matrix in the case r=2. Using solution 2 we have

\[P=\begin{pmatrix}0 & 0 & 1\\ 0 & q & p\\ q & p & 0\end{pmatrix}\]
To compute \(P^n\) we would diagonalize P. The eigenvalues solve \(\det(P-\lambda I)=0\), and one of them is always \(\lambda_1=1\). For the eigenvectors we have to solve \((P-\lambda I)x=0\) for each eigenvalue, but for general p this does not yield simple closed-form expressions, and in this generality that’s about it.
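Even when the eigenvectors are not tractable symbolically, \(P^n\) is easy to obtain numerically. Here is a minimal Python sketch (the value p = 0.5 is an illustrative assumption, not from the text):

```python
import numpy as np

p = 0.5                 # illustrative rain probability (assumption)
q = 1 - p

# solution 2 transition matrix for r = 2 umbrellas
P = np.array([[0, 0, 1],
              [0, q, p],
              [q, p, 0]])

# n-step transition matrix P^n
print(np.round(np.linalg.matrix_power(P, 5), 4))
print(np.round(np.linalg.matrix_power(P, 50), 4))  # rows approach a common limit
```

For large n the rows agree, anticipating the stationary distribution computed below.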

Finally we find the stationary distributions. For solution 1 the system of equations \(\pi = \pi P\) is

\(\pi_0 = q\pi_0 + pq\,\pi_1\)

\(\pi_1 = p\pi_0 + (p^2+q^2)\pi_1 + pq\,\pi_2\)

\(\pi_i = qp\,\pi_{i-1} + (p^2+q^2)\pi_i + pq\,\pi_{i+1},\quad 2\le i\le r-1\)

\(\pi_r = qp\,\pi_{r-1} + (q+p^2)\pi_r\)

and checking equation by equation shows x = c(q,1,..,1) solves the system. Now q+1+..+1 = q+r, so the stationary distribution is \(\pi_0 = q/(q+r)\), \(\pi_i = 1/(q+r)\), \(i=1,..,r\)

For solution 2 we have

\(\pi_0 = q\pi_r\)

\(\pi_j = q\pi_{r-j} + p\,\pi_{r-j+1},\quad 1\le j\le r-1\)

\(\pi_r = \pi_0 + p\pi_1\)

and x = c(q,1,..,1) again solves the system, so we get the same stationary distribution as in solution 1.
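As a numerical sanity check, we can verify in Python that both transition matrices have this stationary distribution (r = 2 and p = 0.3 are illustrative assumptions):

```python
import numpy as np

r, p = 2, 0.3    # illustrative choices (assumptions)
q = 1 - p

# solution 1: number of umbrellas at home in the morning
P1 = np.array([[q,    p,           0],
               [p*q,  p**2 + q**2, q*p],
               [0,    p*q,         q + p**2]])

# solution 2: number of umbrellas at the current location
P2 = np.array([[0, 0, 1],
               [0, q, p],
               [q, p, 0]])

def stationary(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

print(stationary(P1))   # both should equal (q, 1, 1) / (q + r)
print(stationary(P2))
```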

So, what percentage of times do you get wet?

Clearly this is P(it rains and there is no umbrella) = \(p\pi_0 = pq/(q+r)\)

The Gambler’s Ruin Problem

Suppose you go to the casino and repeatedly play a game where you win and double your “bet” with probability p and lose with probability q=1-p. For example, if you play roulette and always place your bet on “red” we have p=18/38.
Suppose you go in with the following plan: you start with $i, you bet $1 in each round, and you stay until you have either lost all your money or reached $N. What is the probability of reaching $N before going broke?

If we let \(X_n\) denote your “fortune” after n rounds, \(\{X_n\}\) is a Markov chain on {0,1,..,N} with transition probabilities
\(p_{00}=p_{NN}=1\)
\(p_{i,i+1}=p\) and \(p_{i,i-1}=q\) for \(i\in\{1,..,N-1\}\)
Also, \(X_0=i\).

Let \(P_i\) denote the probability that, starting at i, the fortune will eventually reach N. Conditioning on the outcome of the first round we have:

\(P_i = pP_{i+1} + qP_{i-1},\quad 1\le i\le N-1,\qquad P_0=0\)

Since p+q=1 this can be rewritten as \(P_{i+1}-P_i = \frac{q}{p}(P_i-P_{i-1})\), and iterating gives \(P_{i+1}-P_i = (q/p)^i P_1\), so

\(P_i = \sum_{k=0}^{i-1}(P_{k+1}-P_k) = P_1\sum_{k=0}^{i-1}\left(\frac{q}{p}\right)^k\)

Note that \(P_N=1\) and that the formula above also holds for i=N, so we have

\(1 = P_1\sum_{k=0}^{N-1}\left(\frac{q}{p}\right)^k\)

and therefore

\(P_i = \frac{\sum_{k=0}^{i-1}(q/p)^k}{\sum_{k=0}^{N-1}(q/p)^k} = \begin{cases}\dfrac{1-(q/p)^i}{1-(q/p)^N} & \text{if } p\ne \frac12\\[2ex] \dfrac{i}{N} & \text{if } p=\frac12\end{cases}\)
The main “trick” in this calculation was to condition on the “right” event (here X1). This is often the case when doing math with Markov chains.

Say in our example playing roulette you start with $100. Here is the probability of reaching N before going broke.

Is it the same probability to start with $100 and reach $110 or to start with $200 and reach $220? The answer is no, we find instead

## 100 to 110: 0.349 
## 200 to 220: 0.122
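These two numbers can be reproduced directly from the gambler's ruin formula; a small Python check (p = 18/38 as in the roulette example):

```python
# probability of reaching N before going broke, starting with $i,
# for a game won with probability p in each round (gambler's ruin)
def ruin_prob(i, N, p):
    q = 1 - p
    if p == 0.5:
        return i / N
    r = q / p
    return (1 - r**i) / (1 - r**N)

p = 18 / 38   # betting on "red" in roulette
print(round(ruin_prob(100, 110, p), 3))  # 100 to 110: 0.349
print(round(ruin_prob(200, 220, p), 3))  # 200 to 220: 0.122
```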

When I go to the casino I don’t expect to win money; I go for the entertainment of gambling. For me a successful evening is one where I don’t lose my money too fast, that is, I want to maximize the time I spend in the casino. So how much time can I expect to spend in the casino if I start with $100 and leave once I have either won $100 or gone broke? Here is the result of a simulation estimating the mean time until I have to leave.

##   Mean Median    Low   High 
##  407.8  304.0   40.0 2830.0

An Application to Runs

Consider a sequence of numbers \(x_1, x_2, \dots\) and define a run as a maximal stretch of increasing values. For example say the sequence is

0.83 0.64 0.20 0.02 0.81 0.26 0.27 0.35 0.23 0.25

then the runs are

0.83
0.64
0.20
0.02 0.81
0.26 0.27 0.35
0.23 0.25

so we have 3 runs of length 1, 2 runs of length 2 and 1 run of length 3

Now say \(X_i\sim U[0,1]\), independent, and we are interested in the distribution of the lengths of the runs. For example, if \(L_1\) is the length of the first run, then

\(P(L_1\ge m)\) = P(first m numbers are in increasing order) = \(1/m!\), m=1,2,..

Also, assume that for any later run L its first value is x, then all values of this run have to be in the interval (x,1) and so

\(P(L\ge m\mid x) = \dfrac{(1-x)^{m-1}}{(m-1)!}\), m=1,2,..

Let’s do a quick check of this formula. Clearly we have

\(P(L=1\mid x) = P(X_2<x) = x\)

and our formula gives

\(P(L=1\mid x) = P(L\ge1\mid x)-P(L\ge2\mid x) = \dfrac{(1-x)^{1-1}}{(1-1)!}-\dfrac{(1-x)^{2-1}}{(2-1)!} = 1-(1-x) = x\)

The R routine “runs” simulates this “game”; runs(1,0.5) verifies the probabilities above for x=0.5.

runs(1, 0.5)
## [1] ""
##             1     2     3     4     5     6 7 8 9 10
## density 0.500 0.375 0.104 0.018 0.002 0.000 0 0 0  0
## Sim     0.492 0.389 0.099 0.017 0.002 0.001 0 0 0  0
## [1] ""
## [1] 0.500000 1.648721 1.651000
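The runs routine itself is not reproduced here; the following Python sketch performs the same check, computing \(P(L=m\mid x)=P(L\ge m\mid x)-P(L\ge m+1\mid x)\) and comparing it with a simulation (x = 0.5, sample size and seed are arbitrary choices):

```python
import math, random

def run_length_dist(x, m):
    """P(L = m | run starts at x) = P(L >= m) - P(L >= m+1)."""
    return (1 - x)**(m - 1) / math.factorial(m - 1) \
         - (1 - x)**m / math.factorial(m)

def simulate_run_length(x, rng):
    """Length of the run starting at x: count values until one decreases."""
    length, last = 1, x
    while True:
        nxt = rng.random()
        if nxt < last:          # run ends
            return length
        length, last = length + 1, nxt

rng = random.Random(0)
x = 0.5
sim = [simulate_run_length(x, rng) for _ in range(10**5)]
for m in range(1, 5):
    print(m, round(run_length_dist(x, m), 3),
          round(sim.count(m) / len(sim), 3))
# theoretical values: 0.5, 0.375, 0.104, 0.018
```

The last line of the R output above, \(1.648721 = e^{1-x} = e^{0.5}\), is the conditional expected run length \(E[L\mid x]=\sum_m P(L\ge m\mid x)=e^{1-x}\), which the simulated mean also matches.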

Now let \(I_n\) be the first value of the nth run. For example say we have the sequence

0.83
0.64
0.20
0.02 0.81
0.26 0.27 0.35
0.23 0.25

then I={0.83, 0.64, 0.20, 0.02, 0.26, 0.23}

Now \(\{I_n, n\ge1\}\) is a Markov chain with the continuous state space [0,1].

Let us find \(p(y|x)\), the probability density that the next run begins at y if the last one began at x. Condition on the length m of the run that begins at x.
Now the length of the run is m and the next run will start at y if

  1. the next m-1 values are increasing and all greater than x
  2. the mth value is y
  3. the largest of the m-1 values is greater than y

This is illustrated here

so now, summing over all possible run lengths m,

\(p(y|x) = \begin{cases} 1 + \sum_{m=2}^{\infty}\dfrac{(1-x)^{m-1}}{(m-1)!} = e^{1-x} & \text{if } y\le x\\[2ex] \sum_{m=2}^{\infty}\dfrac{(1-x)^{m-1}-(y-x)^{m-1}}{(m-1)!} = e^{1-x}-e^{y-x} & \text{if } y>x \end{cases}\)
runs(2,0.5) verifies the calculation

runs(2, 0.5)
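Again the routine is not shown, but the check can be sketched in Python: simulate one step of the run-start chain and compare with the density \(p(y|x) = e^{1-x}\) for \(y\le x\) and \(e^{1-x}-e^{y-x}\) for \(y>x\) (x = 0.5, sample size and seed are arbitrary):

```python
import math, random

def p_run(y, x):
    """Transition density of the run-start chain."""
    return math.exp(1 - x) if y <= x else math.exp(1 - x) - math.exp(y - x)

def next_run_start(x, rng):
    """Simulate the run starting at x; return the first value of the next run."""
    last = x
    while True:
        nxt = rng.random()
        if nxt < last:      # run ends; the next run starts at this value
            return nxt
        last = nxt

rng = random.Random(1)
x = 0.5
ys = [next_run_start(x, rng) for _ in range(10**5)]

# P(next start <= 0.5) from the simulation vs. from the density
print(sum(y <= 0.5 for y in ys) / len(ys))
print(0.5 * math.exp(1 - x))    # integral of p(y|x) over y in (0, 0.5]
```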

What is the stationary distribution of this chain? First we need a continuous state space analog of the formula for a stationary distribution, which is

\(\pi(y) = \int_0^1 \pi(x)\,p(y|x)\,dx\)

Solving such an integral equation analytically is very difficult, so let’s make a guess what the stationary distribution is, and then verify it. Now \(I_1\), being the initial value of the first run, is U[0,1]. However, the later runs begin whenever a value smaller than the previous one occurs. So it seems likely that the long-run proportion of such values that are less than y would equal the probability that a U[0,1] random variable is less than y, given that it is smaller than a second independent U[0,1].

Now

\(P(U_1\le y\mid U_1<U_2) = \dfrac{P(U_1\le y,\,U_1<U_2)}{P(U_1<U_2)} = \dfrac{\int_0^y (1-x)\,dx}{1/2} = 2y-y^2\)

which corresponds to the density \(\pi(y)=2(1-y)\).
Another way to guess the stationary distribution is to use simulation. This is done in runs(3), where it is fairly easy to see that it should be a linear function, which of course has to be \(\pi(y)=2(1-y)\)

runs(3)

Now we verify that \(\pi(y)=2(1-y)\) satisfies the stationarity equation:

\(\int_0^1 \pi(x)p(y|x)\,dx = \int_0^1 2(1-x)e^{1-x}\,dx - \int_0^y 2(1-x)e^{y-x}\,dx = 2-2y = 2(1-y)\)

using \(\int_0^1 (1-x)e^{1-x}\,dx = 1\) and \(\int_0^y (1-x)e^{y-x}\,dx = y\)
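The stationarity equation can also be checked numerically with a midpoint Riemann sum; a quick Python sketch (grid size and test points are arbitrary choices):

```python
import math

def p_run(y, x):
    # transition density of the run-start chain
    return math.exp(1 - x) if y <= x else math.exp(1 - x) - math.exp(y - x)

def pi(y):
    return 2 * (1 - y)          # candidate stationary density

n = 100000                      # grid size for the midpoint Riemann sum
for y in (0.1, 0.5, 0.9):
    lhs = sum(pi((k + 0.5) / n) * p_run(y, (k + 0.5) / n)
              for k in range(n)) / n
    print(y, round(lhs, 4), pi(y))   # the last two numbers should agree
```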
so now the limiting distribution of the length of the nth run is given by

\(P(L\ge m) = \int_0^1 \dfrac{(1-x)^{m-1}}{(m-1)!}\,2(1-x)\,dx = \dfrac{2}{(m+1)(m-1)!},\quad m=1,2,..\)

and therefore

\(E[L] = \sum_{m=1}^{\infty} P(L\ge m) = 2\sum_{m=1}^{\infty}\dfrac{1}{(m+1)(m-1)!} = 2\)

because

\(\sum_{m=1}^{\infty}\dfrac{1}{(m+1)(m-1)!} = \sum_{m=1}^{\infty}\left(\dfrac{1}{m!}-\dfrac{1}{(m+1)!}\right) = 1\)

a curious identity!

the R command

cumsum(1/(2:11)/factorial(0:9))
##  [1] 0.5000000 0.8333333 0.9583333 0.9916667 0.9986111
##  [6] 0.9998016 0.9999752 0.9999972 0.9999997 1.0000000

shows that this identity is indeed true.

Finally runs(4) verifies the expected value E[L] = 2

runs(4)
## [1] ""
##       1       2       3       4       5       6       7 
## 0.16514 0.20710 0.09144 0.02731 0.00605 0.00097 0.00011 
##       8       9      10 
## 0.00003 0.00000 0.00000
## [1] 2.007387

The On-Off Process

Here \(X_n\) takes only two possible states, coded as 0 (“Off”) and 1 (“On”). Say \(P(X_{n+1}=1\mid X_n=0)=p\) and \(P(X_{n+1}=0\mid X_n=1)=q\). Therefore the transition matrix is given by

\[P=\begin{pmatrix}1-p & p\\ q & 1-q\end{pmatrix}\]

Now

\[P^n=\frac{1}{p+q}\begin{pmatrix}q+p(1-p-q)^n & p-p(1-p-q)^n\\ q-q(1-p-q)^n & p+q(1-p-q)^n\end{pmatrix}\]

which is easily verified by induction. For the stationary distribution we find

\(\pi_0 = \dfrac{q}{p+q},\quad \pi_1 = \dfrac{p}{p+q}\)

Finally, if \(0<p+q<2\) then \(|1-p-q|<1\), so \((1-p-q)^n\to0\) and

\[P^n \to \frac{1}{p+q}\begin{pmatrix}q & p\\ q & p\end{pmatrix}\]

that is, the limiting probabilities equal the stationary distribution.
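For the on-off chain with \(P(X_{n+1}=1\mid X_n=0)=p\) and \(P(X_{n+1}=0\mid X_n=1)=q\), the n-step transition matrix has the closed form \(P^n=\frac{1}{p+q}\begin{pmatrix}q&p\\q&p\end{pmatrix}+\frac{(1-p-q)^n}{p+q}\begin{pmatrix}p&-p\\-q&q\end{pmatrix}\), which is easy to confirm numerically; a Python sketch (p = 0.3, q = 0.4 are illustrative assumptions):

```python
import numpy as np

p, q = 0.3, 0.4   # illustrative on/off switching probabilities (assumptions)
P = np.array([[1 - p, p],
              [q, 1 - q]])

def Pn_closed(n):
    """Closed-form n-step transition matrix of the on-off chain."""
    lam = (1 - p - q)**n
    return (np.array([[q, p], [q, p]]) +
            lam * np.array([[p, -p], [-q, q]])) / (p + q)

n = 7
print(np.linalg.matrix_power(P, n))
print(Pn_closed(n))   # the two should agree
```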

The Hardy-Weinberg Law

Let’s consider the following simple case in genetics: there are two types of genes, A and a, and each individual carries a pair of them, AA, Aa or aa. Assume that the proportions of individuals in the population with AA, aa and Aa are \(p_0\), \(q_0\) and \(r_0\), respectively. When two individuals mate, their offspring gets one gene from each parent, chosen at random. So for example

P(offspring is AA | parents are AA and AA) = 1
P(offspring is Aa | parents are AA and AA) = 0
P(offspring is Aa | parents are AA and Aa) = 1/2 (A from parent 1 with probability 1, a from parent 2 with probability 1/2)

We also assume that parents don’t choose to mate based on their genes.

What will the proportions be in the next generation? Let’s call these p, q and r.

First we have

\(P(A) = P(A\mid AA)p_0 + P(A\mid aa)q_0 + P(A\mid Aa)r_0 = p_0+r_0/2\)

and also

\(P(a) = q_0+r_0/2\)

and so

\(p = P(AA) = P(A)P(A) = (p_0+r_0/2)^2\)

\(q = P(aa) = P(a)P(a) = (q_0+r_0/2)^2\)

\(r = 2P(A)P(a) = 2(p_0+r_0/2)(q_0+r_0/2)\)

and so

\(p+r/2 = (p_0+r_0/2)^2 + (p_0+r_0/2)(q_0+r_0/2)\)

\(= (p_0+r_0/2)\left[p_0+r_0/2+q_0+r_0/2\right] = (p_0+r_0/2)(p_0+q_0+r_0)\)

\(= p_0+r_0/2 = P(A)\)

So the gene frequencies, and with them the genotype proportions, stay the same from generation to generation!

This is known as the Hardy-Weinberg law.

Let us follow the genetic history of an individual from generation to generation. Let’s assume that the population has stabilized at the proportions p, q and r. So, for a given individual, let \(X_n\) be the genetic state of her nth offspring. Then \(\{X_n,n\ge0\}\) is a Markov chain. Now, the other parent is chosen at random from the population, so for example

\(P(X_{n+1}=AA\mid X_n=AA) = P(\text{other parent contributes an } A) = p+r/2\)

and in the same way we can find the transition matrix (states in the order AA, aa, Aa)

\[P=\begin{pmatrix} p+r/2 & 0 & q+r/2\\ 0 & q+r/2 & p+r/2\\ \tfrac12(p+r/2) & \tfrac12(q+r/2) & \tfrac12 \end{pmatrix}\]

Clearly one would expect that the limiting probabilities for the individual are just the long-run proportions. Let’s verify this
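A numerical sketch of this verification in Python: build the transition matrix (the other parent contributes an A with probability p + r/2 = P(A)) and check that the rows of \(P^n\) converge to (p, q, r). The gene frequency P(A) = 0.5 is an illustrative assumption:

```python
import numpy as np

# equilibrium genotype proportions for gene frequency P(A) = 0.5 (illustrative)
PA = 0.5
p, q, r = PA**2, (1 - PA)**2, 2 * PA * (1 - PA)

# transition matrix of the offspring chain, states ordered (AA, aa, Aa);
# the other parent contributes an A with probability p + r/2
P = np.array([[p + r/2,       0,             q + r/2],
              [0,             q + r/2,       p + r/2],
              [(p + r/2)/2,   (q + r/2)/2,   0.5]])

# limiting probabilities: rows of P^n for large n
limit = np.linalg.matrix_power(P, 100)
print(limit)          # every row should be (p, q, r)
```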