Probability

Basics of Probability Theory

Kolmogorov’s Axioms and Some Consequences

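Modern probability, like geometry, is built on a small set of basic rules called axioms, formulated in the 1930s by Kolmogorov. They are:

Axiom 1: 0 ≤ P(A) ≤ 1 for every event A

Axiom 2: P(S) = 1, where S is the sample space

Axiom 3: P(∪A_i) = ∑P(A_i) if A₁, A₂, … are mutually exclusive.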

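Example: Derive the formula for the probability of an event in a sample space with equally likely outcomes from the axioms.

Say S = {e₁,..,e_n} and P({e_i}) = p for all i. The events {e_i} are mutually exclusive and their union is S, so by Axioms 2 and 3

1 = P(S) = P(∪{e_i}) = ∑P({e_i}) = np

and therefore p = 1/n. If A⊂S contains k outcomes, the same argument gives

P(A) = ∑P({e_i}) = k/n = |A|/|S|

where the sum is over the outcomes e_i in A.

So in this case finding probabilities becomes a counting exercise.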
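As a small illustration of the counting approach, the following Python sketch enumerates a sample space and counts favorable outcomes. The choice of two fair dice and the helper name `prob` are assumptions made for this example only.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of rolling two fair dice
# (an illustrative assumption, not part of the notes).
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(A) = |A| / |S| for equally likely outcomes; `event` is a predicate."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

p_sum_is_7 = prob(lambda roll: roll[0] + roll[1] == 7)
print(p_sum_is_7)  # 1/6
```

Using `Fraction` keeps the result exact, so it can be compared directly with a hand count.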

Proposition

Complement: P(A) = 1 - P(A^c)

Addition Formula: P(A∪B) = P(A)+P(B)-P(A∩B)

Subset: if A⊂B then P(A)≤P(B)

Boole’s inequality: P(∪A_i) ≤ ∑P(A_i)
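These formulas can be checked numerically on a finite sample space with equally likely outcomes. The sketch below does this for two hypothetical events on a roll of two fair dice. The events `A` and `B` and the helper `P` are assumptions chosen for illustration, and a check on one example is of course not a proof.

```python
from fractions import Fraction
from itertools import product

# Two fair dice, all 36 outcomes equally likely (illustrative assumption).
S = set(product(range(1, 7), repeat=2))

def P(event):
    """Probability of an event (a subset of S) by counting."""
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] == 6}          # first die shows 6
B = {s for s in S if s[0] + s[1] >= 10}  # sum is at least 10

complement_ok = P(A) == 1 - P(S - A)
addition_ok = P(A.union(B)) == P(A) + P(B) - P(A.intersection(B))
subset_ok = P(A.intersection(B)) <= P(B)  # A∩B is a subset of B
boole_ok = P(A.union(B)) <= P(A) + P(B)
print(complement_ok, addition_ok, subset_ok, boole_ok)  # True True True True
```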

Definition

Let {A_n, n≥1} be a sequence of events. Then

the sequence is called increasing if A_n ⊂ A_{n+1} for all n

If {A_n, n≥1} is an increasing sequence of events we define the new event lim A_n by

lim A_n = ∪A_n

the sequence is called decreasing if A_{n+1} ⊂ A_n for all n

If {A_n, n≥1} is a decreasing sequence of events we define the new event lim A_n by

lim A_n = ∩A_n

Proposition

If {A_n, n≥1} is either an increasing or a decreasing sequence of events then

lim P(A_n) = P(lim A_n)

Example

Consider a population consisting of individuals able to produce offspring of the same kind. The number of individuals initially present, denoted by X₀, is called the size of the zeroth generation. All offspring of the zeroth generation constitute the first generation and their number is denoted by X₁. In general, let X_n denote the size of the nth generation.

Let A_n = {X_n=0}

Now since X_n=0 implies X_{n+1}=0, we have A_n ⊂ A_{n+1}, so {A_n, n≥1} is an increasing sequence and thus lim P(A_n) exists. What is the meaning of this probability? We have

lim P(X_n=0) = lim P(A_n)
= P(lim A_n)
= P(∪A_n)
= P(∪{X_n=0})
= P(the population eventually dies out)
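The monotone behavior of P(X_n=0) can be seen in a simulation. The notes do not specify how offspring are produced, so the sketch below makes its own assumptions: X₀ = 1, and each individual independently has 0, 1 or 2 offspring with probabilities 1/4, 1/4 and 1/2. The function name and parameters are hypothetical as well.

```python
import random

def extinction_fractions(n_runs=1000, n_generations=15, seed=0):
    """Estimate P(X_n = 0) for n = 0..n_generations by simulation."""
    rng = random.Random(seed)
    extinct_counts = [0] * (n_generations + 1)
    for _ in range(n_runs):
        size = 1  # X_0 = 1 (assumed)
        for n in range(n_generations + 1):
            if size == 0:
                extinct_counts[n] += 1
            # Next generation: each individual has 0, 1 or 2 offspring
            # with assumed probabilities 1/4, 1/4, 1/2.
            size = sum(rng.choices([0, 1, 2], weights=[1, 1, 2], k=size))
    return [count / n_runs for count in extinct_counts]

extinct_by_generation = extinction_fractions()
for n, fraction in enumerate(extinct_by_generation):
    print(n, fraction)
```

Because a population that has died out stays at size 0, the estimated values never decrease in n, which mirrors the increasing sequence of events A_n.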

Proposition (Borel-Cantelli lemma)

Let A₁, A₂, .. be a sequence of events. If ∑P(A_i) < ∞ then
P(an infinite number of the A_i occur) = 0

Example

Let X₁, X₂, .. be such that P(X_n=0) = 1/n² = 1-P(X_n=1)

Let A_n={X_n=0}. Now ∑P(A_n) = ∑1/n² < ∞, so by the Borel-Cantelli lemma the probability that X_n equals 0 for an infinite number of n is 0. Hence, with probability 1, X_n equals 1 for all sufficiently large n.
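One way to see why this works: by Boole’s inequality, P(X_n=0 for some n≥N) ≤ ∑P(A_n), with the sum taken over n≥N, and this tail of a convergent series goes to 0 as N grows. The Python sketch below evaluates the tail, using the known value π²/6 of the full sum; the function name `tail_sum` is an assumption for illustration.

```python
import math

def tail_sum(N):
    """Sum of 1/n^2 over n >= N, computed as pi^2/6 minus the first N-1 terms."""
    return math.pi ** 2 / 6 - sum(1 / n ** 2 for n in range(1, N))

# Upper bounds on P(X_n = 0 for some n >= N) for growing N
for N in (10, 100, 1000):
    print(N, tail_sum(N))
```

The printed bounds shrink roughly like 1/N, so a zero at a late index becomes increasingly unlikely.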

Definition

Two events A and B are said to be independent if

P(A∩B)=P(A)P(B)

A set of events {A_i, i=1,2,..} is said to be independent if for any finite set of distinct indices {i₁,..,i_n} we have

P(A_{i₁}∩..∩A_{i_n}) = P(A_{i₁})⋯P(A_{i_n})

Proposition (Converse to the Borel-Cantelli lemma)

If A₁, A₂, .. are independent events such that ∑P(A_i) = ∞ then P(an infinite number of the A_i’s occur) =1

Definition

Say A and B are events, then the conditional probability of “A given B” is defined as

P(A|B) = P(A∩B)/P(B)

if P(B)>0

Above we had the formula P(A∩B)=P(A)P(B) for two events to be independent. Now if A and B are independent and P(B)>0 we have

P(A|B) = P(A∩B)/P(B) = P(A)P(B)/P(B) = P(A)

so two events are independent if the knowledge that one event occurred does not change the probability of the other. For example the probability that a second flip of a fair coin results in heads is not changed by whether the first flip was heads or not.
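The coin example can be verified by enumerating the four equally likely outcomes of two flips. This is a minimal sketch; the helper `P` and the event names are assumptions introduced here.

```python
from fractions import Fraction
from itertools import product

# Two flips of a fair coin: HH, HT, TH, TT, all equally likely.
S = list(product("HT", repeat=2))

def P(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def first_heads(s):
    return s[0] == "H"

def second_heads(s):
    return s[1] == "H"

def both_heads(s):
    return first_heads(s) and second_heads(s)

# P(second flip heads | first flip heads) = P(both heads) / P(first heads)
p_cond = P(both_heads) / P(first_heads)
print(P(both_heads) == P(first_heads) * P(second_heads))  # True
print(p_cond == P(second_heads))                          # True
```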

It is important to notice that conditional probabilities are just like regular ones; for example, they obey the axioms:

Axiom 1: P(A|B) = P(A∩B)/P(B), and P(A∩B) and P(B) are both regular probabilities, so P(A∩B)≥0 and P(B)>0, and therefore

P(A|B) = P(A∩B)/P(B) ≥ 0

Also A∩B ⊂ B, so P(A|B) = P(A∩B)/P(B) ≤ P(B)/P(B) = 1

Axiom 2: P(S|B) = P(S∩B)/P(B) = P(B)/P(B) = 1

Axiom 3: say A₁,..,A_n are mutually exclusive, then the events A₁∩B,..,A_n∩B are mutually exclusive as well, and so

P(∪A_i|B) = P((∪A_i)∩B)/P(B) = P(∪(A_i∩B))/P(B) = ∑P(A_i∩B)/P(B) = ∑P(A_i|B)

Proposition

P(A∩B) = P(A)P(B|A)

Definition

A set of events {A_n} is called a partition of the sample space if

A_i∩A_j=Ø for all i≠j

and

∪A_i = S

Proposition (Law of Total Probability)

Say the events {A_n} form a partition of S, and let B⊂S, then

P(B) =∑P(B|A_i)P(A_i)

Proposition (Bayes’ formula)

Say the events {A_n} form a partition of S, and let B⊂S with P(B)>0, then

P(A_k|B) = P(B|A_k)P(A_k) / ∑P(B|A_i)P(A_i)

Notice that the denominator is just the law of total probability, so we could have written the formula also in this way:

P(A_k|B) = P(B|A_k)P(A_k)/P(B)

Example

Say you play the following game: first you roll a fair die, then you flip a coin as many times as the number shown on the die. Given that you got 3 “heads”, what is the probability you rolled a 5?

Let A_i = {roll was i}, i=1,2,..,6

and

B = “3 heads”

then
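Assuming the coin is fair (the problem does not say so explicitly) and reading “3 heads” as exactly 3 heads, i flips give P(B|A_i) = C(i,3)/2^i, which is 0 for i<3. With P(A_i) = 1/6, the law of total probability gives P(B) and Bayes’ formula gives P(A_5|B). A minimal Python sketch of this calculation, with variable names chosen for illustration:

```python
from fractions import Fraction
from math import comb

heads = 3

# P(A_i): the die is fair
prior = {i: Fraction(1, 6) for i in range(1, 7)}

# P(B | A_i): exactly 3 heads in i flips of a fair coin (assumed);
# comb(i, 3) is 0 when i < 3
likelihood = {i: Fraction(comb(i, heads), 2 ** i) for i in prior}

# Law of total probability: P(B) = sum of P(B | A_i) P(A_i)
p_B = sum(likelihood[i] * prior[i] for i in prior)

# Bayes' formula: P(A_i | B) = P(B | A_i) P(A_i) / P(B)
posterior = {i: likelihood[i] * prior[i] / p_B for i in prior}

print(p_B, posterior[5])  # 1/6 5/16
```

Under these assumptions P(A_5|B) = 5/16, the same as P(A_6|B), so rolls of 5 and 6 are equally likely explanations of seeing 3 heads.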