
Basics of Probability Theory

Kolmogorov’s Axioms and Some Consequences

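Modern probability, like geometry, is built on a small set of basic rules called axioms, formulated in the 1930s by Kolmogorov. For events in a sample space \(S\) they are:

Axiom 1: \(0 \le P(A) \le 1\)

Axiom 2: \(P(S) = 1\)

Axiom 3: \(P\left(\bigcup_i A_i\right) = \sum_i P(A_i)\) if \(A_1, A_2, \ldots\) are mutually exclusive.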

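Example: Derive the formula for the probability of an event in a sample space with equally likely outcomes from the axioms.

Say \(S = \{e_1, \ldots, e_n\}\) and \(P(\{e_i\}) = p\) for all \(i\). The sets \(\{e_i\}\) are mutually exclusive and their union is \(S\), so \(P(S) = 1\) together with additivity gives

\(1 = P(S) = P\left(\bigcup_{i=1}^{n} \{e_i\}\right) = \sum_{i=1}^{n} P(\{e_i\}) = np\)

and therefore \(p = 1/n\). For an event \(A \subset S\) containing \(k\) outcomes the same argument gives \(P(A) = k/n\), the number of outcomes in \(A\) divided by the number of outcomes in \(S\).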

So in this case finding probabilities becomes a counting exercise.
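As a small illustration of probability as counting, the sketch below enumerates the sample space of two fair dice (an example chosen here, not taken from the text) and computes the probability that the sum is 7. The helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed example: all 36 ordered outcomes of rolling two fair dice.
sample_space = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(A) = (number of outcomes in A) / (number of outcomes in S)."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))


# Event: the two dice sum to 7.
p_sum_seven = prob(lambda o: o[0] + o[1] == 7)
print(p_sum_seven)  # 1/6
```

Using `Fraction` keeps the result exact, so it can be compared directly with a hand count.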


Complement: \(P(A) = 1 - P(A^c)\)

Addition Formula: \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)

Subset: if \(A \subset B\) then \(P(A) \le P(B)\)

Boole’s inequality: \(P\left(\bigcup_i A_i\right) \le \sum_i P(A_i)\)
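These four consequences can be checked on a small finite example. The sketch below assumes one roll of a fair die with equally likely outcomes; the events `A`, `B` and `C` are chosen here only for illustration.

```python
from fractions import Fraction

# Assumed example: one roll of a fair die, all outcomes equally likely.
S = frozenset(range(1, 7))


def P(event):
    """Probability of an event (a subset of S) by counting."""
    return Fraction(len(event & S), len(S))


A = frozenset({2, 4, 6})  # roll is even
B = frozenset({4, 5, 6})  # roll is at least 4
C = frozenset({6})        # roll is a six, a subset of A

complement_ok = P(A) == 1 - P(S - A)
addition_ok = P(A | B) == P(A) + P(B) - P(A & B)
subset_ok = C <= A and P(C) <= P(A)
boole_ok = P(A | B | C) <= P(A) + P(B) + P(C)

print(complement_ok, addition_ok, subset_ok, boole_ok)
```

A check on one example is of course not a proof; it only confirms that the formulas behave as stated in this case.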


Let \(\{A_n, n \ge 1\}\) be a sequence of events. Then

  1. the sequence is called increasing if \(A_n \subset A_{n+1}\) for all \(n\).

If \(\{A_n, n \ge 1\}\) is an increasing sequence of events we define the new event \(\lim A_n\) by

\(\lim A_n = \bigcup_{n=1}^{\infty} A_n\)

  2. the sequence is called decreasing if \(A_{n+1} \subset A_n\) for all \(n\).

If \(\{A_n, n \ge 1\}\) is a decreasing sequence of events we define the new event \(\lim A_n\) by

\(\lim A_n = \bigcap_{n=1}^{\infty} A_n\)


If \(\{A_n, n \ge 1\}\) is either an increasing or a decreasing sequence of events then

\(\lim_{n \to \infty} P(A_n) = P(\lim A_n)\)


Consider a population consisting of individuals able to produce offspring of the same kind. The number of individuals initially present, denoted by \(X_0\), is called the size of the zero’th generation. All offspring of the zero’th generation constitute the first generation and their number is denoted by \(X_1\). In general, let \(X_n\) denote the size of the \(n\)th generation, and let

\(A_n = \{X_n = 0\}\)

Now since \(X_n = 0\) implies \(X_{n+1} = 0\), it follows that \(\{A_n, n \ge 1\}\) is an increasing sequence and thus \(\lim P(A_n)\) exists. What is the meaning of this probability? We have

\(\lim P(X_n = 0) = \lim P(A_n) = P(\lim A_n) = P\left(\bigcup_n A_n\right) = P\left(\bigcup_n \{X_n = 0\}\right) = P(\text{the population eventually dies out})\)
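To see the increasing sequence \(P(X_n = 0)\) numerically, here is a minimal sketch under assumptions that are not in the text: \(X_0 = 1\), and each individual independently has 0, 1 or 2 offspring with probabilities 1/4, 1/4 and 1/2. Under these assumptions \(q_n = P(X_n = 0)\) satisfies \(q_n = g(q_{n-1})\) with \(q_0 = 0\), where \(g\) is the probability generating function of the offspring distribution. The function names are hypothetical.

```python
def offspring_pgf(s):
    """Generating function g(s) of the assumed offspring distribution
    P(0) = 1/4, P(1) = 1/4, P(2) = 1/2 (a hypothetical choice)."""
    return 0.25 + 0.25 * s + 0.5 * s**2


def extinction_by_generation(n_generations):
    """Return [P(X_1 = 0), ..., P(X_n = 0)] assuming X_0 = 1."""
    probs = []
    q = 0.0  # P(X_0 = 0) = 0 because X_0 = 1
    for _ in range(n_generations):
        q = offspring_pgf(q)  # q_n = g(q_{n-1})
        probs.append(q)
    return probs


probs = extinction_by_generation(50)
print(probs[0], probs[1], probs[-1])
```

For this particular offspring distribution the values increase toward 1/2, the smaller root of \(g(s) = s\), which is then the probability that the population eventually dies out.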

Proposition (Borel-Cantelli lemma)

Let \(A_1, A_2, \ldots\) be a sequence of events. If \(\sum_{i=1}^{\infty} P(A_i) < \infty\) then

\(P(\text{an infinite number of the } A_i \text{ occur}) = 0\)


Let \(X_1, X_2, \ldots\) be such that \(P(X_n = 0) = 1/n^2 = 1 - P(X_n = 1)\).

Let \(A_n = \{X_n = 0\}\). Now \(\sum P(A_n) = \sum 1/n^2 < \infty\), so by the Borel-Cantelli lemma the probability that \(X_n\) equals 0 for an infinite number of \(n\) is 0. Hence, with probability 1, \(X_n\) equals 1 for all sufficiently large \(n\).
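The finiteness of the sum can be checked numerically. The sketch below uses the known value \(\sum 1/n^2 = \pi^2/6\) and combines it with Boole's inequality, which bounds \(P(X_n = 0 \text{ for some } n \ge N)\) by the tail sum \(\sum_{n \ge N} 1/n^2\). The function names are hypothetical.

```python
import math


def partial_sum(n_terms):
    """Sum of 1/n^2 for n = 1, ..., n_terms."""
    return sum(1.0 / n**2 for n in range(1, n_terms + 1))


def tail_bound(start):
    """Tail sum of 1/n^2 over n >= start, computed as pi^2/6 minus the
    partial sum. By Boole's inequality this bounds
    P(X_n = 0 for some n >= start)."""
    return math.pi**2 / 6 - partial_sum(start - 1)


for start in (10, 100, 1000):
    print(start, tail_bound(start))
```

The bound shrinks roughly like \(1/N\), so the chance of seeing any zero beyond index \(N\) goes to 0 as \(N\) grows.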


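  1. Two events \(A\) and \(B\) are said to be independent if

\(P(A \cap B) = P(A)P(B)\)

  2. A set of events \(\{A_i, i = 1, 2, \ldots\}\) is said to be independent if for any finite set of indices \(\{i_1, \ldots, i_n\}\) we have

\(P\left(\bigcap_{j=1}^{n} A_{i_j}\right) = \prod_{j=1}^{n} P(A_{i_j})\)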

Proposition (Converse to the Borel-Cantelli lemma)

If \(A_1, A_2, \ldots\) are independent events such that \(\sum_{i=1}^{\infty} P(A_i) = \infty\) then \(P(\text{an infinite number of the } A_i \text{ occur}) = 1\)
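A concrete case may help here. Assume, purely for illustration, independent events with \(P(A_i) = 1/i\) for \(i \ge 2\), so that \(\sum P(A_i) = \infty\). By independence the probability that none of \(A_N, \ldots, A_M\) occurs is \(\prod_{i=N}^{M} (1 - 1/i)\), which the sketch below computes exactly. The function name is hypothetical.

```python
from fractions import Fraction


def prob_none_occur(start, stop):
    """P(none of A_start, ..., A_stop occurs), assuming independent
    events with P(A_i) = 1/i (a hypothetical choice)."""
    result = Fraction(1)
    for i in range(start, stop + 1):
        result *= 1 - Fraction(1, i)
    return result


for stop in (100, 1000, 10000):
    print(stop, prob_none_occur(10, stop))
```

The product telescopes to \((N-1)/M\), which tends to 0 as \(M\) grows for any fixed \(N\). So with probability 1 at least one event occurs beyond every index \(N\), which is what "infinitely many occur" means.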


Say \(A\) and \(B\) are events, then the conditional probability of “A given B” is defined as

\(P(A|B) = \frac{P(A \cap B)}{P(B)}\)

if \(P(B) > 0\).

Recall the definition of independence of two events given above. If \(A\) and \(B\) are independent and \(P(B) > 0\) we have

\(P(A|B) = P(A \cap B)/P(B) = P(A)P(B)/P(B) = P(A)\)

so two events are independent if the knowledge that one event occurred does not change the probability of the other. For example, the probability that a second flip of a fair coin results in heads is not changed by whether the first flip was heads or not.
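The coin example can be verified by enumeration. The sketch below assumes two flips of a fair coin with four equally likely outcomes; the helper names `P` and `cond` are hypothetical. It also shows a pair of events that is not independent, for contrast.

```python
from fractions import Fraction
from itertools import product

# Assumed example: two flips of a fair coin, four equally likely outcomes.
S = set(product("HT", repeat=2))


def P(event):
    """Probability of an event (a subset of S) by counting."""
    return Fraction(len(event), len(S))


def cond(A, B):
    """Conditional probability P(A|B) = P(A and B) / P(B), for P(B) > 0."""
    if P(B) == 0:
        raise ValueError("P(B) must be positive")
    return P(A & B) / P(B)


first_heads = {o for o in S if o[0] == "H"}
second_heads = {o for o in S if o[1] == "H"}
both_heads = first_heads & second_heads
at_least_one_head = first_heads | second_heads

# Independent: conditioning on the first flip changes nothing.
print(cond(second_heads, first_heads), P(second_heads))  # 1/2 1/2

# Not independent: knowing there is at least one head changes P(both heads).
print(cond(both_heads, at_least_one_head), P(both_heads))  # 1/3 1/4
```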

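It is important to notice that conditional probabilities behave just like regular ones. For example, for a fixed event \(B\) with \(P(B) > 0\) they obey the axioms:

Axiom 1: \(P(A|B) = P(A \cap B)/P(B)\). Here \(P(A \cap B)\) and \(P(B)\) are both regular probabilities, so \(P(A \cap B) \ge 0\) and \(P(B) > 0\), and therefore

\(P(A|B) = P(A \cap B)/P(B) \ge 0\)

Also \(A \cap B \subset B\), so \(P(A|B) = P(A \cap B)/P(B) \le P(B)/P(B) = 1\)

Axiom 2: \(P(S|B) = P(S \cap B)/P(B) = P(B)/P(B) = 1\)

Axiom 3: say \(A_1, \ldots, A_n\) are mutually exclusive, then the events \(A_i \cap B\) are also mutually exclusive, so

\(P\left(\bigcup_{i=1}^{n} A_i \,\Big|\, B\right) = \frac{P\left(\left(\bigcup_i A_i\right) \cap B\right)}{P(B)} = \frac{P\left(\bigcup_i (A_i \cap B)\right)}{P(B)} = \frac{\sum_i P(A_i \cap B)}{P(B)} = \sum_i P(A_i|B)\)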


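Multiplication Formula: rearranging the definition of conditional probability gives \(P(A \cap B) = P(A)P(B|A)\), provided \(P(A) > 0\).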
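As a quick check of this formula, the sketch below assumes a standard 52-card deck and two cards drawn without replacement (an example chosen here, not from the text). It computes the probability that both are aces once with the formula and once by brute-force counting.

```python
from fractions import Fraction
from itertools import permutations

# Assumed example: a standard 52-card deck; rank 1 stands for an ace.
deck = [(rank, suit) for rank in range(1, 14) for suit in "CDHS"]

# Formula: P(A and B) = P(A) * P(B|A)
p_first_ace = Fraction(4, 52)               # A: first card is an ace
p_second_ace_given_first = Fraction(3, 51)  # B|A: 3 aces left among 51 cards
p_both = p_first_ace * p_second_ace_given_first

# Brute force: count ordered pairs of distinct cards that are both aces.
draws = list(permutations(deck, 2))
n_both_aces = sum(1 for first, second in draws
                  if first[0] == 1 and second[0] == 1)
p_both_counted = Fraction(n_both_aces, len(draws))

print(p_both, p_both_counted)  # 1/221 1/221
```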


A set of events \(\{A_n\}\) is called a partition of the sample space \(S\) if

\(A_i \cap A_j = \emptyset\) for all \(i \ne j\)

and

\(\bigcup_n A_n = S\)

Proposition (Law of Total Probability)

Say the events \(\{A_n\}\) form a partition of \(S\), and let \(B \subset S\), then

\(P(B) = \sum_i P(B|A_i)P(A_i)\)
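The following sketch checks the law of total probability on an assumed example: two fair dice, the partition \(A_i\) = {first die shows \(i\)}, and \(B\) = {the sum is 8}. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed example: two fair dice, 36 equally likely ordered outcomes.
S = list(product(range(1, 7), repeat=2))


def B(outcome):
    """Event B: the two dice sum to 8."""
    return outcome[0] + outcome[1] == 8


def P(event):
    """Probability of an event by counting outcomes in S."""
    return Fraction(sum(1 for o in S if event(o)), len(S))


def cond_prob_B_given_first(i):
    """P(B | A_i), counted inside A_i = {first die shows i}."""
    outcomes_in_Ai = [o for o in S if o[0] == i]
    return Fraction(sum(1 for o in outcomes_in_Ai if B(o)),
                    len(outcomes_in_Ai))


# Law of total probability: sum over the partition A_1, ..., A_6.
total = sum(cond_prob_B_given_first(i) * Fraction(1, 6) for i in range(1, 7))

print(total, P(B))  # 5/36 5/36
```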

Proposition (Bayes’ formula)

Say the events \(\{A_n\}\) form a partition of \(S\), and let \(B \subset S\) with \(P(B) > 0\), then

\(P(A_k|B) = \frac{P(B|A_k)P(A_k)}{\sum_i P(B|A_i)P(A_i)}\)

Notice that the denominator is just the law of total probability, so we could have written the formula also in this way:

\(P(A_k|B) = \frac{P(B|A_k)P(A_k)}{P(B)}\)


Say you play the following game: first you roll a fair die, then you flip a coin as many times as the number shown on the die. Given that you got 3 “heads”, what is the probability you rolled a 5?

Let \(A_i = \{\text{roll was } i\}\), \(i = 1, 2, \ldots, 6\), and

\(B\) = “3 heads”
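One way to carry out the calculation is sketched below. It assumes the coin is fair and the flips are independent, and it reads "3 heads" as exactly 3 heads; the text states none of these. Under those assumptions \(P(B|A_i) = \binom{i}{3}/2^i\), which is 0 for \(i < 3\). The variable names are hypothetical.

```python
from fractions import Fraction
from math import comb

# Prior: a fair die, so P(A_i) = 1/6 for i = 1, ..., 6.
prior = {i: Fraction(1, 6) for i in range(1, 7)}

# Likelihood: P(exactly 3 heads in i flips), ASSUMING a fair coin and
# independent flips. comb(i, 3) is 0 when i < 3.
likelihood = {i: Fraction(comb(i, 3), 2**i) for i in range(1, 7)}

# Law of total probability: P(B) = sum_i P(B|A_i) P(A_i)
p_B = sum(likelihood[i] * prior[i] for i in prior)

# Bayes' formula: P(A_k|B) = P(B|A_k) P(A_k) / P(B)
posterior = {k: likelihood[k] * prior[k] / p_B for k in prior}

print(p_B)           # 1/6
print(posterior[5])  # 5/16
```

Under these assumptions the answer is 5/16, and the posterior probabilities over all six rolls sum to 1, as they must.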
