Basic Theorems

Basic Definitions

An experiment is a well-defined procedure that produces a set of outcomes. For example, “roll a die”, “randomly select a card from a standard 52-card deck”, “flip a coin” and “pick any moment in time between 10 am and 12 pm” are experiments.

A sample space is the set of outcomes from an experiment. Thus, for “flip a coin” the sample space is {H, T}, for “roll a die” the sample space is {1, 2, 3, 4, 5, 6} and for “pick any moment in time between 10 am and 12 pm” the sample space is [10, 12].

An event is a subset, say A, of a sample space S. For the experiment “roll a die”, an event is “obtain a number less than 3”. Here, the event is {1, 2}.

Kolmogorov’s Axioms of Probability

Definition

For any probability P we have

Axiom 1 \(0\le P(A)\le 1\)

Axiom 2 \(P(S)=1\)

Axiom 3 if \(A_1 , A_2, ..\) are mutually exclusive we have

\[P\left(\bigcup_{i=1}^\infty A_i\right)=\sum_{i=1}^\infty P(A_i)\]

Note one difference between the axioms here and our discussion on coherence before. There we showed that in order to avoid being a sure loser we have to have

\[P(A\cup B)=P(A)+P(B)\]

if A and B are disjoint. The extension to a finite collection of disjoint sets is straightforward (via induction) but in the axioms we also allow an infinite collection. This is called countable additivity, and is an extension to the requirements of coherence as discussed before. It can be shown that without this extension there is another type of dynamic sure loss.

Example (1.2.1)

Say we have a sample space \(S=\{e_1 , ..., e_n \}\) and an event \(A=\{e_{k_1} , ..., e_{k_m}\}\). Let’s assume that all the outcomes are equally likely, say \(P(e_i )=p\) for all \(i\). Then: \[ \begin{aligned} &1 = \text{ (Axiom 2)}\\ &P(S) = P\left(\{e_1 , ..., e_n \}\right) = \text{ (Axiom 3)}\\ &\sum_{i=1}^n P(e_i) = \sum_{i=1}^n p =np\\ \end{aligned} \]
and so we have \(p=1/n\). Now

\[ \begin{aligned} &P(A) = P\left(\{e_{k_1} , ..., e_{k_m}\}\right) = \text{ (Axiom 3)}\\ &\sum_{j=1}^m P(e_{k_j}) = \sum_{j=1}^m 1/n =m/n\\ \end{aligned} \]

and so in this (very special) case finding a probability becomes a counting problem. We will discuss some formulas for this soon.
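In the equally likely case this counting can of course be done by machine. A minimal sketch in Python, using the die roll and the event “obtain a number less than 3” from above:

```python
# Equally likely outcomes: P(A) = |A| / |S|, a pure counting problem.
S = {1, 2, 3, 4, 5, 6}           # sample space for "roll a die"
A = {e for e in S if e < 3}      # event "obtain a number less than 3"

prob = len(A) / len(S)
print(prob)                      # 0.333... = m/n = 2/6
```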

Set Theory

Recall the following formulas for sets:

Commutativity

\[A\cup B = B \cup A\]

and

\[A \cap B=B \cap A\]

Associativity

\[A \cup (B \cup C) = (A \cup B) \cup C\]
\[A \cap (B \cap C) = (A \cap B) \cap C\]

Distributive Law

\[A \cup (B \cap C) = (A \cup B) \cap (A \cup C)\]
\[A \cap (B \cup C) = (A \cap B) \cup (A \cap C)\]

DeMorgan’s Law

\[(A \cup B)^c = A^c \cap B^c\]
\[(A \cap B)^c = A^c \cup B^c\]
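These identities are easy to spot-check with Python’s built-in sets. A small sketch; the universe U (standing in for S) and the sets A, B, C are arbitrary choices:

```python
# Spot-check the set identities on small example sets.
U = set(range(10))                # a stand-in universe for S
A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 5, 7}

def comp(X):
    return U - X                  # complement relative to U

assert A | B == B | A                            # commutativity
assert A | (B | C) == (A | B) | C                # associativity
assert A | (B & C) == (A | B) & (A | C)          # distributive law
assert A & (B | C) == (A & B) | (A & C)
assert comp(A | B) == comp(A) & comp(B)          # DeMorgan
assert comp(A & B) == comp(A) | comp(B)
print("all identities hold")
```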

Basic Theorems and Inequalities

Theorem (1.2.2)

Addition Formula

Let A and B be two sets, then

\[P(A \cup B) = P(A)+P(B)-P(A \cap B)\]

proof first note that

\[A\cup B = (A \cap B^c) \cup (A \cap B) \cup (A^c \cap B)\]

and that all of these are disjoint. Therefore by the third axiom we have

\[P(A \cup B) = P(A \cap B^c) + P(A \cap B) + P(A^c \cap B)\]

but

\[ \begin{aligned} &P(A \cap B^c) + P(A \cap B) + P(A^c \cap B) = \\ &\left\{P(A \cap B^c)+P(A \cap B)\right\} + \left\{P(A^c \cap B)+P(A \cap B)\right\} - P(A \cap B) = \\ &P\left( (A \cap B^c) \cup (A \cap B)\right) + P\left( (A^c \cap B) \cup (A \cap B)\right) - P(A \cap B) = \\ &P\left( A \cap (B^c \cup B)\right) + P\left( (A^c \cup A) \cap B\right) - P(A \cap B) =\\ &P(A \cap S)+P(S \cap B)-P(A \cap B) = \\ &P(A)+P(B)-P(A \cap B) \end{aligned} \]
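The formula is also easy to check numerically. A minimal sketch with equally likely die outcomes; the events A and B are arbitrary choices:

```python
# Check P(A ∪ B) = P(A) + P(B) - P(A ∩ B) on a die roll.
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}                     # "at most 3"
B = {2, 4, 6}                     # "even"

def P(E):                         # equally likely outcomes
    return Fraction(len(E), len(S))

assert P(A | B) == P(A) + P(B) - P(A & B)
print(P(A | B))                   # 5/6
```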

Theorem (1.2.3)

Bonferroni’s Inequality

Let A and B be two sets, then

\[P(A \cap B) \ge P(A)+P(B)-1\]

proof follows directly from the addition formula and \(P(A \cup B) \le 1\).
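For example, if \(P(A)=0.7\) and \(P(B)=0.6\), then \(P(A \cap B) \ge 0.7+0.6-1=0.3\), no matter how A and B are related.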

Theorem (1.2.4)

Complement Formula

Let A be any set, then

\[P(A) = 1 - P(A^c)\]

proof \(S = A \cup A^c\), so

\[1 = P(S) = P(A \cup A^c) = P(A) + P(A^c)\]
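A classic use of this formula: the probability of at least one six in four rolls of a fair die is \(1-(5/6)^4 \approx 0.518\), which is much easier than computing the probability directly.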

Theorem (1.2.5)

Let A and B be two sets such that \(A \subseteq B\). Then \(P(A) \le P(B)\).

proof

\[ \begin{aligned} &B = B \cap S = B \cap (A \cup A^c) = \\ &(B \cap A) \cup (B \cap A^c) = \\ &A \cup (B \cap A^c)\\ &\text{so}\\ &P(B) = P\left(A \cup (B \cap A^c)\right) = \\ &P(A) + P(B \cap A^c) \ge P(A)\\ \end{aligned} \]

Theorem (1.2.6)

Boole’s Inequality

Let \(A_1 ,..,A_n\) be a (finite) collection of sets, then

\[P( \bigcup^n_{i=1} A_i ) \le \sum^n_{i=1} P(A_i )\]

proof follows by induction from the addition formula, since \(P(A \cup B) = P(A)+P(B)-P(A \cap B) \le P(A)+P(B)\).
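A typical use of this union bound: if each of 100 events has probability at most 0.001, then the probability that at least one of them occurs is at most \(100 \times 0.001 = 0.1\), regardless of any dependence among the events.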

Borel-Cantelli lemmas

Definition (1.2.7)

Let \(\{A_n , n \ge 1\}\) be a sequence of events. Then

  1. the sequence is called increasing if \(A_n \subseteq A_{n+1}\)

If \(\{A_n , n \ge 1\}\) is an increasing sequence of events we define the new event \(\lim A_n\) by

\[\lim A_n = \bigcup_{n=1}^\infty A_n\]

  2. the sequence is called decreasing if \(A_{n+1} \subseteq A_n\)

If \(\{A_n , n \ge 1\}\) is a decreasing sequence of events we define the event \(\lim A_n\) by

\[\lim A_n = \bigcap_{n=1}^\infty A_n\]

Theorem (1.2.8)

If \(\{A_n , n \ge 1\}\) is either an increasing or a decreasing sequence of events then

\[\lim P(A_n ) = P(\lim A_n)\]

proof Suppose first that \(\{A_n , n \ge 1\}\) is an increasing sequence. Define the events \(F_n\) by

\[F_1 = A_1\]

\[F_{n+1} = A_{n+1} \cap (\bigcup_{i=1}^n A_i )^c = A_{n+1} \cap A_n ^c\]

That is, \(F_n\) consists of those points of \(A_n\) that were not in any earlier \(A_i , i=1,..,n-1\)

By their definition the \(F_n\) are mutually exclusive, so

\[P\left(\bigcup_{i=1}^n A_i\right) = P\left(\bigcup_{i=1}^n F_i\right) = \sum_{i=1}^nP(F_i)\] for all n>0. Now

\[ \begin{aligned} &P\left(\lim A_n\right) = P\left(\bigcup_{i=1}^\infty A_i\right) = \\ &P\left(\bigcup_{i=1}^\infty F_i\right) = \\ &\sum_{i=1}^\infty P(F_i) = \\ &\lim_{n\rightarrow\infty}\sum_{i=1}^n P(F_i) = \\ &\lim_{n\rightarrow\infty}P\left(\bigcup_{i=1}^n F_i\right) = \\ &\lim_{n\rightarrow\infty}P\left(\bigcup_{i=1}^n A_i\right) =\\ &\lim_{n\rightarrow\infty} P(A_n) \end{aligned} \]

The proof for a decreasing sequence \(\{A_n , n \ge 1\}\) follows directly from the fact that then \(\{A^c_n, n \ge 1\}\) is an increasing sequence.

Example (1.2.9)

Consider a population consisting of individuals able to produce offspring of the same kind. The number of individuals initially present, denoted by \(X_0\), is called the size of the zeroth generation. All offspring of the zeroth generation constitute the first generation and their number is denoted by \(X_1\). In general, let \(X_n\) denote the size of the \(n^{th}\) generation.

Let \(A_n = \{X_n =0\}\). Now since \(X_n =0\) implies \(X_{n+1} =0\), it follows that \(\{A_k , k \ge n\}\) is an increasing sequence and thus \(\lim P(A_n )\) exists. What is the meaning of this probability? We have

\[ \begin{aligned} &\lim P(X_n =0) = \lim P(A_n ) =\\ &P(\lim A_n ) =\\ &P\left( \bigcup A_n \right)= \\ &P\left( \bigcup \{X_n =0\} \right)= \\ &P(\text{the population eventually dies out}) \end{aligned} \]
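This limit can be watched numerically. A simulation sketch; the example does not specify an offspring distribution, so the Poisson with mean 0.9 below is purely an assumption for illustration (with mean offspring below 1 the population dies out with probability 1, so the estimates of \(P(X_n =0)\) should climb toward 1):

```python
# Estimate P(X_n = 0) for a branching process with X_0 = 1.
# Offspring law: Poisson(0.9) -- an assumed choice for illustration only.
import numpy as np

rng = np.random.default_rng(0)
reps, generations = 10_000, 30
X = np.ones(reps, dtype=int)               # 10,000 independent populations

for n in range(1, generations + 1):
    # each of the X[j] individuals produces Poisson(0.9) offspring
    X = np.array([rng.poisson(0.9, x).sum() for x in X])
    if n % 10 == 0:
        print(n, np.mean(X == 0))          # estimate of P(X_n = 0), increasing
```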

Definition (1.2.10)

Let \(\{A_n \}\) be an infinite sequence of events. Then

\(\{A_n \text{ i.o.}\}\) (“\(A_n\) infinitely often”)

is the event that for any m there exists an n>m such that \(A_n\) occurs.

Note:

\[\{A_n \text{ i.o.}\} = \bigcap _m \bigcup _{n \ge m} A_n\]

Theorem (1.2.11)

Borel-Cantelli lemma

Let \(A_1, A_2, ...\) be a sequence of events. If \(\sum P(A_i) < \infty\) then

\[P(\{A_n \text{ i.o.}\})=0\]

proof Let’s call the event that an infinite number of the \(A_i\)’s occur \(\limsup A_i\). Then

\[\limsup A_i = \bigcap_{n=1}^\infty \bigcup_{i=n}^\infty A_i\]

This is because if an infinite number of \(A_i\)’s occur, then for any n there exists an m >n such that \(A_m\) occurs, therefore \(\bigcup_{i=n}^\infty A_i\) occurs, and then the intersection occurs as well.

Now \(\left\{\bigcup_{i=n}^\infty A_i; n\ge 1\right\}\) is a decreasing sequence and so it follows that

\[ \begin{aligned} &P\left(\limsup A_i\right) = \\ &P \left( \bigcap_{n=1}^\infty \bigcup_{i=n}^\infty A_i\right) = \\ &\lim_{n\rightarrow\infty}P \left( \bigcup_{i=n}^\infty A_i\right) \le \\ &\lim_{n\rightarrow\infty}\sum_{i=n}^\infty P \left(A_i \right) =0 \end{aligned} \] because in a convergent sum the tails have to go to 0.

Example (1.2.12)

Let \(X_1 , X_2 , ..\) be such that \(P(X_n =0)=1/n^2 = 1-P(X_n =1)\)

Let \(A_n =\{X_n =0\}\), then

\[\sum P(A_n ) = \sum 1/n^2 < \infty\]

so it follows that the probability that \(X_n\) equals 0 for an infinite number of n is also 0. Hence, with probability 1, \(X_n\) equals 1 for all sufficiently large n.
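A quick simulation sketch of this conclusion: for each run we record the last index \(n \le N\) at which \(X_n = 0\). Since only finitely many zeros occur, this last index should typically be small and should not grow with N (the bound N = 10,000 and the seed are arbitrary choices):

```python
# Simulate X_n with P(X_n = 0) = 1/n^2; find the last n where X_n = 0.
import random

random.seed(1)
N = 10_000
for _ in range(5):
    last_zero = max((n for n in range(1, N + 1)
                     if random.random() < 1 / n**2), default=0)
    print(last_zero)    # typically small: only finitely many zeros occur
```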

Independence

Definition (1.2.13)

  1. Two events A and B are said to be independent if

\[P(A \cap B)=P(A)P(B)\]

  2. A set of events \(\{A_n ,n \ge 1\}\) is said to be pairwise independent if for any \(i \ne j\), \(A_i\) and \(A_j\) are independent.

  3. A set of events \(\{A_n ,n \ge 1\}\) is said to be independent if for any finite set of indexes \(\{i_1, ..., i_n\}\) we have

\[P\left(\bigcap_{j=1}^n A_{i_j}\right) =\prod_{j=1}^n P(A_{i_j})\]

Theorem (1.2.14)

Pairwise independence does not imply independence.

proof Consider the sample space \(S=\{1,2,3,4\}\) where all outcomes are equally likely. Define the events

\(A=\{1,2\}, B=\{1,3\}\) and \(C=\{1,4\}\)

then we have

\[P(A) = P(B) = P(C) = 1/2\]

and

\[P(A \cap B) = P(B \cap C) = P(A \cap C) = 1/4\]

so we have pairwise independence, but

\[1/8 = P(A)P(B)P(C) \ne P(A \cap B \cap C) = 1/4\]
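The counterexample can also be verified mechanically. A short sketch using exact fractions over the four equally likely outcomes:

```python
# Pairwise independence without full independence on S = {1, 2, 3, 4}.
from fractions import Fraction

S = {1, 2, 3, 4}
A, B, C = {1, 2}, {1, 3}, {1, 4}

def P(E):                                    # equally likely outcomes
    return Fraction(len(E), len(S))

assert P(A & B) == P(A) * P(B)               # pairwise independent
assert P(B & C) == P(B) * P(C)
assert P(A & C) == P(A) * P(C)
assert P(A & B & C) != P(A) * P(B) * P(C)    # 1/4 vs 1/8: not independent
print("pairwise but not mutually independent")
```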

Theorem (1.2.15)

Second Borel-Cantelli lemma

If \(A_1 , A_2 , ..\) are independent events such that \(\sum P(A_i ) = \infty\) then

\[P\left(\{A_n \text{ i.o.}\}\right) =1\]

proof As in the first Borel-Cantelli lemma, the event that an infinite number of the \(A_i\)’s occur is \(\limsup A_n\), so we need to show that

\[P(\limsup A_n ) = 1\]

or equally

\[1-P(\limsup A_n ) = 0\]

Now

\[ \begin{aligned} &1-P\left(\limsup A_i\right) = \\ &1-P\left(\{A_n \text{ i.o.}\}\right)=\\ &P\left(\{A_n \text{ i.o.}\}^c\right)=\\ &P \left( \left\{ \bigcap_{n=1}^\infty \bigcup_{i=n}^\infty A_i\right\}^c\right) = \\ &P \left( \bigcup_{n=1}^\infty \left\{\bigcup_{i=n}^\infty A_i\right\}^c\right) = \\ &P \left( \bigcup_{n=1}^\infty \bigcap_{i=n}^\infty A_i^c\right) = \\ &\lim_{n\rightarrow\infty} P \left( \bigcap_{i=n}^\infty A_i^c\right) = \\ &\lim_{n\rightarrow\infty} \prod_{i=n}^\infty P \left( A_i^c\right) = \\ &\lim_{n\rightarrow\infty} \prod_{i=n}^\infty \left[1-P(A_i)\right] \le \\ &\lim_{n\rightarrow\infty} \prod_{i=n}^\infty \left[1-P(A_i)+\frac{P(A_i)^2}{2!}-\frac{P(A_i)^3}{3!}\pm ...\right] = \\ &\lim_{n\rightarrow\infty} \prod_{i=n}^\infty \exp\{-P(A_i)\} =\\ &\lim_{n\rightarrow\infty} \exp\{-\sum_{i=n}^\infty P(A_i)\} =\\ &\lim_{n\rightarrow\infty} \exp\{-\infty\} =0\\ \end{aligned} \] because in a divergent sum the “tail” is always infinite.

Example (1.2.16)

Consider the following game: we start with an urn with one white ball and one black ball. Let

\[A_1 =\{\text{white ball drawn}\}\]

next we return the ball, add a black ball, and let \(A_2\) again be the event \(\{\text{white ball drawn}\}\). We continue on like that, so before the \(n^{th}\) draw the urn holds the white ball and n black balls.

Now \(P(A_n)=1/(n+1)\). Clearly the \(A_i\)’s are independent and

\[\sum P(A_n ) = \sum 1/(n+1) = \infty\]

and therefore

\[P(\{A_n \text{ i.o.}\}) = 1\]

So no matter how many balls are already in the urn, at some point in the future we will again pick a white one.

Say in each round we add k black balls, then again we have

\[\sum P(A_i ) = 1/(1+1) + 1/(2+k) + 1/(2+2k) + .. =\sum 1/(2+nk) =\infty\]

for any \(k>0\), so the same conclusion holds!

Now say that in each round we double the number of black balls added, so in the first round it is 1, then 2, 4, 8 etc. The number of black balls in the urn then doubles each round, and

\[\sum P(A_i ) = 1/(1+1) + 1/(1+2) + 1/(1+4) + .. = \sum 1/(1+2^n) < \infty\]

and so now there will be a time after which we never pick the white ball again.
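A simulation sketch contrasting the two regimes, counting how often the white ball is drawn in the first N rounds (N and the seed are arbitrary choices). With one black ball added per round the count keeps growing with N, roughly like \(\log N\); with doubling it stays at a handful:

```python
# Count white draws in N rounds; black balls grow linearly vs. exponentially.
import random

random.seed(2)
N = 10_000

def white_draws(next_blacks):
    blacks, wins = 1, 0
    for _ in range(N):
        if random.randrange(1 + blacks) == 0:    # white is 1 of 1+blacks balls
            wins += 1
        blacks = next_blacks(blacks)             # add black balls for next round
    return wins

print(white_draws(lambda b: b + 1))   # add 1 per round: whites keep coming
print(white_draws(lambda b: 2 * b))   # doubling: only finitely many whites
```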


If we take the two Borel-Cantelli lemmas together we have the following: Let \(\{A_n \}\) be a sequence of independent events, then

\[P(\{A_n \text{ i.o.}\}) = 0 \text{ or }1\]

This is an example of a so-called 0-1 law, of which there are quite a few in probability theory. Here is one of the famous ones:

Definition (1.2.17)

Let \(\{A_n \}\) be an infinite sequence of events. An event B is called a tail event if knowing whether or not \(A_n\) occurred for each n determines B, but B is independent of any finite collection of \(A_n\)’s.

Example (1.2.18)

\(\{A_n \text{ i.o.}\}\) is a tail event.

Theorem (1.2.19)

Kolmogorov’s 0-1 law

Let B be a tail event of the sequence \(\{A_n \}\). Then P(B) is either 0 or 1.