
Basic Theorems

Basic Definitions

An experiment is a well-defined procedure that produces a set of outcomes. For example, “roll a die”, “randomly select a card from a standard 52-card deck”, “flip a coin” and “pick any moment in time between 10 am and 12 pm” are experiments.

A sample space is the set of outcomes from an experiment. Thus, for “flip a coin” the sample space is {H, T}, for “roll a die” the sample space is {1, 2, 3, 4, 5, 6} and for “pick any moment in time between 10 am and 12 pm” the sample space is [10, 12].

An event is a subset, say A, of a sample space S. For the experiment “roll a die”, an event is “obtain a number less than 3”. Here, the event is {1, 2}.

Kolmogorov’s Axioms of Probability

Definition

For any probability P we have

Axiom 1: 0 \le P(A) \le 1

Axiom 2: P(S) = 1

Axiom 3: if A_1, A_2, ... are mutually exclusive we have

P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i)

Note one difference between the axioms here and our discussion on coherence before. There we showed that in order to avoid being a sure loser we have to have

P(A \cup B) = P(A) + P(B)

if A and B are disjoint. The extension to a finite collection of disjoint sets is straightforward (via induction), but in the axioms we also allow an infinite collection. This is called countable additivity, and it extends the requirements of coherence as discussed before. It can be shown that without this extension there is another type of dynamic sure loss.

Example (1.2.1)

Say we have a sample space S=\{e_1, ..., e_n\} and an event A=\{e_{k_1}, ..., e_{k_m}\}. Let's assume that all the outcomes are equally likely, say P(e_i) = p for all i. Then:

1 \overset{\text{(Axiom 2)}}{=} P(S) = P(\{e_1, ..., e_n\}) \overset{\text{(Axiom 3)}}{=} \sum_{i=1}^n P(e_i) = \sum_{i=1}^n p = np
and so we have p=1/n. Now

P(A) = P(\{e_{k_1}, ..., e_{k_m}\}) \overset{\text{(Axiom 3)}}{=} \sum_{j=1}^m P(e_{k_j}) = \sum_{j=1}^m 1/n = m/n

and so in this (very special) case finding a probability becomes a counting problem. We will discuss some formulas for this soon.
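Since every outcome carries the same probability 1/n, computing P(A) reduces to counting. A minimal Python sketch of this for the die event above (the variable names are my choices, not from the text):

```python
from fractions import Fraction

# Equally likely outcomes: P(A) = m/n = |A| / |S|, as in Example (1.2.1)
S = {1, 2, 3, 4, 5, 6}           # sample space for "roll a die"
A = {x for x in S if x < 3}      # event "obtain a number less than 3"

print(Fraction(len(A), len(S)))  # 1/3
```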

Set Theory

Recall the following formulas for sets:

Commutativity:

A \cup B = B \cup A

and

A \cap B = B \cap A

Associativity:

A \cup (B \cup C) = (A \cup B) \cup C
A \cap (B \cap C) = (A \cap B) \cap C

Distributive Laws:

A \cup (B \cap C) = (A \cup B) \cap (A \cup C)
A \cap (B \cup C) = (A \cap B) \cup (A \cap C)

DeMorgan’s Laws:

(A \cup B)^c = A^c \cap B^c
(A \cap B)^c = A^c \cup B^c
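These identities are easy to spot-check on finite sets. A small Python sketch (the particular sets are my choice) verifying the distributive and DeMorgan laws:

```python
# Verify the set identities on small finite sets.
S = set(range(10))                  # a universe, so complements make sense
A, B, C = {1, 2, 3}, {2, 4, 6}, {3, 6, 9}
comp = lambda X: S - X              # complement relative to S

assert A | (B & C) == (A | B) & (A | C)    # distributive laws
assert A & (B | C) == (A & B) | (A & C)
assert comp(A | B) == comp(A) & comp(B)    # DeMorgan's laws
assert comp(A & B) == comp(A) | comp(B)
print("all identities hold")
```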

Basic Theorems and Inequalities

Theorem (1.2.2)

Addition Formula

Let A and B be two sets, then

P(A \cup B) = P(A) + P(B) - P(A \cap B)

proof first note that

A \cup B = (A \cap B^c) \cup (A \cap B) \cup (A^c \cap B)

and that all of these are disjoint. Therefore by the third axiom we have

P(A \cup B) = P(A \cap B^c) + P(A \cap B) + P(A^c \cap B)

but

\begin{aligned} &P(A \cap B^c) + P(A \cap B) + P(A^c \cap B) = \\ &\left\{P(A \cap B^c) + P(A \cap B)\right\} + \left\{P(A^c \cap B) + P(A \cap B)\right\} - P(A \cap B) = \\ &P\left((A \cap B^c) \cup (A \cap B)\right) + P\left((A^c \cap B) \cup (A \cap B)\right) - P(A \cap B) = \\ &P\left(A \cap (B^c \cup B)\right) + P\left((A^c \cup A) \cap B\right) - P(A \cap B) = \\ &P(A \cap S) + P(S \cap B) - P(A \cap B) = \\ &P(A) + P(B) - P(A \cap B) \end{aligned}
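The formula is easy to confirm by counting in an equally likely setting. A quick sketch for one die roll (the events A and B here are my choices):

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}                            # "even number"
B = {1, 2, 3}                            # "at most 3"
P = lambda E: Fraction(len(E), len(S))   # equally likely outcomes

assert P(A | B) == P(A) + P(B) - P(A & B)
print(P(A | B))                          # 5/6
```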

Theorem (1.2.3)

Bonferroni’s Inequality

Let A and B be two sets, then

P(A \cap B) \ge P(A) + P(B) - 1

proof follows directly from the addition formula and P(A \cup B) \le 1.

Theorem (1.2.4)

Complement Formula

Let A be any set, then

P(A) = 1 - P(A^c)

proof S = A \cup A^c, so

1 = P(S) = P(A \cup A^c) = P(A) + P(A^c)

Theorem (1.2.5)

Let A and B be two sets such that A \subseteq B. Then P(A) \le P(B).

proof

\begin{aligned} &B = B \cap S = B \cap (A \cup A^c) = (B \cap A) \cup (B \cap A^c) = A \cup (B \cap A^c) \\ &\text{so } P(B) = P\left(A \cup (B \cap A^c)\right) = P(A) + P(B \cap A^c) \ge P(A) \end{aligned}

(here B \cap A = A because A \subseteq B, and the two sets in the union are disjoint)

Theorem (1.2.6)

Boole’s Inequality

Let A_1, ..., A_n be a (finite) collection of sets, then

P\left(\bigcup_{i=1}^n A_i\right) \le \sum_{i=1}^n P(A_i)

proof follows by induction from the addition formula, since P(A \cup B) = P(A) + P(B) - P(A \cap B) \le P(A) + P(B).

Borel-Cantelli lemmas

Definition (1.2.7)

Let \{A_n , n \ge 1\} be a sequence of events. Then

  1. the sequence is called increasing if A_n \subseteq A_{n+1}

If \{A_n , n \ge 1\} is an increasing sequence of events we define the new event \lim A_n by

\lim A_n = \bigcup_{n=1}^\infty A_n

  2. the sequence is called decreasing if A_{n+1} \subseteq A_n

If \{A_n , n \ge 1\} is a decreasing sequence of events we define the event \lim A_n by

\lim A_n = \bigcap_{n=1}^\infty A_n
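As a concrete illustration of these limits (my example, not from the text): on the sample space S = [0, 1], take the increasing sequence A_n = [0, 1 - 1/n]. Then

\lim A_n = \bigcup_{n=1}^\infty \left[0, 1-\frac{1}{n}\right] = [0, 1)

since every x < 1 lies in A_n once n is large enough, while 1 itself is in no A_n.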

Theorem (1.2.8)

If \{A_n , n \ge 1\} is either an increasing or a decreasing sequence of events then

\lim P(A_n ) = P(\lim A_n)

proof Suppose first that \{A_n , n \ge 1\} is an increasing sequence. Define the events F_n by

F_1 = A_1

F_{n+1} = A_{n+1} \cap (\bigcup_{i=1}^n A_i )^c = A_{n+1} \cap A_n ^c

That is, F_n consists of those points in A_n that were not in any earlier A_i , i=1,...,n-1

By their definition the F_n are mutually exclusive, so

P\left(\bigcup_{i=1}^n A_i\right) = P\left(\bigcup_{i=1}^n F_i\right) = \sum_{i=1}^nP(F_i) for all n>0. Now

\begin{aligned} &P\left(\lim A_n\right) = P\left(\bigcup_{i=1}^\infty A_i\right) = \\ &P\left(\bigcup_{i=1}^\infty F_i\right) = \\ &\sum_{i=1}^\infty P(F_i) = \\ &\lim_{n\rightarrow\infty}\sum_{i=1}^n P(F_i) = \\ &\lim_{n\rightarrow\infty}P\left(\bigcup_{i=1}^n F_i\right) = \\ &\lim_{n\rightarrow\infty}P\left(\bigcup_{i=1}^n A_i\right) =\\ &\lim_{n\rightarrow\infty} P(A_n) \end{aligned}

The proof for a decreasing sequence \{A_n , n \ge 1\} follows directly from the fact that then \{A^c_n, n \ge 1\} is an increasing sequence.

Example (1.2.9)

Consider a population consisting of individuals able to produce offspring of the same kind. The number of individuals initially present, denoted by X_0, is called the size of the zero’th generation. All offspring of the zero’th generation constitute the first generation and their number is denoted by X_1. In general, let X_n denote the size of the n^{th} generation.

Let A_n = \{X_n =0\}. Now since X_n =0 implies X_{n+1} =0, it follows that \{A_n , n \ge 1\} is an increasing sequence and thus \lim P(A_n ) exists. What is the meaning of this probability? We have

\begin{aligned} &\lim P(X_n =0) = \lim P(A_n ) =\\ &P(\lim A_n ) =\\ &P\left( \bigcup A_n \right)= \\ &P\left( \bigcup \{X_n =0\} \right)= \\ &P(\text{the population eventually dies out}) \end{aligned}
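A simulation makes this limit concrete. Below is a sketch of such a branching process; the offspring distribution is not specified in the example, so I assume a Poisson distribution with mean 1 purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def dies_out(n_gen=50, x0=1, lam=1.0):
    """Simulate one branching process for n_gen generations;
    return True if the population size hits 0."""
    x = x0
    for _ in range(n_gen):
        if x == 0:
            return True
        x = rng.poisson(lam, size=x).sum()   # each individual reproduces
    return x == 0

runs = 10_000
est = sum(dies_out() for _ in range(runs)) / runs
print(f"P(population eventually dies out) ~ {est:.3f}")
# With mean offspring 1 extinction is certain, but it happens slowly,
# so truncating at 50 generations gives an estimate somewhat below 1.
```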

Definition (1.2.10)

Let \{A_n \} be an infinite sequence of events. Then

\{A_n \text{ i.o.}\} (“A_n infinitely often”)

is the event that for any m there exists an n>m such that A_n occurs.

Note:

\{A_n \text{ i.o.}\} = \bigcap _m \bigcup _{n \ge m} A_n

Theorem (1.2.11)

Borel-Cantelli lemma

Let A_1, A_2, ... be a sequence of events. If \sum P(A_i) < \infty then

P(\{A_n \text{ i.o.}\})=0

proof Let’s call the event that an infinite number of the A_i’s occur \limsup A_i. Then

\limsup A_i = \bigcap_{n=1}^\infty \bigcup_{i=n}^\infty A_i

This is because if an infinite number of A_i’s occur, then for any n there exists an m >n such that A_m occurs, therefore \bigcup_{i=n}^\infty A_i occurs, and then the intersection occurs as well.

Now \left\{\bigcup_{i=n}^\infty A_i; n\ge 1\right\} is a decreasing sequence and so it follows that

\begin{aligned} &P\left(\limsup A_i\right) = \\ &P \left( \bigcap_{n=1}^\infty \bigcup_{i=n}^\infty A_i\right) = \\ &\lim_{n\rightarrow\infty}P \left( \bigcup_{i=n}^\infty A_i\right) \le \\ &\lim_{n\rightarrow\infty}\sum_{i=n}^\infty P \left(A_i \right) =0 \end{aligned} because the tail sums \sum_{i=n}^\infty P(A_i) of a convergent series go to 0.

Example (1.2.12)

Let X_1 , X_2 , .. be such that P(X_n =0)=1/n^2 = 1-P(X_n =1)

Let A_n =\{X_n =0\}, then

\sum P(A_n ) = \sum 1/n^2 < \infty

so it follows that the probability that X_n equals 0 for infinitely many n is 0. Hence, with probability 1, X_n equals 1 for all sufficiently large n.
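A simple simulation of this example (a sketch; the X_n are drawn independently with the given marginals):

```python
import random

random.seed(1)
N = 100_000
# X_n = 0 with probability 1/n^2, else X_n = 1
zeros = [n for n in range(1, N + 1) if random.random() < 1 / n**2]
print("indices n with X_n = 0:", zeros)
# Typically only a few small indices show up; past some finite n
# we see X_n = 1 forever, exactly as the lemma predicts.
```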

Independence

Definition (1.2.13)

  1. Two events A and B are said to be independent if

P(A \cap B)=P(A)P(B)

  2. A set of events \{A_n ,n \ge 1\} is said to be pairwise independent if for any i \ne j, A_i and A_j are independent.

  3. A set of events \{A_n ,n \ge 1\} is said to be independent if for any finite set of indices \{i_1, ..., i_n\} we have

P\left(\bigcap_{j=1}^n A_{i_j}\right) =\prod_{j=1}^n P(A_{i_j})

Theorem (1.2.14)

Pairwise independence does not imply independence.

proof Consider the sample space S=\{1,2,3,4\} where all outcomes are equally likely. Define the events

A=\{1,2\}, B=\{1,3\} and C=\{1,4\}

then we have

P(A) = P(B) = P(C) = 1/2

and

P(A \cap B) = P(B \cap C) = P(A \cap C) = 1/4

so we have pairwise independence, but

1/8 = P(A)P(B)P(C) \ne P(A \cap B \cap C) = 1/4
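The counterexample is small enough to verify by direct enumeration, for instance in Python:

```python
from fractions import Fraction
from itertools import combinations

S = {1, 2, 3, 4}                         # equally likely outcomes
A, B, C = {1, 2}, {1, 3}, {1, 4}
P = lambda E: Fraction(len(E), len(S))

# pairwise independence holds for every pair ...
for X, Y in combinations([A, B, C], 2):
    assert P(X & Y) == P(X) * P(Y)

# ... but full independence fails for the triple
print(P(A & B & C), "vs", P(A) * P(B) * P(C))   # 1/4 vs 1/8
```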

Theorem (1.2.15)

Second Borel-Cantelli lemma

If A_1 , A_2 , .. are independent events such that \sum P(A_i ) = \infty then

P\left(\{A_n \text{ i.o.}\}\right) =1

proof By the same reasoning as in the first Borel-Cantelli lemma, the event that an infinite number of the A_i's occur is \limsup A_n, so we need to show that

P(\limsup A_n ) = 1

or equivalently

1-P(\limsup A_n ) = 0

Now

\begin{aligned} &1-P\left(\limsup A_i\right) = \\ &1-P\left(\{A_n \text{ i.o.}\}\right)=\\ &P\left(\{A_n \text{ i.o.}\}^c\right)=\\ &P \left( \left\{ \bigcap_{n=1}^\infty \bigcup_{i=n}^\infty A_i\right\}^c\right) = \\ &P \left( \bigcup_{n=1}^\infty \left\{\bigcup_{i=n}^\infty A_i\right\}^c\right) = \\ &P \left( \bigcup_{n=1}^\infty \bigcap_{i=n}^\infty A_i^c\right) = \\ &\lim_{n\rightarrow\infty} P \left( \bigcap_{i=n}^\infty A_i^c\right) = \\ &\lim_{n\rightarrow\infty} \prod_{i=n}^\infty P \left( A_i^c\right) = \\ &\lim_{n\rightarrow\infty} \prod_{i=n}^\infty \left[1-P(A_i)\right] \le \\ &\lim_{n\rightarrow\infty} \prod_{i=n}^\infty \left[1-P(A_i)+\frac{P(A_i)^2}{2!}-\frac{P(A_i)^3}{3!}\pm ...\right] = \\ &\lim_{n\rightarrow\infty} \prod_{i=n}^\infty \exp\{-P(A_i)\} =\\ &\lim_{n\rightarrow\infty} \exp\{-\sum_{i=n}^\infty P(A_i)\} =\\ &\lim_{n\rightarrow\infty} \exp\{-\infty\} =0\\ \end{aligned} because in a divergent sum the “tail” is always infinite.

Example (1.2.16)

Consider the following game: we start with an urn with one white ball and one black ball. Let

A_1 =\{\text{white ball drawn}\}

next we return the drawn ball to the urn, add a black ball, and let A_2 again be the event \{\text{white ball drawn}\}. We continue on like that.

Now P(A_n)=1/(n+1). Clearly the A_i’s are independent and

\sum P(A_n) = \sum 1/(n+1) = \infty

and therefore

P(\{A_n \text{ i.o.}\}) = 1

So no matter how many balls are already in the urn, at some point in the future we will again pick a white one.

Say in each round we add k black balls, then again we have

\sum P(A_n ) = 1/(1+1) + 1/(2+k) + 1/(2+2k) + ... = \sum 1/(2+nk) = \infty

for any k>0, so the same conclusion holds!

Now say that in each round we double the number of black balls added, so in the first round it is 1, then 2, 4, 8, etc. Then

\sum P(A_n ) = 1/(1+1) + 1/(2+2) + 1/(4+4) + ... = \sum 1/2^n < \infty

and so now, with probability 1, there is a time after which we never pick the white ball again.
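A simulation of the urn game contrasts the two regimes. This is a sketch with my own parameter names, where add_rule(r) gives the number of black balls added after round r:

```python
import random

random.seed(2)

def white_draws(add_rule, rounds=1000):
    """Play the urn game with one white ball and a growing number of
    black balls; return the rounds in which the white ball was drawn."""
    black, hits = 1, []
    for r in range(1, rounds + 1):
        if random.random() < 1 / (1 + black):    # P(white) = 1/(1 + #black)
            hits.append(r)
        black += add_rule(r)                     # add black balls for next round
    return hits

print(white_draws(lambda r: 1)[-3:])             # k = 1: late hits keep occurring
print(white_draws(lambda r: 2 ** r, rounds=60))  # doubling: only a few early hits
```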


If we take the two Borel-Cantelli lemmas together we have the following: Let \{A_n \} be a sequence of independent events, then

P(\{A_n \text{ i.o.}\}) = 0 \text{ or } 1

This is an example of a so-called 0-1 law, of which there are quite a few in probability theory. Here is one of the famous ones:

Definition (1.2.17)

Let \{A_n \} be an infinite sequence of events. An event B is called a tail event if knowing whether or not A_n occurred for each n determines B, but B is independent of any finite collection of A_n’s.

Example (1.2.18)

\{A_n \text{ i.o.}\} is a tail event.

Theorem (1.2.19)

Kolmogorov’s 0-1 law

Let B be a tail event of the sequence \{A_n \}. Then P(B) is either 0 or 1.