(A little bit of) Measure Theory

Basics

This course is taught at what one might call the Master's degree level. There is, however, a deeper one, say the PhD level. The difference is that we will avoid any discussion of measure theory, except for today!

A measure in mathematics is a systematic way of assigning a number to a set, a number we should think of as its size. Reasonable requirements for such a measure are that it should never be negative and that the size of the union of two disjoint sets should be the sum of the individual sizes. If we added the requirement that the size of the whole space equal one, we would already have the Kolmogorov axioms!

Why is a discussion of measure theory necessary? Recall that a probability is defined as a function that assigns a number to events, that is, a function from the set of all subsets of the sample space to the real line. Now from Cantor's theory of transfinite sets we know that the set of subsets of a set is always larger than the set itself; that is, there cannot be a 1-1 transformation from one to the other. For example, if a set has n elements, the set of all subsets has \(2^n>n\) elements. If a set is countably infinite (say the natural numbers), then the set of all its subsets has the cardinality of the real numbers, and so is uncountable.
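The finite case is easy to check by brute force. Here is a short Python sketch (an illustration of ours, not part of the lecture; the helper name `power_set` is made up) that enumerates all subsets of a small set and confirms there are \(2^n\) of them:

```python
from itertools import chain, combinations

def power_set(s):
    """All subsets of s, as tuples, smallest first."""
    items = list(s)
    return list(chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)))

subsets = power_set({1, 2, 3})
print(len(subsets))  # 2**3 = 8, strictly more than the 3 elements
```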

Now what is the cardinality of the set of all subsets of the real numbers? It has to be even larger than that of the real numbers themselves!

This leads to the following problem: there are just too many subsets of the real numbers. It is not possible to define a measure (that is, a probability) on each of them without running into contradictions.

The solution is (in principle) obvious: define the measure only on a collection of subsets. The hard part is to find a collection that is small enough to work but large enough to be useful. One such collection was proposed by Émile Borel: start with the open intervals \((a,b)\), where \(-\infty< a < b< \infty\), and take all countable unions, countable intersections, and complements of such intervals, repeating these operations as often as needed. A collection of sets closed under these operations is called a \(\sigma\)-algebra or \(\sigma\)-field.

So this \(\sigma\)-algebra includes (for example) the interval \((0, \infty)=\cup_{i=1}^\infty (0,i)\) or the interval \((0, 1]=\cap_{i=1}^\infty (0,1+1/i)\).
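When \(\Omega\) is finite, the \(\sigma\)-field generated by a collection of sets can be computed explicitly: countable unions reduce to finite ones, so one only has to close the collection under complements and unions. A Python sketch (our own illustration; the function name is invented), starting from the sets \(\{1\}\) and \(\{2\}\) inside \(\Omega=\{1,2,3,4\}\):

```python
def generated_sigma_algebra(omega, generators):
    """Close a collection of subsets of a finite omega under
    complementation and union (= countable union here)."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for a in current:
            c = omega - a                 # complement
            if c not in family:
                family.add(c); changed = True
            for b in current:
                u = a | b                 # union
                if u not in family:
                    family.add(u); changed = True
    return family

F = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {2}])
print(len(F))  # 8: all unions of the atoms {1}, {2}, {3,4}
```

Note that intersections come for free: an intersection is the complement of a union of complements.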

Notice that the cardinality of the Borel \(\sigma\)-algebra is that of the real line (why?).

This even works when one considers spaces other than the real numbers, by replacing the concept of an open interval with that of an open set, essentially a set that contains none of its boundary points.

The resulting collection is called a Borel \(\sigma\)-algebra, and it is possible to define a consistent measure on it. Moreover, it turns out to be large enough to yield an interesting theory!

Now we have the following definition: A probability space is a triple \((\Omega, \mathcal{F} , P)\), where \(\Omega\) is the set of outcomes, \(\mathcal{F}\) is a \(\sigma\)-field on \(\Omega\), and \(P\) is a probability measure defined on the sets in \(\mathcal{F}\).

Example

\(\Omega=\{1,2, .., N\}\), \(\mathcal{F}\) is the set of all subsets of \(\Omega\), and \(P(\{i\})=p_i\) such that \(p_i\ge0\) and \(\sum p_i=1\).
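A minimal Python sketch of this discrete probability space (our illustration; the helper name is invented, and indices run from 0 rather than 1 for convenience):

```python
def make_discrete_measure(p):
    """P for Omega = {0, ..., N-1} with point masses p[0], ..., p[N-1]."""
    assert all(pi >= 0 for pi in p)
    assert abs(sum(p) - 1.0) < 1e-12
    def P(event):
        # F is the set of ALL subsets, so any event is allowed
        return sum(p[i] for i in event)
    return P

P = make_discrete_measure([0.5, 0.25, 0.25])
print(P({0, 1}))  # 0.5 + 0.25 = 0.75
```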

Example

\(\Omega=[0,1]\), \(\mathcal{F}\) is the Borel \(\sigma\)-field, and \(P((a,b))=b-a\).

This of course is the Lebesgue measure on [0,1].

Random Variables

Next we can introduce random variables as follows: say we have a probability space \((\Omega, \mathcal{F} , P)\). A random variable \(X\) is a function \(X:\Omega\rightarrow \mathcal{R}\) such that for any Borel set \(B\subset \mathcal{R}\) we have

\[X^{-1}(B) =\{\omega\in\Omega: X(\omega)\in B\} \in \mathcal{F}\]

If X is a random variable on \((\Omega, \mathcal{F} , P)\), then X induces a probability measure on \(\mathcal{R}\), called the distribution of X, by setting \(\mu(A)=P(X\in A)\) for any Borel set A.

Example

\(\Omega=[0,1]\), \(\mathcal{F}\) is the Borel \(\sigma\)-field, and \(P((a,b))=b-a\). Define \(X(\omega)=\omega\), so

\[ P(X^{-1}((a,b))) = P((a,b))=b-a \]

so \(X\sim U[0,1]\)!

Example

\(\Omega=[0,1]\), \(\mathcal{F}\) is the Borel \(\sigma\)-field, and \(P((a,b))=b-a\). Define \(X(\omega)=-\log \omega\), so

\[ \begin{aligned} P(X^{-1}((a,b))) &= P(\{\omega\in [0,1]: a<X(\omega)<b\}) \\ &= P(\{\omega\in [0,1]: a<-\log \omega<b\}) \\ &= P(\{\omega\in [0,1]: -b<\log \omega<-a\}) \\ &= P(\{\omega\in [0,1]: e^{-b}< \omega<e^{-a} \}) \\ &= P((e^{-b},e^{-a})) = e^{-a}-e^{-b} \end{aligned} \]

so \(X\sim Exp(1)\)!
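This calculation is exactly the inverse-transform idea, and it is easy to check by simulation. A Python sketch (our own illustration; the sample size and interval are chosen arbitrarily):

```python
import math
import random

random.seed(1)
n = 100_000
# omega ~ U(0,1); using 1 - random.random() keeps omega strictly
# positive, so log(omega) is always defined
samples = [-math.log(1.0 - random.random()) for _ in range(n)]

# P(a < X < b) should be close to e^{-a} - e^{-b}
a, b = 0.5, 1.5
emp = sum(a < x < b for x in samples) / n
theory = math.exp(-a) - math.exp(-b)
print(emp, theory)  # the two numbers should nearly agree
```

The empirical mean of the samples should also be close to 1, the mean of an Exp(1) random variable.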

Independence

We originally defined independence of two events A and B by \(P(A\cap B)=P(A)P(B)\). Then we extended this to the independence of two random variables. We can now extend it once more, to the independence of two \(\sigma\)-fields \(\mathcal{F}\) and \(\mathcal{G}\): they are independent if any two sets \(A\in\mathcal{F}\) and \(B\in\mathcal{G}\) are independent events.
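A concrete finite example (our own illustrative setup, not from the lecture): on the product space of two fair dice with the uniform measure, any event that depends only on the first coordinate is independent of any event that depends only on the second, so the two coordinate \(\sigma\)-fields are independent.

```python
import itertools

# Omega = all ordered pairs of two fair dice rolls, uniform measure
omega = set(itertools.product(range(1, 7), repeat=2))

def P(event):
    """Uniform probability of an event (a subset of omega)."""
    return len(event) / len(omega)

A = {w for w in omega if w[0] <= 2}       # first die shows 1 or 2
B = {w for w in omega if w[1] % 2 == 0}   # second die is even

# A lies in the sigma-field generated by the first coordinate, B in the
# one generated by the second; the product rule holds:
print(P(A & B), P(A) * P(B))  # both equal 1/6
```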

Expectation

Say we have a probability space \((\Omega, \mathcal{F} , P)\) and a random variable X. Then the expected value of X is defined by

\(E[X]=\int X dP\)

What is \(\int X dP\)? To define this would take us too far into measure theory; suffice it to say that (in simple cases) it turns out to be a sum if X is discrete and a Riemann integral if X is continuous.
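A Python sketch of both simple cases (our illustration; the distributions and the function \(X(\omega)=\omega^2\) are chosen arbitrarily): for a discrete X the integral reduces to a weighted sum, and for a continuous X on \(([0,1],\mathcal{F},P)\) with P the Lebesgue measure it reduces to an ordinary integral, approximated here by a Riemann sum.

```python
# Discrete case: E[X] = sum over outcomes of value * probability.
values = [0, 1, 2]
probs = [0.25, 0.5, 0.25]
e_discrete = sum(x * p for x, p in zip(values, probs))

# Continuous case: for X(omega) = omega**2 on [0,1] with Lebesgue
# measure, E[X] = integral of omega**2 d(omega) = 1/3, approximated
# by a midpoint Riemann sum.
n = 100_000
e_riemann = sum(((i + 0.5) / n) ** 2 for i in range(n)) / n
print(e_discrete, e_riemann)  # 1.0 and roughly 1/3
```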