Definition
A random variable (r.v.) X is a function from the sample space S into \(\mathbb{R}\). For any set of real numbers \(A \subset \mathbb{R}\) we define the probability \(P(X \in A) = P(X^{-1}(A))\), where \(X^{-1}(A)\) is the set of all points in S that X maps into A.
Say we flip a fair coin three times. Let X be the number of “heads” in these three flips.
Now S = {(H,H,H), (H,H,T), …, (T,T,T)}.
X maps S into \(\mathbb{R}\), for example X((H,H,H))=3 and X((H,H,T))=2.
What is P(X=2)?
\(P(X=2) = P(X^{-1}(\{2\}))\) = P( all the outcomes in S that are mapped onto 2 ) = \(P(\{(H,H,T), (H,T,H), (T,H,H)\}) = 3/8\)
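As a quick sanity check, here is a minimal simulation sketch (not part of the original notes; the function name is illustrative) that estimates P(X = 2) empirically:

```python
import random

def count_heads(n_flips=3):
    """Flip a fair coin n_flips times; return the number of heads."""
    return sum(random.random() < 0.5 for _ in range(n_flips))

n_sim = 100_000
hits = sum(count_heads() == 2 for _ in range(n_sim))
print(hits / n_sim)  # should be close to 3/8 = 0.375
```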
There are some technical difficulties when defining a r.v. on a sample space like \(\mathbb{R}\): it turns out to be impossible to define it for every subset of \(\mathbb{R}\) without running into logical contradictions. In some (mathematically precise) sense there are too many subsets of \(\mathbb{R}\). The solution is to define a σ-algebra on the sample space and then define X only on that σ-algebra. We will ignore these technical difficulties.
There are two basic types of r.v.’s:
Definition
If X takes countably many values, X is called a discrete r.v.
If X takes uncountably many values, X is called a continuous r.v.
Almost everything to do with r.v.’s has to be done twice, once for discrete and once for continuous r.v.’s. This separation is artificial; it goes away once a more general definition of the integral is used (Riemann-Stieltjes or Lebesgue).
Definition
The (cumulative) distribution function (cdf) of a r.v. X is defined by \(F(x) = P(X \le x)\) \(\forall x \in \mathbb{R}\)
Say we roll a fair die until the first “Six” comes up. Let X be the number of rolls needed.
First note X\(\in\) {1,2,…}
Let \(A_i\) be the event “a six on the i-th roll”, i = 1, 2, 3, …. The rolls are independent, so for k = 1, 2, …
\[P(X > k) = P(A_1^c \cap \dots \cap A_k^c) = P(A_1^c) \cdots P(A_k^c) = (5/6)^k\]
so for \(k \le x < k+1\) we have \(F(x) = P(X \le x) = 1 - P(X > k) = 1 - (5/6)^k\) (and F(x) = 0 for x < 1).
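A minimal simulation sketch (illustrative, not part of the original notes) comparing the empirical cdf with this formula:

```python
import random

def rolls_until_six():
    """Roll a fair die until the first six; return the number of rolls."""
    n = 0
    while True:
        n += 1
        if random.randint(1, 6) == 6:
            return n

n_sim = 100_000
samples = [rolls_until_six() for _ in range(n_sim)]
for k in [1, 2, 5, 10]:
    empirical = sum(s <= k for s in samples) / n_sim
    print(k, empirical, 1 - (5/6)**k)  # empirical F(k) vs 1-(5/6)^k
```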
Proposition
Let F be the cdf of a rv X.
Then
1. \(0 \le F(x) \le 1\) \(\forall x \in \mathbb{R}\)
2. F is non-decreasing
3. F is right-continuous
4. \(F(x) \to 0\) as \(x \to -\infty\) and \(F(x) \to 1\) as \(x \to \infty\)
Theorem
Say F is a function with properties 1-4. Then there exists a random variable X which has F as its distribution function.
proof: VERY DEEP, based on Kolmogorov’s extension theorem
Definition
If X is a discrete r.v., the density of X is defined by
\(f(x) = P(X=x)\) \(\forall x \in \mathbb{R}\)
If X is a continuous r.v., the (probability) density function (pdf) of X is defined by
\(f(x) = F'(x)\)
wherever the derivative exists.
Say we roll a fair die until the first “Six” comes up. Let X be the number of rolls needed. Then the density of X is given by
\(f(x) = P(X=x) = \frac{1}{6}(5/6)^{x-1}\) if \(x \in \{1,2,\dots\}\), 0 otherwise.
Note that it follows from the definition and the axioms that for any density f we have \(0 \le f(x) \le 1\) for all x and \(\sum_x f(x) = 1\). For the example above:
\(f(x) = P(X=x) = 0\) if \(x \notin \{1, 2, \dots\}\)
\(f(x) = P(X=x) = \frac{1}{6}(5/6)^{x-1} \in [0,1]\) if \(x \in \{1, 2, \dots\}\)
and by the geometric series
\[\sum_{x=1}^{\infty} \frac{1}{6}(5/6)^{x-1} = \frac{1/6}{1 - 5/6} = 1\]
Let X be a continuous rv; then again it follows from the definition and the axioms that for any pdf f we have \(f(x) \ge 0\) for all x and \(\int_{-\infty}^{\infty} f(x)\,dx = 1\).
Let X be a continuous rv with density \(f(x) = \lambda e^{-\lambda x}\) if x > 0, 0 otherwise, where λ > 0.
Clearly \(f(x) \ge 0\) for all x, and
\[\int_{-\infty}^{\infty} f(x)\,dx = \int_0^{\infty} \lambda e^{-\lambda x}\,dx = \left[-e^{-\lambda x}\right]_0^{\infty} = 1\]
so f is indeed a pdf.
This r.v. X is called an exponential r.v. with rate λ. We use the notation X~Exp(λ)
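For later reference (it is used in the light bulb example below), the cdf of an exponential r.v. follows by integrating the pdf:
\[F(x) = P(X \le x) = \int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x},\ x > 0\]
and F(x) = 0 for x ≤ 0.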
Let X be a continuous rv with density \(f(x) = cx^{\lambda}\) if x > 1, 0 otherwise, where c is a constant (which depends on λ). For which values of λ is this a density, and what is c? We need
\[\int_1^{\infty} cx^{\lambda}\,dx = c\,\frac{x^{\lambda+1}}{\lambda+1}\Big|_1^{\infty} = -\frac{c}{\lambda+1} \quad \text{if } \lambda+1 < 0 \text{ (and } \infty \text{ otherwise)}\]
so it is a density iff λ < -1, and then setting the integral equal to 1 gives c = -(1+λ).
A random vector is a multi-dimensional random variable.
We roll a fair die twice. Let X be the sum of the rolls and let Y be the absolute difference between the two rolls. Then (X,Y) is a 2-dimensional random vector. The joint density of (X,Y) is given by:
X\Y | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
2 | 1 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 2 | 0 | 0 | 0 | 0 |
4 | 1 | 0 | 2 | 0 | 0 | 0 |
5 | 0 | 2 | 0 | 2 | 0 | 0 |
6 | 1 | 0 | 2 | 0 | 2 | 0 |
7 | 0 | 2 | 0 | 2 | 0 | 2 |
8 | 1 | 0 | 2 | 0 | 2 | 0 |
9 | 0 | 2 | 0 | 2 | 0 | 0 |
10 | 1 | 0 | 2 | 0 | 0 | 0 |
11 | 0 | 2 | 0 | 0 | 0 | 0 |
12 | 1 | 0 | 0 | 0 | 0 | 0 |
where every number is divided by 36.
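The table can be generated (and checked) by brute-force enumeration of the 36 equally likely outcomes; a minimal sketch, with illustrative variable names:

```python
from collections import Counter

# enumerate all 36 equally likely outcomes of two die rolls
counts = Counter()
for a in range(1, 7):
    for b in range(1, 7):
        x, y = a + b, abs(a - b)   # X = sum, Y = absolute difference
        counts[(x, y)] += 1

# joint density f(x,y) = counts[(x,y)] / 36
print(counts[(4, 0)] / 36)  # 1/36
print(counts[(7, 1)] / 36)  # 1/18

# marginals by summing over the other coordinate
fX, fY = Counter(), Counter()
for (x, y), c in counts.items():
    fX[x] += c / 36
    fY[y] += c / 36
print(fX[7], fY[1])  # 1/6 and 10/36, used in the independence check below
```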
All definitions are straightforward extensions of the one-dimensional case.
For a discrete random vector we have the density
\(f(x,y) = P(X=x, Y=y)\)
f(4,0) = P(X=4, Y=0) = P({(2,2)}) = 1/36
or
f(7,1) = P(X=7,Y=1) = P({(3,4),(4,3)}) = 1/18
Definition
Say (X,Y) is a discrete (continuous) r.v. with joint density (pdf) f. Then the marginal density (pdf) \(f_X\) is given by
\[f_X(x) = \sum_y f(x,y) \quad \text{or} \quad f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy\]
respectively, and similarly for \(f_Y\).
For the example above we find \(f_X(2) = f(2,0) + f(2,1) + \dots + f(2,5) = 1/36\) or \(f_Y(3) = 6/36\)
Let (X,Y) be a continuous random vector with joint pdf f(x,y) = cx, 0≤x<y≤1, 0 otherwise. Now
\[1 = \int_0^1 \int_0^y cx\,dx\,dy = \int_0^1 \frac{cy^2}{2}\,dy = \frac{c}{6}\]
so c = 6. Next we find the marginals of f:
\[f_X(x) = \int_x^1 6x\,dy = 6x(1-x),\ 0 < x < 1 \qquad f_Y(y) = \int_0^y 6x\,dx = 3y^2,\ 0 < y < 1\]
Note that the marginals are proper densities, for example \(f_X(x) \ge 0\) and
\[\int_0^1 6x(1-x)\,dx = 6\left(\tfrac{1}{2} - \tfrac{1}{3}\right) = 1\]
Definition
Let (X,Y) be a random vector with joint density (pdf) f(x,y). For any x such that \(f_X(x) > 0\) we can define the conditional r.v. Y|X=x.
If (X,Y) is discrete we find
\(P(Y=y|X=x) = \frac{P(Y=y, X=x)}{P(X=x)}\)
so the density of Y|X=x, denoted by \(f_{Y|X=x}(y|x)\), is given by
\[f_{Y|X=x}(y|x) = \frac{f(x,y)}{f_X(x)}\]
If (X,Y) is continuous the derivation is a little harder, but the same formula holds.
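Here is a sketch of that derivation (a standard argument, not spelled out in these notes): condition on X falling in a small interval around x and let the interval shrink,
\[P(Y \le y \mid x \le X \le x+\epsilon) = \frac{\int_x^{x+\epsilon} \int_{-\infty}^{y} f(s,t)\,dt\,ds}{\int_x^{x+\epsilon} f_X(s)\,ds} \longrightarrow \frac{\int_{-\infty}^{y} f(x,t)\,dt}{f_X(x)} \quad \text{as } \epsilon \to 0\]
and differentiating with respect to y gives \(f_{Y|X=x}(y|x) = f(x,y)/f_X(x)\).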
Say (X,Y) is a discrete rv with \(f(x,y) = c\,\alpha^{-xy}\) where α > 1, x, y = 1, 2, 3, …. Find the conditional densities and check that they are proper densities. First
\[f_X(x) = \sum_{y=1}^{\infty} c\,\alpha^{-xy} = \frac{c\,\alpha^{-x}}{1 - \alpha^{-x}}\]
so
\[f_{Y|X=x}(y|x) = \frac{c\,\alpha^{-xy}}{c\,\alpha^{-x}/(1-\alpha^{-x})} = (1 - \alpha^{-x})(\alpha^{-x})^{y-1}\]
but this is the density of a geometric rv with success probability \(p = 1 - \alpha^{-x}\), so it sums to 1 and is indeed a proper density. Moreover the pdf f(x,y) is symmetric in x and y, so the same calculation gives
\[f_{X|Y=y}(x|y) = (1 - \alpha^{-y})(\alpha^{-y})^{x-1}\]
Note that in this calculation we did not have to find the constant c! At least not if we don’t need to know the marginal of X.
For the continuous rv above, find \(f_{X|Y=y}(x|y)\) and \(f_{Y|X=x}(y|x)\). Using the joint pdf f(x,y) = 6x and the marginals found above:
\[f_{X|Y=y}(x|y) = \frac{6x}{3y^2} = \frac{2x}{y^2},\ 0 < x < y \qquad f_{Y|X=x}(y|x) = \frac{6x}{6x(1-x)} = \frac{1}{1-x},\ x < y < 1\]
so Y|X=x is uniform on (x, 1).
Conditional densities and pdf’s are again proper densities and pdf’s, for example
\(f_{X|Y=y}(x|y) = 2x/y^2 \ge 0\) for 0 < x < y and
\[\int_0^y \frac{2x}{y^2}\,dx = \frac{x^2}{y^2}\Big|_0^y = 1\]
Note that a conditional density (pdf) requires a specification of the value of the random variable on which we condition, something like \(f_{X|Y=y}\) for a fixed y. An expression like \(f_{X|Y}\) is not defined!
Say the rv (X,Y) has joint density f(x,y) = c if \(0 < x < y^p < 1\), 0 otherwise, for some p > 0. Find the conditional density of X|Y=y.
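One way to work this out (a sketch using the formula above): the marginal of Y is
\[f_Y(y) = \int_0^{y^p} c\,dx = c\,y^p,\ 0 < y < 1\]
so
\[f_{X|Y=y}(x|y) = \frac{f(x,y)}{f_Y(y)} = \frac{c}{c\,y^p} = \frac{1}{y^p},\ 0 < x < y^p\]
that is, X|Y=y is uniform on \((0, y^p)\). Note that again the constant c cancels.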
Definition
Two r.v.’s X and Y are said to be independent iff
\(f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y)\) for all x, y
We write this here as a definition but it actually follows straightforwardly from the definition of independence of events.
In the discrete example above we found f(7,1) = 1/18 but \(f_X(7) \cdot f_Y(1) = \frac{1}{6} \cdot \frac{10}{36} = \frac{5}{108}\), so X and Y are not independent.
For the continuous example we have
\(f(x,y) = 6x \ne 6x(1-x) \cdot 3y^2 = f_X(x) \cdot f_Y(y)\)
so X and Y are not independent
Mostly the concept of independence is used in reverse: we assume X and Y are independent (based on good reason!) and then make use of the formula \(f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y)\):
Say it is known that the lifetime of a light bulb has an exponential distribution with rate λ = 1/500 (per day). We have 2 such light bulbs in a lamp. What is the probability that at least one light bulb is still burning after 3 years?
Let the rv (X,Y) be the lifetimes of the 2 light bulbs. Then if the lifetimes are independent the joint density is given by
\[f(x,y) = \lambda e^{-\lambda x} \cdot \lambda e^{-\lambda y} = \lambda^2 e^{-\lambda(x+y)},\ x, y > 0\]
Now, with 3 years = 1095 days and using the exponential cdf from above,
\[P(\max(X,Y) > 1095) = 1 - P(X \le 1095)\,P(Y \le 1095) = 1 - \left(1 - e^{-1095/500}\right)^2 \approx 0.211\]
By the way, do you think the assumption of independence is actually justified here?
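As a sanity check, a quick Monte Carlo sketch (illustrative script, not part of the notes):

```python
import random

rate = 1 / 500          # per day
days = 3 * 365          # three years

n_sim = 100_000
hits = 0
for _ in range(n_sim):
    x = random.expovariate(rate)   # lifetime of bulb 1
    y = random.expovariate(rate)   # lifetime of bulb 2, assumed independent
    if max(x, y) > days:
        hits += 1
print(hits / n_sim)  # should be close to 0.211
```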
Notation: we will use the notation X \(\perp\) Y if X and Y are independent