
Functions of a Random Variable - Transformations

Example (1.11.1)

say X\sim U[0,1] and \lambda>0. We want to find the pdf of the random variable Y=-\lambda\log(X)

Let’s first find the cdf and then the pdf as follows:

\begin{aligned} &F_Y(y)=P(Y\le y)=P(-\lambda\log X\le y) = \\ &P(\log X> -y/\lambda)=P(X>e^{-y/\lambda}) = \\ &1-P(X\le e^{-y/\lambda})=1-F_X(e^{-y/\lambda})=1-e^{-y/\lambda}\\ &\\ &f_Y(y)=\frac{dF_Y(y)}{dy}=\frac{d}{dy}\left[1-e^{-y/\lambda}\right]=e^{-y/\lambda}/\lambda \end{aligned}

if y>0. For y<0 note that P(Y\le y)=P(-\lambda\log X\le y)=0, because 0<X<1 means \log X<0, so -\lambda\log X>0 always.
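We can check this result with a quick simulation, in the same style as the examples further below. This is just a sketch; \lambda=2 is an arbitrary choice, and the derived density e^{-y/\lambda}/\lambda is the exponential density with rate 1/\lambda.

n=1e4
lambda=2
# transform uniforms as in the example
y=-lambda*log(runif(n))
hist(y, 100, freq=FALSE, main="")
# derived density exp(-y/lambda)/lambda
curve(exp(-x/lambda)/lambda, 0, max(y), add=TRUE, col="blue", lwd=2)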


This is an example of a function (or transformation) of a random variable. These transformations play a major role in probability and statistics. Actually, we already had a simple example before: if X\sim U[0,1] and A>0, then Y=AX+B\sim U[B,A+B].

If X is some random vector and Y=f(X), then usually the question is what is the density of Y. We will see how to find the density by working through a number of typical examples.

Example (1.11.1a)

Say X is a random variable with P(X=k)=\frac1{2N+1};k\in\{-N,-N+1,...,0,...,N\} for some N\ge 1.

  1. Let 0\le M\le N and Y=I_{\{-M,...,M\}}(X). Now Y\in\{0,1\} and

\begin{aligned} &P(Y=1)=P(-M\le X\le M)=\frac{2M+1}{2N+1}\\ &P(Y=0)=1-P(Y=1)=1-\frac{2M+1}{2N+1}=\frac{2(N-M)}{2N+1} \end{aligned}

  2. Let Y=|X| (see the simulation check after this list). Now Y\in\{0,1,2,...,N\} and

\begin{aligned} &P(Y=0)=P(X=0)=\frac1{2N+1}\\ &P(Y=k)=P(X=k \text{ or } X=-k)=\frac2{2N+1};k=1,...,N \end{aligned}

  3. Let Y=X^2. Now Y\in\{0,1,4,...,N^2\} and

\begin{aligned} &P(Y=0)=P(X=0)=\frac1{2N+1}\\ &P(Y=k^2)=P(X=k \text{ or } X=-k)=\frac2{2N+1};k=1,...,N \end{aligned}
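As a quick sanity check of part 2, here is a small simulation sketch (N=5 is an arbitrary choice): sample X uniformly from \{-N,...,N\} and compare the relative frequencies of Y=|X| with 1/(2N+1) and 2/(2N+1).

n=1e4
N=5
# X uniform on {-N,...,N}
x=sample(-N:N, size=n, replace=TRUE)
y=abs(x)
# empirical vs theoretical probabilities for Y=0,1,...,N
emp=as.numeric(table(y)/n)
theo=c(1, rep(2, N))/(2*N+1)
round(rbind(emp, theo), 3)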

Example (1.11.2)

say X is the number of rolls of a fair die until the first six. We have already seen that P(X=x)=\frac16\left(\frac56\right)^{x-1}, x=1,2,... Let Y be 1 if X is even and 0 if X is odd.

Note: here both X and Y are discrete.

Let’s do this a little more generally, with p instead of 1/6. Also let q=1-p. Then

\begin{aligned} &P(Y=0)=P(X\in\{1,3,5,..\}) = \\ &\sum_{k=0}^\infty pq^{(2k+1)-1} = p\sum_{k=0}^\infty q^{2k} = \\ &p\sum_{k=0}^\infty (q^2)^k = p\frac1{1-q^2} = \\ &p\frac1{(1+q)(1-q)} = \frac1{1+q} \end{aligned} so P(Y=0)=1/(1+5/6)=6/11 and P(Y=1)=1-P(Y=0)=5/11.
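Again this can be verified with a short simulation (a sketch; note that rgeom in R counts the failures before the first success, so X = rgeom(n, 1/6)+1 matches the definition used here):

n=1e5
# number of rolls until the first six
x=rgeom(n, 1/6)+1
# proportion of odd X, compared with 6/11
c(mean(x %% 2 == 1), 6/11)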

Example (1.11.3)

say X is a random variable with P(X=x)=p(1-p)^{x-1}, x=1,2,..., and again write q=1-p. Let Y = X \mod r, that is Y is the remainder on division by r. For example 5\mod 2=1 and 10 \mod 5=0.

Note: here both X and Y are discrete.

First note that Y\in \{0,.., r-1 \}. Note that x\mod r = k means that x=nr+k for some n. Let k\in \{1,.., r-1 \}, then

\begin{aligned} &P(Y=k) =P(X\mod r = k) =\\ &P(X=nr+k; n=0, 1, ... ) = \\ &\sum_{n=0}^\infty pq^{nr+k-1} = \\ &p\sum_{n=0}^\infty (q^r)^n q^{k-1} = \\ &pq^{k-1}\sum_{n=0}^\infty (q^r)^n = \frac{pq^{k-1}}{1-q^r} \end{aligned}

The case k=0 is a bit different because X=0 is not possible:

\begin{aligned} &P(Y=0) =P(X\mod r = 0) =\\ &P(X=nr; n=1, 2,... ) = \\ &\sum_{n=1}^\infty pq^{nr-1} = \\ &\frac{p}q \sum_{n=1}^\infty q^{nr} = \\ &\frac{p}q\left(\frac{q^r}{1-q^r}\right) = \\ &\frac{pq^{r-1}}{1-q^r} \\ \end{aligned}

Let’s make sure this is a proper density:

\begin{aligned} &\sum_{k=0}^{r-1} f_Y(k) = \\ &\frac{pq^{r-1}}{1-q^r} + \sum_{k=1}^{r-1} \frac{pq^{k-1}}{1-q^r} = \\ &\frac{p}{1-q^r}\left(q^{r-1}+ \sum_{k=1}^{r-1} q^{k-1} \right) = \\ &\frac{p}{1-q^r}\left(q^{r-1}+ \sum_{k=0}^{r-2} q^{k} \right) = \\ &\frac{p}{1-q^r}\left(q^{r-1}+ \frac{1-q^{r-1}}{1-q} \right) = \\ &\frac{p}{1-q^r}\left(\frac{(1-q)q^{r-1}+1-q^{r-1}}{1-q} \right) = \\ &\frac{p}{1-q^r}\left(\frac{q^{r-1}-q^{r}+1-q^{r-1}}{1-q} \right) =1 \\ \end{aligned} Good!
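Here is a small simulation sketch checking the formula; p=0.3 and r=4 are arbitrary choices.

n=1e5
p=0.3; q=1-p; r=4
# geometric on {1,2,...}: rgeom counts failures, so add 1
x=rgeom(n, p)+1
y=x %% r
# empirical probabilities for k=0,1,..,r-1
emp=as.numeric(table(y)/n)
# theoretical: p*q^(r-1)/(1-q^r) for k=0, p*q^(k-1)/(1-q^r) for k=1,..,r-1
theo=c(p*q^(r-1), p*q^(0:(r-2)))/(1-q^r)
round(rbind(emp, theo), 3)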

Example (1.11.4)

say X is a continuous r.v with pdf

f_X (x) = \frac12 \exp(-|x|) \text{, } x \in \mathbb{R}. Let Y=I_{[-1,1]}(X).

Note: here X is continuous and Y is discrete.

\begin{aligned} &P(Y=1) = P(-1\le X\le 1)=\\ & \int_{-1}^1 \frac12 \exp(-|x|)dx = \\ & 2\int_{0}^1 \frac12 \exp(-x)dx = \\ &-\exp(-x)\vert_0^1 = 1-e^{-1}\\ &P(Y=0)=1-P(Y=1)=e^{-1} \end{aligned}


In the last few examples Y was discrete, and so we could find the density f_Y(y)=P(Y=y) directly. Now we turn to cases where Y is continuous. As in example (1.11.1), we usually first have to find the cdf F_Y and then the density f_Y:

Example (1.11.5)

Let X\sim U[0,1] and Y=X^n for n=2,3,....

Clearly F_Y(y)=0 for y<0 and F_Y(y)=1 if y>1. So let 0<y<1. Then

\begin{aligned} &F_Y(y) =P(Y<y) = P(X^n<y) =\\ &P(X< y^{1/n}) =y^{1/n} \\ &f_Y(y) = \frac{d}{dy}\left[y^{1/n}\right] = \frac1n y^{1/n-1} \\ \end{aligned}
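A quick check by simulation; this is a sketch with the power set to 3 (called k in the code so it does not clash with the sample size n):

n=1e4
k=3
# Y = X^k for X uniform on [0,1]
y=runif(n)^k
hist(y, 100, freq=FALSE, main="")
# derived density (1/k) y^(1/k-1)
curve((1/k)*x^(1/k-1), 0, 1, add=TRUE, col="blue", lwd=2)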

Example (1.11.6)

Let X have pdf f_X (x) = \frac12 \exp(-|x|). Let Y =X^2. Then for y<0 we have P(Y \le y) = 0. So let y>0. Then

\begin{aligned} &F_Y(y) =P(Y\le y)= \\ &P(X^2\le y) = \\ &P(-\sqrt{y}\le X\le \sqrt{y}) = \\ &\int_{-\sqrt y}^{\sqrt y}\frac12 \exp(-|x|)dx = \\ &\int_{0}^{\sqrt y} \exp(-x)dx = \\ &-\exp({-x})|_0^{\sqrt y}=1-e^{-\sqrt y}\\ &\\ &f_Y(y)=-e^{-\sqrt y}\frac{-1}{2\sqrt y}=\frac{1}{2\sqrt y}e^{-\sqrt y};y>0 \end{aligned}
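We can check this one as well. This is a sketch: since the density \frac12 e^{-|x|} is symmetric around 0 and exponential in |x|, we can generate X as a random sign times an exponential with rate 1, and then compare the empirical cdf of Y=X^2 with 1-e^{-\sqrt y} at a few points.

n=1e5
# X has density exp(-|x|)/2: random sign times an exponential(1)
x=sample(c(-1, 1), n, replace=TRUE)*rexp(n, 1)
y=x^2
z=c(0.25, 1, 4)
emp=sapply(z, function(t) mean(y<=t))
round(rbind(emp, theo=1-exp(-sqrt(z))), 3)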


The last two examples were easy because solving the inequality g(x)\le y for x was easy. Let’s now do an example where that is not the case:

Example (1.11.7)

Let X \sim U[0,2], and let Y=\sin(2 \pi X).

First of course we always have -1 \le \sin(x) \le 1 and therefore F_Y (y)=0 if y<-1 and F_Y(y)=1 if y>1.

Now if -1<y<1 we have

P(Y \le y)=P(\sin(2 \pi X) \le y)

and so the hard part is solving the inequality

\sin(2 \pi X) \le y

The points where we have \sin(2 \pi x) = y are of course given by \arcsin; one solution is x=\arcsin(y)/(2\pi). Let a=\arcsin(y)/(2\pi) and note that \arcsin(-y)=-\arcsin(y).

It is often a good idea to draw a simple example. Let’s consider the case y=-0.3, then \arcsin(-0.3)/(2\pi) = -0.0485 and this looks like this

y=-0.3 is the thin horizontal red line, and the values of x with \sin(2\pi x)\le y are the two “dips”, one between about 0.5 and 1 and the other between 1.5 and 2 or so. More precisely we find

\begin{aligned} &P(Y\le y) =P(\sin(2\pi X)\le y)= \\ &P(1/2-a<X<1+a \bigcup 3/2-a<X<2+a) = \\ &P(1/2-a<X<1+a)+P(3/2-a<X<2+a) = \\ &\left[1+a-(1/2-a)\right]/2+\left[(2+a)-(3/2-a)\right]/2=\\ &\left[1/2+2a+2a+1/2\right]/2=2a+1/2 \end{aligned}

Now if 0<y<1, \sin(2\pi x)\le y happens on three intervals:

and so

\begin{aligned} &P(Y\le y) =P(\sin(2\pi X)\le y)= \\ &P(0<X<a \bigcup 1/2-a<X<1+a \bigcup 3/2-a<X<2) = \\ &P(0<X<a)+P(1/2-a<X<1+a)+P(3/2-a<X<2) = \\ &a/2+\left[1+a-(1/2-a)\right]/2+\left[2-(3/2-a)\right]/2=\\ &a/2+(1/2+2a)/2+(1/2+a)/2=2a+1/2 \end{aligned}

and that is the same as before! So now

f_Y(y) =F_Y'(y) = \frac{d}{dy} \left\{\arcsin(y)/\pi+\frac12\right\} = \frac1{\pi \sqrt{1-y^2}}

if |y|<1

Notice that

\begin{aligned} &\lim_{y \rightarrow -1}F_Y(y) = \lim_{y \rightarrow -1} \left[ \arcsin(y)/\pi+1/2 \right] = \frac{-\pi/2}{\pi}+1/2=0\\ &\lim_{y \rightarrow 1}F_Y(y) = \lim_{y \rightarrow 1} \left[ \arcsin(y)/\pi+1/2 \right] = \frac{\pi/2}{\pi}+1/2 = 1 \end{aligned}

and so F_Y is a proper cdf and f_Y a proper density!
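This density can also be checked with a quick simulation sketch:

n=1e5
x=runif(n, 0, 2)
y=sin(2*pi*x)
hist(y, 100, freq=FALSE, main="")
# derived density 1/(pi*sqrt(1-y^2)) for |y|<1
curve(1/(pi*sqrt(1-x^2)), -0.99, 0.99, add=TRUE, col="blue", lwd=2)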


Here is a transformation that is quite useful:

Lemma

Let X be a continuous rv with cdf F, and F is strictly increasing. Then Y=F(X) \sim U[0,1].

proof

If X is a continuous rv with a cdf F which is strictly increasing, then F^{-1} exists on (0,1), and so

F_{Y}(x) = P(F(X) \le x) = P(X \le F^{-1}(x)) =F(F^{-1}(x)) = x

for 0<x<1, and this is exactly the cdf of a uniform [0,1].
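As an illustration, here is a sketch using an exponential rv with rate 1, whose cdf F(x)=1-e^{-x} is strictly increasing: applying F to the simulated data should give something that looks uniform on [0,1].

n=1e4
x=rexp(n, 1)
# apply the cdf F(x) = 1 - exp(-x)
u=1-exp(-x)
hist(u, 100, freq=FALSE, main="")
# the U[0,1] density
abline(h=1, col="blue", lwd=2)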

And this can be generalized even a bit more:

Theorem (1.11.8)

Probability Integral Transform

Let X be a continuous rv with cdf F. Then F(X) \sim U[0,1]

proof

Let F be the cdf and define the generalized inverse function F^{*} by

F^*(x) = \min\{t : F(t) \ge x \}

First note that if F is strictly increasing we have F^{*}=F^{-1}.

Moreover we always have F(F^{*}(x))=x. This is easiest to see with a graph:

take for example x=0.5, then F(0.5)=0.5 and F^{*}(0.5)=0.3. Finally F(F^{*}(0.5))=F(0.3)=0.5

So now

F_{F(X)}(x) = P(F(X) \le x) = P(X \le F^{*}(x)) =F(F^{*}(x)) = x

Notice that unlike a regular inverse we don’t necessarily have F^*(F(x))=x, but for our proof we don’t need that!

The “reverse” of this theorem is useful for simulating data from some distribution F as long as F has an inverse:

Example (1.11.8a)

Say we want to generate data from a random variable X with density f(x)=ax^{a-1};0<x<1 and a>0. Now

F(x)=\int_0^x at^{a-1}dt = t^a|_0^x=x^a. Let U\sim U[0,1], then

F_{F^{-1}(U)}(x)=P(F^{-1}(U)\le x)=P(U\le F(x))=F(x) and so F^{-1}(U)\sim F!

Now

\begin{aligned} &y=x^a\rightarrow x=y^{1/a} \\ &F^{-1}(x) = x^{1/a} \end{aligned} and

n=1e4
a=3
# inverse cdf method: if U~U[0,1], then F^{-1}(U)=U^(1/a) has density a*x^(a-1)
x=runif(n)^(1/a)
hist(x, 100, freq=FALSE,main="")
# true density for a=3
curve(3*x^2, 0, 1, add=TRUE, col="blue",lwd=2)

Say we want to generate data from the density f(x)=6x(1-x);0<x<1:

\begin{aligned} &F(x) =\int_0^x 6t(1-t) dt = \\ &3t^2-2t^3|_0^x = 3x^2-2x^3\\ &y=3x^2-2x^3 \\ &2x^3-3x^2+y=0 \end{aligned} Now we need to solve a cubic equation. In general such an equation can have up to three real solutions, but in our case there is always exactly one in the interval (0,1):

# plot 2x^3-3x^2+y for several values of y: exactly one root in (0,1)
par(mfrow=c(2,2))
curve(2*x^3-3*x^2+0.1,0,1,lwd=2);abline(h=0)
curve(2*x^3-3*x^2+0.33,0,1,lwd=2);abline(h=0)
curve(2*x^3-3*x^2+0.66,0,1,lwd=2);abline(h=0)
curve(2*x^3-3*x^2+0.9,0,1,lwd=2);abline(h=0)

and the easiest way to find the root is using R:

n=1e4
x=rep(0, n)
for(i in 1:n) {
   # solve 2x^3-3x^2+u=0 for a uniform u; polyroot wants the coefficients
   # in increasing order of the powers
   tmp=Re(polyroot(c(runif(1), 0, -3, 2)))
   # keep the root that lies in (0,1)
   x[i]=tmp[tmp>0&tmp<1]
}   
hist(x, 100, freq=FALSE,main="")
curve(6*x*(1-x), 0, 1, add=TRUE, col="blue",lwd=2)


Functions of Random Vectors:

Example (1.11.9)

Say (X,Y) is a discrete rv with joint pdf f_{X,Y}(x,y) given here:

x\y    1       2
1      1/10    1/10
2      1/10    1/2
3      1/10    1/10

Say U=2X-Y.

X takes values 1,2,3, so 2X takes values 2,4,6, so 2X-Y takes values 0, 1, 2, 3, 4 and 5. Now

\begin{aligned} &f_U(0)=P(U=0) =P(X=1, Y=2) = 1/10 \\ &f_U(1)=P(U=1) =P(X=1, Y=1) = 1/10 \\ &f_U(2)=P(U=2) =P(X=2, Y=2) = 1/2 \\ &f_U(3)=P(U=3) =P(X=2, Y=1) = 1/10 \\ &f_U(4)=P(U=4) =P(X=3, Y=2) = 1/10 \\ &f_U(5)=P(U=5) =P(X=3, Y=1) = 1/10 \\ \end{aligned}
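A short simulation sketch: we can sample (X,Y) from the table using sample() on the six cells with the given probabilities and compare the relative frequencies of U=2X-Y with the values above.

n=1e5
# the six cells (x,y) of the table and their probabilities
# rows of xy are (1,1),(2,1),(3,1),(1,2),(2,2),(3,2)
xy=expand.grid(x=1:3, y=1:2)
probs=c(1/10, 1/10, 1/10, 1/10, 1/2, 1/10)
i=sample(1:6, n, replace=TRUE, prob=probs)
u=2*xy$x[i]-xy$y[i]
round(table(u)/n, 3)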

Example (1.11.9a)

We roll two fair dice. Let X be the blue and Y the red die. Let S=X+Y. Now we have previously found

P(S=k)=\frac{k-1}{36};k=2,...,7\\P(S=k)=\frac{13-k}{36};k=8,...,12\\

Example (1.11.10)

Say (X,Y) is a discrete rv with joint pdf f_{X,Y}(x,y)=(1-p)^2p^x; x,y \in \{0,1,..\}, y \le x, 0<p<1. Let U=I(X=Y).

\begin{aligned} &P(U=1) =P(X=Y) = \\ &\sum_{x=0}^\infty (1-p)^2p^x = \\ &(1-p)^2\frac1{1-p} =1-p \\ &P(U=0)=1-P(U=1)=p \end{aligned}

Example (1.11.11)

Say (X,Y) is a discrete rv with joint pdf f_{X,Y}(x,y)=(1-p)^2p^x; x,y \in \{0,1,..\}, y \le x, and 0<p<1. Let U=X and V=X-Y.

First what are the possible values of (U,V)? We have u= x \in \{0,1,..\} and y \le x or 0 \le x-y=v and so v \in \{0,1,..\}.

Also v=x-y=u-y \le u because y \ge 0.

Now for any (u,v) \in \{0,1,..\} with v \le u we have

\begin{aligned} &f_{U,V}(u,v) = \\ &P(U=u,V=v) = \\ &P(X=u,X-Y=v) = \\ &P(X=u,u-Y=v) = \\ &P(X=u,Y=u-v) = \\ &(1-p)^2p^u \end{aligned}

So we see that f_{U,V}(u,v)=f_{X,Y}(u,v), or (X,Y) has the same distribution as (U,V)!


Next we turn to continuous random vectors:

Before we go on let’s generalize the first example above, where we had X \sim U[0,1], \lambda>0 and Y=-\lambda \log(X). Let’s say we have Y=g(X), where g is strictly increasing. Then g^{-1} exists and is also strictly increasing. Therefore

\begin{aligned} &F_Y(y) =P(Y\le y) = P(g(X)\le y) = \\ &P(X< g^{-1}(y)) = \\ &\int_{-\infty}^{g^{-1}(y)} f_X(x)dx = \\ &\\ &f_Y(y) = \frac{d}{dy} \left[\int_{-\infty}^{g^{-1}(y)} f_X(x)dx\right] =\\ &f_X(g^{-1}(y)) \frac{d}{dy}g^{-1}(y) \end{aligned} If g is strictly decreasing the same argument, now with 1-F_X, gives f_Y(y) = f_X(g^{-1}(y)) \left|\frac{d}{dy}g^{-1}(y)\right|; this is the case that applies to example (1.11.1), where g(x)=-\lambda\log x is decreasing.

Example (1.11.12)

Say X has density f_X(x) = 3x^2;0<x<1 and Y=\sqrt X, so g(x)=\sqrt x, g^{-1}(y)=y^2 and \frac{d}{dy}g^{-1}(y)=2y. Therefore

f_Y(y) = 3(y^2)^2\times 2y=6y^5;0<y<1
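A quick check, using the inverse cdf method from above to generate X (here F_X(x)=x^3, so X=U^{1/3}):

n=1e4
# X has density 3x^2 on (0,1): F(x)=x^3, so X=U^(1/3)
x=runif(n)^(1/3)
y=sqrt(x)
hist(y, 100, freq=FALSE, main="")
# derived density 6y^5
curve(6*x^5, 0, 1, add=TRUE, col="blue", lwd=2)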

In one dimension this is rarely useful because solving the inequality g(x)<y is usually done directly, but it does become useful in higher dimensions:

Example (1.11.13)

say (X,Y) is a bivariate standard normal r.v, that is it has joint density given by

f(x,y)=\frac1{2\pi}\exp\left\{-\frac12(x^2+y^2) \right\}

for (x,y) \in \mathbb{R}^2

Let the r.v. (U,V) be defined by U=X+Y and V=X-Y.

To start let’s define the functions g_1(x,y) = x+y and g_2(x,y) = x-y, so that

U=g_1(X,Y)\text{ and }V = g_2(X,Y)

For what values of u and v is f_{U,V}(u,v) positive? Well, for any values for which the system of 2 linear equations in two unknowns u=x+y and v=x-y has a solution. These solutions are \begin{aligned} &x = h_1(u,v) = (u + v)/2\\ &y = h_2(u,v) = (u - v)/2 \end{aligned}

From this we find that for any (u,v) \in \mathbb{R}^2 there is a unique (x,y) \in \mathbb{R}^2 such that u=x+y and v=x-y. So the transformation (x,y) \rightarrow (u,v) is one-to-one and therefore has a Jacobian given by

J =\left| \begin{array}{cc} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{array} \right| = \left| \begin{array}{cc} \frac12 & \frac12 \\ \frac12 & -\frac12 \end{array} \right| = -\frac12

Now from multivariable calculus we have the change of variable formula:

f_{U,V}(u,v)=f_{X,Y}(h_1(u,v),h_2(u,v))\lvert J\rvert

and so

\begin{aligned} &f_{U,V}(u,v) = \\ &f_{X,Y}(h_1(u,v),h_2(u,v))\vert J \vert = \\ &\frac1{2\pi}\exp\left\{-\frac12\left([\frac{u+v}2]^2+[\frac{u-v}2]^2\right) \right\}\vert -\frac12\vert = \\ &\frac1{2\pi}\exp\left\{-\frac12\left(\frac{u^2+2uv+v^2}4+\frac{u^2-2uv+v^2}4\right) \right\}\frac12 = \\ &\frac1{4\pi}\exp\left\{-\frac{u^2+v^2}4 \right\} \end{aligned}

Note that the density factors into a function of u and a function of v. As we saw before this means that U and V are independent.
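A simulation sketch checking both findings: the correlation of U and V should be near 0, and the histogram of U should match the marginal obtained by integrating out v, which from the joint density above is e^{-u^2/4}/\sqrt{4\pi}.

n=1e5
x=rnorm(n); y=rnorm(n)
u=x+y; v=x-y
# U and V should be (essentially) uncorrelated
cor(u, v)
hist(u, 100, freq=FALSE, main="")
# marginal density of U from the joint density found above
curve(exp(-x^2/4)/sqrt(4*pi), -5, 5, add=TRUE, col="blue", lwd=2)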

Example (1.11.14)

Say X_1 , .., X_n are iid U[0,1]. Let Y_1 =X_1, Y_2 =X_2-X_1 ,.., Y_n=X_n-X_{n-1}. Then

\begin{aligned} &x_1 = y_1\\ &x_2 = y_1+y_2\\ &\vdots \\ &x_n=y_1+y_2+...+y_n\\ &\\ &J=\begin{vmatrix} 1 & 0 & 0 & ... &0\\ 1 & 1 & 0 & ... &0\\ 1 & 1 & 1 & ... &0\\ \vdots & & & &\vdots\\ 1 & 1 &...& 1&1 \end{vmatrix}=1\\ &\\ &f_{Y_1,..,Y_n}(y_1,...,y_n) = 1\times 1=1 \end{aligned}

so the density is a constant, and therefore (Y_1, .., Y_n) is uniform. But careful, uniform on what set?

y_2=x_2-x_1, 0 \le x_i \le 1, therefore -1 \le y_2 \le 1.

We have

0 \le y_1 \le 1
-(y_1+...+y_{k-1}) \le y_k \le 1-(y_1+...+y_{k-1}); k=2,..,n

For n=2 the set is shown here:

Example (1.11.15)

A rv X is called a normal (or Gaussian) rv with mean \mu and standard deviation \sigma if it has density

f(x)=\frac1{\sqrt{2\pi \sigma^2}}\exp\left\{- \frac{(x-\mu)^2}{2\sigma^2}\right\}

A special case is a standard normal rv, which has \mu=0 and \sigma=1.

Say X and Y are independent standard normal rv’s. Let Z = X + Y. Find the pdf of Z.

Note: now we have a transformation from \mathbb{R}^2 \rightarrow \mathbb{R}.

Z = X + Y = U in the example above, so the pdf of Z is just the marginal of U and we find

\begin{aligned} &f_Z(z) = \int_{-\infty}^{\infty} \frac1{4\pi}\exp\left\{-\frac{z^2+v^2}4 \right\} dv = \\ &\int_{-\infty}^{\infty} \frac1{4\pi}\exp\left\{-\frac{z^2}4\right\}\exp\left\{-\frac{v^2}4 \right\} dv = \\ &\frac1{\sqrt{2\pi\times 2}}\exp\left\{-\frac{z^2}{2\times 2}\right\}\int_{-\infty}^{\infty} \frac1{\sqrt{2\pi\times 2}}\exp\left\{-\frac{v^2}{2\times 2}\right\} dv = \\ &\frac1{\sqrt{2\pi\times 2}}\exp\left\{-\frac{z^2}{2\times 2}\right\} \end{aligned}

and we see that Z has a normal distribution with \mu=0 and \sigma=\sqrt2.


Say X and Y are two continuous independent r.v with pdf’s f_X and f_Y, and let Z = X+Y. If we repeat the above calculations we can show that in general the pdf of Z is given by

f_Z(z)=\int_{-\infty}^\infty f_X(t)f_Y(z-t)dt

This is called the convolution formula.

There is a second method for deriving the convolution formula which is useful. It uses the law of total probability:

In the setup from above we have

\begin{aligned} &F_{X+Y}(z) = P(X+Y\le z) =\\ &\int_{-\infty}^\infty P(X+Y\le z|Y=y)f_Y(y)dy = \\ &\int_{-\infty}^\infty P(X\le z-y|Y=y)f_Y(y)dy = \\ &\int_{-\infty}^\infty F_{X|Y=y}(z-y|y)f_Y(y)dy \\ &\\ &f_Z(z) = \frac{d}{dz} F_Z(z) =\\ &\frac{d}{dz} \int_{-\infty}^\infty F_{X|Y=y}(z-y|y)f_Y(y)dy =\\ &\int_{-\infty}^\infty \frac{d}{dz} F_{X|Y=y}(z-y|y)f_Y(y)dy = \\ &\int_{-\infty}^\infty f_{X|Y=y}(z-y|y)f_Y(y)dy = \\ &\int_{-\infty}^\infty f_{X}(z-y)f_Y(y)dy \\ \end{aligned}

and here we used the independence only at the very end. Therefore the formula up to the next-to-last line holds in general, even without independence.

The tricky part of this is the interchange of the derivative and the integral. Working with densities and cdfs usually means this is ok.

Example (1.11.16)

say X and Y are independent exponential rv’s with rate \lambda. Find the pdf of Z=X+Y.

\begin{aligned} &f_Z(z)=\int_{-\infty}^\infty f_X(t)f_Y(z-t)dt =\\ &\int_0^z \lambda\exp(-\lambda t)\lambda\exp(-\lambda (z-t))dt = \\ &\lambda^2\exp(-\lambda z)\int_0^z dt = \\ &\lambda^2z\exp(-\lambda z) \end{aligned} for z>0.
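A quick simulation check, with \lambda=1 as an arbitrary choice:

n=1e4
lambda=1
z=rexp(n, lambda)+rexp(n, lambda)
hist(z, 100, freq=FALSE, main="")
# derived density lambda^2 * z * exp(-lambda*z)
curve(lambda^2*x*exp(-lambda*x), 0, max(z), add=TRUE, col="blue", lwd=2)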

Example (1.11.16a)

Let X,Y\sim U[0,1], X\perp Y, and let Z=X+Y. Now

\begin{aligned} &f_Z(z)=\int_{-\infty}^\infty f_X(t)f_Y(z-t)dt =\\ &\int_{-\infty}^\infty I_{[0,1]}(t)I_{[0,1]}(z-t)dt \end{aligned}

Now 0<z-t<1,0<t<1 implies \max\{0,z-1\}<t<\min\{1,z\}, so

\begin{aligned} &0<z<1: \\ &f_Z(z)= \int_0^z dt = z\\ &1<z<2: \\ &f_Z(z)= \int_{z-1}^1 dt = 1-(z-1)=2-z\\ \end{aligned}

n=1e4
# sum of two independent uniforms
z=runif(n)+runif(n)
hist(z, 100, freq=FALSE,main="")
# triangular density derived above
curve(ifelse(x<1,x,2-x), 0, 2, add=TRUE, col="blue", lwd=2)


One nice feature of the second derivation of the convolution formula is that it often works for things other than sums:

Example (1.11.17)

say Y is an exponential rv with rate 1 and X|Y=y\sim U[0,y]. Find the pdf of Z=X/Y.

We have f_Y(y)=e^{-y}, y>0 and f_{X|Y=y}(x|y)=\frac1yI_{[0, y]}(x), and so

\begin{aligned} &F_{X/Y}(z) = P(X/Y\le z) =\\ &\int_{-\infty}^\infty P(X/Y\le z|Y=y)f_Y(y)dy = \\ &\int_{-\infty}^\infty P(X\le zy|Y=y)f_Y(y)dy = \\ &\int_{-\infty}^\infty F_{X|Y=y}(zy|y)f_Y(y)dy \\ &\\ &f_Z(z) = \frac{d}{dz} F_Z(z) =\\ &\frac{d}{dz} \int_{-\infty}^\infty F_{X|Y=y}(zy|y)f_Y(y)dy =\\ &\int_{-\infty}^\infty \frac{d}{dz} F_{X|Y=y}(zy|y)f_Y(y)dy = \\ &\int_{-\infty}^\infty f_{X|Y=y}(zy|y)yf_Y(y)dy = \\ &\int_0^\infty \frac1yI_{[0, y]}(zy) y e^{-y}dy =\\ &\int_0^\infty e^{-y} dy =1 \end{aligned}

For what z is this true? Notice the indicator function I_{[0, y]}(zy) inside the integral, which is 1 if 0<zy<y, that is if 0<z<1, and 0 otherwise. So f_{X/Y}(z)=I_{[0,1]}(z) and therefore X/Y\sim U[0,1].
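Again a quick simulation sketch: generate Y from an exponential with rate 1, then X uniform on [0,Y], and look at X/Y.

n=1e4
y=rexp(n, 1)
# X | Y=y is uniform on [0,y] (runif recycles the vector of upper limits)
x=runif(n, 0, y)
hist(x/y, 100, freq=FALSE, main="")
# the U[0,1] density
abline(h=1, col="blue", lwd=2)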


Finally we study a different kind of transformation:

Example (1.11.18)

Say X_1, .., X_n are iid U[0,1]. Let M=\max\{X_1 , .., X_n\}. We want to find E[M] and var(M).

First we need to find the density of M:

\begin{aligned} &F_M(x) = P(\max\{X_1 , .., X_n\}\le x)=\\ &P(X_1\le x,..,X_n\le x) =\\ &P(X_1\le x)\times ... \times P(X_n\le x) = \\ &\left[P(X_1\le x)\right]^n =x^n \\ &\\ &f_M(x)=nx^{n-1};0<x<1\\ &\\ &E[M^k]=\int_0^1 x^knx^{n-1}dx=\frac{n}{n+k}x^{n+k}|_0^1=\frac{n}{n+k}\\ &E[M]=\frac{n}{n+1}\\ &var(M)=\frac{n}{n+2}-\left(\frac{n}{n+1}\right)^2 \end{aligned}
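A simulation check of these moments, with n=5 observations as an arbitrary choice:

B=1e4
n=5
# B replications of the maximum of n uniforms
m=replicate(B, max(runif(n)))
# simulated vs theoretical mean and variance
c(mean(m), n/(n+1))
c(var(m), n/(n+2)-(n/(n+1))^2)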


This is a special case of what are called order statistics. Many statistical methods, for example the median and the range, are based on an ordered data set.

One of the difficulties when dealing with order statistics is ties, that is the same observation appearing more than once. This should only occur for discrete data because for continuous data the probability of a tie is zero. Ties may happen anyway because of rounding, but we will ignore them in what follows.

Definition (1.11.19)

Say X_1, ..., X_n are iid with density f. Then X_{(i)}, the i^{th} order statistic, is the i^{th} smallest of the observations, so that X_{(1)}<X_{(2)}<..<X_{(n)}.

Note X_{(1)} = \min \{X_i\} and X_{(n)} = \max \{X_i\}.

Theorem (1.11.20)

If X_1,..,X_n are continuous random variables we have

f_{X_{(i)}}(x)=\frac{n!}{(i-1)!(n-i)!}F(x)^{i-1}(1-F(x))^{n-i}f(x)

proof

Let Y be a r.v. that counts the number of X_j \le x for some fixed number x. We will see shortly that

P(Y=j)= {n\choose j}F(x)^j(1-F(x))^{n-j}

Note also that the event \{Y \ge i\} means that i or more observations are less or equal to x, so the i^{th} smallest is less or equal to x. Therefore

F_{X_{(i)}}(x) =P(X_{(i)}\le x) = P(Y\ge i)=\sum_{k=i}^n {n\choose k}F(x)^k(1-F(x))^{n-k}

and so

\begin{aligned} &f_{X_{(i)}}(x) = \frac{d}{dx} F_{X_{(i)}}(x) = \\ &\frac{d}{dx} \sum_{k=i}^n {n\choose k}F(x)^k(1-F(x))^{n-k} = \\ & \sum_{k=i}^n {n\choose k} \frac{d}{dx}\left[F(x)^k(1-F(x))^{n-k}\right] = \\ & \sum_{k=i}^n {n\choose k}\left[kF(x)^{k-1}f(x)(1-F(x))^{n-k}+F(x)^k(n-k)(1-F(x))^{n-k-1}(-f(x))\right] = \\ &f(x) \sum_{k=i}^n {n\choose k}\left[kF(x)^{k-1}(1-F(x))^{n-k}-F(x)^k(n-k)(1-F(x))^{n-k-1}\right] \end{aligned} Let’s simplify the notation a bit by writing t=F(x), then

\begin{aligned} &\sum_{k=i}^n {n\choose k}\left[kt^{k-1}(1-t)^{n-k}-t^k(n-k)(1-t)^{n-k-1}\right] = \\ &\sum_{k=i}^n {n\choose k}kt^{k-1}(1-t)^{n-k}- \sum_{k=i}^n {n\choose k} t^k(n-k)(1-t)^{n-k-1} = \\ &\sum_{k=i}^n {n\choose k}kt^{k-1}(1-t)^{n-k}- \sum_{k=i}^{n-1} {n\choose k}(n-k) t^k(1-t)^{n-k-1} = \\ &\\ &{n\choose i}it^{i-1}(1-t)^{n-i}+\\ &\sum_{k=i+1}^n {n\choose k}kt^{k-1}(1-t)^{n-k}- \sum_{k=i}^{n-1} {n\choose k}(n-k) t^k(1-t)^{n-k-1} = \\ &\\ &\frac{n!}{(n-i)!i!}it^{i-1}(1-t)^{n-i}+\\ &\sum_{l=i}^{n-1} {n\choose l+1}(l+1)t^{l}(1-t)^{n-l-1}- \sum_{k=i}^{n-1} {n\choose k}(n-k) t^k(1-t)^{n-k-1} \end{aligned} where we change the summation index. Now notice that both sums go from i to n-1 and have terms of the form t^k(1-t)^{n-k-1}. Moreover

\begin{aligned} &{n\choose l+1}(l+1) =\frac{n!}{(n-l-1)!(l+1)!}(l+1) =\frac{n!}{l!(n-l-1)!}\\ &{n\choose k}(n-k) =\frac{n!}{(n-k)!k!}(n-k) =\frac{n!}{k!(n-k-1)!} \end{aligned} so the two sums are the same and cancel each other out!

Finally replacing t yields the result:

f_{X_{(i)}}(x) = \frac{n!}{(n-i)!(i-1)!}F(x)^{i-1}(1-F(x))^{n-i}f(x)

Corollary (1.11.21)

f_{X_{(1)}}(x) = n(1-F(x))^{n-1}f(x)
f_{X_{(n)}}(x) = nF(x)^{n-1}f(x)

Example (1.11.22)

Say X_1, ..., X_n are iid U[0,1]. Then for 0<x<1 we have f(x)=1 and F(x)=x. Therefore

\begin{aligned} &f_{X_{(i)}}(x) = \frac{n!}{(n-i)!(i-1)!}x^{i-1}(1-x)^{n-i} \\ &f_{X_{(1)}}(x) = n(1-x)^{n-1} \\ &f_{X_{(n)}}(x) = nx^{n-1} \end{aligned}
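Finally, a simulation sketch for the general formula: with n=10 uniforms, compare the histogram of the i^{th} order statistic (i=3 here, an arbitrary choice) with the density from the theorem.

B=1e4
n=10
i=3
# B replications of the i-th order statistic of n uniforms
x=replicate(B, sort(runif(n))[i])
hist(x, 50, freq=FALSE, main="")
# density from the theorem, with F(x)=x and f(x)=1
curve(factorial(n)/(factorial(n-i)*factorial(i-1))*x^(i-1)*(1-x)^(n-i),
      0, 1, add=TRUE, col="blue", lwd=2)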

Example (1.11.23)

Say X_1, ..., X_n are iid U[0,1]. Let g be the density of the order statistic (X_{(1)}, ..., X_{(n)}). Then

g(x_{(1)}, ..., x_{(n)})=n!\text{ for }0<x_{(1)}< ...<x_{(n)}<1

The simple “proof” is as follows: for any set of n distinct numbers there are n! permutations, exactly one of which has 0<x_{(1)}< ...<x_{(n)}<1.

A “formal” proof can be done using a generalization of the change of variables formula. The problem is that the inverse transform is not unique, in fact there are n! of them because the ordered set of numbers could have come from any of the n! permutations. Once the inverse transform is fixed, though, the Jacobian is just the identity matrix with the rows rearranged, and therefore has determinant \pm 1, so |J|=1. Then

g(x_{(1)}, ..., x_{(n)})=n!f(x_{(1)}, ..., x_{(n)})|J|=n!