Functions of a Random Variable - Transformations

Example (1.11.1)

say \(X \sim U[0,1]\) and \(\lambda>0\). We want to find the pdf of the random variable \(Y=-\lambda \log(X)\).

Let’s first find the cdf and then the pdf as follows:

\[ \begin{aligned} &F_Y(y) = P(Y\le y) = \\ &P(-\lambda \log X\le y) = \\ &P( \log X\gt -y/\lambda) = \\ &P( X\gt e^{-y/\lambda}) = \\ &1-P( X\le e^{-y/\lambda}) = \\ &1-F_X(e^{-y/\lambda}) = \\ &1-e^{-y/\lambda} \\ &\\ &f_Y(y)=\frac{d F_Y(y)}{dy} = \\ & \frac{d}{dy} \left[1-e^{-y/\lambda}\right] = e^{-y/\lambda}/\lambda \end{aligned} \]

if \(y>0\). For \(y<0\) note that \(P(-\lambda \log X\le y) = 0\) because \(0<X<1\), so \(\log X<0\), so \(-\lambda \log X>0\) always.
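
This is the density of an exponential random variable with rate \(1/\lambda\) (mean \(\lambda\)). A quick simulation sketch (with \(\lambda=2\) chosen arbitrarily) confirms the calculation:

n=1e4
lambda=2
y=-lambda*log(runif(n))
hist(y, 100, freq=FALSE,main="")
curve(exp(-x/lambda)/lambda, 0, max(y), add=TRUE, col="blue",lwd=2)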


This is an example of a function (or transformation) of a random variable. These transformations play a major role in probability and statistics. Actually, we already had a simple example before: if \(X\sim U[0,1]\) and \(A<B\), then \(Y=A+(B-A)X\sim U[A,B]\).

If \(X\) is some random variable or vector and \(Y=g(X)\), the question usually is: what is the density of \(Y\)? We will see how to find the density by working through a number of typical examples.

Example (1.11.1a)

Say \(X\) is a random variable with \(P(X=k)=\frac1{2N+1}; k\in\{-N,-N+1,...,0,...,N\}\) for some \(N\ge 1\).

  1. Let \(0\le M\le N\) and \(Y=I_{\{-M,...,M\}}(X)\). Now \(Y\in\{0,1\}\) and

\[ \begin{aligned} &P(Y=1) =P(-M\le X\le M)=\frac{2M+1}{2N+1} \\ &P(Y=0) =1-P(Y=1) = 1-\frac{2M+1}{2N+1}=\frac{2(N-M)}{2N+1} \\ \\ \end{aligned} \]

  2. Let \(Y=|X|\). Now \(Y\in\{0,1,2,...,N\}\) and

\[ \begin{aligned} &P(Y=0) =P(X=0)=\frac1{2N+1} \\ &P(Y=k) =P(X=-k\text{ or }X=k)=\frac2{2N+1};k=1,..,N \end{aligned} \]

  3. Let \(Y=X^2\). Now \(Y\in\{0,1,4,...,N^2\}\) and

\[ \begin{aligned} &P(Y=0) =P(X=0)=\frac1{2N+1} \\ &P(Y=k^2) =P(X=-k\text{ or }X=k)=\frac2{2N+1};k=1,..,N \end{aligned} \]
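
A small simulation sketch (with \(N=5\) chosen arbitrarily) to check the formula for \(Y=|X|\):

n=1e4
N=5
x=sample(-N:N, n, replace=TRUE)              # X uniform on {-N,...,N}
round(table(abs(x))/n, 3)                    # observed frequencies of Y=|X|
round(c(1/(2*N+1), rep(2/(2*N+1), N)), 3)    # theoretical values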

Example (1.11.2)

say \(X\) is the number of rolls of a fair die until the first six. We have already seen that \(P(X=x) = \frac16\left(\frac56\right)^{x-1}, x=1,2,..\). Let \(Y\) be 1 if \(X\) is even and 0 if \(X\) is odd.

Note: here both \(X\) and \(Y\) are discrete.

Let’s do this a little more generally, with \(p\) instead of \(1/6\). Also let \(q=1-p\). Then

\[ \begin{aligned} &P(Y=0) = P(X \in \{1, 3, 5, ..\}) =\\ &\sum_{k=0}^\infty pq^{(2k+1)-1} = \\ &p \sum_{k=0}^\infty q^{2k} = \\ &p \sum_{k=0}^\infty (q^2)^{k} = \\ &p\frac1{1-q^2} =\\ &p\frac1{(1+q)(1-q)} = \frac1{1+q} \end{aligned} \] so \(P(Y=0) = 1/(1+5/6) = 6/11\) and \(P(Y=1) = 1 - P(Y=0) = 5/11\).
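
Here is a simulation sketch. In R, rgeom counts the number of failures before the first success, so \(X-1\) has the distribution rgeom generates:

n=1e4
p=1/6
x=rgeom(n, p)+1                  # number of rolls until the first six
y=ifelse(x%%2==0, 1, 0)
round(c(mean(y==0), 6/11), 3)    # observed vs theoretical P(Y=0)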

Example (1.11.3)

say \(X\) is a random variable with \(P(X=x) = p(1-p)^{x-1}, x=1,2,..\). Let \(Y = X \mod r\), that is \(Y\) is the remainder on division by \(r\). For example \(5\mod 2=1\) and \(10 \mod 5=0\).

Note: here both \(X\) and \(Y\) are discrete.

First note that \(Y\in \{0,.., r-1 \}\). As before let \(q=1-p\). Also, \(x\mod r = k\) means that \(x=nr+k\) for some integer \(n\ge 0\). Let \(k\in \{1,.., r-1 \}\), then

\[ \begin{aligned} &P(Y=k) =P(X\mod r = k) =\\ &P(X=nr+k; n=0, 1, ... ) = \\ &\sum_{n=0}^\infty pq^{nr+k-1} = \\ &p\sum_{n=0}^\infty (q^r)^n q^{k-1} = \\ &pq^{k-1}\sum_{n=0}^\infty (q^r)^n = \frac{pq^{k-1}}{1-q^r} \end{aligned} \]

The case \(k=0\) is a bit different because \(X=0\) is not possible:

\[ \begin{aligned} &P(Y=0) =P(X\mod r = 0) =\\ &P(X=nr; n=1, 2,... ) = \\ &\sum_{n=1}^\infty pq^{nr-1} = \\ &\frac{p}q \sum_{n=1}^\infty q^{nr} = \\ &\frac{p}q\left(\frac{q^r}{1-q^r}\right) = \\ &\frac{pq^{r-1}}{1-q^r} \\ \end{aligned} \]

Let’s make sure this is a proper density:

\[ \begin{aligned} &\sum_{k=0}^{r-1} f_Y(k) = \\ &\frac{pq^{r-1}}{1-q^r} + \sum_{k=1}^{r-1} \frac{pq^{k-1}}{1-q^r} = \\ &\frac{p}{1-q^r}\left(q^{r-1}+ \sum_{k=1}^{r-1} q^{k-1} \right) = \\ &\frac{p}{1-q^r}\left(q^{r-1}+ \sum_{k=0}^{r-2} q^{k} \right) = \\ &\frac{p}{1-q^r}\left(q^{r-1}+ \frac{1-q^{r-1}}{1-q} \right) = \\ &\frac{p}{1-q^r}\left(\frac{(1-q)q^{r-1}+1-q^{r-1}}{1-q} \right) = \\ &\frac{p}{1-q^r}\left(\frac{q^{r-1}-q^{r}+1-q^{r-1}}{1-q} \right) =1 \\ \end{aligned} \] Good!
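
A simulation sketch (with \(p=0.2\) and \(r=3\) chosen arbitrarily), comparing observed frequencies of \(Y=X\mod r\) with the formulas above:

n=1e4
p=0.2; q=1-p; r=3
x=rgeom(n, p)+1                                  # geometric rv starting at 1
y=x%%r
round(table(y)/n, 3)                             # observed frequencies of Y=0,...,r-1
round(c(p*q^(r-1), p*q^(0:(r-2)))/(1-q^r), 3)    # theoretical values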

Example (1.11.4)

say \(X\) is a continuous r.v. with pdf

\[f_X (x) = \frac12 \exp(-|x|) \text{, } x \in \mathbb{R}\]

Let \(Y=I_{[-1,1]}(X)\).

Note: here \(X\) is continuous and \(Y\) is discrete.

\[ \begin{aligned} &P(Y=1) = P(-1\le X\le 1)=\\ & \int_{-1}^1 \frac12 \exp(-|x|)dx = \\ & 2\int_{0}^1 \frac12 \exp(-x)dx = \\ &-\exp(-x)\vert_0^1 = 1-e^{-1}\\ &P(Y=0)=1-P(Y=1)=e^{-1} \end{aligned} \]
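
A simulation sketch: a rv with this density can be generated as a random sign times an exponential with rate 1, and the proportion of values in \([-1,1]\) should be close to \(1-e^{-1}\approx 0.632\):

n=1e4
x=sample(c(-1, 1), n, replace=TRUE)*rexp(n, 1)   # density exp(-|x|)/2
round(c(mean(abs(x)<=1), 1-exp(-1)), 3)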


In all the examples so far \(Y\) was discrete, and so we could find the density \(f_Y(x)=P(Y=x)\) directly. Now we turn to cases where \(Y\) is continuous, and it turns out that we usually first have to find the cdf \(F\), and then the density \(f\):

Example (1.11.5)

Let \(X\sim U[0,1]\) and \(Y=X^n\) for \(n=2,3,...\).

Clearly \(F_Y(y)=0\) for \(y<0\) and \(F_Y(y)=1\) if \(y>1\). So let \(0<y<1\). Then

\[ \begin{aligned} &F_Y(y) =P(Y\le y) = P(X^n\le y) =\\ &P(X\le y^{1/n}) =y^{1/n} \\ &f_Y(y) = \frac{d}{dy}\left[y^{1/n}\right] = \frac1n y^{1/n-1} \\ \end{aligned} \]
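
A quick simulation check (sketch, with \(n=3\) chosen arbitrarily):

m=1e4
n=3
y=runif(m)^n
hist(y, 100, freq=FALSE,main="")
curve(x^(1/n-1)/n, 0, 1, add=TRUE, col="blue",lwd=2)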

Example (1.11.6)

Let \(X\) have pdf \(f_X (x) = \frac12 \exp(-|x|)\). Let \(Y =X^2\). Then for \(y<0\) we have \(P(Y \le y) = 0\). So let \(y>0\). Then

\[ \begin{aligned} &F_Y(y) =P(Y\le y)= \\ &P(X^2\le y) = \\ &P(-\sqrt{y}\le X\le \sqrt{y}) = \\ &\int_{-\sqrt y}^{\sqrt y}\frac12 \exp(-|x|)dx = \\ &\int_{0}^{\sqrt y} \exp(-x)dx = \\ &-\exp({-x})|_0^{\sqrt y}=1-e^{-\sqrt y}\\ &\\ &f_Y(y)=-e^{-\sqrt y}\frac{-1}{2\sqrt y}=\frac{1}{2\sqrt y}e^{-\sqrt y};y>0 \end{aligned} \]


The last two examples were easy because solving the inequality \(g(x)\le y\) for \(x\), where \(g\) is the transformation, was easy. Let’s now do an example where that is not the case:

Example (1.11.7)

Let \(X \sim U[0,2]\), and let \(Y=\sin(2 \pi X)\).

First of course we always have \(-1 \le \sin(x) \le 1\) and therefore \(F_Y (y)=0\) if \(y<-1\) and \(F_Y(y)=1\) if \(y>1\).

Now if \(-1<y<1\) we have

\[P(Y \le y)=P(\sin(2 \pi X) \le y)\]

and so the hard part is solving the inequality

\[\sin(2 \pi X) \le y\]

The points where we have \(\sin(2 \pi x) = y\) are of course \(x=\arcsin(y)/(2\pi)+k\) and \(x=1/2-\arcsin(y)/(2\pi)+k\) for integers \(k\). Let \(a=\arcsin(y)/(2\pi)\) and note that \(\arcsin(-y)=-\arcsin(y)\).

It is often a good idea to draw a picture. Let’s consider the case \(y=-0.3\); then \(a=\arcsin(-0.3)/(2\pi) = -0.0485\). Sketching \(\sin(2\pi x)\) on \([0,2]\) together with the horizontal line at \(y=-0.3\), the values of \(x\) with \(\sin(2\pi x)\le y\) form the two “dips”, roughly between 0.5 and 1 and then between 1.5 and 2. More precisely we find

\[ \begin{aligned} &P(Y\le y) =P(\sin(2\pi X)\le y)= \\ &P(1/2-a<X<1+a \bigcup 3/2-a<X<2+a) = \\ &P(1/2-a<X<1+a)+P(3/2-a<X<2+a) = \\ &\left[1+a-(1/2-a)\right]/2+\left[(2+a)-(3/2-a)\right]/2=\\ &\left[1/2+2a+2a+1/2\right]/2=2a+1/2 \end{aligned} \]

For \(0<y<1\), \(\sin(2\pi x)\le y\) holds on three intervals, \((0,a)\), \((1/2-a,1+a)\) and \((3/2-a,2)\), and so

\[ \begin{aligned} &P(Y\le y) =P(\sin(2\pi X)\le y)= \\ &P(0<X<a \bigcup 1/2-a<X<1+a \bigcup 3/2-a<X<2) = \\ &P(0<X<a)+P(1/2-a<X<1+a)+P(3/2-a<X<2) = \\ &a/2+\left[1+a-(1/2-a)\right]/2+\left[2-(3/2-a)\right]/2=\\ &a/2+(1/2+2a)/2+(1/2+a)/2=2a+1/2 \end{aligned} \]

and that is the same as before! So now

\[f_Y(y) =F_Y'(y) = \frac{d}{dy} \left\{\arcsin(y)/\pi+\frac12\right\} = \frac1{\pi \sqrt{1-y^2}}\]

if \(|y|<1\)

Notice that

\[ \begin{aligned} &\lim_{y \rightarrow -1^+}F_Y(y) = \lim_{y \rightarrow -1^+} \left[ \arcsin(y)/\pi+1/2 \right] =0\\ &\lim_{y \rightarrow 1^-}F_Y(y) = \lim_{y \rightarrow 1^-} \left[ \arcsin(y)/\pi+1/2 \right] = 1 \end{aligned} \]

and so this is a proper cdf, and therefore \(f_Y\) a proper density!
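
A simulation sketch of the result:

n=1e4
y=sin(2*pi*runif(n, 0, 2))
hist(y, 100, freq=FALSE,main="")
curve(1/(pi*sqrt(1-x^2)), -0.99, 0.99, add=TRUE, col="blue",lwd=2)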


Here is a transformation that is quite useful:

Lemma

Let \(X\) be a continuous rv with cdf \(F\), and \(F\) is strictly increasing. Then \(Y=F(X) \sim U[0,1]\).

proof

If \(X\) is a continuous rv with a cdf \(F\) which is strictly increasing, then \(F^{-1}\) exists on \((0,1)\), and so

\[F_{Y}(x) = P(F(X) \le x) = P(X \le F^{-1}(x)) =F(F^{-1}(x)) = x\]

for \(0<x<1\), but this is the cdf of a uniform [0,1].

And this can be generalized even a bit more:

Theorem (1.11.8)

Probability Integral Transform

Let \(X\) be a continuous rv with cdf \(F\). Then \(F(X) \sim U[0,1]\).

proof

Let \(F\) be the cdf and define the generalized inverse function \(F^{*}\) by

\[F^*(x) = \min\{t : F(t) \ge x \}\]

First note that if \(F\) is strictly increasing we have \(F^{*}=F^{-1}\).

Moreover, because \(F\) is continuous we always have \(F(F^{*}(x))=x\) for \(0<x<1\). This is easiest to see with the graph of a cdf that has a flat piece:

say for example \(F(t)=0.5\) for all \(0.3\le t\le 0.5\), and take \(x=0.5\). Then \(F^{*}(0.5)=0.3\), the smallest \(t\) with \(F(t)\ge 0.5\), and so \(F(F^{*}(0.5))=F(0.3)=0.5\).

So now

\[F_{F(X)}(x) = P(F(X) \le x) = P(X \le F^{*}(x)) =F(F^{*}(x)) = x\]

Notice that unlike a regular inverse we don’t necessarily have \(F^*(F(x))=x\), but for our proof we don’t need that!
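
A quick illustration in R (a sketch using exponential data): applying the cdf to the data should yield values that look like a \(U[0,1]\) sample.

n=1e4
x=rexp(n, 1)          # any continuous rv will do
u=pexp(x, 1)          # u = F(x)
hist(u, 50, freq=FALSE,main="")
abline(h=1, col="blue",lwd=2)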

The “reverse” of this theorem is useful for simulating data from some distribution \(F\), as long as \(F\) has an inverse that can be computed:

Example (1.11.8a)

Say we want to generate data from a random variable \(X\) with density \(f(x)=ax^{a-1};0<x<1\) and \(a>0\). Now

\[F(x)=\int_0^x at^{a-1}dt = t^a|_0^x=x^a\] Let \(U\sim U[0,1]\), then

\[F_{F^{-1}(U)}(x)=P(F^{-1}(U)\le x)=P(U\le F(x))=F(x)\] and so \(F^{-1}(U)\sim F\)!

Now

\[ \begin{aligned} &y=x^a\rightarrow x=y^{1/a} \\ &F^{-1}(x) = x^{1/a} \end{aligned} \] and

n=1e4
a=3
x=runif(n)^(1/a)
hist(x, 100, freq=FALSE,main="")
curve(3*x^2, 0, 1, add=TRUE, col="blue",lwd=2)

Say we want to generate data from the density \(f(x)=6x(1-x);0<x<1\):

\[ \begin{aligned} &F(x) =\int_0^x 6t(1-t) dt = \\ &3t^2-2t^3|_0^x = 3x^2-2x^3\\ &y=3x^2-2x^3 \\ &2x^3-3x^2+y=0 \end{aligned} \] Now we need to solve a cubic equation. In general such an equation can have up to three real solutions, but in our case there is always exactly one in the interval \((0,1)\):

par(mfrow=c(2,2))
curve(2*x^3-3*x^2+0.1,0,1,lwd=2);abline(h=0)
curve(2*x^3-3*x^2+0.33,0,1,lwd=2);abline(h=0)
curve(2*x^3-3*x^2+0.66,0,1,lwd=2);abline(h=0)
curve(2*x^3-3*x^2+0.9,0,1,lwd=2);abline(h=0)

and the easiest way to find the root is using R:

n=1e4
x=rep(0, n)
for(i in 1:n) {
   tmp=Re(polyroot(c(runif(1), 0, -3, 2)))
   x[i]=tmp[tmp>0&tmp<1]
}   
hist(x, 100, freq=FALSE,main="")
curve(6*x*(1-x), 0, 1, add=TRUE, col="blue",lwd=2)


Functions of Random Vectors:

Example (1.11.9)

Say \((X,Y)\) is a discrete rv with joint pdf \(f_{X,Y}(x,y)\) given here:

|  | y=1 | y=2 |
|---|---|---|
| x=1 | 1/10 | 1/10 |
| x=2 | 1/10 | 1/2 |
| x=3 | 1/10 | 1/10 |

Say \(U=2X-Y\).

\(X\) takes values 1,2,3, so \(2X\) takes values 2,4,6, so \(2X-Y\) takes values 0, 1, 2, 3, 4 and 5. Now

\[ \begin{aligned} &f_U(0)=P(U=0) =P(X=1, Y=2) = 1/10 \\ &f_U(1)=P(U=1) =P(X=1, Y=1) = 1/10 \\ &f_U(2)=P(U=2) =P(X=2, Y=2) = 1/2 \\ &f_U(3)=P(U=3) =P(X=2, Y=1) = 1/10 \\ &f_U(4)=P(U=4) =P(X=3, Y=2) = 1/10 \\ &f_U(5)=P(U=5) =P(X=3, Y=1) = 1/10 \\ \end{aligned} \]

Example (1.11.9a)

We roll two fair dice. Let \(X\) be the number on the blue die and \(Y\) the number on the red die. Let \(S=X+Y\). We have previously found

\[P(S=k)=\frac{k-1}{36};k=2,...,7\qquad P(S=k)=\frac{13-k}{36};k=8,...,12\]

Example (1.11.10)

Say \((X,Y)\) is a discrete rv with joint pdf \(f_{X,Y}(x,y)=(1-p)^2p^x; x,y \in \{0,1,..\}, y \le x, 0<p<1\). Let \(U=I(X=Y)\).

\[ \begin{aligned} &P(U=1) =P(X=Y) = \\ &\sum_{x=0}^\infty (1-p)^2p^x = \\ &(1-p)^2\frac1{1-p} =1-p \\ &P(U=0)=1-P(U=1)=p \end{aligned} \]
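
A simulation sketch. To generate from this joint density one can use the fact that the marginal of \(X\) is \(f_X(x)=(x+1)(1-p)^2p^x\) (which is what R's rnbinom with size 2 and prob \(1-p\) produces) and that given \(X=x\), \(Y\) is uniform on \(\{0,...,x\}\):

n=1e4
p=0.4
x=rnbinom(n, 2, 1-p)            # P(X=x)=(x+1)(1-p)^2 p^x
y=floor(runif(n)*(x+1))         # Y|X=x uniform on {0,...,x}
round(c(mean(x==y), 1-p), 3)    # observed vs theoretical P(U=1)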

Example (1.11.11)

Say \((X,Y)\) is a discrete rv with joint pdf \(f_{X,Y}(x,y)=(1-p)^2p^x; x,y \in \{0,1,..\}, y \le x\), and \(0<p<1\). Let \(U=X\) and \(V=X-Y\).

First what are the possible values of \((U,V)\)? We have \(u= x \in \{0,1,..\}\) and \(y \le x\) or \(0 \le x-y=v\) and so \(v \in \{0,1,..\}\).

Also \(v=x-y=u-y \le u\) because \(y \ge 0\).

Now for any \(u,v \in \{0,1,..\}\) with \(v \le u\) we have

\[ \begin{aligned} &f_{U,V}(u,v) = \\ &P(U=u,V=v) = \\ &P(X=u,X-Y=v) = \\ &P(X=u,u-Y=v) = \\ &P(X=u,Y=u-v) = \\ &(1-p)^2p^u \end{aligned} \]

So we see that \(f_{U,V}(u,v)=f_{X,Y}(u,v)\), or \((X,Y)\) has the same distribution as \((U,V)\)!


Next we turn to continuous random vectors:

Before we go on let’s generalize the first example above, where we had \(X \sim U[0,1]\), \(\lambda>0\) and \(Y=-\lambda \log(X)\). Let’s say we have \(Y=g(X)\), where \(g\) is strictly increasing (a strictly decreasing \(g\), as in that example, works the same way except for a sign change). Then \(g^{-1}\) exists and is also strictly increasing. Therefore

\[ \begin{aligned} &F_Y(y) =P(Y\le y) = P(g(X)\le y) = \\ &P(X< g^{-1}(y)) = \\ &\int_{-\infty}^{g^{-1}(y)} f_X(x)dx = \\ &\\ &f_Y(y) = \frac{d}{dy} \left[\int_{-\infty}^{g^{-1}(y)} f_X(x)dx\right] =\\ &f_X(g^{-1}(y)) \frac{d}{dy}g^{-1}(y) \end{aligned} \]

Example (1.11.12)

Say \(X\) has density \(f_X(x) = 3x^2;0<x<1\) and \(Y=\sqrt X\), so \(g(x)=\sqrt x\), \(g^{-1}(y)=y^2\) and \(\frac{d}{dy}g^{-1}(y)=2y\). Therefore

\[f_Y(y) = 3(y^2)^2\times 2y=6y^5;0<y<1\]
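
A quick check by simulation (sketch): a rv with density \(3x^2\) can be generated as \(U^{1/3}\) with \(U\sim U[0,1]\), as in example (1.11.8a):

n=1e4
x=runif(n)^(1/3)       # X has density 3x^2 on (0,1)
y=sqrt(x)
hist(y, 100, freq=FALSE,main="")
curve(6*x^5, 0, 1, add=TRUE, col="blue",lwd=2)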

In one dimension this is rarely useful because solving the inequality \(g(x)<y\) is usually done directly, but it does become useful in higher dimensions:

Example (1.11.13)

say \((X,Y)\) is a bivariate standard normal r.v., that is, it has joint density given by

\[f(x,y)=\frac1{2\pi}\exp\left\{-\frac12(x^2+y^2) \right\}\]

for \((x,y) \in \mathbb{R}^2\)

Let the r.v. \((U,V)\) be defined by \(U=X+Y\) and \(V=X-Y\).

To start let’s define the functions \(g_1(x,y) = x+y\) and \(g_2(x,y) = x-y\), so that

\[U=g_1(X,Y)\text{ and }V = g_2(X,Y)\]

For what values of u and v is \(f_{U,V}(u,v)\) positive? Well, for any values for which the system of 2 linear equations in two unknowns \(u=x+y\) and \(v=x-y\) has a solution. These solutions are \[ \begin{aligned} &x = h_1(u,v) = (u + v)/2\\ &y = h_2(u,v) = (u - v)/2 \end{aligned} \]

From this we find that for any \((u,v) \in \mathbb{R}^2\) there is a unique \((x,y) \in \mathbb{R}^2\) such that \(u=x+y\) and \(v=x-y\). So the transformation \((x,y) \rightarrow (u,v)\) is one-to-one and therefore has a Jacobian given by

\[ J =\left| \begin{array}{cc} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{array} \right| = \left| \begin{array}{cc} \frac12 & \frac12 \\ \frac12 & -\frac12 \end{array} \right| = -\frac12 \]

Now from multivariable calculus we have the change of variable formula:

\[f_{U,V}(u,v)=f_{X,Y}(h_1(u,v),h_2(u,v))\lvert J\rvert\]

and so

\[ \begin{aligned} &f_{U,V}(u,v) = \\ &f_{X,Y}(h_1(u,v),h_2(u,v))\vert J \vert = \\ &\frac1{2\pi}\exp\left\{-\frac12\left([\frac{u+v}2]^2+[\frac{u-v}2]^2\right) \right\}\vert -\frac12\vert = \\ &\frac1{2\pi}\exp\left\{-\frac12\left(\frac{u^2+2uv+v^2}4+\frac{u^2-2uv+v^2}4\right) \right\}\frac12 = \\ &\frac1{4\pi}\exp\left\{-\frac{u^2+v^2}4 \right\} \end{aligned} \]

Note that the density factors into a function of u and a function of v. As we saw before this means that U and V are independent.
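
A simulation sketch: the correlation of \(U\) and \(V\) should be near 0 and both variances near 2.

n=1e4
x=rnorm(n); y=rnorm(n)
u=x+y; v=x-y
round(c(cor(u, v), var(u), var(v)), 3)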

Example (1.11.14)

Say \(X_1 , .., X_n\) are iid U[0,1]. Let \(Y_1 =X_1, Y_2 =X_2-X_1 ,.., Y_n=X_n-X_{n-1}\). Then

\[ \begin{aligned} &x_1 = y_1\\ &x_2 = y_1+y_2\\ &\vdots \\ &x_n=y_1+y_2+...+y_n\\ &\\ &J=\begin{vmatrix} 1 & 0 & 0 & ... &0\\ 1 & 1 & 0 & ... &0\\ 1 & 1 & 1 & ... &0\\ \vdots & & & &\vdots\\ 1 & 1 &...& 1&1 \end{vmatrix}=1\\ &\\ &f_{Y_1,..,Y_n}(y_1,...,y_n) = 1\times 1=1 \end{aligned} \]

so the density is a constant, and therefore \((Y_1, .., Y_n)\) is uniform. But careful, uniform on what set?

\(y_2 = x_2-x_1\) and \(0 \le x_i \le 1\), therefore \(-1 \le y_2 \le 1\).

We have

\[0 \le y_1 \le 1\]
\[-(y_1+...+y_{k-1}) \le y_k \le 1-(y_1+...+y_{k-1}); k=2,..,n\]

For \(n=2\) this is the set \(\{(y_1,y_2):\, 0\le y_1\le 1,\, -y_1\le y_2\le 1-y_1\}\), a parallelogram.

Example (1.11.15)

A rv \(X\) is called a normal (or Gaussian) rv with mean \(\mu\) and standard deviation \(\sigma\) if it has density

\[f(x)=\frac1{\sqrt{2\pi \sigma^2}}\exp\left\{- \frac{(x-\mu)^2}{2\sigma^2}\right\}\]

a special case is a standard normal rv, which has \(\mu=0\) and \(\sigma=1\).

Say \(X\) and \(Y\) are independent standard normal rv’s. Let \(Z = X + Y\). Find the pdf of Z.

Note: now we have a transformation from \(\mathbb{R}^2 \rightarrow \mathbb{R}\).

\(Z = X + Y = U\) in the example above, so the pdf of \(Z\) is just the marginal of \(U\) and we find

\[ \begin{aligned} &f_Z(z) = \int_{-\infty}^{\infty} \frac1{4\pi}\exp\left\{-\frac{z^2+v^2}4 \right\} dv = \\ &\int_{-\infty}^{\infty} \frac1{4\pi}\exp\left\{-\frac{z^2}4\right\}\exp\left\{-\frac{v^2}4 \right\} dv = \\ &\frac1{\sqrt{2\pi\times 2}}\exp\left\{-\frac{z^2}{2\times 2}\right\}\int_{-\infty}^{\infty} \frac1{\sqrt{2\pi\times 2}}\exp\left\{-\frac{v^2}{2\times 2}\right\} dv = \\ &\frac1{\sqrt{2\pi\times 2}}\exp\left\{-\frac{z^2}{2\times 2}\right\} \end{aligned} \]

and we see that \(Z\) has a normal distribution with \(\mu=0\) and \(\sigma=\sqrt2\).


Say \(X\) and \(Y\) are two independent continuous r.v.’s with pdfs \(f_X\) and \(f_Y\), and let \(Z = X+Y\). If we repeat the above calculations we can show that in general the pdf of \(Z\) is given by

\[f_Z(z)=\int_{-\infty}^\infty f_X(t)f_Y(z-t)dt\]

This is called the convolution formula.

There is a second method for deriving the convolution formula which is useful. It uses the law of total probability:

In the setup from above we have

\[ \begin{aligned} &F_{X+Y}(z) = P(X+Y\le z) =\\ &\int_{-\infty}^\infty P(X+Y\le z|Y=y)f_Y(y)dy = \\ &\int_{-\infty}^\infty P(X\le z-y|Y=y)f_Y(y)dy = \\ &\int_{-\infty}^\infty F_{X|Y=y}(z-y|y)f_Y(y)dy \\ &\\ &f_Z(z) = \frac{d}{dz} F_Z(z) =\\ &\frac{d}{dz} \int_{-\infty}^\infty F_{X|Y=y}(z-y|y)f_Y(y)dy =\\ &\int_{-\infty}^\infty \frac{d}{dz} F_{X|Y=y}(z-y|y)f_Y(y)dy = \\ &\int_{-\infty}^\infty f_{X|Y=y}(z-y|y)f_Y(y)dy = \\ &\int_{-\infty}^\infty f_{X}(z-y)f_Y(y)dy \\ \end{aligned} \]

and here we used the independence only at the very end. Therefore the formula up to the last line holds in general.

The tricky part of this is the interchange of the derivative and the integral. When working with densities and cdfs this is usually justified.

Example (1.11.16)

say \(X\) and \(Y\) are independent exponential rv’s with rate \(\lambda\). Find the pdf of \(Z=X+Y\).

\[ \begin{aligned} &f_Z(z)=\int_{-\infty}^\infty f_X(t)f_Y(z-t)dt =\\ &\int_0^z \lambda\exp(-\lambda t)\lambda\exp(-\lambda (z-t))dt = \\ &\lambda^2\exp(-\lambda z)\int_0^z dt = \\ &\lambda^2z\exp(-\lambda z) \end{aligned} \] for \(z>0\).
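
A simulation sketch (with \(\lambda=1\) chosen arbitrarily):

n=1e4
lambda=1
z=rexp(n, lambda)+rexp(n, lambda)
hist(z, 100, freq=FALSE,main="")
curve(lambda^2*x*exp(-lambda*x), 0, max(z), add=TRUE, col="blue",lwd=2)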

Example (1.11.16a)

Let \(X,Y\sim U[0,1]\), \(X\perp Y\), and let \(Z=X+Y\). Now

\[ \begin{aligned} &f_Z(z)=\int_{-\infty}^\infty f_X(t)f_Y(z-t)dt =\\ &\int_{-\infty}^\infty I_{[0,1]}(t)I_{[0,1]}(z-t)dt \end{aligned} \]

Now \(0<z-t<1,0<t<1\) implies \(\max\{0,z-1\}<t<\min\{1,z\}\), so

\[ \begin{aligned} &0<z<1: \\ &f_Z(z)= \int_0^z dt = z\\ &1<z<2: \\ &f_Z(z)= \int_{z-1}^1 dt = 1-(z-1)=2-z\\ \end{aligned} \]

n=1e4
z=runif(n)+runif(n)
hist(z, 100, freq=FALSE,main="")
curve(ifelse(x<1,x,2-x), 0, 2, add=TRUE, col="blue", lwd=2)


One nice feature of the second derivation of the convolution formula is that it often works for things other than sums:

Example (1.11.17)

say \(Y\) is an exponential rv with rate 1 and \(X|Y=y\sim U[0,y]\). Find the pdf of \(Z=X/Y\).

We have \(f_Y(y)=e^{-y}, y>0\) and \(f_{X|Y=y}(x|y)=\frac1yI_{[0, y]}(x)\), and so

\[ \begin{aligned} &F_{X/Y}(z) = P(X/Y\le z) =\\ &\int_{-\infty}^\infty P(X/Y\le z|Y=y)f_Y(y)dy = \\ &\int_{-\infty}^\infty P(X\le zy|Y=y)f_Y(y)dy = \\ &\int_{-\infty}^\infty F_{X|Y=y}(zy|y)f_Y(y)dy \\ &\\ &f_Z(z) = \frac{d}{dz} F_Z(z) =\\ &\frac{d}{dz} \int_{-\infty}^\infty F_{X|Y=y}(zy|y)f_Y(y)dy =\\ &\int_{-\infty}^\infty \frac{d}{dz} F_{X|Y=y}(zy|y)f_Y(y)dy = \\ &\int_{-\infty}^\infty f_{X|Y=y}(zy|y)yf_Y(y)dy = \\ &\int_0^\infty \frac1yI_{[0, y]}(zy) y e^{-y}dy =\\ &\int_0^\infty e^{-y} dy =1 \end{aligned} \]

For what \(z\) is this true? Notice the indicator function \(I_{[0, y]}(zy)\), which is 1 if \(0<zy<y\), that is if \(0<z<1\), and 0 otherwise. So \(f_{X/Y}(z)=I_{[0,1]}(z)\), and therefore \(X/Y\sim U[0,1]\).
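
A simulation sketch:

n=1e4
y=rexp(n, 1)
x=runif(n, 0, y)       # X|Y=y ~ U[0,y]
hist(x/y, 50, freq=FALSE,main="")
abline(h=1, col="blue",lwd=2)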


Finally we study a different kind of transformation:

Example (1.11.18)

Say \(X_1, .., X_n\) are iid \(U[0,1]\). Let \(M=\max\{X_1 , .., X_n\}\). We want to find \(E[M]\) and \(var(M)\).

First we need to find the density of \(M\):

\[ \begin{aligned} &F_M(x) = P(\max\{X_1 , .., X_n\}\le x)=\\ &P(X_1\le x,..,X_n\le x) =\\ &P(X_1\le x)\times ... \times P(X_n\le x) = \\ &\left[P(X_1\le x)\right]^n =x^n \\ &\\ &f_M(x)=nx^{n-1};0<x<1\\ &\\ &E[M^k]=\int_0^1 x^knx^{n-1}dx=\frac{n}{n+k}x^{n+k}\big|_0^1=\frac{n}{n+k}\\ &E[M]=\frac{n}{n+1}\\ &var(M)=\frac{n}{n+2}-\left(\frac{n}{n+1}\right)^2 \end{aligned} \]
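
A quick simulation check (sketch, with \(n=5\) chosen arbitrarily):

B=1e4
n=5
m=apply(matrix(runif(B*n), B, n), 1, max)
round(c(mean(m), n/(n+1)), 3)                   # E[M]
round(c(var(m), n/(n+2)-(n/(n+1))^2), 3)        # var(M)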


This is a special case of what are called order statistics. Many statistical methods, for example the median and the range, are based on an ordered data set.

One of the difficulties when dealing with order statistics is ties, that is the same observation appearing more than once. This should only occur for discrete data because for continuous data the probability of a tie is zero. Ties may happen anyway because of rounding, but we will ignore them in what follows.

Definition (1.11.19)

Say \(X_1, ..., X_n\) are iid with density \(f\). Then \(X_{(i)}\), the \(i^{th}\) order statistic, is the \(i^{th}\) smallest of the \(X_j\), so that \(X_{(1)}<X_{(2)}<..<X_{(n)}\).

Note \(X_{(1)} = \min \{X_i\}\) and \(X_{(n)} = \max \{X_i\}\).

Theorem (1.11.20)

If \(X_1,..,X_n\) are continuous random variables we have

\[f_{X_{(i)}}(x)=\frac{n!}{(i-1)!(n-i)!}F(x)^{i-1}(1-F(x))^{n-i}f(x)\]

proof

Let \(Y\) be a r.v. that counts the number of \(X_j \le x\) for some fixed number x. We will see shortly that

\[P(Y=j)= {n\choose j}F(x)^j(1-F(x))^{n-j} \]

Note also that the event \(\{Y \ge i\}\) means that \(i\) or more observations are less than or equal to \(x\), so the \(i^{th}\) smallest is less than or equal to \(x\). Therefore

\[F_{X_{(i)}}(x) =P(X_{(i)}\le x) = P(Y\ge i)=\sum_{k=i}^n {n\choose k}F(x)^k(1-F(x))^{n-k}\]

and so

\[ \begin{aligned} &f_{X_{(i)}}(x) = \frac{d}{dx} F_{X_{(i)}}(x) = \\ &\frac{d}{dx} \sum_{k=i}^n {n\choose k}F(x)^k(1-F(x))^{n-k} = \\ & \sum_{k=i}^n {n\choose k} \frac{d}{dx}\left[F(x)^k(1-F(x))^{n-k}\right] = \\ & \sum_{k=i}^n {n\choose k}\left[kF(x)^{k-1}f(x)(1-F(x))^{n-k}+F(x)^k(n-k)(1-F(x))^{n-k-1}(-f(x))\right] = \\ &f(x) \sum_{k=i}^n {n\choose k}\left[kF(x)^{k-1}(1-F(x))^{n-k}-F(x)^k(n-k)(1-F(x))^{n-k-1}\right] \end{aligned} \] Let’s simplify the notation a bit by writing \(t=F(x)\), then

\[ \begin{aligned} &\sum_{k=i}^n {n\choose k}\left[kt^{k-1}(1-t)^{n-k}-t^k(n-k)(1-t)^{n-k-1}\right] = \\ &\sum_{k=i}^n {n\choose k}kt^{k-1}(1-t)^{n-k}- \sum_{k=i}^n {n\choose k} t^k(n-k)(1-t)^{n-k-1} = \\ &\sum_{k=i}^n {n\choose k}kt^{k-1}(1-t)^{n-k}- \sum_{k=i}^{n-1} {n\choose k}(n-k) t^k(1-t)^{n-k-1} = \\ &\\ &{n\choose i}it^{i-1}(1-t)^{n-i}+\\ &\sum_{k=i+1}^n {n\choose k}kt^{k-1}(1-t)^{n-k}- \sum_{k=i}^{n-1} {n\choose k}(n-k) t^k(1-t)^{n-k-1} = \\ &\\ &\frac{n!}{(n-i)!i!}it^{i-1}(1-t)^{n-i}+\\ &\sum_{l=i}^{n-1} {n\choose l+1}(l+1)t^{l}(1-t)^{n-l-1}- \sum_{k=i}^{n-1} {n\choose k}(n-k) t^k(1-t)^{n-k-1} \end{aligned} \] where we change the summation index. Now notice that both sums go from i to n-1 and have terms of the form \(t^k(1-t)^{n-k-1}\). Moreover

\[ \begin{aligned} &{n\choose l+1}(l+1) =\frac{n!}{(n-l-1)!(l+1)!}(l+1) =\frac{n!}{l!(n-l-1)!}\\ &{n\choose k}(n-k) =\frac{n!}{(n-k)!k!}(n-k) =\frac{n!}{k!(n-k-1)!} \end{aligned} \] so the two sums are the same and cancel each other out!

Finally replacing t yields the result:

\[f_{X_{(i)}}(x) = \frac{n!}{(n-i)!(i-1)!}F(x)^{i-1}(1-F(x))^{n-i}f(x)\]

Corollary (1.11.21)

\[f_{X_{(1)}}(x) = n(1-F(x))^{n-1}f(x)\] \[f_{X_{(n)}}(x) = nF(x)^{n-1}f(x)\]

Example (1.11.22)

Say \(X_1, ..., X_n\) are iid \(U[0,1]\). Then for \(0<x<1\) we have \(f(x)=1\) and \(F(x)=x\). Therefore

\[ \begin{aligned} &f_{X_{(i)}}(x) = \frac{n!}{(n-i)!(i-1)!}x^{i-1}(1-x)^{n-i} \\ &f_{X_{(1)}}(x) = n(1-x)^{n-1} \\ &f_{X_{(n)}}(x) = nx^{n-1} \end{aligned} \]
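
A simulation sketch for the \(i^{th}\) order statistic of uniforms (with \(n=10\) and \(i=3\) chosen arbitrarily):

B=1e4
n=10; i=3
x=apply(matrix(runif(B*n), B, n), 1, function(z) sort(z)[i])
hist(x, 100, freq=FALSE,main="")
curve(factorial(n)/(factorial(n-i)*factorial(i-1))*x^(i-1)*(1-x)^(n-i), 0, 1, add=TRUE, col="blue",lwd=2)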

Example (1.11.23)

Say \(X_1, ..., X_n\) are iid \(U[0,1]\). Let g be the density of the order statistic \((X_{(1)}, ..., X_{(n)})\). Then

\[g(x_{(1)}, ..., x_{(n)})=n!\text{ for }0<x_{(1)}< ...<x_{(n)}<1\]

The simple “proof” is as follows: for any set of n distinct numbers there are n! permutations, exactly one of which has \(0<x_{(1)}< ...<x_{(n)}<1\).

A “formal” proof can be done using a generalization of the change of variables formula. The problem is that the inverse transform is not unique; in fact there are n! of them because the ordered set of numbers could have come from any of the n! permutations. Once the inverse transform is fixed, though, the Jacobian is just the identity matrix with the rows rearranged, and therefore has determinant \(\pm 1\), so \(|J|=1\). Then

\[g(x_{(1)}, ..., x_{(n)})=n!f(x_{(1)}, ..., x_{(n)})|J|=n!\]