Moment Generating and Characteristic Functions

Definition (1.10.1)

The moment generating function of a rv \(X\) is defined by

\[\psi(t)=E[\exp (tX)]\]

The characteristic function of a rv \(X\) is defined by

\[\phi(t)=E[\exp (itX)]\]

In general characteristic functions are much more useful in Probability Theory, but they require some knowledge of complex analysis, and so we will just consider moment generating functions.

Example (1.10.2)

Say \(X\) has density \(f(i)={n\choose i}p^i(1-p)^{n-i}\), \(i=0,1,..,n\), then

\[ \begin{aligned} &\psi(t) = \sum_{i=0}^n e^{ti}{n\choose i}p^i(1-p)^{n-i} = \\ &\sum_{i=0}^n {n\choose i}(pe^t)^{i}(1-p)^{n-i} = \\ &\left(pe^t+1-p\right)^n \end{aligned} \]
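This can be checked numerically in R, using dbinom() for the density above (a quick sketch; the values n=5, p=0.3 and t=0.7 are arbitrary choices):

n=5; p=0.3; t=0.7
# E[exp(tX)] computed directly as a finite sum over the support 0,...,n
sum(exp(t*(0:n))*dbinom(0:n, n, p))
# the closed form derived above; the two numbers should agree
(p*exp(t)+1-p)^n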

Example (1.10.3)

Let \(X \sim Geom(p)\), then

\[ \begin{aligned} &\psi(t) = \sum_{x=1}^\infty e^{tx}p(1-p)^{x-1}=\\ &p\sum_{k=0}^\infty e^{t(k+1)}(1-p)^{k} =\\ &p\sum_{k=0}^\infty (e^{t})^{k+1}(1-p)^{k} =\\ &pe^{t}\sum_{k=0}^\infty \left(e^{t}(1-p)\right)^{k} =\\ &\frac{pe^{t}}{1-e^{t}(1-p)} \end{aligned} \] provided \(e^t(1-p)<1\), that is \(t<-\log(1-p)\), so that the geometric series converges.
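Again a quick numerical check (a sketch; p=0.3 and t=0.2 are arbitrary, with t chosen so that \(e^t(1-p)<1\); the infinite sum is truncated at 1000 terms, which is more than enough here):

p=0.3; t=0.2; x=1:1000
# truncated version of the series defining E[exp(tX)]
sum(exp(t*x)*p*(1-p)^(x-1))
# the closed form derived above
p*exp(t)/(1-exp(t)*(1-p))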

Example (1.10.4)

Let \(X \sim Exp(\lambda)\), then

\[ \begin{aligned} &\psi(t) = \int_0^\infty e^{tx}\lambda e^{-\lambda x}dx = \\ &\lambda\int_0^\infty e^{(t-\lambda) x}dx = \\ &\lambda \frac1{t-\lambda }e^{(t-\lambda) x}|_0^\infty = \\ &\frac{\lambda}{\lambda-t } \end{aligned} \] if \(t-\lambda<0\), that is \(t<\lambda\). Otherwise the integral is \(\infty\).
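The same kind of check works in the continuous case via numerical integration (a sketch; \(\lambda=2\) and \(t=0.5<\lambda\) are arbitrary):

lambda=2; t=0.5
# E[exp(tX)] for X~Exp(lambda), computed by numerical integration
integrate(function(x) exp(t*x)*lambda*exp(-lambda*x), 0, Inf)$value
# the closed form derived above
lambda/(lambda-t)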

Example (1.10.4a)

Let \(X\) be a random variable with density \(f(x)=\frac12 \exp(-|x|)\). This is called a double exponential rv.

Let \(|t|<1\), then

\[ \begin{aligned} &\psi_X(t) =E[e^{tX}] =\int_{-\infty}^\infty e^{tx} \frac12 \exp(-|x|) dx=\\ &\frac12\int_{-\infty}^\infty \exp(tx-|x|) dx = \\ &\frac12\left[\int_{-\infty}^0 \exp(tx-|x|) dx+\int_{0}^\infty \exp(tx-|x|) dx\right] = \\ &\frac12\left[\int_{-\infty}^0 \exp(tx+x) dx+\int_{0}^\infty \exp(tx-x) dx\right] = \\ &\frac12\left[\frac1{t+1}\exp((t+1)x)\Big|_{-\infty}^0 + \frac1{t-1}\exp((t-1)x)\Big|_0^{\infty}\right] = \\ &\frac12\left[\frac1{t+1} - \frac1{t-1}\right] = \frac1{1-t^2} \end{aligned} \] where we used that \(|t|<1\), so \(e^{(t+1)x}\to0\) as \(x\to-\infty\) and \(e^{(t-1)x}\to0\) as \(x\to\infty\).
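A quick numerical check of this result (a sketch; t=0.5 is an arbitrary value with \(|t|<1\)):

t=0.5
# E[exp(tX)] for the double exponential, computed by numerical integration
integrate(function(x) exp(t*x)*0.5*exp(-abs(x)), -Inf, Inf)$value
# the closed form derived above
1/(1-t^2)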

The name comes from the following theorem

Theorem (1.10.5)

Say \(\psi(t)\) is the mgf of a rv \(X\). Say there exists an \(\epsilon >0\) such that \(| \psi(t)|< \infty\) for all \(t \in (- \epsilon , \epsilon )\). Then

\[\psi^{(k)}(0) = E[X^k]\]

for all \(k\), where \(\psi^{(k)}\) denotes the \(k\)th derivative of \(\psi\).

proof

Say \(X\) is a discrete rv with pdf \(f\), and \(X\) takes only finitely many values. Then

\[ \begin{aligned} &\psi^{(k)}(0) = \frac{d^k E[e^{tX}]}{dt^k}\vert_{t=0} = \\ &\frac{d^k}{dt^k} \left(\sum_x e^{tx}f(x)\right)\vert_{t=0} = \\ & \left(\sum_x \frac{d^k}{dt^k} e^{tx}f(x)\right)\vert_{t=0} = \\ & \left(\sum_x x^ke^{tx}f(x)\right)\vert_{t=0} = \\ & \sum_x x^kf(x) = E[X^k]\\ \end{aligned} \] Here interchanging the derivative and the sum is fine because the sum is finite. The extension to an infinite sample space and to a continuous rv requires some real analysis theorems.

Example (1.10.6)

Say \(X\) has density \(f(i)={n\choose i}p^i(1-p)^{n-i}\), \(i=0,1,..,n\), then

\[ \begin{aligned} &\psi(t) = \left(pe^t+1-p\right)^n \\ &\frac{d\psi(t)}{dt}=n\left(pe^t+1-p\right)^{n-1}pe^t \\ &\frac{d\psi(t)}{dt}|_{t=0}=n\left(pe^0+1-p\right)^{n-1}pe^0=np \end{aligned} \]

and computing \(E[X]\) directly gives the same answer:

\[ \begin{aligned} &E[X] =\sum_{i=0}^n i{n\choose i}p^i(1-p)^{n-i} =\\ &\sum_{i=1}^n i\frac{n!}{(n-i)!i!}p^i(1-p)^{n-i} = \\ &\sum_{i=1}^n \frac{n!}{(n-i)!(i-1)!}p^i(1-p)^{n-i} = \\ &np\sum_{i=1}^n \frac{(n-1)!}{(n-i)!(i-1)!}p^{i-1}(1-p)^{n-i} = \\ &np\sum_{k=0}^{n-1} \frac{(n-1)!}{(n-1-k)!k!}p^{k}(1-p)^{n-1-k} = \\ &np \end{aligned} \]
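R can also differentiate the mgf symbolically with D(), which gives a quick way to see that both routes lead to \(np\) (a sketch; n=10 and p=0.4 are arbitrary, so the answer should be 4):

# symbolic derivative of the binomial mgf with respect to t
dpsi=D(quote((p*exp(t)+1-p)^n), "t")
dpsi
# evaluated at t=0 this should give n*p
eval(dpsi, list(n=10, p=0.4, t=0))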

Example (1.10.7)

Let \(X \sim Exp(\lambda)\), then

\[ \begin{aligned} &\psi(t) = \frac{\lambda}{\lambda-t }; t<\lambda\\ &\frac{d\psi(t)}{dt}= \frac{\lambda}{(\lambda-t)^2}\\ &\frac{d\psi(t)}{dt}|_{t=0}=\frac{\lambda}{(\lambda-0)^2}=\frac1\lambda=E[X]\\ &\frac{d^2\psi(t)}{dt^2}= \frac{2\lambda}{(\lambda-t)^3}\\ &\frac{d^2\psi(t)}{dt^2}|_{t=0}=\frac{2\lambda}{(\lambda-0)^3}=\frac2{\lambda^2}=E[X^2] \end{aligned} \]
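The same symbolic check works here, now taking two derivatives (a sketch; \(\lambda=3\) is arbitrary, so the answers should be \(1/3\) and \(2/9\)):

# first and second symbolic derivatives of the exponential mgf
d1=D(quote(lambda/(lambda-t)), "t")
d2=D(d1, "t")
# evaluated at t=0 these should give 1/lambda and 2/lambda^2
eval(d1, list(lambda=3, t=0))
eval(d2, list(lambda=3, t=0))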

Warning: in practice nobody uses the moment generating function to generate moments! It has other uses:

Theorem (1.10.8)

Let \(X_1,...,X_n\) be a sequence of independent rvs with mgfs \(\psi_i\), and let \(Z=\sum X_i\). Then

\[\psi_Z (t)=\prod \psi_i (t)\]

If the \(X_i\) all have the same distribution as well, then \(\psi_i =\psi\) for all \(i\) and

\[\psi_Z(t) =\left(\psi(t)\right)^n\]

proof

\[ \begin{aligned} &\psi_Z(t) = E \left[ e^{tZ}\right]= \\ &E \left[ e^{t\sum X_i}\right] = \\ &E \left[ \prod e^{tX_i}\right] = \\ &\prod E \left[ e^{tX_i}\right] = \\ &\prod\psi_i(t) \end{aligned} \] where the expectation of the product factors into the product of the expectations because the \(X_i\) are independent.

Here is a very deep theorem, stated without proof:

Theorem (1.10.9)

Let \(X\) and \(Y\) be rvs with mgfs \(\psi_X\) and \(\psi_Y\), respectively. If both mgfs are finite in an open neighborhood of 0 and if \(\psi_X(t) = \psi_Y(t)\) for all \(t\) in this neighborhood, then \(F_X (u)=F_Y (u)\) for all \(u\).

In other words, an mgf that is finite near 0 determines the cdf (and of course the cdf determines the mgf). This means that one way to show that two random variables have the same distribution is to show that they have the same mgf.

Example (1.10.9a)

Say \(X\) has density \(f(x)=\frac{\lambda^x}{x!}e^{-\lambda};x=0,1,2,...\) and \(Y\) has density \(g(x)=\frac{\tau^x}{x!}e^{-\tau};x=0,1,2,...\). Now

\[ \begin{aligned} &\psi_X(t) =\sum_{k=0}^\infty e^{tk}\frac{\lambda^k}{k!}e^{-\lambda}=\\ &e^{-\lambda}\sum_{k=0}^\infty \frac{(\lambda e^{t})^k}{k!} = \\ &e^{-\lambda}e^{\lambda e^{t}} \end{aligned} \] and so if X and Y are independent

\[ \begin{aligned} &\psi_{X+Y}(t) = \\ &e^{-\lambda}e^{\lambda e^{t}}e^{-\tau}e^{\tau e^{t}} = \\ &e^{-(\lambda+\tau)}e^{(\lambda+\tau) e^{t}} \end{aligned} \]

and so, by Theorem (1.10.9), X+Y has a density of the same (Poisson) form, now with parameter \(\lambda+\tau\).
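One can also verify the conclusion directly for a fixed value: by independence \(P(X+Y=k)=\sum_{j=0}^k P(X=j)P(Y=k-j)\), and this should match the Poisson density with parameter \(\lambda+\tau\) (a sketch; \(\lambda=2\), \(\tau=3\) and \(k=4\) are arbitrary):

lambda=2; tau=3; k=4
# P(X+Y=k) computed via the convolution sum
sum(dpois(0:k, lambda)*dpois(k-(0:k), tau))
# Poisson density with parameter lambda+tau, evaluated at k
dpois(k, lambda+tau)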

Example (1.10.10)

Show that the sum of two independent exponential rvs is not an exponential rv.

Say \(X \sim Exp(\lambda)\) and \(Y \sim Exp(\rho)\), then

\(\psi_X (t)= \lambda /( \lambda -t)\) and \(\psi_Y (t)=\rho/(\rho-t)\), so

\[\psi_{X+Y} (t)= \frac{\lambda}{\lambda-t}\cdot\frac{\rho}{\rho-t} \ne \frac{a}{a-t}\]

for any a and all t.

Example (1.10.11)

Consider the two pdfs given by

\[ \begin{aligned} &f(x) = \frac1{\sqrt{2\pi}x}e^{-(\log x)^2/2};0<x<\infty\\ &g(x) = f(x)\left[1+\sin(2\pi\log x)\right] \end{aligned} \]

(\(f\) is the density of a log-normal distribution).

Now it turns out that if \(X\) has density \(f\) , then

\[ \begin{aligned} &E[X^k] = \int_0^\infty x^k \frac1{\sqrt{2\pi}x}e^{-(\log x)^2/2}dx=(t=\log x)\\ &\int_{-\infty}^\infty (e^t)^k \frac1{\sqrt{2\pi}}e^{-(t)^2/2}dt = \\ &\int_{-\infty}^\infty \frac1{\sqrt{2\pi}}e^{-\frac12(t^2-2tk)}dt = \\ &\int_{-\infty}^\infty \frac1{\sqrt{2\pi}}e^{-\frac12(t^2-2tk+k^2-k^2)}dt = \\ &e^{k^2/2}\int_{-\infty}^\infty \frac1{\sqrt{2\pi}}e^{-\frac12(t-k)^2}dt = \\ &e^{k^2/2} \end{aligned} \] Note that \(\int_{-\infty}^\infty \frac1{\sqrt{2\pi}}e^{-\frac12(t-k)^2}dt=1\) for all k will be shown later.
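A numerical check of \(E[X^k]=e^{k^2/2}\), using the same substitution \(t=\log x\) so that the integral runs against the standard normal density (a sketch; k=2 is an arbitrary choice):

k=2
# E[X^k] after the substitution t=log(x)
integrate(function(t) exp(k*t)*dnorm(t), -Inf, Inf)$value
# the closed form derived above
exp(k^2/2)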

On the other hand, for \(Y\) we find

\[ \begin{aligned} &E[Y^k] =\int_0^\infty x^kf(x)\left[1+\sin(2\pi\log x)\right]dx= \\ &\int_0^\infty x^kf(x)dx+\int_0^\infty x^kf(x)\sin(2\pi\log x)dx = \\ &e^{k^2/2} + \int_{-\infty}^{\infty} e^{k(t+k)} \frac1{\sqrt{2\pi}}e^{-\frac12(t+k)^2}\sin[2\pi(t+k)]dt = e^{k^2/2}\\ \end{aligned} \]

because \(e^{k(t+k)} e^{-\frac12(t+k)^2}=e^{k^2/2}e^{-t^2/2}\) is even in \(t\) while \(\sin[2\pi(t+k)]=\sin(2\pi t)\) (since \(k\) is an integer) is odd, so the integrand is an odd function and the integral is 0. So this example shows that the finiteness condition in the theorem above is really needed (indeed the mgf of a log-normal rv is infinite for every \(t>0\)): without it you can have two rvs with all their moments equal but different distributions.

Moment Generating Functions of Random Vectors

Definition

Let \((X_1,..,X_n)\) be a random vector, then the moment generating function is defined by

\[\psi(t_1,..,t_n)=E\left[e^{t_1X_1+...+t_nX_n}\right]\]

Theorem

Let \((X,Y)\) be a random vector with mgf \(\psi(t,s)=E[e^{tX+sY}]\). Then

\[\frac{\partial^2 \psi(t,s)}{\partial s\partial t}\Big|_{s=t=0}=E[XY]\]

proof

\[ \begin{aligned} &\frac{\partial^2 \psi(t,s)}{\partial s\partial t}\Big|_{s=t=0} = \frac{\partial^2 }{\partial s\partial t} E[e^{tX+sY}]\Big|_{s=t=0} = \\ & E\left[\frac{\partial^2 }{\partial s\partial t}e^{tX+sY}\right]\Big|_{s=t=0} = \\ &E[XYe^{tX+sY}]\Big|_{s=t=0} = E[XY]\\ \end{aligned} \] where, as before, interchanging differentiation and expectation needs some justification. So this function also generates moments of random vectors.

Example

Say \((X,Y)\) is a random vector with density \(f(x,y)=2\) for \(0<x<y<1\). Then

\[ \begin{aligned} &E[XY] =\int_0^1 \int_0^y xy\cdot 2\, dx dy = \\ &2\int_0^1 y \left(x^2/2\Big|_0^y\right) dy = \\ &\int_0^1 y^3 dy = y^4/4\Big|_0^1 =1/4\\ \end{aligned} \]

or

\[ \begin{aligned} &\psi(t,s) =E[e^{tX+sY}] = \\ &\int_0^1 \int_0^y e^{tx+sy} 2dx dy = \\ &2\int_0^1 e^{sy}\left\{ \int_0^y e^{tx} dx \right\}dy = \\ &2\int_0^1 e^{sy}\left\{ \frac1t e^{tx}|_0^y \right\}dy = \\ &2\int_0^1 e^{sy}\frac1t [e^{ty}-1] dy = \\ &\frac2t \int_0^1 \left(e^{(s+t)y} -e^{sy}\right) dy = \\ &\frac2t \left( \frac1{s+t}e^{(s+t)y} - \frac1{s}e^{sy}\right)\Big|_0^1 = \\ &\frac2t \left( \frac{e^{s+t}-1}{s+t} - \frac{e^{s}-1}{s} \right) = \\ &2\frac{s(e^{s+t}-1)-(s+t)(e^s-1)}{ts(s+t)} =\\ &2\frac{se^se^t-s-se^s+s-te^s+t}{ts(s+t)} =\\ &2\frac{(se^t-s-t)e^s+t}{ts^2+st^2} \end{aligned} \] This is not a nice function, and so finding the second derivative analytically is a bit ugly. Let's at least find a numerical approximation:

\[ \begin{aligned} &\frac{\partial^2 \psi(t,s)}{\partial s\partial t} = \\ &\frac{\partial}{\partial s} \frac{\partial}{\partial t} \psi(t,s) \approx\\ &\frac{\partial}{\partial s} \frac{\psi(t+h,s)-\psi(t,s)}{h} \approx\\ &\frac1{h}\frac{\partial}{\partial s} \psi(t+h,s)-\frac1{h}\frac{\partial}{\partial s} \psi(t,s) \approx\\ &\frac1{h}\frac{\psi(t+h,s+h)-\psi(t+h,s)}{h}-\frac1{h}\frac{\psi(t,s+h)-\psi(t,s)}{h} =\\ &(\psi(t+h,s+h)-\psi(t+h,s)-\psi(t,s+h)+\psi(t,s))/h^2 \end{aligned} \] Note that the formula for \(\psi(t,s)\) found above is of the form \(0/0\) when \(s=0\) or \(t=0\) (these singularities are removable), so in the calculation below a small number \(z\) stands in for 0, while at the origin we can use the exact value \(\psi(0,0)=E[X^0Y^0]=1\). So

f=function(s,t) 2*((s*exp(t)-s-t)*exp(s)+t)/(s*t^2+t*s^2) # the mgf found above
h=0.01 # step size for the finite differences
z=0.00001 # small value standing in for 0
c(f(h,h),f(h,z),f(z,h))
## [1] 1.010059 1.006695 1.003348
(f(h,h)-f(h,z)-f(z,h)+1)/h^2
## [1] 0.1511972
1/4
## [1] 0.25

This crude approximation is not very close to the exact value \(E[XY]=1/4\). The main culprit is \(z\): replacing 0 by \(z\) introduces an error of order \(z/h^2=0.1\) here, so \(z\) would have to be chosen much smaller than \(h^2\) for the finite difference to be accurate.
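As a final sanity check, here is a small Monte Carlo estimate of \(E[XY]\) (a sketch; it uses the fact that if \(U_1, U_2\) are independent uniform \([0,1]\) rvs, then \((\min(U_1,U_2), \max(U_1,U_2))\) has exactly the density \(f(x,y)=2\) on \(0<x<y<1\); B is an arbitrary sample size):

B=1e6
u1=runif(B); u2=runif(B)
x=pmin(u1, u2); y=pmax(u1, u2)
# sample mean of XY; should be close to the exact value 1/4 = 0.25
mean(x*y)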