Expectation and Variance

Definition (1.7.1)

The expectation (or expected value) of a function \(g\) of a random variable \(X\) is defined by

  • \(X\) discrete

\[E[g(X)]=\sum_x g(x)f(x)\]

  • \(X\) continuous

\[E[g(X)]= \int_{-\infty}^{\infty} g(x)f(x)dx\]

if \(E[|g(X)|]<\infty\). Notice the absolute value in the condition.

Let \(\mu= E[X]\); then \(\mu\) is called the mean of \(X\).

\(\sigma^2 = var(X)=E[(X-\mu)^2]\) is called the variance of \(X\). The square root of the variance, \(\sigma\), is called the standard deviation.

Example (1.7.2)

Say \(X\) is the sum of two dice. Find \(E[X]\), \(E[X^2]\) and \(E[1/X]\).

We have

x P(X=x)
2 1/36
3 2/36
4 3/36
5 4/36
6 5/36
7 6/36
8 5/36
9 4/36
10 3/36
11 2/36
12 1/36

so \(E[X] = 2\times 1/36 + 3\times 2/36+4\times 3/36+\) \(5\times 4/36+6\times 5/36+7\times 6/36+8\times 5/36+9\times 4/36+\) \(10\times 3/36+11\times 2/36+12\times 1/36 = 7\)

\(E[X^2] = 2^2\times 1/36+3^2\times 2/36+4^2\times 3/36+\) \(5^2\times 4/36+6^2\times 5/36+7^2\times 6/36+8^2\times 5/36+\) \(9^2\times 4/36+10^2\times 3/36+11^2\times 2/36+12^2\times 1/36 = 54.83\)

\(E[1/X] = 1/2\times 1/36+1/3\times 2/36+1/4\times 3/36+\) \(1/5\times 4/36+1/6\times 5/36+1/7\times 6/36+1/8\times 5/36+\) \(1/9\times 4/36+1/10\times 3/36+1/11\times 2/36+1/12\times 1/36 = 0.168\)
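
These sums are tedious to do by hand; a quick way to get the exact values in R:

x=2:12
p=c(1:6, 5:1)/36
# E[X], E[X^2] and E[1/X]; should give 7, 54.8333 and 0.1678
c(sum(x*p), sum(x^2*p), sum(1/x*p))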

Note (Sample Mean - Expectation)

If we have a data set \(\{x_1,...,x_n\}\) from some random variable \(X\) we can find the sample mean

\[\bar{x}=\frac1n\sum_{i=1}^n x_i\]

and later we will see that under some mild conditions

\[\bar{x}\rightarrow E[X]\]

here is a simulation illustrating this using the sum of two dice:

n=1e4
dice=matrix(sample(1:6,size=2*n,replace=TRUE),ncol=2)
sm=apply(dice,1,sum)
c(mean(sm),mean(sm^2),mean(1/sm))
## [1]  6.9903000 54.6613000  0.1678531

Example (1.7.3)

We roll a fair die until the first time we get a six. What is the expected number of rolls?

We saw that the density is given by \(f(x) = 1/6(5/6)^{x-1}\) if \(x \in \{1,2,..\}\). Here we just have \(g(x)=x\), so

\[E[X]=\sum_{k=1}^\infty g(k)f(k) = \sum_{k=1}^\infty k\frac16 (\frac56)^{k-1}\]

How do we compute this sum? Here is a “standard” trick that uses calculus:

\[ \begin{aligned} &\sum_{k=1}^\infty kt^{k-1} = \sum_{k=1}^\infty \frac{d}{dt}t^k=\\ &\sum_{k=0}^\infty \frac{d}{dt}t^k= \hspace{1cm} \{\text{adding }k=0\text{ changes nothing because } \frac{d}{dt} t^0=0\}\\ &\frac{d}{dt} \left(\sum_{k=0}^\infty t^k\right) = \hspace{1cm} \{\text{interchange of sum and derivative}\}\\ &\frac{d}{dt} \frac1{1-t} = \frac1{(1-t)^2} \hspace{1cm} \{\text{geometric series, } |t|<1\} \end{aligned} \]

where the interchange of infinite sum and derivative is justified because a power series can be differentiated term by term inside its radius of convergence. So we find with \(t=5/6\):

\[E[X] = \frac16 \frac1{(1-5/6)^2}=6\]

This is a special case of a geometric rv, that is a discrete rv \(X\) with pdf \(f(x)=p(1-p)^{x-1}\), \(x=1,2,..\).

Note that if we replace 1/6 above with p, we can show that

\[E[X] = 1/p\]

We write \(X \sim\) Geom(p).
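
A quick simulation check; note that R's rgeom counts the failures before the first success, so we add 1 to match the definition above:

x=rgeom(1e5, 1/6)+1
# the average should be close to 1/p = 6
mean(x)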

Theorem (1.7.4)

Let \(X,Y\) be random variables and let \(a,b\) be real numbers. Then

\[ \begin{aligned} &E[aX+b] =aE[X]+b \\ &E[X+Y] =E[X]+E[Y] \\ &var(aX+b)=a^2var(X)\\ &var(X) =E[X^2]-E[X]^2 \\ \end{aligned} \]

proof (all for \(X\) discrete)

\[ \begin{aligned} &E[aX+b] =\sum_x (ax+b)f(x)=\\ &a\sum_x xf(x)+b\sum_x f(x) =aE[X]+b\\ &E[X+Y] =\sum_{x,y} (x+y)f(x, y)=\\ &\sum_{x,y} xf(x, y)+\sum_{x,y} yf(x, y)= \\ &\sum_{x} x\left[\sum_yf(x, y)\right]+\sum_{y} y\left[\sum_xf(x, y)\right]= \\ &\sum_{x} xf_X(x) +\sum_{y} yf_Y(y) = \\ &E[X]+E[Y] \end{aligned} \] \[ \begin{aligned} &var(aX+b) =E\left[\left(aX+b-E[aX+b]\right)^2\right] = \\ &E\left[\left(aX+b-aE[X]-b\right)^2\right] = \\ &a^2E\left[\left(X-E[X]\right)^2\right] = a^2var(X)\\ &var(X) =E\left[\left(X-E[X]\right)^2\right]=\\ &E\left[X^2-2XE[X]+E[X]^2\right]=\\ &E[X^2]-2E[X]E[X]+E[X]^2=E[X^2]-E[X]^2 \end{aligned} \]

Comment: combining the first two we have

\[E[aX+bY]=aE[X]+bE[Y]\]

so expectations are linear in their arguments, one of their most important properties. Note that this holds even when \(X\) and \(Y\) are not independent!
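
As a quick illustration, let \(X\) be the first of two dice and \(Y\) their sum, so \(X\) and \(Y\) are clearly dependent:

n=1e5
d1=sample(1:6, n, replace=TRUE)
d2=sample(1:6, n, replace=TRUE)
x=d1; y=d1+d2   # dependent random variables
# E[2X+3Y] and 2E[X]+3E[Y]; both should be close to 2*3.5+3*7=28
c(mean(2*x+3*y), 2*mean(x)+3*mean(y))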

Example (1.7.5)

Say \(X\) is the sum of two dice. What is \(var(X)\)?

\[var(X) = E[X^2]-E[X]^2 = 54.83-7^2 = 5.83\]

Example (1.7.5a)

Say \(X\) has density \(f(x)=cx^2(1-x)^2; 0<x<1\). We want to find the mean and the standard deviation.

\[ \begin{aligned} &E[X^k] = \int_{-\infty}^{\infty} x^k f(x) dx = \\ &\int_0^1 x^k cx^2(1-x)^2 dx = \\ &c\int_0^1 x^{k+2}-2x^{k+3}+x^{k+4} dx = \\ &c\left( \frac1{k+3}x^{k+3}-\frac2{k+4}x^{k+4}+\frac1{k+5}x^{k+5}\right)\Big|_0^1 = \\ &c\left(\frac1{k+3}-\frac2{k+4}+\frac1{k+5}\right)\\ &\\ &1=E[X^0]=c\left(\frac1{3}-\frac2{4}+\frac1{5}\right)=\frac{c}{30}\\ &E[X]=30\left(\frac1{4}-\frac2{5}+\frac1{6}\right)=\frac{1}{2}\\ &E[X^2]=30\left(\frac1{5}-\frac2{6}+\frac1{7}\right)=\frac{2}{7}\\ &var(X)=E[X^2]-(E[X])^2=\frac{2}{7}-\left(\frac{1}{2}\right)^2=\frac{1}{28}\\ &sd(X)=\sqrt{\frac{1}{28}}=\frac{1}{2\sqrt 7} \end{aligned} \]
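
We can verify these values numerically with R's integrate routine:

f=function(x) 30*x^2*(1-x)^2
EX=integrate(function(x) x*f(x), 0, 1)$value     # should be 1/2
EX2=integrate(function(x) x^2*f(x), 0, 1)$value  # should be 2/7
c(EX, EX2, EX2-EX^2)                             # variance should be 1/28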

Example (1.7.6)

Find the mean and the standard deviation of a uniform \([A,B]\) r.v.

We will use a common trick for this: say \(X \sim U[0,1]\), and let \(Y=(B-A)X+A\), then for \(A<y<B\)

\[F_Y (y) = P(Y<y) = P((B-A)X+A <y) =\] \[P(X <(y-A)/(B-A)) = (y-A)/(B-A)\] and so

\[f_Y (y) = 1/(B-A) \text{ for } A<y<B\]

so \(Y \sim U[A,B]\). So \(X\) and \(Y\) are closely related, and often if we want to calculate something for \(Y\) we can do so by first calculating it for \(X\) and then transferring the result to \(Y\):

\[E[X^k] =\int_0^1 x^k\times 1dx=\frac{x^{k+1}}{k+1}|_0^1 = \frac{1}{k+1}\] so

\[ \begin{aligned} &E[X] =\frac12 \\ &var(X) =E[X^2]-E[X]^2=\frac13-\left(\frac12\right)^2=\frac1{12} \\ &E[Y] =E[(B-A)X+A]=(B-A)E[X]+A= \\ &(B-A)\frac12+A=\frac{A+B}2\\ &var(Y)=var((B-A)X+A)=(B-A)^2var(X)=(B-A)^2/12\\ &sd(Y)=(B-A)/\sqrt{12} \end{aligned} \]
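
A quick simulation check, for example with \(A=2\) and \(B=5\):

y=runif(1e5, 2, 5)
# mean should be close to (2+5)/2=3.5, variance close to (5-2)^2/12=0.75
c(mean(y), var(y))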

Definition (1.7.7)

\(\mu_k = E[X^k]\) is called the \(k^{th}\) moment of \(X\).

\(\kappa_k = E[(X-\mu)^k]\) is called the \(k^{th}\) central moment of \(X\).

\(\gamma_1 = \frac{\kappa_3}{\kappa_2 ^{3/2}}\) is called the skewness of \(X\).

\(\gamma_2 = \frac{\kappa_4}{\kappa_2^2}-3=\frac{\kappa_4}{\sigma^4}-3\) is called the kurtosis of \(X\).

Some authors define the kurtosis as \(\frac{\kappa_4}{\sigma^4}\) and call \(\gamma_2\) the excess kurtosis of X.

We have

\[ \begin{aligned} &\kappa_k = E[(X-\mu)^k] =\\ &E\left[\sum_{j=0}^k{k\choose j} X^j(-\mu)^{k-j}\right] = \\ &\sum_{j=0}^k{k\choose j} E[X^j](-\mu)^{k-j} = \\ &\sum_{j=0}^k{k\choose j} \mu_j (-\mu)^{k-j} \end{aligned} \]

The kurtosis of a distribution measures its “peakedness”, that is how sharp its maximum is. A distribution with \(\gamma_2 =0\) is called mesokurtic. This is the case for example for a standard normal (see later), which therefore serves as a kind of baseline. If \(\gamma_2 <0\) the distribution is called platykurtic and has a broader peak and thinner tails. If \(\gamma_2 >0\) it is called leptokurtic, meaning it has a sharper peak and heavier tails than the standard normal.
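
For a data set these quantities can be estimated by plugging in sample moments. Here is a quick sketch for data from a standard normal, where both the skewness and the excess kurtosis should come out close to 0:

x=rnorm(1e5)
m=mean(x); s=sd(x)
c(mean((x-m)^3)/s^3,     # sample skewness
  mean((x-m)^4)/s^4-3)   # sample excess kurtosis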

Example (1.7.8)

Say \(X\) has pdf

\[f(x)=\frac1{\sqrt{2\pi \tau}}\exp\left\{-\frac{x^2}{2 \tau} \right\}\]

for \(x \in \mathbb{R}\) and \(\tau>0\)

Then \(E[X^k]=0\) for all odd numbers \(k\) because then \(x^kf(x)\) is an odd function. For even moments we find

\[ \begin{aligned} &E[X^{2k}] = \int_{-\infty}^{\infty} x^{2k}\frac1{\sqrt{2\pi \tau}}\exp\left\{-\frac{x^2}{2 \tau} \right\}dx=\\ &\tau^k\int_{-\infty}^{\infty} \left(\frac{x}{\sqrt \tau}\right)^{2k}\frac1{\sqrt{2\pi }}\exp\left\{-\frac{1}{2}\left(\frac{x}{\sqrt \tau}\right)^2 \right\}\frac{dx}{\sqrt \tau}=\\ &\frac{\tau^k}{\sqrt{2\pi }}\int_{-\infty}^{\infty} t^{2k}\exp\left\{-\frac{1}{2}t^2 \right\}dt=\\ &\frac{\tau^k}{\sqrt{2\pi }}\int_{-\infty}^{\infty} t^{2k-1}\times t\exp\left\{-\frac{1}{2}t^2 \right\}dt=\\ &\frac{\tau^k}{\sqrt{2\pi }}\left[t^{2k-1}\left(-\exp\left\{-\frac{1}{2}t^2 \right\}\right)\Big|_{-\infty}^\infty-\int_{-\infty}^{\infty} (2k-1)t^{2k-2}\left(-\exp\left\{-\frac{1}{2}t^2 \right\}\right)dt\right] = \\ &\frac{\tau^k}{\sqrt{2\pi }}(2k-1)\int_{-\infty}^{\infty} t^{2k-2}\exp\left\{-\frac{1}{2}t^2 \right\}dt = \\ &\tau^k(2k-1)\frac{E[X^{2k-2}]}{\tau^{k-1}}=\tau(2k-1)E[X^{2k-2}] \end{aligned} \]

using this recursion formula we find

\[ \begin{aligned} &\mu=E[X]=0\\ &E[X^2] =\tau(2\times1-1)E[X^0]=\tau \\ &E[X^4] =\tau(2\times2-1)E[X^2]=3\tau^2 \\ &\\ &var(X) = E[(X-\mu)^2]=E[X^2]=\tau\\ &\kappa_4=E[(X-\mu)^4]=E[X^4]=3\tau^2\\ &\\ &\gamma_2=\frac{\kappa_4}{\kappa_2^2}-3=\frac{3\tau^2}{\tau^2}-3=0 \end{aligned} \]
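
A quick simulation check, say for \(\tau=2\):

tau=2
x=rnorm(1e6, 0, sqrt(tau))
# E[X^2] should be close to tau=2 and E[X^4] close to 3*tau^2=12
c(mean(x^2), mean(x^4))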

Theorem (1.7.9)

Say \(X\) is a non-negative rv, that is \(P(X \ge 0)=1\). Then

  • if \(X\) is a discrete rv taking values in \(\{0,1,2,...\}\) we have

\[E[X]=\sum_{x=1}^\infty P(X\ge x)=\sum_{x=1}^\infty\left[1-F(x-1)\right]\]

  • if \(X\) is a continuous rv we have

\[E[X]=\int_0^\infty P(X>x)dx=\int_0^\infty\left[1-F(x)\right]dx\]

proof

  • X discrete:

\[ \begin{aligned} &E[X] =\sum_i if(i) = \sum_i \left(\sum_{x=1}^i 1\right) f(i) =\\ &\sum_{x\le i} f(i) =\sum_{x=1}^\infty \left[\sum_{i=x}^\infty f(i)\right]=\sum_{x=1}^\infty P(X\ge x) \\ \end{aligned} \]

  • X continuous

\[ \begin{aligned} &E[X]=\int_0^\infty tf(t) dt = \int_0^\infty (\int_0^t 1dx)f(t) dt = \\ &\int_0^\infty \left(\int_x^\infty f(t)dt\right) dx = \int_0^\infty P(X>x) dx \end{aligned} \]

Example (1.7.10)

Say \(X \sim\) Geom(p). Then

\[ \begin{aligned} &F(x) = P(X\le x) =\sum_{k=1}^x p(1-p)^{k-1} = \\ &p\sum_{i=0}^{x-1} (1-p)^{i} = p \frac{1-(1-p)^x}{1-(1-p)}=1-(1-p)^x\\ & \\ &E[X]=\sum_{x=1}^\infty\left[1-F(x-1)\right]=\\ &\sum_{x=1}^\infty\left[1-(1-(1-p)^{x-1})\right]=\\ &\sum_{x=1}^\infty(1-p)^{x-1}=\\ &\sum_{k=0}^\infty(1-p)^{k}=\frac1{1-(1-p)}=\frac1{p} \end{aligned} \]
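
Here is a numerical check of the tail-sum formula, say for \(p=0.3\). R's pgeom counts the failures before the first success, so for our version of the geometric \(P(X\ge x)=1-\)pgeom\((x-2,p)\):

p=0.3
x=1:200
# sum of P(X>=x); should be close to 1/p = 3.333
sum(1-pgeom(x-2, p))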

Example (1.7.11)

Say \((X,Y)\) is a discrete rv with joint pdf \(f(x,y)=cp^x, x,y \in \{0,1,..\}\), \(y\le x\), and \(0<p<1\). Find c.

We already found c in the last chapter by summing first over y and then over x. We can use the above for an even simpler proof:

\[ \begin{aligned} &1 =\sum_{x,y} f(x,y) = \\ &\sum_{x=0}^\infty \sum_{y=0}^x cp^x = \\ &\sum_{x=0}^\infty cp^x \left(\sum_{y=0}^x 1 \right) = \\ &c\sum_{x=0}^\infty(x+1) p^x =\\ &c\sum_{x=1}^\infty x p^{x-1} =\\ &\frac{c}{1-p}\sum_{x=1}^\infty x(1-p) p^{x-1} =\\ &\frac{c}{1-p}E[G]=\frac{c}{(1-p)^2} \end{aligned} \]

where \(G\) is a geometric rv with success probability \(1-p\), and so \(c=(1-p)^2\).
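
A quick numerical check that \(c=(1-p)^2\) does make \(f\) sum to 1, say for \(p=0.4\):

p=0.4
x=0:200
# summing f(x,y) over y=0..x gives (x+1)*(1-p)^2*p^x; total should be (very nearly) 1
sum((x+1)*(1-p)^2*p^x)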

Example (1.7.12)

Say \(X \sim\) U[0,1]. Find the kurtosis.

We saw before that \(E[X^k]=\frac1{k+1}\), so

\[ \begin{aligned} &\mu = E[X] =\frac12\\ &E[X^2] =\frac13 \\ &E[X^3] =\frac14 \\ &E[X^4] =\frac15 \\ &\\ &\kappa_4 = E\left[(X-\mu)^4\right] = \\ &E[X^4]-4\mu E[X^3]+6\mu^2 E[X^2]-4\mu^3 E[X]+\mu^4 =\\ &\frac15-4\frac12\frac14+6\left(\frac12\right)^2\frac13-4\left(\frac12\right)^3\frac12+\left(\frac12\right)^4=\frac1{80}\\ &\\ &\gamma_2 = \frac{\kappa_4}{\kappa_2^2}-3= \frac{1/80}{(1/12)^2}-3=-1.2 \end{aligned} \]

so a \(U[0,1]\) is platykurtic.
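
A quick simulation check:

x=runif(1e6)
# sample excess kurtosis; should be close to -1.2
mean((x-mean(x))^4)/var(x)^2-3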

Example (1.7.13)

Say \(X\) is a rv with \(f(x)=c/(1+x^2)\), \(x\in\mathbb{R}\). (\(X\) is called a Cauchy random variable). Find c and show that \(E[X]\) does not exist.

\[ \begin{aligned} & \int_{-\infty}^{\infty} f(x) dx = \int_{-\infty}^{\infty} c/(1+x^2)dx = \\ &c \lim_{t\rightarrow \infty} \int_{-t}^ t 1/(1+x^2) dx = \\ &c \lim_{t\rightarrow \infty} \left\{\arctan(t)-\arctan(-t) \right\} = \\ &c \left\{ \frac{\pi}{2}-(-\frac{\pi}{2})\right\}=c\pi \end{aligned} \]

so \(c=1/\pi\). Moreover

\[ \begin{aligned} &E[|X|] = \int_{-\infty}^{\infty} |x|f(x) dx = \\ &2\lim_{t\rightarrow \infty}\int_{0}^{t} \frac{x}{\pi(1+x^2)}dx = \\ &\frac1{\pi}\lim_{t\rightarrow \infty}\log(1+t^2)=\infty \end{aligned} \]

What does it mean that \(E[X]\) does not exist? Essentially \(f(x)\rightarrow 0\) fast enough that \(\int f(x) dx <\infty\), so \(f\) is a density, but so slowly that \(\int |x|f(x) dx =\infty\). We say that this distribution has heavy tails.

If a real-life experiment is described by a Cauchy rv, then most of the time the data are numbers between about -5 and 5, but occasionally one gets a number far away, maybe 100000. The mean of such a data set can be literally anything. There are many real-life situations where something like this happens.
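
A small simulation makes the point: here are five sample means, each based on 10000 Cauchy observations, and they do not settle down to any common value:

# five sample means of n=1e4 Cauchy observations each
replicate(5, mean(rcauchy(1e4)))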

Example (1.7.14)

Find the mean and the standard deviation of an exponential rv with rate \(\lambda\).

So \(X\) has density \(f(x)=\lambda e^{-\lambda x};x>0\), and so

\[ \begin{aligned} &\mu=E[X] = \int_{0}^{\infty} x\times\lambda e^{-\lambda x} dx=\\ &-xe^{-\lambda x}\vert_0^\infty - \int_{0}^{\infty} - e^{-\lambda x} dx = \\ &\int_{0}^{\infty} e^{-\lambda x} dx = \frac1{\lambda}\int_{0}^{\infty} \lambda e^{-\lambda x} dx=\frac1{\lambda}\\ \end{aligned} \]

\[ \begin{aligned} &E[X^2] = \int_{0}^{\infty} x^2\times\lambda e^{-\lambda x} dx=\\ &-x^2e^{-\lambda x}\vert_0^\infty - \int_{0}^{\infty} - 2xe^{-\lambda x} dx = \\ &\frac2{\lambda}\int_{0}^{\infty} x\lambda e^{-\lambda x} dx = \\ &\frac2{\lambda}\frac1{\lambda} =\frac2{\lambda^2}\\ &var(X) = \frac2{\lambda^2}-\left(\frac1{\lambda}\right)^2=\frac1{\lambda^2}\\ &sd(X)=\sqrt{var(X)} = \frac1{\lambda} \end{aligned} \]

X is a non-negative random variable, and so we could also have used theorem 1.7.9 to find the mean:

\[ \begin{aligned} &P(X>x) = 1-P(X<x)=1-\int_0^x \lambda e^{-\lambda t}dt=\\ &1-\left( -e^{-\lambda t}\Big|_0^x \right)= e^{-\lambda x}\\ &E[X] =\int_0^\infty P(X>x)dx = \\ &\int_0^\infty e^{-\lambda x} dx = -\frac1{\lambda}e^{-\lambda x}\Big|_0^\infty=\frac1{\lambda} \end{aligned} \]

Note The function \(S(x)=P(X>x)\) is often called the survival function.
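
A quick simulation check, say with \(\lambda=0.5\):

x=rexp(1e6, 0.5)
# mean and standard deviation should both be close to 1/lambda = 2
c(mean(x), sd(x))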

Example (1.7.15)

Let \(X\) be a rv with pdf \(f(x)=(a+1)x^a\), \(0<x<1, a>0\). For what values of a is \(X\) mesokurtic, platykurtic or leptokurtic?

\[ \begin{aligned} &E[X^k] = \int_{0}^{1} x^k(a+1)x^adx = \\ &\frac{a+1}{a+k+1}x^{a+k+1}\Big|_0^1 = \frac{a+1}{a+k+1}\\ &\kappa_2 = var(X) = E[X^2]-(E[X])^2=\frac{a+1}{a+3}-\left(\frac{a+1}{a+2}\right)^2\\ &\kappa_4 = E[X^4]-4\mu E[X^3]+6\mu^2 E[X^2]-4\mu^3 E[X]+\mu^4 =\\ &\frac{a+1}{a+5}-4\frac{a+1}{a+2}\frac{a+1}{a+4}+6\left(\frac{a+1}{a+2}\right)^2\frac{a+1}{a+3}-3\left(\frac{a+1}{a+2}\right)^4 \\ &\\ &\gamma_2 = \frac{\kappa_4}{\kappa_2^2}-3 \end{aligned} \]

where the last two terms \(-4\mu^3E[X]+\mu^4=-3\mu^4\) were combined because \(E[X]=\mu\).

This is a rather complicated function of a, so it is best to use a computer to do a graph:

f <- function(a) {
  muk <- function(a, k) (a+1)/(a+k+1)
  mu <- muk(a, 1)
  sig2 <- muk(a, 2) - mu^2
  mu4 <-   muk(a, 4) - 
         4*muk(a, 3)*mu +
         6*muk(a, 2)*mu^2 - 
         4*muk(a, 1)*mu^3 + mu^4
  mu4/sig2^2-3  
}
a <- seq(0, 10, length=1000)
y <- f(a)
ggplot(data=data.frame(a=a, y=y), aes(a, y)) +
  geom_line(color="blue", size=1.2)

max(a[y<0])
## [1] 1.851852

therefore \(X\) is platykurtic for \(a<1.85\), mesokurtic for \(a\approx 1.85\) and leptokurtic for \(a>1.85\).


One way to “link” probabilities and expectations is via the indicator function \(I_A(x)\), defined as

\[I_A(x)=\left\{\begin{array}{ll}1&\text{ if }x\in A\\0&\text{ if }x\not\in A\end{array}\right.\] because with this we have, for a (continuous) r.v. \(X\) with density \(f\):

\[E[I_A(X)]= \int_{-\infty}^{\infty} I_A(x)f(x)dx = \int_A f(x)dx=P(X\in A)\]
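
For example, if \(X\sim\) Exp(1) and \(A=(1,2)\), then \(P(X\in A)=e^{-1}-e^{-2}\), and a quick simulation shows that the expectation of the indicator agrees:

x=rexp(1e6, 1)
# proportion of observations in A=(1,2) estimates E[I_A(X)]=P(1<X<2)
c(mean(x>1 & x<2), exp(-1)-exp(-2))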

Theorem (1.7.16)

Say we have a non-negative rv \(X\), that is \(P(X \ge 0)=1\). Then \(P(X=0)=1\) iff \(E[X]=0\).

proof

Say \(P(X=0)=1\); then \(X\) is a discrete rv with pdf \(f(0)=1\), and so \(E[X]=0\times 1=0\).

Now say \(E[X]=0\). Assume \(P(X=0)<1\), therefore

\[P(X>0) = 1-P(X=0) > 1-1 = 0\] so there exist \(\delta >0\) and \(\epsilon >0\) such that \(P(X> \delta )> \epsilon\) (because \(P(X>0)=\lim_{n\rightarrow\infty}P(X>1/n)\), at least one of the probabilities \(P(X>1/n)\) must be positive). Then

  • \(X\) discrete

\[ \begin{aligned} &E[X] =\sum_x xP(X=x) \ge \sum_{x>\delta} xP(X=x)\ge\\ &\sum_{x>\delta} \delta P(X=x) = \delta\sum_{x>\delta} P(X=x)=\\ &\delta P(X>\delta) =\delta\epsilon>0 \\ \end{aligned} \]

  • \(X\) continuous

\[ \begin{aligned} &E[X] =\int_0^\infty xf(x)dx \ge \int_{\delta}^\infty xf(x)dx\ge\\ &\int_{\delta}^\infty \delta f(x)dx = \delta\int_{\delta}^\infty f(x)dx=\\ &\delta P(X>\delta) =\delta\epsilon>0 \\ \end{aligned} \]

In either case we have a contradiction with \(E[X]=0\).

Expectations of Random Vectors

The definition of expectation easily generalizes to random vectors:

Example (1.7.17)

Say \((X,Y)\) is a discrete random vector with joint pdf given by the following table, with the values of \(X\) in the rows and those of \(Y\) in the columns:

x\y   1     2
0     0.1   0.1
1     0.0   0.5
2     0.1   0.2

Find \(E[XY]\), \(E[X^Y]\) and \(E[X/Y^2]\).

Let’s use R for this:

#E[XY]
0*1*0.1 + 0*2*0.1 +1*1*0 + 1*2*0.5 + 2*1*0.1 + 2*2*0.2
## [1] 2
#E[X^Y]
0^1*0.1 + 0^2*0.1 +1^1*0 + 1^2*0.5 + 2^1*0.1 + 2^2*0.2
## [1] 1.5
#E[X/Y^2]
0*1/1^2*0.1 + 0*1/2^2*0.1 + 1*1/1^2*0 + 1*1/2^2*0.5 + 2*1/1^2*0.1 + 2*1/2^2*0.2
## [1] 0.425
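
The same calculation a little more compactly, with the support and the probabilities stored in vectors:

x=c(0, 0, 1, 1, 2, 2)
y=c(1, 2, 1, 2, 1, 2)
p=c(0.1, 0.1, 0, 0.5, 0.1, 0.2)
# E[XY], E[X^Y] and E[X/Y^2]
c(sum(x*y*p), sum(x^y*p), sum(x/y^2*p))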

Example (1.7.18)

Let \((X,Y)\) be a discrete random vector with \(f(x,y) = (1/2)^{x+y}\), \(x \ge 1, y \ge 1\). Find \(E[XY]\)

\[ \begin{aligned} &E[XY] =\sum_{x,y} xy(1/2)^{x+y} =\\ &\sum_{x,y} [x(1/2)^{x}][y(1/2)^{y}] = \\ & \left\{ \sum_{x=1}^\infty x(1/2)^{x}\right\}^2 = \\ & \left\{ \sum_{x=1}^\infty x(1-1/2)(1/2)^{x-1}\right\}^2 = (\frac1{1/2})^2=4\\ \end{aligned} \]

because the sum is the mean of a geometric rv with \(p=1/2\).
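
Notice that \(f(x,y)\) factors into a function of \(x\) times a function of \(y\), so \(X\) and \(Y\) are in fact independent Geom(1/2) rvs, and we can also check the result by simulation (again shifting R's rgeom by 1):

x=rgeom(1e5, 0.5)+1
y=rgeom(1e5, 0.5)+1
# E[XY] should be close to 4
mean(x*y)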

Example (1.7.19)

Say \((X,Y)\) is a continuous rv with \(f(x,y)=cx(x+y)\) if \(0<x,y<1\). Find \(E[X^2Y]\).

Now we could first find the constant c, and then \(E[X^2Y]\), or:

\[ \begin{aligned} &E[X^kY^j]=\int_{-\infty}^\infty \int_{-\infty}^\infty x^ky^jf(x,y) dxdy = \\ &c\int_0^1 \left(\int_0^1 x^ky^jx(x+y)dx \right) dy = \\ &c\int_0^1 \left(\int_0^1 x^{k+2}y^j+x^{k+1}y^{j+1}dx \right) dy = \\ &c\int_0^1 \left(\frac1{k+3} x^{k+3}y^j+\frac1{k+2} x^{k+2}y^{j+1} \right|_0^1 dy = \\ &c\int_0^1 \frac1{k+3} y^j+\frac1{k+2} y^{j+1} dy = \\ &c\left(\frac1{(k+3)(j+1)} y^{j+1}+\frac1{(k+2){(j+2)}} y^{j+2}|_0^1 \right)=\\ &c\left(\frac1{(k+3)(j+1)} +\frac1{(k+2){(j+2)}} \right)\\ \end{aligned} \]

Now we have

\[E[X^0Y^0]=\int \int x^0y^0 f(x,y)dxdy = \int \int f(x,y)dxdy=1\]

so

\[ \begin{aligned} &c = 1/\left(\frac1{(0+3)(0+1)} +\frac1{(0+2){(0+2)}} \right) = \\ &1/\left( \frac13 +\frac14 \right) = 12/7\\ &E[X^2Y] = \frac{12}7\left(\frac1{(2+3)(1+1)} +\frac1{(2+2){(1+2)}} \right) = \\ &\frac{12}7\left(\frac1{10} +\frac1{12} \right) = \frac{12}7\cdot\frac{11}{60} =\frac{11}{35}=0.3143\\ \end{aligned} \]
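
A quick Monte Carlo check: since the support is the unit square, averaging \(x^2y\,f(x,y)\) over uniformly chosen points estimates the integral:

u=runif(1e6); v=runif(1e6)
# estimate of the double integral of x^2*y*f(x,y); should be close to 11/35 = 0.3143
mean(u^2*v*(12/7)*u*(u+v))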

Example (1.7.20)

Say \((X,Y)\) is a continuous rv with \(f(x,y)=c\) if \(0<y<x^a<1\) for some \(a>0\). Find \(E[XY]\).

Notice that the definition of \(f\) does not involve \(x\) or \(y\), so what we have here is a uniform rv on the region described by \(0<y<x^a<1\), sketched below for \(a=1/2\):
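
Assuming ggplot2 is loaded, as for the earlier graph, the region can be drawn with a few lines:

a=1/2
x=seq(0, 1, length=250)
ggplot(data.frame(x=x, ymax=x^a), aes(x)) +
  geom_ribbon(aes(ymin=0, ymax=ymax), fill="lightblue") +  # the region 0<y<x^a
  geom_line(aes(y=ymax), color="blue", size=1.2)           # the curve y=x^a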

\[ \begin{aligned} &E[X^kY^k] = \int_{0}^{1}\int_0^{x^a} x^ky^k c\,dydx = \\ &c\int_0^1 x^k \left(\frac1{k+1}y^{k+1}\Big|_0^{x^a}\right)dx = \\ &c\int_0^1 x^k \frac1{k+1}x^{a(k+1)}dx = \\ &\frac{c}{k+1}\int_0^1 x^{ak+a+k}dx = \\ &\frac{c}{(k+1)(ak+a+k+1)}\\ &\\ &1=E[X^0Y^0]=\frac{c}{a+1}\\ &E[XY]=\frac{a+1}{2(2a+2)}=\frac14 \end{aligned} \]
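
A Monte Carlo check that the answer is \(1/4\) regardless of \(a\) (here \(a=1/2\)): generate uniform points on the unit square and keep only those that land in the region:

a=1/2
x=runif(1e6); y=runif(1e6)
keep = y < x^a
# given the region, (X,Y) is uniform on it; E[XY] should be close to 0.25
mean(x[keep]*y[keep])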