Say we have observations \(X_1,...,X_n\sim N(\mu_x, \sigma_x)\) and \(Y_1,...,Y_m\sim N(\mu_y, \sigma_y)\), all independent, and \(\sigma_x,\sigma_y\) known.
Note that \(\bar{X}\sim N(\mu_x, \sigma_x/\sqrt{n})\) and \(\bar{Y}\sim N(\mu_y, \sigma_y/\sqrt{m})\), therefore
\[\bar{X}-\bar{Y}\sim N\left(\mu_x-\mu_y, \sqrt{\sigma_x^2/n+\sigma_y^2/m}\right)\] So a \((1-\alpha)100\%\) confidence interval is given by
\[\bar{X}-\bar{Y}\pm z_{\alpha/2}\sqrt{\sigma_x^2/n+\sigma_y^2/m}\]
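For example, with \(\alpha=0.10\) we can compute the interval for a single pair of simulated samples (a minimal sketch; the seed and parameter values are arbitrary):
set.seed(111)
x=rnorm(10, 0, 2)  # n=10, mu_x=0, sigma_x=2
y=rnorm(7, 1, 3)   # m=7, mu_y=1, sigma_y=3
# 90% confidence interval for mu_x - mu_y
mean(x)-mean(y)+c(-1, 1)*qnorm(0.95)*sqrt(2^2/10+3^2/7)
Next we check the coverage of this interval by simulation: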
B=1e5  # number of simulation runs
mux=0; muy=1
sigmax=2; sigmay=3
n=10; m=7
crit=qnorm(0.95)  # critical value z_{alpha/2} for a 90% interval
A=matrix(0, B, 4)
for(i in 1:B) {
  x=rnorm(n, mux, sigmax)
  y=rnorm(m, muy, sigmay)
  A[i, 1]=mean(x)-mean(y)  # point estimate
  A[i, 2]=var(x)/n+var(y)/m  # estimated variance (not needed here, variances are known)
  A[i, 3]=A[i, 1]-crit*sqrt(sigmax^2/n+sigmay^2/m)  # lower limit
  A[i, 4]=A[i, 1]+crit*sqrt(sigmax^2/n+sigmay^2/m)  # upper limit
}
# estimated coverage; should be close to the nominal 0.90
sum(A[, 3]<mux-muy & mux-muy<A[,4])/B
## [1] 0.89947
Note that with crit=qnorm(0.95) this is a 90% confidence interval, and the estimated coverage is indeed very close to the nominal 0.90. Next, say the total sample size \(k=n+m\) is fixed. We should then choose \(n\) and \(m\) such that the length of the interval is shortest. Writing \(E(n)\) for the half-length of the interval and substituting \(m=k-n\), we find
\[ \begin{aligned} &E(n) = z_{\alpha/2}\sqrt{\sigma_x^2/n+\sigma_y^2/m}=z_{\alpha/2}\sqrt{\sigma_x^2/n+\sigma_y^2/(k-n)}\\ &\frac{dE(n)}{dn} = \frac{z_{\alpha/2}\left(-\sigma_x^2/n^2+\sigma_y^2/(k-n)^2\right)}{2\sqrt{\sigma_x^2/n+\sigma_y^2/(k-n)}}=0\\ & -\sigma_x^2/n^2+\sigma_y^2/(k-n)^2 = 0\\ &\sigma_x^2(k-n)^2=\sigma_y^2n^2 \\ &\left(\frac{k-n}{n}\right)^2=(k/n-1)^2=\frac{\sigma_y^2}{\sigma_x^2} \\ &k/n=1+\frac{\sigma_y}{\sigma_x}=\frac{\sigma_x+\sigma_y}{\sigma_x}\quad\text{(taking the positive root, since } 0<n<k\text{)} \\ &n=\frac{k\sigma_x}{\sigma_x+\sigma_y} \end{aligned} \]
k=17  # total sample size n+m
n=round(k*sigmax/(sigmax+sigmay))  # optimal allocation
m=k-n
c(n,m)
## [1] 7 10
# variance of the difference as a function of n; the minimum is at the optimal n
curve(sigmax^2/x+sigmay^2/(k-x), 4, 10)
abline(v=n)
Below is data from a distribution with density
\[f(x\vert \alpha)=\alpha+(1-\alpha)\frac1{\sqrt{2\pi0.1^2}}\exp \left\{-\frac1{2\times0.1^2}(x-0.5)^2 \right\}\]
that is, a mixture of a U[0,1] and a N(0.5, 0.1) random variable with weights \(\alpha\) and \(1-\alpha\). Find a 95% Bayesian credible interval for \(\alpha\) if the prior is \(\alpha\sim U[0,1]\).
[1] 0.00 0.03 0.05 0.06 0.09 0.10 0.15 0.15 0.18 0.19 0.20 0.20 0.21 0.27 0.28
[16] 0.29 0.30 0.30 0.30 0.31 0.33 0.34 0.34 0.34 0.35 0.35 0.35 0.36 0.36 0.36
[31] 0.37 0.38 0.38 0.39 0.39 0.41 0.41 0.41 0.43 0.43 0.43 0.44 0.44 0.45 0.46
[46] 0.46 0.46 0.46 0.47 0.47 0.47 0.47 0.48 0.49 0.50 0.50 0.50 0.50 0.50 0.51
[61] 0.52 0.52 0.53 0.53 0.53 0.53 0.55 0.55 0.56 0.57 0.57 0.57 0.57 0.59 0.60
[76] 0.61 0.61 0.63 0.64 0.65 0.68 0.68 0.68 0.68 0.70 0.71 0.72 0.73 0.77 0.78
[91] 0.78 0.80 0.81 0.82 0.87 0.93 0.94 0.98 0.99 0.99
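To get a feel for the model, we can plot the density for a few values of \(\alpha\) (a quick sketch; the values 0.2, 0.5 and 0.8 are arbitrary):
# mixture density f(x|alpha) for alpha = 0.2, 0.5 and 0.8
curve(0.2+0.8*dnorm(x, 0.5, 0.1), 0, 1, ylab="f(x|a)")
curve(0.5+0.5*dnorm(x, 0.5, 0.1), 0, 1, add=TRUE, lty=2)
curve(0.8+0.2*dnorm(x, 0.5, 0.1), 0, 1, add=TRUE, lty=3)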
Because the prior is \(\pi(\alpha)=1\) on \([0,1]\), the joint density is \[ \begin{aligned} &f(\pmb{x},\alpha) =\prod_{i=1}^n \left[\alpha+(1-\alpha)\frac1{\sqrt{2\pi0.1^2}}\exp \left\{-\frac1{2\times0.1^2}(x_i-0.5)^2 \right\}\right]\\ &m(\pmb{x})=\int_0^1 \prod_{i=1}^n \left[\alpha+(1-\alpha)\frac1{\sqrt{2\pi0.1^2}}\exp \left\{-\frac1{2\times0.1^2}(x_i-0.5)^2 \right\}\right] d\alpha \end{aligned} \] Finding this integral analytically is impossible, so we proceed numerically:
m=1  # start with normalizing constant 1, so post is the unnormalized posterior
post=function(a) {
  y=a
  for(i in seq_along(a))
    y[i]=prod(a[i]+(1-a[i])*dnorm(x, 0.5, 0.1))  # joint density f(x, a[i])
  y/m
}
m=integrate(post, 0, 1)$value  # marginal m(x); post is now the normalized posterior
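As a quick sanity check, the normalized posterior should now integrate to one:
# should return a value very close to 1
integrate(post, 0, 1)$value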
a=seq(0, 1, length=1000)
y=post(a)  # posterior density on a grid
plot(a, y, type="l")
z=cumsum(y)*(a[2]-a[1])  # Riemann-sum approximation of the posterior cdf
plot(a, z, type="l")
k=which.min(abs(z-0.025))  # index of the 2.5th posterior percentile
l=which.min(abs(z-0.975))  # index of the 97.5th posterior percentile
abline(v=a[c(k,l)])
round(a[c(k,l)], 3)
## [1] 0.381 0.700
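As a check, the estimated posterior probability between the two endpoints should be close to 0.95:
# posterior probability of the credible interval
z[l]-z[k]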
In an experiment we observe the counts \(X\) of some event. However, a certain percentage of events fall below some threshold and cannot be observed. An auxiliary experiment is done to yield information on this efficiency. Overall, we have a model of the form
\[X\sim Pois(\epsilon\lambda),Z\sim Bin(n,\epsilon)\] where n is known, X and Z are independent. Specifically, consider the case \(X=67;n=1000;Z=780\)
Find the mles of \(\lambda\) and \(\epsilon\).
Find the p-value of the test \(H_0:\lambda=60\) vs \(H_a:\lambda>60\).
\[ \begin{aligned} &f(x,z\vert\lambda,\epsilon) =\frac{(\epsilon\lambda)^{x}}{x!}e^{-\epsilon\lambda}{n\choose z}\epsilon^z(1-\epsilon)^{n-z} \\ &l(\lambda,\epsilon\vert x,z) = x\log(\epsilon\lambda)- \log x!-\epsilon\lambda+ \log {n\choose z}+z\log \epsilon+(n-z)\log(1-\epsilon)\\ &\frac{dl}{d\lambda} = \frac{x}{\lambda}-\epsilon=0\\ &\frac{dl}{d\epsilon} = \frac{x}{\epsilon}-\lambda+\frac{z}{\epsilon}-\frac{n-z}{1-\epsilon}=0\\ &\lambda=\frac{x}{\epsilon}\\ &(x+z)(1-\epsilon)-\lambda\epsilon(1-\epsilon)-(n-z)\epsilon=0\\ &(x+z)(1-\epsilon)-x(1-\epsilon)-(n-z)\epsilon=0 \\ &z(1-\epsilon)-(n-z)\epsilon=0\\ &\hat{\epsilon}=\frac{z}{n}\\ &\hat{\lambda}=\frac{x}{\hat{\epsilon}}= \frac{nx}{z}\\ \end{aligned} \]
x=67; n=1000; z=780
ehat=z/n    # mle of epsilon
lhat=n*x/z  # mle of lambda
c(ehat, lhat)
## [1] 0.78000 85.89744
Consider the test \(H_0:\lambda=\lambda_0\) vs \(H_a:\lambda>\lambda_0\). Under the null hypothesis we have
\[ \begin{aligned} &l(\lambda_0,\epsilon\vert x,z) = x\log(\epsilon\lambda_0)- \log x!-\epsilon\lambda_0+ \log {n\choose z}+z\log \epsilon+(n-z)\log(1-\epsilon)\\ &\frac{dl}{d\epsilon} = \frac{x+z}{\epsilon}-\lambda_0-\frac{n-z}{1-\epsilon}=0\\ &(x+z)(1-\epsilon) -\lambda_0\epsilon(1-\epsilon) -(n-z)\epsilon=0 \\ &x+z-(n+x)\epsilon -\lambda_0\epsilon+\lambda_0\epsilon^2 =0 \\ &\hat{\hat{\epsilon}} =\left(n+x+\lambda_0\pm\sqrt{(n+x+\lambda_0)^2-4\lambda_0(x+z)}\right)/(2\lambda_0) \\ \end{aligned} \]
If the large sample approximation holds we have
\[\lambda(x,z)=2\left(l(\hat{\lambda},\hat{\epsilon}\vert x,z)-l(\lambda_0,\hat{\hat{\epsilon}}\vert x,z)\right)\sim \chi^2(1)\]
lambda0=60
# log-likelihood, dropping terms that do not involve the parameters
loglike=function(l, e) x*log(e*l)-e*l+z*log(e)+(n-z)*log(1-e)
# the two roots of the quadratic for the restricted mle of epsilon
epshathat=(n+x+lambda0+c(-1, 1)*sqrt((n+x+lambda0)^2-4*lambda0*(x+z)))/(2*lambda0)
epshathat
## [1] 0.7843014 17.9990319
\(\hat{\hat{\epsilon}}\) has to be in \([0,1]\), so we take the first solution:
epshathat=epshathat[1]
# likelihood ratio statistic and p-value from the chi-square(1) approximation
lrt=2*(loglike(lhat, ehat)-loglike(lambda0, epshathat))
c(lrt, 1-pchisq(lrt, 1))
## [1] 7.568547869 0.005939562
Of course we could also have done all of this numerically:
# negative log-likelihood, p=(lambda, epsilon)
loglike=function(p) -(x*log(p[1]*p[2])-p[1]*p[2]+z*log(p[2])+(n-z)*log(1-p[2]))
# negative log-likelihood under the null hypothesis, a function of epsilon alone
loglike1=function(e) -(x*log(e*lambda0)-e*lambda0+z*log(e)+(n-z)*log(1-e))
mle=optim(c(80, 0.78), loglike)$par  # unrestricted mle of (lambda, epsilon)
ehathat=nlminb(0.78, loglike1)$par   # restricted mle of epsilon
c(mle, ehathat)
## [1] 85.8966610 0.7800129 0.7843014
Say we have a single observation \(X\) which takes values in \(\{0,1,2\}\) with \(P(X=k) = \frac{p+k}{3(p+1)}\), where \(p\in[0,1]\).
We observe either \(x=0,1\) or \(2\). For the method of moments estimator we need to solve the equation \(E[X^k]=x^k\) for some \(k\); note that for a single observation the \(k\)th sample moment is just \(x^k\). Now
\[ \begin{aligned} &x^k=E[X^k] = 0^k\times \frac{p+0}{3(p+1)}+1^k\times \frac{p+1}{3(p+1)}+2^k\times \frac{p+2}{3(p+1)} = \\ &\frac{1}{3(p+1)}\left(p+1+2^k(p+2)\right) = \frac{(1+2^k)p+1+2^{k+1}}{3(p+1)}\\ \end{aligned} \] Let’s see what this looks like for some values of k
# E[X^k] as a function of p, for k=1,...,4
par(mfrow=c(2, 2))
for(k in 1:4)
  curve(((1+2^k)*x+1+2^(k+1))/3/(x+1), 0, 1, ylab="", xlab="p", main=paste0("k=",k))
So none of these curves ever reaches a possible value of \(x^k\) (that is, 0, 1 or \(2^k\)), and therefore the method of moments estimator does not exist.
Next the maximum likelihood estimator. The likelihood of a single observation \(k\) is
\[L(p\vert k) = \frac{p+k}{3(p+1)}\]
# likelihood as a function of p for each possible observation k
par(mfrow=c(2, 2))
for(k in 0:2)
  curve((x+k)/3/(x+1), 0, 1, ylab="", xlab="p")
so we find
\[ \hat{p}=\left\{ \begin{matrix} 1 & \text{ if } & k=0 \\ 0 & \text{ if } & k=2 \\ \end{matrix} \right. \] and any number in [0,1] if k=1.
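A quick numerical check of this (a sketch; note that for \(k=1\) the likelihood is constant in \(p\), so the value returned by optimize is arbitrary):
# maximize L(p|k) over [0,1] for k=0,1,2
sapply(0:2, function(k)
  optimize(function(p) (p+k)/(3*(p+1)), c(0, 1), maximum=TRUE)$maximum)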
Finally we find a Bayesian estimator of \(p\), using the prior \(p\sim U[0,1]\). Do all the work analytically!
\[ \begin{aligned} &\sum_{k=0}^{2} f(k\vert p) = c\left( p+0+p+1+p+2\right) = c(3p+3) = 1\\ &c = 1/[3(p+1)]\\ &f(k,p) = (p+k)/[3(p+1)]\\ &m(k) = \frac13\int_0^1 \frac{p+k}{p+1}dp = \\ &\frac13\int_0^1 \frac{p}{p+1}dp +\frac{k}3\int_0^1 \frac{1}{p+1}dp = \\ &\frac13\left(p-\log (p+1)\vert_0^1\right)+\frac{k}3\left(\log (p+1)\vert_0^1\right)=\\ &\frac13\left(1-\log (2) \right)+\frac{k}3\log (2)=\\ &\frac13+\frac{k-1}3\log (2) \end{aligned} \] and so the posterior distribution is given by
\[f(p\vert k) = \frac{(p+k)/[3(p+1)]}{\frac13+\frac{k-1}3\log (2)}= \frac{p+k}{(p+1)(1+(k-1)\log 2)}\] The mean of the posterior is given by
\[ \begin{aligned} &E[p\vert k] = \frac1{1+(k-1)\log 2}\int_0^1 p\frac{p+k}{p+1}dp\\ &\int_0^1 p\frac{p+k}{p+1}dp = \int_0^1 \frac{p^2}{p+1}dp+k\int_0^1 \frac{p}{p+1}dp =\\ &\left(\frac12(p+1)^2-2(p+1)+\log(p+1)\right)\Big\vert_0^1 + k\left(p-\log (p+1)\right)\Big\vert_0^1 = \\ &\left(\frac12\cdot2^2-2\cdot2+\log 2\right) - \left(\frac12-2\right) + k\left(1-\log 2\right) = \\ &k+(1-k)\log 2-\frac12 \\ &\text{and so}\quad E[p\vert k] = \frac{k+(1-k)\log 2-\frac12}{1+(k-1)\log 2} \end{aligned} \]
# posterior means for k=0, 1, 2
k=0:2
round((k+(1-k)*log(2)-1/2)/(1+(k-1)*log(2)), 3)
## [1] 0.629 0.500 0.477
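As a check we can also compute the posterior means numerically (a minimal sketch using integrate):
# numerical check of the posterior means
post.mean=function(k) {
  joint=function(p) (p+k)/(3*(p+1))  # f(k, p) with the U[0,1] prior
  mk=integrate(joint, 0, 1)$value    # marginal m(k)
  integrate(function(p) p*joint(p)/mk, 0, 1)$value
}
round(sapply(0:2, post.mean), 3)
which reproduces the values above.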
Below we have observations from a random variable \(X\) such that \(\frac{X-\mu}\sigma\sim t(3)\).
2.54 7.4 7.82 10.69 15.39 18.11 19.75 21.38 22.76 22.82 23.07 23.22 23.25 23.34 23.76 23.79 23.83 23.85 23.97 23.97 24.2 24.33 24.33 24.49 24.6 24.72 24.81 24.99 25.09 25.13 25.27 25.29 25.55 25.57 25.59 25.66 25.82 25.93 26.01 26.06 26.09 26.29 26.32 26.45 26.51 26.57 26.67 26.76 26.78 26.8 26.91 26.94 26.96 26.97 27 27.07 27.17 27.26 27.36 27.39 27.43 27.5 27.53 27.59 27.6 27.66 27.66 27.68 27.7 27.73 27.74 27.78 27.79 27.81 27.84 27.89 27.9 28.08 28.12 28.14 28.2 28.28 28.39 28.39 28.4 28.4 28.47 28.55 28.55 28.56 28.57 28.62 28.62 28.63 28.66 28.66 28.9 28.93 28.95 28.95 28.96 28.97 29.01 29.01 29.11 29.12 29.25 29.32 29.42 29.46 29.51 29.52 29.62 29.63 29.66 29.69 29.71 29.81 29.83 29.83 29.87 29.89 29.91 29.93 29.95 29.96 30.06 30.07 30.07 30.09 30.09 30.2 30.23 30.23 30.25 30.27 30.32 30.32 30.33 30.44 30.46 30.47 30.5 30.59 30.62 30.67 30.77 30.8 30.8 30.85 30.87 30.9 30.9 30.95 30.99 31.05 31.07 31.13 31.24 31.25 31.25 31.33 31.34 31.45 31.57 31.62 31.65 31.67 31.68 31.73 31.78 31.79 31.79 31.92 31.94 31.95 32 32.05 32.05 32.15 32.16 32.21 32.23 32.24 32.25 32.3 32.43 32.46 32.51 32.76 32.82 32.86 32.9 32.9 32.97 33.15 33.17 33.23 33.29 33.38 33.38 33.39 33.42 33.43 33.45 33.52 33.55 33.59 33.65 33.67 33.67 33.76 34.09 34.12 34.29 34.52 34.68 34.99 34.99 35.16 35.2 35.24 35.29 35.31 35.46 35.59 35.63 35.68 35.73 35.85 35.99 36.1 36.58 36.63 36.85 36.95 36.96 37.04 38.13 38.23 38.61 38.67 39.22 40.84 42.48 42.53 48.55 49.31 50 60.64
Say \(Y\sim t(n)\), then
\[f_y(y)=\frac{\Gamma(\frac{n+1}2)}{\Gamma(\frac{n}2)}\frac1{\sqrt{\pi n}}\frac1{(1+y^2/n)^{(n+1)/2}}\]
so
\[ \begin{aligned} &F_X(x) = P(X\le x) =\\ &P(\frac{X-\mu}\sigma\le \frac{x-\mu}\sigma) = F_y(\frac{x-\mu}\sigma)\\ &f_X(x) = f_y(\frac{x-\mu}\sigma)\frac1\sigma = \\ &\frac{\Gamma(\frac{n+1}2)}{\Gamma(\frac{n}2)}\frac1{\sqrt{\pi n}}\frac1{(1+(\frac{x-\mu}\sigma)^2/n)^{(n+1)/2}}\frac1\sigma = \\ &\frac{\Gamma(\frac{n+1}2)}{\Gamma(\frac{n}2)}\frac1{\sqrt{\pi n}}\frac{(n\sigma^2)^{(n+1)/2}}{(n\sigma^2+(x-\mu)^2)^{(n+1)/2}}\frac1\sigma = \\ &\frac{\Gamma(\frac{n+1}2)}{\Gamma(\frac{n}2)}\frac1{\sqrt{\pi }}\frac{n^{n/2}\sigma^{n}}{(n\sigma^2+(x-\mu)^2)^{(n+1)/2}} \end{aligned} \]
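A quick numerical spot check of this algebra (a sketch; the point and the parameter values are arbitrary):
# compare the derived closed form with dt() at one point
n=3; mu=0; sig=2; x0=1.5
c(gamma((n+1)/2)/gamma(n/2)/sqrt(pi)*n^(n/2)*sig^n/(n*sig^2+(x0-mu)^2)^((n+1)/2),
  dt((x0-mu)/sig, n)/sig)
The two numbers agree.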
The log-likelihood of a sample \(x_1,...,x_m\) is therefore \[l(\mu,\sigma) = K+mn\log(\sigma)-\frac{n+1}2\sum_{i=1}^m\log\left(n\sigma^2+(x_i-\mu)^2\right)\] where \(K\) collects all terms that do not depend on \(\mu\) and \(\sigma\).
# negative log-likelihood (constants dropped); par=(mu, sigma), n=3 degrees of freedom
loglike=function(par, n=3)
  -(length(x)*n*log(par[2])-(n+1)/2*sum(log(n*par[2]^2+(x-par[1])^2)))
mle=optim(c(mean(x), sd(x)), loglike)$par
# profile the negative log-likelihood in each parameter, holding the other at its mle
par(mfrow=c(1, 2))
a=seq(20, 40, length=100)
z=a
for(i in 1:100) z[i]=loglike(c(a[i], mle[2]))  # vary mu
plot(a, z, type="l")
a=seq(3, 4, length=100)
z=a
for(i in 1:100) z[i]=loglike(c(mle[1], a[i]))  # vary sigma
plot(a, z, type="l")
mle
## [1] 30.143637 3.255504
c(mean(x), sd(x))
## [1] 29.935500 5.506038
So we see that the mle of \(\sigma\) is much smaller than the sample standard deviation. This makes sense: if \(\frac{X-\mu}\sigma\sim t(3)\), then \(Var(X)=3\sigma^2\), so the standard deviation of \(X\) is \(\sigma\sqrt3\approx 3.26\times 1.73\approx 5.6\), in good agreement with the sample value.
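A quick check of this relationship:
# implied standard deviation of X under the fitted t model vs the sample sd
c(mle[2]*sqrt(3), sd(x))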
Let’s draw the histogram with the densities:
library(ggplot2)
xnew=seq(min(x), max(x), length=250)
y1=dt((xnew-mle[1])/mle[2], 3)/mle[2]  # fitted t density
y2=dnorm(xnew, mean(x), sd(x))         # fitted normal density
df1=data.frame(x=c(xnew, xnew),
               y=c(y1, y2),
               Model=rep(c("t", "Normal"), each=250))
df=data.frame(x=x)
bw=diff(range(x))/50  # histogram bin width
ggplot(df, aes(x)) +
  geom_histogram(aes(y = ..density..),
                 color = "black",
                 fill = "white",
                 binwidth = bw) +
  labs(x = "x", y = "Density") +
  geom_line(data=df1, aes(x, y, color=Model))
The t density follows the histogram much more closely than the normal, so the t model clearly is better.