Say we have observations \(X_1,...,X_n\sim N(\mu_x, \sigma_x)\) and \(Y_1,...,Y_m\sim N(\mu_y, \sigma_y)\), all independent, and \(\sigma_x,\sigma_y\) known.
Note that \(\bar{X}\sim N(\mu_x, \sigma_x/\sqrt{n})\) and \(\bar{Y}\sim N(\mu_y, \sigma_y/\sqrt{m})\), therefore
\[\bar{X}-\bar{Y}\sim N\left(\mu_x-\mu_y, \sqrt{\sigma_x^2/n+\sigma_y^2/m}\right)\] So a \((1-\alpha)100\%\) confidence interval is given by
\[\bar{X}-\bar{Y}\pm z_{\alpha/2}\sqrt{\sigma_x^2/n+\sigma_y^2/m}\]
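For example, with \(\alpha=0.10\) we can compute the interval for a single pair of simulated samples (a minimal sketch; the seed and parameter values are arbitrary):
set.seed(111)
x=rnorm(10, 0, 2)  # n=10, mu_x=0, sigma_x=2
y=rnorm(7, 1, 3)   # m=7, mu_y=1, sigma_y=3
# 90% confidence interval for mu_x - mu_y
mean(x)-mean(y)+c(-1, 1)*qnorm(0.95)*sqrt(2^2/10+3^2/7)
Next we check the coverage of this interval by simulation: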
B=1e5  # number of simulation runs
mux=0; muy=1
sigmax=2; sigmay=3
n=10; m=7
crit=qnorm(0.95)  # critical value z_{alpha/2} for a 90% interval
A=matrix(0, B, 4)
for(i in 1:B) {
  x=rnorm(n, mux, sigmax)
  y=rnorm(m, muy, sigmay)
  A[i, 1]=mean(x)-mean(y)  # point estimate
  A[i, 2]=var(x)/n+var(y)/m  # estimated variance (not needed here, variances are known)
  A[i, 3]=A[i, 1]-crit*sqrt(sigmax^2/n+sigmay^2/m)  # lower limit
  A[i, 4]=A[i, 1]+crit*sqrt(sigmax^2/n+sigmay^2/m)  # upper limit
}
# estimated coverage; should be close to the nominal 0.90
sum(A[, 3]<mux-muy & mux-muy<A[,4])/B
## [1] 0.89947
Note that with crit=qnorm(0.95) this is a 90% confidence interval, and the estimated coverage is indeed very close to the nominal 0.90. Next, say the total sample size \(k=n+m\) is fixed. We should then choose \(n\) and \(m\) such that the length of the interval is shortest. Writing \(E(n)\) for the half-length of the interval and substituting \(m=k-n\), we find
\[ \begin{aligned} &E(n) = z_{\alpha/2}\sqrt{\sigma_x^2/n+\sigma_y^2/m}=z_{\alpha/2}\sqrt{\sigma_x^2/n+\sigma_y^2/(k-n)}\\ &\frac{dE(n)}{dn} = \frac{z_{\alpha/2}\left(-\sigma_x^2/n^2+\sigma_y^2/(k-n)^2\right)}{2\sqrt{\sigma_x^2/n+\sigma_y^2/(k-n)}}=0\\ & -\sigma_x^2/n^2+\sigma_y^2/(k-n)^2 = 0\\ &\sigma_x^2(k-n)^2=\sigma_y^2n^2 \\ &\left(\frac{k-n}{n}\right)^2=(k/n-1)^2=\frac{\sigma_y^2}{\sigma_x^2} \\ &k/n=1+\frac{\sigma_y}{\sigma_x}=\frac{\sigma_x+\sigma_y}{\sigma_x}\quad\text{(taking the positive root, since } 0<n<k\text{)} \\ &n=\frac{k\sigma_x}{\sigma_x+\sigma_y} \end{aligned} \]
k=17  # total sample size n+m
n=round(k*sigmax/(sigmax+sigmay))  # optimal allocation
m=k-n
c(n,m)
## [1] 7 10
# variance of the difference as a function of n; the minimum is at the optimal n
curve(sigmax^2/x+sigmay^2/(k-x), 4, 10)
abline(v=n)
Below is data from a distribution with density
\[f(x\vert \alpha)=\alpha+(1-\alpha)\frac1{\sqrt{2\pi0.1^2}}\exp \left\{-\frac1{2\times0.1^2}(x-0.5)^2 \right\}\]
that is, a mixture of a U[0,1] and a N(0.5, 0.1) random variable with weights \(\alpha\) and \(1-\alpha\). Find a 95% Bayesian credible interval for \(\alpha\) if the prior is \(\alpha\sim U[0,1]\).
[1] 0.00 0.03 0.05 0.06 0.09 0.10 0.15 0.15 0.18 0.19 0.20 0.20 0.21 0.27 0.28
[16] 0.29 0.30 0.30 0.30 0.31 0.33 0.34 0.34 0.34 0.35 0.35 0.35 0.36 0.36 0.36
[31] 0.37 0.38 0.38 0.39 0.39 0.41 0.41 0.41 0.43 0.43 0.43 0.44 0.44 0.45 0.46
[46] 0.46 0.46 0.46 0.47 0.47 0.47 0.47 0.48 0.49 0.50 0.50 0.50 0.50 0.50 0.51
[61] 0.52 0.52 0.53 0.53 0.53 0.53 0.55 0.55 0.56 0.57 0.57 0.57 0.57 0.59 0.60
[76] 0.61 0.61 0.63 0.64 0.65 0.68 0.68 0.68 0.68 0.70 0.71 0.72 0.73 0.77 0.78
[91] 0.78 0.80 0.81 0.82 0.87 0.93 0.94 0.98 0.99 0.99
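To get a feel for the model, we can plot the density for a few values of \(\alpha\) (a quick sketch; the values 0.2, 0.5 and 0.8 are arbitrary):
# mixture density f(x|alpha) for alpha = 0.2, 0.5 and 0.8
curve(0.2+0.8*dnorm(x, 0.5, 0.1), 0, 1, ylab="f(x|a)")
curve(0.5+0.5*dnorm(x, 0.5, 0.1), 0, 1, add=TRUE, lty=2)
curve(0.8+0.2*dnorm(x, 0.5, 0.1), 0, 1, add=TRUE, lty=3)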
Because the prior is \(\pi(\alpha)=1\) on \([0,1]\), the joint density is \[ \begin{aligned} &f(\pmb{x},\alpha) =\prod_{i=1}^n \left[\alpha+(1-\alpha)\frac1{\sqrt{2\pi0.1^2}}\exp \left\{-\frac1{2\times0.1^2}(x_i-0.5)^2 \right\}\right]\\ &m(\pmb{x})=\int_0^1 \prod_{i=1}^n \left[\alpha+(1-\alpha)\frac1{\sqrt{2\pi0.1^2}}\exp \left\{-\frac1{2\times0.1^2}(x_i-0.5)^2 \right\}\right] d\alpha \end{aligned} \] Finding this integral analytically is impossible, so we proceed numerically:
m=1  # start with normalizing constant 1, so post is the unnormalized posterior
post=function(a) {
  y=a
  for(i in seq_along(a))
    y[i]=prod(a[i]+(1-a[i])*dnorm(x, 0.5, 0.1))  # joint density f(x, a[i])
  y/m
}
m=integrate(post, 0, 1)$value  # marginal m(x); post is now the normalized posterior
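As a quick sanity check, the normalized posterior should now integrate to one:
# should return a value very close to 1
integrate(post, 0, 1)$value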
a=seq(0, 1, length=1000)
y=post(a)  # posterior density on a grid
plot(a, y, type="l")
z=cumsum(y)*(a[2]-a[1])  # Riemann-sum approximation of the posterior cdf
plot(a, z, type="l")
k=which.min(abs(z-0.025))  # index of the 2.5th posterior percentile
l=which.min(abs(z-0.975))  # index of the 97.5th posterior percentile
abline(v=a[c(k,l)])
round(a[c(k,l)], 3)
## [1] 0.381 0.700
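As a check, the estimated posterior probability between the two endpoints should be close to 0.95:
# posterior probability of the credible interval
z[l]-z[k]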
In an experiment we observe the counts \(X\) of some event. However, a certain percentage of events fall below some threshold and cannot be observed. An auxiliary experiment is done to yield information on this efficiency. Overall, we have a model of the form
\[X\sim Pois(\epsilon\lambda),Z\sim Bin(n,\epsilon)\] where n is known, X and Z are independent. Specifically, consider the case \(X=67;n=1000;Z=780\)
Find the mles of \(\lambda\) and \(\epsilon\).
Find the p-value of the test \(H_0:\lambda=60\) vs \(H_a:\lambda>60\).
\[ \begin{aligned} &f(x,z\vert\lambda,\epsilon) =\frac{(\epsilon\lambda)^{x}}{x!}e^{-\epsilon\lambda}{n\choose z}\epsilon^z(1-\epsilon)^{n-z} \\ &l(\lambda,\epsilon\vert x,z) = x\log(\epsilon\lambda)- \log x!-\epsilon\lambda+ \log {n\choose z}+z\log \epsilon+(n-z)\log(1-\epsilon)\\ &\frac{dl}{d\lambda} = \frac{x}{\lambda}-\epsilon=0\\ &\frac{dl}{d\epsilon} = \frac{x}{\epsilon}-\lambda+\frac{z}{\epsilon}-\frac{n-z}{1-\epsilon}=0\\ &\lambda=\frac{x}{\epsilon}\\ &(x+z)(1-\epsilon)-\lambda\epsilon(1-\epsilon)-(n-z)\epsilon=0\\ &(x+z)(1-\epsilon)-x(1-\epsilon)-(n-z)\epsilon=0 \\ &z(1-\epsilon)-(n-z)\epsilon=0\\ &\hat{\epsilon}=\frac{z}{n}\\ &\hat{\lambda}=\frac{x}{\hat{\epsilon}}= \frac{nx}{z}\\ \end{aligned} \]
x=67; n=1000; z=780
ehat=z/n    # mle of epsilon
lhat=n*x/z  # mle of lambda
c(ehat, lhat)
## [1] 0.78000 85.89744
Consider the test \(H_0:\lambda=\lambda_0\) vs \(H_a:\lambda>\lambda_0\). Under the null hypothesis we have
\[ \begin{aligned} &l(\lambda_0,\epsilon\vert x,z) = x\log(\epsilon\lambda_0)- \log x!-\epsilon\lambda_0+ \log {n\choose z}+z\log \epsilon+(n-z)\log(1-\epsilon)\\ &\frac{dl}{d\epsilon} = \frac{x+z}{\epsilon}-\lambda_0-\frac{n-z}{1-\epsilon}=0\\ &(x+z)(1-\epsilon) -\lambda_0\epsilon(1-\epsilon) -(n-z)\epsilon=0 \\ &x+z-(n+x)\epsilon -\lambda_0\epsilon+\lambda_0\epsilon^2 =0 \\ &\hat{\hat{\epsilon}} =\left(n+x+\lambda_0\pm\sqrt{(n+x+\lambda_0)^2-4\lambda_0(x+z)}\right)/(2\lambda_0) \\ \end{aligned} \]
If the large sample approximation holds we have
\[\lambda(x,z)=2\left(l(\hat{\lambda},\hat{\epsilon}\vert x,z)-l(\lambda_0,\hat{\hat{\epsilon}}\vert x,z)\right)\sim \chi^2(1)\]
lambda0=60
# log-likelihood, dropping terms that do not involve the parameters
loglike=function(l, e) x*log(e*l)-e*l+z*log(e)+(n-z)*log(1-e)
# the two roots of the quadratic for the restricted mle of epsilon
epshathat=(n+x+lambda0+c(-1, 1)*sqrt((n+x+lambda0)^2-4*lambda0*(x+z)))/(2*lambda0)
epshathat
## [1] 0.7843014 17.9990319
\(\hat{\hat{\epsilon}}\) has to be in \([0,1]\), so we take the first solution:
epshathat=epshathat[1]
# likelihood ratio statistic and p-value from the chi-square(1) approximation
lrt=2*(loglike(lhat, ehat)-loglike(lambda0, epshathat))
c(lrt, 1-pchisq(lrt, 1))
## [1] 7.568547869 0.005939562
Of course we could also have done all of this numerically:
# negative log-likelihood, p=(lambda, epsilon)
loglike=function(p) -(x*log(p[1]*p[2])-p[1]*p[2]+z*log(p[2])+(n-z)*log(1-p[2]))
# negative log-likelihood under the null hypothesis, a function of epsilon alone
loglike1=function(e) -(x*log(e*lambda0)-e*lambda0+z*log(e)+(n-z)*log(1-e))
mle=optim(c(80, 0.78), loglike)$par  # unrestricted mle of (lambda, epsilon)
ehathat=nlminb(0.78, loglike1)$par   # restricted mle of epsilon
c(mle, ehathat)
## [1] 85.8966610 0.7800129 0.7843014
Say we have a single observation \(X\) which takes values in \(\{0,1,2\}\) with \(P(X=k) = \frac{p+k}{3(p+1)}\), where \(p\in[0,1]\).
We observe either \(x=0,1\) or \(2\). For the method of moments estimator we need to solve the equation \(E[X^k]=x^k\) for some \(k\); note that for a single observation the \(k\)th sample moment is just \(x^k\). Now
\[ \begin{aligned} &x^k=E[X^k] = 0^k\times \frac{p+0}{3(p+1)}+1^k\times \frac{p+1}{3(p+1)}+2^k\times \frac{p+2}{3(p+1)} = \\ &\frac{1}{3(p+1)}\left(p+1+2^k(p+2)\right) = \frac{(1+2^k)p+1+2^{k+1}}{3(p+1)}\\ \end{aligned} \] Let’s see what this looks like for some values of k
# E[X^k] as a function of p, for k=1,...,4
par(mfrow=c(2, 2))
for(k in 1:4)
  curve(((1+2^k)*x+1+2^(k+1))/3/(x+1), 0, 1, ylab="", xlab="p", main=paste0("k=",k))
So none of these curves ever reaches a possible value of \(x^k\) (that is, 0, 1 or \(2^k\)), and therefore the method of moments estimator does not exist.
Next the maximum likelihood estimator. The likelihood of a single observation \(k\) is
\[L(p\vert k) = \frac{p+k}{3(p+1)}\]
# likelihood as a function of p for each possible observation k
par(mfrow=c(2, 2))
for(k in 0:2)
  curve((x+k)/3/(x+1), 0, 1, ylab="", xlab="p")
so we find
\[ \hat{p}=\left\{ \begin{matrix} 1 & \text{ if } & k=0 \\ 0 & \text{ if } & k=2 \\ \end{matrix} \right. \] and any number in [0,1] if k=1.
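A quick numerical check of this (a sketch; note that for \(k=1\) the likelihood is constant in \(p\), so the value returned by optimize is arbitrary):
# maximize L(p|k) over [0,1] for k=0,1,2
sapply(0:2, function(k)
  optimize(function(p) (p+k)/(3*(p+1)), c(0, 1), maximum=TRUE)$maximum)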
Finally we find a Bayesian estimator of \(p\), using the prior \(p\sim U[0,1]\). Do all the work analytically!
\[ \begin{aligned} &\sum_{k=0}^{2} f(k\vert p) = c\left( p+0+p+1+p+2\right) = c(3p+3) = 1\\ &c = 1/[3(p+1)]\\ &f(k,p) = (p+k)/[3(p+1)]\\ &m(k) = \frac13\int_0^1 \frac{p+k}{p+1}dp = \\ &\frac13\int_0^1 \frac{p}{p+1}dp +\frac{k}3\int_0^1 \frac{1}{p+1}dp = \\ &\frac13\left(p-\log (p+1)\vert_0^1\right)+\frac{k}3\left(\log (p+1)\vert_0^1\right)=\\ &\frac13\left(1-\log (2) \right)+\frac{k}3\log (2)=\\ &\frac13+\frac{k-1}3\log (2) \end{aligned} \] and so the posterior distribution is given by
\[f(p\vert k) = \frac{(p+k)/[3(p+1)]}{\frac13+\frac{k-1}3\log (2)}= \frac{p+k}{(p+1)(1+(k-1)\log 2)}\] The mean of the posterior is given by
\[ \begin{aligned} &E[p\vert k] = \frac1{1+(k-1)\log 2}\int_0^1 p\frac{p+k}{p+1}dp\\ &\int_0^1 p\frac{p+k}{p+1}dp = \int_0^1 \frac{p^2}{p+1}dp+k\int_0^1 \frac{p}{p+1}dp =\\ &\left(\frac12(p+1)^2-2(p+1)+\log(p+1)\right)\Big\vert_0^1 + k\left(p-\log (p+1)\right)\Big\vert_0^1 = \\ &\left(\frac12\cdot2^2-2\cdot2+\log 2\right) - \left(\frac12-2\right) + k\left(1-\log 2\right) = \\ &k+(1-k)\log 2-\frac12 \\ &\text{and so}\quad E[p\vert k] = \frac{k+(1-k)\log 2-\frac12}{1+(k-1)\log 2} \end{aligned} \]
# posterior means for k=0, 1, 2
k=0:2
round((k+(1-k)*log(2)-1/2)/(1+(k-1)*log(2)), 3)
## [1] 0.629 0.500 0.477
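As a check we can also compute the posterior means numerically (a minimal sketch using integrate):
# numerical check of the posterior means
post.mean=function(k) {
  joint=function(p) (p+k)/(3*(p+1))  # f(k, p) with the U[0,1] prior
  mk=integrate(joint, 0, 1)$value    # marginal m(k)
  integrate(function(p) p*joint(p)/mk, 0, 1)$value
}
round(sapply(0:2, post.mean), 3)
which reproduces the values above.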
Below we have observations from a random variable \(X\) such that \(\frac{X-\mu}\sigma\sim t(3)\).
2.54 7.4 7.82 10.69 15.39 18.11 19.75 21.38 22.76 22.82 23.07 23.22 23.25 23.34 23.76 23.79 23.83 23.85 23.97 23.97 24.2 24.33 24.33 24.49 24.6 24.72 24.81 24.99 25.09 25.13 25.27 25.29 25.55 25.57 25.59 25.66 25.82 25.93 26.01 26.06 26.09 26.29 26.32 26.45 26.51 26.57 26.67 26.76 26.78 26.8 26.91 26.94 26.96 26.97 27 27.07 27.17 27.26 27.36 27.39 27.43 27.5 27.53 27.59 27.6 27.66 27.66 27.68 27.7 27.73 27.74 27.78 27.79 27.81 27.84 27.89 27.9 28.08 28.12 28.14 28.2 28.28 28.39 28.39 28.4 28.4 28.47 28.55 28.55 28.56 28.57 28.62 28.62 28.63 28.66 28.66 28.9 28.93 28.95 28.95 28.96 28.97 29.01 29.01 29.11 29.12 29.25 29.32 29.42 29.46 29.51 29.52 29.62 29.63 29.66 29.69 29.71 29.81 29.83 29.83 29.87 29.89 29.91 29.93 29.95 29.96 30.06 30.07 30.07 30.09 30.09 30.2 30.23 30.23 30.25 30.27 30.32 30.32 30.33 30.44 30.46 30.47 30.5 30.59 30.62 30.67 30.77 30.8 30.8 30.85 30.87 30.9 30.9 30.95 30.99 31.05 31.07 31.13 31.24 31.25 31.25 31.33 31.34 31.45 31.57 31.62 31.65 31.67 31.68 31.73 31.78 31.79 31.79 31.92 31.94 31.95 32 32.05 32.05 32.15 32.16 32.21 32.23 32.24 32.25 32.3 32.43 32.46 32.51 32.76 32.82 32.86 32.9 32.9 32.97 33.15 33.17 33.23 33.29 33.38 33.38 33.39 33.42 33.43 33.45 33.52 33.55 33.59 33.65 33.67 33.67 33.76 34.09 34.12 34.29 34.52 34.68 34.99 34.99 35.16 35.2 35.24 35.29 35.31 35.46 35.59 35.63 35.68 35.73 35.85 35.99 36.1 36.58 36.63 36.85 36.95 36.96 37.04 38.13 38.23 38.61 38.67 39.22 40.84 42.48 42.53 48.55 49.31 50 60.64
Say \(Y\sim t(n)\), then
\[f_y(y)=\frac{\Gamma(\frac{n+1}2)}{\Gamma(\frac{n}2)}\frac1{\sqrt{\pi n}}\frac1{(1+y^2/n)^{(n+1)/2}}\]
so
\[ \begin{aligned} &F_X(x) = P(X\le x) =\\ &P(\frac{X-\mu}\sigma\le \frac{x-\mu}\sigma) = F_y(\frac{x-\mu}\sigma)\\ &f_X(x) = f_y(\frac{x-\mu}\sigma)\frac1\sigma = \\ &\frac{\Gamma(\frac{n+1}2)}{\Gamma(\frac{n}2)}\frac1{\sqrt{\pi n}}\frac1{(1+(\frac{x-\mu}\sigma)^2/n)^{(n+1)/2}}\frac1\sigma = \\ &\frac{\Gamma(\frac{n+1}2)}{\Gamma(\frac{n}2)}\frac1{\sqrt{\pi n}}\frac{(n\sigma^2)^{(n+1)/2}}{(n\sigma^2+(x-\mu)^2)^{(n+1)/2}}\frac1\sigma = \\ &\frac{\Gamma(\frac{n+1}2)}{\Gamma(\frac{n}2)}\frac1{\sqrt{\pi }}\frac{n^{n/2}\sigma^{n}}{(n\sigma^2+(x-\mu)^2)^{(n+1)/2}} \end{aligned} \]
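A quick numerical spot check of this algebra (a sketch; the point and the parameter values are arbitrary):
# compare the derived closed form with dt() at one point
n=3; mu=0; sig=2; x0=1.5
c(gamma((n+1)/2)/gamma(n/2)/sqrt(pi)*n^(n/2)*sig^n/(n*sig^2+(x0-mu)^2)^((n+1)/2),
  dt((x0-mu)/sig, n)/sig)
The two numbers agree.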
The log-likelihood of a sample \(x_1,...,x_m\) is therefore \[l(\mu,\sigma) = K+mn\log(\sigma)-\frac{n+1}2\sum_{i=1}^m\log\left(n\sigma^2+(x_i-\mu)^2\right)\] where \(K\) collects all terms that do not depend on \(\mu\) and \(\sigma\).
# negative log-likelihood (constants dropped); par=(mu, sigma), n=3 degrees of freedom
loglike=function(par, n=3)
  -(length(x)*n*log(par[2])-(n+1)/2*sum(log(n*par[2]^2+(x-par[1])^2)))
mle=optim(c(mean(x), sd(x)), loglike)$par
# profile the negative log-likelihood in each parameter, holding the other at its mle
par(mfrow=c(1, 2))
a=seq(20, 40, length=100)
z=a
for(i in 1:100) z[i]=loglike(c(a[i], mle[2]))  # vary mu
plot(a, z, type="l")
a=seq(3, 4, length=100)
z=a
for(i in 1:100) z[i]=loglike(c(mle[1], a[i]))  # vary sigma
plot(a, z, type="l")
mle
## [1] 30.143637 3.255504
c(mean(x), sd(x))
## [1] 29.935500 5.506038
So we see that the mle of \(\sigma\) is much smaller than the sample standard deviation. This makes sense: if \(\frac{X-\mu}\sigma\sim t(3)\), then \(Var(X)=3\sigma^2\), so the standard deviation of \(X\) is \(\sigma\sqrt3\approx 3.26\times 1.73\approx 5.6\), in good agreement with the sample value.
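A quick check of this relationship:
# implied standard deviation of X under the fitted t model vs the sample sd
c(mle[2]*sqrt(3), sd(x))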
Let’s draw the histogram with the densities:
library(ggplot2)
xnew=seq(min(x), max(x), length=250)
y1=dt((xnew-mle[1])/mle[2], 3)/mle[2]  # fitted t density
y2=dnorm(xnew, mean(x), sd(x))         # fitted normal density
df1=data.frame(x=c(xnew, xnew),
               y=c(y1, y2),
               Model=rep(c("t", "Normal"), each=250))
df=data.frame(x=x)
bw=diff(range(x))/50  # histogram bin width
ggplot(df, aes(x)) +
  geom_histogram(aes(y = ..density..),
                 color = "black",
                 fill = "white",
                 binwidth = bw) +
  labs(x = "x", y = "Density") +
  geom_line(data=df1, aes(x, y, color=Model))
The t density follows the histogram much more closely than the normal, so the t model clearly is better.