Below is data from a Poisson distribution truncated at 0, that is, all observations equal to 0 were lost:

x Freq
1 24
2 34
3 16
4 4
5 3
6 2
7 1

Find a 90% confidence interval for the Poisson rate \(\lambda\). Give some argument why your solution works.
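Throughout, the R code works with a vector x holding the 84 individual observations; a minimal sketch of one way to build it from the frequency table above:

x <- rep(1:7, times=c(24, 34, 16, 4, 3, 2, 1))  # expand the frequency table
length(x)  # 84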

Let \(X\sim Pois(\lambda)\) and \(Y=X|X>0\).

Let \(x>0\), then \[ \begin{aligned} &P(Y=x) = P(X=x|X>0)=\\ &\frac{P(X=x,X>0)}{P(X>0)}=\frac{P(X=x)}{1-P(X=0)}=\\ &\frac{\frac{\lambda^x}{x!}e^{-\lambda}}{1-e^{-\lambda}} = \frac{\lambda^x}{x!}\frac{1}{e^{\lambda}-1} \end{aligned} \]
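As a quick numerical sanity check (a sketch; the value \(\lambda=2\) is arbitrary), this pmf should sum to 1 and agree with dpois divided by \(P(X>0)\):

lambda <- 2  # arbitrary value, only for the check
k <- 1:50
p <- lambda^k/factorial(k)/(exp(lambda)-1)
sum(p)  # essentially 1
max(abs(p - dpois(k, lambda)/(1-dpois(0, lambda))))  # essentially 0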

Let’s find the mle:

\[ \begin{aligned} &L(\lambda\vert \pmb{x}) = f_Y(\pmb{x}\vert\lambda) =\prod_{i=1}^n \frac{\lambda^{x_i}}{x_i!}\frac{1}{e^{\lambda}-1} = \frac{\lambda^{\sum x_i}}{\prod x_i!}\frac{1}{(e^{\lambda}-1)^n} \\ &l(\lambda\vert \pmb{x}) = \log L(\lambda\vert \pmb{x}) =\sum x_i \log \lambda - \log \prod x_i! - n\log(e^{\lambda}-1)\\ &\frac{d l(\lambda\vert \pmb{x})}{d\lambda} = \frac{\sum x_i}{\lambda}-n\frac{e^{\lambda}}{e^{\lambda}-1} = 0 \end{aligned} \] This is a non-linear equation, and so we need to use a numerical method to find the solution:

# negative log-likelihood of the truncated Poisson
# (the constant term log(prod(x_i!)) is dropped since it does not involve l)
loglike <- function(l, x) {
  -(log(l)*sum(x)-length(x)*log(exp(l)-1))
}
# minimize the negative log-likelihood, starting at l = 1
mle <- nlminb(1, loglike, x=x)$par
mle
## [1] 1.935343
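As a check (a sketch using the objects already defined), at the mle the two sides of the score equation \(\sum x_i/\lambda = n\,e^{\lambda}/(e^{\lambda}-1)\) should agree:

# both sides of the score equation, evaluated at the mle
sum(x)/mle
length(x)*exp(mle)/(exp(mle)-1)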

The likelihood ratio test statistic is given by

\[ \begin{aligned} &\lambda(x)=2(l(\hat{\lambda})-l(\lambda_0)) = \\ &2\left( \sum x_i \log \hat{\lambda} - \log \prod x_i! - n\log(e^{\hat{\lambda}}-1) - \sum x_i \log \lambda_0 + \log \prod x_i! + n\log(e^{\lambda_0}-1) \right) = \\ &2n\left( \bar{x}\log \hat{\lambda} - \log(e^{\hat{\lambda}}-1) - \bar{x}\log \lambda_0 + \log(e^{\lambda_0}-1) \right) \\ \end{aligned} \]

According to large-sample theory, under the null hypothesis \(\lambda(\pmb{x})\sim \chi^2(1)\) approximately, and so we reject the null hypothesis if \(\lambda(\pmb{x})>qchisq(1-0.1, 1)\). Therefore the acceptance region is \(\lambda(\pmb{x})<qchisq(1-0.1, 1)\).
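For illustration (a sketch; the null value \(\lambda_0=2\) is an arbitrary choice), the statistic for a single null hypothesis can be computed directly and compared with the cutoff:

lambda0 <- 2  # arbitrary null value, only for illustration
2*length(x)*(mean(x)*log(mle) - log(exp(mle)-1) -
             mean(x)*log(lambda0) + log(exp(lambda0)-1))
qchisq(1-0.1, 1)  # cutoff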

Let’s invert the test:

# invert the LRT: the (1-alpha)100% confidence interval consists of all
# null values l with lrt(l) below the chi-square cutoff
ci <- function(x, alpha=0.1, do.Plot=FALSE) {
  cr <- qchisq(1-alpha, 1)            # chi-square cutoff
  mle <- nlminb(1, loglike, x=x)$par  # mle for this data set
  # LRT statistic as a function of the null value l
  lrt <- function(l, x) {
    2*length(x)*(mean(x)*log(mle)-log(exp(mle)-1)-mean(x)*log(l)+log(exp(l)-1))
  }
  # evaluate the statistic on a grid and find where it crosses the
  # cutoff on either side of the mle
  l <- seq(1, 3, length=1000)
  y <- lrt(l, x)
  y1 <- y[l<mle]
  l1 <- l[l<mle]
  left <- l1[which.min(abs(y1-min(y1)-cr))]
  y1 <- y[l>mle]
  l1 <- l[l>mle]
  right <- l1[which.min(abs(y1-min(y1)-cr))]
  if(do.Plot) {
    plot(l, y, type="l", xlim=c(0.75*left, 1.25*right))
    abline(h=min(y)+cr)
    abline(v=c(left, right))
  }
  round(c(left, right), 4)
}
ci(x, 0.1, TRUE)

## [1] 1.6647 2.2292
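An alternative to the grid search is to solve \(\lambda(\pmb{x})=cr\) for the two crossing points directly, for example with uniroot (a sketch; the bracketing interval from 0.5 to 5 is an assumption that comfortably contains both crossings for this data set):

# same inversion, solving lrt(l) = cr on each side of the mle
lrt <- function(l)
  2*length(x)*(mean(x)*log(mle) - log(exp(mle)-1) -
               mean(x)*log(l) + log(exp(l)-1))
cr <- qchisq(0.9, 1)
left <- uniroot(function(l) lrt(l) - cr, c(0.5, mle))$root
right <- uniroot(function(l) lrt(l) - cr, c(mle, 5))$root
round(c(left, right), 4)

This avoids the hard-coded grid seq(1, 3, length=1000) inside ci.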

To see whether this works we do a coverage study. Note our data set has 84 observations, so with \(\hat{\lambda}\approx 1.94\) the untruncated sample must have had about \(length(x)/(1-\exp(-\hat{\lambda}))\approx 98\) observations.
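In R (a sketch, using the mle computed above):

# implied size of the untruncated sample
round(length(x)/(1 - exp(-mle)))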

What values of \(\lambda\) should we check? Let’s find a 99% confidence interval and check points inside:

# estimate the true coverage of the interval for a given lambda
cov <- function(lambda, B=1000) {
   A <- matrix(0, B, 2)
   n <- round(84/(1-exp(-1.935)))  # size of the untruncated sample (see above)
   for(i in 1:B) {
      x <- rpois(n, lambda)  # generate untruncated data ...
      x <- x[x>0]            # ... and drop the zeros
      A[i, ] <- ci(x)
   }
   sum(A[, 1]<lambda & lambda<A[, 2])/B
}
lms <- ci(x, 0.01)  # 99% CI gives the range of lambdas to check
l <- seq(lms[1], lms[2], length=100)
y <- l
for(i in seq_along(l)) y[i] <- cov(l[i])  # estimated coverage at each lambda
plot(l, y, ylim=c(0.8, 1), type="l")
abline(h=0.9)  # nominal coverage

Within simulation error the estimated coverage stays close to the nominal 90%, so this looks ok.