Nuisance Parameters

Example (7.8.1)

Say \(X \sim Pois(\mu+b)\). We have previously studied this problem under the assumption that \(b\) is known. Now let’s consider what we can do if \(b\) is unknown.

In order to do anything we need another measurement, so say we also have \(Y \sim Pois(b)\), with \(X\) and \(Y\) independent.

Let’s say we want to find interval estimates for \(\mu\).

First note that the problem does not say anything about \(b\). In such a situation \(b\) is called a nuisance parameter.

Whether (or which) parameter is a nuisance parameter depends entirely on the interest of the investigator; it could just as well have been \(\mu\), or neither of the two.

Frequentist Inference

Let’s derive and then invert the likelihood ratio test for this problem. First we have the joint density of X and Y:

\[f(x,y\vert\mu,b)=\frac{(\mu+b)^x}{x!}e^{-\mu-b}\frac{b^y}{y!}e^{-b}\]
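In R this is just the product of two Poisson probabilities; as a quick check (the values here are picked for illustration only):

f=function(x, y, mu, b) dpois(x, mu+b)*dpois(y, b)
f(10, 5, 8, 4) # same as (12^10/factorial(10))*exp(-12)*(4^5/factorial(5))*exp(-4)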

so we find the MLEs as follows:

\[ \begin{aligned} &l(\mu,b\vert x,y) =x\log(\mu+b)-\log x!-\mu-b+y\log b-\log y!-b \propto\\ &x\log(\mu+b)+y\log b-\mu-2b\\ &\frac{\partial l}{\partial\mu} =\frac{x}{\mu+b}-1=0 \implies \hat{\mu}+\hat{b}=x \\ &\frac{\partial l}{\partial b} =\frac{x}{\mu+b}+\frac{y}{b}-2=0 \\ &\frac{x}{x}+\frac{y}{\hat{b}}=2\\ &\hat{b}=y, \text{ and so } \hat{\mu}=x-y \end{aligned} \]
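As a quick numerical check of these formulas we can maximize the log-likelihood directly (a sketch; the data values and starting point are arbitrary):

x=10;y=5
# minimize the negative log-likelihood over (mu, b), keeping both positive
fn=function(p) -(x*log(p[1]+p[2])+y*log(p[2])-p[1]-2*p[2])
optim(c(1, 1), fn, method="L-BFGS-B", lower=c(0.001, 0.001))$par
# should be (very close to) c(x-y, y) = c(5, 5)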

Under the null hypothesis we have \(H_0:\mu=\mu_0\), and so

\[ \begin{aligned} &\frac{\partial l(\mu_0,b\vert x,y)}{\partial b} =\frac{x}{\mu_0+b}+\frac{y}{b}-2=0\\ &xb+y(\mu_0+b) -2(\mu_0+b)b = 0\\ &-2b^2+(x+y-2\mu_0)b+y\mu_0 =0 \\ &b_{1,2}=\left(-(x+y-2\mu_0)\pm \sqrt{(x+y-2\mu_0)^2-4(-2)y\mu_0}\right)/[2(-2)]\\ &\hat{\hat{b}}=\left(x+y-2\mu_0\pm \sqrt{(x+y-2\mu_0)^2+8y\mu_0}\right)/4\\ \end{aligned} \] Now we have two solutions. Which one do we want? Let’s check a simple case:

x=10;y=5;mu0=8
(x+y-2*mu0+c(-1,1)*sqrt((x+y-2*mu0)^2+8*y*mu0))/4
## [1] -4.729118  4.229118

and so we need the “+” solution: the other root is negative, which is impossible for \(b\).

Next we plug this into the likelihood ratio test statistic. Notice that \(\hat{\mu}+\hat{b}=x\).

\[ \lambda(x,y) =2\left(l(\hat{\mu},\hat{b})-l(\mu_0,\hat{\hat{b}})\right) \]

Let’s see what this looks like:

# log-likelihood (up to additive constants)
loglike=function(mu, b)
  x*log(mu+b)+y*log(b)-mu-2*b
# constrained mle of b for fixed mu
bhathat=function(mu) 
  (x+y-2*mu+sqrt((x+y-2*mu)^2+8*mu*y))/4
# likelihood ratio test statistic as a function of mu0
loglrt=function(mu) {
  z=mu
  for(i in seq_along(mu)) {
    b=bhathat(mu[i])
    z[i]=2*(loglike(x-y, y)-loglike(mu[i], b))
  }
  z
}
x=25;y=10
mu=seq(0, 30, length=10000)
z=loglrt(mu)
plot(mu, z, type="l")
alpha=0.05
crit=qchisq(1-alpha, 1)
# the statistic is 0 at mu=x-y and rises on either side;
# the interval is where it stays below the chi-square critical value
z1=z[mu<x-y]
mu1=mu[mu<x-y]
L=mu1[which.min(abs(z1-crit))]
z2=z[mu>x-y]
mu2=mu[mu>x-y]
R=mu2[which.min(abs(z2-crit))]
abline(v=c(L,R))

round(c(L,R),2)
## [1]  3.65 27.37

Here we find the interval as all values of \(\mu_0\) where the test statistic exceeds its minimum (which is 0, attained at \(\hat{\mu}=x-y\)) by less than \(\text{qchisq}(1-\alpha, 1)\).
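Because the statistic decreases to 0 at \(\hat{\mu}=x-y\) and increases on either side, the endpoints can also be found directly with uniroot, which should agree with the grid search above (the bracketing values 0.01 and 50 are ad-hoc choices):

# cross-check: solve loglrt(mu)=crit on either side of the mle
round(c(uniroot(function(m) loglrt(m)-crit, c(0.01, x-y))$root,
        uniroot(function(m) loglrt(m)-crit, c(x-y, 50))$root), 2)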

Eliminating the nuisance parameter by fixing the parameter of interest and maximizing the likelihood over the nuisance parameter is called the method of profile likelihood.

Now intervals can be found using the chi-square approximation to the likelihood ratio statistic. The hard question, as always, is how large x and y need to be for the approximation to work. In a 2005 paper my collaborators and I showed that, with some adjustments, it works even for x=0!
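We can get a feeling for the quality of the approximation with a small simulation (my own illustration, not the adjusted method from the paper; for simplicity \(\hat{\mu}=x-y\) is allowed to go negative, and the \(0\cdot\log 0\) cases are guarded):

# log-likelihood with guards for the 0*log(0) cases, which R turns into NaN
ll=function(mu, b, x, y) {
  tx=if(x>0) x*log(mu+b) else 0
  ty=if(y>0) y*log(b) else 0
  tx+ty-mu-2*b
}
# likelihood ratio statistic at mu0 for data (x, y)
lrt=function(mu0, x, y) {
  bh=(x+y-2*mu0+sqrt((x+y-2*mu0)^2+8*y*mu0))/4
  2*(ll(x-y, y, x, y)-ll(mu0, bh, x, y))
}
# proportion of simulated data sets whose interval covers the true mu
coverage=function(mu, b, B=10000, alpha=0.05) {
  crit=qchisq(1-alpha, 1)
  hits=0
  for(i in 1:B) {
    x=rpois(1, mu+b); y=rpois(1, b)
    if(lrt(mu, x, y)<=crit) hits=hits+1
  }
  hits/B
}
coverage(10, 5) # should be close to 0.95 if the approximation holds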

The resulting limits are called the Rolke-Lopez-Conrad limits. These also allow for an efficiency, that is, the fact that sometimes an event goes undetected. The model then becomes

\(X \sim Pois(\epsilon\mu+b)\), \(Y \sim Pois(b)\), \(Z \sim Bin(n, \epsilon)\)

and so has two nuisance parameters. Later work extended this to multiple channels, and the model is

\(X_i \sim Pois(\epsilon_i\mu+b_i)\), \(Y_i \sim Pois(b_i)\), \(Z_i \sim Bin(n_i, \epsilon_i)\), \(i=1,\dots,k\)

and then there are 2k nuisance parameters.
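Finding the profile likelihood now requires a numerical maximization over the nuisance parameters for each fixed \(\mu\). A minimal sketch for the single-channel case (function names, starting values, and bounds are my choices; the paper’s adjustments are not included):

# log-likelihood of (x, y, z) under the model with efficiency
ll.eff=function(mu, b, eps, x, y, z, n)
  dpois(x, eps*mu+b, log=TRUE)+dpois(y, b, log=TRUE)+dbinom(z, n, eps, log=TRUE)
# profile log-likelihood: maximize over (b, eps) with mu held fixed
profile.eff=function(mu, x, y, z, n)
  -optim(c(max(y, 0.5), max(z/n, 0.01)),
         function(p) -ll.eff(mu, p[1], p[2], x, y, z, n),
         method="L-BFGS-B", lower=c(0.001, 0.001), upper=c(Inf, 0.999))$value
profile.eff(5, x=10, y=5, z=80, n=100)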

Bayesian Solution

Here we need priors for \(\mu\) and \(b\). Let’s use the flat (improper) priors \(\pi(\mu)=1\) and \(\pi(b)=1\). Then

\[ \begin{aligned} &m(x,y) =\int_0^{\infty}\int_0^{\infty}f(x,y\vert\mu,b)\pi(\mu)\pi(b)d\mu db= \\ &\frac1{x!y!}\int_0^{\infty}b^ye^{-2b}\left[\int_0^{\infty} (\mu+b)^xe^{-\mu}d\mu \right]db \\ &\text{ } \\ &\int_0^{\infty} (t+b)^xe^{-t}dt = \\ &\int_0^{\infty} \sum_{n=0}^x{x\choose n} b^n t^{x-n}e^{-t}dt = \\ &\sum_{n=0}^x{x\choose n} b^n\int_0^{\infty} t^{x-n}e^{-t}dt = \\ &\sum_{n=0}^x{x\choose n} b^{n}\int_0^{\infty} t^{(x-n+1)-1}e^{-t}dt = \\ &\sum_{n=0}^x{x\choose n}b^n\Gamma(x-n+1) \end{aligned} \] \[ \begin{aligned} &m(x,y) = \\ &\frac1{x!y!}\int_0^{\infty}b^ye^{-2b}\left[\sum_{n=0}^x{x\choose n}b^n\Gamma(x-n+1) \right]db = \\ &\frac1{x!y!}\sum_{n=0}^x{x\choose n}\frac{\Gamma(x-n+1)}{2^{y+n+1}}\int_0^{\infty}(2b)^{(y+n+1)-1}e^{-2b}(2db) = \\ &\frac1{x!y!}\sum_{n=0}^x{x\choose n}\frac{\Gamma(x-n+1)\Gamma(y+n+1)}{2^{y+n+1}} \end{aligned} \]

and so the posterior distribution is given by

\[ f(\mu,b\vert x,y) =\frac{f(x,y\vert\mu,b)\pi(\mu)\pi(b)}{m(x,y)} \] and for the parameter of interest we can find the marginal

\[ \begin{aligned} &f(\mu\vert x,y) =\int_0^\infty f(\mu,b\vert x,y)db= \\ &\frac{e^{-\mu}}{x!y!m(x,y)}\sum_{n=0}^x{x\choose n}\frac{(x+y-n)!}{2^{x+y-n+1}}\mu^n \end{aligned} \] and now intervals can be derived from the posterior distribution, for example via the highest posterior density method.
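This density is easy to evaluate numerically. A sketch, with the sums done on the log scale for numerical stability (x, y, and the grid are illustrative choices):

x=25;y=10
# log m(x,y) from the formula above
n=0:x
lt=lchoose(x, n)+lgamma(x-n+1)+lgamma(y+n+1)-(y+n+1)*log(2)
lm=-lfactorial(x)-lfactorial(y)+log(sum(exp(lt)))
# marginal posterior density of mu
postmu=function(mu) {
  s=rep(0, length(mu))
  for(k in 0:x) {
    lw=lchoose(x, k)+lfactorial(x+y-k)-(x+y-k+1)*log(2)-
       lfactorial(x)-lfactorial(y)-lm
    s=s+exp(lw)*mu^k*exp(-mu)
  }
  s
}
mu=seq(0, 40, length=1000)
plot(mu, postmu(mu), type="l")
# sanity check: the density should integrate to (about) 1
sum(postmu(mu))*(mu[2]-mu[1])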

In a paper in 2015 I showed that intervals derived in this way but then treated as frequentist confidence intervals actually have good coverage properties. If one uses the prior \(\pi(\mu)=1/\mu\), though, they do not.

Notice an important distinction between the way nuisance parameters are treated by frequentists and by Bayesians: in one case we eliminate them by maximization (to find the profile likelihood), in the other by integration (to find the marginal posterior).