Methods for Finding Interval Estimates

Inverting a Hypothesis Test

Hypothesis testing and confidence intervals are closely related; in fact, a hypothesis test can always be turned into a confidence interval and vice versa. Say we have a hypothesis test with type I error probability \(\alpha\), and when testing \(H_0: \theta=\theta_0\) we define the acceptance region \(A(\theta_0)\) as the complement of the critical region. That is, \(A(\theta_0)\) consists of all those points in \(\mathbf{R}^n\) that would have led to a failure to reject the null hypothesis.

Define the set \(C(\pmb{x})\) in the parameter space by

\[C(\pmb{x}) = \left\{\theta: \pmb{x} \in A(\theta) \right\}\]

In other words, \(C(\pmb{x})\) is the set of all parameter values for which the hypothesis test would have accepted \(H_0\). It is a \(100(1-\alpha)\%\) confidence set because \(P_\theta(\theta \in C(\pmb{X})) = P_\theta(\pmb{X} \in A(\theta)) \ge 1-\alpha\).

Example (6.2.1)

Let \(X_1,..,X_n \sim N(\mu, \sigma)\). Say we want to find a confidence interval for \(\mu\). To do this we first need a hypothesis test for

\[H_0: \mu=\mu_0\text{ vs }H_1: \mu \ne \mu_0\]

Of course we have the 1-sample t-test with the test statistic

\[T=\sqrt{n}\frac{\bar{x}-\mu_0}{s}\]

which rejects the null if \(|T|> t_{\alpha/2, n-1}\). So it accepts \(H_0\) if \(|T|\le t_{\alpha/2, n-1}\) and we have the acceptance region

\[A(\mu_0) = \left\{\pmb{x} \vert |T|\le t_{\alpha/2, n-1} \right\}\]

Now

\[ \begin{aligned} &1-\alpha=P\left(\vert T\vert <t_{n-1,\alpha/2}\right) = \\ &P\left(\vert \sqrt{n}\frac{\bar{X}-\mu}{s}\vert <t_{n-1,\alpha/2}\right) = \\ &P\left(-t_{n-1,\alpha/2}< \sqrt{n}\frac{\bar{X}-\mu}{s} <t_{n-1,\alpha/2}\right) = \\ &P\left(\mu- t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}<\bar{X}< \mu + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}\right) = \\ &P\left(\bar{X} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}<\mu< \bar{X} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}\right) \end{aligned} \]

and so a \(100(1-\alpha)\%\) confidence interval for \(\mu\) is given by \(\bar{x} \pm t_{n-1, \alpha/2}\frac{s}{\sqrt{n}}\)
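
In R this interval is easy to compute directly. Here is a quick sketch with simulated data (names and numbers are mine, not from the text); it agrees with what t.test reports:

x <- rnorm(25, 10, 2)   # simulated data
n <- length(x); alpha <- 0.05
mean(x) + c(-1, 1)*qt(1-alpha/2, n-1)*sd(x)/sqrt(n)
# should match
t.test(x)$conf.int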

Example (6.2.2)

Let’s look at another example to illustrate the important point here. Let’s say we have \(X_1,..,X_n \sim Ber(p)\) and we want to find a \(100(1-\alpha)\%\) CI for p. To do so we first need to find a test for

\[H_0:p=p_0\text{ vs }H_a: p \ne p_0\]

Let \(x=\sum x_i\) be the number of successes. We have previously found the likelihood ratio test to be: reject \(H_0\) if \(\vert x-np_0\vert >c\).

What is c? We have

\[ \begin{aligned} & \alpha = P_{p=p_0}(\text{reject H}_0) = \\ &P_{p=p_0}(|X-np_0|> c) = \\ &1-P_{p=p_0}(|X-np_0|\le c) = \\ &1-P_{p=p_0}(np_0-c \le X \le np_0+ c) \end{aligned} \]

so

\[1-\alpha =pbinom(np_0+c,n,p_0)-pbinom(np_0-c-1,n,p_0)\]

Let’s use R and a simple search to find c:

find.c <- function(p, n=100, alpha=0.05) {
  # search for the smallest k with P(np-k <= X <= np+k) > 1-alpha
  k <- 0
  repeat {
    k <- k + 1
    if(pbinom(n*p+k, n, p) - 
       pbinom(n*p-k-1, n, p) > 1 - alpha) 
            break
  }
  cv <- k
  x <- 0:n
  y <- pbinom(n*p+x, n, p) - pbinom(n*p-x-1, n, p)
  reject <- x < n*p-cv | x > n*p+cv
  data.frame(x, y, p=rep(p, n+1), reject)
}
df <- find.c(0.4)
ggplot(data=df[1:20, ], aes(x, y)) +
  geom_point() +
  geom_vline(xintercept = 10) +
  geom_hline(yintercept = 0.95) +
  xlab("c") + ylab("")

Notice that the actual probability is a bit higher than 0.95. This is again because we have a discrete random variable.

So for \(p_0=0.4\) and n=100 we find c=10. Therefore the test is as follows: reject the null hypothesis if \(\vert x-40\vert >10\). This is the same as \(x<30\) or \(x>50\).
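
We can confirm this from the data frame returned by find.c:

range(df$x[!df$reject])   # acceptance region for p0=0.4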

We can illustrate this acceptance region as follows:

ggplot(data=df, aes(x, p, color=reject)) +
  geom_point() + ylim(c(0,1))

This plots a dot for each possible observation x (0-n), in red if observing this value leads to accepting the null hypothesis, in blue if it means rejecting \(H_0\).

Let’s do this now for other values of p as well:

p <- seq(0.01, 0.99, length=50)
df <- find.c(p[1])
for(i in 2:50)
  df <- rbind(df, find.c(p[i]))
ggplot(data=df, aes(x, p, color=reject)) +
  geom_point() + ylim(c(0,1))

This graph tells us everything about the test (for a fixed n).

For example, say we wish to test \(H_0: p=0.25\) and we observe \(x=31\), then

ggplot(data=df, aes(x, p, color=reject)) +
  geom_point() + 
  geom_hline(yintercept = 0.25, size=2) +
  geom_vline(xintercept = 31, size=2)

shows that we should accept the null hypothesis because the intersection of the two lines is in the red (acceptance) region.

The idea of inverting the test is now very simple: for a fixed (observed) x0, what values of p lead to accepting the null hypothesis? That is, for a given vertical line which p’s are in the red region?

x0 <- 31
z <- rep(FALSE, dim(df)[1])
z[df$x==x0 & !df$reject] <- TRUE
p0 <- range(df$p[z])
ggplot(data=df, aes(x, p, color=reject)) +
  geom_point() + 
  geom_segment(x=x0, y=p0[1],xend=x0, yend=p0[2], 
               size=2, color="black") 

p0
## [1] 0.23 0.41

Here is the same graph as before, but now highlighting those values of p which lead to accepting the null hypothesis. This gives us the 95% confidence interval (0.23, 0.41).

In essence we have the following:

  • to do a hypothesis test fix p0 on the y axis and scan across the graph horizontally.

  • to find a confidence interval fix x0 on the x axis and scan across the graph vertically.
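
The confidence interval step is easy to package. Here is a small (hypothetical, not in the text) helper that, for an observed x0, returns the range of p values whose test accepts \(H_0\), using the data frame df computed above; it reproduces the (0.23, 0.41) interval found earlier:

invert.test <- function(x0, df) 
  range(df$p[df$x == x0 & !df$reject])
invert.test(31, df)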

Example (6.2.3)

Suppose we have \(X_1,..,X_n \sim Exp(\beta)\) and we want a confidence interval for \(\beta\). Again we start by considering a hypothesis test:

\[H_0: \beta=\beta_0\text{ vs. }H_1: \beta \ne \beta_0\]

In (5.4.1) we found the LRT for this problem to be

\[(-2)\log\lambda (\pmb{x})=(-2)\left[n\log (\beta_0 \bar{x})+n-n\beta_0 \bar{x}\right]\] and so the acceptance region is given by

\[A(\beta_0)= \left\{\pmb{x}: n\log (\beta_0 \bar{x})+n-n\beta_0 \bar{x}\ge c \right\} \] Let’s draw the graph:

n <- 50; beta <- 1 
xbar <- mean(rexp(n, 1/beta))
lrt <- function(x, b=1) n*log(x*b)-n*b*x
ggcurve(fun=lrt, A=0.5, B=1.75) +
  scale_y_continuous(labels = NULL) +
  geom_hline(yintercept = lrt(1/xbar)-3, size=1.2) +
  ylab("C(x)")

and so we reject the null if \(\bar{x}\) is either small or large. To invert the test we draw the graph again, but now as a function of \(\beta\) with \(\bar{x}\) fixed:

lrt <- function(b, x=xbar) n*log(x*b)-n*b*x
ggcurve(fun=lrt, A=0.4, B=1.5) +
  scale_y_continuous(labels = NULL) +
  geom_hline(yintercept = lrt(xbar)-3, size=1.2) +
  ylab(expression("A("*beta*")"))

The expression defining the confidence interval depends on \(\pmb{x}\) only through \(\bar{x}\) . So we can write it in the form

\[C(\mathbf{x}) = \left\{\beta: L(\bar{x} ) \le \beta \le U(\bar{x} )\right\}\]

for some functions L and U which are determined so that the interval has probability \(1-\alpha\).

Also note that the height of the curve at the left and the right confidence interval limit is the same, so

\[(L(\bar{x})\bar{x})^n\exp\left(-nL(\bar{x})\bar{x}\right) = (U(\bar{x})\bar{x})^n\exp\left(-nU(\bar{x})\bar{x}\right)\] Let’s denote \(a=L(\bar{x})\bar{x}\) and \(b=U(\bar{x})\bar{x}\), then we have the equation

\[a^ne^{-na}=b^ne^{-nb}\] Now note that

\[\sum X_i \sim \Gamma(n,1/\beta)\]

and it is easy to show that then

\[\beta \sum X_i \sim \Gamma(n,1)\]

and so

\[ \begin{aligned} &1-\alpha = P\left({a/\bar{X}<\beta<b/\bar{X}}\right) = \\ &P\left({a<\beta\bar{X}<b}\right) = \\ &P\left({na<\beta\sum X<nb}\right)= \\ &pgamma(nb,n,1)-pgamma(na,n,1) \end{aligned} \]

So the confidence interval becomes

\[\left\{\beta: a/\bar{x} \le \beta \le b/\bar{x}\right\}\]

where a and b satisfy

\[ \begin{aligned} &a^ne^{-na}=b^ne^{-nb} \\ &1-\alpha = pgamma(nb,n,1)-pgamma(na,n,1) \end{aligned} \]

This system of nonlinear equations will of course have to be solved numerically.
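
Here is one way to do that (a sketch, my own code and names, not from the text). Taking logs, the first equation becomes \(\log a - a = \log b - b\) with \(a<1<b\), which we can solve for b with uniroot for any given a, and then adjust a until the second equation holds:

exp.ci <- function(xbar, n, alpha=0.05) {
  find.b <- function(a)    # b > 1 with the same "height" as a
    uniroot(function(b) log(b)-b - (log(a)-a), c(1, 100))$root
  f <- function(a)         # coverage condition as a function of a
    pgamma(n*find.b(a), n, 1) - pgamma(n*a, n, 1) - (1-alpha)
  a <- uniroot(f, c(0.01, 0.999))$root
  b <- find.b(a)
  c(a, b)/xbar             # the interval (L, U) = (a/xbar, b/xbar)
}
exp.ci(mean(rexp(50, 1)), n=50)   # simulated data, true rate beta=1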

Example (6.2.4)

Say \(X_1,..,X_n \sim Pois(\lambda)\). Let \(Y=\sum X_i\), then \(Y \sim Pois(n\lambda)\). Say we observe \(Y=y_0\). We have previously found a hypothesis test for

\[H_0: \lambda=\lambda_0\text{ vs }H_a: \lambda \ne \lambda_0\]

It had the acceptance region

\[A(\lambda_0) = \left\{\text{qpois}(\alpha/2,n\lambda_0)/n \le \bar{x} \le \text{qpois}(1-\alpha/2,n\lambda_0)/n\right\}\]

Inverting this test means solving the equations

\[ \begin{aligned} &\sum_{i=y_0}^{\infty} \frac{(nL)^i}{i!}e^{-nL} = \frac{\alpha}2\\ &\sum_{i=0}^{y_0} \frac{(nU)^i}{i!}e^{-nU} = \frac{\alpha}2 \end{aligned} \]

In general this might have to be done numerically. Here, though, we can take advantage of the equation linking the Poisson and the Gamma distributions:

If \(X \sim Gamma(n,\beta)\) and \(Y \sim Pois(x/\beta)\) then \(P(X \le x) = P(Y \ge n)\).

Using \(\beta=2\), \(x = 2n\lambda\) and \(y_0+1\) playing the role of \(n\) in the identity above, so that \(X \sim \Gamma(y_0+1,2)=\chi^2(2(y_0+1))\) and \(Y \sim Pois(n\lambda)\), we find for the upper limit \(U\)

\[ \begin{aligned} &\frac{\alpha}2=P(Y\le y_0) =\\ &1-P(Y\ge y_0+1)=\\ &1-P(X\le 2n\lambda) = \\ &P(X>2n\lambda) \end{aligned} \] and so

\[U=qchisq(1-\alpha/2,2(y_0+1))/(2n)\]

Using a similar calculation for the lower bound \(L\) (now with \(2y_0\) degrees of freedom) we find

\[qchisq(\alpha/2,2y_0)/(2n)<\lambda<qchisq(1-\alpha/2,2(y_0+1))/(2n)\] where for \(y_0=0\) we set the lower limit to 0 (a \(\chi^2\) quantile with 0 degrees of freedom is 0).

Let’s implement this method:

poisci <- function(x, alpha = 0.05) {
    c(qchisq(alpha/2, 2*sum(x)), 
      qchisq(1-alpha/2, 2*(sum(x)+1))
      )/2/length(x)
}
round(poisci(rpois(30, 5.8)), 3)
## [1] 5.217 7.015

This method is due to Garwood (1936).
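
As a quick sanity check (not part of the original text), base R's poisson.test uses the same exact construction, so its confidence interval should reproduce poisci up to rounding:

x <- rpois(30, 5.8)
round(poisci(x), 3)
poisson.test(sum(x), T=length(x))$conf.int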

This confidence interval has correct coverage by construction, so we don’t need to worry about that, as we would if our method were based on some approximation. Nevertheless, let’s do a coverage study of our method. As before we can do this without simulation:

n <- 10; alpha <- 0.05
# all possible values of y0=sum(x) together with their interval limits
X <- matrix(0, 30*n+1, 3)
X[, 1] <- 0:(30*n)
X[, 2] <- qchisq(alpha/2, 2*X[, 1])/2/n
X[, 3] <- qchisq(1-alpha/2, 2*(X[, 1]+1))/2/n
lambda <- seq(5, 10, length=250)
Coverage <- 0*lambda
for(i in 1:250) {
  # values of y0 whose interval covers lambda[i]
  tmp <- X[X[, 2]<lambda[i] & lambda[i]<X[, 3], 1]
  Coverage[i] <- sum(dpois(tmp, n*lambda[i]))
}
df <- data.frame(x=lambda, y=Coverage)
ggplot(df, aes(x, y)) +
  geom_line() + geom_hline(yintercept = 0.95)

So again we see the ragged appearance typical of coverage graphs for discrete random variables.

Using the Large Sample Theory of Maximum Likelihood Estimators

From our previous discussion we know (under some regularity conditions) that the mle has an (approximate) normal distribution. This can be used to derive confidence intervals:

Example (6.2.5)

Let’s say we have \(X_1,..,X_n \sim N(\mu,\sigma)\).

First we consider the case where \(\sigma\) is fixed but unknown. Let’s assume we have not done anything yet with this model, and we want to estimate \(\mu\) using the maximum likelihood estimator. Moreover, we will use Newton’s method to find it. So we need the first two derivatives of the log-likelihood function:

\[ \begin{aligned} &f(x\vert\mu,\sigma) =\frac1{\sqrt{2\pi\sigma^2}}\exp \left\{-\frac1{2\sigma^2}(x-\mu)^2 \right\} \\ &l(\mu,\sigma\vert\pmb{x}) = -\frac{n}2\log(2\pi) - n\log \sigma - \frac1{2\sigma^2}\sum(x_i-\mu)^2\\ &\frac{dl}{d\mu} = \frac1{\sigma^2}\sum(x_i-\mu)\\ &\frac{d^2l}{d\mu^2} = -\frac{n}{\sigma^2} \end{aligned} \] and so Newton’s method yields

\[\mu_{n+1}=\mu_n-\left(\frac1{\sigma^2}\sum(x_i-\mu_n)\right)/\left(-\frac{n}{\sigma^2}\right)=\mu_n+\frac1n\sum(x_i-\mu_n)=\bar{x}\]

so the iteration lands on the sample mean after a single step and stays there.

Now what can we say about the mle? Because of the large sample theorem for mle’s we know that

\[ \begin{aligned} &E\left[\frac{d^2\log f(X\vert\mu,\sigma)}{d\mu^2}\right] = E[-\frac1{\sigma^2}]=-\frac1{\sigma^2}\\ &v(\mu) = -\frac1{nE\left[\frac{d^2\log f(X\vert\mu,\sigma)}{d\mu^2}\right] } = \frac{\sigma^2}{n}\\ &\hat{\mu} \sim N\left(\mu, \sqrt{v(\mu)}\right) = N(\mu,\sigma/\sqrt{n}) \\ \end{aligned} \]

One problem: we don’t know \(\sigma\), so how can we find the variance of the mle? Notice, however, that

\[v(\mu) = \sigma^2/n = -\left(\frac{d^2 l}{d \mu^2}\right)^{-1}\]

and we already have \(\frac{d^2 l}{d \mu^2}\) from when we ran Newton’s method! So all we have to do is use the last value from the iterations and we immediately have an estimate of the variance!
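
Here is a small sketch of this idea (my own code; \(\sigma\) is treated as known for simplicity, with \(\sigma\) unknown one would iterate on both parameters and invert the full Hessian): run Newton’s method for \(\mu\) and use the last value of the second derivative to estimate the standard error of the mle:

newton.mu <- function(x, sigma=1, mu.start=0, eps=1e-10) {
  mu <- mu.start
  repeat {
    grad <- sum(x-mu)/sigma^2       # dl/dmu
    hess <- -length(x)/sigma^2      # d2l/dmu2
    mu.new <- mu - grad/hess        # Newton step
    if(abs(mu.new-mu) < eps) break
    mu <- mu.new
  }
  se <- sqrt(-1/hess)               # estimated sd of the mle from the last Hessian value
  c(mle=mu.new, se=se)
}
x <- rnorm(50, 5, 2)
newton.mu(x, sigma=2)   # mle equals mean(x), se equals 2/sqrt(50)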

Now we have

\[ \begin{aligned} &f_{\hat{\mu}}(x) =\frac{\sqrt{n}}{\sqrt{2\pi\sigma^2}}\exp \left\{-\frac{n}{2\sigma^2}(x-\mu)^2 \right\} = L(\mu\vert x) \\ &\log L(\mu\vert x) = K-\frac{n}{2\sigma^2}(\mu-x)^2 \\ &2\log L(\mu\vert \hat{\mu}) = K-\frac{n}{\sigma^2}(\mu-\hat{\mu})^2 \\ \end{aligned} \]

So if we plug the mle in for x, the 2 log-likelihood as a function of \(\mu\) is a quadratic. Let’s call this function \(\psi\):

\[\psi(\mu)= K-\frac{n(\mu-\hat{\mu})^2}{\sigma^2}\]

Let’s say we want to find a \((1-\alpha)100\%\) confidence interval. We already know one solution, from our previous discussion:

\[\left(\hat{\mu}-z_{\alpha/2}s/\sqrt{n},\hat{\mu}+z_{\alpha/2}s/\sqrt{n} \right)\]

What can we say about those point(s) on the 2 log-likelihood curve? We find

\[ \begin{aligned} &\psi(\hat{\mu})-\psi(\hat{\mu}-z_{\alpha/2}s/\sqrt{n}) = \\ &\left(K-\frac{n(\hat{\mu}-\hat{\mu})^2}{\sigma^2}\right)-\left(K-\frac{n(z_{\alpha/2}s/\sqrt{n})^2}{\sigma^2}\right) = z_{\alpha/2}^2\frac{s^2}{\sigma^2}\approx z_{\alpha/2}^2 \end{aligned} \] and so an (at this point admittedly very weird!) way to find the confidence interval is to find the points where the 2 log-likelihood curve drops by \(z_{\alpha/2}^2\) from its maximum!

Let’s do the graph:

alpha <- 0.95; n <- 100; mu0 <- 0; sigma0 <- 1   # here alpha is the confidence level
crit <- qnorm(1-(1-alpha)/2)
x <- rnorm(n, mu0, sigma0)
muhat <- mean(x)
shat <- sqrt(sum((x-muhat)^2)/n)
K <- 2*sum(log(dnorm(x, muhat, sigma0)))   # 2 log-likelihood at the mle
mu <- seq(-5*sigma0/sqrt(n), 5*sigma0/sqrt(n), length=500)
y <- 0*mu
for(i in 1:500) 
  y[i] <- 2*sum(log(dnorm(x, mu[i], sigma0)))
Hess <- (-n)/sigma0^2
v <- 1/sqrt(-Hess)
L <- (mu[1:250])[which.min(abs(y[1:250]-K+crit^2))]
U <- (mu[250:500])[which.min(abs(y[250:500]-K+crit^2))]
df <- data.frame(x=mu, y=y, yquad=K-(mu-muhat)^2/v^2)
ggplot(df) +
  geom_line(aes(x, y), linetype="solid") +
  geom_line(aes(x, yquad), 
            linetype = "dashed", color="lightblue") +
  geom_hline(yintercept = K-crit^2, size=1.2) +
  geom_vline(xintercept = c(L, U), size=1.2)

c(L, U)
## [1] -0.1212425  0.2695391

Of course here the 2 log-likelihood curve and the quadratic are exactly the same.


Let’s turn next to the case where we know \(\mu\) but want to find a confidence interval for \(\sigma\). Following along the same arguments as for the mean we find

\[ \begin{aligned} &f(x\vert\mu,\sigma) =\frac1{\sqrt{2\pi\sigma^2}}\exp \left\{-\frac1{2\sigma^2}(x-\mu)^2 \right\} \\ &l(\mu,\sigma\vert\pmb{x}) = -\frac{n}2\log(2\pi) - n\log \sigma - \frac1{2\sigma^2}\sum(x_i-\mu)^2\\ &\frac{dl}{d\sigma} = -\frac{n}{\sigma} + \frac1{\sigma^3}\sum(x_i-\mu)^2\\ &\frac{d^2l}{d\sigma^2} = \frac{n}{\sigma^2} - \frac3{\sigma^4}\sum(x_i-\mu)^2\\ \end{aligned} \] \[ \begin{aligned} &E\left[\frac{d^2\log f(X\vert\mu,\sigma)}{d\sigma^2}\right] = \\ &E\left[\frac{1}{\sigma^2} - \frac3{\sigma^4}(X-\mu)^2\right]=\\ &\frac{1}{\sigma^2} - \frac3{\sigma^4}E(X-\mu)^2 =\\ &\frac{1}{\sigma^2} - \frac3{\sigma^4}\sigma^2 =-\frac2{\sigma^2}\\ &v(\sigma) = -\frac1{nE\left[\frac{d^2\log f(X\vert\mu,\sigma)}{d\sigma^2}\right] } = \frac{\sigma^2}{2n}\\ &\hat{\sigma} \sim N\left(\sigma, \sqrt{v(\sigma)}\right) = N(\sigma,\sigma/\sqrt{2n}) \\ \end{aligned} \]

and so we find the (approximate) confidence interval

\[\left(\hat{\sigma}-z_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{2n}},\hat{\sigma}+z_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{2n}} \right)\]

Let’s illustrate this:

alpha <- 0.95; n <- 100; mu0 <- 0; sigma0 <- 1   # alpha is again the confidence level
crit <- qnorm(1-(1-alpha)/2)
x <- rnorm(n, mu0, sigma0)
shat <- sqrt(sum((x-mu0)^2)/n)   # mle of sigma when mu=mu0 is known
K <- 2*sum(log(dnorm(x, mu0, shat)))   # 2 log-likelihood at the mle
sig <- seq(sigma0*0.7, sigma0*1.25, length=500)
y <- 0*sig
for(i in 1:500) 
  y[i] <- 2*sum(log(dnorm(x, mu0, sig[i])))
Hess <- n/shat^2-3/shat^4*sum((x-mu0)^2)   # second derivative of the log-likelihood at shat
v <- 1/sqrt(-Hess)
L <- (sig[1:250])[which.min(abs(y[1:250]-K+crit^2))]
U <- (sig[250:500])[which.min(abs(y[250:500]-K+crit^2))]
df <- data.frame(x=sig, y=y, yquad=K-(sig-shat)^2/v^2)
ggplot(df) +
  geom_line(aes(x, y), linetype="solid", size=1.1) +
  geom_line(aes(x, yquad), 
            linetype = "dashed", color="lightblue",
            size=1.1) +
  geom_hline(yintercept = K-crit^2, size=1.2) +
  geom_vline(xintercept = c(L, U), size=1.2)

c(L, U)
## [1] 0.8212425 1.0835671

Now the parabola and the 2 log-likelihood curve are not exactly the same, because the true distribution of the mle is not (at this sample size) exactly normal. So we could use either the approximating parabola or the exact 2 log-likelihood curve to find the interval. Which one is better? That depends on which one has correct coverage, and the only way to tell is by running a simulation study.
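
Here is a sketch of such a study (my own code, under the setup above: \(\mu\) known, true \(\sigma=1\)), comparing the Wald (parabola) interval with the likelihood-drop interval:

sigma.coverage <- function(B=2000, n=100, mu0=0, sigma0=1, alpha=0.05) {
  crit <- qnorm(1-alpha/2)
  wald <- like <- logical(B)
  for(i in 1:B) {
    x <- rnorm(n, mu0, sigma0)
    shat <- sqrt(mean((x-mu0)^2))
    # Wald interval: shat +- z*shat/sqrt(2n)
    w <- shat + c(-1, 1)*crit*shat/sqrt(2*n)
    wald[i] <- w[1] < sigma0 & sigma0 < w[2]
    # likelihood interval: where the 2 log-likelihood drops by crit^2 from its maximum
    drop <- function(s) 
      2*sum(dnorm(x, mu0, s, log=TRUE)) - 
      2*sum(dnorm(x, mu0, shat, log=TRUE)) + crit^2
    L <- uniroot(drop, c(shat/2, shat))$root
    U <- uniroot(drop, c(shat, 2*shat))$root
    like[i] <- L < sigma0 & sigma0 < U
  }
  c(Wald=mean(wald), Likelihood=mean(like))
}
sigma.coverage()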

One advantage of the parabola solution is that we can find the interval endpoints explicitly:

\[\hat{\sigma} \pm z_{\alpha/2}v\]

where \(v=\hat{\sigma}/\sqrt{2n}\) is the estimated standard error.
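
Using the objects from the code chunk above (shat, crit and v) this is just:

shat + c(-1, 1)*crit*v   # Wald interval for sigma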

Example (6.2.6)

Let’s say we have \(X_1,..,X_n \sim \Gamma(\alpha,\beta)\), both \(\alpha\),\(\beta\) unknown. We want to find confidence intervals for both.

Now we could do the math, but let’s try to use R as much as possible. To find the mle we need a routine that optimizes a function of a vector and allows us to restrict the range of the variables (because \(\alpha,\beta>0\)); we will minimize \(-2\) times the log-likelihood. One such routine is nlminb.

mle.gamma <- function(dens="gamma", alpha=1, beta=1, n=100) {
  library(numDeriv)
  library(grid)   # for pushViewport/viewport below
  rdens <- get(paste0("r", dens))
  ddens <- get(paste0("d", dens))
  crit <- qnorm(1-0.05/2)   # z_{alpha/2} for 95% intervals
  dta <- rdens(n, alpha, beta)
  loglike <- function(x, dta)    
     (-2)*sum(log(ddens(dta, x[1], x[2])))
  z <- nlminb(c(alpha, beta), loglike,
              lower=c(0, 0), dta=dta)
  mle <- z$par
  K <- (-z$objective)
  Hess <- hessian(loglike, x=mle, dta=dta)/2  # loglike is -2*loglik, so divide by 2 for the observed information
  a <- seq(mle[1]*0.75, mle[1]*1.25, length=500)
  y <- 0*a
  for(i in 1:500) 
    y[i] <- (-loglike(c(a[i], mle[2]), dta))
  y1 <- y[a<mle[1]]
  alphaL <- a[a<mle[1]] 
  alphaL <- alphaL[abs(y1-K+crit^2)==min(abs(y1-K+crit^2))]
  y1 <- y[a>mle[1]]
  alphaR <- a[a>mle[1]] 
  alphaR <- alphaR[abs(y1-K+crit^2)==min(abs(y1-K+crit^2))]  
  v <- sqrt(solve(Hess)[1,1])
  alpha2 <- mle[1]+c(-1,1)*v*crit 
  df <- data.frame(x=a, y=y, yquad=K-(a-mle[1])^2/v^2)
  plt1 <- ggplot(df) +
          geom_line(aes(x, y)) +
          geom_line(aes(x, yquad), color="blue") +
          geom_hline(yintercept=K-crit^2, size=1.2) +
          geom_vline(xintercept = c(alphaL, alphaR)) +
          geom_vline(xintercept = alpha2, color="blue") +
          labs(x=expression(alpha), y="")
  b <- seq(mle[2]*0.75, mle[2]*1.25, length=500)
  y <- 0*b
  for(i in 1:500) y[i] <- (-loglike(c(mle[1], b[i]), dta))  
  v <- sqrt(solve(Hess)[2,2])
  beta2 <- mle[2] + c(-1, 1)*v*crit
  y1 <- y[b<mle[2]]
  betaL <- b[b<mle[2]] 
  betaL <- betaL[abs(y1-K+crit^2)==min(abs(y1-K+crit^2))]
  y1 <- y[b>mle[2]]
  betaR <- b[b>mle[2]] 
  betaR <- betaR[abs(y1-K+crit^2)==min(abs(y1-K+crit^2))]
  df <- data.frame(x=b, y=y, yquad=K-(b-mle[2])^2/v^2)
  plt2 <- ggplot(df) +
          geom_line(aes(x, y)) +
          geom_line(aes(x, yquad), color="blue") +
          geom_hline(yintercept=K-crit^2, size=1.2) +
          geom_vline(xintercept = c(betaL, betaR)) +
          geom_vline(xintercept = beta2, color="blue") +
          labs(x=expression(beta), y="")
  pushViewport(viewport(layout = grid.layout(1, 2)))
  print( plt1,
    vp=viewport(layout.pos.row=1, layout.pos.col=1))
  print( plt2,
    vp=viewport(layout.pos.row=1, layout.pos.col=2))        
  
}
mle.gamma()

How about the parabola solution? For this we need the Hessian matrix, which we can also find numerically, using the hessian function in the numDeriv package.

Also new: we now have a two-parameter problem, so

\[v_i = \sqrt{H^{-1}_{i,i}}\]

where \(H\) is the Hessian of the negative log-likelihood at the mle (the observed information matrix).
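
In R, assuming an object Hess holding this matrix (as computed inside mle.gamma), both standard errors come from the diagonal of its inverse at once:

v <- sqrt(diag(solve(Hess)))   # v[1]: standard error of alpha-hat, v[2]: of beta-hat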

Example (6.2.7)

Let’s say we have \(X_1,..,X_n \sim Beta(\alpha,\beta)\), both \(\alpha,\beta\) unknown. We want to find confidence intervals for both.

Actually, the routine mle.gamma works just as it is; we only need to call it with

mle.gamma("beta", alpha=2, beta=3)

In some ways what we have here is a fully automatic confidence interval calculator! Of course it is based on a large sample theorem, so how well it works (that is, whether the resulting intervals have correct coverage) needs to be checked in each case.
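
Here is a sketch of such a check (my own code, using the same setup as mle.gamma): the estimated coverage of the Wald-type interval for the shape parameter \(\alpha\) of the gamma.

library(numDeriv)
gamma.coverage <- function(B=1000, alpha=1, beta=1, n=100) {
  crit <- qnorm(1-0.05/2)
  hit <- logical(B)
  for(i in 1:B) {
    dta <- rgamma(n, alpha, beta)
    loglike <- function(x, dta) (-2)*sum(log(dgamma(dta, x[1], x[2])))
    fit <- nlminb(c(alpha, beta), loglike, lower=c(0, 0), dta=dta)
    H <- hessian(loglike, x=fit$par, dta=dta)/2   # observed information
    v <- sqrt(solve(H)[1, 1])
    ci <- fit$par[1] + c(-1, 1)*crit*v            # Wald interval for alpha
    hit[i] <- ci[1] < alpha & alpha < ci[2]
  }
  mean(hit)
}
gamma.coverage()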

One-sided Confidence Intervals

Sometimes one is interested in an upper or a lower bound for a parameter. What we need (for an upper bound) is a function \(U(\pmb{x})\) such that \(P(\theta<U(\pmb{X}))=1-\alpha\). We can again derive such a bound by inverting a hypothesis test, this time a test with a one-sided alternative (> or <).

Example (6.2.8)

Say we have observations \(X_1,..,X_n\sim N(\mu, 1)\). We want a \(90\%\) upper bound for \(\mu\).

We will derive the LRT test for

\[H_0:\mu\ge\mu_0\text{ vs }H_a:\mu<\mu_0\]

We already know that the unrestricted mle is \(\bar{x}\). Under the null hypothesis the maximum is again at \(\bar{x}\) if that value is allowed, that is if \(\bar{x}\ge\mu_0\); otherwise the maximum is at \(\mu_0\). Of course \(L(\bar{x})/L(\bar{x})=1\), so

\[ LRT(\pmb{x})=\frac{L(\hat{\hat{\mu}})}{L(\hat{\mu})}= \left\{ \begin{array}{cc} \frac{L(\mu_0)}{L(\bar{x})}&\text{ if }\bar{x}< \mu_0\\ 1 & \text{if }\bar{x}\ge\mu_0\\ \end{array}\right. \]

Here is an example of what this might look like:

So clearly “LRT is small” means “\(\bar{x}\) is small”, that is we reject the null hypothesis for small values of \(\bar{x}\). Now

\[ \begin{aligned} &\alpha =P(\bar{X}<c)= \\ &P(\sqrt n(\bar{X}-\mu_0) < \sqrt n(c-\mu_0)) = \Phi(\sqrt n(c-\mu_0))\\ &\sqrt n(c-\mu_0) =-z_\alpha \\ &c=\mu_0-\frac{1}{\sqrt n}z_\alpha\\ &1-\alpha =P(\bar{X}\ge\mu_0-\frac{1}{\sqrt n}z_\alpha)= \\ &P(\mu_0\le\bar{X}+\frac{1}{\sqrt n}z_\alpha) \\ \end{aligned} \]

and so a \(100(1-\alpha)\%\) upper confidence bound for \(\mu\) is \(U(\pmb{x})=\bar{x}+\frac{z_\alpha}{\sqrt n}\). For a 90% bound \(z_{0.1}=1.28\).
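
A quick numerical illustration (simulated data, \(\sigma=1\) known, n=25):

x <- rnorm(25, 10, 1)              # simulated sample, true mu=10
mean(x) + qnorm(1-0.10)/sqrt(25)   # 90% upper confidence bound for mu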