A Longer Example - Intervals

\(X_1, ..., X_n\sim Beta(a,1)\), and we want to find a \((1-\alpha)100\%\) confidence interval for \(a\).

Frequentist Intervals

Let’s again first consider the case \(n=1\). Previously we had the following hypothesis test for

\[H_0:a=a_0\text{ vs }H_a: a \ne a_0\]

reject the null hypothesis if

\(x<(\alpha/2)^{1/a_0}\) or \(x>(1-\alpha/2)^{1/a_0}\)

therefore, setting the observed \(x\) equal to a critical value and solving for \(a\) (shown here for the upper critical value), we find

\[x=(1-\alpha/2)^{1/a}\\\log x=\frac1{a}\log(1-\alpha/2)\\ a=\frac{\log(1-\alpha/2)}{\log x}\]

and so we have the interval

\[\left(\frac{\log(1-\alpha/2)}{\log x},\frac{\log(\alpha/2)}{\log x}\right)\]

x <- 0.45; alpha <- 0.05
# equal-tail 95% interval for a based on the single observation x
round(c(log(1-alpha/2), log(alpha/2))/log(x), 3)
## [1] 0.032 4.620
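
As a quick sanity check, here is a small simulation (a sketch; a=1 and B are just illustration values) confirming that this interval has the nominal 95% coverage:

B <- 1e4; a <- 1
xsim <- rbeta(B, a, 1)
Lo <- log(1-alpha/2)/log(xsim)  # lower limits
Up <- log(alpha/2)/log(xsim)    # upper limits
mean(Lo < a & a < Up)           # should be close to 0.95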

Here we split \(\alpha\) 50-50 between the left and the right. Is this optimal? Let’s put \(\lambda\alpha\) on the left and \((1- \lambda)\alpha\) on the right for some \(0 \le \lambda \le 1\). Then

\[ \begin{aligned} &\lambda \alpha=P(X<c_1)=c_1^{a}\text{, so } c_1=(\lambda\alpha)^{1/a}\\ &\text{setting } c_1=x:\; a=\log(\lambda \alpha)/\log x\\ &(1-\lambda) \alpha=P(X>c_2) = 1-P(X<c_2)=1-c_2^{a}\\ &c_2=\left[ 1-(1-\lambda)\alpha\right]^{1/a}\\ &\text{setting } c_2=x:\; a=\log\left(1-(1-\lambda)\alpha\right)/\log x\\ \end{aligned} \]

so the interval is of the form

\[ \left(\log\left(1-(1-\lambda)\alpha\right), \log(\lambda \alpha) \right)/\log x \]

Say we want to find the shortest interval. Let’s draw the length of the interval as a function of \(\lambda\):

# length of the interval as a function of lambda
# (up to the constant factor 1/|log(x)|, which does not involve lambda)
fun <- function(lambda, alpha=0.05)
  log(1-(1-lambda)*alpha)-log(lambda*alpha)
ggcurve(fun=fun, A=0, B=1)

The length is strictly decreasing in \(\lambda\), so it is minimized at \(\lambda=1\), and we find the optimal (shortest) interval to be

\[ \left(0, \log \alpha \right)/\log x \]

and now if we observe X=0.45, a 95% CI for \(a\) is

round(c(0, log(alpha))/log(x), 3)
## [1] 0.000 3.752

Note that these intervals always have lower bound 0. Since \(a>0\) always holds, a lower bound of 0 carries no information, which might not be a good choice for some experiments.

How about the case n=2?

We have qbeta2(\(\alpha/2, a\)) = x, which is equivalent to pbeta2(x, a) = \(\alpha/2\) (and likewise for \(1-\alpha/2\)), and we need to solve these equations for \(a\). Again this has to be done numerically:

invbeta2 <- function (x, alpha=0.05)  {
  x <- sum(x)
  a <- 0
  # pbeta2(x, a) is decreasing in a, so step a up in small increments
  # until the cdf drops below each target level
  repeat {
    a <- a+0.001
    if(pbeta2(x, a)<1-alpha/2)
      break
  }
  L <- a  # lower limit: pbeta2(x, a) = 1-alpha/2
  repeat {
    a <- a+0.001
    if(pbeta2(x, a)<alpha/2)
      break
  }  
  c(L, a)  # second element is the upper limit: pbeta2(x, a) = alpha/2
}
invbeta2(c(0.31, 0.59))
## [1] 0.078 3.043

For example, say we observe (x, y) = (0.31, 0.59); note that (x+y)/2 = 0.45, the same as above with n=1. Then we find a 95% CI for \(a\) to be (0.078, 3.043).


Previously we also used simulation to estimate pbeta2(x,a). If we could not calculate it numerically we could use that here as well, but clearly the resulting routines would take rather a long time to run.
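
To illustrate, here is a minimal sketch of such a simulation-based version (pbeta2.sim is a hypothetical name; it assumes pbeta2(x, a) is the cdf of the sum of two independent Beta(a,1) random variables):

pbeta2.sim <- function(x, a, B=1e4) {
  # Monte Carlo estimate of P(X+Y <= x) for X, Y iid Beta(a, 1)
  u <- rbeta(B, a, 1)
  v <- rbeta(B, a, 1)
  mean(u+v <= x)
}
pbeta2.sim(0.9, 1)

Every 0.001 step of the grid search in invbeta2 would then cost B random draws, which is where the long run times come from.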


Finally, for the case of large \(n\) we had the Wald test: reject \(H_0\) if

\[\bar{x}<\frac{1}{a_0+1}\left(a_0- z_{\alpha/2}\sqrt{\frac{a_0}{n(a_0+2)}}\right)\]

or

\[\bar{x}>\frac{1}{a_0+1}\left(a_0+ z_{\alpha/2}\sqrt{\frac{a_0}{n(a_0+2)}}\right)\]

so the acceptance region of the test is

\[\frac{1}{a_0+1}\left(a_0- z_{\alpha/2}\sqrt{\frac{a_0}{n(a_0+2)}}\right)<\bar{x}<\frac{1}{a_0+1}\left(a_0+ z_{\alpha/2}\sqrt{\frac{a_0}{n(a_0+2)}}\right)\]

To get a confidence interval we need to “invert the test”. This means solving the double inequality in the acceptance region for \(a\) (which now takes the place of \(a_0\)). The next graph shows the left and right sides of the double inequality as functions of \(a\):

n <- 50; xbar <- 0.45; alpha <- 0.05
crit <- qnorm(1-alpha/2)
a <- seq(0, 2, length=500)
y1 <- (a-sqrt(a/(n*(a+2)))*crit)/(a+1)  # left side of the double inequality
y2 <- (a+sqrt(a/(n*(a+2)))*crit)/(a+1)  # right side
R <- a[which.min(abs(y1-xbar))]  # upper limit: left side crosses xbar
L <- a[which.min(abs(y2-xbar))]  # lower limit: right side crosses xbar
df <- data.frame(a=c(a, a), y=c(y1, y2),
        which=rep(c("L", "U"), each=500))
ggplot(df, aes(a, y, color=which)) + 
  geom_line() +
  geom_hline(yintercept = xbar) +
  geom_vline(xintercept = c(L, R))

round(c(L, R), 3)
## [1] 0.581 1.118

Formally we need to solve the equation

\[x=\frac{1}{a+1}\left(a\pm z\sqrt{\frac{a}{n(a+2)}}\right)\]

doing the arithmetic (multiplying by \(a+1\), isolating the square root, squaring, and then multiplying by \(n(a+2)\)) yields the cubic equation

\[n(x-1)^2a^3+2n(2x-1)(x-1)a^2+[nx(5x-4)-z^2]a+2nx^2=0\]

which we can solve numerically. In R this is done by the routine polyroot, which expects the coefficients in order of increasing powers (hence the cf[4:1] below):

z <- qnorm(alpha/2)
cf <- c(n*(xbar-1)^2, 2*n*(2*xbar-1)*(xbar-1), 
        n*xbar*(5*xbar-4)-z^2, 2*n*xbar^2)
round(Re(polyroot(cf[4:1])), 3)
## [1]  0.579 -2.063  1.120

Of course there are generally three roots, but one of them is negative and the other two are our solutions.
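
Putting this together (a sketch; wald.ci is a hypothetical name), we can keep just the positive real roots:

wald.ci <- function(xbar, n, alpha=0.05) {
  z <- qnorm(alpha/2)  # z enters only via z^2, so the sign does not matter
  cf <- c(n*(xbar-1)^2, 2*n*(2*xbar-1)*(xbar-1), 
          n*xbar*(5*xbar-4)-z^2, 2*n*xbar^2)
  r <- Re(polyroot(cf[4:1]))  # polyroot wants increasing powers
  sort(r[r>0])  # drop the negative root
}
wald.ci(0.45, 50)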


How about the mle? This one is easy: recall that \(T=-\sum \log x_i\) and that \(aT\sim\Gamma(n,1)\), so

\[ \begin{aligned} &1-\alpha= \\ &P\left(qgamma(\alpha/2,n,1)<aT<qgamma(1-\alpha/2,n,1)\right) = \\ &P\left(qgamma(\alpha/2,n,1)/T<a<qgamma(1-\alpha/2,n,1)/T\right) \end{aligned} \]
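
In R this takes just a few lines (a sketch with made-up data; the same computation appears as lrt.test below):

x <- rbeta(20, 1.5, 1)  # made-up data with a = 1.5
Ts <- -sum(log(x)); n <- length(x)
round(qgamma(c(alpha/2, 1-alpha/2), n, 1)/Ts, 3)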

Bayesian Interval

Previously we saw that if we use a prior Exp(1) we get a posterior distribution

\[a\vert\pmb{x} \sim \Gamma(n+1, 1/(T+1))\]

using this we can find the equal-tail probability interval by solving

\[\alpha/2=P(a<l|\pmb{x})=pgamma(l, n+1, 1/(T+1))\]

so we have

\[l=qgamma(\alpha/2, n+1, 1/(T+1))\\ u=qgamma(1-\alpha/2, n+1, 1/(T+1))\]

which is (almost) the same as the confidence interval based on the LRT test.
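
As a quick sketch in R (bayes.ci is a hypothetical name; note that the third argument of R’s qgamma is a rate, so the posterior scale \(1/(T+1)\) corresponds to rate \(T+1\)):

bayes.ci <- function(x, alpha=0.05) {
  Ts <- -sum(log(x)); n <- length(x)
  # posterior a|x ~ Gamma(n+1, scale 1/(Ts+1)), that is rate Ts+1
  qgamma(c(alpha/2, 1-alpha/2), n+1, rate=Ts+1)
}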

Which Interval is Better?

First of all, frequentist confidence intervals and Bayesian credible intervals cannot really be compared directly, so we will just compare the two confidence intervals.

First we need to check that the two methods yield proper confidence intervals, that is, that they have coverage: if we calculate a 90% interval, it should contain the true parameter 90% of the time. This is true for the LRT interval by construction, because we could find the exact distribution of the LRT statistic and invert the test analytically. The Wald interval, however, is based on a large-sample test and needs to be checked via simulation. Here are some results:

# Wald interval: invert the large-sample test via the grid search from above
wald.test <- function(x,  alpha=0.05 ) {
  crit <- qnorm(1-alpha/2)
  a <- seq(0, 5, length=500)
  n <- length(x)
  xbar <- mean(x)
  y1 <- (a-sqrt(a/(n*(a+2)))*crit)/(a+1)
  y2 <- (a+sqrt(a/(n*(a+2)))*crit)/(a+1)
  R <- a[which.min(abs(y1-xbar))]
  L <- a[which.min(abs(y2-xbar))]
  round(c(L, R), 3)
}
# LRT interval: exact, based on aT ~ Gamma(n, 1)
lrt.test <- function(x,  alpha=0.05 ) {
  Ts=-sum(log(x))
  n=length(x)
  round(c(qgamma(alpha/2, n, 1)/Ts, qgamma(1-alpha/2, n, 1)/Ts), 3) 
}  
# estimate the coverage of the Wald interval by simulation
coverage <- function(n, a, B=1e4) {
  A=rep(0,B)
  for(i in 1:B) {
    x=rbeta(n, a, 1)
    tmp=wald.test(x)      
    if(tmp[1]<a & a<tmp[2]) A[i]=1  # does the interval cover the true a?
  }
  sum(A)/B
}
a=seq(0.2, 2, length=25)
out=a
for(i in seq_along(a))
  out[i]=coverage(10, a[i])
rbind(a,out)
##      [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9]  [,10]  [,11]
## a   0.200 0.2750 0.3500 0.4250 0.5000 0.5750 0.6500 0.7250 0.8000 0.8750 0.9500
## out 0.957 0.9568 0.9532 0.9523 0.9511 0.9505 0.9515 0.9528 0.9497 0.9499 0.9503
##      [,12]  [,13]  [,14]  [,15]  [,16]  [,17]  [,18]  [,19]  [,20]  [,21]
## a   1.0250 1.1000 1.1750 1.2500 1.3250 1.4000 1.4750 1.5500 1.6250 1.7000
## out 0.9501 0.9522 0.9503 0.9552 0.9489 0.9535 0.9539 0.9532 0.9542 0.9495
##      [,22]  [,23]  [,24]  [,25]
## a   1.7750 1.8500 1.9250 2.0000
## out 0.9554 0.9501 0.9538 0.9497
df=data.frame(a=c(a,a),
              Coverage=c(100*out, rep(95,length(out))),  # LRT coverage is exact
              Method=rep(c("Wald", "LRT"), each=length(out)))
ggplot(data=df, aes(a, Coverage, color=Method)) +
  geom_point() +
  geom_hline(yintercept = 95) +
  lims(y=c(93,97))

So we see that the Wald method has some slight over-coverage, mostly for small values of \(a\). Over-coverage is acceptable but not really desirable, because it generally means longer intervals.

How can we choose between the two? We need a measure of performance for a confidence interval. Again we can consider the mean length of the interval E[U(X)-L(X)]. In the case of the Wald (method of moments) intervals this has to be found via simulation. For the LRT method we have

\[ \begin{aligned} &E\left[U(\pmb{x})-L(\pmb{x})\right] = \\ &\left[qgamma(1-\alpha/2,n,1)-qgamma(\alpha/2,n,1)\right]E[1/T] = \\ &\left[qgamma(1-\alpha/2,n,1)-qgamma(\alpha/2,n,1)\right]\frac1n E[n/T] = \\ &\left[qgamma(1-\alpha/2,n,1)-qgamma(\alpha/2,n,1)\right]\frac1n \frac{na}{n-1}=\\ &\left[qgamma(1-\alpha/2,n,1)-qgamma(\alpha/2,n,1)\right]\frac{a}{n-1} \end{aligned} \]

where we used that \(T\sim\Gamma(n,1/a)\), and so \(E[1/T]=a/(n-1)\).

Here is a graph of the mean lengths:

expected.length <- function(n, a, B=1e5) {
  A=rep(0,B)
  for(i in 1:B) {
    x=rbeta(n, a, 1)
    A[i]=diff(wald.test(x))  # length of the Wald interval
  }
  mean(A)
}
a=seq(0.2, 2, length=25)
out=a
for(i in seq_along(a))
  out[i]=expected.length(10, a[i])
el.lrt=(qgamma(0.975,10,1)-qgamma(0.025,10,1))*a/9
df=data.frame(a=c(a,a),
              Exp.Length=c(out, el.lrt),
              Method=rep(c("Wald", "LRT"), each=length(out)))
ggplot(data=df, aes(a, Exp.Length, color=Method)) +
  geom_point() 

and there does not seem to be a big difference.

Above we used the fact that \(aT\) has a distribution that does not depend on the parameter \(a\). We then found the interval by dividing the error probability \(\alpha\) equally between the left and the right. But is that the best option? If we are interested in shortest-length intervals, can we find intervals that are optimal? That is, can we find L(T) and U(T) such that we

minimize \(E[U(T)-L(T)]\) subject to \(P(L(T)<a<U(T))=1-\alpha\)

Let’s set \(q(z)=qgamma(z,n,1)\) and consider intervals of the form \(L(T)=q(z_1)/T\) and \(U(T)=q(1-z_2)/T\).

Clearly we need \(z_1+z_2=\alpha\), so we have \(z_2=\alpha-z_1\), or we just write

\(L(T)=q(z)/T\) and \(U(T)=q(1-\alpha+z)/T\)

This has expected length

\[E\left[U(\pmb{x})-L(\pmb{x})\right] = \left[q(1-\alpha+z)-q(z)\right]\frac{a}{n-1}\]

Minimizing this over \(z\) cannot be done analytically, so again we need to resort to a numerical solution. It turns out that assigning a smaller part of \(\alpha\) to the left side leads to shorter intervals.
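
Here is a minimal sketch using optimize(), with n=2 and \(\alpha=0.05\) as illustration values (the factor \(a/(n-1)\) does not depend on \(z\), so it can be dropped from the minimization):

n <- 2; alpha <- 0.05
# interval length as a function of z, without the constant factor a/(n-1)
len <- function(z) qgamma(1-alpha+z, n, 1) - qgamma(z, n, 1)
opt <- optimize(len, c(0, alpha))
c(opt$minimum, opt$objective)  # optimal z (much smaller than alpha/2) and length

Comparing opt$objective with len(alpha/2), each multiplied by \(a/(n-1)\), gives the mean lengths quoted below.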

How much better are these intervals? For example, for n=2 (and a=1) the mean length of the equal-tail intervals is 5.33, but for the “optimal” length intervals it is 4.72, an improvement of about 10%.