A Longer Example - Testing

Say \(X_1, ..., X_n\sim Beta(a,1)\), so \(f(x)=ax^{a-1}\) and \(F(x)=x^{a}\) for \(0<x<1\).

We want to test

\[H_0:a=a_0\text{ vs }H_1: a \ne a_0\]

Case n=1

We already know that E[X]=a/(a+1), so \(x \approx a/(a+1)\) or \(a \approx x/(1-x)\). So a test could be based on the rejection region

\[ x/(1-x) < c_1 \text{ or } x/(1-x)>c_2 \] But the function \(x/(1-x)\) is monotonically increasing on \([0, 1)\), so this rejection region is equivalent to one of the form

\[ x < c_1 \text{ or } x>c_2 \]

Now

\[ \begin{aligned} &\alpha/2 = P(X<c_1) = c_1^{a_0}\\ &c_1 = (\alpha/2)^{1/a_0}\\ &\alpha/2 = P(X>c_2) = 1-c_2^{a_0}\\ &c_2 = (1-\alpha/2)^{1/a_0}\\ \end{aligned} \]

For example, if we wish to test \(a_0=0.5\) at the 5% level we reject the null hypothesis if \(x\) falls below the first or above the second of these critical values:

a0 <- 0.5; alpha <- 0.05
round(c(alpha/2,1-alpha/2)^(1/a0), 5)
## [1] 0.00063 0.95062

Case n=2

Say we observe \(x\) and \(y\). Again we might try to use the fact that \(E[(X+Y)/2] = a/(1+a)\), but eventually we would need to find the density of \(X+Y\). Using the convolution formula, for \(0<t<1\) this would mean finding the integral

\[\int_0^t[x(t-x)]^{a-1}dx\]

and this integral does not exist analytically. What can we do? One idea is to find the integral numerically:

# density of X+Y, found by numerically integrating the convolution formula
dbeta2 <- function(x, a=1) {
  f <- function(x, t) {(x*(t-x))^(a-1)}
  y <- 0*x
  for(i in 1:length(x)) {
    if(x[i] < 1) y[i] <- integrate(f, 0, x[i], t=x[i])$value
    else y[i] <- integrate(f, x[i]-1, 1, t=x[i])$value
  } 
  a^2*y
}
# cdf of X+Y, again by numerical integration
pbeta2 <- function(x, a=1) {
  y <- x
  for(i in 1:length(x)) 
    y[i] <- integrate(dbeta2, 0, x[i], a=a)$value
  y 
}
# quantile function of X+Y, found by bisection on [0, 2]
qbeta2 <- function(p, a=1) {
  pbeta2 <- function(x) {integrate(dbeta2, 0, x, a=a)$value}
  low <- 0
  high <- 2
  repeat {
    mid <- (low+high)/2
    fmid <- pbeta2(mid)
    if(abs(fmid-p) < 0.001) break
    if(fmid < p) low <- mid
    else high <- mid
  }
  mid 
}
round(c(qbeta2(0.025, 1/2), qbeta2(0.975, 1/2)), 3)
## [1] 0.031 1.594

Alternatively we can use simulation to find the null distribution:

B <- 1e4
xy <- matrix(rbeta(2*B, 0.5, 1), ncol=2)
z <- apply(xy, 1, sum)
round(quantile(z,c(0.025,0.975)), 3)
##  2.5% 97.5% 
## 0.034 1.565

This of course works equally well for n=3, 4, …, whereas the numerical solution quickly becomes much harder.
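For instance, a minimal sketch for n=3, done by simulation just as above (same \(a_0=0.5\); the critical values are the 2.5% and 97.5% quantiles of the simulated null distribution of the sum):

B <- 1e4
# null distribution of X1+X2+X3 under a0=0.5, by simulation
xyz <- matrix(rbeta(3*B, 0.5, 1), ncol=3)
z <- apply(xyz, 1, sum)
round(quantile(z, c(0.025, 0.975)), 3)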

Case large n

From the CLT we know that

\[\sqrt{n}\frac{\bar{x}-\mu_0}{\sigma_0}\rightarrow N(0,1)\] where

\[\mu_0=\frac{a_0}{a_0+1}\]

and

\[\sigma_0^2=\frac{a_0}{(a_0+1)^2(a_0+2)}\] therefore

\[cv=\mu_0\pm z_{\alpha/2}\sigma_0/\sqrt{n}=\frac{1}{a_0+1}\left(a_0\pm z_{\alpha/2}\sqrt{\frac{a_0}{n(a_0+2)}}\right)\] So we have the following test: reject \(H_0\) if \(\bar{x} <cv_1\) or \(\bar{x} >cv_2\).
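As a quick illustration, here is a sketch of these critical values for (say) \(a_0=0.5\), \(n=50\) and \(\alpha=0.05\):

a0 <- 0.5; n <- 50; alpha <- 0.05
mu0 <- a0/(a0+1)
sigma0 <- sqrt(a0/(a0+1)^2/(a0+2))
# cv = mu0 -/+ z_{alpha/2}*sigma0/sqrt(n)
cv <- mu0 + qnorm(c(alpha/2, 1-alpha/2))*sigma0/sqrt(n)
round(cv, 4)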

One-sided tests

Above we tested

\[H_0:a=a_0\text{ vs }H_1: a \ne a_0\]

Often we want to test instead alternatives of the form

\[H_0:a=a_0\text{ vs }H_1: a < a_0\]

or

\[H_0:a=a_0\text{ vs }H_1: a > a_0\]

In that case we choose the appropriate one-sided rejection region, with \(\alpha\) instead of \(\alpha/2\).
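For example, for \(H_1: a<a_0\) small values of \(\bar{x}\) speak against \(H_0\), so a sketch of the critical value for the test based on the sample mean (same \(a_0\), \(n\) and \(\alpha\) as above) is:

a0 <- 0.5; n <- 50; alpha <- 0.05
mu0 <- a0/(a0+1)
sigma0 <- sqrt(a0/(a0+1)^2/(a0+2))
# one-sided test: reject H0 if xbar < cv.low
cv.low <- mu0 + qnorm(alpha)*sigma0/sqrt(n)
round(cv.low, 4)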

p-value

If we want to quote the p-value of this test, we calculate it as follows: say we observed \(\bar{x} =t\) in our experiment. The p-value is the probability of repeating the experiment and observing a value of the test statistic at least as “unlikely” (under the null hypothesis) as the one seen in the original experiment.

Say we observed \(t>\mu_0\) and let \(\bar{y}\) be the sample mean of the new experiment; then

\[p=2P(\bar{Y}>t)=2\left(1-\Phi\left(\sqrt{n}\frac{t-\mu_0}{\sigma_0}\right)\right)\] The “2” is there because we are doing a two-sided test.
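A sketch of this calculation in R, using a purely hypothetical observed mean of t=0.38 with \(a_0=0.5\) and \(n=50\):

a0 <- 0.5; n <- 50
t <- 0.38   # hypothetical observed value of the sample mean
mu0 <- a0/(a0+1)
sigma0 <- sqrt(a0/(a0+1)^2/(a0+2))
# two-sided p value (here t > mu0, so the upper tail is the relevant one)
2*(1 - pnorm(sqrt(n)*(t - mu0)/sigma0))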

Likelihood ratio test

We saw before that the MLE is given by

\[\hat{a}_2=n/T\] where \(T=-\sum \log x_i\). So using the results of section 4.3 we have

\[ \begin{aligned} &-2\log \lambda({\pmb{x}}) = \\ &2\left[l(\hat{a}_2)-l(a_0)\right]=\\ &2\left[n\log\hat{a}_2 +(\hat{a}_2-1)\sum_{i=1}^n\log x_i-n\log a_0 -(a_0-1)\sum_{i=1}^n\log x_i\right] = \\ &2\left[n\log \frac{\hat{a}_2}{a_0}+(\hat{a}_2-a_0)\sum_{i=1}^n\log x_i\right] = \\ &2\left[n\log \frac{\hat{a}_2}{a_0}+(\hat{a}_2-a_0)(-n/\hat{a}_2)\right] = \\ &2n\left[\log \frac{\hat{a}_2}{a_0}+\frac{a_0}{\hat{a}_2}-1\right] \end{aligned} \]

and we reject \(H_0\) if

\[-2 \log \lambda(\pmb{x})>qchisq(1-\alpha,1)\]
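A minimal sketch of carrying out this test, with a hypothetical sample generated just for illustration:

a0 <- 0.5; n <- 50; alpha <- 0.05
x <- rbeta(n, 0.7, 1)   # hypothetical data, true a=0.7
T <- -sum(log(x))
ahat <- n/T             # mle of a
lrt <- 2*n*(log(ahat/a0) + a0/ahat - 1)
lrt > qchisq(1-alpha, 1)   # TRUE means reject H0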

One problem with both these methods is that they are large-sample methods: they rely on the CLT. Can we derive a method that also works for small samples? The basic idea of the LRT is to reject the null hypothesis if \(\lambda(\pmb{x})\) is small, which is the same as \(-2 \log \lambda(\pmb{x})\) being large. Now consider this:

\[ \begin{aligned} &h(a) = \log \frac{a}{a_0}+\frac{a_0}{a}-1\\ &\frac{dh}{da} = \frac1{a}-\frac{a_0}{a^2}=\frac{a-a_0}{a^2}>0 \end{aligned} \]

if \(a>a_0\), and \(\frac{dh}{da}<0\) if \(a<a_0\). So \(h\) is decreasing on \((0,a_0)\) and increasing on \((a_0, \infty)\), and therefore \(h\) is large for small or large values of \(a\).

Now \(\hat{a}_2=n/T\), and so \(\hat{a}_2\) is small or large if and only if \(T\) is large or small.

If \(H_0\) is true \(T \sim \Gamma(n,1/a_0)\) and if we use \(\alpha/2\) on the left and the right we find

\[\alpha/2=P(T<x)=pgamma(x,n,a_0)\]

\[x=qgamma(\alpha/2,n,a_0)\]

Similarly \(\alpha/2=P(T>y)\) yields \(y=qgamma(1-\alpha/2,n,a_0)\), so we reject \(H_0\) if \(T<qgamma(\alpha/2,n,a_0)\) or \(T>qgamma(1-\alpha/2,n,a_0)\).

Note

\[ \begin{aligned} &pgamma(x,n,a) =\\ &\int_0^x \frac{a^n}{(n-1)!}t^{n-1}e^{-at}dt= \\ &\int_0^{ax} \frac{1}{(n-1)!}y^{n-1}e^{-y}dy= \\ &pgamma(ax,n,1) \end{aligned} \] and so

\[ \begin{aligned} &qgamma(y,n,a) =x\\ &y = pgamma(x,n,a)=pgamma(ax,n,1) \\ &qgamma(y,n,1) =ax \\ &qgamma(y,n,a) = qgamma(y,n,1)/a \end{aligned} \] so we reject \(H_0\) if

\(a_0T<qgamma(\alpha/2,n,1)\) or \(a_0T>qgamma(1-\alpha/2,n,1)\)
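A sketch of applying this exact test to the same kind of hypothetical sample as above:

a0 <- 0.5; n <- 50; alpha <- 0.05
x <- rbeta(n, 0.7, 1)   # hypothetical data, true a=0.7
T <- -sum(log(x))
cv <- qgamma(c(alpha/2, 1-alpha/2), n, 1)
a0*T < cv[1] | a0*T > cv[2]   # TRUE means reject H0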

One-sided tests

Say we want to test \(H_1:a<a_0\). Small values of \(a\) make \(T=-\sum \log x_i\) stochastically large, so we then have the rejection region \(a_0T>qgamma(1-\alpha,n,1)\).

p-value

\[ p=2P(T>t|a=a_0) =2\left(1-pgamma(t,n,a_0)\right) \] if \(t\) is in the upper tail of the null distribution (otherwise use \(2P(T<t|a=a_0)\)).

Bayesian analysis

Let’s again use the prior \(a\sim Exp(1)\); then we know that \(a|\pmb{x} \sim \Gamma(n+1,T+1)\). A test could be designed as follows: reject \(H_0\) if

\(a_0<qgamma(\alpha/2,n+1,T+1)\) or \(a_0>qgamma(1-\alpha/2,n+1,T+1)\)

But from the above we know that

\[a_0 < qgamma(\alpha/2,n+1,T+1)=qgamma(\alpha/2,n+1,1)/(T+1)\]

iff

\[a_0(T+1)<qgamma(\alpha/2,n+1,1)\]

and we see that this is essentially the same as the test based on the MLE, except with \(n+1\) instead of \(n\) and \(T+1\) instead of \(T\).
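A sketch of the Bayesian test in R, again with a hypothetical sample:

a0 <- 0.5; n <- 50; alpha <- 0.05
x <- rbeta(n, 0.7, 1)   # hypothetical data, true a=0.7
T <- -sum(log(x))
# central 95% credible interval for a
ci <- qgamma(c(alpha/2, 1-alpha/2), n+1, T+1)
a0 < ci[1] | a0 > ci[2]   # TRUE means reject H0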

Power

Let’s go back to the two tests based on the sample mean and the likelihood ratio. Which of these is better? That depends on the power of the tests. First we have

  • Wald test

\[ \begin{aligned} &x = \mu_0+\frac{\sigma_0}{\sqrt{n}}\Phi^{-1}(\alpha/2)\\ &y = \mu_0+\frac{\sigma_0}{\sqrt{n}}\Phi^{-1}(1-\alpha/2)\\ &\mu_1 =\frac{a_1}{a_1+1} \\ &\sigma_1^2 = \frac{a_1}{(a_1+1)^2(a_1+2)} \\ &1-Pow(a_1,n)=P(x<\bar{X}<y\vert a_1) = \\ &P\left(\sqrt{n}\frac{x-\mu_1}{\sigma_1}<\sqrt{n}\frac{\bar{X}-\mu_1}{\sigma_1}<\sqrt{n}\frac{y-\mu_1}{\sigma_1}\right) =\\ &\Phi(\sqrt{n}\frac{y-\mu_1}{\sigma_1})-\Phi(\sqrt{n}\frac{x-\mu_1}{\sigma_1}) \end{aligned} \]

  • LRT test

\[ \begin{aligned} &1-Pow(a_1,n)=P\left(qgamma(\alpha/2,n,1)<a_0T<qgamma(1-\alpha/2,n,1)\vert a_1\right) = \\ &P\left(qgamma(\alpha/2,n,1)\frac{a_1}{a_0}<a_1T<qgamma(1-\alpha/2,n,1)\frac{a_1}{a_0}\vert a_1\right) = \\ &pgamma(qgamma(1-\alpha/2,n,1)\frac{a_1}{a_0},n,1)-pgamma(qgamma(\alpha/2,n,1)\frac{a_1}{a_0},n,1) \end{aligned} \]

so

library(ggplot2)
a0 <- 1; n <- 50; alpha <- 0.05
a <- seq(a0/3, 2*a0, length=250)
mu <- a0/(a0+1)
sigma <- sqrt(a0/(a0+1)^2/(a0+2))
mu1 <- a/(a+1)
sigma1 <- sqrt(a/(a+1)^2/(a+2))
xy <- mu+sigma/sqrt(n)*qnorm(c(alpha/2, 1-alpha/2))
Pow1 <- 1-(pnorm(sqrt(n)/sigma1*(xy[2]-mu1)) -
           pnorm(sqrt(n)/sigma1*(xy[1]-mu1)))
Pow2 <- 1-(pgamma(a/a0*qgamma(1-alpha/2, n, 1), n, 1) -
          pgamma(a/a0*qgamma(alpha/2, n, 1), n, 1))
df <- data.frame(a=c(a, a),
          Power=c(Pow1, Pow2),
          Test=rep(c("Wald", "Lrt"), each=250))
ggplot(data=df, aes(a, Power, color=Test)) +
  geom_line()

Of course the Wald type test only works for large samples. A difficult question is how large n needs to be. Simulation can help to decide that.
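For instance, here is a sketch of such a simulation: estimate the true type I error of the Wald-type test for several sample sizes (at \(a_0=0.5\)):

a0 <- 0.5; alpha <- 0.05; B <- 1e4
mu0 <- a0/(a0+1)
sigma0 <- sqrt(a0/(a0+1)^2/(a0+2))
for(n in c(10, 25, 50, 100)) {
  xbar <- apply(matrix(rbeta(n*B, a0, 1), ncol=n), 1, mean)
  cv <- mu0 + qnorm(c(alpha/2, 1-alpha/2))*sigma0/sqrt(n)
  # proportion of rejections under H0; should be close to alpha
  cat(n, round(mean(xbar < cv[1] | xbar > cv[2]), 3), "\n")
}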

We also have two tests for the case n=1. How do they relate to each other? For the test based on the likelihood ratio statistic we reject \(H_0\) if

\(T < qgamma(\alpha/2,n,a_0)\) or \(T>qgamma(1-\alpha/2,n,a_0)\)

but now \(T=-\log X\) so

\(-\log x < qgamma(\alpha/2,n,a_0)\) or \(-\log x > qgamma(1-\alpha/2,n,a_0)\)

iff

\(x<\exp[-qgamma(1-\alpha/2,n,a_0)]\) or \(x> \exp[-qgamma(\alpha/2,n,a_0)]\)

For the direct test we reject \(H_0\) if

\(x<(\alpha/2)^{1/a_0}\) or \(x>(1-\alpha/2)^{1/a_0}\)

Let’s find some of these critical values

a0 <- seq(0.1, 10, length=10)
cv1 <- (alpha/2)^(1/a0)
cv2 <- (1-alpha/2)^(1/a0)
cv1a <- exp(-qgamma(1-alpha/2, 1, a0))
cv2a <- exp(-qgamma(alpha/2, 1, a0)) 
kable.nice(round(cbind(a0,cv1,cv1a,cv2,cv2a), 4),
           do.row.names = FALSE)
  a0    cv1   cv1a    cv2   cv2a
 0.1 0.0000 0.0000 0.7763 0.7763
 1.2 0.0462 0.0462 0.9791 0.9791
 2.3 0.2011 0.2011 0.9891 0.9891
 3.4 0.3379 0.3379 0.9926 0.9926
 4.5 0.4405 0.4405 0.9944 0.9944
 5.6 0.5175 0.5175 0.9955 0.9955
 6.7 0.5766 0.5766 0.9962 0.9962
 7.8 0.6232 0.6232 0.9968 0.9968
 8.9 0.6607 0.6607 0.9972 0.9972
10.0 0.6915 0.6915 0.9975 0.9975

and we see they are the same! Here is why:

\(qgamma(y,1,a)\) is the solution \(x\) of the equation

\[ \begin{aligned} &y = \int_0^x a e^{-at}dt= 1-e^{-ax}\\ &e^{-ax} = 1-y\\ &e^{-x} =(1-y)^{1/a} \end{aligned} \]

so \(\exp[-qgamma(1-\alpha/2,1,a_0)]=(\alpha/2)^{1/a_0}\) and \(\exp[-qgamma(\alpha/2,1,a_0)]=(1-\alpha/2)^{1/a_0}\), which are exactly the critical values of the direct test.

How about n=2? This is much trickier because one test is based on x+y, the other one on log(x)+log(y).

Here is a little simulation:

B <- 1e4
r <- matrix(0, B, 2)
for(i in 1:B) {
  xy <- rbeta(2, 1/2, 1)
  # test based on x+y, critical values found with qbeta2 above
  if(sum(xy)<0.03125 | sum(xy)>1.593) r[i,1] <- 1
  # test based on T=-log(x)-log(y), critical values qgamma(c(0.025, 0.975), 2, 1)/0.5
  if(-sum(log(xy))<0.484 | -sum(log(xy))>11.14) r[i,2] <- 1
  # stop at the first sample where the two tests disagree
  if(r[i, 1]!=r[i, 2]) break
}
xy
## [1] 0.002283813 0.021309225

which shows that sometimes one test rejects \(H_0\) while the other does not.


So then, which one is better? We have a formula for the power of the likelihood ratio test; the power of the other test has to be estimated via simulation. Here is the result for \(a_0=1\):

a0 <- 1; B <- 1e4
xy <- matrix(rbeta(2*B, a0, 1), ncol=2)
z <- apply(xy, 1, sum)
cv <- quantile(z, c(0.025,0.975))
a <- seq(0.1, 10,length=250)
Pow1 <- rep(0, 250)
for(i in 1:250) {
  xy <- matrix(rbeta(2*B, a[i],1), ncol=2)
  z <- apply(xy, 1, sum)
  z1 <- z[z<cv[1]]
  z2 <- z[z>cv[2]]
  Pow1[i] <- length(c(z1, z2))/B
}
Pow2 <- 1-(pgamma(a/a0*qgamma(0.975,2,1),2,1)-
             pgamma(a/a0*qgamma(0.025,2,1),2,1))
df <- data.frame(a=c(a, a),
          Power=c(Pow1, Pow2),
          Test=rep(c("Wald", "Lrt"), each=250))
ggplot(data=df, aes(a, Power, color=Test)) +
  geom_line()

and we see that the two tests have very similar power.