Say \(X_1, .., X_n\sim Beta(a,1)\)
We want to test
\[H_0:a=a_0\text{ vs }H_1: a \ne a_0\]
We already know that E[X]=a/(a+1), so \(x \approx a/(a+1)\) or \(a \approx x/(1-x)\). So a test could be based on the rejection region
\[ x/(1-x) < c_1 \text{ or } x/(1-x)>c_2 \] But the function x/(1-x) is monotonically increasing on [0, 1], so this rejection region is equivalent to one with
\[ x < c_1 \text{ or } x>c_2 \]
Now
\[ \begin{aligned} &\alpha/2 = P(X<c_1) = c_1^{a_0}\\ &c_1 = (\alpha/2)^{1/a_0}\\ &\alpha/2 = P(X>c_2) = c_2^{a_0}\\ &c_2 = (1-\alpha/2)^{1/a_0}\\ \end{aligned} \]
For example if we wish to test a0=0.5 at the 5% level we reject the null hypothesis if
a0=0.5;alpha=0.05
round(c(alpha/2,1-alpha/2)^(1/a0), 5)
## [1] 0.00063 0.95062
say x and y. Again we might try to use the fact that E[(X+Y)/2] = a/(1+a) But eventually we would need to find the density of X+Y. Using the convolution formula if \(0<t<1\) this would mean finding the integral
\[\int_{t-1}^t[x(1-x)]^{a-1}dx\]
and this integral does not exist analytically. What can we do? One idea is to find the integral numerically:
dbeta2 <-function (x, a=1) {
f <- function(x, t) {(x*(t-x))^(a-1)}
y <- 0*x
for(i in 1:length(x)) {
if(x[i] <1) y[i] <- integrate(f, 0, x[i], t=x[i])$value
else y[i] <- integrate(f,x[i]-1, 1, t=x[i])$value
}
a^2*y
}
pbeta2 <- function(x, a=1) {
y <- x
for(i in 1:length(x))
y[i] <- integrate(dbeta2, 0, x[i], a=a)$value
y
}
qbeta2 <- function (p, a=1) {
pbeta2 <- function(x) {integrate(dbeta2, 0, x, a=a)$value}
low <- 0
high <- 2
repeat {
mid <- (low+high)/2
fmid <- pbeta2(mid)
if(abs(fmid-p) <0.001) break
if(fmid<p) low <- mid
else high <- mid
}
mid
}
round(c(qbeta2(0.025, 1/2), qbeta2(0.975, 1/2)), 3)
## [1] 0.031 1.594
Alternatively we can use simulation to find the null distribution:
B <- 1e4
xy <- matrix(rbeta(2*B, 0.5, 1), ncol=2)
z <- apply(xy, 1, sum)
round(quantile(z,c(0.025,0.975)), 3)
## 2.5% 97.5%
## 0.034 1.565
This of course works equally well for n=3, 4, … whereas the numerical solution gets much harder quickly.
From the CLT we know that
\[\sqrt{n}\frac{\bar{x}-\mu_0}{\sigma_0}\rightarrow N(0,1)\] where
\[\mu_0=\frac{a_0}{a_0+1}\]
and
\[\sigma_0^2=\frac{a_0}{(a_0+1)^2(a_0+2)}\] therefore
\[cv=\mu_0\pm z_{\alpha/2}\sigma_0/\sqrt{n}=\frac{1}{a_0+1}\left(a_0\pm z_{\alpha/2}\sqrt{\frac{a_0}{n(a_0+2)}}\right)\] So we have the following test: reject \(H_0\) if \(\bar{x} <cv_1\) or \(\bar{x} >cv_2\).
above we tested
\[H_0:a=a_0\text{ vs }H_1: a \ne a_0\]
Often we want to test instead alternatives of the form
\[H_0:a=a_0\text{ vs }H_1: a < a_0\]
or
\[H_0:a=a_0\text{ vs }H_1: a > a_0\]
In that case choose the appropriate rejection region, with \(\alpha\) instead of \(\alpha/2\)
if we want to quote the p-value of this test we calculate it as follows: say we observed \(\bar{x} =t\) in our experiment. The p-value is the probability of repeating the experiment and observing a value of the test statistic as “unlikely” (given the null hypothesis) as that seen in the original experiment.
Say we observed \(t>a_0\) and let \(\bar{y}\) be the sample mean of the new experiment, then
\[p=2P(\bar{Y}>t)=2\left(1-\Phi(\sqrt{n}\frac{t-\mu_1}{\sigma_0})\right)\] the “2” is because we do a two-sided test.
we saw before that the mle was given by
\[\hat{a}_2=n/T\] where \(T=-\sum \log x_i\). So using the results of section 4.3 we have
\[ \begin{aligned} &-2\log \lambda({\pmb{x}}) = \\ &2\left[l(\hat{a}_2)-l(a_0)\right]=\\ &2\left[n\log\hat{a}_2 +(\hat{a}_2-1)\sum_{i=1}^n\log x_i-n\log a_0 -(a_0-1)\sum_{i=1}^n\log x_i\right] = \\ &2\left[n\log \frac{\hat{a}_2}{a_0}-(\hat{a}_2-a_0)\sum_{i=1}^n\log x_i\right] = \\ &2\left[n\log \frac{\hat{a}_2}{a_0}-(\hat{a}_2-a_0)(-n/\hat{a}_2)\right] = \\ &2n\left[\log \frac{\hat{a}_2}{a_0}+\frac{a_0}{\hat{a}_2}-1\right] \end{aligned} \]
and we reject \(H_0\) if
\[-2 \log \lambda(\pmb{x})>qchisq(1-\alpha,1)\]
One problem with both these methods is that they are large sample methods, they rely on the CLT. Can we derive a method that also works for small samples? The basic idea of the LRT is to reject the null hypothesis if \(\lambda(\pmb{x})\) is small, which is the same as \((-2) \log \lambda(\pmb{x})\)is large. Now consider this:
\[ \begin{aligned} &h(a) = \log \frac{a}{a_0}+\frac{a_0}{a}-1\\ &\frac{dh}{da} = \frac1{a}-\frac{a_0}{a^2}=\frac{a-a_0}{a^2}>0 \end{aligned} \]
if \(a>a_0\). So h is decreasing on \((0,a_0)\) and increasing on \((a_0, \infty)\), and so h is large for small or large values of a.
Now \(\hat{a}_2=n/T\), and so
\(\hat{a}_2\) is small or large
iff
T is small or large
If \(H_0\) is true \(T \sim \Gamma(n,1/a_0)\) and if we use \(\alpha/2\) on the left and the right we find
\[\alpha/2=P(T<x)=pgamma(x,n,a_0)\]
\[x=qgamma(1-\alpha/2,n,a_0)\]
Similarly \(\alpha/2=P(T>y)\) yields \(T>qgamma(1-\alpha/2,n,a_0)\).
Note
\[ \begin{aligned} &pgamma(x,n,a) =\\ &\int_0^x \frac{a^n}{(n-1)!}t^{n-1}e^{-at}dt= \\ &\int_0^{ax} \frac{1}{(n-1)!}y^{n-1}e^{y}dy= \\ &pgamma(ax,n,1) \end{aligned} \] and so
\[ \begin{aligned} &qamma(y,n,a) =x\\ &y = pamma(x,n,a)=gamma(ax,n,1) \\ &qamma(y,n,1) =ax \\ &qamma(y,n,a) = qamma(y,n,1)/a \end{aligned} \] so we reject \(H_0\) if
\(a_0T<qgamma(\alpha/2,n,1)\) or \(a_0T>qgamma(1-\alpha/2,n,1)\)
Say we want to test \(H_a:a<a_0\). Then we have the rejection region \(a_0T<qgamma(\alpha,n,1)\)
\[ p=2P(T>t|a=a_0) =2\left(1-pgamma(t,n,a_0)\right) \] if \(t>a_0\).
Let’s use again the prior Exp(1), then we know that \(a|x \sim \Gamma(n+1,T+1)\). A test could be designed as follows: reject H_0 if
\(a_0<qgamma(\alpha/2,n+1,T+1)\) or \(a_0>qgamma(1-\alpha/2,n+1,T+1)\)
But from the above we know that
\[a_0 < qgamma(alpha/2,n+1,T+1)=qgamma(\alpha/2,n+1,1)/(T+1)\]
iff
\[a_0(T+1)<qgamma(\alpha/2,n+1,1)\]
and we see that this is essentially the same as the test based on the mle. (except with n+1 instead of n and T+1 instead of T)
Let’s go back to the two tests based on the sample mean and the likelihood ratio. Which of these is best? That depends on the power of the test. First we have
\[ \begin{aligned} &x = \mu+\frac{\sigma}{\sqrt{n}}\Phi^{-1}(\alpha/2)\\ &y = \mu+\frac{\sigma}{\sqrt{n}}\Phi^{-1}(1-\alpha/2)\\ &\mu_1 =\frac{a_1}{a_1+1} \\ &\sigma_1^2 = \frac{a_1}{(a_1+1)^2(a_1+2)} \\ &1-Pow(a_1,n)=P(x<\bar{X}<y\vert a_1) = \\ &P\left(\sqrt{n}\frac{x-\mu_1}{\sigma_1}<\sqrt{n}\frac{\bar{X}-\mu_1}{\sigma_1}<\sqrt{n}\frac{y-\mu_1}{\sigma_1}\right) =\\ &\Phi(\sqrt{n}\frac{y-\mu_1}{\sigma_1})-\Phi(\sqrt{n}\frac{x-\mu_1}{\sigma_1}) \end{aligned} \]
\[ \begin{aligned} &P\left(qgamma(\alpha/2,n,1)<a_0T<qgamma(1-\alpha/2,n,1)\vert a_1\right) = \\ &P\left(qgamma(\alpha/2,n,1)\frac{a_1}{a_0}<a_1T<qgamma(1-\alpha/2,n,1)\frac{a_1}{a_0}\vert a_1\right) = \\ &pgamma(qgamma(\alpha/2,n,1)\frac{a_1}{a_0},n,1)-pgamma(qgamma(\alpha/2,n,1)\frac{a_1}{a_0},n,1) \end{aligned} \]
so
a0 <- 1; n <- 50; alpha <- 0.05
a <- seq(a0/3, 2*a0, length=250)
mu <- a0/(a0+1)
sigma <- sqrt(a0/(a0+1)^2/(a0+2))
mu1 <- a/(a+1)
sigma1 <- sqrt(a/(a+1)^2/(a+2))
xy <- mu+sigma/sqrt(n)*qnorm(c(alpha/2, 1-alpha/2))
Pow1 <- 1-(pnorm(sqrt(n)/sigma1*(xy[2]-mu1)) -
pnorm(sqrt(n)/sigma1*(xy[1]-mu1)))
Pow2 <- 1-(pgamma(a/a0*qgamma(1-alpha/2, n, 1), n, 1) -
pgamma(a/a0*qgamma(alpha/2, n, 1), n, 1))
df <- data.frame(a=c(a, a),
Power=c(Pow1, Pow2),
Test=rep(c("Wald", "Lrt"), each=250))
ggplot(data=df, aes(a, Power, color=Test)) +
geom_line()
Of course the Wald type test only works for large samples. A difficult question is how large n needs to be. Simulation can help to decide that.
We also have two tests for the case n=1. How do they relate to each other? For the test based on the likelihood ratio statistic we reject \(H_0\) if
\(T < qgamma(\alpha/2,n,a_0)\) or \(T>qgamma(1-\alpha/2,n,a_0)\)
but now \(T=-\log X\) so
\(-\log x < qgamma(\alpha/2,n,a_0)\) or \(-\log x > qgamma(1-\alpha/2,n,a_0)\)
iff
\(x<\exp[-qgamma(1-\alpha/2,n,a_0)]\) or \(x> \exp[-qgamma(\alpha/2,n,a_0)]\)
for the direct test we reject H_0 if
\(x<(\alpha/2)^{1/a0}\) or \(x>(1-\alpha/2)^{1/a0}\)
Let’s find some of these critical values
a0 <- seq(0.1, 10, length=10)
cv1 <- (alpha/2)^(1/a0)
cv2 <- (1-alpha/2)^(1/a0)
cv1a <- exp(-qgamma(1-alpha/2, 1, a0))
cv2a <- exp(-qgamma(alpha/2, 1, a0))
kable.nice(round(cbind(a0,cv1,cv1a,cv2,cv2a), 4),
do.row.names = FALSE)
a0 | cv1 | cv1a | cv2 | cv2a |
---|---|---|---|---|
0.1 | 0.0000 | 0.0000 | 0.7763 | 0.7763 |
1.2 | 0.0462 | 0.0462 | 0.9791 | 0.9791 |
2.3 | 0.2011 | 0.2011 | 0.9891 | 0.9891 |
3.4 | 0.3379 | 0.3379 | 0.9926 | 0.9926 |
4.5 | 0.4405 | 0.4405 | 0.9944 | 0.9944 |
5.6 | 0.5175 | 0.5175 | 0.9955 | 0.9955 |
6.7 | 0.5766 | 0.5766 | 0.9962 | 0.9962 |
7.8 | 0.6232 | 0.6232 | 0.9968 | 0.9968 |
8.9 | 0.6607 | 0.6607 | 0.9972 | 0.9972 |
10.0 | 0.6915 | 0.6915 | 0.9975 | 0.9975 |
and we see they are the same! Here is why:
\(qgamma(y,1,a)\) is the solution to the equation
\[ \begin{aligned} & y = \int_0^x \frac{a^1}{(1-1)!}t^{1-1}e^{-at}dt= \\ &\int_0^{x} ae^{-ay}dy= 1-e^{-ax}\\ &e^{-x} =(1-y)^{1/a} \end{aligned} \]
How about n=2? This is much trickier because one test is based on x+y, the other one on log(x)+log(y).
Here is a little simulation:
B <- 1e4
r <- matrix(0, B, 2)
for(i in 1:B) {
xy <- rbeta(2, 1/2, 1)
if(sum(xy)<0.03125 | sum(xy)>1.593) r[i,1] <- 1
if(-sum(log(xy))<0.484 | -sum(log(xy))>11.14) r[i,2] <- 1
if(r[i, 1]!=r[i, 2]) break
}
xy
## [1] 0.002283813 0.021309225
which shows that sometime one test rejects \(H_0\) while the other does not.
So then which is better? We do have a formula for the power of the likelihood ratio test, the other one needs to be done via simulation. Here is the result for \(a_0=0.5\):
a0 <- 1; B <- 1e4
xy <- matrix(rbeta(2*B, a0, 1), ncol=2)
z <- apply(xy, 1, sum)
cv <- quantile(z, c(0.025,0.975))
a <- seq(0.1, 10,length=250)
Pow1 <- rep(0, 250)
for(i in 1:250) {
xy <- matrix(rbeta(2*B, a[i],1), ncol=2)
z <- apply(xy, 1, sum)
z1 <- z[z<cv[1]]
z2 <- z[z>cv[2]]
Pow1[i] <- length(c(z1, z2))/B
}
Pow2 <- 1-(pgamma(a/a0*qgamma(0.975,2,1),2,1)-
pgamma(a/a0*qgamma(0.025,2,1),2,1))
df <- data.frame(a=c(a, a),
Power=c(Pow1, Pow2),
Test=rep(c("Wald", "Lrt"), each=250))
ggplot(data=df, aes(a, Power, color=Test)) +
geom_line()
and we see that the two tests have very similar power.