Methods for Finding Hypothesis Tests

Ad-hoc Methods

The idea here is to use some estimator of the parameter of interest and then derive a test from there.

Example (5.3.1)

Say we have \(X_1, ..., X_n\sim Pois(\lambda)\) and we want to test

\[H_0: \lambda = \lambda_0\text{ vs. }H_a: \lambda > \lambda_0\]

We know that \(\bar{x}\) is the mle of \(\lambda\), so a test based on \(\bar{x}\) seems reasonable.

Clearly large values of \(\bar{x}\) will indicate that the alternative is more likely to be true than the null hypothesis, so a reasonable rejection region is \(\{\bar{x} >cv\}\). To find cv we need to solve the equation

\[ \begin{aligned} &\alpha = P(\bar{X} > cv | \lambda =\lambda_0) = \\ &1-P( \sum X_i \le n\times cv | \lambda =\lambda_0) \end{aligned} \]

but under the null hypothesis

\[ \sum X_i \sim \text{Pois}(n\lambda_0) \]

so we find cv with

cv <- qpois(1-alpha, n*lambda0)/n

The p-value of the test is

\[ \begin{aligned} &Y \sim \text{Pois}(n\lambda_0) \\ &p=P(Y \ge \sum x_i) \end{aligned} \]

Example (5.3.2)

n <- 10; lambda0 <- 3.0; alpha <- 0.05
x <- rpois(n, 4)
c(n*lambda0, sum(x))
## [1] 30 43
cat("Critical value cv: ", qpois(1-alpha, n*lambda0)/n)
## Critical value cv:  3.9
cat("p value: ", 1-ppois(sum(x)-1, n*lambda0))
## p value:  0.01481952

Next we want to test

\[H_0: \lambda = \lambda_0\text{ vs. }H_a: \lambda \ne \lambda_0\]

Now the critical region could be

\(\bar{x} <cv_1\) or \(\bar{x} >cv_2\)

It turns out, though, that the choice of \(cv_1\) and \(cv_2\) is not unique. For example,

\(cv_1=0\) and \(cv_2=qpois(1-\alpha,n \lambda_0)/n\)

would work. One popular strategy is to use \(\alpha/2\) on the left and on the right, so we find

cv1 <- qpois(alpha/2, n*lambda0)/n
cv2 <- qpois(1-alpha/2, n*lambda0)/n

The p-value now is found by

\[ p=\left\{\begin{array}{ccc}2P(Y<\sum x_i\vert \lambda_0) & \text{if} & \sum x_i<n\lambda_0 \\ 2P(Y>\sum x_i\vert \lambda_0) & \text{if} & \sum x_i>n\lambda_0 \end{array}\right. \]

The factor 2 is needed because this is a two-sided test: values far out in either tail count as evidence against the null hypothesis.
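
As a quick sketch, the two-sided p-value as defined above can be computed in R like this, reusing n, lambda0 and x from example (5.3.2):

k <- sum(x)
p <- 1                                              # in case sum(x) equals n*lambda0 exactly
if(k < n*lambda0) p <- 2*ppois(k-1, n*lambda0)      # 2*P(Y < k)
if(k > n*lambda0) p <- 2*(1-ppois(k, n*lambda0))    # 2*P(Y > k)
min(p, 1)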

The type II error probabilities of the one-sided and two-sided tests are (writing \(Y=\sum X_i\), which has a Pois(\(n\lambda_1\)) distribution if \(\lambda=\lambda_1\), and taking the critical values on the count scale)

  • one-sided

\[\beta(\lambda_1) = P(Y\le n\cdot cv|\lambda_1)\]

  • two-sided

\[\beta(\lambda_1) = P(n\cdot cv_1\le Y\le n\cdot cv_2|\lambda_1)\]

The routine below draws the corresponding power curve \(1-\beta(\lambda_1)\) of the two-sided test:

power.pois <- function(n, lambda0, lrange, alpha=0.05) {
  fun <- function(lambda) {
    1-(ppois(qpois(1-alpha/2, n*lambda0), n*lambda) - 
      ppois(qpois(alpha/2, n*lambda0), n*lambda))
  }
  ggcurve(fun=fun, A=lrange[1], B=lrange[2])
}
power.pois(10, 3.0, c(0.5, 6))

Likelihood Ratio Tests

Say we want to test \[ H_0: \theta \in \Theta_0 \text{ vs } H_a: \theta \in \Theta_0^c \]

Then the likelihood ratio test statistic is defined by

\[ \lambda(\pmb{x})=\frac{\sup_{\Theta_0}L(\theta | \pmb{x})}{\sup L(\theta | \pmb{x})} \]

A likelihood ratio test (LRT) is any test that has a rejection region of the form

\[\{\pmb{x}: \lambda(\pmb{x}) \le c\}\]

The constant c here is not important in itself; it will be determined once we decide on the type I error probability \(\alpha\). It may be better to think of this as

\[\text{reject }H_0\text{ if }\lambda(\pmb{x})\text{ is small}\]

Note that the supremum in the denominator is found over the whole parameter space, so this is just like finding the mle, and then finding the corresponding value of the likelihood function.

Note that in the numerator we find the supremum over a subset of the one used in the denominator, so we always have

\[ 0 \le \lambda(\pmb{x}) \le 1 \]

The logic of the LRT is this:

  • In the denominator we have the likelihood of observing the data we did observe, given the most favorable parameters (the mle) possible.

  • In the numerator we have the likelihood of observing the data we did observe, given the most favorable parameters allowed under the null hypothesis.

  • If their ratio is much smaller than 1, then there are parameters outside the null hypothesis under which the observed data is much more likely than under any parameter allowed by the null hypothesis, and so we reject the null hypothesis.

Notice the connection to the Neyman-Pearson theory: again a test is based on the likelihood ratio. While we no longer have a theorem that guarantees optimality if the hypotheses are composite, it is still reasonable to expect this type of test to be quite good.

Example (5.3.3)

Let \(X_1, ..., X_n\sim N(\mu,\sigma)\), \(\sigma\) known. Consider testing

\[H_0: \mu=\mu_0\text{ vs. }H_1: \mu\ne\mu_0\]

Here \(\Theta_0=\{\mu_0\}\) and so the numerator of \(\lambda(\pmb{x})\) is \(L(\mu_0\vert\pmb{x})\). For the denominator we have to find the mle, which we already know is \(\bar{x}\). Therefore using (5.2.12) we find

\[ \begin{aligned} &L(\mu_0\vert\pmb{x}) = (2\pi)^{-n/2}\sigma^{-n}\exp \left\{-\frac{(n-1)s^2}{2\sigma^2} \right\}\exp \left\{-\frac{n(\mu_0-\bar{x})^2}{2\sigma^2} \right\}\\ &L(\bar{x}\vert\pmb{x})=(2\pi)^{-n/2}\sigma^{-n}\exp \left\{-\frac{(n-1)s^2}{2\sigma^2} \right\}\exp \left\{-\frac{n(\bar{x}-\bar{x})^2}{2\sigma^2} \right\} = \\ &(2\pi)^{-n/2}\sigma^{-n}\exp \left\{-\frac{(n-1)s^2}{2\sigma^2} \right\} \\ &\frac{L(\mu_0\vert\pmb{x})}{L(\bar{x}\vert\pmb{x})}=\exp \left\{-\frac{n(\mu_0-\bar{x})^2}{2\sigma^2} \right\} \end{aligned} \]

Now an LRT rejects the null hypothesis if \(\lambda(\pmb{x})\le c\) for some constant c, which depends on the choice of \(\alpha\). Again it is best to think of the test as rejecting \(H_0\) if “\(\lambda(\pmb{x})\) is small”. But

\[\lambda(\pmb{x})=\exp \left\{-\frac{n(\mu_0-\bar{x})^2}{2\sigma^2} \right\}\]

is small iff

\[-\frac{n(\mu_0-\bar{x})^2}{2\sigma^2}\]

is small iff

\[(\mu_0-\bar{x})^2\]

is large iff

\[\vert\bar{x} -\mu_0\vert\]

is large, say

\[\vert\bar{x} -\mu_0\vert>cv\]

In other words the LRT rejects the null hypothesis if \(\lambda(\pmb{x})\) is small, which is equivalent to \(|\bar{x} -\mu_0|\) being large.

What is the constant cv? It depends on \(\alpha\), namely

\[ \begin{aligned} &\alpha =P(\text{reject }H_0\vert H_0 \text{ true}) =\\ &P(\vert \bar{X}-\mu_0\vert>cv) = \\ &P(\vert \sqrt{n}\frac{\bar{X}-\mu_0}{\sigma}\vert>\frac{\sqrt{n}\cdot cv}{\sigma}) = \\ &2\left(1-\Phi(\frac{\sqrt{n}\cdot cv}{\sigma})\right)\\ &cv=\Phi^{-1}(1-\alpha/2)\frac{\sigma}{\sqrt{n}}=z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \end{aligned} \] for example

n <- 10; sigma=1; alpha <- 0.05
qnorm(1-alpha/2)*sigma/sqrt(n)
## [1] 0.619795
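
As a quick sanity check, here is a small simulation sketch (taking \(\mu_0=0\), an assumed value, and reusing n, sigma and alpha from above) estimating the type I error probability of this test; the proportion of rejections should come out close to \(\alpha\):

# estimate the rejection rate under H0: mu = mu0 = 0
mu0 <- 0; B <- 10000
cv <- qnorm(1-alpha/2)*sigma/sqrt(n)
xbar <- rowMeans(matrix(rnorm(n*B, mu0, sigma), B, n))
mean(abs(xbar - mu0) > cv)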

Example (5.3.4)

Let \(X_1, ..., X_n\) be a sample from a population with density

\[f(x|\theta) = e^{\theta-x}\]

if \(x>\theta\) and 0 otherwise. (This is an exponential r.v. with rate 1, shifted by \(\theta\).) The likelihood function is given by

\[ \begin{aligned} &L(\theta | \pmb{x}) = \prod \exp (\theta-x_i) I_{(\theta, \infty)}(x_i) = \\ &\exp (n\theta-\sum x_i) I_{(-\infty, x_{(1)})}(\theta) \end{aligned} \]

Here is an example of what this function looks like:

n <- 10; theta0 <- 5
x <- rexp(10, 1) + 7.5
fun <- function(t)
  exp(n*t-sum(x))*ifelse(t<min(x), 1, 0)
ggcurve(fun=fun, A=6, B=8)

so it is positive and increasing on \(-\infty < \theta < x_{(1)}\), and then drops to 0. So clearly the mle of \(\theta\) is \(x_{(1)}\), the minimum. In this example it cannot be found by differentiating the log-likelihood!

Let’s say we want to test

\[H_0: \theta \le \theta_0\text{ vs. }H_1: \theta>\theta_0\]

For the maximum of the likelihood function under the null hypothesis we have to consider two cases:

  • if \(x_{(1)}<\theta_0\) the maximum is at \(x_{(1)}\)
  • if \(x_{(1)}>\theta_0\) the maximum is at \(\theta_0\)

Therefore the likelihood ratio statistic is given by

\[ \begin{aligned} &\lambda(\pmb{x}) = 1\text{ if }x_{(1)}\le \theta_0\\ &\lambda(\pmb{x}) = \exp\{n\theta_0-nx_{(1)}\}\text{ if }x_{(1)}> \theta_0 \end{aligned} \]

Here is an example of what this function looks like:

n <- 10; theta0 <- 2.5
x <- rexp(10, 1) + 7.5
fun <- function(t)
  ifelse(t<theta0, 1, exp(-n*(t-theta0)))
ggcurve(fun=fun, A=1, B=4)

An LRT rejects the null hypothesis if

\[\lambda(\pmb{x}) \le c\]

which is clearly equivalent to a test that rejects the null hypothesis if

\[X_{(1)}\ge cv\]

for some constant cv. To determine the value of cv for a specific n and \(\alpha\) we need the distribution of \(X_{(1)}\):

\[ \begin{aligned} &f(x\vert\theta) = \exp \left\{\theta-x \right\}I_{[\theta,\infty]}(x) \\ &F(x\vert\theta) = \int_{\theta}^x \exp \left\{\theta-t \right\}dt= \\ &-\exp \left\{\theta-t \right\}\vert_{\theta}^x = 1-\exp \left\{\theta-x \right\} \end{aligned} \] and so

\[ \begin{aligned} &F_{X_{(1)}}(x) = P(X_{(1)}\le x) = 1-P(X_{(1)}> x) =\\ &1-P(X_1>x,...,X_n>x) = \\ &1-P(X_1>x)\cdot...\cdot P(X_n>x) = \\ &1-[1-F_{X_1}(x)]^n = \\ &1-[\exp \left\{\theta-x \right\}] ^n = \\ &1-\exp \left\{n(\theta-x) \right\} \\ &f_{X_{(1)}}(x) =n\exp \left\{n(\theta-x)\right\};x>\theta \end{aligned} \] so now

\[ \begin{aligned} &\alpha = P(X_{(1)}>cv\vert\theta_0)=1-F_{X_{(1)}}(cv)=\exp \left\{n(\theta_0-cv) \right\} \\ &cv=\theta_0-\log(\alpha)/n \end{aligned} \]

The p value is

\[p=P(Y\ge x_{(1)}\vert\theta_0)=\min\left(1,\exp \left\{n(\theta_0-x_{(1)})\right\}\right)\]

where Y has the distribution of the sample minimum under \(\theta_0\).

n <- 10; theta0 <- 2.5
cat("cv: ", theta0-log(alpha)/n)
## cv:  2.799573
cat("p value: ", min(1, exp(n*(theta0-min(x)))))
## p value:  2.702004e-23

The power of the test is given by

\[Pow(\theta_1)=P(X_{(1)}> cv\vert\theta_1)=\exp \left\{n(\theta_1-\theta_0+\log(\alpha)/n)\right\}\]
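
Here is a short sketch of this power curve for n=10, \(\theta_0=2.5\) and \(\alpha=0.05\) (values taken from above), drawn with the same ggcurve helper used earlier; the power is capped at 1 once \(\theta_1\) exceeds cv:

n <- 10; theta0 <- 2.5; alpha <- 0.05
pow <- function(theta1)
  pmin(1, exp(n*(theta1-theta0) + log(alpha)))   # = min(1, alpha*exp(n*(theta1-theta0)))
ggcurve(fun=pow, A=theta0, B=theta0+1)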

Asymptotic Distribution of the LRT, Wilks' Theorem

Theorem (5.3.5)

Wilks

Suppose \(X_1, ..., X_n\) are iid \(f(x|\theta)\) and we wish to test

\[ H_0: \theta \in \Theta_0\text{ vs. }H_1:\theta \notin \Theta_0 \]

Then, if the null hypothesis is true, under some regularity conditions the distribution of \(-2 \log \lambda (\pmb{X})\) converges to the distribution of a \(\chi^2(p)\). Here p is the difference between the number of free parameters in \(\Theta\) and the number of free parameters in \(\Theta_0\).


Note Let \(\hat{\theta}\) be the mle of \(\theta\), and denote by \(\hat{\hat{\theta}}\) the value that maximizes the likelihood under the null hypothesis. Then

\[ -2 \log\lambda({\pmb{x}}) = -2\log\frac{L(\hat{\hat{\theta}})}{L(\hat{\theta})} =2\left(l(\hat{\theta})-l(\hat{\hat{\theta}}) \right) \]

where \(l\) is the log-likelihood function.

Example (5.3.6)

We flip a coin 1000 times and find 545 heads. Test at the 5% level whether this is a fair coin.

In general we have \(X_1, .., X_n\sim Ber(p)\) and we want to test

\[H_0:p=p_0\text{ vs }H_1:p\ne p_0\]

Let’s find the LRT test for this problem. First we have

\[L(p\vert\pmb{x})=p^k(1-p)^{n-k}\] where \(k=\sum x_i\).

Therefore

\[l(p\vert\pmb{x})=\log L(p\vert\pmb{x})= k\log p+(n-k)\log(1-p)\]

We already know that the mle is \(\hat{p}=\bar{x}\) (see (4.2.4)), and so

\[-2\log \lambda(\pmb{x})=2\left(k\log \hat{p}+(n-k)\log(1-\hat{p})-k\log p_0-(n-k)\log(1-p_0)\right)=\] \[2\left(k\log \frac{\hat{p}}{p_0}+(n-k)\log\frac{1-\hat{p}}{1-p_0}\right)\]

The statistic \(-2\log \lambda(\pmb{x})\), as a function of k, looks like this:

n <- 1000; p0 <- 0.5
k <- 400:600
lrt <- 2*(k*log(k/(n*p0))+(n-k)*log((n-k)/(n*(1-p0))))
df <- data.frame(k=k, lrt=lrt)
ggplot(df, aes(k, lrt)) + geom_point()

It is clear that \(-2 \log\lambda(\pmb{x})\) is large (equivalently, \(\lambda(\pmb{x})\) is small) iff k is much smaller or much larger than \(np_0\), that is iff \(|k-np_0|\) is large.

Now let \(Y = \sum X_i\), then \(Y \sim Bin(n,p_0)\) and

\[ \begin{aligned} &\alpha = P(|Y-np_0|>\text{cv}) = \\ &1-P(|Y-np_0|\le \text{cv}| p =p_0) = \\ &1-P(-\text{cv} \le Y-np_0\le \text{cv}) = \\ &1-P(np_0-\text{cv} \le Y\le \text{cv}+np_0) \end{aligned} \]

For our test we have n=1000, \(p_0=0.5\), so \(np_0 = 500\). Let’s say we want \(\alpha=0.05\), then

1-diff(pbinom(500+c(-1,1)*20, 1000, 0.5))
## [1] 0.2061073
1-diff(pbinom(500+c(-1,1)*30, 1000, 0.5))
## [1] 0.05785052
1-diff(pbinom(500+c(-1,1)*31, 1000, 0.5))
## [1] 0.04998452

and we find \(cv=31\), and so we reject the null hypothesis because

\[|k-np_0|=|545-1000\cdot 0.5|=45>31\] We conclude that the coin is not fair.
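
Alternatively we can compute the p-value of this exact test, \(P(|Y-np_0|\ge 45)\), directly; a short sketch:

# exact p value: P(Y <= 455) + P(Y >= 545) under Bin(1000, 0.5)
k <- 545; n <- 1000; p0 <- 0.5
d <- abs(k - n*p0)
pbinom(n*p0 - d, n, p0) + 1 - pbinom(n*p0 + d - 1, n, p0)

which again comes out well below \(\alpha=0.05\).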

How about using the chi-square approximation? By Wilks' theorem, under the null hypothesis we have (approximately)

\[T=-2 \log \lambda (\pmb{X}) \sim \chi^2(1)\]

so

\[ \begin{aligned} &T=2\left(k\log \frac{\hat{p}}{p_0}+(n-k)\log\frac{1-\hat{p}}{1-p_0}\right) = \\ &2\left[545\log(\frac{545}{500})+(455)\log(\frac{455}{500} ) \right] = 8.11\\ &\alpha=P(T>\text{cv}) \\ \end{aligned} \]

qchisq(1-0.05, 1)
## [1] 3.841459

and again we reject \(H_0\), now because T=8.11>cv=3.84.


Why do we have this approximation? If \(H_0\) is true, \(Y=\sum X_i \sim Bin(n,p_0)\), so \(E[Y]=np_0\) and therefore \(k \approx np_0\).

Recall that the Taylor series expansion of log(x+1) at x=0 is

\[ \log(x+1) \approx x-x^2/2 \] and applying this with \(x=\frac{k}{np_0}-1\) and \(x=\frac{n-k}{n(1-p_0)}-1\) we find

\[ \begin{aligned} &-2\log\lambda(\pmb{x}) = \\ &2\left(k\log \frac{\hat{p}}{p_0}+(n-k)\log\frac{1-\hat{p}}{1-p_0}\right) = \\ &2\left(k\log \frac{k/n}{p_0}+(n-k)\log\frac{1-k/n}{1-p_0}\right) = \\ &2\left(k\log \frac{k}{np_0}+(n-k)\log\frac{n-k}{n(1-p_0)}\right) = \\ &2\left(k\log \left[(\frac{k}{np_0}-1)+1\right]+(n-k)\log\left[(\frac{n-k}{n(1-p_0)}-1)+1\right]\right) \approx \\ &2k\left[(\frac{k}{np_0}-1)-(\frac{k}{np_0}-1)^2/2\right]+\\ &2(n-k)\left[(\frac{n-k}{n(1-p_0)}-1)-(\frac{n-k}{n(1-p_0)}-1)^2/2\right]= \end{aligned} \] \[ \begin{aligned} &2k(\frac{k-np_0}{np_0})-k(\frac{k-np_0}{np_0})^2+\\ &2(n-k)(\frac{n-k-n(1-p_0)}{n(1-p_0)})-(n-k)(\frac{n-k-n(1-p_0)}{n(1-p_0)})^2=\\ &2k(\frac{k-np_0}{np_0})-k(\frac{k-np_0}{np_0})^2+\\ &2(n-k)(\frac{np_0-k}{n(1-p_0)})-(n-k)(\frac{np_0-k}{n(1-p_0)})^2=\\ &\frac{k-np_0}{np_0(1-p_0)}\left[2k(1-p_0)-k(1-p_0)\frac{k-np_0}{np_0}-2(n-k)p_0-(n-k)p_0\frac{k-np_0}{n(1-p_0)}\right] =(*) \end{aligned} \]

Now

\[ \begin{aligned} &2k(1-p_0)-k(1-p_0)\frac{k-np_0}{np_0}-2(n-k)p_0-(n-k)p_0\frac{k-np_0}{n(1-p_0)} = \\ &2k-2kp_0-k\frac{k-np_0}{np_0}+kp_0\frac{k-np_0}{np_0}-2np_0+2kp_0-np_0\frac{k-np_0}{n(1-p_0)}+kp_0\frac{k-np_0}{n(1-p_0)} = \\ &2(k-np_0) -k\frac{k-np_0}{np_0}+k\frac{k-np_0}{n}-p_0\frac{k-np_0}{1-p_0}+kp_0\frac{k-np_0}{n(1-p_0)} = \\ &(k-np_0)\left[2 -k\frac{1}{np_0}+k\frac{1}{n}-p_0\frac{1}{1-p_0}+kp_0\frac{1}{n(1-p_0)}\right] \end{aligned} \]

Noting that under the null hypothesis \(k\approx np_0\), or \(\frac{k}{np_0}\approx 1\), we have

\[ \begin{aligned} &(*)= \frac{(k-np_0)^2}{np_0(1-p_0)} \left[2 -\frac{k}{np_0}+\frac{k}{n}-\frac{p_0}{1-p_0}+\frac{kp_0}{n(1-p_0)}\right] = \\ &\frac{(k-np_0)^2}{np_0(1-p_0)} \left[2 -\frac{k}{np_0}+\frac{k}{np_0}p_0-\frac{p_0}{1-p_0}+\frac{k}{np_0}\frac{p_0^2}{1-p_0}\right] \approx \\ &\frac{(k-np_0)^2}{np_0(1-p_0)} \left[2 -1+p_0-\frac{p_0}{1-p_0}+\frac{p_0^2}{1-p_0}\right] = \\ &\frac{(k-np_0)^2}{np_0(1-p_0)}\frac{1-p_0+p_0(1-p_0)-p_0+p_0^2}{1-p_0}=\\ &\frac{(k-np_0)^2}{np_0(1-p_0)}=\left(\frac{k-np_0}{\sqrt{np_0(1-p_0)}}\right)^2\sim\chi^2(1) \end{aligned} \]
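
We can check this numerically for the coin data: the exact \(-2\log\lambda\) and the approximation \((k-np_0)^2/(np_0(1-p_0))\) both come out at about 8.1:

k <- 545; n <- 1000; p0 <- 0.5
c(2*(k*log(k/(n*p0)) + (n-k)*log((n-k)/(n*(1-p0)))),   # exact -2*log(lambda)
  (k-n*p0)^2/(n*p0*(1-p0)))                            # chi-square approximation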

Example (5.3.7)

Say \(X_1, .., X_n\sim Beta(\alpha,\beta)\) and we want to test

\[H_0:\alpha=\beta\text{ vs. }H_1:\alpha\ne\beta\]

Now

\[ \begin{aligned} &f(\pmb{x}\vert\alpha,\beta) =\prod_{i=1}^n \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x_i^{\alpha-1}(1-x_i)^{\beta-1} = \\ &\left(\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\right)^n \left[\prod_{i=1}^nx_i\right]^{\alpha-1}\left[\prod_{i=1}^n(1-x_i)\right]^{\beta-1} \\ &\text{ }\\ &l(\alpha,\beta\vert\pmb{x}) = \\ &n\log \Gamma(\alpha+\beta)-n\log \Gamma(\alpha)-n\log \Gamma(\beta)+\\ &(\alpha-1)\sum_{i=1}^n\log x_i+(\beta-1)\sum_{i=1}^n\log (1-x_i) \end{aligned} \]
For the numerator of the LRT statistic we assume the null hypothesis is true: \(\alpha=\beta\), and estimate \(\alpha\) using Newton’s method. We also make use of some functions built into R: lgamma, digamma and trigamma compute the log of the gamma function and its first and second derivatives. So let \(g(x)=\log\Gamma(x)\), \(A=\sum_{i=1}^n\log x_i\) and \(B=\sum_{i=1}^n\log (1-x_i)\), then

\[ \begin{aligned} &h(t) =ng(2t)-2ng(t)+(t-1)(A+B) \\ &h'(t) =2ng'(2t)-2ng'(t)+A+B \\ &h''(t) =4ng''(2t)-2ng''(t) \end{aligned} \]

for the denominator we have

\[ \begin{aligned} &h(t,s) = ng(t+s)-ng(t)-ng(s)+(t-1)A+(s-1)B\\ &h_t(t,s) = ng'(t+s)-ng'(t)+A\\ &h_s(t,s) = ng'(t+s)-ng'(s)+B\\ &H[1,1] = ng''(t+s)-ng''(t)\\ &H[1,2] =H[2,1] = ng''(t+s)\\ &H[2,2] = ng''(t+s)-ng''(s)\\ \end{aligned} \] After computing the suprema we can find the test statistic. All of this is implemented here:

lrt.beta <- function (x, n=100, alpha=1, beta=1, 
                      Show=TRUE) {
    l <- function(alpha, beta) {
        n*(lgamma(alpha + beta) - lgamma(alpha) -
          lgamma(beta)) + (alpha-1)*A+(beta-1)*B
    }
    if (missing(x)) 
        x <- rbeta(n, alpha, beta)
    else n <- length(x)
    A <- sum(log(x))
    B <- sum(log(1 - x))
    real.alpha <- alpha
    k <- 0
    repeat {
        k <- k + 1
        alphaold <- alpha
        h <- 2*n*(digamma(2*alpha)-digamma(alpha))+A+B
        hprime <- 2*n*(2*trigamma(2*alpha) -
                         trigamma(alpha))
        alpha <- alpha - h/hprime
        if (abs(alpha - alphaold) < 10^(-5)) 
            break
        if (k > 50) 
            return(2)
    }
    if (Show) cat("Supremum under Null Hypothesis is at",
                  round(alpha, 3), "\n")
    num <- l(alpha, alpha)
    alpha <- real.alpha
    h <- c(0, 0)
    H <- matrix(0, 2, 2)
    k <- 0
    repeat {
        k <- k + 1
        xold <- c(alpha, beta)
        h[1] <- n*(digamma(alpha+beta) - 
                     digamma(alpha)) + A
        h[2] <- n*(digamma(alpha + beta) - 
                     digamma(beta)) + B
        H[1, 1] <- n*(trigamma(alpha + beta) -
                      trigamma(alpha))
        H[1, 2] <- n*trigamma(alpha+beta)
        H[2, 1] <- H[1, 2]
        H[2, 2] <- n*(trigamma(alpha+beta) -
                      trigamma(beta))
        xnew <- xold - solve(H) %*% cbind(h)
        if (sum(abs(xnew - xold)) < 10^(-5)) 
            break
        if (k > 50) 
            return(2)
        alpha <- xnew[1]
        beta <- xnew[2]
    }
    if (Show) {
        cat("MLE's are: ", round(c(alpha, beta), 3), "\n")
    }
    denom <- l(alpha, beta)
    TS <- (-2)*(num - denom)
    crit <- qchisq(0.95, 1)
    if (Show) {
        cat("-2log(LRT)=", round(TS, 3), ifelse(TS>crit,">","<")," crit val=", 
            round(crit, 3), "\n")      
        if(TS>crit) cat("Conclusion: reject null hypothesis\n")
        else cat("Conclusion: fail to reject null hypothesis\n")
        cat("p-value: ", round(1-pchisq(TS, 1), 4),"\n")
    }
    
}
lrt.beta(alpha=1.0, beta=1)
## Supremum under Null Hypothesis is at 1.148 
## MLE's are:  1.084 1.26 
## -2log(LRT)= 2.013 <  crit val= 3.841 
## Conclusion: fail to reject null hypothesis
## p-value:  0.156
lrt.beta(alpha=1.0, beta=1.5)
## Supremum under Null Hypothesis is at 1.272 
## MLE's are:  1.185 1.687 
## -2log(LRT)= 11.804 >  crit val= 3.841 
## Conclusion: reject null hypothesis
## p-value:  6e-04

Example (5.3.8)

Let \(X_1, ..., X_n\) be a sample from a population with density

\[f(x\vert\theta) = e^{\theta-x}\]

if \(x>\theta\) and 0 otherwise. We saw before in (5.3.4) that

\[ \lambda(\pmb{x}) = \left\{ \begin{array}{lll} 1&\text{if}&x_{(1)}\le \theta_0\\ \exp \left\{-n(x_{(1)}-\theta_0)\right\}&\text{if}&x_{(1)}> \theta_0\\ \end{array} \right. \]

therefore

\[ -2\log \lambda(\pmb{x}) = \left\{ \begin{array}{lll} 0&\text{if}&x_{(1)}\le \theta_0\\ 2n(x_{(1)}-\theta_0)&\text{if}&x_{(1)}> \theta_0\\ \end{array} \right. \]

and so

\[ \begin{aligned} &P(-2\log \lambda(\pmb{X})<x\vert\theta_0) = \\ &P(2n(X_{(1)}-\theta_0)<x) = \\ &P(X_{(1)}<x/2n+\theta_0) = \\ &1-\exp \left\{n[\theta_0-(x/2n+\theta_0)] \right\}=\\ &1-e^{-x/2} \end{aligned} \] or \(-2\log \lambda(\pmb{X})\sim Exp(1/2)=\chi^2(2)\ne \chi^2(1)\), and so here we have a case where Wilks' theorem fails. This is because the regularity conditions do not hold: the support of \(f(x\vert\theta)\) depends on \(\theta\).
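
A small simulation sketch illustrates this (values assumed): under \(\theta=\theta_0\) the rejection rate using the \(\chi^2(1)\) cutoff comes out around 0.15 rather than the nominal 0.05, while the \(\chi^2(2)\) cutoff gives about 0.05.

# distribution of -2*log(lambda) under theta = theta0 (assumed values)
n <- 10; theta0 <- 2.5; B <- 10000
x1 <- apply(matrix(rexp(n*B, 1) + theta0, B, n), 1, min)   # B sample minima
tstat <- 2*n*(x1 - theta0)                                 # -2*log(lambda)
c(mean(tstat > qchisq(0.95, 1)), mean(tstat > qchisq(0.95, 2)))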


Let’s consider the following general problem: we have data \(X_1, ..., X_n\) iid \(f(x\vert\theta)\), where \(\theta\) is a one-dimensional parameter. We wish to use the LRT for testing

\[H_0: \theta = \theta_0\text{ vs }H_1: \theta \ne \theta_0\]

Now for the denominator we need to find the mle of \(\theta\), that is we have to find the maximum of

\[\prod f(x_i\vert \theta)\]

or as we usually do, the maximum of

\[\sum \log\{f(x_i\vert\theta)\}\]

Usually we find the mle analytically, but if the density f is nice enough we can do this completely automatically:

lrtfun <- function (x, df, theta0, Int) {
    g <- function(theta) sum(log(df(x,theta)))
    thetahat <- optimize(g, Int, maximum =TRUE)$maximum
    chi <- 2*(g(thetahat)-g(theta0))
    1-pchisq(chi,1)
}

Example (5.3.9)

\(X_1, ..., X_n\sim N(\mu,\sigma)\), \(\sigma\) known

\[H_0: \mu = \mu_0\text{ vs }H_1: \mu\ne\mu_0\]

x <- rnorm(50)
lrtfun(x, df=dnorm, theta0=0, Int=c(-5,5))
## [1] 0.1691898
lrtfun(x, df=dnorm, theta0=0.25, Int=c(-5,5))
## [1] 0.001674667

Example (5.3.10)

\(X_1, ..., X_n\sim N(\mu,\sigma)\), \(\mu\) known,

\[H_0: \sigma = \sigma_0\text{ vs }H_1: \sigma\ne\sigma_0\]

x <- rnorm(50, 0, 4)
lrtfun(x,
       df=function(x, sig) {dnorm(x, 0, sig)},
       theta0=4, Int=c(0, 10))
## [1] 0.3967211
lrtfun(x,
       df=function(x, sig) {dnorm(x, 0, sig)},
       theta0=5, Int=c(0, 10))
## [1] 0.004906238

Example (5.3.11)

\(X_1, ..., X_n\sim Gamma(\alpha,\beta)\), \(\alpha\) known,

\[H_0: \beta = \beta_0\text{ vs }H_1: \beta \ne \beta_0\]

x <- rgamma(25, 1, 2)
lrtfun(x,
       df=function(x, b) {dgamma(x, 1, b)},
       theta0=2, Int=c(0, 5))
## [1] 0.2536346
lrtfun(x,
       df=function(x, b) {dgamma(x, 1, b)},
       theta0=2.5, Int=c(0, 5))
## [1] 0.0167621

Does this actually work? Let’s do a small simulation, testing for the standard deviation of a normal distribution:

pvals <- rep(0, 10000)
for(i in 1:10000)  
  pvals[i] <- lrtfun(x=rnorm(50, 0, 4), 
                df=function(x, sig) {dnorm(x, 0, sig)},
                theta0=4, Int=c(0, 10))
df <- data.frame(x=pvals)
bw <- 1/50 
ggplot(df, aes(x)) +
  geom_histogram(aes(y = ..density..),
    color = "black", 
    fill = "white", 
    binwidth = bw) + 
    labs(x = "x", y = "Density") 

The histogram appears flat (uniform), so the test achieves the nominal type I error probability.

And a case where the null is false:

pvals <- rep(0, 10000)
for(i in 1:10000)  
  pvals[i] <- lrtfun(x=rnorm(50, 0, 4.5), 
                df=function(x, sig) {dnorm(x, 0, sig)},
                theta0=4, Int=c(0, 10))
df <- data.frame(x=pvals)
bw <- 1/50 
ggplot(df, aes(x)) +
  geom_histogram(aes(y = ..density..),
    color = "black", 
    fill = "white", 
    binwidth = bw) + 
    labs(x = "x", y = "Density") 

Large Sample Tests based on the CLT

Suppose we wish to test a hypothesis about a real-valued parameter \(\theta\), and \(T_n\) is a point estimator of \(\theta\). Say \(\sigma_n\) is the standard deviation of \(T_n\). Now if some form of the CLT shows that

\[(T_n-\theta)/\sigma_n\]

converges in distribution to N(0,1) we can use this as a basis for a test.

Sometimes \(\sigma_n\) also depends on unknown parameters. In that case we can use an estimate of \(\sigma_n\) such as \(S_n\) instead.

A test based on

\[Z_n = (T_n-\theta)/S_n\]

is often called a Wald test.

Example (5.3.12)

Let \(X_1, ..., X_n\sim Ber(p)\).

Consider testing

\[H_0: p=p_0\text{ vs }H_1: p \ne p_0\]

The MLE of p is \(\hat{p}=\bar{x}\), so the CLT applies and states that for any p

\[\sqrt{n} \frac{\bar{x}-p}{\sqrt{p(1-p)}}\rightarrow N(0,1) \]

Of course we don’t know p, but again we can estimate the p’s in the denominator by \(\hat{p}\) and so we get a test with the test statistic

\[ Z_1=\sqrt{n}\frac{\hat{p}-p_0}{\sqrt{\hat{p}(1-\hat{p})} } \]

and we reject the null hypothesis if \(|Z_1| > z_{\alpha/2}\).

Instead of replacing the p’s in the denominator by \(\hat{p}\) we could also have used its value under the null hypothesis, p0. Then another test is based on

\[ Z_2=\sqrt{n}\frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)} } \]

which rejects the null hypothesis if \(|Z_2| > z_{\alpha/2}\).
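
As a quick illustration with made-up numbers (28 successes in 100 trials, testing \(p_0=0.2\)), here is how the two statistics are computed for a single data set:

n <- 100; p0 <- 0.2; k <- 28    # hypothetical data
phat <- k/n
Z1 <- sqrt(n)*(phat - p0)/sqrt(phat*(1-phat))
Z2 <- sqrt(n)*(phat - p0)/sqrt(p0*(1-p0))
round(c(Z1, Z2, qnorm(1-0.05/2)), 3)

Note that with these numbers the two tests actually disagree: \(|Z_2|\) exceeds 1.96 while \(|Z_1|\) does not.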

Which of these tests is better? Well, that depends on the power function.

bernoulli.power <- function(n=100, p0=0.2, 
                            B=10000, alpha=0.05) {
  crit <- qnorm(1-alpha/2)
  p <- seq(0.01, 0.39, length = 100)
  Pow1 <- 0*p
  for (i in 1:100) {
    x <- rbinom(B, size = n, prob = p[i])
    z <- sqrt(n)*(x/n-p0)/sqrt(x/n*(1-x/n))
    Pow1[i] <- mean(ifelse(abs(z)>crit, 1, 0))
  }
  Pow2 <- 1-(pbinom(n*p0+ crit*sqrt(n*p0*(1-p0)), n, p) -
             pbinom(n*p0-crit*sqrt(n*p0*(1-p0)), n, p))
  df <- data.frame(p=c(p, p),
                   Power=c(Pow1, Pow2),
                   which=rep(c("Z1", "Z2"), each=100))
  df
}
df <- bernoulli.power()
ggplot(data=df, aes(p, Power, color=which)) +
  geom_line()

Here the \(Z_1\) curve is found via simulation, whereas the \(Z_2\) power curve can be calculated directly.

As we see the power curves cross, so it depends on the true value of p which test is better.

If we suspect that p>0.2 we might prefer the \(Z_2\) test, otherwise the \(Z_1\) test.

Bayesian Hypothesis Testing

In the Bayesian framework hypothesis tests are based on

\[P(H_0\text{ is true }|\pmb{x})\]

and

\[P(H_1\text{ is true }|\pmb{x})\]

In many ways these probabilities are exactly what a researcher desires to know, but they can only be found at the price of a prior distribution.

A Bayesian hypothesis test might then reject the null hypothesis if

\[P(H_1 \text{ is true} | \pmb{x}) > 1-\alpha\]

Example (5.3.13)

Let \(X_1, ..., X_n\sim N(\mu,\sigma)\) and let the prior distribution on \(\mu\) be \(N(a,\tau)\), where \(\sigma\), \(a\) and \(\tau\) are known.

Say we wish to test

\[H_0: \mu\le\mu_0\text{ vs }H_1: \mu > \mu_0\]

It can be shown that then the posterior distribution \(\pi(\mu |\pmb{x})\) is

\[N\left(\frac{n \tau^2 \bar{x} +\sigma^2 a}{n \tau ^2+\sigma^2},\ \frac{\sigma\tau}{\sqrt{n \tau^2+\sigma^2}}\right)\]

Say we decide to accept

\(H_0\) if

\[P(H_0 \text{ is true} | \pmb{x}) \ge P(H_1 \text{ is true} | \pmb{x})\]

and reject \(H_0\) otherwise. So we accept \(H_0\) if

\[P(\mu \le \mu_0|\pmb{x}) \ge 1/2\]

Since \(\pi(\mu|\pmb{x})\) is symmetric, this is true iff the mean of \(\pi(\mu|\pmb{x})\) is less than or equal to \(\mu_0\).

Therefore (solving \(\frac{n \tau^2 \bar{x} +\sigma^2 a}{n \tau ^2+\sigma^2}\le\mu_0\) for \(\bar{x}\)) \(H_0\) is accepted if

\[\bar{x}\le\mu_0 + \sigma^2(\mu_0-a)/(n \tau^2)\]
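
Here is a minimal numerical sketch of this test (all values made up for illustration):

# hypothetical numbers: prior mean a, prior sd tau, known sigma
n <- 25; sigma <- 1; tau <- 0.5; a <- 0; mu0 <- 0
x <- rnorm(n, 0.3, sigma)
post.mean <- (n*tau^2*mean(x) + sigma^2*a)/(n*tau^2 + sigma^2)
post.sd <- sigma*tau/sqrt(n*tau^2 + sigma^2)
pnorm(mu0, post.mean, post.sd)   # P(mu <= mu0 | x); accept H0 if >= 1/2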

Tests based on Simulation

The idea here is very simple: generate lots of simulated data from the same distribution as the real data, assuming the null hypothesis is true. For each run compute the corresponding test statistic, and then compare these values to the one from the data.

Example (5.3.14)

Say X1, .., Xn are iid Pois(\(\lambda\)). We wish to test

\[H_0: \lambda = \lambda_0\text{ vs }H_1: \lambda\ne\lambda_0\]

We know that the sample mean \(\bar{x}\) is the mle of \(\lambda\), and so we can base a test on \(\bar{x}\): we reject \(H_0\) if \(\bar{x}\) is too far from \(\lambda_0\).

For this we compute B simulated data sets \(X'_1, ..., X'_n\sim Pois(\lambda_0)\), compute their sample means and find the \(\alpha/2\) and the \(1-\alpha/2\) quantiles.

This will give us the critical values cv1 and cv2, and we reject \(H_0\) if either

\(\bar{x} < cv_1\) or \(\bar{x} > cv_2\)
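
Here is a minimal sketch of this test, with made-up data generated from Pois(11) and the same n, \(\lambda_0\) and \(\alpha\) as in the power study below:

n <- 50; lambda0 <- 10; alpha <- 0.05; B <- 10000
x <- rpois(n, 11)                                           # the "observed" data
sim.means <- rowMeans(matrix(rpois(n*B, lambda0), B, n))    # B simulated sample means under H0
cv <- quantile(sim.means, c(alpha/2, 1-alpha/2))
mean(x) < cv[1] | mean(x) > cv[2]                           # TRUE means reject H0

The routine below compares the power of this simulation-based test with the exact test discussed earlier: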

poissim.power <- function(lambda0=10, n=50, 
                         B=1000, alpha=0.05) {
  xbar <- apply(matrix(rpois(n*B, lambda0), B, n), 1, mean)
  cv <- as.numeric(quantile(xbar, c(alpha/2, 1-alpha/2)))
  lambda <- seq(max(0, lambda0-5*sqrt(lambda0/n)), 
              lambda0+5*sqrt(lambda0/n), length = 100)
  Pow1 <- 0*lambda
  for(i in 1:100) {
    xbar <- apply(matrix(rpois(n*B, lambda[i]), B, n), 
                1, mean)
    Pow1[i] <- sum(ifelse(xbar<cv[1] | xbar>cv[2], 1, 0))/B
  }
  Pow2 <- 1-(ppois(qpois(1-alpha/2, n*lambda0), n*lambda) - 
        ppois(qpois(alpha/2, n*lambda0), n*lambda))
  df <- data.frame(lambda=c(lambda, lambda),
        Power=c(Pow1, Pow2),
        which=rep(c("Simulation", "Exact"), each=100))
  df
}
df <- poissim.power()
ggplot(data=df, aes(lambda, Power, color=which)) +
  geom_line()

The blue curve is the exact test we discussed earlier, and we see that the simulation test is just about as good.

There are also some tests that are built entirely on simulation. Say we have X1, .., Xn from some distribution f(x) with mean \(\mu_1\) and Y1, .., Ym from the same type of distribution but with mean \(\mu_2\). We wish to test

\[H_0: \mu_1=\mu_2\text{ vs }H_1: \mu_1\ne\mu_2\]

Now under the null hypothesis the two samples come from the exact same distribution with the same mean. So their order is completely random, and if we find the test statistic

\[ T=\bar{x}-\bar{y} \]

it should have a mean of 0. We could now proceed as above, generating data from f etc., but here is another idea:

  • put the x’s and the y’s in one vector of length n+m
  • find a random permutation of this vector
  • split it up into the first n and the last m numbers, calling them x’ and y’

Because under \(H_0\) any rearrangement of the data is just as likely as any other, this new data set is just as good as the original one. So we can now find T’ from these observations. Because we have completely mixed the X’s and the Y’s, \(E[T']=0\). Generating many such T’ values gives us an idea of the “likely” values of T under \(H_0\), and we can compare the value from the real data set to them.

This is called a permutation test.

Example (5.3.15)

Say we have \(X_1, ..., X_n\sim N(\mu_1,\sigma)\) and \(Y_1, ..., Y_m\sim N(\mu_2,\sigma)\).

Now the LRT test for this is based on

\[T= \frac{\bar{x}-\bar{y}}{s\sqrt{1/n+1/m}}\]

where

\[s^2= \frac{(n-1)s_x^2+(m-1)s_y^2}{n+m-2}\]

which has a t distribution with n+m-2 degrees of freedom, and we reject \(H_0\) if \(|T|>qt(1-\alpha/2, n+m-2)\).

Let’s compare this test (which is known to be optimal) to a permutation test.

The 2-sample t test is already implemented in R in t.test, and we can find its power either via simulation or analytically. The permutation test is implemented in

perm.test <- function (x, y, alpha=0.05, B = 1000) {
  n <- length(x)
  m <- length(y)
  Tdata <- mean(x) - mean(y)
  Tsim <- rep(0, B)
  for (i in 1:B) {
    xy <- sample(c(x, y), size = n + m)
    Tsim[i] <- mean(xy[1:n]) - mean(xy[(n + 1):(n + m)])
  }
  length(Tsim[abs(Tsim) > abs(Tdata)])/B
}

as an example:

x <- rnorm(50, 10, 2)
y <- rnorm(40, 10, 2)
t.test(x, y)$p.value
## [1] 0.6053234
perm.test(x, y)
## [1] 0.603
y <- rnorm(40, 11, 2)
t.test(x, y)$p.value
## [1] 0.008007896
perm.test(x, y)
## [1] 0.007
perm.power <- function(mu=c(0, 0), sigma=c(1, 1), 
                       n=c(50, 50), alpha=0.05, B=1000) {
  pvals <- matrix(0, B, 2)
  for(i in 1:B) {
    x <- rnorm(n[1], mu[1], sigma[1])     
    y <- rnorm(n[2], mu[2], sigma[2])     
    pvals[i, 1] <- t.test(x, y)$p.value
    pvals[i, 2] <- perm.test(x, y)
  }
  c(sum(pvals[, 1]<alpha), sum(pvals[, 2]<alpha))/B
}
mu <- seq(0, 0.8, length=10)
power <- matrix(0, 10, 3)
colnames(power) <- c("mu2", "t-test", "permutation")
power[, 1] <- mu
for(i in 1:10) 
  power[i, 2:3] <- perm.power(mu=c(0, mu[i]))
power <- round(power, 3)
kable.nice(power, do.row.names = FALSE)
mu2 t-test permutation
0.000 0.053 0.052
0.089 0.066 0.065
0.178 0.132 0.127
0.267 0.258 0.257
0.356 0.392 0.392
0.444 0.561 0.556
0.533 0.743 0.742
0.622 0.856 0.857
0.711 0.939 0.938
0.800 0.976 0.977

and we see that the permutation test has a power quite similar to the t test.