Example Say we have X1, .., Xn ~ Pois(λ) and we want to test H0: λ=λ0 vs. H1: λ > λ0
We know that X̅ is the mle of λ, so a test based on X̅
seems reasonable.
Clearly large values of X̅ will indicate that the alternative is more likely to be true than the null hypothesis, so a reasonable rejection region is {X̅>cv}. To find cv we need to solve the equation
α = P(X̅>cv | λ=λ0) = 1 - P( X1+..+Xn ≤ n·cv | λ=λ0),
but under the null hypothesis
Xi~P(λ0)
so
Sn = X1+..+Xn ~ P(nλ0)
so cv = qpois(1-α,nλ0)/n
The p-value of the test is
p = P(Y > sn | λ=λ0)
where Y ~ P(nλ0) and sn = x1+..+xn is the observed value of Sn.
What if we want to test
H0: λ=λ0 vs. H1: λ≠λ0
Now the critical region could be
{X̅<cv1 or X̅>cv2}
Again the choice of cv1 and cv2 is not unique. For example, cv1 = 0 and cv2 = qpois(1-α,nλ0)/n would work. As before we can divide α in two:
cv1 = qpois(α/2,nλ0)/n and cv2 = qpois(1-α/2,nλ0)/n.
The p-value is now found by combining the probabilities of both tails, for example p = 2·min{ P(Y ≤ sn), P(Y ≥ sn) } with Y ~ P(nλ0) as before.
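Continuing the same sketch, the two-sided version could be done as follows (still with the made-up values from above):

```r
# two-sided version, continuing the one-sided sketch
cv1 <- qpois(alpha/2, n*lambda0)/n
cv2 <- qpois(1 - alpha/2, n*lambda0)/n
# both tail probabilities P(Y <= sn) and P(Y >= sn), doubled and capped at 1
p.value <- min(1, 2*min(ppois(sn, n*lambda0), 1 - ppois(sn - 1, n*lambda0)))
c(cv1 = cv1, cv2 = cv2, p.value = p.value)
```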
So the idea here is this:
• Find a point estimator for the parameter in question, say S
• Find a function of S, say T(S), such that you know the distribution of T(S) and can solve an equation of the form
α = P(T(S) > c) (or < c, depending on the alternative)
for the critical value c
A special case of this idea uses the following
Definition
A function T of the data and the parameters is called a pivot if the distribution of T does not depend on the parameters
Example: Say X1, .., Xn ~ N(μ,σ), then
T = √n(X̅ -μ)/σ ~ N(0,1)
and therefore is a pivot.
Example: say we have the following sample from a N(μ,10) and we want to test at the 5% level
H0: μ=50 vs Ha: μ>50
66.5 55.3 78 63.1 65.5 51.2 56.6 64.9 71.8 58.4 58.7 70.4 53.7 66 48.1 60.6 62.9 70.6 44.8 54.4
54.4 37.2 77.1 48.7 55.8
Now we should reject H0 if X̅>c, which is equivalent to T = √n(X̅-50)/10 > c', and so
0.05 = α = P(T>c')
so c' = z0.05 = 1.65
Now X̅ = 59.788, so T=√25(59.788-50)/10 = 4.894 > 1.65,
so we reject the null hypothesis.
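Here is the same calculation in R, using the data above:

```r
x <- c(66.5, 55.3, 78, 63.1, 65.5, 51.2, 56.6, 64.9, 71.8, 58.4,
       58.7, 70.4, 53.7, 66, 48.1, 60.6, 62.9, 70.6, 44.8, 54.4,
       54.4, 37.2, 77.1, 48.7, 55.8)
T <- sqrt(length(x))*(mean(x) - 50)/10   # test statistic, about 4.89
qnorm(1 - 0.05)                          # critical value, about 1.645
T > qnorm(1 - 0.05)                      # TRUE, so reject H0
```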
A test based on Zn = (Tn-θ0)/Sn, where Tn is an estimator of θ and Sn is its estimated standard error, is often called a Wald test.
Example : Let X1, .., Xn be iid Ber(p). Consider testing
H0: p=p0 vs. Ha: p≠p0
The MLE of p is p̂ = X̅, so the CLT applies and states that for any p
√n(p̂ - p)/√(p(1-p)) → N(0,1)
Of course we don't know p, but again we can estimate the p's in the denominator by p̂ and so we get a test with the test statistic
Zn = √n(p̂ - p0)/√(p̂(1-p̂))
and we reject the null hypothesis if |Zn| > zα/2.
Instead of replacing the p's in the denominator by p̂ we could also have used p0. Then another test is based on
Z'n = √n(p̂ - p0)/√(p0(1-p0))
which rejects the null hypothesis if |Z'n| > zα/2.
Which of these tests is better? Well, that depends on the power function. Here are the curves for p0=0.2, the one for Zn in blue:
As we see the power curves cross, so it depends on the true value of p which test is better. If we suspect that p>0.2 we might prefer the Z' test, otherwise the Z test.
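Here is a sketch of how such power curves can be computed: since ∑Xi ~ Bin(n,p), the power of either test is just a sum of binomial probabilities over the rejection region. The notes do not say what sample size was used for the figure, so n = 100 below is simply an assumption:

```r
# power curves of the two tests for p0 = 0.2; n = 100 is an assumed sample size
n <- 100; p0 <- 0.2; z <- qnorm(1 - 0.05/2)
k <- 0:n; phat <- k/n
rejZ  <- abs(sqrt(n)*(phat - p0)/sqrt(phat*(1 - phat))) > z   # p's estimated by phat
rejZp <- abs(sqrt(n)*(phat - p0)/sqrt(p0*(1 - p0))) > z       # p's replaced by p0
power <- function(p, rej) sum(dbinom(k[rej], n, p))           # P(reject | p)
p.grid <- seq(0.05, 0.40, by = 0.005)
plot(p.grid, sapply(p.grid, power, rej = rejZ), type = "l", col = "blue",
     xlab = "p", ylab = "power")                              # Zn test in blue
lines(p.grid, sapply(p.grid, power, rej = rejZp))             # Z'n test
```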
Definition
Say we want to test H0: θ ∈ Θ0 vs. H1: θ ∈ Θ0^c. Then the likelihood ratio test statistic is defined by
λ(x) = sup{L(θ|x): θ ∈ Θ0} / sup{L(θ|x): θ ∈ Θ}
A likelihood ratio test (LRT) is any test that has a rejection region of the form {x: λ(x) ≤ c}
The constant c here is not important, it will be found once we decide on the type I error probability α. It may be better to think of this as
"reject H0 if λ(x) is small"
Note that the supremum in the denominator is found over the whole parameter space, so this is just like finding the mle, and then finding the corresponding value of the likelihood function.
Note that in the numerator we find the supremum over a subset of the one used in the denominator, so we always have
λ(x) ≤ 1
The logic of the LRT is this: In the denominator we have the likelihood of observing the data we did observe, given the most favourable parameters possible (the mle). In the numerator we find the likelihood assuming the null hypothesis is true. If their ratio is much smaller than 1, then there are parameters outside the null hypothesis which are much more likely than any in the null hypothesis, and we would reject the null hypothesis.
Example : Let X1, .., Xn be a sample from N(θ,1). Consider testing H0: θ=θ0 vs. H1: θ≠θ0. Here Θ0={θ0} and so the numerator of λ(x) is L(θ0|x). For the denominator we have to find sup{L(θ|x): θ ∈ ℝ}, which is attained at the mle θ̂ = X̅,
and so
λ(x) = L(θ0|x)/L(X̅|x) = exp{-½∑(xi-θ0)² + ½∑(xi-X̅)²} = exp{-n/2(X̅-θ0)²}
because ∑(xi-θ0)² = ∑(xi-X̅)² + n(X̅-θ0)².
Now an LRT test rejects the null hypothesis if λ(X) < c for some constant c, which depends on the choice of α. Again it is best to think of the test as rejecting H0 if "λ(X) is small". But
λ(X) = exp{-n/2(X̅-θ0)²} is small iff
-n/2(X̅-θ0)² is small iff
(X̅-θ0)² is large iff
|X̅-θ0| is large, say |X̅-θ0|>c
In other words the LRT test rejects the null hypothesis if λ(X) is small, which is equivalent to |X̅-θ0| being large.
What is the constant c? It depends on α, namely
α = P(|X̅-θ0| > c | θ=θ0) = P(√n|X̅-θ0| > √n·c) = 2(1 - Φ(√n·c))
so c = zα/2/√n
For example , say n=10 and we want α=0.05, then c = z0.025/√10 = 1.96/√10 = 0.62
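A quick check in R:

```r
# critical value for n = 10 and alpha = 0.05
qnorm(1 - 0.05/2)/sqrt(10)   # about 0.62
```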
Example : You are at a charity event. There is going to be a raffle, with the grand prize a car! A ticket costs $100, so you decide to buy one if your chance of winning is at least 1 in 100. The tickets are numbered from 1 to N. You walk around a little and you see the following tickets:
21 45 68 91
Should you buy a ticket?
If a total of N tickets are sold, your chance of winning is 1/N, so you should buy one if N≤100. Therefore we need to do a hypothesis test
H0: N0≤100 vs Ha: N0>100
What is our probability model? Let Xi be the number on the ith ticket. The exact distribution of X1, .., Xn is a bit complicated because each ticket is unique, so Xi ≠ Xj. We will simplify this a bit by ignoring this issue. Then we can assume that the Xi are independent and
X1, .., Xn ~ U{1,..,N}, or
f(x|N) = 1/N · I{1,..,N}(x)
so that, with M = max{x1,..,xn}, the likelihood function is
L(N|x) = N^-n · I{M,M+1,..}(N)
The denominator of λ(x) is sup L(N|x) = L(M|x) = M^-n (the mle of N is M). For the numerator we evaluate the likelihood at the boundary of the null hypothesis, N0 = 100, so
λ(x) = L(N0|x)/L(M|x) = (M/N0)^n · I{M,..}(N0)
Note that I{M,..}(M) = 1 always but I{M,..}(N0) could be 0 or 1, depending on whether N0 ≥ M or not.
Here is what this looks like as a function of M:
so here
"λ(x) is small" is the same as "M is small or M is large"
Because we are interested in the alternative N0>100 we can ignore the "M is small" option, and we get the rejection region
reject H0 if M>c
This of course makes perfectly good sense: if we see a ticket with a large number, this tells us that many tickets have been sold and we should not buy one!
So, how about c? As always it depends on α: under H0 with N = N0 we have P(M ≤ c) = (c/N0)^n, so
α = P(M > c) = 1 - (c/N0)^n, or c = N0(1-α)^(1/n)
Using α=0.05 we find
c = N0(1-α)^(1/n) = 100(1-0.05)^(1/5) = 98.97 ≈ 99
so as long as the largest ticket number we see is 98 or less, we should buy one. Here M = 91, so we do not reject H0 and should buy a ticket.
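The same calculation in R, with M = 91 the largest ticket number we saw and n = 5 as in the calculation above:

```r
alpha <- 0.05; N0 <- 100; n <- 5   # n as used in the calculation above
M <- 91                            # largest ticket number observed
cv <- N0*(1 - alpha)^(1/n)         # critical value, about 99
M > cv                             # FALSE, so we do not reject H0
```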
Theorem
Suppose X1, .., Xn are iid f(x|θ) and we wish to test
H0: θ ∈ Θ0 vs. H1: θ ∈ Θ0^c
Then under some regularity conditions the distribution of -2logλ(X) converges to the distribution of a χ²(p). Here p is the difference between the number of free parameters in Θ and the number of free parameters in Θ0.
Example We flip a coin 1000 times and find 545 heads. Test at the 5% level whether this is a fair coin.
We have
X1, .., Xn~Ber(p)
and we want to test
H0:p=p0 vs Ha:p≠p0
Let's find the LRT test for this problem. First we have, with k = ∑xi,
L(p|x) = p^k·(1-p)^(n-k)
The mle is p̂ = k/n, and so
λ(x) = p0^k·(1-p0)^(n-k) / [ p̂^k·(1-p̂)^(n-k) ]
Here is what that looks like for n=1000 and p0=0.5:
and it is clear that
λ(x) is small iff k small or large relative to np0 iff |k-np0| large
Now let Y = ∑Xi~Bin(n,p0), so
α = P(|Y-np0|>cv) =
1-P(|Y-np0|≤cv) =
1- P(-cv ≤ Y-np0 ≤ cv) =
1- P(np0-cv ≤ Y ≤ np0+cv)
For our test we have n=1000, p0=0.5 so np0 = 500. Now
cv | Prob |
---|---|
20 | 0.097 |
21 | 0.087 |
22 | 0.077 |
23 | 0.069 |
24 | 0.061 |
25 | 0.053 |
26 | 0.047 |
27 | 0.041 |
28 | 0.036 |
29 | 0.031 |
30 | 0.027 |
and using α=0.05 we find cv=26, and so we reject the null hypothesis because
|k-np0| = |545-1000·0.5| = 45 > 26
We conclude that the coin is not fair.
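R also has this exact test built in as binom.test, which here leads to the same conclusion:

```r
# exact two-sided binomial test for 545 heads in 1000 flips
binom.test(545, 1000, p = 0.5)   # p-value well below 0.05, so we reject H0
```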
How about using the chisquare approximation? In that case
T = -2logλ(x) = 2[ k·log(k/(np0)) + (n-k)·log((n-k)/(n(1-p0))) ] ~ χ²(1)
so the critical value is cv = qchisq(1-α, 1) = 3.84,
and again we reject H0, now because
T = 8.11 > cv = 3.84
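The statistic T is easily computed in R:

```r
n <- 1000; k <- 545; p0 <- 0.5
phat <- k/n
T <- 2*(k*log(phat/p0) + (n - k)*log((1 - phat)/(1 - p0)))   # -2 log lambda
T                          # about 8.11
qchisq(1 - 0.05, df = 1)   # critical value, about 3.84
```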
Notice that if the null hypothesis specifies just one parameter value, we have
T = -2logλ(x) ~ χ²(1)
and we reject H0 if T>c. But if Z~N(0,1), then Z²~χ²(1), so
α = P(T>c) = P(Z²>c) = 1-P(Z²<c) = 1-P(-√c<Z<√c) = 1-(2Φ(√c)-1)
Φ(√c) = 1-α/2
√c = zα/2
c = zα/2²
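For α = 0.05 this gives the familiar numbers:

```r
qnorm(1 - 0.05/2)^2   # z_{alpha/2} squared, about 3.84
qchisq(1 - 0.05, 1)   # the chi-square(1) critical value, the same number
```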
Why do we have this approximation? In essence it is the following idea: the log of the likelihood ratio is
logλ(x) = ∑ [ log f(xi|θ0) - log f(xi|θ̂) ]
so the log-likelihood ratio is (essentially) a sum of iid rv's, and a central limit theorem type argument applies.
Example : (two-sample problem)
In a company two employees perform the same task. The company wants to find out whether it takes them the same time to do the task. So they time them and find (in minutes):
Employee 1: 7.3 8.5 8.7 9.0 9.2 9.3 9.4 9.5 9.6 9.6 9.7 9.8 9.9 9.9 10.0 10.1 10.3 10.5 10.8 11.2
Employee 2: 8.7 9.6 9.7 10.1 10.1 10.4 10.6 10.8 10.8 10.9 10.9 11.2 11.3 11.4 11.4 12.0 12.2
Say the true mean time for employee 1 is μ and for employee 2 it is τ. Then we want to test
H0: μ=τ vs. Ha:μ≠τ
Here is a first look at the data:
From this it seems reasonable to model the times as coming from a normal distribution. So we have
X1, .., Xn~N(μ,σx), Y1, .., Ym~N(τ,σy)
Moreover
sd(x)=0.86
sd(y)=0.89
and so we will assume that σx=σy=σ
Finally to keep things easy we will assume that σ=0.875 is known. Then we find
X̅ = 9.615
Y̅ = 10.712
X̅Y̅ = 10.119 (the mean of the combined sample)
∑(x-X̅)² = 13.9
∑(y-Y̅)² = 12.658
∑(x-X̅Y̅)² = 18.984
∑(y-X̅Y̅)² = 18.633
-2logλ(x,y) = ( 18.984 + 18.633 - 13.9 - 12.658 )/0.875² = 14.4
Here we have p = 2 - 1 = 1, so if we test at the 5% level we again find c = 3.84, but
T = 14.4 > 3.84
so we reject the null hypothesis.
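Here is the whole calculation in R, using the data above and σ = 0.875:

```r
x <- c(7.3, 8.5, 8.7, 9.0, 9.2, 9.3, 9.4, 9.5, 9.6, 9.6,
       9.7, 9.8, 9.9, 9.9, 10.0, 10.1, 10.3, 10.5, 10.8, 11.2)
y <- c(8.7, 9.6, 9.7, 10.1, 10.1, 10.4, 10.6, 10.8, 10.8,
       10.9, 10.9, 11.2, 11.3, 11.4, 11.4, 12.0, 12.2)
sigma <- 0.875
xy <- mean(c(x, y))                    # mean of the combined sample
T <- (sum((x - xy)^2) + sum((y - xy)^2) -
      sum((x - mean(x))^2) - sum((y - mean(y))^2))/sigma^2
T                        # about 14.4
qchisq(0.95, df = 1)     # about 3.84
```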
A little more:
Of course this is a bit wild: first, is it true that
• the difference of two independent chisquare rv's is chisquare? Actually, yes
• but is Sxy independent of Sx and Sy? Actually, no, but it works here because of the normal distribution.
Notice that in this problem ultimately the likelihood ratio statistic is a function of the sample variances. This is still true if we had three or more groups, and then this type of problem is appropriately called Analysis of Variance.