We have data X1, ..., Xn ~ Ber(p) and we want to develop a confidence interval for p.
First we need a corresponding hypothesis test:
H0: p = p0 vs Ha: p ≠ p0
If half-open intervals ("upper limits") are desired, use the corresponding one-sided alternative hypothesis.
Let
S = X1 + ... + Xn
then S ~ Bin(n,p)
Note: in this example S is the obvious statistic to base a test on because S/n is an unbiased, sufficient, and consistent estimator of p.
Next we need a rejection region, that is we need to decide what values of S will lead us to reject the null hypothesis.
And we get to the first fork in the road:
Let Y~Bin(n,p0), reject H0 if P(Y<x|p0) < α/2 or if P(Y>x|p0) < α/2
Example
n=20, p0=0.3, α=0.05
if x=3 P(Y<3|p=0.3)=P(Y≤2|p=0.3) = pbinom(2,20,0.3) = 0.0354 > 0.025 and so we fail to reject the null hypothesis.
if x=2 P(Y<2|p=0.3)=P(Y≤1|p=0.3) = pbinom(1,20,0.3) = 0.0076 < 0.025 and so we reject the null hypothesis.
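The decision rule above can be checked directly in base R (a minimal sketch, using only the quantities defined in this example):

```r
# Exact test of H0: p = p0 vs Ha: p != p0:
# reject if either tail probability at the observed x is below alpha/2
n <- 20; p0 <- 0.3; alpha <- 0.05
reject <- function(x) {
  pbinom(x - 1, n, p0) < alpha/2 ||                 # P(Y < x | p0)
    pbinom(x, n, p0, lower.tail = FALSE) < alpha/2  # P(Y > x | p0)
}
reject(3)  # FALSE: 0.0354 > 0.025
reject(2)  # TRUE:  0.0076 < 0.025
```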
Let
X = (S - np0)/√(np0(1-p0))
and reject H0 if |X| > zα/2, where zα/2 is the 1-α/2 quantile of a standard normal. This idea is of course based on the central limit theorem.
Example
n=20, p0=0.3
z0.975 = qnorm(0.975) = 1.96
np0 = 6, √(np0(1-p0)) = 2.05
|X| = |(S-6)/2.05|
x | |X| |
---|---|
0 | 2.9268293 |
1 | 2.4390244 |
2 | 1.9512195 |
3 | 1.4634146 |
4 | 0.9756098 |
5 | 0.4878049 |
6 | 0.0000000 |
7 | 0.4878049 |
8 | 0.9756098 |
9 | 1.4634146 |
10 | 1.9512195 |
11 | 2.4390244 |
12 | 2.9268293 |
13 | 3.4146341 |
14 | 3.9024390 |
15 | 4.3902439 |
16 | 4.8780488 |
17 | 5.3658537 |
18 | 5.8536585 |
19 | 6.3414634 |
20 | 6.8292683 |
and so we reject H0 if x<2 or x>10
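The table and the rejection region can be reproduced in base R:

```r
# Standardized statistic |X| for every possible value of x,
# and the x's falling in the rejection region |X| > z_{alpha/2}
n <- 20; p0 <- 0.3
x <- 0:n
X <- abs((x - n * p0) / sqrt(n * p0 * (1 - p0)))
x[X > qnorm(0.975)]  # 0 1 11 12 ... 20
```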
Now we have a test. To get to a confidence interval we have to "invert" the test: the interval will contain all the parameter values that would have led to accepting the null hypothesis given the observed data.
For a fixed x find p1 so that P(Y < x | p1) = 1-α/2 and p2 so that P(Y > x | p2) = 1-α/2 (equivalently P(Y ≤ x | p2) = α/2); then (p1, p2) is the confidence interval.
Example x=3, n=20, 95% CI, so α=0.05
P(Y<3|p1) = 1-α/2 =1-0.025 = 0.975 = pbinom(2,20,p1)
we find
pbinom(2,20,0.0321)=0.97494, so p1=0.0321
pbinom(3,20,0.379) = 0.02496, so p2=0.379
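Instead of trial and error, one can let uniroot do the search (a sketch of the numerical routine mentioned below):

```r
# Numerically invert the two tail equations with uniroot
n <- 20; x <- 3; alpha <- 0.05
p1 <- uniroot(function(p) pbinom(x - 1, n, p) - (1 - alpha/2),
              interval = c(1e-6, 1 - 1e-6))$root
p2 <- uniroot(function(p) pbinom(x, n, p) - alpha/2,
              interval = c(1e-6, 1 - 1e-6))$root
round(c(p1, p2), 4)  # 0.0321 0.3789
```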
This is called the Clopper-Pearson method; it was introduced by Clopper and Pearson in 1934.
Above we found p1 and p2 via trial and error. One could also write a numerical routine to do that. Or one can use the relationship between the binomial and beta distributions,
P(Y ≥ x | p) = P(Beta(x, n-x+1) ≤ p),
and so p1 = qbeta(α/2, x, n-x+1) = qbeta(0.025, 3, 18) = 0.0321
p2 = qbeta(1-α/2, x+1, n-x) = qbeta(0.975, 4, 17) = 0.379
This is also done by my routine
CIber("CP",x=3,n=20)
or by the built-in R routine
binom.test(3,20)
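The qbeta formulas and binom.test should agree:

```r
# binom.test reports the Clopper-Pearson interval
ci <- binom.test(3, 20)$conf.int
round(as.numeric(ci), 4)  # 0.0321 0.3789
c(qbeta(0.025, 3, 18), qbeta(0.975, 4, 17))
```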
the acceptance region of the approximate test is
|p̂ - p0| ≤ zα/2 √(p̂(1-p̂)/n), where p̂ = S/n and p0 in the variance has been replaced by its estimate p̂,
so we need to solve the equations
p0 = p̂ ± zα/2 √(p̂(1-p̂)/n)
Example say we have n=20 and observe x=3, then a 95% confidence interval for p is (-0.006, 0.306)
CIber("Wald",x=3,n=20,corr=0)
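A minimal sketch of this computation in base R:

```r
# Wald interval: phat +/- z * sqrt(phat * (1 - phat) / n)
n <- 20; x <- 3
phat <- x / n
z <- qnorm(0.975)
ci <- phat + c(-1, 1) * z * sqrt(phat * (1 - phat) / n)
round(ci, 3)  # -0.006  0.306
```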
This is called a Wald type interval (because we replace Var(X) with an estimate). It was introduced in 1948.
We already see one obvious "bad" feature: the lower bound is negative. First of all, p can never be negative; more than that, because we observed x=3 we know p must be positive.
We can work a little harder: instead of plugging p̂ into the variance, keep p0 and solve the quadratic inequality |p̂ - p0| ≤ zα/2 √(p0(1-p0)/n) for p0. This yields the Wilson (score) interval.
Example say we have n=20 and observe x=3, then a 95% confidence interval for p is (0.052, 0.360)
CIber("Wilson",x=3,n=20,corr=0)
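Solving that quadratic gives a closed form; a minimal sketch of the standard Wilson score computation (which should match the CIber call above):

```r
# Wilson interval: solve |phat - p| = z * sqrt(p * (1 - p) / n) for p
n <- 20; x <- 3
phat <- x / n
z <- qnorm(0.975)
center <- (phat + z^2 / (2 * n)) / (1 + z^2 / n)
halfw  <- z * sqrt(phat * (1 - phat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
round(center + c(-1, 1) * halfw, 3)  # 0.052 0.360
```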
We are approximating a discrete random variable by a continuous one, so it may be a good idea to correct for that a bit by subtracting 1/2 from x (for the lower bound) and adding 1/2 (for the upper bound). This continuity correction can be applied to both the Wald and the Wilson type intervals.
CIber("Wald",x=3,n=20,corr=1/2) (-0.02, 0.342)
CIber("Wilson",x=3,n=20,corr=1/2) (0.0396, 0.389)
or we can use built-in function prop.test
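prop.test applies the continuity correction by default; with correct = FALSE it returns the plain Wilson interval:

```r
# Built-in: Wilson interval with and without continuity correction
prop.test(3, 20, correct = TRUE)$conf.int   # continuity-corrected
prop.test(3, 20, correct = FALSE)$conf.int  # plain Wilson
```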
Note there is a similar adjustment for Clopper-Pearson intervals called mid-p intervals, which we won't have time to discuss
How do we pick between these? First we need to make sure that the methods have coverage, that is, that the actual probability that the interval contains the true p is at least the nominal 95%.
Example
say we have n=20 and observe x=3, so x/n=0.15 and the true p is probably between 0.05 and 0.3. Let's draw the coverage graphs:
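Because Y is discrete, the coverage at any fixed p can be computed exactly rather than simulated; a sketch for the Wald and Clopper-Pearson intervals (the other methods work the same way):

```r
# Exact coverage: sum dbinom(x) over those x whose interval contains p
n <- 20; z <- qnorm(0.975)
wald <- function(x) {
  phat <- x / n
  phat + c(-1, 1) * z * sqrt(phat * (1 - phat) / n)
}
cp <- function(x) {
  c(if (x == 0) 0 else qbeta(0.025, x, n - x + 1),
    if (x == n) 1 else qbeta(0.975, x + 1, n - x))
}
coverage <- function(p, ci.fun) {
  ci <- sapply(0:n, ci.fun)
  sum(dbinom(0:n, n, p) * (ci[1, ] <= p & p <= ci[2, ]))
}
coverage(0.15, wald)  # about 0.82, well below the nominal 0.95
coverage(0.15, cp)    # at least 0.95, by construction
```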
so the only two that "work" are Clopper-Pearson and Wilson with continuity correction.
How do we choose between these two? One possible criterion is the expected length of the interval:
drawn here:
so for p around 0.15 they are pretty much the same.
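The expected length is also an exact finite sum; a sketch for the Clopper-Pearson interval (the continuity-corrected Wilson interval is handled analogously):

```r
# Expected length at p: sum over x of dbinom(x, n, p) * (upper - lower)
n <- 20
cp <- function(x) {
  c(if (x == 0) 0 else qbeta(0.025, x, n - x + 1),
    if (x == n) 1 else qbeta(0.975, x + 1, n - x))
}
exp.length <- function(p) {
  ci <- sapply(0:n, cp)
  sum(dbinom(0:n, n, p) * (ci[2, ] - ci[1, ]))
}
exp.length(0.15)
```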