Bernoulli - Binomial Distribution

We have data X1,…,Xn ~ Ber(p) and we want to develop a confidence interval for p.

First we need a corresponding hypothesis test:

H0: p = p0 vs Ha: p ≠ p0


If half-open intervals ("upper limits") are desired, use the corresponding one-sided alternative hypotheses.

Let S = X1 + … + Xn = ∑ Xi

then S ~ Bin(n, p)

Note: in this example S is the obvious choice to base a test on because S/n is an unbiased, sufficient and consistent estimator of p

Next we need a rejection region, that is we need to decide what values of S will lead us to reject the null hypothesis.

And we get to the first fork in the road:

Idea 1:

Let Y~Bin(n,p0), reject H0 if P(Y<x|p0) < α/2 or if P(Y>x|p0) < α/2

Example
n=20, p0=0.3, α=0.05

if x=3 P(Y<3|p=0.3)=P(Y≤2|p=0.3) = pbinom(2,20,0.3) = 0.0354 > 0.025 and so we fail to reject the null hypothesis.
if x=2 P(Y<2|p=0.3)=P(Y≤1|p=0.3) = pbinom(1,20,0.3) = 0.0076 < 0.025 and so we reject the null hypothesis.
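Here is a minimal R sketch of this test; the function name binom_reject is mine, not part of the text:

# exact binomial test of H0: p=p0 vs Ha: p≠p0 (Idea 1)
binom_reject <- function(x, n, p0, alpha=0.05) {
  lower <- pbinom(x - 1, n, p0)        # P(Y < x | p0)
  upper <- 1 - pbinom(x, n, p0)        # P(Y > x | p0)
  lower < alpha/2 || upper < alpha/2   # TRUE = reject H0
}
binom_reject(3, 20, 0.3)   # FALSE, fail to reject
binom_reject(2, 20, 0.3)   # TRUE, reject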

Idea 2:

Let

X = (S − np0)/√(np0(1 − p0))

and reject H0 if |X| > zα/2 where zα/2 is the 1−α/2 quantile of a standard normal. This idea is obviously based on the central limit theorem.

Example
n=20, p0=0.3

z0.975 = qnorm(0.975) = 1.96

np0 = 6, √(np0(1−p0)) = √(20·0.3·0.7) = 2.05

|X| = |(S−6)/2.05|
x |X|
0 2.9268293
1 2.4390244
2 1.9512195
3 1.4634146
4 0.9756098
5 0.4878049
6 0.0000000
7 0.4878049
8 0.9756098
9 1.4634146
10 1.9512195
11 2.4390244
12 2.9268293
13 3.4146341
14 3.9024390
15 4.3902439
16 4.8780488
17 5.3658537
18 5.8536585
19 6.3414634
20 6.8292683

and so we reject H0 if x<2 or x>10
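Here is a short R sketch that reproduces the table and the rejection region (up to the rounding of the denominator to 2.05):

# normal-approximation test statistic of Idea 2
n <- 20; p0 <- 0.3; alpha <- 0.05
x <- 0:n
z <- abs(x - n*p0)/sqrt(n*p0*(1 - p0))
cbind(x, z)                   # the table above
x[z > qnorm(1 - alpha/2)]     # 0 1 11 12 ... 20, i.e. reject if x<2 or x>10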

Now we have a test. To get to a confidence interval we have to "invert" the test. The interval will contain all the parameter values that would have led to accepting the null hypothesis given the observed data.

Idea 1:

For a fixed x find p1 so that P(Y<x|p1) = 1−α/2 and p2 so that P(Y>x|p2) = 1−α/2, or equivalently P(Y≤x|p2) = α/2; then (p1, p2) is the confidence interval.

Example x=3, n=20, 95% CI, so α=0.05

P(Y<3|p1) = 1-α/2 =1-0.025 = 0.975 = pbinom(2,20,p1)

we find

pbinom(2,20,0.0321)=0.97494, so p1=0.0321

pbinom(3,20,0.379) = 0.02496, so p2=0.379

This method is called the Clopper-Pearson method. It was invented in 1934

Above we found p1 and p2 via trial and error. One could also write a numerical routine to do that. Or use the relationship between the binomial and the beta distributions: if Y ~ Bin(n,p) then

P(Y ≥ x | p) = pbeta(p, x, n−x+1)

so P(Y<x|p1) = 1−α/2 becomes pbeta(p1, x, n−x+1) = α/2, and P(Y≤x|p2) = α/2 becomes pbeta(p2, x+1, n−x) = 1−α/2,

and so p1 = qbeta(0.025,3,18) = 0.0321
p2 = qbeta(0.975,4,17) = 0.379

This is also done by my routine

CIber("CP",x=3,n=20)

or by the built-in R routine

binom.test(3,20)
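A self-contained base-R sketch of the same computation (the helper name clopper_pearson is just for illustration; it is not the CIber routine):

# Clopper-Pearson interval via the binomial-beta relationship
clopper_pearson <- function(x, n, alpha=0.05) {
  lower <- if (x == 0) 0 else qbeta(alpha/2, x, n - x + 1)
  upper <- if (x == n) 1 else qbeta(1 - alpha/2, x + 1, n - x)
  c(lower, upper)
}
clopper_pearson(3, 20)          # 0.0321 0.3789
binom.test(3, 20)$conf.int      # the built-in routine gives the same interval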

Idea 2:

the acceptance region is

|X| ≤ zα/2, or equivalently |x/n − p0| ≤ zα/2 √(p0(1−p0)/n)

so we need to solve the equations

x/n − p0 = ± zα/2 √(p0(1−p0)/n)

for p0.

Option 1:

S/n is an estimator of p0, so let's replace the p0 in the denominator with S/n. The equations are then easy to solve and give the interval

x/n ± zα/2 √( (x/n)(1 − x/n)/n )
Example say we have n=20 and observe x=3, then a 95% confidence interval for p is (-0.006, 0.306)
CIber("Wald",x=3,n=20,corr=0)

This is called a Wald type interval (because we replace Var(X) with an estimate). It was invented in 1948.
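A base-R sketch of the same computation (wald_ci is just an illustrative name, not the CIber routine):

# Wald interval: x/n ± z*sqrt( (x/n)(1-x/n)/n )
wald_ci <- function(x, n, alpha=0.05) {
  phat <- x/n
  z <- qnorm(1 - alpha/2)
  phat + c(-1, 1)*z*sqrt(phat*(1 - phat)/n)
}
wald_ci(3, 20)    # approximately -0.006 0.306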

We already see one obvious "bad" feature: the lower bound is negative. First of all, p can never be negative. More than that, because we observed x=3 we know p must be positive.

Option 2:


Work a little harder: instead of plugging in S/n, solve the inequality |x/n − p0| ≤ zα/2 √(p0(1−p0)/n) exactly for p0. Squaring both sides gives a quadratic in p0, and its two roots (writing z = zα/2 for short) are

( x/n + z²/(2n) ± z √( (x/n)(1 − x/n)/n + z²/(4n²) ) ) / (1 + z²/n)
Example say we have n=20 and observe x=3, then a 95% confidence interval for p is (0.052, 0.360)
CIber("Wilson",x=3,n=20,corr=0)

This is called a Wilson type interval. It was invented in 1927
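Again a base-R sketch of the same computation (wilson_ci is an illustrative name; prop.test without continuity correction returns the same interval):

# Wilson interval: the two roots of the quadratic in p0
wilson_ci <- function(x, n, alpha=0.05) {
  phat <- x/n
  z <- qnorm(1 - alpha/2)
  center <- (phat + z^2/(2*n)) / (1 + z^2/n)
  halfwidth <- z*sqrt(phat*(1 - phat)/n + z^2/(4*n^2)) / (1 + z^2/n)
  c(center - halfwidth, center + halfwidth)
}
wilson_ci(3, 20)                               # approximately 0.052 0.360
prop.test(3, 20, correct=FALSE)$conf.int       # same interval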

Finally: Ad-Hoc Adjustments

There have been a number of adjustments suggested for a variety of these. Here is one example:

We are approximating a discrete rv by a continuous one, so maybe it is a good idea to correct for that a bit by subtracting 1/2 from x (for the lower bound) and adding 1/2 to x (for the upper bound). This can be applied to both the Wald and the Wilson type intervals.

CIber("Wald",x=3,n=20,corr=1/2) (-0.02, 0.342)

CIber("Wilson",x=3,n=20,corr=1/2) (0.0396, 0.389)

or we can use the built-in function prop.test

Note there is a similar adjustment for Clopper-Pearson intervals called mid-p intervals, which we won't have time to discuss

How to choose:

How do we pick between these? First we need to make sure that the methods have coverage, that is, that the actual coverage probability does not fall below the nominal level (here 95%).

Example

say we have n=20 and x=3, so x/n=0.15, so the true p is probably between 0.05 and 0.3. Let's do the coverage graphs:


so the only two that "work" are Clopper-Pearson and Wilson with continuity correction.
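Coverage curves like the ones referred to above can be computed exactly, without simulation. A sketch for the Clopper-Pearson method, reusing the clopper_pearson function from above (the other methods work the same way):

# exact coverage probability of an interval method as a function of p
coverage <- function(p, ci_fun, n=20, alpha=0.05) {
  x <- 0:n
  ci <- sapply(x, function(k) ci_fun(k, n, alpha))    # 2 x (n+1) matrix of bounds
  sum(dbinom(x, n, p)*(ci[1, ] <= p & p <= ci[2, ]))
}
p.grid <- seq(0.05, 0.3, length=100)
min(sapply(p.grid, coverage, ci_fun=clopper_pearson))   # stays at or above 0.95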

How do we choose between these two? One possible criterion is the expected length of the interval,

E[length] = Σx=0,…,n (U(x) − L(x)) P(S = x | p)

where (L(x), U(x)) is the interval computed when x is observed, drawn here:

so for p around 0.15 they are pretty much the same.
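The expected length can be computed in the same exact way; a sketch reusing the illustrative functions from above:

# expected interval length as a function of p
expected_length <- function(p, ci_fun, n=20, alpha=0.05) {
  x <- 0:n
  ci <- sapply(x, function(k) ci_fun(k, n, alpha))
  sum(dbinom(x, n, p)*(ci[2, ] - ci[1, ]))
}
expected_length(0.15, clopper_pearson)                               # Clopper-Pearson
expected_length(0.15, function(x, n, a) cc_ci(x, n, wilson_ci, a))   # Wilson with correction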