Bernoulli - Binomial Distribution

We have data X1,…,Xn ~ Ber(p) and we want to develop a confidence interval for p.

First we need a corresponding hypothesis test:

H0: p = p0 vs Ha: p ≠ p0


If half-open intervals ("upper limits") are desired, use the corresponding one-sided alternative hypotheses.

Let S = X1 + … + Xn = ∑ Xi

then S ~ Bin(n, p)

Note: in this example S is the obvious choice to base a test on because S/n is an unbiased, sufficient and consistent estimator of p

Next we need a rejection region, that is we need to decide what values of S will lead us to reject the null hypothesis.

And we get to the first fork in the road:

Idea 1:

Let Y~Bin(n,p0), reject H0 if P(Y<x|p0) < α/2 or if P(Y>x|p0) < α/2

Example
n=20, p0=0.3, α=0.05

if x=3 P(Y<3|p=0.3)=P(Y≤2|p=0.3) = pbinom(2,20,0.3) = 0.0354 > 0.025 and so we fail to reject the null hypothesis.
if x=2 P(Y<2|p=0.3)=P(Y≤1|p=0.3) = pbinom(1,20,0.3) = 0.0076 < 0.025 and so we reject the null hypothesis.
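Here is a minimal R sketch of this test; the function name binom_reject is mine, not part of the text:

# exact binomial test of H0: p=p0 vs Ha: p≠p0 (Idea 1)
binom_reject <- function(x, n, p0, alpha=0.05) {
  lower <- pbinom(x - 1, n, p0)        # P(Y < x | p0)
  upper <- 1 - pbinom(x, n, p0)        # P(Y > x | p0)
  lower < alpha/2 || upper < alpha/2   # TRUE = reject H0
}
binom_reject(3, 20, 0.3)   # FALSE, fail to reject
binom_reject(2, 20, 0.3)   # TRUE, reject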

Idea 2:

Let

X = (S − np0)/√(np0(1 − p0))

and reject H0 if |X| > zα/2 where zα/2 is the 1−α/2 quantile of a standard normal. This idea is obviously based on the central limit theorem.

Example
n=20, p0=0.3

z0.975 = qnorm(0.975) = 1.96

np0 = 6, √(np0(1−p0)) = √(20·0.3·0.7) = 2.05

|X| = |(S−6)/2.05|
x |X|
0 2.9268293
1 2.4390244
2 1.9512195
3 1.4634146
4 0.9756098
5 0.4878049
6 0.0000000
7 0.4878049
8 0.9756098
9 1.4634146
10 1.9512195
11 2.4390244
12 2.9268293
13 3.4146341
14 3.9024390
15 4.3902439
16 4.8780488
17 5.3658537
18 5.8536585
19 6.3414634
20 6.8292683

and so we reject H0 if x<2 or x>10
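Here is a short R sketch that reproduces the table and the rejection region (up to the rounding of the denominator to 2.05):

# normal-approximation test statistic of Idea 2
n <- 20; p0 <- 0.3; alpha <- 0.05
x <- 0:n
z <- abs(x - n*p0)/sqrt(n*p0*(1 - p0))
cbind(x, z)                   # the table above
x[z > qnorm(1 - alpha/2)]     # 0 1 11 12 ... 20, i.e. reject if x<2 or x>10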

Now we have a test. To get to a confidence interval we have to "invert" the test. The interval will contain all the parameter values that would have led to accepting the null hypothesis given the observed data.

Idea 1:

For a fixed x find p1 so that P(Y<x|p1) = 1−α/2 and p2 so that P(Y>x|p2) = 1−α/2, or equivalently P(Y≤x|p2) = α/2; then (p1, p2) is the confidence interval.

Example x=3, n=20, 95% CI, so α=0.05

P(Y<3|p1) = 1-α/2 =1-0.025 = 0.975 = pbinom(2,20,p1)

we find

pbinom(2,20,0.0321)=0.97494, so p1=0.0321

pbinom(3,20,0.379) = 0.02496, so p2=0.379

This method is called the Clopper-Pearson method. It was invented in 1934

Above we found p1 and p2 via trial and error. One could also write a numerical routine to do that. Or use the relationship between the binomial and the beta distributions: if Y ~ Bin(n,p) then

P(Y ≥ x | p) = pbeta(p, x, n−x+1)

so P(Y<x|p1) = 1−α/2 becomes pbeta(p1, x, n−x+1) = α/2, and P(Y≤x|p2) = α/2 becomes pbeta(p2, x+1, n−x) = 1−α/2,

and so p1 = qbeta(0.025,3,18) = 0.0321
p2 = qbeta(0.975,4,17) = 0.379

This is also done by my routine

CIber("CP",x=3,n=20)

or by the built-in R routine

binom.test(3,20)
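A self-contained base-R sketch of the same computation (the helper name clopper_pearson is just for illustration; it is not the CIber routine):

# Clopper-Pearson interval via the binomial-beta relationship
clopper_pearson <- function(x, n, alpha=0.05) {
  lower <- if (x == 0) 0 else qbeta(alpha/2, x, n - x + 1)
  upper <- if (x == n) 1 else qbeta(1 - alpha/2, x + 1, n - x)
  c(lower, upper)
}
clopper_pearson(3, 20)          # 0.0321 0.3789
binom.test(3, 20)$conf.int      # the built-in routine gives the same interval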

Idea 2:

the acceptance region is

|X| ≤ zα/2, or equivalently |x/n − p0| ≤ zα/2 √(p0(1−p0)/n)

so we need to solve the equations

x/n − p0 = ± zα/2 √(p0(1−p0)/n)

for p0.

Option 1:

S/n is an estimator of p0, so let's replace the p0 in the denominator with S/n. The equations are then easy to solve and give the interval

x/n ± zα/2 √( (x/n)(1 − x/n)/n )
Example say we have n=20 and observe x=3, then a 95% confidence interval for p is (-0.006, 0.306)
CIber("Wald",x=3,n=20,corr=0)

This is called a Wald type interval (because we replace Var(X) with an estimate). It was invented in 1948.
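A base-R sketch of the same computation (wald_ci is just an illustrative name, not the CIber routine):

# Wald interval: x/n ± z*sqrt( (x/n)(1-x/n)/n )
wald_ci <- function(x, n, alpha=0.05) {
  phat <- x/n
  z <- qnorm(1 - alpha/2)
  phat + c(-1, 1)*z*sqrt(phat*(1 - phat)/n)
}
wald_ci(3, 20)    # approximately -0.006 0.306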

We already see one obvious "bad" feature: the lower bound is negative. First of all, p can never be negative. More than that, because we observed x=3 we know p must be positive.

Option 2:


Work a little harder: instead of plugging in S/n, solve the inequality |x/n − p0| ≤ zα/2 √(p0(1−p0)/n) exactly for p0. Squaring both sides gives a quadratic in p0, and its two roots (writing z = zα/2 for short) are

( x/n + z²/(2n) ± z √( (x/n)(1 − x/n)/n + z²/(4n²) ) ) / (1 + z²/n)
Example say we have n=20 and observe x=3, then a 95% confidence interval for p is (0.052, 0.360)
CIber("Wilson",x=3,n=20,corr=0)

This is called a Wilson type interval. It was invented in 1927
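Again a base-R sketch of the same computation (wilson_ci is an illustrative name; prop.test without continuity correction returns the same interval):

# Wilson interval: the two roots of the quadratic in p0
wilson_ci <- function(x, n, alpha=0.05) {
  phat <- x/n
  z <- qnorm(1 - alpha/2)
  center <- (phat + z^2/(2*n)) / (1 + z^2/n)
  halfwidth <- z*sqrt(phat*(1 - phat)/n + z^2/(4*n^2)) / (1 + z^2/n)
  c(center - halfwidth, center + halfwidth)
}
wilson_ci(3, 20)                               # approximately 0.052 0.360
prop.test(3, 20, correct=FALSE)$conf.int       # same interval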

Finally: Ad-Hoc Adjustments

There have been a number of adjustments suggested for a variety of these. Here is one example:

We are approximating a discrete rv by a continuous one, so maybe it is a good idea to correct for that a bit by subtracting 1/2 from x (for the lower bound) and adding 1/2 to x (for the upper bound). This can be applied to both the Wald and the Wilson type intervals.

CIber("Wald",x=3,n=20,corr=1/2) (-0.02, 0.342)

CIber("Wilson",x=3,n=20,corr=1/2) (0.0396, 0.389)

or we can use the built-in function prop.test

Note there is a similar adjustment for Clopper-Pearson intervals called mid-p intervals, which we won't have time to discuss

How to choose:

How do we pick between these? First we need to make sure that the methods have coverage, that is, that the actual coverage probability does not fall below the nominal level (here 95%).

Example

say we have n=20 and x=3, so x/n=0.15, so the true p is probably between 0.05 and 0.3. Let's do the coverage graphs:


so the only two that "work" are Clopper-Pearson and Wilson with continuity correction.
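Coverage curves like the ones referred to above can be computed exactly, without simulation. A sketch for the Clopper-Pearson method, reusing the clopper_pearson function from above (the other methods work the same way):

# exact coverage probability of an interval method as a function of p
coverage <- function(p, ci_fun, n=20, alpha=0.05) {
  x <- 0:n
  ci <- sapply(x, function(k) ci_fun(k, n, alpha))    # 2 x (n+1) matrix of bounds
  sum(dbinom(x, n, p)*(ci[1, ] <= p & p <= ci[2, ]))
}
p.grid <- seq(0.05, 0.3, length=100)
min(sapply(p.grid, coverage, ci_fun=clopper_pearson))   # stays at or above 0.95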

How do we choose between these two? One possible criterion is the expected length of the interval,

E[length] = Σx=0,…,n (U(x) − L(x)) P(S = x | p)

where (L(x), U(x)) is the interval computed when x is observed, drawn here:

so for p around 0.15 they are pretty much the same.
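The expected length can be computed in the same exact way; a sketch reusing the illustrative functions from above:

# expected interval length as a function of p
expected_length <- function(p, ci_fun, n=20, alpha=0.05) {
  x <- 0:n
  ci <- sapply(x, function(k) ci_fun(k, n, alpha))
  sum(dbinom(x, n, p)*(ci[2, ] - ci[1, ]))
}
expected_length(0.15, clopper_pearson)                               # Clopper-Pearson
expected_length(0.15, function(x, n, a) cc_ci(x, n, wilson_ci, a))   # Wilson with correction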