We have data \(Z_1, \dots, Z_n \sim Ber(p)\) and we want to develop a confidence interval for \(p\).
First we need a corresponding hypothesis test:
\[H_0: p=p_0\text{ vs }H_a: p \ne p_0\]
If half-open intervals (“upper or lower limits”) are desired, use the corresponding one-sided alternative hypotheses.
Let \(X=\sum_{i=1}^n Z_i\); then \(X \sim Bin(n,p)\).
Note: in this example \(X\) is the obvious choice to base a test on because \(\bar{Z} = X/n\) is an unbiased, sufficient and consistent estimator of \(p\).
Next we need a rejection region, that is, we need to decide which values of \(X\) will lead us to reject the null hypothesis.
And we get to the first fork in the road:
Let \(Y \sim Bin(n,p_0)\) and reject \(H_0\) if
\[P(Y\le x|p_0) < \alpha/2\]
or if
\[P(Y\ge x|p_0) < \alpha/2\]
that is, if the observed x sits far out in either tail of the null distribution.
n <- 20; p0 <- 0.3; alpha <- 0.05
x <- 3
pbinom(x, n, p0)
## [1] 0.1070868
0.107 > 0.025, so we fail to reject the null hypothesis.
x <- 1
pbinom(x, n, p0)
## [1] 0.00763726
0.0076 < 0.025, so we reject the null hypothesis.
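To see the complete rejection region of this exact test we can simply run it for every possible value of x. Here is a quick sketch (the vectorized use of pbinom is just one way to do this):
# run the exact test for every possible x and list the values that lead to rejection
x <- 0:n
reject <- pbinom(x, n, p0) < alpha/2 |                 # lower tail: P(Y <= x)
  pbinom(x - 1, n, p0, lower.tail = FALSE) < alpha/2   # upper tail: P(Y >= x)
x[reject]   # for n=20, p0=0.3 this gives 0, 1 and 11 through 20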
Let \(TS=\frac{x-np_0}{\sqrt{np_0(1-p_0)}}\) and reject \(H_0\) if \(|TS|>z_{\alpha/2}\). This idea is based on the central limit theorem.
x <- 0:20
TS <- round((x-n*p0)/sqrt(n*p0*(1-p0)), 3)
df <- data.frame(x=x, a=abs(TS),
                 b=ifelse(abs(TS)>qnorm(1-alpha/2), "Yes", "No"),
                 Decision=ifelse(abs(TS)>qnorm(1-alpha/2),
                                 "Reject", "Fail to reject"))
colnames(df)[2:3] <- c('|TS|', '|TS|>crit')
# critical value
qnorm(1-0.05/2)
## [1] 1.959964
kable.nice(df, do.row.names = FALSE)
| x | \|TS\| | \|TS\|>crit | Decision |
|---|--------|-------------|----------|
| 0 | 2.928 | Yes | Reject |
| 1 | 2.440 | Yes | Reject |
| 2 | 1.952 | No | Fail to reject |
| 3 | 1.464 | No | Fail to reject |
| 4 | 0.976 | No | Fail to reject |
| 5 | 0.488 | No | Fail to reject |
| 6 | 0.000 | No | Fail to reject |
| 7 | 0.488 | No | Fail to reject |
| 8 | 0.976 | No | Fail to reject |
| 9 | 1.464 | No | Fail to reject |
| 10 | 1.952 | No | Fail to reject |
| 11 | 2.440 | Yes | Reject |
| 12 | 2.928 | Yes | Reject |
| 13 | 3.416 | Yes | Reject |
| 14 | 3.904 | Yes | Reject |
| 15 | 4.392 | Yes | Reject |
| 16 | 4.880 | Yes | Reject |
| 17 | 5.367 | Yes | Reject |
| 18 | 5.855 | Yes | Reject |
| 19 | 6.343 | Yes | Reject |
| 20 | 6.831 | Yes | Reject |
and so we reject \(H_0\) if \(x<2\) or \(x>10\).
Now we have some tests. To get to a confidence interval we have to “invert” these tests.
The interval will contain all the parameter values that would have led to accepting the null hypothesis given the observed data.
For a fixed x find \(p_1\) so that
\[P(Y<x|p_1)=1-\alpha/2\]
and \(p_2\) so that
\[P(Y>x|p_2)=1-\alpha/2\]
or, equivalently,
\[P(Y\le x|p_2)=\alpha/2\]
then \((p_1, p_2)\) is the confidence interval.
x <- 3; n <- 20; alpha <- 0.05
p <- seq(0.01, 0.4, length=1000)
y1 <- pbinom(x-1, n, p)
y2 <- pbinom(x, n, p)
df <- data.frame(p=c(p, p), y=c(y1, y2),
                 which=rep(c("Lower", "Upper"), each=1000))
ggplot(df, aes(p, y, color=which)) +
  geom_line()
The graph shows that both curves are strictly decreasing in p, so we can find the solutions with a simple grid search:
round(c(p[y1<1-alpha/2][1], p[y2<alpha/2][1]), 4)
## [1] 0.0323 0.3793
This method is called the Clopper-Pearson method. It was invented in 1934.
Above we found \(p_1\) and \(p_2\) via a grid search. One could also use a numerical root finder, as sketched below.
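For instance, here is a minimal sketch that uses R's uniroot to solve the two equations from above numerically (the search interval and tolerance are arbitrary choices):
x <- 3; n <- 20; alpha <- 0.05
# lower limit p1: solve P(Y < x | p) = 1 - alpha/2
p1 <- uniroot(function(p) pbinom(x - 1, n, p) - (1 - alpha/2),
              interval = c(0, 1), tol = 1e-10)$root
# upper limit p2: solve P(Y <= x | p) = alpha/2
p2 <- uniroot(function(p) pbinom(x, n, p) - alpha/2,
              interval = c(0, 1), tol = 1e-10)$root
round(c(p1, p2), 4)   # should agree with the qbeta/binom.test values below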
Or we can use a theorem from probability theory that connects the binomial and the beta distributions: if \(Z\sim Beta(x,n-x+1)\) and \(Y\sim Bin(n,p)\), then
\[P(Y\ge x|p)=P(Z\le p)\]
Applying this to the two equations above, with \(Z\sim Beta(x,n-x+1)\) and \(Z'\sim Beta(x+1,n-x)\), we get
\[ \begin{aligned} &\alpha/2 = P(Y\ge x|p_1) = P(Z\le p_1) \Rightarrow p_1 =qbeta(\alpha/2\vert x,n-x+1) \\ &\alpha/2 = P(Y\le x|p_2) = 1-P(Y\ge x+1|p_2) = 1-P(Z'\le p_2) \Rightarrow p_2 =qbeta(1-\alpha/2\vert x+1,n-x) \end{aligned} \]
round(c(qbeta(0.025, 3, 18), qbeta(0.975,4,17)), 4)
## [1] 0.0321 0.3789
This is already implemented in the base R routine binom.test:
round(c(binom.test(3, 20)$conf.int), 4)
## [1] 0.0321 0.3789
Next let's invert the test based on the normal approximation. Its acceptance region is
\[ -z_{\alpha/2}<\frac{x-np_0}{\sqrt{np_0(1-np_0)}}<z_{\alpha/2} \]
so we need to solve this double inequality for \(p_0\). Again we have choices:
Option 1:
\(x/n\) is an estimator of \(p_0\), so let's replace the \(p_0\) in the denominator with \(x/n\):
\[ \begin{aligned} &\left|\frac{x-np_0}{\sqrt{np_0(1-p_0)}}\right|<z_{\alpha/2} \Leftrightarrow \left|\frac{x}{n}-p_0\right|<z_{\alpha/2}\sqrt{\frac{p_0(1-p_0)}{n}} \\ &p=\frac{x}{n}\pm z_{\alpha/2}\sqrt{\frac{\frac{x}{n}\left(1-\frac{x}{n}\right)}{n}} \end{aligned} \]
x <- 3; n <- 20; alpha <- 0.05
round(x/n + c(-1, 1)*qnorm(1-alpha/2)*sqrt(x/n*(1-x/n)/n), 4)
## [1] -0.0065 0.3065
This is called a Wald type interval (because we replace var(X) with an estimate). It was invented in 1948.
We already see one obvious “bad” feature: the lower bound is negative. First of all, \(p\) can never be negative. More than that, because we observed \(x=3\) we know \(p\) must be positive.
Option 2:
Work a little harder:
\[ \begin{aligned} &\frac{x-np}{\sqrt{np(1-p)}}=\pm z_{\alpha/2} \\ &\frac{(x-np)^2}{np(1-p)}= z_{\alpha/2}^2 =: z \\ &(x-np)^2 = znp(1-p)\\ &n(n+z)p^2-n(2x+z)p+x^2=0\\ &p_{1,2} = \frac1{2n(n+z)}\left(n(2x+z)\pm\sqrt{[n(2x+z)]^2-4n(n+z)x^2}\right) = \\ &\frac1{n+z}\left(x+z/2\pm\sqrt{zx+z^2/4-zx^2/n}\right) \end{aligned} \]
x <- 3; n <- 20; alpha <- 0.05
z <- qnorm(1-alpha/2)^2
round( (x+z/2+c(-1, 1)*sqrt(z*x+z^2/4-z*x^2/n))/(n+z), 4)
## [1] 0.0524 0.3604
This is called a Wilson interval. It was invented in 1927.
There have been a number of adjustments suggested for these intervals. Here is one example:
We are approximating a discrete random variable by a continuous one, so it may be a good idea to correct for that by subtracting 1/2 from x (for the lower bound) and adding 1/2 to x (for the upper bound).
This can be applied to both the Wald and the Wilson type intervals; a sketch follows below. For the Wilson interval this is already done in the built-in function prop.test, which applies a continuity correction by default.
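Here is a minimal sketch of this adjustment, simply plugging \(x-1/2\) and \(x+1/2\) into the Wald and Wilson formulas from above (the helper functions wald and wilson are ad hoc names chosen for this illustration):
x <- 3; n <- 20; alpha <- 0.05
zc <- qnorm(1-alpha/2)    # critical value
z <- zc^2
# interval limits as functions of the count, so we can plug in x -/+ 1/2;
# s = -1 gives the lower limit, s = +1 the upper limit
wald   <- function(x, s) x/n + s*zc*sqrt(x/n*(1-x/n)/n)
wilson <- function(x, s) (x + z/2 + s*sqrt(z*x + z^2/4 - z*x^2/n))/(n + z)
# continuity correction: x - 1/2 for the lower limit, x + 1/2 for the upper limit
round(c(wald(x - 0.5, -1),   wald(x + 0.5, 1)), 4)
round(c(wilson(x - 0.5, -1), wilson(x + 0.5, 1)), 4)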
Note there is a similar adjustment for Clopper-Pearson intervals called mid-p intervals, which we won’t have time to discuss.
How do we pick between these? First we need to make sure that the methods actually have the nominal coverage.
Say we have n=20 and x=3, so x/n=0.15; the true p is then probably somewhere between 0.05 and 0.3. Let's do the coverage graphs:
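One way to produce such a coverage graph is to compute, for each p, the exact probability that the interval covers p. Here is a minimal sketch for the Clopper-Pearson and Wald intervals (the other methods work the same way; the grid of p values is an arbitrary choice):
n <- 20; alpha <- 0.05
x <- 0:n
# Clopper-Pearson limits for every possible x (edge cases x=0 and x=n set by hand)
cp.low <- ifelse(x == 0, 0, qbeta(alpha/2, x, n - x + 1))
cp.up  <- ifelse(x == n, 1, qbeta(1 - alpha/2, x + 1, n - x))
# Wald limits for every possible x
se <- sqrt(x/n*(1 - x/n)/n)
wald.low <- x/n - qnorm(1 - alpha/2)*se
wald.up  <- x/n + qnorm(1 - alpha/2)*se
# exact coverage at p: add up the binomial probabilities of all x whose interval contains p
coverage <- function(p, low, up) sum(dbinom(x, n, p)[low <= p & p <= up])
p.grid <- seq(0.05, 0.3, length = 100)
df <- data.frame(p = rep(p.grid, 2),
                 coverage = c(sapply(p.grid, coverage, low = cp.low, up = cp.up),
                              sapply(p.grid, coverage, low = wald.low, up = wald.up)),
                 method = rep(c("Clopper-Pearson", "Wald"), each = length(p.grid)))
ggplot(df, aes(p, coverage, color = method)) +
  geom_line() +
  geom_hline(yintercept = 1 - alpha)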
So the only two that “work”, in the sense that their coverage stays at or above the nominal level over this range of p, are Clopper-Pearson and Wilson with continuity correction.
How do we choose between these two? One possible criterion is the expected length of the interval:
\[e(p)=E\left[U(X)-L(X)\right]=\sum_{i=0}^n (u(i)-l(i)){n\choose i}p^i(1-p)^{n-i}\]
drawn here:
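Here is a minimal sketch that computes and plots e(p) for the Clopper-Pearson interval and the continuity-corrected Wilson interval, reusing the x -/+ 1/2 adjustment from above (the grid of p values is an arbitrary choice):
n <- 20; alpha <- 0.05
x <- 0:n
zc <- qnorm(1 - alpha/2)
z <- zc^2
# Clopper-Pearson limits for every possible x
cp.low <- ifelse(x == 0, 0, qbeta(alpha/2, x, n - x + 1))
cp.up  <- ifelse(x == n, 1, qbeta(1 - alpha/2, x + 1, n - x))
# continuity-corrected Wilson limits (clamped at the edges x=0 and x=n)
wilson <- function(x, s) (x + z/2 + s*sqrt(z*x + z^2/4 - z*x^2/n))/(n + z)
wcc.low <- wilson(pmax(x - 0.5, 0), -1)
wcc.up  <- wilson(pmin(x + 0.5, n), 1)
# expected length e(p) = sum over x of (u(x)-l(x)) P(X=x|p)
elen <- function(p, low, up) sum((up - low)*dbinom(x, n, p))
p.grid <- seq(0.05, 0.3, length = 100)
df <- data.frame(p = rep(p.grid, 2),
                 len = c(sapply(p.grid, elen, low = cp.low, up = cp.up),
                         sapply(p.grid, elen, low = wcc.low, up = wcc.up)),
                 method = rep(c("Clopper-Pearson", "Wilson (cc)"), each = length(p.grid)))
ggplot(df, aes(p, len, color = method)) +
  geom_line()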
so for p around 0.15 they are pretty much the same.