We have data \(Z_1, \dots, Z_n \sim Ber(p)\) and we want to develop a confidence interval for \(p\).
First we need a corresponding hypothesis test:
\[H_0: p=p_0\text{ vs }H_a: p \ne p_0\]
If half-open intervals (“upper or lower limits”) are desired, use the corresponding one-sided alternative hypotheses.
Let \(X=\sum_{i=1}^n Z_i\); then \(X \sim Bin(n,p)\).
Note: in this example \(X\) is the obvious choice to base a test on because \(\bar{Z} = X/n\) is an unbiased, sufficient and consistent estimator of \(p\).
Next we need a rejection region, that is, we need to decide which values of \(X\) will lead us to reject the null hypothesis.
And we get to the first fork in the road:
Let \(Y \sim Bin(n,p_0)\) and reject \(H_0\) if
\[P(Y\le x|p_0) < \alpha/2\]
or if
\[P(Y\ge x|p_0) < \alpha/2\]
that is, if the observed x sits far out in either tail of the null distribution.
n <- 20; p0 <- 0.3; alpha <- 0.05
x <- 3
pbinom(x, n, p0)
## [1] 0.1070868
0.107 > 0.025, so we fail to reject the null hypothesis.
x <- 1
pbinom(x, n, p0)
## [1] 0.00763726
0.0076 < 0.025, so we reject the null hypothesis.
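To see the complete rejection region of this exact test we can simply run it for every possible value of x. Here is a quick sketch (the vectorized use of pbinom is just one way to do this):
# run the exact test for every possible x and list the values that lead to rejection
x <- 0:n
reject <- pbinom(x, n, p0) < alpha/2 |                 # lower tail: P(Y <= x)
  pbinom(x - 1, n, p0, lower.tail = FALSE) < alpha/2   # upper tail: P(Y >= x)
x[reject]   # for n=20, p0=0.3 this gives 0, 1 and 11 through 20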
Let \(TS=\frac{x-np_0}{\sqrt{np_0(1-p_0)}}\) and reject \(H_0\) if \(|TS|>z_{\alpha/2}\). This idea is based on the central limit theorem.
x <- 0:20
TS <- round((x-n*p0)/sqrt(n*p0*(1-p0)), 3)
df <- data.frame(x=x, a=abs(TS),
                 b=ifelse(abs(TS)>qnorm(1-alpha/2), "Yes", "No"),
                 Decision=ifelse(abs(TS)>qnorm(1-alpha/2),
                                 "Reject", "Fail to reject"))
colnames(df)[2:3] <- c('|TS|', '|TS|>crit')
# critical value
qnorm(1-0.05/2)
## [1] 1.959964
kable.nice(df, do.row.names = FALSE)
| x | \|TS\| | \|TS\|>crit | Decision |
|---|--------|-------------|----------|
| 0 | 2.928 | Yes | Reject |
| 1 | 2.440 | Yes | Reject |
| 2 | 1.952 | No | Fail to reject |
| 3 | 1.464 | No | Fail to reject |
| 4 | 0.976 | No | Fail to reject |
| 5 | 0.488 | No | Fail to reject |
| 6 | 0.000 | No | Fail to reject |
| 7 | 0.488 | No | Fail to reject |
| 8 | 0.976 | No | Fail to reject |
| 9 | 1.464 | No | Fail to reject |
| 10 | 1.952 | No | Fail to reject |
| 11 | 2.440 | Yes | Reject |
| 12 | 2.928 | Yes | Reject |
| 13 | 3.416 | Yes | Reject |
| 14 | 3.904 | Yes | Reject |
| 15 | 4.392 | Yes | Reject |
| 16 | 4.880 | Yes | Reject |
| 17 | 5.367 | Yes | Reject |
| 18 | 5.855 | Yes | Reject |
| 19 | 6.343 | Yes | Reject |
| 20 | 6.831 | Yes | Reject |
and so we reject \(H_0\) if \(x<2\) or \(x>10\).
Now we have some tests. To get to a confidence interval we have to “invert” these tests.
The interval will contain all the parameter values that would have led to accepting the null hypothesis given the observed data.
For a fixed x find \(p_1\) so that
\[P(Y<x|p_1)=1-\alpha/2\]
and \(p_2\) so that
\[P(Y>x|p_2)=1-\alpha/2\]
or, equivalently,
\[P(Y\le x|p_2)=\alpha/2\]
then \((p_1, p_2)\) is the confidence interval.
x <- 3; n <- 20; alpha <- 0.05
p <- seq(0.01, 0.4, length=1000)
y1 <- pbinom(x-1, n, p)
y2 <- pbinom(x, n, p)
df <- data.frame(p=c(p, p), y=c(y1, y2),
                 which=rep(c("Lower", "Upper"), each=1000))
ggplot(df, aes(p, y, color=which)) +
  geom_line()
The graph shows that both curves are strictly decreasing in p, so we can find the solutions with a simple grid search:
round(c(p[y1<1-alpha/2][1], p[y2<alpha/2][1]), 4)
## [1] 0.0323 0.3793
This method is called the Clopper-Pearson method. It was invented in 1934.
Above we found \(p_1\) and \(p_2\) via a grid search. One could also use a numerical root finder, as sketched below.
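For instance, here is a minimal sketch that uses R's uniroot to solve the two equations from above numerically (the search interval and tolerance are arbitrary choices):
x <- 3; n <- 20; alpha <- 0.05
# lower limit p1: solve P(Y < x | p) = 1 - alpha/2
p1 <- uniroot(function(p) pbinom(x - 1, n, p) - (1 - alpha/2),
              interval = c(0, 1), tol = 1e-10)$root
# upper limit p2: solve P(Y <= x | p) = alpha/2
p2 <- uniroot(function(p) pbinom(x, n, p) - alpha/2,
              interval = c(0, 1), tol = 1e-10)$root
round(c(p1, p2), 4)   # should agree with the qbeta/binom.test values below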
Or we can use a theorem from probability theory that connects the binomial and the beta distributions: if \(Z\sim Beta(x,n-x+1)\) and \(Y\sim Bin(n,p)\), then
\[P(Y\ge x|p)=P(Z\le p)\]
Applying this to the two equations above, with \(Z\sim Beta(x,n-x+1)\) and \(Z'\sim Beta(x+1,n-x)\), we get
\[ \begin{aligned} &\alpha/2 = P(Y\ge x|p_1) = P(Z\le p_1) \Rightarrow p_1 =qbeta(\alpha/2\vert x,n-x+1) \\ &\alpha/2 = P(Y\le x|p_2) = 1-P(Y\ge x+1|p_2) = 1-P(Z'\le p_2) \Rightarrow p_2 =qbeta(1-\alpha/2\vert x+1,n-x) \end{aligned} \]
round(c(qbeta(0.025, 3, 18), qbeta(0.975,4,17)), 4)
## [1] 0.0321 0.3789
This is already implemented in the base R routine binom.test:
round(c(binom.test(3, 20)$conf.int), 4)
## [1] 0.0321 0.3789
Next let's invert the test based on the normal approximation. Its acceptance region is
\[ -z_{\alpha/2}<\frac{x-np_0}{\sqrt{np_0(1-np_0)}}<z_{\alpha/2} \]
so we need to solve this double inequality for \(p_0\). Again we have choices:
Option 1:
\(x/n\) is an estimator of \(p_0\), so let's replace the \(p_0\) in the denominator with \(x/n\):
\[ \begin{aligned} &\left|\frac{x-np_0}{\sqrt{np_0(1-p_0)}}\right|<z_{\alpha/2} \Leftrightarrow \left|\frac{x}{n}-p_0\right|<z_{\alpha/2}\sqrt{\frac{p_0(1-p_0)}{n}} \\ &p=\frac{x}{n}\pm z_{\alpha/2}\sqrt{\frac{\frac{x}{n}\left(1-\frac{x}{n}\right)}{n}} \end{aligned} \]
x <- 3; n <- 20; alpha <- 0.05
round(x/n + c(-1, 1)*qnorm(1-alpha/2)*sqrt(x/n*(1-x/n)/n), 4)
## [1] -0.0065 0.3065
This is called a Wald type interval (because we replace var(X) with an estimate). It was invented in 1948.
We already see one obvious “bad” feature: the lower bound is negative. First of all, \(p\) can never be negative. More than that, because we observed \(x=3\) we know \(p\) must be positive.
Option 2:
Work a little harder:
\[ \begin{aligned} &\frac{x-np}{\sqrt{np(1-p)}}=\pm z_{\alpha/2} \\ &\frac{(x-np)^2}{np(1-p)}= z_{\alpha/2}^2 =: z \\ &(x-np)^2 = znp(1-p)\\ &n(n+z)p^2-n(2x+z)p+x^2=0\\ &p_{1,2} = \frac1{2n(n+z)}\left(n(2x+z)\pm\sqrt{[n(2x+z)]^2-4n(n+z)x^2}\right) = \\ &\frac1{n+z}\left(x+z/2\pm\sqrt{zx+z^2/4-zx^2/n}\right) \end{aligned} \]
x <- 3; n <- 20; alpha <- 0.05
z <- qnorm(1-alpha/2)^2
round( (x+z/2+c(-1, 1)*sqrt(z*x+z^2/4-z*x^2/n))/(n+z), 4)
## [1] 0.0524 0.3604
This is called a Wilson interval. It was invented in 1927.
There have been a number of adjustments suggested for these intervals. Here is one example:
We are approximating a discrete random variable by a continuous one, so it may be a good idea to correct for that by subtracting 1/2 from x (for the lower bound) and adding 1/2 to x (for the upper bound).
This can be applied to both the Wald and the Wilson type intervals; a sketch follows below. For the Wilson interval this is already done in the built-in function prop.test, which applies a continuity correction by default.
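Here is a minimal sketch of this adjustment, simply plugging \(x-1/2\) and \(x+1/2\) into the Wald and Wilson formulas from above (the helper functions wald and wilson are ad hoc names chosen for this illustration):
x <- 3; n <- 20; alpha <- 0.05
zc <- qnorm(1-alpha/2)    # critical value
z <- zc^2
# interval limits as functions of the count, so we can plug in x -/+ 1/2;
# s = -1 gives the lower limit, s = +1 the upper limit
wald   <- function(x, s) x/n + s*zc*sqrt(x/n*(1-x/n)/n)
wilson <- function(x, s) (x + z/2 + s*sqrt(z*x + z^2/4 - z*x^2/n))/(n + z)
# continuity correction: x - 1/2 for the lower limit, x + 1/2 for the upper limit
round(c(wald(x - 0.5, -1),   wald(x + 0.5, 1)), 4)
round(c(wilson(x - 0.5, -1), wilson(x + 0.5, 1)), 4)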
Note there is a similar adjustment for Clopper-Pearson intervals called mid-p intervals, which we won’t have time to discuss.
How do we pick between these? First we need to make sure that the methods actually have the nominal coverage.
Say we have n=20 and x=3, so x/n=0.15; the true p is then probably somewhere between 0.05 and 0.3. Let's do the coverage graphs:
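One way to produce such a coverage graph is to compute, for each p, the exact probability that the interval covers p. Here is a minimal sketch for the Clopper-Pearson and Wald intervals (the other methods work the same way; the grid of p values is an arbitrary choice):
n <- 20; alpha <- 0.05
x <- 0:n
# Clopper-Pearson limits for every possible x (edge cases x=0 and x=n set by hand)
cp.low <- ifelse(x == 0, 0, qbeta(alpha/2, x, n - x + 1))
cp.up  <- ifelse(x == n, 1, qbeta(1 - alpha/2, x + 1, n - x))
# Wald limits for every possible x
se <- sqrt(x/n*(1 - x/n)/n)
wald.low <- x/n - qnorm(1 - alpha/2)*se
wald.up  <- x/n + qnorm(1 - alpha/2)*se
# exact coverage at p: add up the binomial probabilities of all x whose interval contains p
coverage <- function(p, low, up) sum(dbinom(x, n, p)[low <= p & p <= up])
p.grid <- seq(0.05, 0.3, length = 100)
df <- data.frame(p = rep(p.grid, 2),
                 coverage = c(sapply(p.grid, coverage, low = cp.low, up = cp.up),
                              sapply(p.grid, coverage, low = wald.low, up = wald.up)),
                 method = rep(c("Clopper-Pearson", "Wald"), each = length(p.grid)))
ggplot(df, aes(p, coverage, color = method)) +
  geom_line() +
  geom_hline(yintercept = 1 - alpha)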
So the only two that “work”, in the sense that their coverage stays at or above the nominal level over this range of p, are Clopper-Pearson and Wilson with continuity correction.
How do we choose between these two? One possible criterion is the expected length of the interval:
\[e(p)=E\left[U(X)-L(X)\right]=\sum_{i=0}^n (u(i)-l(i)){n\choose i}p^i(1-p)^{n-i}\]
drawn here:
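Here is a minimal sketch that computes and plots e(p) for the Clopper-Pearson interval and the continuity-corrected Wilson interval, reusing the x -/+ 1/2 adjustment from above (the grid of p values is an arbitrary choice):
n <- 20; alpha <- 0.05
x <- 0:n
zc <- qnorm(1 - alpha/2)
z <- zc^2
# Clopper-Pearson limits for every possible x
cp.low <- ifelse(x == 0, 0, qbeta(alpha/2, x, n - x + 1))
cp.up  <- ifelse(x == n, 1, qbeta(1 - alpha/2, x + 1, n - x))
# continuity-corrected Wilson limits (clamped at the edges x=0 and x=n)
wilson <- function(x, s) (x + z/2 + s*sqrt(z*x + z^2/4 - z*x^2/n))/(n + z)
wcc.low <- wilson(pmax(x - 0.5, 0), -1)
wcc.up  <- wilson(pmin(x + 0.5, n), 1)
# expected length e(p) = sum over x of (u(x)-l(x)) P(X=x|p)
elen <- function(p, low, up) sum((up - low)*dbinom(x, n, p))
p.grid <- seq(0.05, 0.3, length = 100)
df <- data.frame(p = rep(p.grid, 2),
                 len = c(sapply(p.grid, elen, low = cp.low, up = cp.up),
                         sapply(p.grid, elen, low = wcc.low, up = wcc.up)),
                 method = rep(c("Clopper-Pearson", "Wilson (cc)"), each = length(p.grid)))
ggplot(df, aes(p, len, color = method)) +
  geom_line()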
so for p around 0.15 they are pretty much the same.