Say we have a sample of size n from a geometric random variable with success probability p, that is, \(P(X=x)=p(1-p)^{x-1}\); \(x=1,2,...\).
As an example we have the data set
| x | Freq |
|---|------|
| 1 | 49 |
| 2 | 23 |
| 3 | 14 |
| 4 | 8 |
| 5 | 4 |
| 6 | 2 |
In this homework you can use R for basic arithmetic, but you cannot use any of the probability routines such as pgeom, qbinom, etc.
Find the mle of \(p\)
\[ \begin{aligned} &f(x\vert p) = p(1-p)^{x-1}\\ &f(\pmb{x}\vert p) = \prod_{i=1}^n p(1-p)^{x_i-1} = p^n(1-p)^{\sum_{i=1}^n x_i-n}\\ &l(p\vert\pmb{x}) = n\log p+(\sum_{i=1}^n x_i-n)\log(1-p)\\ &dl(p\vert\pmb{x})/dp = n/p-(\sum_{i=1}^n x_i-n)/(1-p)=0\\ &\hat{p}=n/\sum x_i=1/\bar{x} \end{aligned} \]
x=rep(1:6, c(49, 23, 14, 8, 4, 2))  # expand the frequency table into a data vector
round(1/mean(x), 3)
## [1] 0.498
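As a quick sanity check (not part of the homework, and on simulated rather than the observed data), we can verify that \(1/\bar{x}\) recovers \(p\) on simulated geometric data. Since the probability routines are off limits, the sketch below samples via the inverse transform \(X=\lceil \log U/\log(1-p)\rceil\), which has exactly the geometric pmf above:

```r
# Hypothetical simulation check of the mle, using inverse-transform sampling:
# if U ~ Unif(0,1), then X = ceiling(log(U)/log(1-p)) is geometric with parameter p
set.seed(111)
p=0.5
x.sim=ceiling(log(runif(1e5))/log(1-p))
round(1/mean(x.sim), 3)  # should be close to p=0.5
```

With \(10^5\) simulated observations the estimate lands within about 0.01 of the true p.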
Derive a hypothesis test for
\[H_0: p=p_0\text{ vs. }H_a:p>p_0\]
based on the mle. Apply it to the data with \(\alpha=0.05\), \(p_0=0.5\).
Now if \(p>p_0\) we would expect \(1/\bar{x}\) to be large, or \(\bar{x}\) to be small, so we should use a rejection region of the form \(\left\{\sum_{i=1}^n x_i\le c \right\}\).
Under the null hypothesis \(T=\sum_{i=1}^n X_i\) has a negative binomial distribution with parameters n and \(p_0\), so
\[ \begin{aligned} \alpha &=P(T\le c\vert p_0=0.5) =\sum_{x=n}^c {x-1\choose n-1}0.5^{n}(1-0.5)^{x-n} = \sum_{x=n}^c {x-1\choose n-1}0.5^{x} \end{aligned} \] so the critical value \(c\) is the smallest integer at which this cumulative sum reaches \(\alpha\) (equivalently, \(c=\) qnbinom\((\alpha,n,p_0)+n\)). This cannot be done analytically, so let's use R:
find.crit=function(n, alpha=0.05) {
  # find the critical value: the smallest integer k such that
  # P(T <= k | p=0.5) exceeds alpha, where T is negative binomial
  k=n  # smallest possible value of T is n
  tmp=choose(k-1, n-1)*0.5^k  # P(T = n)
  repeat {
    k=k+1
    tmp=tmp+choose(k-1, n-1)*0.5^k  # accumulate P(T <= k)
    if(tmp>alpha) break
  }
  k
}
crit=find.crit(length(x))
c(crit, sum(x))
## [1] 178 201
Since \(201>178\), the observed value is not in the rejection region, so we fail to reject the null hypothesis.
Using all of R we could also have found the critical value this way:
qnbinom(0.05, length(x), 0.5)+length(x)
## [1] 178
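Note that R's negative binomial routines count the number of *failures* before the nth success rather than the total number of trials, which is why we add n above. If the full toolkit were allowed, we could confirm that 178 total trials (78 failures) is indeed the smallest value whose cumulative probability under the null reaches 0.05:

```r
# consistency check (uses pnbinom, which the homework itself rules out):
# in R's parametrization, T = 178 total trials corresponds to 78 failures
n=100
round(c(pnbinom(77, n, 0.5), pnbinom(78, n, 0.5)), 4)  # these straddle alpha=0.05
```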
Find the power of this test. Use it to draw the curve for the case n=100, \(p_0=0.5,\alpha=0.05\) and \(p_1\in [0.5,0.65]\).
\[
\begin{aligned}
&Power(p_1)=P(\text{reject }H_0\vert p=p_1) = \\
&P(T\le c\vert p_1) = \sum_{x=n}^c {x-1\choose n-1}p_1^{n}(1-p_1)^{x-n}
\end{aligned}
\]
Let's write a little routine:
Power=function(p, n=100, alpha=0.05) {
  # exact power: P(T <= c | p), summed over the rejection region {n, ..., c}
  y=0*p
  k=n:find.crit(n, alpha)  # the rejection region
  for(i in seq_along(p)) {
    y[i]=sum(choose(k-1, n-1)*p[i]^n*(1-p[i])^(k-n))
  }
  y
}
curve(Power, 0.5, 0.65)
If in fact \(p=0.54\), what sample size n would be needed so that the test has a power of 80%?
n=100
repeat {
  n=n+1
  crit=find.crit(n)  # critical value for this sample size
  k=n:crit  # rejection region
  power=sum(choose(k-1, n-1)*0.54^n*(1-0.54)^(k-n))  # exact power at p=0.54
  if(power>0.8) break
}
n
## [1] 500
round(power*100, 1)
## [1] 80.1
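As a rough cross-check (an approximation, not the exact answer), the CLT-based power expression derived below can be solved for n in closed form, giving \(n \approx \left(\frac{z_\alpha\sqrt{1-p_0}/p_0 + z_\beta\sqrt{1-p_1}/p_1}{1/p_0-1/p_1}\right)^2\):

```r
# normal-approximation sample size (a rough cross-check of the exact search)
p0=0.5; p1=0.54
z.alpha=qnorm(0.95); z.beta=qnorm(0.8)
n.approx=((z.alpha*sqrt(1-p0)/p0+z.beta*sqrt(1-p1)/p1)/(1/p0-1/p1))^2
ceiling(n.approx)  # about 522
```

This is in the same ballpark as the exact n = 500; the normal approximation is only rough for these parameter values.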
Find another test, again based on the mle but now using a normal approximation (aka the central limit theorem). For the case \(n=100,\alpha=0.05,p_1=0.6\), which of these tests is more powerful?
Again we reject the null hypothesis if \(\bar{x}\) is small. Now we know \(E[X]=1/p_0=1/0.5=2\) and \(var(X)=(1-p_0)/p_0^2 = 2\), so by the central limit theorem
\[\sqrt{n}p_0\left[\bar{x}-1/p_0\right]/\sqrt{1-p_0}\sim N(0,1)\]
n=100;p0=1/2
round(c(sqrt(n)*p0*(mean(x)-1/p0)/sqrt(1-p0), qnorm(1-0.05)), 2)
## [1] 0.07 1.64
Since \(0.07 > -1.64\), the test statistic is not in the rejection region, so again we fail to reject the null hypothesis.
For the first test we have
Power(0.6, 100, 0.05)
## [1] 0.8677474
For the second test we find
\[ \begin{aligned} &P(\text{reject }H_0\vert p=p_1) = \\ &P\left(\sqrt n p_0\frac{\bar{X}-1/p_0}{\sqrt{1-p_0}}<-z_{\alpha}\right) = \\ &P\left(\bar{X}<-z_{\alpha}\frac{\sqrt{1-p_0}}{\sqrt n p_0}+1/p_0\right) = \\ &P\left(\bar{X}-1/p_1<-z_{\alpha}\frac{\sqrt{1-p_0}}{\sqrt n p_0}+1/p_0-1/p_1\right) = \\ &P\left(\frac{\sqrt n p_1}{\sqrt{1-p_1}}\left[\bar{X}-1/p_1\right]<\frac{\sqrt n p_1}{\sqrt{1-p_1}}\left[-z_{\alpha}\frac{\sqrt{1-p_0}}{\sqrt n p_0}+1/p_0-1/p_1\right]\right) = \\ &P\left(Z<\frac{\sqrt n p_1}{\sqrt{1-p_1}}\left[-z_{\alpha}\frac{\sqrt{1-p_0}}{\sqrt n p_0}+1/p_0-1/p_1\right]\right) \end{aligned} \]
n=100;p0=0.5;p1=0.6;z=qnorm(1-0.05)
pnorm(sqrt(n)*p1/sqrt(1-p1)*(-z*sqrt(1-p0)/sqrt(n)/p0+1/p0-1/p1))
## [1] 0.8303312
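We can also compare the two tests by simulation (a hypothetical check, again sampling geometric data by inverse transform since rgeom is off limits in the homework). Note that this estimates the *true* power of each test; for the second test that may differ slightly from the CLT-approximated figure above:

```r
# Monte Carlo power comparison at p1=0.6 (hypothetical simulation check)
set.seed(222)
n=100; p0=0.5; p1=0.6; z=qnorm(1-0.05)
B=1e4
rej1=rej2=logical(B)
for(b in 1:B) {
  xb=ceiling(log(runif(n))/log(1-p1))  # geometric sample with p=p1
  rej1[b]=sum(xb)<=178  # exact test: reject if T <= critical value
  rej2[b]=sqrt(n)*p0*(mean(xb)-1/p0)/sqrt(1-p0) < -z  # CLT test
}
round(c(mean(rej1), mean(rej2)), 3)  # estimated powers of the two tests
```

The first estimate lands near the exact 0.868; the second comes out a little below the CLT figure of 0.830, and the simulation confirms the ordering of the two tests.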
So the first test is more powerful (for these values).