Say we have a sample of size \(n\) from a geometric random variable with success probability \(p\), so that \(P(X=x)=p(1-p)^{x-1}\); \(x=1,2,...\).

As an example we have the data set

x Freq
1 49
2 23
3 14
4 8
5 4
6 2
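For the computations below we first enter the data into R; the frequency table expands to a raw sample of size 100, stored in the vector x used by all the code that follows:

```r
# expand the frequency table into the raw sample
x = rep(1:6, times=c(49, 23, 14, 8, 4, 2))
c(length(x), sum(x))
## [1] 100 201
```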

In this homework you can use R for basic arithmetic, but you cannot use any of the probability routines such as pgeom, qbinom, etc.

Problem 1

Find the mle of \(p\)

\[ \begin{aligned} &f(x\vert p) = p(1-p)^{x-1}\\ &f(\pmb{x}\vert p) = \prod_{i=1}^n p(1-p)^{x_i-1} = p^n(1-p)^{\sum_{i=1}^n x_i-n}\\ &l(p\vert\pmb{x}) = n\log p+(\sum_{i=1}^n x_i-n)\log(1-p)\\ &dl(p\vert\pmb{x})/dp = n/p-(\sum_{i=1}^n x_i-n)/(1-p)=0\\ &\hat{p}=n/\sum x_i=1/\bar{x} \end{aligned} \]
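To check that this critical point is indeed a maximum, note that the second derivative of the log-likelihood is negative everywhere on \(0<p<1\), since each \(x_i\ge 1\) implies \(\sum_{i=1}^n x_i-n\ge 0\):

\[ \frac{d^2l(p\vert\pmb{x})}{dp^2}=-\frac{n}{p^2}-\frac{\sum_{i=1}^n x_i-n}{(1-p)^2}<0 \]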

round(1/mean(x), 3)
## [1] 0.498

Problem 2

Derive a hypothesis test for

\[H_0: p=p_0\text{ vs. }H_a:p>p_0\]

based on the mle. Apply it to the data with \(\alpha=0.05\), \(p_0=0.5\).

Now if \(p>p_0\), then we would expect the mle \(1/\bar{x}\) to be large, or equivalently \(\bar{x}\) to be small, so we should use a rejection region of the form \(\left\{\sum_{i=1}^n x_i\le c \right\}\).

Under the null hypothesis \(T=\sum_{i=1}^n X_i\) has a negative binomial distribution with parameters n and \(p_0\), so

\[ \begin{aligned} &\alpha =P(T\le c\vert p_0=0.5) =\sum_{x=n}^c {x-1\choose n-1}0.5^{n}(1-0.5)^{x-n} = \sum_{x=n}^c {x-1\choose n-1}0.5^{x} \\ &c= \text{qnbinom}(\alpha,n,p_0)+n \end{aligned} \] (the \(+n\) appears because R parametrizes the negative binomial by the number of failures rather than the number of trials). This cannot be solved for \(c\) analytically, so let's use R:

find.crit=function(n, alpha=0.05) {
   # smallest k with P(T <= k) > alpha, where T = sum of n Geom(1/2)'s
   k=n  # T >= n always, so start at the smallest possible value
   tmp=choose(k-1, n-1)*0.5^k  # P(T = n)
   repeat {
      k=k+1
      tmp=tmp+choose(k-1, n-1)*0.5^k  # add P(T = k)
      if(tmp>alpha) break
   }
   k
}
crit=find.crit(length(x))
c(crit, sum(x))
## [1] 178 201

Since \(201>178\), \(T\) is not in the rejection region, so we fail to reject the null hypothesis.
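As a quick sanity check (still without any probability routines), the cumulative probabilities on either side of the cutoff can be computed directly; \(P(T\le 177)\) should fall below \(\alpha=0.05\) while \(P(T\le 178)\) exceeds it:

```r
n = 100
k = n:178
p.178 = sum(choose(k-1, n-1)*0.5^k)       # P(T <= 178)
p.177 = p.178 - choose(177, n-1)*0.5^178  # P(T <= 177) = P(T <= 178) - P(T = 178)
round(c(p.177, p.178), 4)
```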

Using all of R we could also have found the critical value this way:

qnbinom(0.05, length(x), 0.5)+length(x)
## [1] 178

Problem 3

Find the power of this test. Use it to draw the power curve for the case \(n=100\), \(p_0=0.5\), \(\alpha=0.05\) and \(p_1\in [0.5,0.65]\).

\[ \begin{aligned} &Power(p_1)=P(\text{reject }H_0\vert p=p_1) = \\ &P(T\le c\vert p_1) = \sum_{x=n}^c {x-1\choose n-1}p_1^{n}(1-p_1)^{x-n} \end{aligned} \]
Let's write a little routine:

Power=function(p, n=100, alpha=0.05) {
   # power of the exact test at each value in p
   y=0*p  # results, same length as p
   k=n:find.crit(n, alpha)  # the rejection region {n, ..., c}
   for(i in seq_along(p)) {
      # P(T <= c | p = p[i]): sum of negative binomial probabilities
      y[i]=sum(choose(k-1, n-1)*p[i]^n*(1-p[i])^(k-n))
   }
   y
}
curve(Power, 0.5, 0.65)

Problem 4

If in fact \(p=0.54\), what sample size n would be needed so that the test has a power of 80%?

n=100
repeat {
   n=n+1
   crit=find.crit(n)  # critical value for this sample size
   k=n:crit
   # power at p=0.54 for sample size n
   power=sum(choose(k-1, n-1)*0.54^n*(1-0.54)^(k-n))
   if(power>0.8) break
}
n
## [1] 500
round(power*100, 1)
## [1] 80.1

Problem 5

Find another test, again based on the mle but now using a normal approximation (aka the central limit theorem). For the case \(n=100,\alpha=0.05,p_1=0.6\), which of these tests is more powerful?

Again we reject the null hypothesis if \(\bar{x}\) is small. Now we know \(E[X]=1/p_0=1/0.5=2\) and \(var(X)=(1-p_0)/p_0^2 = 2\), so by the central limit theorem
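These moments follow from the geometric series identity \(\sum_{x\ge 1}xq^{x-1}=1/(1-q)^2\); for example,

\[ E[X]=\sum_{x=1}^\infty x\,p(1-p)^{x-1}=\frac{p}{(1-(1-p))^2}=\frac{1}{p} \]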

\[\sqrt{n}p_0\left[\bar{x}-1/p_0\right]/\sqrt{1-p_0}\sim N(0,1)\]

n=100;p0=1/2
round(c(sqrt(n)*p0*(mean(x)-1/p0)/sqrt(1-p0), qnorm(1-0.05)), 2)
## [1] 0.07 1.64

Since we reject for small values of \(\bar{x}\), the rejection region is \(\{Z<-z_\alpha\}\); \(0.07>-1.64\), so again we fail to reject the null hypothesis.

For the first test we have

Power(0.6, 100, 0.05)
## [1] 0.8677474

For the second test we find

\[ \begin{aligned} &P(\text{reject }H_0\vert p=p_1) = \\ &P\left(\sqrt np_0\frac{\bar{X}-1/p_0}{\sqrt{1-p_0}}<-z_{\alpha}\right) = \\ &P\left(\bar{X}<-z_{\alpha}\frac{\sqrt{1-p_0}}{\sqrt n p_0}+1/p_0\right) = \\ &P\left(\bar{X}-1/p_1<-z_{\alpha}\frac{\sqrt{1-p_0}}{\sqrt n p_0}+1/p_0-1/p_1\right) = \\ &P\left(\frac{\sqrt np_1}{\sqrt{1-p_1}}\left[\bar{X}-1/p_1\right]<\frac{\sqrt n p_1}{\sqrt{1-p_1}}\left[-z_{\alpha}\frac{\sqrt{1-p_0}}{\sqrt n p_0}+1/p_0-1/p_1\right]\right) = \\ &P\left(Z<\frac{\sqrt n p_1}{\sqrt{1-p_1}}\left[-z_{\alpha}\frac{\sqrt{1-p_0}}{\sqrt n p_0}+1/p_0-1/p_1\right]\right) = \\ \end{aligned} \]

n=100;p0=0.5;p1=0.6;z=qnorm(1-0.05)
pnorm(sqrt(n)*p1/sqrt(1-p1)*(-z*sqrt(1-p0)/sqrt(n)/p0+1/p0-1/p1))
## [1] 0.8303312

Since \(0.868>0.830\), the first (exact) test is more powerful (for these values).