Power and Sample Size

Power of a One-Way ANOVA Test.

Say we are considering a one-way ANOVA model with k groups and the basic test

\[H_0:\alpha_1=..=\alpha_k\]

By the discussion in (6.6.3) we have that if the null hypothesis is wrong the test statistic

\[F=\frac{\text{SSR}/k}{\text{SSE}/(n-k-1)}\]

has a non-central F distribution with k and n-k-1 degrees of freedom and non-centrality parameter

\[\lambda=\pmb{\beta_1X_c'X_c\beta_1}/(2\sigma^2)\]

where \(\pmb{X}_c\) is the centered matrix of \(\pmb{X}\), the design matrix without the column of 1’s, and \(\pmb{\beta}_1\) is \(\pmb{\beta}\) without \(\mu\).

Example

Let’s illustrate this in the case of the mothers and cocaine use data:

y=mothers$Length
x=mothers$Status
k=3
ni=table(x)
n=sum(ni)
y..=sum(y)
yi.=tapply(y, x, sum)
sse=sum(y^2)-sum(yi.^2/ni)
s2=sse/(n-k-1)

j1=rep(1, ni[1])
j2=rep(1, ni[2])
j3=rep(1, ni[3])
X1=cbind(c(j1, rep(0, ni[2]+ni[3])),
         c(rep(0, ni[1]), j2, rep(0, ni[3])),
         c(rep(0, ni[1]+ni[2]), j3))
Xc=(diag(n)-matrix(1, n ,n )/n)%*%X1
Xcc=t(Xc)%*%Xc

Finally we need \(\beta_1\), that is a vector of true values. Let’s check the data

yi./ni-y../n

##       Drug Free First Trimester      Throughout 
##       1.5510638      -0.2489362      -1.5489362

Let’s find the power if \(\pmb{\beta}_1 = \begin{pmatrix} 1.5 & 0 & -1.5 \end{pmatrix}'\):

beta=rbind(1.5, 0, -1.5)
lambda=t(beta)%*%Xcc%*%beta/2/s2
crit=qf(0.95, k-1, n-k-1)
round(1-pf(crit, k-1, n-k-1, lambda), 3)

## [1] 0.733

so in this case our test had a 73% chance of correctly rejecting the null hypothesis!

Sample Size

Say we are planning on carrying out the following experiment: we will collect n observations each from 4 groups. Then we will do the one-way ANOVA test. We want the test to have a power of 80%. What should n be?

To answer the question we need two things:

\(\sigma^2\): we do need some idea of the population standard deviation. Often one uses some number from similar experiments, or one does a small scale pilot study. Say for our purpose we know \(\sigma^2=0.15\).
\(\pmb{\beta}_1\): here we need to decide what the smallest effect size of practical interest is. This is the smallest difference that would be important enough to justify the experiment. Let’s say that it is \(\delta\). Then one often uses

\(\pmb{\beta}_1 = \begin{pmatrix} \delta/2 & -\delta/2 & 0 & 0 \end{pmatrix}'\)

so that the difference between groups 1 and 2 is \(\delta\), and the other groups are between those. Say in our experiment \(\delta=2.3\).

So now

pwr <- function(n, k, delta, sigma2, alpha=0.05) {
  X1=matrix(0, k*n, k)
  for(i in 1:k) X1[((i-1)*n+1):(i*n), i]=1
  Xc=(diag(k*n)-matrix(1, k*n , k*n)/(k*n))%*%X1
  Xcc=t(Xc)%*%Xc
  beta=c(delta/2, -delta/2, rep(0, k-2))
  lambda=t(beta)%*%Xcc%*%beta/2/s2
  crit=qf(1-alpha, k-1, n*k-k-1)
  round(1-pf(crit, k-1, n*k-k-1, lambda), 4)
}
pwr(20, 4, 2.3, 0.15)

## [1] 0.2374

so with 20 observations per group the power is 23%, to low.

pwr(50, 4, 2.3, 0.15)

## [1] 0.5609

pwr(100, 4, 2.3, 0.15)

## [1] 0.8791

pwr(90, 4, 2.3, 0.15)

## [1] 0.839

pwr(80, 4, 2.3, 0.15)

## [1] 0.7884

pwr(83, 4, 2.3, 0.15)

## [1] 0.8048

pwr(82, 4, 2.3, 0.15)

## [1] 0.7994

so we need 83 observations per group, for a total of 332!