Problem 1

Find formulas for prediction and confidence intervals for the \(y|x=x_0\) in a no-intercept model. Use simulation to show that your formulas are proper 95% confidence intervals in the case \(x=1:100/100\), \(y=x+\epsilon_i\), \(\epsilon_i\sim N(0,1/2)\) and \(x_0=0.5\).

We have \(\pmb{X}=(x_1 ... x_n)'\), so

\[\pmb{x_0'(X'X)^{-1}x_0}=x_0^2/\sum_{i=1}^n x_i^2\]

and a \((1-\alpha)100\%\) confidence interval for \(E[y|x=x_0]\) is given by

\[x_0\hat{\beta}\pm t_{\alpha/2,n-1}sx_0/\sqrt{\sum_{i=1}^n x_i^2}\]

Also a \((1-\alpha)100\%\) prediction interval for \(y|x=x_0\) is given by

\[x_0\hat{\beta}\pm t_{\alpha/2,n-1}s\sqrt{1+x_0^2/\sum_{i=1}^n x_i^2}\]

B=1e4
x=1:100/100
sx=sum(x^2)
crit=qt(0.975,99)
Interval=matrix(0,B,4)
for(i in 1:B) {
  y=x+rnorm(100,0,1/2)
  betahat=sum(x*y)/sum(x^2)
  yhat=x*betahat
  sse=sum((y-yhat)^2)
  s=sqrt(sse/98)
  Interval[i, 1:2]=1*betahat+c(-1,1)*crit*s/sqrt(sx)
  Interval[i, 3:4]=1*betahat+c(-1,1)*crit*s*sqrt(1+1/sx)
}
sum(Interval[,1]<1&1<Interval[, 2])/B
## [1] 0.9464
newy=1+rnorm(B,0,1/2)
sum(Interval[,3]<newy & newy<Interval[, 4])/B
## [1] 0.951

Problem 2

A popular graph in many fields of science is the following: A scatterplot with the fitted curve and “error bars” on each of the observations, that is 95% confidence intervals. An example is shown here:

set.seed(111)
x=1:10
y=2+4*x+rnorm(10, 0, 2)

## [1] 7
  1. write a routine that creates this graph for any simple regression problem.

  2. Use simulation to find the coverage of these intervals if they are interpreted as familywise intervals.

Note: in the example above 7 of the 10 intervals cross the least squares regression line. The familywise-coverage is the percentage of cases where all of the intervals cross the line.

sim.family.coverage=function(alpha, B=1000) {
  z=rep(0, B)
  for(i in 1:B) {
    x=1:10
    y=2+4*x+rnorm(10, 0, 2)
    if(error.bars(x, y, alpha, FALSE)==10) z[i]=1
  }
  sum(z)/B  
}
sim.family.coverage(0.05)
## [1] 0
  1. Find a method that yields intervals which have a true familywise error rate of 68%
alpha=0.05
repeat {
  alpha=alpha/2
  tmp=sim.family.coverage(alpha)
  print(c(alpha, tmp))
  if(tmp>0.68) break
}
## [1] 0.025 0.004
## [1] 0.0125 0.0600
## [1] 0.00625 0.19500
## [1] 0.003125 0.371000
## [1] 0.0015625 0.6010000
## [1] 0.00078125 0.80000000
low=alpha
high=2*alpha
k=0
repeat {
  k=k+1
  mid=(low+high)/2
  tmp=sim.family.coverage(mid, B=10000)
  print(c(mid, tmp))
  if(tmp<0.68) high=mid
  else low=mid
  if(abs(tmp-0.68)<0.02) break
  if(k>10) break
}
## [1] 0.001171875 0.697900000
error.bars(x,y,mid)

## [1] 9