Find formulas for prediction and confidence intervals for the \(y|x=x_0\) in a no-intercept model. Use simulation to show that your formulas are proper 95% confidence intervals in the case \(x=1:100/100\), \(y=x+\epsilon_i\), \(\epsilon_i\sim N(0,1/2)\) and \(x_0=0.5\).
We have \(\pmb{X}=(x_1 ... x_n)'\), so
\[\pmb{x_0'(X'X)^{-1}x_0}=x_0^2/\sum_{i=1}^n x_i^2\]
and a \((1-\alpha)100\%\) confidence interval for \(E[y|x=x_0]\) is given by
\[x_0\hat{\beta}\pm t_{\alpha/2,n-1}sx_0/\sqrt{\sum_{i=1}^n x_i^2}\]
Also a \((1-\alpha)100\%\) prediction interval for \(y|x=x_0\) is given by
\[x_0\hat{\beta}\pm t_{\alpha/2,n-1}s\sqrt{1+x_0^2/\sum_{i=1}^n x_i^2}\]
B=1e4
x=1:100/100
sx=sum(x^2)
crit=qt(0.975,99)
Interval=matrix(0,B,4)
for(i in 1:B) {
y=x+rnorm(100,0,1/2)
betahat=sum(x*y)/sum(x^2)
yhat=x*betahat
sse=sum((y-yhat)^2)
s=sqrt(sse/98)
Interval[i, 1:2]=1*betahat+c(-1,1)*crit*s/sqrt(sx)
Interval[i, 3:4]=1*betahat+c(-1,1)*crit*s*sqrt(1+1/sx)
}
sum(Interval[,1]<1&1<Interval[, 2])/B
## [1] 0.9464
newy=1+rnorm(B,0,1/2)
sum(Interval[,3]<newy & newy<Interval[, 4])/B
## [1] 0.951
A popular graph in many fields of science is the following: A scatterplot with the fitted curve and “error bars” on each of the observations, that is 95% confidence intervals. An example is shown here:
set.seed(111)
x=1:10
y=2+4*x+rnorm(10, 0, 2)
## [1] 7
write a routine that creates this graph for any simple regression problem.
Use simulation to find the coverage of these intervals if they are interpreted as familywise intervals.
Note: in the example above 7 of the 10 intervals cross the least squares regression line. The familywise-coverage is the percentage of cases where all of the intervals cross the line.
sim.family.coverage=function(alpha, B=1000) {
z=rep(0, B)
for(i in 1:B) {
x=1:10
y=2+4*x+rnorm(10, 0, 2)
if(error.bars(x, y, alpha, FALSE)==10) z[i]=1
}
sum(z)/B
}
sim.family.coverage(0.05)
## [1] 0
alpha=0.05
repeat {
alpha=alpha/2
tmp=sim.family.coverage(alpha)
print(c(alpha, tmp))
if(tmp>0.68) break
}
## [1] 0.025 0.004
## [1] 0.0125 0.0600
## [1] 0.00625 0.19500
## [1] 0.003125 0.371000
## [1] 0.0015625 0.6010000
## [1] 0.00078125 0.80000000
low=alpha
high=2*alpha
k=0
repeat {
k=k+1
mid=(low+high)/2
tmp=sim.family.coverage(mid, B=10000)
print(c(mid, tmp))
if(tmp<0.68) high=mid
else low=mid
if(abs(tmp-0.68)<0.02) break
if(k>10) break
}
## [1] 0.001171875 0.697900000
error.bars(x,y,mid)
## [1] 9