Consider the following weighted no-intercept regression model: \(y_i=\beta x_i+\epsilon _i\), where \(\epsilon_i\sim N(0, \sigma^2 x_i)\).
We have \(y_i\sim N(\beta x_i, \sigma^2 x_i)\), so
\[ \begin{aligned} &f(x_i\vert \beta, \sigma^2)=\frac1{\sqrt{2\pi\sigma^2x_i}}\exp \left\{-\frac12 \frac{(y_i-\beta x_i)^2}{\sigma^2 x_i} \right\} \\ &f(\pmb{x}\vert \beta, \sigma^2)=\prod_{i=1}^n\left[ \frac1{\sqrt{2\pi\sigma^2x_i}}\exp \left\{-\frac12 \frac{(y_i-\beta x_i)^2}{\sigma^2 x_i} \right\} \right]=\\ &(2\pi\sigma^2)^{-n/2}\prod_{i=1}^n x_i^{-1/2}\exp \left\{-\frac1{2\sigma^2} \sum_{i=1}^n \frac{(y_i-\beta x_i)^2}{x_i} \right\}\\ &l(\beta,\sigma^2\vert\pmb{x}) =-\frac{n}2\log(2\pi \sigma^2)-\frac12\sum_{i=1}^n \log x_i-\frac1{2\sigma^2} \sum_{i=1}^n \frac{(y_i-\beta x_i)^2}{x_i} \end{aligned} \] \[ \begin{aligned} &\frac{d l}{\beta} = \frac1{2\sigma^2} \sum_{i=1}^n \frac{(y_i-\beta x_i)x_i}{x_i}=\frac1{2\sigma^2} \sum_{i=1}^n (y_i-\beta x_i) =\\ &\frac1{2\sigma^2} (\sum_{i=1}^n y_i-\beta\sum_{i=1}^n x_i) = 0\\ &\hat{\beta} = \sum_{i=1}^n y_i/\sum_{i=1}^n x_i\\ \end{aligned} \]
\[ \begin{aligned} &\frac{d l}{\sigma^2} = -\frac{n}{2\sigma^2}+\frac1{2(\sigma^2)^2} \sum_{i=1}^n \frac{(y_i-\beta x_i)^2}{x_i}=0\\ &\widehat{\sigma^2}=\frac1n\sum_{i=1}^n \frac{(y_i-\hat{\beta} x_i)^2}{x_i} \end{aligned} \] b. Assuming we know \(\sigma\), find a \((1-\alpha)100\%\) confidence interval for \(\beta\).
\[ \begin{aligned} &y_i\sim N(\beta x_i, \sigma^2 x_i) \\ &\sum_{i=1}^n y_i\sim N(\beta \sum_{i=1}^n x_i, \sigma^2 \sum_{i=1}^n x_i) \\ &\hat{\beta} \sim N(\beta, \sigma^2/ \sum_{i=1}^n x_i)\\ \end{aligned} \]
and so a \((1-\alpha)100\%\) confidence interval for \(\beta\) is given by
\[\hat{\beta}\pm z_{\alpha/2} \sigma/\sqrt{\sum_{i=1}^n x_i}\]
B=10000;x=0:100/100;beta=2;sigma=2;alpha=0.05
crit=qnorm(1-alpha/2)
sumx=sum(x)
const=crit*sigma/sqrt(sumx)
betahat=rep(0,B)
for(i in 1:B) {
y=beta*x+rnorm(101, 0, sigma*sqrt(x))
betahat[i]=sum(y)/sumx
}
hist(betahat, 100, freq=FALSE)
curve(dnorm(x, 2, 2/sqrt(sumx)),col="blue", lwd=2, add=TRUE)
sum(betahat-const<beta&beta<betahat+const)/B
## [1] 0.9451
Below are the values of the response variable for the following model: \(y_i= \alpha i/10 + \beta i^2/100 +\epsilon_i\), i=10,..,100.
\[H_0:\beta=0\text{ vs }H_a:\beta\ne 0\] that is we want to test whether a quadratic term is needed.
## [1] 1.1 0.9 1.1 0.2 1.3 1.6 0.9 1.2 1.3 1.7 1.9 1.9 3.1 2.5 2.8
## [16] 1.7 2.6 2.6 2.2 3.1 3.2 3.3 3.3 3.3 3.6 3.9 2.7 5.1 4.0 3.3
## [31] 2.5 3.7 5.0 3.6 3.4 5.2 4.6 5.0 4.6 4.4 4.5 4.7 4.7 5.7 5.9
## [46] 5.7 5.9 6.2 7.2 7.1 6.3 7.1 6.8 6.7 6.7 6.3 7.3 7.8 7.1 7.1
## [61] 7.1 8.1 7.7 8.0 8.4 7.6 8.0 7.6 8.3 8.9 8.8 8.7 8.2 8.7 8.6
## [76] 9.3 8.7 9.0 9.7 9.5 8.8 9.1 9.8 9.5 9.2 10.5 10.3 9.5 10.0 10.7
## [91] 11.5
By theorem (6.6.7) if the null hypothesis is true \(F\sim F(h, n-k-1)\), where
\[F=\frac{\pmb{y'(H-H_1)y}/h}{\pmb{y'(I-H)y}/(n-k-1)}\] where \[\pmb{H}=\pmb{X}(\pmb{X}'\pmb{X})^{-1}\pmb{X}'\] \[\pmb{H}_1=\pmb{X}_1(\pmb{X}_1'\pmb{X}_1)^{-1}\pmb{X}_1'\]
so
i=10:100/10
X=cbind(i,i^2)
H=X%*%solve(t(X)%*%X)%*%t(X)
X1=cbind(i)
H1=X1%*%solve(t(X1)%*%X1)%*%t(X1)
y=cbind(y)
n=length(i)
k=2
h=1
numer=t(y)%*%(H-H1)%*%y/h
denom=t(y)%*%(diag(91)-H)%*%y/(n-k-1)
FTS = c(numer/denom)
round(FTS, 2)
## [1] 3.58
round(1-pf(FTS, h, n-k-1), 4)
## [1] 0.0618
so we fail to reject the null hypothesis, \(\beta= 0\).
By the theorem, if the null hypothesis is false \(F\sim F(h, n-k-1,\lambda_1)\), where
\[\lambda_1=\pmb{\beta}_2'[\pmb{X}_2'\pmb{X}_2-\pmb{X}_2'\pmb{X}_1(\pmb{X}_1'\pmb{X}_1)^{-1}\pmb{X}_1'\pmb{X}_2]\pmb{\beta}_2/(2\sigma^2)\]
X2=cbind(i^2)
beta=0.015;sigma=1/2
lambda1=beta^2*(t(X2)%*%X2-t(X2)%*%H1%*%X2)/(2*sigma^2)
round(lambda1, 2)
## [,1]
## [1,] 5.71
crit=qf(1-0.05, 1, 91-2-1)
round(1-pf(crit, 1, 91-2-1, lambda1), 4)
## [1] 0.6564
`