Let’s consider the following problem: we have \(y_1,\dots,y_n\) independent observations from a \(N(\mu,\sigma^2)\) distribution, and we want to find a confidence interval for \(\mu\).
Let’s first find a point estimate, and for that we will use the method of least squares, that is, we will find \(\hat{\mu}\) that minimizes
\[G(a) = \sum_{i=1}^n (y_i-a)^2\]
We find
\[ \begin{aligned} &\frac{d G(a)}{d a} = -2\sum_{i=1}^n (y_i-a) = -2\sum_{i=1}^n y_i+2na = 0\\ &\hat{\mu} = \frac1n \sum_{i=1}^n y_i = \bar{y} \end{aligned} \]
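As a quick numerical sanity check we can minimize \(G(a)\) directly and compare the result to the sample mean. This is a minimal Python sketch; the simulated sample and the use of numpy/scipy are just for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=50)   # simulated sample with mu = 5, sigma = 2

# least squares criterion G(a) = sum_i (y_i - a)^2
def G(a):
    return np.sum((y - a) ** 2)

res = minimize_scalar(G)       # numerical minimizer of G
print(res.x, y.mean())         # the two agree up to numerical precision
```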
We want to find a confidence interval, and a standard method for that is to first find a hypothesis test and then invert the test. So now we want to test \(H_0:\mu=\mu_0\). Again using the least squares criterion we have \(\sum_{i=1}^n (y_i-\mu_0)^2\), and a reasonable test statistic is given by
\[\frac{\sum_{i=1}^n (y_i-\mu_0)^2}{\sum_{i=1}^n (y_i-\hat{\mu})^2}\] Now
\[ \begin{aligned} &\sum_{i=1}^n (y_i-\mu_0)^2 = \\ &\sum_{i=1}^n (y_i-\hat{\mu}+\hat{\mu}-\mu_0)^2 = \\ &\sum_{i=1}^n\left[ (y_i-\hat{\mu})^2+2(y_i-\hat{\mu})(\hat{\mu}-\mu_0)+(\hat{\mu}-\mu_0)^2\right] = \\ &\sum_{i=1}^n(y_i-\hat{\mu})^2+2(\hat{\mu}-\mu_0)\sum_{i=1}^n(y_i-\hat{\mu})+\sum_{i=1}^n(\hat{\mu}-\mu_0)^2 = \\ &\sum_{i=1}^n(y_i-\hat{\mu})^2+2(\hat{\mu}-\mu_0)(\sum_{i=1}^ny_i-n\hat{\mu})+\sum_{i=1}^n(\hat{\mu}-\mu_0)^2 = \\ &\sum_{i=1}^n(y_i-\hat{\mu})^2+2(\hat{\mu}-\mu_0)(\sum y_i-\sum y_i)+n(\hat{\mu}-\mu_0)^2 = \\ &\sum_{i=1}^n(y_i-\hat{\mu})^2+n(\hat{\mu}-\mu_0)^2 \\ \end{aligned} \] So now we have
\[ \begin{aligned} \frac{\sum_{i=1}^n (y_i-\mu_0)^2}{\sum_{i=1}^n (y_i-\hat{\mu})^2} &= \frac{\sum_{i=1}^n (y_i-\hat{\mu})^2+n(\hat{\mu}-\mu_0)^2}{\sum_{i=1}^n (y_i-\hat{\mu})^2}\\ &= 1+\frac{n(\hat{\mu}-\mu_0)^2}{\sum_{i=1}^n (y_i-\hat{\mu})^2} \end{aligned} \]
so we can just as well use the test statistic
\[F=\frac{n(\hat{\mu}-\mu_0)^2}{\sum_{i=1}^n (y_i-\hat{\mu})^2}\]
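Both the sum-of-squares decomposition and the statistic are easy to check numerically. A small Python sketch (the sample and the hypothesized value \(\mu_0\) are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=5.0, scale=2.0, size=30)
mu0 = 5.0                               # hypothesized mean (H0 is true here)

muhat = y.mean()
SST = np.sum((y - mu0) ** 2)            # total sum of squares
SSE = np.sum((y - muhat) ** 2)          # error sum of squares
SSH = len(y) * (muhat - mu0) ** 2       # hypothesis sum of squares

print(np.isclose(SST, SSE + SSH))       # the decomposition holds exactly
print(SSH / SSE)                        # the test statistic F
```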
Now we need to know the distribution of F. We know
\[ \begin{aligned} &E[\hat{\mu}] = \mu\\ &var(\hat{\mu}) = var \left( \frac1n \sum_{i=1}^n y_i\right) =\frac1{n^2} \sum_{i=1}^n var(y_i) = \frac{\sigma^2}{n} \\ \end{aligned} \]
so if the null hypothesis is true, \(\sqrt n(\hat{\mu}-\mu_0)/\sigma\sim N(0,1)\), and therefore \(n(\hat{\mu}-\mu_0)^2/\sigma^2\sim \chi^2(1)\).
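A short simulation illustrates this claim; the sketch below (parameter values are arbitrary) compares the empirical quantiles of \(n(\hat{\mu}-\mu_0)^2/\sigma^2\) under \(H_0\) with the \(\chi^2(1)\) quantiles.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, mu0, sigma = 20, 0.0, 1.0
B = 10_000                                    # number of simulated samples

ybar = rng.normal(mu0, sigma, size=(B, n)).mean(axis=1)
z2 = n * (ybar - mu0) ** 2 / sigma ** 2       # should behave like chi-square(1)

qs = [0.5, 0.9, 0.99]
print(np.quantile(z2, qs))                    # simulated quantiles
print(stats.chi2.ppf(qs, df=1))               # chi-square(1) quantiles
```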
Also if the null hypothesis is true
\[ \begin{aligned} &\frac{y_i-\mu_0}{\sigma}\sim N(0,1) \\ &\frac1{\sigma^2}\sum_{i=1}^n (y_i-\mu_0)^2 =\sum_{i=1}^n \left(\frac{y_i-\mu_0}{\sigma}\right)^2 \sim \chi^2(n) \end{aligned} \]
and from above we have
\[\sum_{i=1}^n \left(\frac{y_i-\mu_0}{\sigma}\right)^2=\sum_{i=1}^n \left(\frac{y_i-\hat{\mu}}{\sigma}\right)^2+\frac{n(\hat{\mu}-\mu_0)^2}{\sigma^2}\]
Now the distribution of a sum of independent chi-square random variables is again chi-square, the term on the left is \(\chi^2(n)\), the second term on the right is \(\chi^2(1)\), so we could conclude that
\[\frac1{\sigma^2}\sum_{i=1}^n(y_i-\hat{\mu})^2\sim \chi^2(n-1)\] IF we knew that \(\sum_{i=1}^n(y_i-\hat{\mu})^2\) is independent of \(\hat{\mu}\). This, however, is a well-known fact from statistics.
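Both claims, the \(\chi^2(n-1)\) distribution of SSE\(/\sigma^2\) and the independence of SSE and \(\hat{\mu}\), can be checked empirically. A Python sketch with arbitrary simulation settings:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, mu, sigma = 15, 3.0, 2.0
B = 10_000

y = rng.normal(mu, sigma, size=(B, n))
muhat = y.mean(axis=1)
SSE = np.sum((y - muhat[:, None]) ** 2, axis=1)

# SSE / sigma^2 should follow a chi-square(n-1) distribution
print(np.quantile(SSE / sigma ** 2, [0.5, 0.9]))
print(stats.chi2.ppf([0.5, 0.9], df=n - 1))

# independence of SSE and muhat: the correlation should be near zero
print(np.corrcoef(SSE, muhat)[0, 1])
```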
Now the ratio of two independent chi-square random variables, each divided by its degrees of freedom, has an F distribution, and so we find
\[\frac{n(\hat{\mu}-\mu_0)^2/\sigma^2\,\big/\,1}{\sum_{i=1}^n (y_i-\hat{\mu})^2/\sigma^2\,\big/\,(n-1)} = \frac{n(\hat{\mu}-\mu_0)^2}{s^2} = (n-1)F \sim F(1, n-1)\]
where \(s^2=\frac1{n-1}\sum_{i=1}^n (y_i-\hat{\mu})^2\) is the sample variance.
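Again this is easy to verify by simulation; the sketch below (made-up parameters, \(H_0\) true) compares the empirical quantiles of \(n(\hat{\mu}-\mu_0)^2/s^2\) with those of the \(F(1,n-1)\) distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, mu0, sigma = 12, 1.0, 1.5
B = 10_000

y = rng.normal(mu0, sigma, size=(B, n))       # samples generated under H0
muhat = y.mean(axis=1)
s2 = np.sum((y - muhat[:, None]) ** 2, axis=1) / (n - 1)
Fstat = n * (muhat - mu0) ** 2 / s2           # should follow F(1, n-1)

qs = [0.5, 0.9, 0.95]
print(np.quantile(Fstat, qs))                 # simulated quantiles
print(stats.f.ppf(qs, dfn=1, dfd=n - 1))      # F(1, n-1) quantiles
```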
Also, if \(T\sim t(k)\), then \(T^2\sim F(1,k)\), so here \(T=\sqrt n(\hat{\mu}-\mu_0)/s\sim t(n-1)\), and therefore
\[ \begin{aligned} &1-\alpha = \\ &P\left( \frac{\sqrt{n}\vert \bar{y}-\mu_0\vert}{s}<t_{\alpha/2,n-1} \right) = \\ &P\left( \vert \bar{y}-\mu_0\vert<t_{\alpha/2,n-1}\frac{s}{\sqrt n} \right) = \\ &P\left(-t_{\alpha/2,n-1}\frac{s}{\sqrt n}< \bar{y}-\mu_0<t_{\alpha/2,n-1}\frac{s}{\sqrt n} \right) = \\ &P\left(\bar{y}-t_{\alpha/2,n-1}\frac{s}{\sqrt n}< \mu_0<\bar{y}+t_{\alpha/2,n-1}\frac{s}{\sqrt n} \right) \end{aligned} \]
where \(s^2=\frac1{n-1}\sum_{i=1}^n (y_i-\bar{y})^2\), and this is of course the standard confidence interval for a normal mean with unknown standard deviation.
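The interval is straightforward to compute, and its coverage can be checked by simulation. A minimal Python sketch (the function name and the parameter values are my own choices for illustration):

```python
import numpy as np
from scipy import stats

def t_interval(y, alpha=0.05):
    """Standard t confidence interval for a normal mean with unknown sigma."""
    n = len(y)
    ybar, s = y.mean(), y.std(ddof=1)         # s^2 = sum (y_i - ybar)^2 / (n-1)
    tcrit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    half = tcrit * s / np.sqrt(n)
    return ybar - half, ybar + half

rng = np.random.default_rng(5)
mu, sigma, n, B = 10.0, 3.0, 25, 5_000
covered = 0
for _ in range(B):
    lo, hi = t_interval(rng.normal(mu, sigma, size=n))
    covered += (lo < mu < hi)
print(covered / B)                            # should be close to 0.95
```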
The crucial steps in this derivation were
- the decomposition \[\sum_{i=1}^n (y_i-\mu_0)^2=\sum_{i=1}^n(y_i-\hat{\mu})^2+n(\hat{\mu}-\mu_0)^2\] which we will write as SST = SSE + SSH (total sum of squares = error sum of squares + hypothesis sum of squares)
- the fact that SSE/\(\sigma^2\) and SSH/\(\sigma^2\) have \(\chi^2\) distributions
- the fact that SSE and SSH are independent.
In this course we will first show that these facts are true in great generality, and then we will apply that to many different situations.