Example Consider the dataset hubble. In 1929 Edwin Hubble published a paper showing a relationship between the distance of "extra-galactic nebulae" (galaxies) from Earth and their radial velocity away from it. His findings revolutionized astronomy. The "Hubble constant," the slope of the regression of velocity (Y) on distance (X), is still a subject of research and debate. The data here are those Hubble published in his original paper.
Question: If it is true there is a linear relationship between Velocity (Y) and Distance (X), what is the slope of the line?
If there is a linear relationship, there exist $\beta_0$ and $\beta_1$ such that
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \dots, n$$
where the $\varepsilon_i$ are the error terms (their estimates from a fitted line are called the residuals).
In the problem above the main task is to find an interval estimate for $\beta_1$. In other problems it might be to predict $Y$ for a specific value of $x$, to estimate $E[Y]$ for some $x$, to test whether $\beta_0$ or $\beta_1$ is zero (or some other value), etc.
Another version of the regression problem is $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, \dots, n$; that is, the $x$'s are not random but fixed, for example the number of car accidents in Puerto Rico ($Y$) by year ($x$).
First we need a probability model. Again this will depend on the problem, but an often-used one is to assume that $(X,Y)$ are bivariate normal with parameters $(\mu_x, \mu_y, \sigma_x, \sigma_y, \rho)$. If we think in terms of predicting $Y$ from a fixed value $X=x$, then we need the conditional distribution of $Y|X=x$, which is $N\left(\mu_y + \rho\frac{\sigma_y}{\sigma_x}(x-\mu_x),\ \sigma_y\sqrt{1-\rho^2}\right)$, where the second argument is the standard deviation. Therefore we have
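$$E[Y|X=x] = \mu_y + \rho\frac{\sigma_y}{\sigma_x}(x-\mu_x) = \left(\mu_y - \mu_x\rho\frac{\sigma_y}{\sigma_x}\right) + \rho\frac{\sigma_y}{\sigma_x}\,x$$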
so we find that under this probability model we have a natural linear relationship between X and Y with
$$\beta_0 = \mu_y - \mu_x\rho\frac{\sigma_y}{\sigma_x} \quad\text{and}\quad \beta_1 = \rho\frac{\sigma_y}{\sigma_x}$$
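The relationship between the bivariate normal parameters and the regression coefficients can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the parameter values are hypothetical and chosen only for illustration. It simulates a large bivariate normal sample and compares the fitted least squares line with the intercept and slope implied by the formulas above.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
mu_x, mu_y = 1.0, 2.0
sigma_x, sigma_y, rho = 2.0, 3.0, 0.6

# Coefficients implied by the bivariate normal model.
beta1 = rho * sigma_y / sigma_x
beta0 = mu_y - mu_x * beta1

# Draw a large sample from the bivariate normal distribution.
rng = np.random.default_rng(0)
cov = [[sigma_x**2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y**2]]
xy = rng.multivariate_normal([mu_x, mu_y], cov, size=200_000)
x, y = xy[:, 0], xy[:, 1]

# Least squares line fitted to the simulated points
# (np.polyfit returns the highest-degree coefficient first).
slope, intercept = np.polyfit(x, y, 1)

print("implied by the model:", beta0, beta1)
print("fitted to the sample:", intercept, slope)
```

With a sample this large the two pairs of numbers should agree closely, though not exactly, because the fitted values are themselves random.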
Generally in a regression context the analysis is carried out using the conditional distribution of $(Y_1, \dots, Y_n)$ given $X_1=x_1, \dots, X_n=x_n$, in which case we can consider the $x$'s as fixed and known. The probability model then becomes
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad \varepsilon_i \sim N(0,\sigma), \quad i = 1, \dots, n$$
Notice that we are assuming equal variance. If this is not reasonable, the analysis is still possible but somewhat more difficult.
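First let's find the mle's of $\beta_0$, $\beta_1$ and $\sigma$. With independent errors the log-likelihood is
$$l(\beta_0,\beta_1,\sigma) = -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2$$
For any fixed $\sigma$ this is maximized by minimizing the sum of squares. Setting the partial derivatives with respect to $\beta_0$ and $\beta_1$ equal to zero gives
$$\hat\beta_1 = \frac{\sum_{i=1}^n (x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^n (x_i-\bar x)^2}, \quad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$$
which are of course the standard least squares regression estimates! For $\sigma^2$ we find
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$$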
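In closed form, the slope estimate is the ratio of the centered cross-products to the centered sum of squares of the x's, the intercept estimate is the mean of y minus the slope estimate times the mean of x, and the mle of $\sigma^2$ is the average squared residual (dividing by n, not n-2). A minimal sketch of this computation follows, assuming NumPy; the function name `regression_mle` and the data are made up for illustration and are not the hubble dataset.

```python
import numpy as np

def regression_mle(x, y):
    """Return (b0, b1, sigma2), the mle's for simple linear regression.

    Hypothetical helper written for this illustration.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)            # centered sum of squares of x
    sxy = np.sum((x - xbar) * (y - ybar))    # centered cross-products
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sigma2 = np.mean((y - b0 - b1 * x) ** 2)  # divides by n, not n - 2
    return b0, b1, sigma2

# Made-up data for illustration only.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 2.0, 25)
y = 10.0 + 400.0 * x + rng.normal(0.0, 50.0, size=x.size)

b0, b1, sigma2 = regression_mle(x, y)

# Cross-check against NumPy's least squares polynomial fit.
slope, intercept = np.polyfit(x, y, 1)
print(b0, b1, sigma2)
print(intercept, slope)
```

The agreement with `np.polyfit` illustrates that, under the normal model with equal variance, maximum likelihood and least squares give the same line.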
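What are the sampling distributions? Write $S_{xx} = \sum_{i=1}^n (x_i-\bar x)^2$. Because $\sum_{i=1}^n (x_i-\bar x) = 0$, the estimators can be written as
$$\hat\beta_1 = \sum_{i=1}^n \frac{x_i-\bar x}{S_{xx}}\, Y_i, \quad \hat\beta_0 = \bar Y - \hat\beta_1 \bar x = \sum_{i=1}^n \left(\frac{1}{n} - \frac{\bar x (x_i-\bar x)}{S_{xx}}\right) Y_i$$
and so $\hat\beta_0$ is a linear combination of normal rv's, and therefore normal itself (and likewise $\hat\beta_1$). Moreover, using the independence of the $Y_i$,
$$E[\hat\beta_0] = \beta_0, \quad Var[\hat\beta_0] = \sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right), \quad E[\hat\beta_1] = \beta_1, \quad Var[\hat\beta_1] = \frac{\sigma^2}{S_{xx}}$$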
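The sampling distributions can also be checked by simulation. The sketch below assumes the standard results that the slope estimator has mean $\beta_1$ and standard deviation $\sigma/\sqrt{S_{xx}}$, and that the intercept estimator has mean $\beta_0$ and standard deviation $\sigma\sqrt{1/n + \bar x^2/S_{xx}}$, where $S_{xx}$ is the centered sum of squares of the x's. The true parameter values and the design points are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical true parameters and fixed design points.
beta0, beta1, sigma = 1.0, 2.0, 0.5
x = np.linspace(0.0, 4.0, 20)
n = x.size
xc = x - x.mean()
sxx = np.sum(xc ** 2)

# Each row of y is one simulated dataset from the regression model.
n_sim = 20_000
y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=(n_sim, n))

# Estimates for every simulated dataset at once.
ybar = y.mean(axis=1)
b1 = (y - ybar[:, None]) @ xc / sxx
b0 = ybar - b1 * x.mean()

# Theoretical standard deviations of the estimators.
sd_b1_theory = sigma / np.sqrt(sxx)
sd_b0_theory = sigma * np.sqrt(1.0 / n + x.mean() ** 2 / sxx)

print("slope:     mean", b1.mean(), "sd", b1.std(), "theory", sd_b1_theory)
print("intercept: mean", b0.mean(), "sd", b0.std(), "theory", sd_b0_theory)
```

A histogram or normal probability plot of `b0` and `b1` would be a natural next step for checking normality, not just the first two moments.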
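A confidence interval for the slope can be found as follows: let $\hat\beta_0$, $\hat\beta_1$ be the mle's, $S_{xx} = \sum_{i=1}^n (x_i-\bar x)^2$ and $s^2 = \frac{1}{n-2}\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$. Then $(\hat\beta_1 - \beta_1)/(s/\sqrt{S_{xx}})$ has a t distribution with $n-2$ degrees of freedom, so a $100(1-\alpha)\%$ confidence interval for $\beta_1$ is
$$\hat\beta_1 \pm t_{n-2,\alpha/2}\, \frac{s}{\sqrt{S_{xx}}}$$
and in hubble.est the 95% CI is calculated to be (298, 610).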
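For readers who want to compute such an interval themselves, here is a minimal sketch of the usual t-based interval for the slope, assuming NumPy and SciPy. It is not the hubble.est routine, which is not reproduced here; the function name `slope_ci` is hypothetical and the data are simulated, so the output will not match the interval quoted above.

```python
import numpy as np
from scipy import stats

def slope_ci(x, y, level=0.95):
    """t-based confidence interval for the slope (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    b1 = np.sum(xc * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    # Residual standard error with n - 2 degrees of freedom.
    s = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))
    se_b1 = s / np.sqrt(sxx)
    tcrit = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)
    return b1 - tcrit * se_b1, b1 + tcrit * se_b1

# Simulated data for illustration only (not the hubble dataset).
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 2.0, size=24)
y = 400.0 * x + rng.normal(0.0, 200.0, size=x.size)

lower, upper = slope_ci(x, y)
print(lower, upper)

# Cross-check against scipy.stats.linregress, which reports the
# slope and its standard error.
res = stats.linregress(x, y)
tcrit = stats.t.ppf(0.975, df=x.size - 2)
print(res.slope - tcrit * res.stderr, res.slope + tcrit * res.stderr)
```

Note that the interval uses the divisor n-2 for the variance estimate rather than the mle's divisor n, which is what makes the t distribution exact under the normal model.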