Ordinary Linear Regression

Example: Consider the dataset hubble. In 1929 Edwin Hubble published a paper showing a relationship between the distance and the radial velocity away from Earth of "extra-galactic nebulae" (galaxies). His findings revolutionized astronomy. The "Hubble constant," the slope of the regression of velocity (Y) on distance (X), is still a subject of research and debate. The data here are those Hubble published in his original paper.

Question: If it is true there is a linear relationship between Velocity (Y) and Distance (X), what is the slope of the line?

If there is a linear relationship, there exist $\beta_0$ and $\beta_1$ such that

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \dots, n$$

where the $\varepsilon_i$ are the random errors (their estimates from a fitted line are called the residuals).

In the problem above the main task is to find an interval estimate for $\beta_1$. In other problems it might be to predict Y for a specific value of x, to estimate E[Y] for some x, to test whether $\beta_0$ or $\beta_1$ is zero (or some other value), etc.

Another version of the regression problem is $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, \dots, n$; that is, the x's are not random but fixed, for example the number of car accidents in Puerto Rico (Y) by year (x).

First we need a probability model. Again this will depend on the problem, but an often-used choice is to assume that (X,Y) is bivariate normal with parameters $(\mu_x, \mu_y, \sigma_x, \sigma_y, \rho)$. If we think in terms of predicting Y from a fixed value X=x, then we need the conditional distribution of Y|X=x, which is $N\left(\mu_y + \rho\frac{\sigma_y}{\sigma_x}(x-\mu_x),\ \sigma_y\sqrt{1-\rho^2}\right)$, where the second parameter is the standard deviation. Therefore we have

$$E[Y|X=x] = \mu_y + \rho\frac{\sigma_y}{\sigma_x}(x-\mu_x) = \mu_y - \mu_x\rho\frac{\sigma_y}{\sigma_x} + \rho\frac{\sigma_y}{\sigma_x}x$$

so we find that under this probability model we have a natural linear relationship between X and Y with

$\beta_0 = \mu_y - \mu_x\rho\frac{\sigma_y}{\sigma_x}$ and $\beta_1 = \rho\frac{\sigma_y}{\sigma_x}$
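As a numerical check of these two formulas, here is a minimal simulation sketch. The parameter values, sample size, seed, and function names are all assumptions chosen for illustration, and the fitted line uses the least squares estimates that appear later in this section.

```python
import math
import random


def simulate_bivariate_normal(n, mu_x, mu_y, sigma_x, sigma_y, rho, seed=0):
    """Draw n pairs (X, Y) from a bivariate normal with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        # Two independent standard normals, combined so that corr(X, Y) = rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        xs.append(mu_x + sigma_x * z1)
        ys.append(mu_y + sigma_y * (rho * z1 + math.sqrt(1 - rho**2) * z2))
    return xs, ys


def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line of ys on xs."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Illustrative (made-up) parameter values.
mu_x, mu_y, sigma_x, sigma_y, rho = 1.0, 5.0, 2.0, 3.0, 0.6
xs, ys = simulate_bivariate_normal(20000, mu_x, mu_y, sigma_x, sigma_y, rho)
b0_hat, b1_hat = least_squares(xs, ys)

# Theoretical values from the formulas in the text.
beta1 = rho * sigma_y / sigma_x
beta0 = mu_y - mu_x * beta1
print("theory:   ", beta0, beta1)
print("simulated:", b0_hat, b1_hat)
```

With a sample this large the simulated intercept and slope should land close to the theoretical ones, though they will not match exactly.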

Generally in a regression context the analysis is carried out using the conditional distribution of $(Y_1, \dots, Y_n)$ given $X_1 = x_1, \dots, X_n = x_n$, in which case we can consider the x's as fixed and known. The probability model then becomes

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma), \quad i = 1, \dots, n$$

Notice that we are assuming equal variance. If this is not reasonable, the analysis is still possible but somewhat more difficult.
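To make the fixed-x model concrete, here is a small sketch that generates data from it. The function name `simulate_fixed_x`, the x values, and the parameter values are hypothetical, and the errors are assumed independent with the same standard deviation sigma for every observation.

```python
import random


def simulate_fixed_x(xs, beta0, beta1, sigma, seed=0):
    """Draw one y per fixed x from y = beta0 + beta1*x + eps, eps ~ N(0, sigma).

    sigma is a standard deviation and is the same for every x (equal variance).
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Made-up design points and parameters, for illustration only.
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
ys = simulate_fixed_x(xs, beta0=10.0, beta1=3.0, sigma=2.0)
for x, y in zip(xs, ys):
    print(x, round(y, 2))
```

Only the y's change from one simulated data set to the next; the x's stay the same, which is what "fixed and known" means here.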

First let's find the mle's of $\beta_0$, $\beta_1$ and $\sigma$:
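A sketch of the derivation, assuming the normal model above with independent errors and fixed x's. The shorthand $\bar x$, $\bar y$, $S_{xx}$ and $S_{xy}$ is introduced here for convenience. Setting the partial derivatives of the log-likelihood with respect to $\beta_0$ and $\beta_1$ to zero and solving the two equations gives

```latex
\begin{align*}
\ell(\beta_0,\beta_1,\sigma)
  &= -\frac{n}{2}\log(2\pi) - n\log\sigma
     - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-\beta_0-\beta_1 x_i)^2 \\
0 &= \frac{\partial\ell}{\partial\beta_0}
   = \frac{1}{\sigma^2}\sum_{i=1}^n (y_i-\beta_0-\beta_1 x_i) \\
0 &= \frac{\partial\ell}{\partial\beta_1}
   = \frac{1}{\sigma^2}\sum_{i=1}^n x_i\,(y_i-\beta_0-\beta_1 x_i) \\
\hat\beta_1 &= \frac{S_{xy}}{S_{xx}}
   = \frac{\sum_{i=1}^n (x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^n (x_i-\bar x)^2},
\qquad
\hat\beta_0 = \bar y - \hat\beta_1\,\bar x
\end{align*}
```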

which are of course the standard least squares regression estimates! For $\sigma^2$ we find
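Setting the derivative of the log-likelihood of the normal model above with respect to $\sigma$ to zero gives the following standard result, written out here as a sketch:

```latex
\[
\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n \left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)^2
\]
```

Note that the mle divides by $n$ and is biased; the usual unbiased estimate divides by $n-2$ instead.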

What are the sampling distributions?
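One way to start, sketched under the model above with the x's treated as fixed, is to write each estimator as a weighted sum of the $Y_i$. The weights $c_i$ and $d_i$ and the shorthand $S_{xx} = \sum (x_i - \bar x)^2$ are notation introduced here. Because $\sum (x_i - \bar x) = 0$, we have

```latex
\begin{align*}
\hat\beta_1 &= \frac{\sum_{i=1}^n (x_i-\bar x)(Y_i-\bar Y)}{S_{xx}}
             = \sum_{i=1}^n c_i Y_i,
  \qquad c_i = \frac{x_i-\bar x}{S_{xx}} \\
\hat\beta_0 &= \bar Y - \hat\beta_1 \bar x
             = \sum_{i=1}^n d_i Y_i,
  \qquad d_i = \frac{1}{n} - \bar x\, c_i
\end{align*}
```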

and so $\hat\beta_0$ is a linear combination of normal rv's, and therefore normal itself. Moreover
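The means and variances are the standard ones for this model, sketched here assuming independent errors and fixed x's, with $S_{xx} = \sum (x_i - \bar x)^2$ as shorthand introduced for this note:

```latex
\begin{align*}
E[\hat\beta_0] &= \beta_0, &
\operatorname{Var}[\hat\beta_0] &= \sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right) \\
E[\hat\beta_1] &= \beta_1, &
\operatorname{Var}[\hat\beta_1] &= \frac{\sigma^2}{S_{xx}}
\end{align*}
```

The same argument shows that $\hat\beta_1$ is normal as well, since it is also a linear combination of the $Y_i$.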

A confidence interval for the slope can be found as follows:
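A sketch of the usual t-based interval under the normal model above, where $s^2$ (the residual variance estimate with divisor $n-2$) and $S_{xx} = \sum (x_i - \bar x)^2$ are shorthand introduced here:

```latex
\begin{align*}
s^2 &= \frac{1}{n-2}\sum_{i=1}^n \left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)^2,
\qquad
\frac{\hat\beta_1 - \beta_1}{s/\sqrt{S_{xx}}} \sim t(n-2) \\
&\hat\beta_1 \pm t_{n-2,\,1-\alpha/2}\,\frac{s}{\sqrt{S_{xx}}}
\end{align*}
```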

and in hubble.est the 95% CI is calculated to be (298, 610)
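The code of hubble.est is not shown in the text, so the following is only a hypothetical stand-in: a minimal sketch of a t-based interval for the slope, assuming the equal-variance normal model above. The function name `slope_ci` and the data are made up (they are not Hubble's measurements), and the t critical value is passed in by the caller, for example from a t table, so that the sketch needs only the standard library.

```python
import math


def slope_ci(x, y, t_crit):
    """Least squares fit of y on x and a t-based interval for the slope.

    t_crit is the upper alpha/2 critical value of a t distribution with
    n - 2 degrees of freedom, supplied by the caller.
    Returns (intercept, slope, (lower, upper)).
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate
    b0 = ybar - b1 * xbar               # intercept estimate
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))        # residual standard deviation
    half_width = t_crit * s / math.sqrt(sxx)
    return b0, b1, (b1 - half_width, b1 + half_width)


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# 3.182 is the 0.975 quantile of a t distribution with n - 2 = 3 df.
b0, b1, ci = slope_ci(x, y, t_crit=3.182)
print("intercept:", b0)
print("slope:", b1)
print("95% CI for slope:", ci)
```

Passing the critical value in keeps the dependence on the degrees of freedom explicit; with a statistics library available one would normally compute the quantile instead of looking it up.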