We can now return to the case of two quantitative variables and the question of whether or not they are related. Specifically, we have a test of
\(H_0: \rho =0\) (no relationship) vs \(H_a: \rho \ne 0\) (some relationship)
The test assumes that the relationship is linear and that there are no outliers. Both assumptions can be checked with the mplot command. The p value of the test is found with the pearson.cor command.
We have previously used simulation to see that a sample correlation r=-0.226 is very unusual (for n=366). Now we can do the formal test:
pearson.cor(Draft.Number, Day.of.Year, rho.null = 0)
## p value of test H0: rho=0 vs. Ha: rho <> 0: 0.000
Assumptions: the boxplots and the scatterplot show no outliers and no sign of a non-linear relationship.
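To see where a p value this small can come from, here is a minimal sketch of the standard t test for a correlation, using only the two numbers quoted above (r = -0.226, n = 366). It is an assumption that pearson.cor works this way internally; the variable names `t_stat` and `p_value` are illustrative and not part of any package.

```r
# Sketch of the usual t test for H0: rho = 0 (assumed here, not taken
# from the pearson.cor source), using the summary numbers from the text.
r <- -0.226   # sample correlation quoted above
n <- 366      # number of days

# Under H0 (and bivariate normality) this statistic follows a
# t distribution with n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p value for Ha: rho != 0.
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

round(t_stat, 2)
signif(p_value, 2)
```

The statistic is about 4.4 standard errors below zero, so the p value is far below 0.001, which agrees with the 0.000 printed by pearson.cor.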
Data from a British government survey of household spending may be used to examine the relationship between household spending on tobacco products and on alcoholic beverages. The numbers are the average expenditure for each of the 11 regions of the United Kingdom.
The marginal plot shows one outlier:
attach(alcohol)
mplot(Tobacco, Alcohol)
This is Northern Ireland, observation #11. Once this observation is removed, there are no more outliers and the relationship is linear:
mplot(Tobacco[-11], Alcohol[-11])
So the test is:
pearson.cor(Tobacco[-11], Alcohol[-11], rho.null = 0)
## p value of test H0: rho=0 vs. Ha: rho <> 0: 0.0072
Note: Running the test with Northern Ireland included would have led to the wrong conclusion:
pearson.cor(Tobacco, Alcohol, rho.null = 0)
## p value of test H0: rho=0 vs. Ha: rho <> 0: 0.5087
The p value is now 0.5087 > 0.05, so with the outlier included we would fail to reject the null hypothesis of no relationship.
The data below come from the following experiment: on 30 consecutive days we recorded the number of sales in a store. During this time an ad campaign was run. Find a \(90\%\) confidence interval for the correlation of Day and Sales.
Day | Sales |
---|---|
1 | 48 |
2 | 47 |
3 | 52 |
4 | 50 |
5 | 48 |
6 | 53 |
7 | 49 |
8 | 49 |
9 | 54 |
10 | 55 |
11 | 54 |
12 | 55 |
13 | 51 |
14 | 57 |
15 | 47 |
16 | 46 |
17 | 55 |
18 | 56 |
19 | 47 |
20 | 54 |
21 | 45 |
22 | 53 |
23 | 54 |
24 | 56 |
25 | 58 |
26 | 55 |
27 | 57 |
28 | 52 |
29 | 51 |
30 | 54 |
attach(sales)
mplot(Day, Sales)
pearson.cor(Day, Sales, conf.level = 90)
## A 90% confidence interval for the
## population correlation coefficient is ( 0.086, 0.617 )
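For readers without the course package, the interval can be approximated in base R. The sketch below assumes the usual Fisher z transformation and types in the data from the table above. Whether pearson.cor uses exactly this method is an assumption, and `half_width`, `ci` and `ci_base` are illustrative names. Note that base R's `cor.test` takes the confidence level as a proportion (0.90), not as a percentage.

```r
# Data copied from the Day/Sales table above.
Day <- 1:30
Sales <- c(48, 47, 52, 50, 48, 53, 49, 49, 54, 55,
           54, 55, 51, 57, 47, 46, 55, 56, 47, 54,
           45, 53, 54, 56, 58, 55, 57, 52, 51, 54)

r <- cor(Day, Sales)   # sample Pearson correlation
n <- length(Day)

# Fisher z transformation: atanh(r) is approximately normal with
# standard deviation 1 / sqrt(n - 3).
z <- atanh(r)
half_width <- qnorm(0.95) / sqrt(n - 3)   # 0.95 quantile for a 90% interval

# Transform the interval endpoints back to the correlation scale.
ci <- tanh(z + c(-1, 1) * half_width)
round(ci, 3)

# Cross-check against base R's built-in test.
ci_base <- cor.test(Day, Sales, conf.level = 0.90)$conf.int
round(as.numeric(ci_base), 3)
```

Both calculations should reproduce the interval shown above to three decimals. Because the interval does not contain 0, a two-sided test at the 10% level would also reject \(H_0: \rho = 0\).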