This is the same data set we considered in Exercise Problems 1. The data is in studentsurvey.
Problem 1 What can you say about the relationship between Score and Gender?
Problem 2 What can you say about the relationship between Score and GPA? Find a 95% interval estimate for a student with a GPA of 2.5. Is this an interpolation or an extrapolation?
Problem 3 What can you say about the relationship between Score and Distance? Find a 99% interval estimate for a student who lives 1.5 miles from the school. Is this an interpolation or an extrapolation?
Problem 4 What can you say about the relationship between Score and Age? Find a 90% interval estimate for the mean score of 21 year old students. Is this an interpolation or an extrapolation?
attach(studentsurvey)
Problem 1 What can you say about the relationship between Score and Gender?
In Problem 1 of Exercise Problems 2 we ran the ANOVA and found a statistically significant difference between the mean scores of males and females. Because there are just two groups there is no reason to run Tukey, but rerunning the same command gives us a 95% confidence interval for the difference in mean scores:
oneway(Score, Gender)
## p value of test of equal means: p = 0.0033
## Smallest sd: 1.7 Largest sd : 2.2
## A 95% confidence interval for the difference in group means is (0.3, 1.2)
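The routine oneway comes from the course materials. As a rough base R analogue (a sketch only, using simulated stand-in data rather than studentsurvey, and not necessarily the exact computation oneway performs), a pooled two-sample t test gives a p value and a 95% confidence interval for the difference in group means. With just two groups its p value agrees with the one-way ANOVA p value:

```r
# Simulated stand-in data (assumption); the real analysis uses
# Score and Gender from studentsurvey.
set.seed(1)
score  <- c(rnorm(40, mean = 6.0, sd = 2), rnorm(40, mean = 6.8, sd = 2))
gender <- factor(rep(c("Female", "Male"), each = 40))

# Pooled-variance two-sample t test
tt <- t.test(score ~ gender, var.equal = TRUE)
tt$p.value    # p value of the test of equal means
tt$conf.int   # 95% confidence interval for the difference in group means

# The one-way ANOVA F test gives the same p value when there are two groups
fit     <- aov(score ~ gender)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]
anova_p
```

The option var.equal = TRUE matches the equal-variance assumption behind the ANOVA, which is why the output above reports the smallest and largest standard deviations.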
Problem 2 What can you say about the relationship between Score and GPA?
In problem 3 of the Exercise Problems 2 we found a statistically significant correlation between Score and GPA. Let’s find a good model.
slr(Score, GPA)
## The least squares regression equation is:
## Score = 3.12 + 1.33 GPA
## R^2 = 11%
The residuals vs. fits plot and the normal plot look good, so there is no problem with the assumptions. We find the model
\[ \text{Score} = 3.12 + 1.33 \text{ GPA} \]
Find a 95% interval estimate for a student with a GPA of 2.5. Is this an interpolation or an extrapolation?
slr.predict(Score, GPA, newx=2.5, interval="PI")
## GPA Fit Lower Upper
## 2.5 6.45 2.65 10.24
so a 95% prediction interval for a student with a GPA of 2.5 is (2.65, 10.24)
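For readers without the course routines, the same kind of calculation can be sketched in base R with lm and predict. The data below are simulated (an assumption, chosen only so the block runs on its own), so the numbers will not match the output above. The last lines check whether the new value lies inside the observed range of the predictor, which is what makes a prediction an interpolation:

```r
# Simulated stand-in for Score and GPA (assumption, not the survey data)
set.seed(2)
gpa   <- runif(100, min = 2, max = 4)
score <- 3 + 1.3 * gpa + rnorm(100, sd = 2)
dat   <- data.frame(score, gpa)

# Least squares regression of score on gpa
fit <- lm(score ~ gpa, data = dat)
coef(fit)

# 95% prediction interval for one new student with gpa = 2.5
pi95 <- predict(fit, newdata = data.frame(gpa = 2.5),
                interval = "prediction", level = 0.95)
pi95   # columns: fit, lwr, upr

# Interpolation check: is 2.5 within the observed range of gpa?
rng <- range(dat$gpa)
is_interpolation <- 2.5 >= rng[1] && 2.5 <= rng[2]
is_interpolation
```

Note that predict takes the level as a proportion (0.95), while slr.predict in the text is given conf.level as a percentage.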
This is an interpolation because 2.5 is in the range of GPAs in the data set.
Problem 3 What can you say about the relationship between Score and Distance?
In problem 5 of the Exercise Problems 2 we used a log transform on Distance. Doing so again yields
slr(Score, log(Distance + 1))
## The least squares regression equation is:
## Score = 6.2 + 0.042 log(Distance + 1)
## R^2 = 0.05%
The residuals vs. fits plot and the normal plot look good, so there is no problem with the assumptions. We find the model
\[ \text{Score} = 6.2 + 0.042 \log(\text{Distance}+1) \]
Find a 99% interval estimate for a student who lives 1.5 miles from the school. Is this an interpolation or an extrapolation?
slr.predict(Score, log(Distance+1),
newx=log(1.5+1), interval="PI", conf.level = 99)
## log(Distance + 1) Fit Lower Upper
## 0.9162907 6.24 0.94 11.54
so a 99% prediction interval for a student who lives 1.5 miles from the school is (0.94, 11.54)
Note newx=log(1.5+1) because we have the predictor log(Distance+1).
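As a quick check of the fitted value, the transformed predictor can be plugged into the rounded model equation by hand (log here is the natural logarithm, as in R):

\[ \log(1.5+1) = \log(2.5) \approx 0.9163, \qquad \widehat{\text{Score}} = 6.2 + 0.042 \times 0.9163 \approx 6.24 \]

This agrees with the Fit column in the output, up to rounding of the coefficients.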
This is an interpolation because 1.5 is in the range of Distances in the data set.
Problem 4 What can you say about the relationship between Score and Age?
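Earlier we saw that observation #220 is an outlier and removed it. We do the same now. Note that the call below lists Age first, so the fitted equation expresses Age in terms of Score; in simple linear regression R^2 is the same in either direction. Then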
slr(Age[-220], Score[-220])
## The least squares regression equation is:
## Age[-220] = 20.132 - 0.033 Score[-220]
## R^2 = 0.43%
Find a 90% interval estimate for the mean score of 21 year old students. Is this an interpolation or an extrapolation?
slr.predict(Score[-220], Age[-220],
newx=21, interval="CI", conf.level = 90)
## Age[-220] Fit Lower Upper
## 21 6.13 5.82 6.44
so a 90% confidence interval for the mean score of 21 year old students is (5.82, 6.44)
This is an interpolation because 21 is in the range of Ages in the data set.
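The difference between interval="CI" (an interval for the mean response) and interval="PI" (an interval for a single new observation) can be illustrated in base R. This is a sketch on simulated data (an assumption); the dropped row index 5 is arbitrary and only mirrors the negative indexing used with observation #220 above:

```r
# Simulated stand-in for Score and Age (assumption, not the survey data)
set.seed(3)
age   <- sample(18:25, 120, replace = TRUE)
score <- 6 + rnorm(120, sd = 2)
dat   <- data.frame(score, age)

# Drop one observation by negative index, as Score[-220] does in the text
dat2 <- dat[-5, ]
fit  <- lm(score ~ age, data = dat2)

new  <- data.frame(age = 21)
# 90% confidence interval for the MEAN score of 21 year olds
ci90 <- predict(fit, newdata = new, interval = "confidence", level = 0.90)
# 90% prediction interval for ONE 21 year old student
pi90 <- predict(fit, newdata = new, interval = "prediction", level = 0.90)
ci90
pi90
```

The prediction interval is always wider than the confidence interval at the same point, because it also accounts for the variation of an individual student around the mean.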