Problem 1

We have data \(X_1,\dots,X_n\sim N(\mu_x,\sigma)\) and \(Y_1,\dots,Y_m\sim N(\mu_y,\sigma)\), with \(\sigma\) unknown. (This is the so-called two-sample problem.) Say we want to test

\[H_0:\mu_x=\mu_y\text{ vs. }H_a:\mu_x\ne\mu_y\]

The classical test is based on the test statistic

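\[T=\frac{\bar{x}-\bar{y}}{s_p\sqrt{1/n+1/m}}\] where \(s_p^2=\frac{(n-1)s_x^2+(m-1)s_y^2}{n+m-2}\) is called the pooled variance (so \(s_p\) is the pooled standard deviation). Under the null hypothesis \(T\sim t(n+m-2)\), and the test rejects the null hypothesis if \(|T|>qt_{\alpha/2,n+m-2}\), the upper \(\alpha/2\) critical value of the \(t\) distribution with \(n+m-2\) degrees of freedom.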
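As a concrete illustration of the formula for \(T\), here is a minimal Python sketch that uses only the standard library. The function name `pooled_t` and the two samples are made up for illustration and are not part of the problem; the critical value is not computed, since that needs a \(t\) quantile function from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Return (T, df) for the equal-variance two-sample t statistic."""
    n, m = len(x), len(y)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    # Difference of sample means divided by its estimated standard error.
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2


# Hypothetical samples, chosen only to exercise the formula.
x = [5.1, 4.9, 5.6, 5.8, 5.0]
y = [4.2, 4.8, 4.4, 4.1]

t_stat, df = pooled_t(x, y)
print(f"T = {t_stat:.3f} on {df} degrees of freedom")
```

To finish the test, \(|T|\) would be compared with the \(t\) critical value on `df` degrees of freedom.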

Derive the two-sample problem as a special case of ANOVA.

Problem 2

In an experiment, an industrial engineer studied the effect of the type of coating and its thickness on the durability of a certain type of paint. There were three types of coating, labeled 1, 2 and 3, and three thickness levels, labeled thin, medium and thick. The durability was measured in days. The data are:

Coating   thin                      medium                    thick
1         166, 154, 155, 156, 149   167, 171, 166, 165, 185   181, 185, 178, 178, 174
2         219, 241, 216, 220, 220   263, 241, 246, 245, 224   242, 258, 257, 242, 250
3         277, 276, 277, 278, 280   309, 281, 309, 302, 314   350, 348, 359, 340, 342

Analyze these data to see what effect(s), if any, the type of coating and the thickness have on the durability. Specifically, which factor-level combination(s) is/are statistically significantly the best?
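One possible starting point, sketched in Python with only the standard library, is to enter the data, compute the nine cell means, and form the sums of squares of a balanced two-way layout. The choice of a two-way model with interaction is an assumption here, and all variable names are illustrative. The sketch stops at the F ratios: p-values and any multiple-comparison procedure for ranking the combinations are left to a statistics package.

```python
from statistics import mean

# Durability in days, keyed by (coating, thickness); values copied from the data table.
data = {
    (1, "thin"): [166, 154, 155, 156, 149],
    (1, "medium"): [167, 171, 166, 165, 185],
    (1, "thick"): [181, 185, 178, 178, 174],
    (2, "thin"): [219, 241, 216, 220, 220],
    (2, "medium"): [263, 241, 246, 245, 224],
    (2, "thick"): [242, 258, 257, 242, 250],
    (3, "thin"): [277, 276, 277, 278, 280],
    (3, "medium"): [309, 281, 309, 302, 314],
    (3, "thick"): [350, 348, 359, 340, 342],
}

coatings = [1, 2, 3]
thicknesses = ["thin", "medium", "thick"]
a, b, r = len(coatings), len(thicknesses), 5  # factor levels and replicates per cell

# Cell, marginal and grand means (the design is balanced, so marginal means
# are plain averages of the cell means).
cell_mean = {key: mean(obs) for key, obs in data.items()}
grand = mean(v for obs in data.values() for v in obs)
coat_mean = {c: mean(cell_mean[c, t] for t in thicknesses) for c in coatings}
thick_mean = {t: mean(cell_mean[c, t] for c in coatings) for t in thicknesses}

# Sums of squares, assuming a two-way model with interaction.
ss_coat = b * r * sum((coat_mean[c] - grand) ** 2 for c in coatings)
ss_thick = a * r * sum((thick_mean[t] - grand) ** 2 for t in thicknesses)
ss_inter = r * sum(
    (cell_mean[c, t] - coat_mean[c] - thick_mean[t] + grand) ** 2
    for c in coatings
    for t in thicknesses
)
ss_error = sum((v - cell_mean[key]) ** 2 for key, obs in data.items() for v in obs)
ss_total = sum((v - grand) ** 2 for obs in data.values() for v in obs)

# F ratios: each mean square divided by the error mean square.
ms_error = ss_error / (a * b * (r - 1))
f_coat = (ss_coat / (a - 1)) / ms_error
f_thick = (ss_thick / (b - 1)) / ms_error
f_inter = (ss_inter / ((a - 1) * (b - 1))) / ms_error

for (c, t), m in cell_mean.items():
    print(f"coating {c}, {t:<6}: mean durability {m:.1f} days")
print(f"F(coating) = {f_coat:.1f}, F(thickness) = {f_thick:.1f}, "
      f"F(interaction) = {f_inter:.1f}")
```

The F ratios would then be compared with the appropriate F distributions, on \((a-1)\), \((b-1)\) and \((a-1)(b-1)\) numerator degrees of freedom and \(ab(r-1)\) denominator degrees of freedom, before interpreting the cell means.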