Exercise Problems 1

On this page we will practise some of the things we have learned so far. You should try to solve each problem on your own before looking at the solutions.

Problem 1

Some years ago an insurance company did a study of their policies and found that 30% of them were for cars. They randomly select 533 of their current policies and find that 131 of them are car insurance policies.

  1. Test at the 1% level whether the percentage of car insurance policies is now smaller than 30%

  2. If the true percentage of car insurance policies is 25%, what was the power of this test?

  3. If the true percentage of car insurance policies is 25%, what sample size is needed to have a power of 95%?

Problem 2

An insurance company is interested in the amount of money they pay on average on insurance claims. They randomly select 42 policies and find the payouts:

1400 32400 27400 22400 3500 8300 33600 17000 9600 20500 4900 9900 33100 9600 23200 22400 3600 12400 14900 29100 4500 13500 12100 12700 16600 17000 21700 29200 16200 4000 16500 17000 41000 19000 23900 37300 32100 5200 19700 21600 400 18700

Find a 95% confidence interval for the true mean amount of payouts.

Problem 3

In a certain store the average sale is $48. The store ran an ad in a newspaper, and they want to see whether the ad has worked. They randomly select some recent sales and find

52.35 48.34 72.55 71.55 44.68 56.64 52.43 73.98 60.55 60.29
42.33 46.84 59.45 63.04 59.04 33.28 50.67 62.3 54.13 68.65
34.21 58.72 60.99 68.24 52.45 56.88 57.81 56.29 79.69 46.88
48.73 53.77 65.79 73.71 44.55 58.53 51.48 52.95 46.26 50.98

  1. Test at the 5% level whether the ad was a success

  2. If the ad raised the mean sales to $50, what is the power of the test?

  3. If the ad raised the mean sales to $50, what sample size is needed to have a power of 99%?

Problem 4

According to a web site of the Red Cross 57% of Hispanics have blood type O, 31% have type A, 10% type B and 2% type AB. A sample of 250 people resulted in the following blood types:

O A B AB
147 71 28 4

Test at the 10% level whether the blood types suggest that these people were Hispanic.

Problem 5

The data set studentsurvey has the replies of students to some questionnaire.

  1. find a 90% confidence interval for the mean score

  2. test at the 10% level whether there are equally many male and female students.

  3. test at the 5% level whether the mean GPA is less than 2.5

  4. test at the 5% level whether the population has equally many Freshman, Junior, Senior and Sophomore

  5. find a 90% confidence interval for the mean age of the students.

Problem 6

In class we talked about Bernoulli trials, that is experiments which have only two possible outcomes. Often one is interested in how often a certain outcome happens when the experiment is carried out a number of times. This is then called a Binomial distribution, and probabilities can be found with the R command dbinom(k, n, p), where n is the number of trials, k how often the outcome happens and p its probability. For example, if we want to know the probability of 2 sixes in 10 rolls of a fair die, it is

 dbinom(2,10,1/6) 
## [1] 0.29071
  1. if a group consists of 100 men and 90 women and if three people are chosen at random, what is the probability all of them are men?

  2. if a fair coin is flipped 10 times, what is the probability of at most 3 heads?

  3. if the probability of having an accident on any one mile of road is 0.0001, what is the probability of having at least one accident when driving 10000 miles in one year?

  4. if a fair coin is flipped 100 times, what is the probability of getting between 40 and 60 heads (include 40 and 60)? (Compare that to our discussion of the coin app)

Problem 7

The mean score in the final exam of a Calculus course over many years was 72.3. The University wants to decide whether or not to change the text book, and so they are planning to teach several sections of the course with a new text book. Then they will test H0: \(\mu=72.3\) vs Ha: \(\mu>72.3\) at the 5% level. If it is true that with this new text book the mean score will go up to 75.5 points, how many students do they need to have so that the hypothesis test has a power of 80%? (assume the standard deviation is 15.0)

Problem 8

At a certain moment in time the national unemployment rate was 6.9%. In one city among 250 randomly selected people 30 said they were unemployed. Test at the 10% level to check whether in this city the unemployment rate differs from that nationwide.



Solutions

Problem 1

Some years ago an insurance company did a study of their policies and found that 30% of them were for cars. They randomly select 533 of their current policies and find that 131 of them are car insurance policies.

Variables: 1 Proportion

Problem: Hypothesis test

  1. Test at the 1% level whether the percentage of car insurance policies is now smaller than 30%
one.sample.prop( x = 131, n =  533, pi.null = 0.3, alternative="less")
## p value of test H0: pi=0.3 vs. Ha: pi < 0.3:  0.0036
  1. Parameter: proportion \(\pi\)
  2. Method: exact binomial
  3. Assumptions: none
    Assumption is ok
  4. \(\alpha\) = 0.01
  5. H0: \(\pi\) = 0.3
  6. Ha: \(\pi\) < 0.3
  7. p-value = 0.0036
  8. p-value = 0.0036 < \(\alpha\), so we reject the null hypothesis; the true percentage of car insurance policies is statistically significantly smaller than 30%
  2. If the true percentage of car insurance policies is 25%, what was the power of this test?
prop.ps( n = 533, phat = 0.25, pi.null = 0.3, alpha = 0.01, alternative = "less")

## [1] "Power of Test = 63.3%"
  3. If the true percentage of car insurance policies is 25%, what sample size is needed to have a power of 95%?
prop.ps( power = 95, phat = 0.25, pi.null = 0.3, alpha = 0.01, alternative = "less")
## [1] "Sample size required is  1233"
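The routine one.sample.prop belongs to the course's own software. As a cross-check in base R, and assuming that "exact binomial" here means the standard one-sided binomial test, the p-value can be sketched with binom.test or directly with pbinom (n = 533 as in the call above):

```r
# Base R cross-check of the exact binomial test
# (assumption: one.sample.prop performs the same one-sided exact test)
x <- 131     # car insurance policies in the sample
n <- 533     # sample size used in the call above
p0 <- 0.3    # proportion under the null hypothesis

# One-sided exact test of H0: pi = 0.3 vs. Ha: pi < 0.3
test <- binom.test(x, n, p = p0, alternative = "less")
p_exact <- test$p.value

# The same number by hand: P(X <= 131) for X ~ Binomial(533, 0.3)
p_by_hand <- pbinom(x, n, p0)

print(round(c(binom.test = p_exact, pbinom = p_by_hand), 4))
```

For the alternative "less" the p-value is simply the lower tail probability of the binomial distribution, which is why the two numbers agree.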

Problem 2

Find a 95% confidence interval for the true mean amount of payouts.

Variables: 1 mean
Problem: confidence interval

highlight the data, then in R

one.sample.t(x) 

Assumptions are ok (checked boxplot and normal plot)

## A 95% confidence interval for the population mean is (14369.7, 20825.5)
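The interval can also be reproduced without the course routine. The sketch below assumes that one.sample.t computes the usual one-sample t interval; it uses base R's t.test and then repeats the calculation by hand from the formula mean ± t × sd/√n.

```r
# Payouts from the problem statement
x <- c(1400, 32400, 27400, 22400, 3500, 8300, 33600, 17000, 9600, 20500,
       4900, 9900, 33100, 9600, 23200, 22400, 3600, 12400, 14900, 29100,
       4500, 13500, 12100, 12700, 16600, 17000, 21700, 29200, 16200, 4000,
       16500, 17000, 41000, 19000, 23900, 37300, 32100, 5200, 19700, 21600,
       400, 18700)

# 95% t interval from base R
ci <- t.test(x, conf.level = 0.95)$conf.int
print(round(ci, 1))

# The same interval by hand
n <- length(x)
half_width <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
ci_by_hand <- mean(x) + c(-1, 1) * half_width
print(round(ci_by_hand, 1))
```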

Problem 3

Variables: 1 mean
Problem: hypothesis test

  1. Test at the 5% level whether the ad was a success

highlight the data, then in R

Assumptions are ok (checked boxplot and normal plot)

  1. Parameter of interest: population mean
  2. Method of analysis: one sample t
  3. Assumptions of Method: normal data or large sample
  4. Type I error probability \(\alpha\) = 0.05
  5. H0: \(\mu\) = 48
  6. Ha: \(\mu\) > 48
  7. p value = 0.000
 one.sample.t( x, mu.null = 48, alternative = "greater")

## p value of test H0: mu=48 vs. Ha: mu > 48:  0.000
  8. p < \(\alpha\), so we reject the null hypothesis, the ad was a success
  2. If the ad raised the mean sales to $50, what is the power of the test?
 t.ps( n = 40, diff = 50-48, sigma = sd(x), alternative = "greater")
## Power of Test = 31.9%
  3. If the ad raised the mean sales to $50, what sample size is needed to have a power of 99%?
 t.ps( power = 99, diff = 50-48, sigma = sd(x), alternative = "greater")
## Sample size required is  433
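For readers without the course routines, the three parts can be sketched in base R. This assumes that one.sample.t and t.ps correspond to the standard one-sample t test and its power calculation; t.test and power.t.test are base R functions, and the results may differ slightly from the output above if the course routines use a different approximation.

```r
# Sales from the problem statement
x <- c(52.35, 48.34, 72.55, 71.55, 44.68, 56.64, 52.43, 73.98, 60.55, 60.29,
       42.33, 46.84, 59.45, 63.04, 59.04, 33.28, 50.67, 62.30, 54.13, 68.65,
       34.21, 58.72, 60.99, 68.24, 52.45, 56.88, 57.81, 56.29, 79.69, 46.88,
       48.73, 53.77, 65.79, 73.71, 44.55, 58.53, 51.48, 52.95, 46.26, 50.98)

# Part 1: test H0: mu = 48 vs. Ha: mu > 48
test <- t.test(x, mu = 48, alternative = "greater")
print(test$p.value)

# Part 2: power of the test if the true mean is 50
pow <- power.t.test(n = length(x), delta = 50 - 48, sd = sd(x),
                    sig.level = 0.05, type = "one.sample",
                    alternative = "one.sided")
print(pow$power)

# Part 3: sample size needed for a power of 99%
size <- power.t.test(power = 0.99, delta = 50 - 48, sd = sd(x),
                     sig.level = 0.05, type = "one.sample",
                     alternative = "one.sided")
print(ceiling(size$n))
```

Note that power.t.test expects the power as a proportion (0.99), not as a percentage.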

Problem 4

Variable: 1 categorical
Problem: hypothesis test

chi.gof.test(c(147,71,28,4), c(57,31,10,2))
## p value of test p=0.7417

p value = 0.7417 > 0.1, so we fail to reject the null hypothesis; these people might well have been Hispanic.
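The same goodness-of-fit test can be sketched with base R's chisq.test, assuming chi.gof.test is the usual chi-square goodness-of-fit test. Here the null proportions are entered as probabilities that sum to 1, and the expected counts are printed so the usual "expected counts at least 5" rule of thumb can be checked.

```r
# Observed blood type counts and the proportions claimed for Hispanics
observed <- c(O = 147, A = 71, B = 28, AB = 4)
null_prop <- c(O = 0.57, A = 0.31, B = 0.10, AB = 0.02)

# Chi-square goodness-of-fit test
test <- chisq.test(observed, p = null_prop)

# Expected counts under the null hypothesis: 250 * null_prop
print(test$expected)

# Test statistic, degrees of freedom and p-value
print(test$statistic)
print(test$parameter)
print(round(test$p.value, 4))
```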

Problem 5

The data set studentsurvey has the replies of students to some questionnaire.

attach(studentsurvey)
  1. find a 90% confidence interval for the mean score

Variable: 1 mean
Problem: confidence interval

Assumptions are ok (checked boxplot and normal plot)

one.sample.t(Score, conf.level = 90)

## A 90% confidence interval for the population mean is (6.1, 6.5)
  2. test at the 10% level whether there are equally many male and female students.

Variables: 1 Proportion
Problem: Hypothesis test

table(Gender)
## Gender
## Female   Male 
##    111    138
  1. Parameter: proportion \(\pi\)
  2. Method: exact binomial
  3. Assumptions: none
    Assumption is ok
  4. \(\alpha\) = 0.1
  5. H0: \(\pi\) = 0.5
  6. Ha: \(\pi \ne 0.5\)
  7. p-value = 0.0994
one.sample.prop( x = 111 , n = 111+138, pi.null = 0.5)
## p value of test H0: pi=0.5 vs. Ha: pi <> 0.5:  0.0994
  8. p-value = 0.0994 < \(\alpha\), so we reject the null hypothesis
  9. it appears there are slightly fewer female students (but this was a very close call, we would have failed to reject the null at the 5% level!)
  3. test at the 5% level whether the mean GPA is less than 2.5

Variables: 1 mean
Problem: hypothesis test

Assumptions are ok (checked boxplot and normal plot)

  1. Parameter of interest: population mean
  2. Method of analysis: one sample t
  3. Assumptions of Method: normal data or large sample
  4. Type I error probability \(\alpha\) = 0.05
  5. H0: \(\mu\) = 2.5
  6. Ha: \(\mu\) < 2.5
  7. p value = 0.000

one.sample.t( GPA, mu.null = 2.5, alternative="less")

## p value of test H0: mu=2.5 vs. Ha: mu < 2.5:  0.000
  8. p < \(\alpha\), so we reject the null hypothesis
  9. the population mean GPA is almost certainly less than 2.5
  4. test at the 5% level whether the population has equally many Freshman, Junior, Senior and Sophomore

Variable: 1 categorical
Problem: hypothesis test

chi.gof.test(table(Year), c(1,1,1,1)/4)
## p value of test p=0.0544

p value = 0.0544 > 0.05, so we fail to reject the null hypothesis of equal proportions, but only just.

  5. find a 90% confidence interval for the mean age of the students.

The boxplot of Age shows a severe outlier. Further investigation shows this to be observation #220. We should remove this observation from the calculation.
bplot(Age)

which(Age==max(Age))
## [1] 220
bplot(Age[-220])

one.sample.t(Age[-220], conf.level = 90) 

## A 90% confidence interval for the population mean is (19.8, 20)

Problem 6

In class we talked about Bernoulli trials, that is experiments which have only two possible outcomes. Often one is interested in how often a certain outcome happens when the experiment is carried out a number of times. This is then called a Binomial distribution, and probabilities can be found with the R command dbinom(k, n, p) where n is the number of trials, k how often the outcome happens and p its probability. For example, if we want to know the probability of 2 sixes in 10 rolls of a fair die, it is

dbinom(2, 10, 1/6) 
## [1] 0.29071
  1. if a group consists of 100 men and 90 women and if three people are chosen at random, what is the probability all of them are men?
dbinom(3, 3, 100/190) 
## [1] 0.1457938
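The binomial calculation treats the three picks as independent trials with the same probability, which corresponds to choosing with replacement. If the three people are chosen without replacement, the hypergeometric distribution applies instead. The comparison below is a sketch of that alternative reading, not part of the original solution; dhyper is a base R function.

```r
# Binomial: three independent picks, each a man with probability 100/190
p_with_replacement <- dbinom(3, 3, 100 / 190)

# Hypergeometric: 3 men among k = 3 people drawn without replacement
# from a group of m = 100 men and n = 90 women
p_without_replacement <- dhyper(3, m = 100, n = 90, k = 3)

# The same value by hand: the pool of men shrinks with every pick
p_by_hand <- (100 / 190) * (99 / 189) * (98 / 188)

print(c(binomial = p_with_replacement,
        hypergeometric = p_without_replacement,
        by_hand = p_by_hand))
```

With a group of 190 and only three picks the two answers are close, so the binomial is a reasonable approximation here.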
  2. if a fair coin is flipped 10 times, what is the probability of at most 3 heads?

at most 3 means either 0 or 1 or 2 or 3, so

dbinom(0, 10, 1/2) + dbinom(1, 10, 1/2) + dbinom(2, 10, 1/2) + dbinom(3, 10, 1/2) 
## [1] 0.171875

or quicker:

sum(dbinom(0:3, 10, 1/2))
## [1] 0.171875
  3. if the probability of having an accident on any one mile of road is 0.0001, what is the probability of having at least one accident when driving 10000 miles in one year?

Prob(at least one accident) = 1-Prob(0 accidents)

1-dbinom(0, 10000, 0.0001) 
## [1] 0.632139
  4. if a fair coin is flipped 100 times, what is the probability of getting between 40 and 60 heads (include 40 and 60)? (Compare that to our discussion of the coin app)
sum(dbinom(40:60, 100, 1/2)) 
## [1] 0.9647998
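Sums of dbinom values like the ones in parts 2 to 4 are cumulative probabilities, so they can also be written with the base R function pbinom(k, n, p), which returns the probability of at most k successes. A short sketch, using the same numbers as the calls above:

```r
# Part 2: at most 3 heads in 10 flips
p_at_most_3 <- pbinom(3, 10, 1/2)

# Part 3: at least one accident = 1 - P(no accident),
# with the per-mile probability used in the call above
p_accident <- 1 - pbinom(0, 10000, 0.0001)

# Part 4: between 40 and 60 heads = P(X <= 60) - P(X <= 39)
p_40_to_60 <- pbinom(60, 100, 1/2) - pbinom(39, 100, 1/2)

print(c(p_at_most_3, p_accident, p_40_to_60))
```

Note the 39 in part 4: subtracting P(X <= 40) would wrongly exclude the outcome of exactly 40 heads.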

Problem 7

The mean score in the final exam of a Calculus course over many years was 72.3. The University wants to decide whether or not to change the text book, and so they are planning to teach several sections of the course with a new text book. Then they will test H0: \(\mu\)=72.3 vs Ha: \(\mu\)>72.3 at the 5% level. If it is true that with this new text book the mean score will go up to 75.5 points, how many students do they need to have so that the hypothesis test has a power of 80%? (assume the standard deviation is 15.0)

t.ps(diff=75.5-72.3, sigma=15.0, power=80, alternative="greater")
## Sample size required is  138
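To see where a number of this size comes from, the sketch below first uses the normal approximation n ≈ ((z_α + z_β) σ / δ)² and then base R's power.t.test, which accounts for the extra uncertainty of the t test. It assumes t.ps performs the standard one-sample t power calculation; the variable names are illustrative only.

```r
delta <- 75.5 - 72.3   # difference the test should detect
sigma <- 15.0          # assumed standard deviation
alpha <- 0.05          # one-sided significance level
power <- 0.80          # desired power

# Normal approximation: n = ((z_alpha + z_beta) * sigma / delta)^2
n_normal <- ((qnorm(1 - alpha) + qnorm(power)) * sigma / delta)^2
print(ceiling(n_normal))

# Calculation based on the t distribution (slightly larger than n_normal)
size <- power.t.test(delta = delta, sd = sigma, power = power,
                     sig.level = alpha, type = "one.sample",
                     alternative = "one.sided")
print(ceiling(size$n))
```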

Problem 8

At a certain moment in time the national unemployment rate was 6.9%. In one city among 250 randomly selected people 30 said they were unemployed. Test at the 10% level to check whether in this city the unemployment rate differs from that nationwide.

  1. Parameter: proportion \(\pi\)
  2. Method: exact binomial
  3. Assumptions: none
    Assumption is ok
  4. \(\alpha\) = 0.1
  5. H0: \(\pi\) = 0.069
  6. Ha: \(\pi \ne 0.069\)
  7. p-value=0.00223
one.sample.prop(x=30, n=250, pi.null=0.069)
## p value of test H0: pi=0.069 vs. Ha: pi <> 0.069:  0.0022
  8. p-value = 0.00223 < \(\alpha\), so we reject the null hypothesis
  9. the true unemployment rate in this city is not 6.9%
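A two-sided test at the 10% level goes hand in hand with a 90% confidence interval: the null value is rejected when it lies outside the interval. As a sketch in base R, assuming one.sample.prop corresponds to the exact binomial test, binom.test reports both the p-value and the exact (Clopper-Pearson) interval:

```r
# Two-sided exact binomial test of H0: pi = 0.069
test <- binom.test(30, 250, p = 0.069, conf.level = 0.90)
print(test$p.value)

# 90% confidence interval for the city's unemployment rate
ci <- test$conf.int
print(round(ci, 3))

# Does the interval contain the national rate?
contains_national <- ci[1] <= 0.069 && 0.069 <= ci[2]
print(contains_national)
```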