In this page we will practise some of the things we have learned before. You should try to solve each problem on your own before looking at the solutions.
Some years ago an insurance company did a study of its policies and found that 30% of them were for cars. The company now randomly selects 532 of its current policies and finds that 131 of them are car insurance policies.
Test at the 1% level whether the percentage of car insurance policies is now smaller than 30%.
If the true percentage of car insurance policies is 25%, what was the power of this test?
If the true percentage of car insurance policies is 25%, what sample size is needed to have a power of 95%?
An insurance company is interested in the amount of money they pay on average on insurance claims. They randomly select 42 policies and find the payouts:
1400 32400 27400 22400 3500 8300 33600 17000 9600 20500 4900 9900 33100 9600 23200 22400 3600 12400 14900 29100 4500 13500 12100 12700 16600 17000 21700 29200 16200 4000 16500 17000 41000 19000 23900 37300 32100 5200 19700 21600 400 18700
Find a 95% confidence interval for the true mean amount of payouts.
In a certain store the average sale is $48. The store ran an ad in a newspaper and wants to see whether the ad has worked. They randomly select some recent sales and find
52.35 48.34 72.55 71.55 44.68 56.64 52.43 73.98 60.55 60.29
42.33 46.84 59.45 63.04 59.04 33.28 50.67 62.3 54.13 68.65
34.21 58.72 60.99 68.24 52.45 56.88 57.81 56.29 79.69 46.88
48.73 53.77 65.79 73.71 44.55 58.53 51.48 52.95 46.26 50.98
Test at the 5% level whether the ad was a success.
If the ad raised the mean sales to $50, what is the power of the test?
If the ad raised the mean sales to $50, what sample size is needed to have a power of 99%?
According to a web site of the Red Cross 57% of Hispanics have blood type O, 31% have type A, 10% type B and 2% type AB. A sample of 250 people resulted in the following blood types:
O | A | B | AB |
---|---|---|---|
147 | 71 | 28 | 4 |
Test at the 10% level whether the blood types are consistent with these people being Hispanic.
The data set studentsurvey has the replies of students to some questionnaire.
Find a 90% confidence interval for the mean score.
Test at the 10% level whether there are equally many male and female students.
Test at the 5% level whether the mean GPA is less than 2.5.
Test at the 5% level whether the population has equally many freshmen, juniors, seniors and sophomores.
Find a 90% confidence interval for the mean age of the students.
In class we talked about Bernoulli trials, that is experiments which have only two possible outcomes. Often one is interested in how often a certain outcome happens when the experiment is carried out a number of times. This is then called a Binomial distribution, and probabilities can be found with the R command dbinom(k, n, p), where n is the number of trials, k how often the outcome happens and p its probability. For example, if we want to know the probability of 2 sixes in 10 rolls of a fair die, it is
dbinom(2,10,1/6)
## [1] 0.29071
If a group consists of 100 men and 90 women and three people are chosen at random, what is the probability that all of them are men?
If a fair coin is flipped 10 times, what is the probability of at most 3 heads?
If the probability of having an accident on any one mile of road is 0.0001, what is the probability of having at least one accident when driving 10000 miles in one year?
If a fair coin is flipped 100 times, what is the probability of getting between 40 and 60 heads (including 40 and 60)? (Compare that with our discussion of the coin app.)
The mean score in the final exam of a Calculus course over many years was 72.3. The University wants to decide whether or not to change the text book, and so they are planning to teach several sections of the course with a new text book. Then they will test H0: \(\mu=72.3\) vs Ha: \(\mu>72.3\) at the 5% level. If it is true that with this new text book the mean score will go up to 75.5 points, how many students do they need to have so that the hypothesis test has a power of 80%? (assume the standard deviation is 15.0)
At a certain moment in time the national unemployment rate was 6.9%. In one city, 30 of 250 randomly selected people said they were unemployed. Test at the 10% level whether the unemployment rate in this city differs from the nationwide rate.
Some years ago an insurance company did a study of its policies and found that 30% of them were for cars. The company now randomly selects 532 of its current policies and finds that 131 of them are car insurance policies.
Variables: 1 Proportion
Problem: Hypothesis test
one.sample.prop( x = 131, n = 533, pi.null = 0.3, alternative="less")
## p value of test H0: pi=0.3 vs. Ha: pi < 0.3: 0.0036
prop.ps( n = 533, phat = 0.25, pi.null = 0.3, alpha = 0.01, alternative = "less")
## [1] "Power of Test = 63.3%"
prop.ps( power = 95, phat = 0.25, pi.null = 0.3, alpha = 0.01, alternative = "less")
## [1] "Sample size required is 1233"
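As a cross-check, the same three questions can be sketched with base R only. This is not the course routine: it uses `prop.test` plus textbook normal-approximation formulas for power and sample size, and it takes n = 532 from the problem statement. Different approximations (for example, with or without a continuity correction) give somewhat different numbers, so the results need not match the output above exactly.

```r
# Counts from the problem statement: 532 policies sampled, 131 for cars.
x <- 131
n <- 532
p0 <- 0.30     # proportion under H0
p1 <- 0.25     # assumed true proportion for the power questions
alpha <- 0.01

# One-sided test of H0: pi = 0.3 vs Ha: pi < 0.3
# (prop.test applies a continuity correction by default).
p_value <- prop.test(x, n, p = p0, alternative = "less")$p.value

# Normal-approximation power: H0 is rejected when phat falls below the cutoff.
cutoff <- p0 + qnorm(alpha) * sqrt(p0 * (1 - p0) / n)
power <- pnorm((cutoff - p1) / sqrt(p1 * (1 - p1) / n))

# Normal-approximation sample size for 95% power.
n_needed <- ceiling(((qnorm(1 - alpha) * sqrt(p0 * (1 - p0)) +
                      qnorm(0.95) * sqrt(p1 * (1 - p1))) / (p0 - p1))^2)

print(c(p_value = p_value, power = power, n_needed = n_needed))
```

The p value is below 0.01 either way, so the conclusion (reject the null hypothesis) does not depend on which approximation is used.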
Find a 95% confidence interval for the true mean amount of payouts.
Variables: 1 mean
Problem: confidence interval
highlight the data, then in R
one.sample.t(x)
Assumptions are ok (checked boxplot and normal plot)
## A 95% confidence interval for the population mean is (14369.7, 20825.5)
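For readers without the course routines, a minimal sketch of the same interval with base R's `t.test` is given here. The vector name `payouts` is chosen for illustration; the values are the 42 payouts listed in the problem. If the course routine computes a standard t interval, the two results should agree.

```r
# The 42 payouts from the problem statement.
payouts <- c(1400, 32400, 27400, 22400, 3500, 8300, 33600, 17000, 9600, 20500,
             4900, 9900, 33100, 9600, 23200, 22400, 3600, 12400, 14900, 29100,
             4500, 13500, 12100, 12700, 16600, 17000, 21700, 29200, 16200, 4000,
             16500, 17000, 41000, 19000, 23900, 37300, 32100, 5200, 19700, 21600,
             400, 18700)

# Quick look at the assumptions: boxplot and normal quantile plot.
boxplot(payouts)
qqnorm(payouts); qqline(payouts)

# 95% t confidence interval for the population mean.
ci <- t.test(payouts, conf.level = 0.95)$conf.int
print(round(ci, 1))
```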
Variables: 1 mean
Problem: hypothesis test
Assumptions are ok (checked boxplot and normal plot)
one.sample.t( x, mu.null = 48, alternative = "greater")
## p value of test H0: mu=48 vs. Ha: mu > 48: 0.000
t.ps( n = 40, diff = 50-48, sigma = sd(x), alternative = "greater")
## Power of Test = 31.9%
t.ps( power = 99, diff = 50-48, sigma = sd(x), alternative = "greater")
## Sample size required is 433
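The same analysis can be sketched with base R, using `t.test` for the test and `power.t.test` for power and sample size. These are stand-ins for the course routines, not the routines themselves, so small differences in the power and sample size are to be expected. The vector name `sales` is chosen for illustration.

```r
# The 40 recent sales from the problem statement.
sales <- c(52.35, 48.34, 72.55, 71.55, 44.68, 56.64, 52.43, 73.98, 60.55, 60.29,
           42.33, 46.84, 59.45, 63.04, 59.04, 33.28, 50.67, 62.30, 54.13, 68.65,
           34.21, 58.72, 60.99, 68.24, 52.45, 56.88, 57.81, 56.29, 79.69, 46.88,
           48.73, 53.77, 65.79, 73.71, 44.55, 58.53, 51.48, 52.95, 46.26, 50.98)

# H0: mu = 48 vs Ha: mu > 48
res <- t.test(sales, mu = 48, alternative = "greater")

# Power of the one-sided 5% test if the true mean is 50 (n = 40).
pw <- power.t.test(n = length(sales), delta = 50 - 48, sd = sd(sales),
                   sig.level = 0.05, type = "one.sample",
                   alternative = "one.sided")

# Sample size needed for 99% power under the same assumptions.
ss <- power.t.test(power = 0.99, delta = 50 - 48, sd = sd(sales),
                   sig.level = 0.05, type = "one.sample",
                   alternative = "one.sided")

print(res$p.value)
print(pw$power)
print(ceiling(ss$n))
```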
Variable: 1 categorical
Problem: hypothesis test
chi.gof.test(c(147,71,28,4), c(57,31,10,2))
## p value of test p=0.7417
p value = 0.7417 > 0.10, so we fail to reject the null hypothesis; these people might well have been Hispanic.
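The goodness-of-fit test can also be sketched with base R's `chisq.test`, which expects the null proportions as fractions that sum to 1. The names `observed` and `null_share` are chosen for illustration.

```r
# Observed blood type counts and the Red Cross percentages as proportions.
observed   <- c(O = 147, A = 71, B = 28, AB = 4)
null_share <- c(O = 0.57, A = 0.31, B = 0.10, AB = 0.02)

fit <- chisq.test(observed, p = null_share)

# Expected counts under H0 (n times the null proportions).
print(fit$expected)

# Test statistic: sum of (observed - expected)^2 / expected, with 3 degrees of freedom.
print(unname(fit$statistic))
print(fit$p.value)
```

The expected counts are 142.5, 77.5, 25 and 5, so the usual rule of thumb (expected counts of at least 5) is just met.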
The data set studentsurvey has the replies of students to some questionnaire.
attach(studentsurvey)
Variable: 1 mean
Problem: confidence interval
Assumptions are ok (checked boxplot and normal plot)
one.sample.t(Score, conf.level = 90)
## A 90% confidence interval for the population mean is (6.1, 6.5)
Variables: 1 Proportion
Problem: Hypothesis test
table(Gender)
## Gender
## Female Male
## 111 138
one.sample.prop( x = 111 , n = 111+138, pi.null = 0.5)
## p value of test H0: pi=0.5 vs. Ha: pi <> 0.5: 0.0994
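Since only the two counts are needed here, the test can be reproduced without the data set. The sketch below uses base R's `prop.test` (normal approximation with continuity correction) and, for comparison, the exact `binom.test`; neither is claimed to be what the course routine does internally.

```r
# Counts taken from table(Gender) above.
female <- 111
male   <- 138
n      <- female + male

# Two-sided test of H0: pi = 0.5 with continuity correction.
approx_p <- prop.test(female, n, p = 0.5)$p.value

# Exact binomial test of the same hypothesis.
exact_p <- binom.test(female, n, p = 0.5)$p.value

print(c(approx = approx_p, exact = exact_p))
```

At the 10% level the p value is just below 0.10, so the null hypothesis of equally many male and female students is rejected, but only narrowly.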
Variables: 1 mean
Problem: hypothesis test
Assumptions are ok (checked boxplot and normal plot)
1) Parameter of interest: population mean
2) Method of analysis: one sample t
3) Assumptions of Method: normal data or large sample
4) Type I error probability \(\alpha\) = 0.05
5) H0: \(\mu\) = 2.5
6) Ha: \(\mu\) < 2.5
7) p value = 0.000
one.sample.t( GPA, mu.null = 2.5, alternative="less")
## p value of test H0: mu=2.5 vs. Ha: mu < 2.5: 0.000
Variable: 1 categorical
Problem: hypothesis test
chi.gof.test(table(Year), c(1,1,1,1)/4)
## p value of test p=0.0544
p value = 0.0544 > 0.05, so we fail to reject the null hypothesis of equal proportions, but only barely.
bplot(Age)
which(Age==max(Age))
## [1] 220
The boxplot shows one extreme observation, the largest age (observation 220), so we remove it and check the boxplot again:
bplot(Age[-220])
one.sample.t(Age[-220], conf.level = 90)
## A 90% confidence interval for the population mean is (19.8, 20)
In class we talked about Bernoulli trials, that is experiments which have only two possible outcomes. Often one is interested in how often a certain outcome happens when the experiment is carried out a number of times. This is then called a Binomial distribution, and probabilities can be found with the R command dbinom(k, n, p) where n is the number of trials, k how often the outcome happens and p its probability. For example, if we want to know the probability of 2 sixes in 10 rolls of a fair die, it is
dbinom(2, 10, 1/6)
## [1] 0.29071
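To see what `dbinom` computes, the binomial formula P(X = k) = choose(n, k) p^k (1 - p)^(n - k) can be evaluated by hand and compared with it. This is only an illustration of the formula behind the command.

```r
k <- 2      # number of sixes
n <- 10     # number of rolls
p <- 1/6    # probability of a six on one roll

# Binomial probability written out term by term.
by_hand <- choose(n, k) * p^k * (1 - p)^(n - k)

# Should agree with dbinom up to floating point error.
print(by_hand)
print(all.equal(by_hand, dbinom(k, n, p)))
```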
dbinom(3, 3, 100/190)
## [1] 0.1457938
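The binomial calculation above treats the three choices as independent, each with probability 100/190. If the three people are chosen without replacement, the exact model is the hypergeometric distribution, available in base R as `dhyper`. The sketch below is a comparison under that assumption, not part of the original solution.

```r
# Exact probability: 3 men in a sample of 3 drawn without replacement
# from 100 men (m) and 90 women (n).
exact <- dhyper(3, m = 100, n = 90, k = 3)

# The same probability as a product of conditional probabilities.
step_by_step <- (100 / 190) * (99 / 189) * (98 / 188)

# Binomial approximation used above.
approx <- dbinom(3, 3, 100 / 190)

print(c(exact = exact, step_by_step = step_by_step, binomial = approx))
```

Because the group is large relative to the sample, the two answers differ only in the third decimal place.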
At most 3 heads means 0, 1, 2 or 3 heads, so
dbinom(0, 10, 1/2) + dbinom(1, 10, 1/2) + dbinom(2, 10, 1/2) + dbinom(3, 10, 1/2)
## [1] 0.171875
or quicker:
sum(dbinom(0:3, 10, 1/2))
## [1] 0.171875
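Sums of this kind are cumulative probabilities, so base R's `pbinom`, which returns P(X <= k), gives the same answer in one call:

```r
# P(X <= 3) for X ~ Binomial(10, 1/2)
at_most_3 <- pbinom(3, 10, 1/2)
print(at_most_3)

# Agrees with the explicit sum of dbinom values.
print(all.equal(at_most_3, sum(dbinom(0:3, 10, 1/2))))
```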
Prob(at least one accident) = 1-Prob(0 accidents)
1-dbinom(0, 10000, 0.0001)
## [1] 0.632139
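The complement rule can be checked directly, using the same values as the command above (10000 miles, per-mile probability 0.0001). Since n is large and p is small, a Poisson approximation with rate n*p = 1 gives nearly the same answer; that approximation is added here for comparison and is not part of the original solution.

```r
miles <- 10000
p     <- 0.0001   # per-mile accident probability, as in the command above

# P(no accident) = (1 - p)^miles, so P(at least one) is its complement.
direct <- 1 - (1 - p)^miles

# Poisson approximation with rate miles * p.
poisson_approx <- 1 - exp(-miles * p)

print(c(direct = direct, poisson = poisson_approx))
```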
sum(dbinom(40:60, 100, 1/2))
## [1] 0.9647998
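The same probability can be written as a difference of cumulative probabilities, and a normal approximation with continuity correction comes close to it. The sketch below uses base R only; the normal approximation is shown for comparison and is not part of the original solution.

```r
# Exact: P(40 <= X <= 60) = P(X <= 60) - P(X <= 39)
exact <- pbinom(60, 100, 1/2) - pbinom(39, 100, 1/2)

# Normal approximation: mean n*p = 50, sd sqrt(n*p*(1-p)) = 5,
# with a continuity correction of 0.5 on each side.
normal_approx <- pnorm(60.5, mean = 50, sd = 5) - pnorm(39.5, mean = 50, sd = 5)

print(c(exact = exact, normal = normal_approx))
```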
The mean score in the final exam of a Calculus course over many years was 72.3. The University wants to decide whether or not to change the text book, and so they are planning to teach several sections of the course with a new text book. Then they will test H0: \(\mu\)=72.3 vs Ha: \(\mu\)>72.3 at the 5% level. If it is true that with this new text book the mean score will go up to 75.5 points, how many students do they need to have so that the hypothesis test has a power of 80%? (assume the standard deviation is 15.0)
t.ps(diff=75.5-72.3, sigma=15.0, power=80, alternative="greater")
## Sample size required is 138
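A similar calculation is possible with base R's `power.t.test`, which takes the power as a fraction rather than a percentage. This is a stand-in for the course routine, so the result may differ from the output above by a student or two.

```r
# Sample size for a one-sided, one-sample t test at the 5% level,
# true mean 75.5 vs null mean 72.3, standard deviation 15.
ss <- power.t.test(delta = 75.5 - 72.3, sd = 15.0, power = 0.80,
                   sig.level = 0.05, type = "one.sample",
                   alternative = "one.sided")

# Round up, since a fraction of a student cannot be sampled.
n_needed <- ceiling(ss$n)
print(n_needed)
```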
At a certain moment in time the national unemployment rate was 6.9%. In one city, 30 of 250 randomly selected people said they were unemployed. Test at the 10% level whether the unemployment rate in this city differs from the nationwide rate.
one.sample.prop(x=30, n=250, pi.null=0.069)
## p value of test H0: pi=0.069 vs. Ha: pi <> 0.069: 0.0022
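A base R version of this test, again with `prop.test` as a stand-in for the course routine, is sketched here. The p value is well below 0.10, so the null hypothesis is rejected: the unemployment rate in this city (30/250 = 12%) appears to differ from the national rate.

```r
unemployed <- 30
n          <- 250
p_national <- 0.069

# Two-sided test of H0: pi = 0.069 (continuity correction is the default).
res <- prop.test(unemployed, n, p = p_national)

print(unemployed / n)   # sample proportion
print(res$p.value)
```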