As you will see shortly, many of the methods for statistical inference we discuss here (and that are widely used in practice) require the data to come from a normal distribution. How do we check that?
The obvious approach would seem to be to draw a histogram and see whether it is bell-shaped. This, however, does not work well. First, a decent histogram needs a lot of data (at least a few hundred observations), and second, histograms can be quite hard to read.
We can check the assumption of normality using a boxplot. Here are some boxplots for data from a normal distribution:
Here are some features of boxplots for normal data:
There are very few “outliers”, and those are close to the boxplot.
The lower fence, the box and the upper fence are all about the same size.
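As a rough illustration, boxplots like these can be reproduced with simulated data in base R. The sample size of 100, the seed and the variable names below are arbitrary choices made for this sketch; `boxplot.stats` is used only to count the flagged points and to measure the three parts of the plot.

```r
# Simulate a sample from a standard normal distribution
# (n = 100 and the seed are arbitrary choices for illustration)
set.seed(1)
x <- rnorm(100)

# Draw a horizontal boxplot of the sample
boxplot(x, horizontal = TRUE, main = "Simulated normal data")

# boxplot.stats() returns the five numbers behind the plot in $stats
# (lower whisker end, lower hinge, median, upper hinge, upper whisker end)
# and the points drawn individually as outliers in $out
s <- boxplot.stats(x)
n_outliers <- length(s$out)

# Lengths of the lower whisker, the box and the upper whisker
sizes <- c(lower = s$stats[2] - s$stats[1],
           box   = s$stats[4] - s$stats[2],
           upper = s$stats[5] - s$stats[4])

n_outliers
round(sizes, 2)
```

Rerunning the sketch with different seeds gives a feel for how much these numbers vary from one normal sample to the next.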
Here are some examples of non-normal data:
or this one
The normal probability plot is a graph specifically designed to check for normality. If the data come from a normal distribution, the points should fall roughly along a straight line. Again, let’s start with some examples of normal data:
and some examples of non-normal data:
or this one
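Plots of this kind can be sketched in base R with `qqnorm` and `qqline`. The three simulated samples below (normal, right-skewed exponential, and heavy-tailed t with 2 degrees of freedom) are assumptions chosen for illustration, not the data behind the graphs above.

```r
# Three simulated samples; sizes, seed and distributions are illustrative choices
set.seed(2)
samples <- list(
  normal = rnorm(200),        # should follow the reference line closely
  skewed = rexp(200),         # right-skewed: points curve away from the line
  heavy  = rt(200, df = 2)    # symmetric with heavy tails: both ends bend away
)

# Draw the three normal probability plots side by side
op <- par(mfrow = c(1, 3))
for (nm in names(samples)) {
  qqnorm(samples[[nm]], main = nm)   # sample quantiles vs. normal quantiles
  qqline(samples[[nm]])              # reference line through the quartiles
}
par(op)                              # restore the previous plot layout
```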
The data are the weights of 2000 1-euro coins, 250 each in eight “rolls”. The data were collected by Herman Callaert at Hasselt University in Belgium. The euro coins were “borrowed” at a local bank. Two assistants, Sofie Bogaerts and Saskia Litiere, weighed the coins one by one under laboratory conditions on a Sartorius BP 310s weighing scale.
head(euros)
## Weight Roll
## 1 7.512 1
## 2 7.502 1
## 3 7.461 1
## 4 7.562 1
## 5 7.528 1
## 6 7.459 1
The manufacturing process of the coins might suggest that the weights have a normal distribution. Is this true?
attach(euros)
bplot(Weight)
nplot(Weight)
Both the boxplot and the normal probability plot indicate that the data do not come from a normal distribution but from a symmetric distribution with heavier tails, that is, with outliers on both sides.
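`bplot` and `nplot` appear to be helper functions supplied with the course material rather than part of base R. For readers without them, a roughly equivalent pair of plots can be sketched with `boxplot`, `qqnorm` and `qqline`. The function name `check_normality` is hypothetical, and the heavy-tailed sample is simulated stand-in data, not the coin weights.

```r
# Hypothetical helper: boxplot and normal probability plot side by side
check_normality <- function(x, label = "x") {
  op <- par(mfrow = c(1, 2))
  on.exit(par(op))                      # restore the plot layout on exit
  boxplot(x, horizontal = TRUE, main = paste("Boxplot of", label))
  qqnorm(x, main = paste("Normal plot of", label))
  qqline(x)                             # reference line through the quartiles
  invisible(NULL)
}

# Stand-in data (NOT the real coin weights): symmetric with heavy tails
set.seed(3)
heavy <- rt(2000, df = 4)
check_normality(heavy, "simulated heavy-tailed data")
```

With the actual data loaded, the call would be `check_normality(euros$Weight, "Weight")`, which also avoids the need for `attach`.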
attach(mothers)
bplot(Length)
nplot(Length)
Both graphs indicate that the data come from a normal distribution.