Named after Carl Friedrich Gauss.
This (for good reasons we will see shortly) is the most important distribution of them all! First, it is already familiar to you because it results in data with bell-shaped histograms:
A normal distribution has two parameters, denoted by \(\mu\) and \(\sigma\).
What is the meaning (interpretation) of the parameters? It is of course that \(\mu\) is the population mean and \(\sigma\) is the population standard deviation.
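To see this interpretation in action, here is a small sketch in base R. The values of `mu`, `sigma`, the seed and the sample size are arbitrary choices made for illustration. With a large sample, the sample mean and sample standard deviation should land close to \(\mu\) and \(\sigma\).

```r
# Illustrative values only: mu, sigma, the seed and n are arbitrary choices
set.seed(1)
mu <- 10
sigma <- 3
n <- 100000

# Draw n observations from a normal distribution with mean mu and sd sigma
x <- rnorm(n, mean = mu, sd = sigma)

# With a large sample these should be close to mu and sigma
sample_mean <- mean(x)
sample_sd <- sd(x)
print(round(c(mean = sample_mean, sd = sample_sd), 2))
```

Running `hist(x, breaks = 50)` afterwards should show the familiar bell shape, centered near `mu`.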
In the next picture we have 4 examples of normal distributions with different means and standard deviations, drawn on the same scale:
run.app(normal)
This app draws the histogram of data from a normal distribution with different means and standard deviations.
Why is the normal distribution so important? The reason is the Central Limit Theorem, which states that under some very general conditions the sample mean has (approximately) a normal distribution, no matter what the distribution of the observations.
As an illustration let’s do the following. We start by getting some data that is very much NOT normally distributed. I have a routine to do that called clt.illustration. To see what the data looks like run
clt.illustration(1)
## x
## [1] 8.1
## [1] 29.8
## [1] 26.3
## [1] 28.1
## [1] 33.4
## ...
## [1] 14.6
## [1] 7.1
## [1] 33.4
## [1] 29.1
## [1] 25
This clearly is not a bell-shaped histogram! Now let’s do the following: generate pairs of numbers x1, x2 and find their mean with (x1+x2)/2. We can do this with
clt.illustration(2)
## (x1 + x2)/2 = xbar
## (28.2 + 14.3 )/ 2 = 21.25
## (33.6 + 4.5 )/ 2 = 19.05
## (24.6 + 13.5 )/ 2 = 19.05
## (31.8 + 7.3 )/ 2 = 19.55
## (30.7 + 11.6 )/ 2 = 21.15
## ...
## (11.5 + 9.3 )/ 2 = 10.4
## (31.9 + 7.4 )/ 2 = 19.65
## (12.5 + 34.3 )/ 2 = 23.4
## (11.9 + 33.8 )/ 2 = 22.85
## (6 + 32.7 )/ 2 = 19.35
Still not much of a bell-shaped histogram. But if we keep adding numbers to the mean we quickly get there. Here is what it looks like for 3 and for 10:
clt.illustration(3)
## (x1 + x2 + x3)/3 = xbar
## (27.8 + 30 + 9.7 )/ 3 = 22.5
## (14.1 + 32 + 34.8 )/ 3 = 26.96667
## (6.5 + 30.7 + 8.8 )/ 3 = 15.33333
## (7.9 + 13.7 + 13.7 )/ 3 = 11.76667
## (27.8 + 31.7 + 26.3 )/ 3 = 28.6
## ...
## (11.4 + 30 + 10.6 )/ 3 = 17.33333
## (11.2 + 10 + 31.4 )/ 3 = 17.53333
## (14.5 + 29.2 + 9.2 )/ 3 = 17.63333
## (30.8 + 33.8 + 31.1 )/ 3 = 31.9
## (30.3 + 31.8 + 5.4 )/ 3 = 22.5
clt.illustration(10)
## (x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10)/10 = xbar
## (11.1 + 7.2 + 12.5 + 11 + 28.1 + 10.7 + 30.2 + 6 + 11.6 + 32.1 )/ 10 = 16.05
## (6.1 + 25.9 + 31.9 + 7.8 + 9.3 + 30.2 + 7.6 + 23.8 + 30 + 4.4 )/ 10 = 17.7
## (16 + 32.2 + 10.4 + 33.9 + 28.3 + 30.8 + 10.4 + 29.2 + 31.3 + 23.9 )/ 10 = 24.64
## (10.4 + 29.3 + 26.8 + 9.2 + 31.4 + 11.1 + 32.4 + 12 + 12.2 + 12.9 )/ 10 = 18.77
## (33.5 + 9.6 + 7.9 + 5.7 + 29 + 11.2 + 33.5 + 30.4 + 8.5 + 7.8 )/ 10 = 17.71
## ...
## (29.6 + 30.7 + 9 + 6.7 + 10.9 + 15.5 + 11.9 + 10.4 + 34.1 + 29.7 )/ 10 = 18.85
## (8.5 + 30.5 + 13.1 + 28.7 + 13 + 31.9 + 30.1 + 6.9 + 24.3 + 8.3 )/ 10 = 19.53
## (31.8 + 35.8 + 32.3 + 32.1 + 6.4 + 31.5 + 24.4 + 31 + 31.6 + 7.8 )/ 10 = 26.47
## (28 + 33.9 + 10.4 + 32.5 + 29 + 9.4 + 8.6 + 16.1 + 13.2 + 25.5 )/ 10 = 20.66
## (29.4 + 17.2 + 11.5 + 29.5 + 10.2 + 7.5 + 9.3 + 30.2 + 8.6 + 11.1 )/ 10 = 16.45
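The same experiment can be sketched in a few lines of base R. This is not the clt.illustration routine used above; the two-humped population and the helper names `draw_population` and `sample_means` are made up for this example. It shows the sample means staying centered at the population mean while their spread shrinks roughly like \(1/\sqrt{k}\), where \(k\) is the number of observations averaged.

```r
# Sketch only: this is NOT the clt.illustration routine.
# The bimodal population is a made-up stand-in for "very non-normal" data.
set.seed(2)

# Hypothetical helper: draw n values from a two-humped population
draw_population <- function(n) {
  low <- runif(n, min = 5, max = 15)    # lower hump
  high <- runif(n, min = 25, max = 35)  # upper hump
  ifelse(runif(n) < 0.5, low, high)     # pick one of the two at random
}

# Hypothetical helper: B sample means, each based on k observations
sample_means <- function(k, B = 10000) {
  draws <- matrix(draw_population(k * B), nrow = B, ncol = k)
  rowMeans(draws)
}

xbar1 <- sample_means(1)    # the raw data
xbar10 <- sample_means(10)  # means of 10 observations
xbar50 <- sample_means(50)  # means of 50 observations

# The means stay centered at the population mean (20 for this population),
# while their spread shrinks roughly like 1/sqrt(k)
print(round(c(mean(xbar1), mean(xbar10), mean(xbar50)), 2))
print(round(c(sd(xbar1), sd(xbar10), sd(xbar50)), 2))
```

Comparing `hist(xbar1)` with `hist(xbar10)` should show the two humps giving way to a single bell shape.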
There is a very nice way to illustrate the workings of the central limit theorem, called a Galton board (see the video).
This app does various illustrations of the central limit theorem:
run.app(clt)
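A Galton board is also easy to imitate on the computer. The sketch below assumes an idealized board where each ball bounces left or right with equal probability at every peg; the number of rows and balls are arbitrary choices. The final bin of a ball is the sum of many small independent bounces, which is exactly the situation the central limit theorem describes.

```r
# Sketch of an idealized Galton board; n_rows and n_balls are arbitrary choices
set.seed(3)
n_rows <- 20     # rows of pegs each ball passes
n_balls <- 5000  # balls dropped

# At each peg a ball bounces left (0) or right (1) with equal probability;
# its final bin is the number of right bounces
bounces <- matrix(sample(0:1, n_rows * n_balls, replace = TRUE),
                  nrow = n_balls, ncol = n_rows)
bin <- rowSums(bounces)

# Number of balls per bin: the counts pile up in the middle bins
counts <- table(factor(bin, levels = 0:n_rows))
print(counts)

# For comparison, theory gives mean n_rows/2 and sd sqrt(n_rows)/2
print(round(c(mean = mean(bin), sd = sd(bin)), 2))
```

Plotting `barplot(counts)` should give the bell-shaped pile of balls seen on a physical board.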
In real life almost any measuring device makes some errors. Some instruments are lousy and make big ones, other instruments are excellent and make small ones.
Case 1: You want to measure the length of time a certain streetlight is red. You ask 10 friends to go with you and everyone makes a guess.
Case 2: You want to measure the length of time a certain streetlight is red. You ask 10 friends to go with you. You have a stopwatch that you give to each friend.
Clearly in the second case we expect to get much smaller errors.
Around 1800 Gauss was thinking about what one could say in great generality about such measurement errors. He came up with the following rules that (almost) all measurement errors should follow, no matter what the instrument:
Small errors are more likely than large errors.
An error of \(\epsilon\) is just as likely as an error of \(-\epsilon\).
In the presence of several measurements of the same quantity, the most likely value of the quantity being measured is their average.
Now it is quite astonishing that JUST FROM THESE THREE rules he was able to derive the normal curve.
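We can check the third rule numerically in the reverse direction: if the errors follow a normal curve, then the value that makes a set of measurements most likely is their average. The sketch below uses made-up measurements and an assumed error standard deviation of 1 (its value does not change where the maximum is). It is a numerical check, not Gauss's derivation.

```r
# Made-up measurements of the same quantity (illustrative values only)
x <- c(29.8, 30.4, 31.1, 29.5, 30.7, 30.2)
sigma <- 1  # assumed error sd; its value does not change the maximizer

# Log-likelihood of a candidate true value mu under normal errors
loglik <- function(mu) sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))

# Numerically find the mu that makes the data most likely
fit <- optimize(loglik, interval = range(x), maximum = TRUE)

# The two numbers should agree up to the optimizer's tolerance
print(c(most_likely = fit$maximum, average = mean(x)))
```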
For the math people, the mathematical function is
\[ \frac1{\sqrt{2\pi \sigma^2}} e^{ -(x-\mu)^2/(2\sigma^2) } \]
Notice that it contains two very famous mathematical constants, \(\pi\) and \(e\).
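As a quick sanity check, the formula can be typed into R directly and compared with the built-in `dnorm`. The function name `normal_density` and the values of `mu`, `sigma` and `eps` are arbitrary choices for this sketch. It also checks that the total area under the curve is 1 and that the curve is symmetric around \(\mu\), as the second of Gauss's rules requires.

```r
# The density formula, written out directly (normal_density is our own name)
normal_density <- function(x, mu, sigma) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

mu <- 2       # arbitrary illustrative values
sigma <- 1.5
x <- seq(-5, 9, by = 0.5)

# Largest difference from R's built-in dnorm; should be essentially zero
max_diff <- max(abs(normal_density(x, mu, sigma) - dnorm(x, mu, sigma)))
print(max_diff)

# Total area under the curve; should be essentially 1
area <- integrate(normal_density, -Inf, Inf, mu = mu, sigma = sigma)$value
print(area)

# Symmetry: an error of eps is as likely as an error of -eps
eps <- 0.7
symmetric <- isTRUE(all.equal(normal_density(mu + eps, mu, sigma),
                              normal_density(mu - eps, mu, sigma)))
print(symmetric)
```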