Using the Computer and R

This page contains some basic information on how to use the computers and the R program.

To log on to computers in Ch115:

Username: .\esma (important: do not forget to include “.\” before the word esma)

Password: Mate1234 (important: uppercase letter “M”)

The class webpages are at http://academic.uprm.edu/wrolke/esmaXXXX, where XXXX is the course number (3101, 3102, 6661, etc.).

At the end of each session, log off.

General Info

You can get a free version of R for your computer from a number of sources. The download is about 70MB and setup is fully automatic. Here are some links:

Windows

MacOS

After the installation is finished, close R (if it is open). From now on ALWAYS open R by clicking on the link to the RData file at the top of the homepage. You can also download and save that file to your own computer and start R from there. The first time you do this, the program will download a number of additional files; just let it. A window might also pop up asking whether to save something; if so, click on Yes.

This step might take a few minutes; just wait until the > sign appears.

FOR MAC OS USERS ONLY: a few things are different on MacOS than on Windows. Here are two of them:

1) Download XQuartz (XQuartz-2.7.11.dmg). Open XQuartz, type the letter R (to make XQuartz run R) and hit enter. Then open R and run the command .First(). After that, every command should work correctly.

2) If there are any errors, type

myinfo$documenttype <-  "none" 

RStudio

There is a program called RStudio that many people like to use to run R. You can download it at https://www.rstudio.com/. Before you can use RStudio with Resma3, you need to run Resma3 JUST ONCE from R itself. So do this:

  1. follow ALL the instructions above

  2. only if everything is running correctly, install RStudio

Troubleshooting

If you try to run a command and get the error could not find function “ggplot” (or grid or shiny), first try this: run the command

ls() 

You should see a listing of many things (over 200). If you do not, Resma3 did not load correctly. Close R and restart it by clicking on the link to Resma3.RData on the homepage.

If you do see the listing, type

.First() 

(note the . in front and the capital F).
A number of things will happen; just wait until you see the > again and check whether that fixes the problem.

If this does not work, close R and restart it with a new version of Resma3.RData from the top of the class homepage.

If this also does not work, send me an email explaining the problem. Best of all, include a screenshot. Here is how:

Windows
MacOS
You can also just use your cell phone to take a picture of the screen, but make sure it is readable!
I often get an email saying that something is not working, and my answer is simply:
RGDM
This means: Read the God-Damn Manual! That is, the answer to your problem is somewhere on these pages, and you should have found it there before sending an email!

Throughout this class when you see something in a gray box like this:

text 
it is something you should type (or copy-paste) into R.

Computers in Monzon:

Until the R version is updated, copy-paste the following line into R:

  one.time.setup()

To see whether everything is installed correctly copy-paste the following line into R and hit enter:

 hplot(rnorm(1000))

You should see a graph like this (called a histogram)

For a much more extensive introduction to R go here

Once you have started a session, the first thing you see is some text and then the > sign. This is the R prompt; it means R is waiting for you to do something. Sometimes the prompt changes to a different symbol, as we will see.

Let’s start with

  ls() 

This shows you a “listing” of the objects (data, routines, etc.) in your workspace. If you have worked for a while, you might have things you need to save; do that by clicking on

File > Save Workspace

If you quit the program without saving, everything you did will be lost. R has a somewhat unusual file system: everything belonging to the same project (data, routines, graphs, etc.) is stored in just one file with the extension .RData.
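The menu item File > Save Workspace has a command-line counterpart, save.image(), which saves every object in the workspace. The sketch below uses the closely related save() and names only the objects it creates, so you can try it without touching the rest of your workspace. The file name comes from tempfile(), an assumption made here so that no real project file is overwritten.

```r
# Two small objects to store
x <- c(10, 2, 6, 9)
msg <- "hello"

# Write both objects into one .RData file
f <- tempfile(fileext = ".RData")
save(x, msg, file = f)

# Delete them from the workspace, then read them back from the file
rm(x, msg)
load(f)
x
## [1] 10  2  6  9
```

load() puts the objects back under their original names.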

To quit R, type

 q() 

or click the x in the upper right corner.

R has a nice recall feature, using the up and down arrow keys. Also, typing

history() 

shows you the most recent things entered.

R is case-sensitive, so a and A are two different things.
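A tiny illustration (the names a and A are just examples): the two assignments below create two separate objects. For the same reason, typing Ls() instead of ls() gives an error, because R finds no function called Ls.

```r
a <- 1
A <- 2    # A is a different object from a
c(a, A)
## [1] 1 2
```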

Often during a session you create objects that you need only for a short time. When you no longer need them use rm to get rid of them:

x <- 10
x^2
## [1] 100
rm(x) 

The <- is the assignment operator in R: it assigns what is on the right to the name on the left.

Data Entry

For a few numbers the easiest thing is to just type them in:

x <-  c(10, 2, 6, 9)
x
## [1] 10  2  6  9

c() is a function that takes the objects inside the () and combines them into one single object (a vector).

Sometimes the data is listed on a webpage and we need to transfer it to R. Here are some examples on how to do this quickly:

  1. a single vector:

101.6 115.0 100.9 103.8 77.6 102.6 99.6 108.5 100.8 92.5 101.8 81.6 103.7 94.9 103.3 86.7 101.6 106.6 101.5 96.9

Highlight the data with the mouse, copy it, go to R and type
x <- scan("clipboard")
x

If you want to copy-paste the command, you first need to do this:

If the data are not numbers, you also need the what argument:

F F M M F F F M M M F

 x <- scan("clipboard", what = "char") 

Sometimes the values are separated by some symbol, for example a comma. In that case you can use the sep argument:

1.5, 2.3, 5.3, 2.4, 7.9, 8.1, 2.7, 4.2

 x  <-  scan("clipboard", sep = ",")
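The clipboard is not available everywhere (for example when R runs a script), and reading from "clipboard" does not work the same way on every system. For practice, scan() can also read from a character string through its text argument. The sketch below repeats the three cases above that way, with quiet = TRUE to suppress the "Read ... items" message.

```r
# Numbers separated by blanks (the default)
x <- scan(text = "101.6 115.0 100.9 103.8 77.6", quiet = TRUE)
length(x)
## [1] 5

# Values that are not numbers need what = "char"
g <- scan(text = "F F M M F F F M M M F", what = "char", quiet = TRUE)
table(g)

# Values separated by commas need sep = ","
z <- scan(text = "1.5, 2.3, 5.3, 2.4, 7.9, 8.1, 2.7, 4.2", sep = ",", quiet = TRUE)
z
## [1] 1.5 2.3 5.3 2.4 7.9 8.1 2.7 4.2
```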
  2. data in groups. Say we have data on the time in seconds it took people to carry out a task. We also have information on their age:
    Old 101.6 115 100.9 103.8 77.6 102.6 99.6 108.5 100.8 92.5 101.8 81.6 103.7 94.9 103.3
    Young 64.8 54.4 44 47.5 49.5 70.7 36.1 48.5 49.5 59.7 32.9 39.4 42.2 26.6 54.3

For this data we likely need one vector with all the numbers and a second vector with the “Old” and “Young” labels. First get the numbers as before in two steps, then combine them into one vector. Finally, create a vector of “Old” and “Young” labels:

x <- scan("clipboard") 
y <- scan("clipboard") 
Time <- c(x,y)
Age <- c(rep("Old", length(x)), rep("Young", length(y)))
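Here is the same construction with the numbers typed in, so it runs without the clipboard. Putting Time and Age side by side in a data frame (the name tasks is made up for this sketch) keeps each time next to its label, and tapply() then computes the mean time for each group.

```r
# The two groups, typed in instead of copied from the clipboard
x <- c(101.6, 115, 100.9, 103.8, 77.6, 102.6, 99.6, 108.5, 100.8, 92.5,
       101.8, 81.6, 103.7, 94.9, 103.3)                 # Old
y <- c(64.8, 54.4, 44, 47.5, 49.5, 70.7, 36.1, 48.5, 49.5, 59.7,
       32.9, 39.4, 42.2, 26.6, 54.3)                    # Young

Time <- c(x, y)
Age <- c(rep("Old", length(x)), rep("Young", length(y)))

# One row per person: the time and the age group
tasks <- data.frame(Time, Age)

# Mean time for each age group
tapply(tasks$Time, tasks$Age, mean)
```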
  3. tables

We have data on the age and the position of people. For example, there were 10 old people in the first position, and so on:

Age First Second Third
Old 10 16 21
Young 15 12 26

To get this into R, use the routine idataio.

It can be used to enter the values directly from the keyboard, to paste a table that was copied to the clipboard, or to read one from a file such as an Excel worksheet.

Say we want to get the table above into R. Here are three ways to do this using idataio:

  1. enter the values directly from the keyboard. Run

mytbl <- idataio() 

This will open a spreadsheet in the browser, where you can just enter the values. Change Number of Cases to 2 and Number of Variables to 4. Type the column names (Age First Second Third) in the box on the right and enter the values in the spreadsheet. Click on the button Close App to return to R.

  2. use the mouse to highlight the whole table, switch to R and run
 mytbl <- idataio()

Select the Copy from Clipboard option. Change Number of Variables to 4. Highlight the table in the browser and right-click Copy. Hit Go! and see whether the table appears correctly. If not, you may need to play around a bit with the Number of Cases etc. When it is OK, hit the Close App button on top.

Copying from an Excel worksheet works exactly the same way.

NOTE: the current version does not allow for empty cells. If there are any, enter NA in them first.

  3. Open Microsoft Excel and enter the info as usual. Save the file as an Excel spreadsheet (with the xlsx extension). Now run idataio and choose the Read data from file option.
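idataio is a routine from the class workspace (Resma3). If you only want to type a small table without the app, one base R option is a matrix with row and column names. In this sketch the dimension names Age and Position are chosen for illustration.

```r
mytbl <- matrix(c(10, 16, 21,
                  15, 12, 26),
                nrow = 2, byrow = TRUE,        # fill the matrix row by row
                dimnames = list(Age = c("Old", "Young"),
                                Position = c("First", "Second", "Third")))
mytbl
rowSums(mytbl)   # total number of old and of young people
```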

Quick Data Entry

There is another way to get data from a web page into R very easily. This will work especially well for data in Moodle quizzes.

Note that you need the routines getx and getxy. If you do not have them yet, run the following command in R:

source(url("http://academic.uprm.edu/wrolke/Resma3/get.R"))

Now:

  1. single vector: say you want the following numbers in R:

63 65 83 86 89 94 95 95 96 104 107 112 112 113 114 117 118 125 131 132

Highlight the numbers, hit copy, switch to R and run the command

x <- getx()

(To avoid confusion just type in the command, don’t copy it)

  2. data in a table:
Age First Second Third
Old 10 16 21
Young 15 12 26

Highlight the whole table, hit copy, switch to R and run the command

x <- getxy()

Data Types

The most basic type of data in R is a vector, a sequence of values. Say we want the numbers 1.5, 3.6, 5.1 and 4.0 in an R vector called x; then we can type

x <- c(1.5, 3.6, 5.1, 4.0)
x
## [1] 1.5 3.6 5.1 4.0

Often the numbers have a structure one can make use of:

1:10 
##  [1]  1  2  3  4  5  6  7  8  9 10
10:1
##  [1] 10  9  8  7  6  5  4  3  2  1
1:20*2
##  [1]  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40
c(1:10, 1:10*2)
##  [1]  1  2  3  4  5  6  7  8  9 10  2  4  6  8 10 12 14 16 18 20

Sometimes you need parentheses, because R evaluates the : operator before the arithmetic operators +, -, * and /:

 n <- 10
1 : n-1
##  [1] 0 1 2 3 4 5 6 7 8 9
1 : (n-1)
## [1] 1 2 3 4 5 6 7 8 9
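The reason is that R evaluates : before the arithmetic operators + - * /, so 1 : n-1 means (1:n) - 1. One way to convince yourself is to compare both versions directly; identical() returns TRUE when two objects are exactly the same.

```r
n <- 10
# ":" comes first, so 1:n-1 is (1:n) - 1, the numbers 0 to 9
identical(1:n - 1, (1:n) - 1)
## [1] TRUE
# With parentheses the subtraction happens first: 1 to 9
identical(1:(n - 1), 1:9)
## [1] TRUE
```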

The rep (“repeat”) command is very useful:

rep(1, 10)
##  [1] 1 1 1 1 1 1 1 1 1 1
rep(1:3, 10)
##  [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
rep(1:3, each=3)
## [1] 1 1 1 2 2 2 3 3 3
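The times argument of rep can also be a vector, giving the number of repeats for each element separately. This is one more way to make the Old/Young label vector from the Data Entry section; the counts 15 and 15 are the group sizes in that example, and the name Age2 is made up for this sketch.

```r
rep(c("Old", "Young"), times = c(3, 2))   # "Old" 3 times, then "Young" 2 times

# The same labels as c(rep("Old", 15), rep("Young", 15))
Age2 <- rep(c("Old", "Young"), times = c(15, 15))
table(Age2)
```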

To find out how many elements a vector has use the length command

x <- rep(1:3, each=3)
length(x)
## [1] 9

The elements of a vector are accessed with the bracket notation:

x <-1:10*5
x
##  [1]  5 10 15 20 25 30 35 40 45 50
x[3]
## [1] 15
x[1:3]
## [1]  5 10 15
x[c(1,3,8)]
## [1]  5 15 40
x[-3]
## [1]  5 10 20 25 30 35 40 45 50
x[-c(1,2,5)]
## [1] 15 20 30 35 40 45 50

Instead of numbers a vector can also consist of characters (letters, numbers, symbols etc.) These are identified by quotes:

c("A", "B", 7, "%")
## [1] "A" "B" "7" "%"

A vector is either numeric or character, but never both (see how the 7 was changed to “7”). You can turn one into the other (if possible) as follows:

x <- 1:10
x
##  [1]  1  2  3  4  5  6  7  8  9 10
as.character(x)
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
x <- c("1", "5")
x
## [1] "1" "5"
as.numeric(x)
## [1] 1 5
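When the conversion is not possible, R does not stop with an error: the values that cannot be converted become NA (missing), and R prints the warning "NAs introduced by coercion".

```r
x <- c("1", "5", "abc")
# "abc" is not a number, so it becomes NA
# (R also warns: NAs introduced by coercion)
y <- as.numeric(x)
y
## [1]  1  5 NA
```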

A third type of data is logical, with values either TRUE or FALSE.

x <-  1:10
x
##  [1]  1  2  3  4  5  6  7  8  9 10
x > 4
##  [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

These are often used as conditions to select elements of a vector:

x[ x>4 ]
## [1]  5  6  7  8  9 10

This, as we will see shortly, is EXTREMELY useful!
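Two tricks with logical vectors come up again later (for example in sum(Male > Female)): in arithmetic TRUE counts as 1 and FALSE as 0, so sum() counts how many values meet a condition and mean() gives the proportion.

```r
x <- 1:10
sum(x > 4)            # how many values are larger than 4
## [1] 6
mean(x > 4)           # the proportion of values larger than 4
## [1] 0.6
x[x > 4 & x < 8]      # conditions can be combined with & (AND)
## [1] 5 6 7
```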

Data Frames

Data frames are the basic format for data in R. They are essentially vectors of the same length put together as columns. The main commands you need to know for working with data frames are shown in the following example.

Example

Consider the upr data set. It contains the application data for all the students who applied and were accepted to UPR-Mayaguez between 2003 and 2013.

 dim(upr)
## [1] 23666    16

tells us that there were 23666 applications and that for each student there are 16 pieces of information.

colnames(upr)
##  [1] "ID.Code"        "Year"           "Gender"         "Program.Code"  
##  [5] "Highschool.GPA" "Aptitud.Verbal" "Aptitud.Matem"  "Aprov.Ingles"  
##  [9] "Aprov.Matem"    "Aprov.Espanol"  "IGS"            "Freshmen.GPA"  
## [13] "Graduated"      "Year.Grad."     "Grad..GPA"      "Class.Facultad"

shows us the variables

head(upr, 3)
##      ID.Code Year Gender Program.Code Highschool.GPA Aptitud.Verbal
## 1 00C2B4EF77 2005      M          502           3.97            647
## 2 00D66CF1BF 2003      M          502           3.80            597
## 3 00AB6118EB 2004      M         1203           4.00            567
##   Aptitud.Matem Aprov.Ingles Aprov.Matem Aprov.Espanol IGS Freshmen.GPA
## 1           621          626         672           551 342         3.67
## 2           726          618         718           575 343         2.75
## 3           691          424         616           609 342         3.62
##   Graduated Year.Grad. Grad..GPA Class.Facultad
## 1        Si       2012      3.33           INGE
## 2        No          *         *           INGE
## 3        No          *         *       CIENCIAS

shows us the first three cases.

Let’s say we want to find the number of males and females. We can use the table command for that:

table(Gender)

## Error in table(Gender) : object ‘Gender’ not found

What happened? Right now R does not know what Gender is, because it is “hidden” inside the upr data set. We need to make it visible to R first:

 attach(upr)
 table(Gender)
## Gender
##     F     M 
## 11487 12179

There is also a detach command to undo an attach, but it is not usually needed because the attach goes away when you close R.

NOTE: you need to run attach only once in each R session.
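attach is one way to reach the columns of a data frame. Two alternatives that do not change what R can see are the $ operator and the with() function. Since the upr data set comes with the class workspace, this sketch uses R's built-in airquality data frame (described in the Subsetting section) instead.

```r
# $ picks a single column of a data frame by name
table(airquality$Month)

# with() looks up Month inside airquality, for this one command only
with(airquality, table(Month))
```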

Vector Arithmetic

R allows us to apply most mathematical functions to a whole vector at once:

 x  <-  1:10
2*x
##  [1]  2  4  6  8 10 12 14 16 18 20
 x^2
##  [1]   1   4   9  16  25  36  49  64  81 100
 log(x)
##  [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
##  [8] 2.0794415 2.1972246 2.3025851
 sum(x)
## [1] 55
 y  <-  21:30
 x+y
##  [1] 22 24 26 28 30 32 34 36 38 40
 x^2+y^2   
##  [1]  442  488  538  592  650  712  778  848  922 1000
 mean(x+y) 
## [1] 31

Subsetting

One of the most common tasks in Statistics is to select a part of a data set for further analysis. This is one part of what is called data wrangling.

Case Study: New York Air Quality Measurements

Description: Daily measurements of air quality in New York, May to September 1973.

A data frame with 153 observations on 6 variables.

Ozone: Mean ozone in parts per billion from 1300 to 1500 hours at Roosevelt Island

Solar.R: Solar radiation in Langleys in the frequency band 4000–7700 Angstroms from 0800 to 1200 hours at Central Park

Wind: Average wind speed in miles per hour at 0700 and 1000 hours at LaGuardia Airport

Temp: Maximum daily temperature in degrees Fahrenheit at La Guardia Airport.

Month: Month (1–12)

Day: Day of month (1–31)

Source: The data were obtained from the New York State Department of Conservation (ozone data) and the National Weather Service (meteorological data).

head(airquality)
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
## 6    28      NA 14.9   66     5   6

Let’s say that instead of looking at the whole data set we want to consider only the months of August and September. Those have Month = 8 or 9, and we can select this part of the data set with

attach(airquality)
airAugSept <- airquality[Month>=8, ]
head(airAugSept)
##    Ozone Solar.R Wind Temp Month Day
## 93    39      83  6.9   81     8   1
## 94     9      24 13.8   81     8   2
## 95    16      77  7.4   82     8   3
## 96    78      NA  6.9   86     8   4
## 97    35      NA  7.4   85     8   5
## 98    66      NA  4.6   87     8   6

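Notice that because a data frame has both rows and columns, the [ ] notation becomes [ , ]: the condition before the comma picks the rows, and leaving the part after the comma empty keeps all the columns.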

Data wrangling is so important that there are quite a lot of routines to help with it. One of them is subset, with which we could have done the same job:

airAugSept <- subset(airquality, Month>=8)

Note that this would also have worked without the attach.

I also wrote an interactive version of this command which you can use, called isubset. Here is what you do:

airAugSept<- isubset(airquality) 

The app lets you use up to three conditions. We just have one (Month >= 8), so we can leave that setting alone. Now choose the condition and then hit “Click when ready to run”.

Here is a screenshot:

Now hit Close App and return to R.

Note the line R Code: it shows you the command that you could have used in R directly to get the same result, without using the app.

In this example we used a very simple condition: Month >= 8. These conditions can be much more complicated, using & (AND), | (OR) and ! (NOT).

Let’s say we want only those days in August and September with a temperature less than 80:

 airAugSeptTemp80 <- isubset(airquality)

From the R Code line we see that we could also have run

airAugSeptTemp80 <- subset(airquality, Month>=8 & Temp<80)

Finally, let’s say we want either those days in August and September with a temperature less than 80, or days with Wind > 10:

airAugSeptTemp80W10 <- subset(airquality, (Month>=8 & Temp<80) | Wind>10)
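For comparison, here is one of these selections with the bracket notation from above, using $ so that no attach is needed. One difference worth knowing: subset drops rows where the condition is NA, while bracket notation returns them as rows of NA. Month and Temp have no missing values in airquality, so here both give the same rows. The names a and b are just for this sketch.

```r
# Bracket notation: rows that meet the condition, all columns
a <- airquality[airquality$Month >= 8 & airquality$Temp < 80, ]
# The same selection with subset
b <- subset(airquality, Month >= 8 & Temp < 80)
nrow(a) == nrow(b)
## [1] TRUE
```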

Notice that the R symbol for AND is & and for OR is |. Many other programming languages use the same symbols!

Let’s get back to the days in August and September. For those days we want to find the mean Ozone level:

airAugSept <- subset(airquality, Month>=8)
mean(Ozone)
## [1] NA

Oh! There are missing values in Ozone. So we need to take them out:

mean(Ozone, na.rm=TRUE) 
## [1] 42.12931

or we could use:

stat.table(Ozone)
## Warning:  37  missing values were removed!
##       Sample Size Mean Standard Deviation
## Ozone         116 42.1                 33

OK! But wait a minute:

length(Ozone)
## [1] 153
nrow(airAugSept)
## [1] 61

There are 153 Ozone values, but our data set for August and September has only 61 rows. The problem is that Ozone still comes from the original airquality data set (the one we attached), while the Ozone we want is hidden inside airAugSept. One solution would be to

attach(airAugSept)
## The following objects are masked from airquality:
## 
##     Day, Month, Ozone, Solar.R, Temp, Wind

But as R is warning us, there are now two versions of Ozone, which can get quite confusing. Here is a better (and much faster!) solution: the subset command also lets us choose which columns to return, with the select argument:

newozone <- subset( airquality, Month >= 8, select = Ozone, drop = TRUE)
head(newozone)
## [1] 39  9 16 78 35 66
mean(newozone , na.rm = TRUE)
## [1] 44.92727

Note: the argument drop = TRUE is needed to turn the data frame into a vector so we can use the mean command.
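Another way around the two-Ozones problem is with(), which looks up names inside the data frame you give it, for that one command only. So it uses the Ozone column of airAugSept rather than the attached one, and gives the same mean as newozone.

```r
airAugSept <- subset(airquality, Month >= 8)
# Ozone here is the column of airAugSept (August and September only)
with(airAugSept, mean(Ozone, na.rm = TRUE))
## [1] 44.92727
```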

Case Study: Age and Gender in Puerto Rico in 2000

Breakdown of the population of USA and Puerto Rico by age and gender, according to the 2000 Census

Data set: agesex

head(agesex)
##           Age  Male Female
## 1 Less than 1 29601  28442
## 2           1 29543  28130
## 3           2 30252  28881
## 4           3 30643  28867
## 5           4 31248  29799
## 6           5 31621  29696
tail(agesex)
##           Age Male Female
## 98         97  282    418
## 99         98  189    296
## 100        99  123    196
## 101 100 - 104  258    448
## 102 105 - 109   47     59
## 103  Over 110   17     27

shows us that the data set consists of three vectors: the ages, the number of males and the number of females. The first one is a character vector (“Less than 1”) and the other two are numeric.
Because there are now rows and columns, elements of a data frame are accessed with the [ , ] notation (row first, then column):

agesex[1, 1]
## [1] "Less than 1"
agesex[4, 3]
## [1] 28867
agesex[1, ]
##           Age  Male Female
## 1 Less than 1 29601  28442
agesex[ ,2]
##   [1] 29601 29543 30252 30643 31248 31621 30907 31100 30827 31798 33188
##  [12] 30807 30678 30665 30646 31117 31203 32735 32216 32038 32441 30281
##  [23] 30011 29019 27674 27468 25803 26233 26584 26930 26242 24645 24338
##  [34] 24883 26056 26107 25259 24637 24051 24367 24547 22809 23286 23184
##  [45] 22452 23028 21353 21199 20888 21268 22201 20794 21500 21249 20347
##  [56] 18879 18064 17756 16681 15751 15750 15179 14901 14284 14162 14023
##  [67] 11793 12358 11462 11346  9936 10161  9600  9169  8595  8471  7544
##  [78]  7174  6663  6144  5831  4982  4368  3849  3667  3482  3011  2560
##  [89]  2116  1802  1381  1034   862   673   493   427   310   282   189
## [100]   123   258    47    17
agesex[1:5, 2:3]
##    Male Female
## 1 29601  28442
## 2 29543  28130
## 3 30252  28881
## 4 30643  28867
## 5 31248  29799

To find the number of rows or columns of a data frame, use

ncol(agesex)
## [1] 3
nrow(agesex)
## [1] 103
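Columns can also be picked by name, which is often easier to read than a column number; for example agesex[, "Male"] or agesex$Male gives the Male column. Since agesex comes with the class workspace, this sketch uses R's built-in airquality data frame instead.

```r
# Rows 1 to 3, columns chosen by name
airquality[1:3, c("Ozone", "Temp")]
# One column as a vector, with $, then its first three values
airquality$Temp[1:3]
## [1] 67 72 74
```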

Let’s answer a few questions about the age and gender in PR in 2000: What was the number of men and women in PR in 2000?

attach(agesex)
sum(Male)
## [1] 1833577
sum(Female)
## [1] 1975033

How many people were there in PR?

People <- Male + Female
head(People)
## [1] 58043 57673 59133 59510 61047 61317
sum(People)  
## [1] 3808610

Notice we now have another object called People in the workspace, as we can see with

ls()

It will stay there until we close R. If we want to keep it for the next time we use R, we need to save everything with File > Save Workspace. If we want to save the workspace but not this variable, we first have to

rm(People) 

How many newborns were there?

People[1]
## [1] 58043

How many teenagers were there? Teenagers (Age from 13 to 19) are in rows 14 to 20, so

sum(People[14:20])
## [1] 433764

What percentage of the population was male, rounded to one decimal place?

round(sum(Male)/sum(People)*100, 1)
## [1] 48.1

In how many age groups were there more males than females?

Let’s start with

Male > Female
##   [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [12]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
##  [23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [34] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [45] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [67] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [78] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [89] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [100] FALSE FALSE FALSE FALSE

and now we can find

sum(Male > Female)
## [1] 21

What age group had the largest population?

max(People)
## [1] 64795
People==max(People)
##   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
##  [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [34] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [45] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [67] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [78] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [89] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [100] FALSE FALSE FALSE FALSE
Age[People==max(People)]
## [1] " 10"

Why is the answer a bit strange?

Here is another way to do this:

order(People, decreasing = TRUE)
##   [1]  11  21  19  18  20  10   6   8  17   5  22  23  16   7  13  12  15
##  [18]  14   9   4   3  24   1   2  25  26  30  35  36  29  31  37  28  38
##  [35]  27  41  40  34  39  33  32  43  44  46  42  45  51  53  47  48  54
##  [52]  50  49  52  55  56  57  58  59  61  60  62  63  64  66  65  68  67
##  [69]  69  70  72  71  73  74  75  76  77  78  79  80  81  82  83  84  85
##  [86]  86  87  88  89  90  91  92  93  94  95  96  97 101  98  99 100 102
## [103] 103
head( agesex[ order(People, decreasing = TRUE), ]) 
##    Age  Male Female
## 11  10 33188  31607
## 21  20 32441  32154
## 19  18 32216  31705
## 18  17 32735  31070
## 20  19 32038  31744
## 10   9 31798  30101

Another useful command is sort, which puts the values of one variable in order, by default from smallest to largest:

sort(People)
##   [1]    44   106   319   485   700   706   847  1122  1332  1728  2285
##  [12]  2694  3640  4466  5261  6278  7279  8414  8726  9132 10436 11659
##  [23] 13449 14211 15293 16657 17514 19403 19673 20588 21421 21865 23123
##  [34] 24982 25596 26222 26929 30387 30552 30690 32035 32737 34118 34715
##  [45] 36268 38544 39146 40807 44265 45004 45280 45875 45926 46155 46311
##  [56] 46579 48142 48987 49262 49499 50003 50009 50828 50951 51259 52213
##  [67] 52395 52553 52795 52807 53293 53573 53709 54352 54815 55124 55313
##  [78] 55754 56337 57673 58043 58725 59133 59510 60020 60112 60216 60221
##  [89] 60456 60695 60707 60748 60786 61047 61221 61231 61317 61899 63782
## [100] 63805 63921 64595 64795
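The difference between sort and order is easiest to see on a small made-up vector (p below is hypothetical): sort returns the values themselves, order returns their positions. which.max, which gives the position of the largest value, is one more way to answer the largest-population question; if there are ties it returns only the first one, while Age[People == max(People)] returns all of them.

```r
p <- c(30, 10, 50, 20)          # a small made-up vector
sort(p)                         # the values, smallest to largest
## [1] 10 20 30 50
order(p, decreasing = TRUE)     # positions of the values, largest first
## [1] 3 1 4 2
which.max(p)                    # position of the largest value
## [1] 3
```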

What was the mean age of the population?

Because the data are grouped, the mean is found as follows:

(0*(# of newborns) + 1*(# of one-year-olds) + 2*(# of two-year-olds) + …) / (total population)

Age is a character variable but we need a quantitative one to do arithmetic, so let’s make one as close to Age as possible:

Ages  <- c(0:99, 102, 107, 112)
Ages
##   [1]   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
##  [18]  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33
##  [35]  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50
##  [52]  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67
##  [69]  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84
##  [86]  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 102 107
## [103] 112
round(sum(Ages*People)/sum(People), 1)
## [1] 34
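The grouped-data mean above is a weighted mean of the ages, with the counts as weights, and base R has weighted.mean() for exactly this. Here is a sketch with made-up numbers; the names ages and counts are hypothetical, chosen so they do not overwrite Ages and People.

```r
ages <- c(0, 1, 2)        # made-up ages
counts <- c(10, 20, 30)   # made-up number of people of each age
sum(ages * counts) / sum(counts)
## [1] 1.333333
weighted.mean(ages, counts)
## [1] 1.333333
```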