Once you have started a session, the first thing you see is some text and then the > sign. This is the R prompt; it means R is waiting for you to do something.
Let’s start with
ls()
shows you a “listing” of the files (data, routines etc.).
Everything in R is either a data set or a function. It is a function if it is supposed to do something (calculate something, show you a graph, etc.). A function ALWAYS NEEDS (). Sometimes there is something (called an argument) in between the parentheses, like in the hplot() example above. Sometimes there isn’t, like in ls(). But the () has to be there anyway.
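To make the role of the parentheses concrete, here is a minimal sketch. It uses only sqrt() and is.function(), which ship with R but are not part of the text above; they are chosen purely for illustration.

```r
# With parentheses the function is called and returns a result.
root <- sqrt(16)       # one argument between the parentheses
root
## [1] 4

# ls() needs no argument here, but the parentheses are still required.
objects_now <- ls()

# Without parentheses, ls refers to the function itself; nothing is run.
is.function(ls)
## [1] TRUE
```

Typing just ls at the prompt therefore prints the definition of the function instead of the listing.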
If you have worked for a while you might have things you need to save. Do that by clicking on
File > Save Workspace
If you quit the program without saving your stuff, everything you did will be lost. R has a somewhat unusual file system: everything belonging to the same project (data, routines, graphs etc.) is stored in just one file, with the extension .RData.
To close R click on the x in the upper right corner.
R has a nice recall feature, using the up and down arrow keys. Also, typing
history()
shows you the most recent things entered.
R is case-sensitive, so a and A are two different things.
Often during a session you create objects that you need only for a short time. When you no longer need them use rm to get rid of them:
x <- 10
x^2
## [1] 100
rm(x)
The <- is the assignment operator in R; it assigns what is on the right to the symbol on the left. (Think of an arrow pointing to the left.)
For a few numbers the easiest thing is to just type them in:
x <- 2
x
## [1] 2
x <- c(10, 2, 6, 9)
x
## [1] 10 2 6 9
c() is a function that takes the objects inside the () and combines them into one single object (a vector).
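As a small illustration of what "combines them into one single object" means, the sketch below feeds c() a mix of existing vectors and a single number. The names first, second and both are made up for this example.

```r
first  <- c(1, 2, 3)
second <- c(4, 5)

# Vectors and single numbers can be mixed; the result is one flat vector.
both <- c(first, second, 6)
both
## [1] 1 2 3 4 5 6
length(both)
## [1] 6
```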
Most Moodle quizzes will require you to transfer data from the quiz to R. This is done with the command get.moodle.data(). There are two steps:
In Moodle, use the mouse to highlight the data. If it is a table with several columns, ALWAYS include the column headers (names of variables).
Switch to R and run
get.moodle.data()
Now the data should be in R. It is called x. You can always check by typing x and hitting ENTER.
x
Here are some examples. (Note that you can NOT copy the data from here; these are just pictures!)
Say this is what you see in the quiz:
Now use the mouse to highlight (JUST!) the numbers, go to R and type
get.moodle.data()
You should see this:
Data begins with: [1] 1 2 3 4 5 6
Data has been saved as x
and you can check that the correct data has been transferred with:
x
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
## [18] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
## [35] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
## [52] 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
## [69] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
## [86] 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102
## [103] 103 104 105 106 107 108
This also works if the data is not numbers:
get.moodle.data()
x
## [1] "Medium" "X-Large" "Medium" "Large" "Large" "X-Large" "Large"
## [8] "X-Large" "X-Large" "X-Large" "Large" "X-Large" "Large" "X-Large"
## [15] "X-Large" "Large" "Large" "Large" "Medium" "Large" "Medium"
## [22] "Small" "Large" "Large" "Small" "Small" "Small" "Small"
## [29] "Large" "X-Large" "Large" "Large" "Large" "Large" "Medium"
## [36] "Medium" "Small" "Medium"
Note that you need to highlight the column names as well (here RPM Oil), but NOT the dashed line.
get.moodle.data()
x
## RPM Oil
## 1 2100 0.9
## 2 2200 2.4
## 3 2300 0.5
## 4 2400 0.8
## 5 2500 1.6
## 6 2600 -0.2
## 7 2700 1.0
## 8 2800 0.8
## 9 2900 1.5
## 10 3000 0.3
## 11 3100 0.5
## 12 3200 1.7
## 13 3300 -1.3
## 14 3400 -1.4
## 15 3500 2.1
## 16 3600 0.4
## 17 3700 -2.1
## 18 3800 1.7
## 19 3900 1.3
If the data is a table it is immediately attached and you can use the column names, for example
mean(RPM)
## [1] 3000
Note: on rare occasions the routine can fail if the data is a table but everything is text. In that case use the argument is.table=TRUE.
Note: sometimes you might get a warning from R; as long as the data is transferred correctly you can ignore it.
The most basic type of data in R is a vector, simply a list of values.
Say we want the numbers 1.5, 3.6, 5.1 and 4.0 in an R vector called x, then we can type
x <- c(1.5, 3.6, 5.1, 4.0)
x
## [1] 1.5 3.6 5.1 4.0
Often the numbers have a structure one can make use of:
1:10
## [1] 1 2 3 4 5 6 7 8 9 10
10:1
## [1] 10 9 8 7 6 5 4 3 2 1
1:20*2
## [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40
c(1:10, 1:10*2)
## [1] 1 2 3 4 5 6 7 8 9 10 2 4 6 8 10 12 14 16 18 20
Sometimes you need parentheses:
n <- 10
1:n-1
## [1] 0 1 2 3 4 5 6 7 8 9
1:(n-1)
## [1] 1 2 3 4 5 6 7 8 9
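The reason the parentheses matter is operator precedence: in R the colon is evaluated before multiplication, addition and subtraction. The short check below is only a sketch using base R's identical(), which is not used in the text above.

```r
n <- 10

# The colon binds more tightly than - , so 1:n-1 is read as (1:n) - 1.
identical(1:n-1, (1:n) - 1)
## [1] TRUE

# Parentheses force the subtraction to happen first.
identical(1:(n-1), 1:9)
## [1] TRUE
```

The same rule explains why 1:20*2 earlier gave the even numbers from 2 to 40: it is read as (1:20)*2.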
To find out how many elements a vector has use the length command:
x <- c(1.4, 5.1, 2.0, 6.8, 3.5, 2.1, 5.6, 3.3, 6.9, 1.1)
length(x)
## [1] 10
The elements of a vector are accessed with the bracket [ ] notation:
x[3]
## [1] 2
x[1:3]
## [1] 1.4 5.1 2.0
x[c(1, 3, 8)]
## [1] 1.4 2.0 3.3
x[-3]
## [1] 1.4 5.1 6.8 3.5 2.1 5.6 3.3 6.9 1.1
x[-c(1, 2, 5)]
## [1] 2.0 6.8 2.1 5.6 3.3 6.9 1.1
This also works with logical conditions:
x[x>4]
## [1] 5.1 6.8 5.6 6.9
x[x<=5.1]
## [1] 1.4 5.1 2.0 3.5 2.1 3.3 1.1
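It can help to look at what a condition such as x>4 produces on its own. The sketch below uses the same x as above; the name big and the functions which() and sum() are additions for illustration.

```r
x <- c(1.4, 5.1, 2.0, 6.8, 3.5, 2.1, 5.6, 3.3, 6.9, 1.1)

big <- x > 4          # one TRUE or FALSE per element of x
which(big)            # positions of the TRUE entries
## [1] 2 4 7 9
sum(big)              # TRUE counts as 1, so this counts the matches
## [1] 4

# Conditions can be combined with & (AND) and | (OR).
x[x > 2 & x < 5]
## [1] 3.5 2.1 3.3
```

The brackets simply keep the elements where the condition is TRUE.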
Instead of numbers a vector can also consist of characters (letters, numbers, symbols etc.). These are identified by quotes:
c("A", "B", 7, "%")
## [1] "A" "B" "7" "%"
A vector is either numeric or character, but never both (see how the 7 was changed to “7”).
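One way to see which kind of vector you have is class(). The following is a small sketch; class() and as.numeric() are base R functions not mentioned above, and the names nums and mixed are made up.

```r
nums <- c(1.5, 3.6)
class(nums)
## [1] "numeric"

mixed <- c("A", "B", 7, "%")
class(mixed)
## [1] "character"

# A number stored as text has to be converted back before doing arithmetic.
as.numeric(mixed[3]) + 1
## [1] 8
```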
Data frames are the basic format for data in R. They are essentially vectors put together as columns.
The main things you need to know about working with data frames are the following commands.
Consider the upr data set. This is the application data for all the students who applied and were accepted to UPR-Mayaguez between 2003 and 2013.
dim(upr)
## [1] 23666 16
tells us that there were 23666 applications and that for each student there are 16 pieces of information.
colnames(upr)
## [1] "ID.Code" "Year" "Gender" "Program.Code"
## [5] "Highschool.GPA" "Aptitud.Verbal" "Aptitud.Matem" "Aprov.Ingles"
## [9] "Aprov.Matem" "Aprov.Espanol" "IGS" "Freshmen.GPA"
## [13] "Graduated" "Year.Grad." "Grad..GPA" "Class.Facultad"
shows us the variables
head(upr, 3)
## ID.Code Year Gender Program.Code Highschool.GPA Aptitud.Verbal
## 1 00C2B4EF77 2005 M 502 3.97 647
## 2 00D66CF1BF 2003 M 502 3.80 597
## 3 00AB6118EB 2004 M 1203 4.00 567
## Aptitud.Matem Aprov.Ingles Aprov.Matem Aprov.Espanol IGS Freshmen.GPA
## 1 621 626 672 551 342 3.67
## 2 726 618 718 575 343 2.75
## 3 691 424 616 609 342 3.62
## Graduated Year.Grad. Grad..GPA Class.Facultad
## 1 Si 2012 3.33 INGE
## 2 No NA NA INGE
## 3 No NA NA CIENCIAS
shows us the first three cases.
Let’s say we want to find the number of males and females. We can use the table command for that:
table(Gender)
## Error: object 'Gender' not found
What happened? Right now R does not know what Gender is because it is “hidden” inside the upr data set. Think of upr as a box that is currently closed, so R can’t look inside and see the column names. We need to open the box first:
attach(upr)
table(Gender)
## Gender
## F M
## 11487 12179
Note: you need to attach a data frame only once in each session working with R.
Note: Say you are working first with a data set “students 2016” which has a column called Gender, and you attached it. Later (but in the same R session) you start working with a data set “students 2017” which also has a column called Gender, and you attach this one as well. If you use Gender now it will be from “students 2017”.
Note: when data is transferred from Moodle with get.moodle.data() it is automatically attached.
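If two attached data sets share a column name, it can be safer to say explicitly which one is meant. The sketch below shows two base R alternatives to attach(), the $ notation and with(). The data frames students2016 and students2017 are made up here and only mimic the situation described in the note.

```r
# Two made-up data frames that both contain a column called Gender.
students2016 <- data.frame(Gender = c("F", "M", "F"), stringsAsFactors = FALSE)
students2017 <- data.frame(Gender = c("M", "M", "F", "M"), stringsAsFactors = FALSE)

# The $ notation names the data frame explicitly, so there is no ambiguity.
table(students2016$Gender)
table(students2017$Gender)

# with() opens the "box" for a single command only.
with(students2017, table(Gender))
```

Neither form depends on what was attached earlier in the session.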
Consider the following data frame (not a real data set):
students
## Age GPA Gender
## 1 22 3.1 Male
## 2 23 3.2 Male
## 3 20 2.1 Male
## 4 22 2.1 Male
## 5 21 2.3 Female
## 6 21 2.9 Male
## 7 18 2.3 Female
## 8 22 3.9 Male
## 9 21 2.6 Female
## 10 18 3.2 Female
Here each single piece of data is identified by its row number and its column number. So for example in row 2, column 2 we have “3.2”, in row 6, column 3 we have “Male”.
As with the vectors before we can use the [ ] notation to access pieces of a data frame, but now we need to give it both the row and the column number, separated by a ,:
students[6, 3]
## [1] "Male"
As before we can pick more than one piece:
students[1:5, 3]
## [1] "Male" "Male" "Male" "Male" "Female"
students[1:5, 1:2]
## Age GPA
## 1 22 3.1
## 2 23 3.2
## 3 20 2.1
## 4 22 2.1
## 5 21 2.3
students[-c(1:5), 3]
## [1] "Male" "Female" "Male" "Female" "Female"
students[1, ]
## Age GPA Gender
## 1 22 3.1 Male
students[, 2]
## [1] 3.1 3.2 2.1 2.1 2.3 2.9 2.3 3.9 2.6 3.2
students[, -3]
## Age GPA
## 1 22 3.1
## 2 23 3.2
## 3 20 2.1
## 4 22 2.1
## 5 21 2.3
## 6 21 2.9
## 7 18 2.3
## 8 22 3.9
## 9 21 2.6
## 10 18 3.2
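Rows and columns can also be picked by name or by a condition instead of by number. The sketch below first rebuilds the students table from the printout above with data.frame(), then shows a few equivalent ways of reaching the same cells. The $ notation and the condition on Age are additions for illustration.

```r
students <- data.frame(
  Age = c(22, 23, 20, 22, 21, 21, 18, 22, 21, 18),
  GPA = c(3.1, 3.2, 2.1, 2.1, 2.3, 2.9, 2.3, 3.9, 2.6, 3.2),
  Gender = c("Male", "Male", "Male", "Male", "Female",
             "Male", "Female", "Male", "Female", "Female"),
  stringsAsFactors = FALSE      # keep Gender as plain text
)

students[6, "Gender"]           # same cell as students[6, 3]
## [1] "Male"
students$GPA[2]                 # same cell as students[2, 2]
## [1] 3.2

# Rows can be picked by a condition; here this keeps rows 7 and 10.
young <- students[students$Age < 20, ]
young
```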
R allows us to apply any mathematical function to a whole vector:
x <- 1:10
2*x
## [1] 2 4 6 8 10 12 14 16 18 20
x^2
## [1] 1 4 9 16 25 36 49 64 81 100
log(x)
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
## [8] 2.0794415 2.1972246 2.3025851
sum(x)
## [1] 55
y <- 21:30
x+y
## [1] 22 24 26 28 30 32 34 36 38 40
x^2+y^2
## [1] 442 488 538 592 650 712 778 848 922 1000
mean(x+y)
## [1] 31
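Because arithmetic works element by element, many formulas can be typed almost as they are written on paper. As a small sketch, the mean can be rebuilt from sum() and length(), and the deviations from the mean are computed for all elements at once.

```r
x <- 1:10
y <- 21:30

# The mean is the sum divided by the number of elements.
sum(x + y) / length(x + y)
## [1] 31

# Deviations from the mean, one for every element of x.
dev <- x - mean(x)
sum(dev)                # deviations from the mean always add up to zero
## [1] 0
```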
One of the most common tasks in Statistics is to select a part of a data set for further analysis. There is even a name for this: data wrangling.
Description: Daily measurements of air quality in New York, May to September 1973.
A data frame with 153 observations on 6 variables.
Ozone: Mean ozone in parts per billion from 1300 to 1500 hours at Roosevelt Island
Solar.R: Solar radiation in Langleys in the frequency band 4000-7700 Angstroms from 0800 to 1200 hours at Central Park
Wind: Average wind speed in miles per hour at 0700 and 1000 hours at LaGuardia Airport
Temp: Maximum daily temperature in degrees Fahrenheit at La Guardia Airport.
Source: The data were obtained from the New York State Department of Conservation (ozone data) and the National Weather Service (meteorological data).
head(airquality)
## Ozone Solar.R Wind Temp Month Day
## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3
## 4 18 313 11.5 62 5 4
## 5 NA NA 14.3 56 5 5
## 6 28 NA 14.9 66 5 6
Data wrangling is so important that there are quite a lot of routines that help with it. One of them is isubset.
Say we want to analyze the data for the months of August and September only. Notice that in the variable Month those are 8 (=August) and 9 (=September).
Here is what you do:
airAugSept <- isubset(airquality)
The app lets you use up to three conditions. We have just one (Month \(\ge\) 8), so we can leave that alone. Now choose the condition and then hit “Click when ready to run”.
Here is a screenshot:
Now hit Close App and return to R.
In this example we used a very simple condition: Month \(\ge\) 8. These conditions can be much more complicated using & (AND), | (OR) and ! (NOT).
Let’s say we want only those days in August and September with a temperature less than 80:
airAugSeptTemp80 <- isubset(airquality)
Finally, let’s say we want either those days in August and September with a temperature less than 80, or days with Wind > 10:
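For readers who want to see the three conditions written out as commands, here is a sketch using base R's subset() instead of the isubset app. This is an alternative route, not a description of how isubset works internally; the name airEither is made up, and the results should only match the app's if it applies the conditions in the usual way.

```r
# airquality ships with R, so this runs without the course routines.

# Condition 1: August and September only.
airAugSept <- subset(airquality, Month >= 8)
nrow(airAugSept)        # 31 days in August + 30 in September
## [1] 61

# Condition 2: August or September AND temperature below 80.
airAugSeptTemp80 <- subset(airquality, Month >= 8 & Temp < 80)

# Condition 3: (August or September with temperature below 80) OR wind above 10.
airEither <- subset(airquality, (Month >= 8 & Temp < 80) | Wind > 10)
```

The parentheses in the third condition make sure the AND part is evaluated as one unit before the OR is applied.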
There is another type of subsetting we sometimes need, namely comparing the values in one variable with those in another. Say we want to consider those cases in the upr data set where the Aptitud.Verbal score is lower than the Aptitud.Matem score. In this case choose Variable from the dropdown box instead of value:
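The same variable-against-variable comparison can be sketched in base R. Since the full upr data set is not available here, the example uses only the three rows printed by head(upr, 3) earlier, with the column names as they appear in colnames(upr); the name upr_small is made up.

```r
# First three rows of upr as printed by head(upr, 3); only two columns are kept.
upr_small <- data.frame(
  Aptitud.Verbal = c(647, 597, 567),
  Aptitud.Matem  = c(621, 726, 691)
)

# Keep the rows where the verbal score is below the math score (rows 2 and 3).
lower_verbal <- subset(upr_small, Aptitud.Verbal < Aptitud.Matem)
lower_verbal
```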
Let’s get back to the days in August and September. What we want to do with those days is to find the mean Ozone level:
attach(airAugSept)
stat.table(Ozone)
## Warning: 6 missing values were removed!
## Sample Size Mean Standard Deviation
## Ozone 55 44.9 35.2
OK!
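The warning about missing values deserves a closer look. As a sketch, the sample size and mean can be reproduced with base R, where the argument na.rm = TRUE tells mean() and sd() to drop the missing values. Here subset() stands in for the isubset step, so this assumes the app selected the same rows.

```r
airAugSept <- subset(airquality, Month >= 8)

# Days with an actual Ozone reading (61 days minus 6 missing values).
n_ozone <- sum(!is.na(airAugSept$Ozone))
n_ozone
## [1] 55

# Without na.rm = TRUE the result would be NA.
mean_ozone <- round(mean(airAugSept$Ozone, na.rm = TRUE), 1)
mean_ozone
## [1] 44.9

sd_ozone <- round(sd(airAugSept$Ozone, na.rm = TRUE), 1)
```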
Breakdown of the population of USA and Puerto Rico by age and gender, according to the 2000 Census.
head(agesex)
## Age Male Female
## 1 Less than 1 29601 28442
## 2 1 29543 28130
## 3 2 30252 28881
## 4 3 30643 28867
## 5 4 31248 29799
## 6 5 31621 29696
tail(agesex)
## Age Male Female
## 98 97 282 418
## 99 98 189 296
## 100 99 123 196
## 101 100 - 104 258 448
## 102 105 - 109 47 59
## 103 Over 110 17 27
shows us that the data set consists of three vectors: the ages, the number of males and the number of females. The first one is a character vector (it contains entries such as “Less than 1”) and the other two are numeric.
Let’s answer a few questions about the age and gender in PR in 2000:
attach(agesex)
sum(Male)
## [1] 1833577
sum(Female)
## [1] 1975033
Simple:
sum(Male)+sum(Female)
## [1] 3808610
We will need the combined Male and Female counts a few more times, so maybe we should do it this way:
People <- Male + Female
head(People)
## [1] 58043 57673 59133 59510 61047 61317
sum(People)
## [1] 3808610
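Adding two vectors of the same length works position by position, which is why People has one entry per age group. The sketch below repeats the calculation for the first six age groups only, with the counts copied from head(agesex). The names male6, female6 and people6 are made up so that nothing in your session is overwritten.

```r
# Counts for the first six age groups, copied from head(agesex).
male6   <- c(29601, 29543, 30252, 30643, 31248, 31621)
female6 <- c(28442, 28130, 28881, 28867, 29799, 29696)

# Vectors of equal length are added position by position.
people6 <- male6 + female6
people6
## [1] 58043 57673 59133 59510 61047 61317

# Summing first and adding afterwards gives the same total.
sum(people6) == sum(male6) + sum(female6)
## [1] TRUE
```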
Note: we now have another variable called People among the data sets, as we can see with
ls()
It will stay there until we close R. If we want to keep it for the next time we use R we need to save everything with File > Save Workspace. If we want to save the workspace but not this variable we first have to remove it with
rm(People)
People[1]
## [1] 58043
Teenagers (ages 13 to 19) are in rows 14 to 20, so
sum(People[14:20])
## [1] 433764
sum(Male)/sum(People)*100
## [1] 48.14294
round(sum(Male)/sum(People)*100, 1)
## [1] 48.1
sum(Male > Female)
## [1] 21
max(People)
## [1] 64795
agesex[People==64795, ]
## Age Male Female
## 11 10 33188 31607
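Typing the maximum in by hand works, but the row can also be found in one step. The sketch below uses base R's which.max(), which is not part of the text above, on a miniature version of the table built from the first six rows of head(agesex). The names agesex_small and people_small are made up.

```r
# First six age groups from head(agesex), rebuilt as a small data frame.
agesex_small <- data.frame(
  Age    = c("Less than 1", "1", "2", "3", "4", "5"),
  Male   = c(29601, 29543, 30252, 30643, 31248, 31621),
  Female = c(28442, 28130, 28881, 28867, 29799, 29696),
  stringsAsFactors = FALSE
)
people_small <- agesex_small$Male + agesex_small$Female

# which.max() returns the position of the (first) largest value,
# so the number does not have to be typed in by hand.
largest <- agesex_small[which.max(people_small), ]
largest
```

Note that which.max() reports only the first position if the maximum occurs more than once, whereas the == comparison returns every matching row.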
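Note == is the symbol for “is equal to”. The others are < (less than), > (greater than), <= (less than or equal to), >= (greater than or equal to) and != (not equal to).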
So the age group of 10 year olds is the largest. Why is this answer a bit strange?
agesex
## Age Male Female
## 1 Less than 1 29601 28442
## 2 1 29543 28130
## 3 2 30252 28881
## 4 3 30643 28867
## 5 4 31248 29799
## 6 5 31621 29696
## 7 6 30907 29788
## 8 7 31100 30131
## 9 8 30827 29193
## 10 9 31798 30101
## 11 10 33188 31607
## 12 11 30807 29414
## 13 12 30678 29778
## 14 13 30665 29447
## 15 14 30646 29570
## 16 15 31117 29590
## 17 16 31203 30018
## 18 17 32735 31070
## 19 18 32216 31705
## 20 19 32038 31744
## 21 20 32441 32154
## 22 21 30281 30505
## 23 22 30011 30737
## 24 23 29019 29706
## 25 24 27674 28663
## 26 25 27468 28286
## 27 26 25803 26992
## 28 27 26233 27060
## 29 28 26584 27768
## 30 29 26930 28383
## 31 30 26242 27467
## 32 31 24645 26183
## 33 32 24338 26613
## 34 33 24883 27330
## 35 34 26056 29068
## 36 35 26107 28708
## 37 36 25259 28314
## 38 37 24637 28170
## 39 38 24051 27208
## 40 39 24367 28028
## 41 40 24547 28006
## 42 41 22809 26453
## 43 42 23286 26723
## 44 43 23184 26819
## 45 44 22452 26535
## 46 45 23028 26471
## 47 46 21353 24958
## 48 47 21199 24956
## 49 48 20888 24392
## 50 49 21268 24607
## 51 50 22201 25941
## 52 51 20794 24210
## 53 52 21500 25079
## 54 53 21249 24677
## 55 54 20347 23918
## 56 55 18879 21928
## 57 56 18064 21082
## 58 57 17756 20788
## 59 58 16681 19587
## 60 59 15751 18367
## 61 60 15750 18965
## 62 61 15179 17558
## 63 62 14901 17134
## 64 63 14284 16406
## 65 64 14162 16225
## 66 65 14023 16529
## 67 66 11793 14429
## 68 67 12358 14571
## 69 68 11462 14134
## 70 69 11346 13636
## 71 70 9936 11929
## 72 71 10161 12962
## 73 72 9600 11821
## 74 73 9169 11419
## 75 74 8595 11078
## 76 75 8471 10932
## 77 76 7544 9970
## 78 77 7174 9483
## 79 78 6663 8630
## 80 79 6144 8067
## 81 80 5831 7618
## 82 81 4982 6677
## 83 82 4368 6068
## 84 83 3849 5283
## 85 84 3667 5059
## 86 85 3482 4932
## 87 86 3011 4268
## 88 87 2560 3718
## 89 88 2116 3145
## 90 89 1802 2664
## 91 90 1381 2259
## 92 91 1034 1660
## 93 92 862 1423
## 94 93 673 1055
## 95 94 493 839
## 96 95 427 695
## 97 96 310 537
## 98 97 282 418
## 99 98 189 296
## 100 99 123 196
## 101 100 - 104 258 448
## 102 105 - 109 47 59
## 103 Over 110 17 27
People 100 years old or older are in rows 101-103, so
sum(People[101:103])/sum(People)*100
## [1] 0.02247539
Mortality rates and life expectancy of countries in 2017:
head(world.mortality.2017, 3)
## Country Deaths Rate LifeExpectancy.Both LifeExpectancy.Males
## 1 Burundi 113 11.0 57.1 55.1
## 2 Comoros 6 7.5 63.5 61.8
## 3 Djibouti 8 8.4 62.3 60.6
## LifeExpectancy.Females InfantMortality UnderFive Prob15.60 Prob0.70
## 1 59.1 73 123 295 550
## 2 65.2 55 78 228 476
## 3 63.9 53 83 252 475
## PercUnder5 PercUnder5.25 Perc25.65 PercOver65
## 1 44 14 25 16
## 2 33 10 31 27
## 3 23 12 37 29
attach(world.mortality.2017)
world.mortality.2017[Country=="Puerto Rico", ]
## Country Deaths Rate LifeExpectancy.Both LifeExpectancy.Males
## 179 Puerto Rico 29 7.9 79.7 75.8
## LifeExpectancy.Females InfantMortality UnderFive Prob15.60 Prob0.70
## 179 83.6 6 7 96 199
## PercUnder5 PercUnder5.25 Perc25.65 PercOver65
## 179 1 1 21 77
range(LifeExpectancy.Males)
## [1] 49.6 81.1
world.mortality.2017[LifeExpectancy.Males==49.6, ]
## Country Deaths Rate LifeExpectancy.Both
## 24 Central African Republic 64 14 51.4
## LifeExpectancy.Males LifeExpectancy.Females InfantMortality UnderFive
## 24 49.6 53.2 87 150
## Prob15.60 Prob0.70 PercUnder5 PercUnder5.25 Perc25.65 PercOver65
## 24 417 655 36 14 31 19
world.mortality.2017[LifeExpectancy.Males==81.1, ]
## Country Deaths Rate LifeExpectancy.Both LifeExpectancy.Males
## 137 Iceland 2 6.5 82.6 81.1
## 164 Switzerland 67 8.0 83.1 81.1
## LifeExpectancy.Females InfantMortality UnderFive Prob15.60 Prob0.70
## 137 84.1 2 2 50 131
## 164 85.1 4 4 50 127
## PercUnder5 PercUnder5.25 Perc25.65 PercOver65
## 137 0 1 15 84
## 164 1 0 13 86
sum(LifeExpectancy.Males>LifeExpectancy.Females)
## [1] 0
sort(InfantMortality)
## [1] 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3
## [24] 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 6
## [47] 6 6 6 6 6 6 6 6 7 7 7 7 7 8 8 8 8 8 8 8 9 9 9
## [70] 9 9 9 9 9 9 9 9 9 10 10 10 10 11 11 11 12 12 12 12 12 13 13
## [93] 13 13 13 14 14 14 14 14 14 15 16 16 16 16 16 17 17 17 17 17 17 18 18
## [116] 18 18 19 20 20 20 21 21 23 23 23 23 24 25 25 25 26 26 27 27 27 28 29
## [139] 29 29 30 30 31 31 32 32 33 33 33 36 37 38 38 39 39 39 40 40 41 41 42
## [162] 42 43 43 43 43 43 44 45 45 45 46 47 47 49 50 53 53 53 54 55 57 59 59
## [185] 60 61 61 62 63 64 64 65 65 66 67 69 70 72 72 73 74 75 86 86 87
Country[InfantMortality==2]
## [1] "China, Hong Kong SAR" "Japan" "Singapore"
## [4] "Czechia" "Finland" "Iceland"
## [7] "Norway" "Sweden" "Portugal"
## [10] "Slovenia"
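sort() orders the numbers but loses track of which country each one belongs to. A common way around this is order(), sketched below on a miniature table. The values are copied from the outputs above, the name mort_small is made up, and order() itself is an addition not used in the text.

```r
# A few countries and their infant mortality, copied from the outputs above.
mort_small <- data.frame(
  Country = c("Burundi", "Comoros", "Djibouti", "Puerto Rico",
              "Iceland", "Switzerland", "Central African Republic"),
  InfantMortality = c(73, 55, 53, 6, 2, 4, 87),
  stringsAsFactors = FALSE
)

# order() gives the row numbers that would sort the column,
# so the whole data frame can be sorted with the names still attached.
sorted <- mort_small[order(mort_small$InfantMortality), ]
sorted$Country[1:3]     # the three lowest rates in this small table
```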
We saw before that in PR 0.0225 percent of the population is 100 years old or older. How does that compare to the other states?
The data set agesexUS has the numbers for all states. The total population in 2000 was
attach(agesexUS)
colnames(agesexUS)
## [1] "State" "Total" "M.Total" "M.less.than.1"
## [5] "M..1" "M..2" "M..3" "M..4"
## [9] "M..5" "M..6" "M..7" "M..8"
## [13] "M..9" "M..10" "M..11" "M..12"
## [17] "M..13" "M..14" "M..15" "M..16"
## [21] "M..17" "M..18" "M..19" "M..20"
## [25] "M..21" "M..22" "M..23" "M..24"
## [29] "M..25" "M..26" "M..27" "M..28"
## [33] "M..29" "M..30" "M..31" "M..32"
## [37] "M..33" "M..34" "M..35" "M..36"
## [41] "M..37" "M..38" "M..39" "M..40"
## [45] "M..41" "M..42" "M..43" "M..44"
## [49] "M..45" "M..46" "M..47" "M..48"
## [53] "M..49" "M..50" "M..51" "M..52"
## [57] "M..53" "M..54" "M..55" "M..56"
## [61] "M..57" "M..58" "M..59" "M..60"
## [65] "M..61" "M..62" "M..63" "M..64"
## [69] "M..65" "M..66" "M..67" "M..68"
## [73] "M..69" "M..70" "M..71" "M..72"
## [77] "M..73" "M..74" "M..75" "M..76"
## [81] "M..77" "M..78" "M..79" "M..80"
## [85] "M..81" "M..82" "M..83" "M..84"
## [89] "M..85" "M..86" "M..87" "M..88"
## [93] "M..89" "M..90" "M..91" "M..92"
## [97] "M..93" "M..94" "M..95" "M..96"
## [101] "M..97" "M..98" "M..99" "M.100.104"
## [105] "M.105.109" "M.110.and.over" "F.Total" "F.less.than.1"
## [109] "F..1" "F..2" "F..3" "F..4"
## [113] "F..5" "F..6" "F..7" "F..8"
## [117] "F..9" "F..10" "F..11" "F..12"
## [121] "F..13" "F..14" "F..15" "F..16"
## [125] "F..17" "F..18" "F..19" "F..20"
## [129] "F..21" "F..22" "F..23" "F..24"
## [133] "F..25" "F..26" "F..27" "F..28"
## [137] "F..29" "F..30" "F..31" "F..32"
## [141] "F..33" "F..34" "F..35" "F..36"
## [145] "F..37" "F..38" "F..39" "F..40"
## [149] "F..41" "F..42" "F..43" "F..44"
## [153] "F..45" "F..46" "F..47" "F..48"
## [157] "F..49" "F..50" "F..51" "F..52"
## [161] "F..53" "F..54" "F..55" "F..56"
## [165] "F..57" "F..58" "F..59" "F..60"
## [169] "F..61" "F..62" "F..63" "F..64"
## [173] "F..65" "F..66" "F..67" "F..68"
## [177] "F..69" "F..70" "F..71" "F..72"
## [181] "F..73" "F..74" "F..75" "F..76"
## [185] "F..77" "F..78" "F..79" "F..80"
## [189] "F..81" "F..82" "F..83" "F..84"
## [193] "F..85" "F..86" "F..87" "F..88"
## [197] "F..89" "F..90" "F..91" "F..92"
## [201] "F..93" "F..94" "F..95" "F..96"
## [205] "F..97" "F..98" "F..99" "F.100.104"
## [209] "F.105.109" "F.110.and.over"
so we see there is a column called Total in column 2, which gives us the population for each state. The columns with the people 100 and older are columns number 104, 105 and 106 (males) and 208, 209 and 210 (females), so
old <- agesexUS[, 104] + agesexUS[, 105] +
agesexUS[, 106] + agesexUS[, 208] +
agesexUS[, 209] + agesexUS[, 210]
percentage <- old/Total*100
names(percentage) <- State
sum(percentage<0.0225)
## [1] 40
so 40 of the 50 states have a lower percentage.
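Adding six columns one by one is easy to mistype. A possible shortcut is base R's rowSums(), which adds the selected columns row by row. The sketch below shows the idea on a made-up miniature table (toy, with invented numbers), since agesexUS itself is not available here.

```r
# A made-up miniature table: pick columns by position and add them row by row.
toy <- data.frame(
  State = c("A", "B"),
  Total = c(1000, 2000),
  M.100 = c(1, 4),
  F.100 = c(2, 6),
  stringsAsFactors = FALSE
)

old_toy <- rowSums(toy[, c(3, 4)])        # same as toy[, 3] + toy[, 4]
percentage_toy <- old_toy / toy$Total * 100
names(percentage_toy) <- toy$State
percentage_toy
```

For the real data the corresponding call would presumably be rowSums(agesexUS[, c(104:106, 208:210)]), assuming the column positions listed above.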