Basic R commands

Once you have started a session, the first thing you see is some text, followed by the > sign. This is the R prompt; it means R is waiting for you to do something.

Let’s start with

ls()

which shows you a “listing” of the files (data, routines etc.).

Everything in R is either a data set or a function. It is a function if it is supposed to do something (calculate something, show you a graph, and so on). If it is a function, it ALWAYS NEEDS (). Sometimes there is something (called an argument) in between the parentheses, like in the hplot() example above. Sometimes there isn’t, like in ls(). But the () has to be there anyway.
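As a small illustration of the difference between calling a function with and without an argument, here is a sketch that uses only functions that come with R (sqrt and is.function are not part of the text above, they are just convenient examples):

```r
# A function called with an argument: sqrt() receives the number 16.
sqrt(16)

# A function called without any argument still needs the parentheses.
objects_now <- ls()

# Without the parentheses R does not run the function, it just shows
# the object itself. is.function() confirms that sqrt is a function.
is.function(sqrt)
```

If you type just ls without the (), R prints the definition of the routine instead of the listing.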

If you have worked for a while you might have things you need to save. Do that by clicking on

File > Save Workspace

If you quit the program without saving, everything you did will be lost. R has a somewhat unusual file system: everything belonging to the same project (data, routines, graphs etc.) is stored in just one file, with the extension .RData.

To close R click on the x in the upper right corner.

R has a nice recall feature, using the up and down arrow keys. Also, typing

history()

shows you the most recent things entered.

R is case-sensitive, so a and A are two different things.

Often during a session you create objects that you need only for a short time. When you no longer need them use rm to get rid of them:

x <- 10
x^2

## [1] 100

rm(x)

The <- is the assignment operator in R. It assigns what is on the right to the symbol on the left. (Think of an arrow pointing to the left.)
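To see that rm really removes an object, one can check the listing before and after. This is only a sketch; the name tmp_value is made up for the example:

```r
tmp_value <- 10                        # hypothetical name, just for this example
was_there <- "tmp_value" %in% ls()     # TRUE: the object shows up in the listing
rm(tmp_value)                          # remove it
still_there <- "tmp_value" %in% ls()   # FALSE: it is gone
c(was_there, still_there)
```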

Data Entry with the keyboard

For a few numbers the easiest thing is to just type them in:

x <- 2
x

## [1] 2

x <-  c(10, 2, 6, 9)
x

## [1] 10  2  6  9

c() is a function that takes the objects inside the () and combines them into one single object (a vector).
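The objects inside c() do not have to be single numbers. As a short sketch, an existing vector can be extended like this (the name y is chosen just for illustration):

```r
x <- c(10, 2, 6, 9)
y <- c(x, 5, 1:3)   # combine an existing vector, a single number and a sequence
y
length(y)           # y now has 8 elements
```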

Getting Data from Moodle Quizzes

Most Moodle quizzes will require you to transfer data from the quiz to R. This is done with the command get.moodle.data(). There are two steps:

1. In Moodle, use the mouse to highlight the data. If it is a table with several columns, ALWAYS include the column headers (names of variables).
2. Switch to R and run

get.moodle.data()

Now the data should be in R. It is called x. You can always check by typing x and hitting ENTER.

Here are some examples. (Note that you can NOT copy the data from here; these are just pictures!)

single set of numbers:

say this is what you see in the quiz:

Now use the mouse to highlight (JUST!) the numbers, go to R and type

get.moodle.data()

You should see this:

Data begins with: [1] 1 2 3 4 5 6

Data has been saved as x

and you can check that the correct data has been transferred with:

x

##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
##  [18]  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34
##  [35]  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51
##  [52]  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68
##  [69]  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85
##  [86]  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102
## [103] 103 104 105 106 107 108

This also works if the data is not numbers:

get.moodle.data()

##  [1] "Medium"  "X-Large" "Medium"  "Large"   "Large"   "X-Large" "Large"  
##  [8] "X-Large" "X-Large" "X-Large" "Large"   "X-Large" "Large"   "X-Large"
## [15] "X-Large" "Large"   "Large"   "Large"   "Medium"  "Large"   "Medium" 
## [22] "Small"   "Large"   "Large"   "Small"   "Small"   "Small"   "Small"  
## [29] "Large"   "X-Large" "Large"   "Large"   "Large"   "Large"   "Medium" 
## [36] "Medium"  "Small"   "Medium"

data is in the form of a table with several columns:

Note: you need to highlight the column names as well (here RPM and Oil), but NOT the dashed line.

get.moodle.data()

##     RPM  Oil
## 1  2100  0.9
## 2  2200  2.4
## 3  2300  0.5
## 4  2400  0.8
## 5  2500  1.6
## 6  2600 -0.2
## 7  2700  1.0
## 8  2800  0.8
## 9  2900  1.5
## 10 3000  0.3
## 11 3100  0.5
## 12 3200  1.7
## 13 3300 -1.3
## 14 3400 -1.4
## 15 3500  2.1
## 16 3600  0.4
## 17 3700 -2.1
## 18 3800  1.7
## 19 3900  1.3

If the data is a table, it is immediately attached and you can use the column names, for example

mean(RPM)

## [1] 3000

Note: on rare occasions the routine can fail if the data is a table but everything is text. In that case use the argument is.table=TRUE.

Note: sometimes you might get a warning from R. As long as the data is transferred correctly, you can ignore it.
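get.moodle.data() is a routine specific to this course. The following is NOT how it is implemented; it is only a sketch of how text like that in a quiz could be turned into R data with the base R functions scan and read.table. The text strings are made up for the example:

```r
# Hypothetical text, as it might look after copying from a quiz page.
numbers_text <- "1 2 3 4 5 6"
x_numbers <- scan(text = numbers_text, quiet = TRUE)   # a numeric vector
x_numbers

# A small table; the first line holds the column names.
table_text <- "RPM Oil
2100 0.9
2200 2.4
2300 0.5"
x_table <- read.table(text = table_text, header = TRUE)
x_table
mean(x_table$RPM)
```

Here header = TRUE tells read.table that the first line contains the variable names, which is the reason the column headers always have to be included.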

Data Types in R

The most basic type of data in R is a vector, which is simply a list of values.

Say we want the numbers 1.5, 3.6, 5.1 and 4.0 in an R vector called x, then we can type

x <- c(1.5, 3.6, 5.1, 4.0)
x

## [1] 1.5 3.6 5.1 4.0

Often the numbers have a structure one can make use of:

1:10

##  [1]  1  2  3  4  5  6  7  8  9 10

10:1

##  [1] 10  9  8  7  6  5  4  3  2  1

1:20*2

##  [1]  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40

c(1:10, 1:10*2)

##  [1]  1  2  3  4  5  6  7  8  9 10  2  4  6  8 10 12 14 16 18 20

Sometimes you need parentheses:

n <- 10
1:n-1

##  [1] 0 1 2 3 4 5 6 7 8 9

1:(n-1)

## [1] 1 2 3 4 5 6 7 8 9
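The reason is that R evaluates the : operator before - and *. A short sketch that makes this explicit (seq_len is a base R function not used elsewhere in this text):

```r
n <- 10
a <- 1:n - 1      # same as (1:n) - 1: the sequence 1 to 10, then 1 is subtracted
b <- 1:(n - 1)    # the sequence 1 to 9
identical(a, (1:n) - 1)
seq_len(n - 1)    # another way to write 1:(n - 1)
```

When in doubt, add the parentheses; they never hurt.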

Commands for Vectors

To find out how many elements a vector has use the length command:

x <- c(1.4, 5.1, 2.0, 6.8, 3.5, 2.1, 5.6, 3.3, 6.9, 1.1)
length(x)

## [1] 10

The elements of a vector are accessed with the bracket [ ] notation:

x[3]

## [1] 2

x[1:3]

## [1] 1.4 5.1 2.0

x[c(1, 3, 8)]

## [1] 1.4 2.0 3.3

x[-3]

## [1] 1.4 5.1 6.8 3.5 2.1 5.6 3.3 6.9 1.1

x[-c(1, 2, 5)]

## [1] 2.0 6.8 2.1 5.6 3.3 6.9 1.1

This also works with logical conditions:

x[x>4]

## [1] 5.1 6.8 5.6 6.9

x[x<=5.1]

## [1] 1.4 5.1 2.0 3.5 2.1 3.3 1.1
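It can help to look at what the condition inside the brackets produces on its own. A brief sketch, using the same x as above together with the base R functions which and sum:

```r
x <- c(1.4, 5.1, 2.0, 6.8, 3.5, 2.1, 5.6, 3.3, 6.9, 1.1)
x > 4                # a vector of TRUE/FALSE, one entry per element
which(x > 4)         # positions of the elements greater than 4
x[x > 2 & x < 5]     # combine two conditions with & (AND)
sum(x > 4)           # count how many elements are greater than 4
```

The last line works because R counts TRUE as 1 and FALSE as 0.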

Instead of numbers a vector can also consist of characters (letters, numbers, symbols etc.). These are identified by quotes:

c("A", "B", 7, "%")

## [1] "A" "B" "7" "%"

A vector is either numeric or character, but never both (see how the 7 was changed to “7”).
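One can ask R directly what type a vector has, and convert text that looks like numbers back to numbers. This is a small sketch with the base R functions class and as.numeric; the names v and digits are chosen just for illustration:

```r
v <- c("A", "B", 7, "%")
class(v)                 # "character": the 7 was converted to "7"
class(c(1.5, 3.6))       # "numeric"

digits <- c("10", "2", "6")
as.numeric(digits) + 1   # convert to numbers before doing arithmetic
```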

Data Frames

Data frames are the basic format for data in R. They are essentially vectors put together as columns.

The main things you need to know about working with data frames are the following commands:

Case Study: UPR Admissions

Consider the upr data set. This is the application data for all the students who applied and were accepted to UPR-Mayaguez between 2003 and 2013.

dim(upr)

## [1] 23666    16

tells us that there were 23666 applications and that for each student there are 16 pieces of information.

colnames(upr)

##  [1] "ID.Code"        "Year"           "Gender"         "Program.Code"  
##  [5] "Highschool.GPA" "Aptitud.Verbal" "Aptitud.Matem"  "Aprov.Ingles"  
##  [9] "Aprov.Matem"    "Aprov.Espanol"  "IGS"            "Freshmen.GPA"  
## [13] "Graduated"      "Year.Grad."     "Grad..GPA"      "Class.Facultad"

shows us the variables

head(upr, 3)

##      ID.Code Year Gender Program.Code Highschool.GPA Aptitud.Verbal
## 1 00C2B4EF77 2005      M          502           3.97            647
## 2 00D66CF1BF 2003      M          502           3.80            597
## 3 00AB6118EB 2004      M         1203           4.00            567
##   Aptitud.Matem Aprov.Ingles Aprov.Matem Aprov.Espanol IGS Freshmen.GPA
## 1           621          626         672           551 342         3.67
## 2           726          618         718           575 343         2.75
## 3           691          424         616           609 342         3.62
##   Graduated Year.Grad. Grad..GPA Class.Facultad
## 1        Si       2012      3.33           INGE
## 2        No         NA        NA           INGE
## 3        No         NA        NA       CIENCIAS

shows us the first three cases.

Let’s say we want to find the number of males and females. We can use the table command for that:

table(Gender)

## Error: object 'Gender' not found

What happened? Right now R does not know what Gender is because it is “hidden” inside the upr data set. Think of upr as a box that is currently closed, so R can’t look inside and see the column names. We need to open the box first:

attach(upr)
table(Gender)

## Gender
##     F     M 
## 11487 12179

Note: you need to attach a data frame only once in each session working with R.

Note: Say you are working first with a data set “students 2016” which has a column called Gender, and you attached it. Later (but in the same R session) you start working with a data set “students 2017” which also has a column called Gender, and you attach this one as well. If you use Gender now it will be from “students 2017”.

Note: when data is transferred from Moodle with get.moodle.data(), it is automatically attached.
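If you would rather not attach a data frame (for example to avoid the mix-up described in the note above), base R offers the $ notation and the with function. The sketch below uses a small made-up data frame called toy, not the real upr data:

```r
# Made-up data for illustration; this is not the real upr data set.
toy <- data.frame(Gender = c("F", "M", "M", "F", "M"),
                  Highschool.GPA = c(3.9, 3.5, 3.8, 4.0, 3.2))

table(toy$Gender)                 # $ picks one column out of the data frame
with(toy, mean(Highschool.GPA))   # with() looks inside toy for this one command
```

Both lines name the data frame explicitly, so there is no doubt which Gender column is meant.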

Subsetting of Data Frames

Consider the following data frame (not a real data set):

students

##    Age GPA Gender
## 1   22 3.1   Male
## 2   23 3.2   Male
## 3   20 2.1   Male
## 4   22 2.1   Male
## 5   21 2.3 Female
## 6   21 2.9   Male
## 7   18 2.3 Female
## 8   22 3.9   Male
## 9   21 2.6 Female
## 10  18 3.2 Female

Here each single piece of data is identified by its row number and its column number. So for example in row 2, column 2 we have “3.2”, in row 6, column 3 we have “Male”.

As with the vectors before we can use the [ ] notation to access pieces of a data frame, but now we need to give it both the row and the column number, separated by a comma:

students[6, 3]

## [1] "Male"

As before we can pick more than one piece:

students[1:5, 3]

## [1] "Male"   "Male"   "Male"   "Male"   "Female"

students[1:5, 1:2]

##   Age GPA
## 1  22 3.1
## 2  23 3.2
## 3  20 2.1
## 4  22 2.1
## 5  21 2.3

students[-c(1:5), 3]

## [1] "Male"   "Female" "Male"   "Female" "Female"

students[1, ]

##   Age GPA Gender
## 1  22 3.1   Male

students[, 2]

##  [1] 3.1 3.2 2.1 2.1 2.3 2.9 2.3 3.9 2.6 3.2

students[, -3]

##    Age GPA
## 1   22 3.1
## 2   23 3.2
## 3   20 2.1
## 4   22 2.1
## 5   21 2.3
## 6   21 2.9
## 7   18 2.3
## 8   22 3.9
## 9   21 2.6
## 10  18 3.2
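Rows and columns can also be picked by name or by a condition instead of by number. Here is a sketch that first rebuilds the students data frame from the listing above, so it can be run on its own:

```r
students <- data.frame(
  Age = c(22, 23, 20, 22, 21, 21, 18, 22, 21, 18),
  GPA = c(3.1, 3.2, 2.1, 2.1, 2.3, 2.9, 2.3, 3.9, 2.6, 3.2),
  Gender = c("Male", "Male", "Male", "Male", "Female",
             "Male", "Female", "Male", "Female", "Female")
)

students[6, "Gender"]                          # column selected by name
students[students$GPA > 3, ]                   # rows where GPA is above 3
students[students$Gender == "Female", "GPA"]   # GPA of the female students
```

The condition goes in the row position (before the comma), because it selects cases, not variables.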

Vector Arithmetic

R allows us to apply any mathematical function to a whole vector:

x <- 1:10
2*x

##  [1]  2  4  6  8 10 12 14 16 18 20

x^2

##  [1]   1   4   9  16  25  36  49  64  81 100

log(x)

##  [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
##  [8] 2.0794415 2.1972246 2.3025851

sum(x)

## [1] 55

y <- 21:30

x+y

##  [1] 22 24 26 28 30 32 34 36 38 40

x^2+y^2

##  [1]  442  488  538  592  650  712  778  848  922 1000

mean(x+y)

## [1] 31
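Vector arithmetic makes it easy to write out a formula step by step. As an example, here is a sketch of the sample standard deviation computed “by hand” and compared with R's built-in sd function:

```r
x <- 1:10
n <- length(x)
deviations <- x - mean(x)                # subtract the mean from every element
s <- sqrt(sum(deviations^2) / (n - 1))   # sample standard deviation
s
all.equal(s, sd(x))                      # agrees with the built-in function
```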

Subsetting

One of the most common tasks in Statistics is to select a part of a data set for further analysis. There is even a name for this: data wrangling.

Case Study: New York Air Quality Measurements

Description: Daily measurements of air quality in New York, May to September 1973.

A data frame with 153 observations on 6 variables.

Ozone: Mean ozone in parts per billion from 1300 to 1500 hours at Roosevelt Island

Solar.R: Solar radiation in Langleys in the frequency band 4000-7700 Angstroms from 0800 to 1200 hours at Central Park

Wind: Average wind speed in miles per hour at 0700 and 1000 hours at LaGuardia Airport

Temp: Maximum daily temperature in degrees Fahrenheit at La Guardia Airport.

Source: The data were obtained from the New York State Department of Conservation (ozone data) and the National Weather Service (meteorological data).

head(airquality)

##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
## 6    28      NA 14.9   66     5   6

Data wrangling is so important that there are quite a lot of routines to help with it. One of them is isubset.

Say we want to analyze the data for the months of August and September only. Notice that in the variable Month those are 8 (=August) and 9 (=September).

Here is what you do:

airAugSept <- isubset(airquality)

The app lets you use up to three conditions. We just have one (Month \(\ge\) 8), so we can leave that alone. Now choose the condition and then hit “Click when ready to run”.

Here is a screenshot:

Now hit Close App and return to R.

In this example we used a very simple condition: Month \(\ge\) 8. These conditions can be much more complicated, using & (AND), | (OR) and ! (NOT).

Let’s say we want only those days in August and September with a temperature less than 80:

airAugSeptTemp80 <- isubset(airquality)

Finally let’s say we want only either those days in August and September with a Temperature less than 80, or days with Wind>10:
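For comparison, the three conditions described above can also be written directly as R commands with the base R function subset. This is a sketch of an alternative to the isubset app, not a description of what the app does internally. It uses the airquality data that comes with R, and the name airMixed is made up:

```r
# Days in August and September
airAugSept <- subset(airquality, Month >= 8)

# Days in August and September with a temperature below 80
airAugSeptTemp80 <- subset(airquality, Month >= 8 & Temp < 80)

# Either of the above, or any day with Wind > 10
airMixed <- subset(airquality, (Month >= 8 & Temp < 80) | Wind > 10)

nrow(airAugSept)   # 31 days in August plus 30 in September
```

The parentheses around the first part of the last condition make sure the AND is applied before the OR.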

There is another type of subsetting we sometimes need, namely comparing the values in one variable with those in another. Say we want to consider those cases in the upr data set where the Aptitud.Verbal score is lower than the Aptitud.Matem score. In this case choose Variable from the dropdown box instead of value:

Let’s get back to the days in August and September. What we want to do with those days is to find the mean Ozone level:

attach(airAugSept)
stat.table(Ozone)

## Warning:  6  missing values were removed!

##       Sample Size Mean Standard Deviation
## Ozone          55 44.9               35.2

OK!
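stat.table is a routine of this course. As a rough cross-check, the same sample size and mean can be found with base R only, again assuming the airquality data that comes with R:

```r
ozone <- airquality$Ozone[airquality$Month >= 8]
sum(is.na(ozone))            # number of missing values
sum(!is.na(ozone))           # sample size after removing them
mean(ozone, na.rm = TRUE)    # na.rm = TRUE drops the missing values first
```

Without na.rm = TRUE the mean would be NA, because R does not know what to do with the missing values.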

Case Study: Age and Gender in Puerto Rico in 2000

Breakdown of the population of USA and Puerto Rico by age and gender, according to the 2000 Census.

head(agesex)

##           Age  Male Female
## 1 Less than 1 29601  28442
## 2           1 29543  28130
## 3           2 30252  28881
## 4           3 30643  28867
## 5           4 31248  29799
## 6           5 31621  29696

tail(agesex)

##           Age Male Female
## 98         97  282    418
## 99         98  189    296
## 100        99  123    196
## 101 100 - 104  258    448
## 102 105 - 109   47     59
## 103  Over 110   17     27

shows us that the data set consists of three vectors: the ages, the number of males and the number of females. The first one is a character vector (“less than 1”) and the other two are numeric.

Let’s answer a few questions about the age and gender in PR in 2000:

What was the number of men and women in PR in 2000?

attach(agesex)
sum(Male)

## [1] 1833577

sum(Female)

## [1] 1975033

How many people were there in PR?

Simple:

sum(Male)+sum(Female)

## [1] 3808610

We will need the sum of the Male and Female counts a few more times, so maybe we should do it this way:

People <- Male + Female
head(People)

## [1] 58043 57673 59133 59510 61047 61317

sum(People)

## [1] 3808610

Note

we now have another variable called People among the data sets, as we can see with

ls()

It will stay there until we close R. If we want to keep it for the next time we use R we need to save everything with File > Save Workspace. If we want to save the workspace but not this variable we first have to

rm(People)

How many newborns were there?

People[1]

## [1] 58043

How many teenagers were there?

Teenagers (ages 13 to 19) are in rows 14 to 20, so

sum(People[14:20])

## [1] 433764

What percentage of the population was male, rounded to one digit after the decimal point?

sum(Male)/sum(People)*100

## [1] 48.14294

round(sum(Male)/sum(People)*100, 1)

## [1] 48.1

In how many age groups were there more males than females?

sum(Male > Female)

## [1] 21

What age group had the largest population?

max(People)

## [1] 64795

agesex[People==64795, ]

##    Age  Male Female
## 11  10 33188  31607

Note: == is the symbol for “is equal to”. The others are

\(<\) “is less than”
\(<=\) “is less than or equal to”
\(>\) “is greater than”
\(>=\) “is greater than or equal to”

So the age group of 10 year olds is the largest. Why is this answer a bit strange?
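Instead of reading off the maximum and typing the number 64795 by hand, one can let R find the row. The sketch below uses the base R function which.max on a few rows copied from the agesex listing, so it can be run on its own (the names ages and total are chosen for the example):

```r
# Four rows copied from the agesex data, for illustration only.
ages <- data.frame(Age = c("8", "9", "10", "11"),
                   Male = c(30827, 31798, 33188, 30807),
                   Female = c(29193, 30101, 31607, 29414))
total <- ages$Male + ages$Female

ages[which.max(total), ]      # row with the largest total
ages[total == max(total), ]   # same idea, without typing the number
```

Note that which.max returns only the first position if the maximum occurs more than once, while the second version returns all such rows.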

What percentage of the population was 100 years or older?

agesex

##             Age  Male Female
## 1   Less than 1 29601  28442
## 2             1 29543  28130
## 3             2 30252  28881
## 4             3 30643  28867
## 5             4 31248  29799
## 6             5 31621  29696
## 7             6 30907  29788
## 8             7 31100  30131
## 9             8 30827  29193
## 10            9 31798  30101
## 11           10 33188  31607
## 12           11 30807  29414
## 13           12 30678  29778
## 14           13 30665  29447
## 15           14 30646  29570
## 16           15 31117  29590
## 17           16 31203  30018
## 18           17 32735  31070
## 19           18 32216  31705
## 20           19 32038  31744
## 21           20 32441  32154
## 22           21 30281  30505
## 23           22 30011  30737
## 24           23 29019  29706
## 25           24 27674  28663
## 26           25 27468  28286
## 27           26 25803  26992
## 28           27 26233  27060
## 29           28 26584  27768
## 30           29 26930  28383
## 31           30 26242  27467
## 32           31 24645  26183
## 33           32 24338  26613
## 34           33 24883  27330
## 35           34 26056  29068
## 36           35 26107  28708
## 37           36 25259  28314
## 38           37 24637  28170
## 39           38 24051  27208
## 40           39 24367  28028
## 41           40 24547  28006
## 42           41 22809  26453
## 43           42 23286  26723
## 44           43 23184  26819
## 45           44 22452  26535
## 46           45 23028  26471
## 47           46 21353  24958
## 48           47 21199  24956
## 49           48 20888  24392
## 50           49 21268  24607
## 51           50 22201  25941
## 52           51 20794  24210
## 53           52 21500  25079
## 54           53 21249  24677
## 55           54 20347  23918
## 56           55 18879  21928
## 57           56 18064  21082
## 58           57 17756  20788
## 59           58 16681  19587
## 60           59 15751  18367
## 61           60 15750  18965
## 62           61 15179  17558
## 63           62 14901  17134
## 64           63 14284  16406
## 65           64 14162  16225
## 66           65 14023  16529
## 67           66 11793  14429
## 68           67 12358  14571
## 69           68 11462  14134
## 70           69 11346  13636
## 71           70  9936  11929
## 72           71 10161  12962
## 73           72  9600  11821
## 74           73  9169  11419
## 75           74  8595  11078
## 76           75  8471  10932
## 77           76  7544   9970
## 78           77  7174   9483
## 79           78  6663   8630
## 80           79  6144   8067
## 81           80  5831   7618
## 82           81  4982   6677
## 83           82  4368   6068
## 84           83  3849   5283
## 85           84  3667   5059
## 86           85  3482   4932
## 87           86  3011   4268
## 88           87  2560   3718
## 89           88  2116   3145
## 90           89  1802   2664
## 91           90  1381   2259
## 92           91  1034   1660
## 93           92   862   1423
## 94           93   673   1055
## 95           94   493    839
## 96           95   427    695
## 97           96   310    537
## 98           97   282    418
## 99           98   189    296
## 100          99   123    196
## 101   100 - 104   258    448
## 102   105 - 109    47     59
## 103    Over 110    17     27

These are rows 101 to 103, so

sum(People[101:103])/sum(People)*100

## [1] 0.02247539

Case Study: World Mortality Rates

Mortality rates and life expectancy of countries in 2017:

head(world.mortality.2017, 3)

##    Country Deaths Rate LifeExpectancy.Both LifeExpectancy.Males
## 1  Burundi    113 11.0                57.1                 55.1
## 2  Comoros      6  7.5                63.5                 61.8
## 3 Djibouti      8  8.4                62.3                 60.6
##   LifeExpectancy.Females InfantMortality UnderFive Prob15.60 Prob0.70
## 1                   59.1              73       123       295      550
## 2                   65.2              55        78       228      476
## 3                   63.9              53        83       252      475
##   PercUnder5 PercUnder5.25 Perc25.65 PercOver65
## 1         44            14        25         16
## 2         33            10        31         27
## 3         23            12        37         29

What are the numbers for Puerto Rico?

attach(world.mortality.2017)
world.mortality.2017[Country=="Puerto Rico", ]

##         Country Deaths Rate LifeExpectancy.Both LifeExpectancy.Males
## 179 Puerto Rico     29  7.9                79.7                 75.8
##     LifeExpectancy.Females InfantMortality UnderFive Prob15.60 Prob0.70
## 179                   83.6               6         7        96      199
##     PercUnder5 PercUnder5.25 Perc25.65 PercOver65
## 179          1             1        21         77

What countries had the shortest and the longest life expectancy for men?

range(LifeExpectancy.Males)

## [1] 49.6 81.1

world.mortality.2017[LifeExpectancy.Males==49.6, ]

##                     Country Deaths Rate LifeExpectancy.Both
## 24 Central African Republic     64   14                51.4
##    LifeExpectancy.Males LifeExpectancy.Females InfantMortality UnderFive
## 24                 49.6                   53.2              87       150
##    Prob15.60 Prob0.70 PercUnder5 PercUnder5.25 Perc25.65 PercOver65
## 24       417      655         36            14        31         19

world.mortality.2017[LifeExpectancy.Males==81.1, ]

##         Country Deaths Rate LifeExpectancy.Both LifeExpectancy.Males
## 137     Iceland      2  6.5                82.6                 81.1
## 164 Switzerland     67  8.0                83.1                 81.1
##     LifeExpectancy.Females InfantMortality UnderFive Prob15.60 Prob0.70
## 137                   84.1               2         2        50      131
## 164                   85.1               4         4        50      127
##     PercUnder5 PercUnder5.25 Perc25.65 PercOver65
## 137          0             1        15         84
## 164          1             0        13         86

In how many countries is the life expectancy of men higher than that of women?

sum(LifeExpectancy.Males>LifeExpectancy.Females)

## [1] 0

What 5 countries have the lowest infant mortality rates?

sort(InfantMortality)

##   [1]  2  2  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3  3  3  3  3
##  [24]  3  3  3  3  4  4  4  4  4  4  4  4  4  4  4  4  4  4  5  5  5  5  6
##  [47]  6  6  6  6  6  6  6  6  7  7  7  7  7  8  8  8  8  8  8  8  9  9  9
##  [70]  9  9  9  9  9  9  9  9  9 10 10 10 10 11 11 11 12 12 12 12 12 13 13
##  [93] 13 13 13 14 14 14 14 14 14 15 16 16 16 16 16 17 17 17 17 17 17 18 18
## [116] 18 18 19 20 20 20 21 21 23 23 23 23 24 25 25 25 26 26 27 27 27 28 29
## [139] 29 29 30 30 31 31 32 32 33 33 33 36 37 38 38 39 39 39 40 40 41 41 42
## [162] 42 43 43 43 43 43 44 45 45 45 46 47 47 49 50 53 53 53 54 55 57 59 59
## [185] 60 61 61 62 63 64 64 65 65 66 67 69 70 72 72 73 74 75 86 86 87

Country[InfantMortality==2]

##  [1] "China, Hong Kong SAR" "Japan"                "Singapore"           
##  [4] "Czechia"              "Finland"              "Iceland"             
##  [7] "Norway"               "Sweden"               "Portugal"            
## [10] "Slovenia"
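Another way to find the cases with the smallest values is the base R function order, which returns the row numbers sorted from lowest to highest. The sketch below uses only seven countries, with infant mortality values copied from the outputs above, so the result is for this small table and not for the full data set:

```r
# Values copied from the outputs above; only seven countries, for illustration.
mort <- data.frame(
  Country = c("Burundi", "Comoros", "Djibouti", "Puerto Rico",
              "Central African Republic", "Iceland", "Switzerland"),
  InfantMortality = c(73, 55, 53, 6, 87, 2, 4)
)

idx <- order(mort$InfantMortality)   # row numbers from lowest to highest rate
mort[idx[1:3], ]                     # the three lowest in this small table
```

With ties, as in the full data set where ten countries share the lowest rate, taking the first few rows picks some of the tied countries and leaves out others, so it is worth checking the sorted values as done above.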

Case Study: Age and Sex in US, by State

We saw before that in PR 0.0225 percent of the population is 100 years old or older. How does that compare to the other states?

The data set agesexUS has the numbers for all states in 2000. Its columns are

attach(agesexUS)
colnames(agesexUS)

##   [1] "State"          "Total"          "M.Total"        "M.less.than.1" 
##   [5] "M..1"           "M..2"           "M..3"           "M..4"          
##   [9] "M..5"           "M..6"           "M..7"           "M..8"          
##  [13] "M..9"           "M..10"          "M..11"          "M..12"         
##  [17] "M..13"          "M..14"          "M..15"          "M..16"         
##  [21] "M..17"          "M..18"          "M..19"          "M..20"         
##  [25] "M..21"          "M..22"          "M..23"          "M..24"         
##  [29] "M..25"          "M..26"          "M..27"          "M..28"         
##  [33] "M..29"          "M..30"          "M..31"          "M..32"         
##  [37] "M..33"          "M..34"          "M..35"          "M..36"         
##  [41] "M..37"          "M..38"          "M..39"          "M..40"         
##  [45] "M..41"          "M..42"          "M..43"          "M..44"         
##  [49] "M..45"          "M..46"          "M..47"          "M..48"         
##  [53] "M..49"          "M..50"          "M..51"          "M..52"         
##  [57] "M..53"          "M..54"          "M..55"          "M..56"         
##  [61] "M..57"          "M..58"          "M..59"          "M..60"         
##  [65] "M..61"          "M..62"          "M..63"          "M..64"         
##  [69] "M..65"          "M..66"          "M..67"          "M..68"         
##  [73] "M..69"          "M..70"          "M..71"          "M..72"         
##  [77] "M..73"          "M..74"          "M..75"          "M..76"         
##  [81] "M..77"          "M..78"          "M..79"          "M..80"         
##  [85] "M..81"          "M..82"          "M..83"          "M..84"         
##  [89] "M..85"          "M..86"          "M..87"          "M..88"         
##  [93] "M..89"          "M..90"          "M..91"          "M..92"         
##  [97] "M..93"          "M..94"          "M..95"          "M..96"         
## [101] "M..97"          "M..98"          "M..99"          "M.100.104"     
## [105] "M.105.109"      "M.110.and.over" "F.Total"        "F.less.than.1" 
## [109] "F..1"           "F..2"           "F..3"           "F..4"          
## [113] "F..5"           "F..6"           "F..7"           "F..8"          
## [117] "F..9"           "F..10"          "F..11"          "F..12"         
## [121] "F..13"          "F..14"          "F..15"          "F..16"         
## [125] "F..17"          "F..18"          "F..19"          "F..20"         
## [129] "F..21"          "F..22"          "F..23"          "F..24"         
## [133] "F..25"          "F..26"          "F..27"          "F..28"         
## [137] "F..29"          "F..30"          "F..31"          "F..32"         
## [141] "F..33"          "F..34"          "F..35"          "F..36"         
## [145] "F..37"          "F..38"          "F..39"          "F..40"         
## [149] "F..41"          "F..42"          "F..43"          "F..44"         
## [153] "F..45"          "F..46"          "F..47"          "F..48"         
## [157] "F..49"          "F..50"          "F..51"          "F..52"         
## [161] "F..53"          "F..54"          "F..55"          "F..56"         
## [165] "F..57"          "F..58"          "F..59"          "F..60"         
## [169] "F..61"          "F..62"          "F..63"          "F..64"         
## [173] "F..65"          "F..66"          "F..67"          "F..68"         
## [177] "F..69"          "F..70"          "F..71"          "F..72"         
## [181] "F..73"          "F..74"          "F..75"          "F..76"         
## [185] "F..77"          "F..78"          "F..79"          "F..80"         
## [189] "F..81"          "F..82"          "F..83"          "F..84"         
## [193] "F..85"          "F..86"          "F..87"          "F..88"         
## [197] "F..89"          "F..90"          "F..91"          "F..92"         
## [201] "F..93"          "F..94"          "F..95"          "F..96"         
## [205] "F..97"          "F..98"          "F..99"          "F.100.104"     
## [209] "F.105.109"      "F.110.and.over"

so we see there is a column called Total in column 2, which gives us the population for each state. The columns with the people 100 and older are columns number 104, 105 and 106 (males) and 208, 209 and 210 (females), so

old <- agesexUS[, 104] + agesexUS[, 105] +
       agesexUS[, 106] + agesexUS[, 208] +
       agesexUS[, 209] + agesexUS[, 210]
percentage <- old/Total*100
names(percentage) <- State
sum(percentage<0.0225)

## [1] 40

so 40 of the 50 states have a lower percentage.
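Counting column numbers by hand is error-prone. The same calculation can be written with column names and the base R function rowSums. The sketch below uses made-up counts for three hypothetical states (the data frame toyUS is not the real agesexUS data), but the column names are those shown above:

```r
# Made-up counts for three hypothetical states, using the column names above.
toyUS <- data.frame(
  State = c("A", "B", "C"),
  Total = c(100000, 200000, 50000),
  M.100.104 = c(5, 8, 1), M.105.109 = c(1, 0, 0), M.110.and.over = c(0, 1, 0),
  F.100.104 = c(9, 20, 4), F.105.109 = c(2, 3, 0), F.110.and.over = c(1, 0, 0)
)

old_cols <- c("M.100.104", "M.105.109", "M.110.and.over",
              "F.100.104", "F.105.109", "F.110.and.over")
old <- rowSums(toyUS[, old_cols])        # add the six columns row by row
percentage <- old / toyUS$Total * 100
names(percentage) <- toyUS$State
percentage
```

Selecting by name keeps working even if the order of the columns in the data set changes.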