Everything in R is an object. All objects have two intrinsic attributes: mode and length. The mode is the basic type of the elements of the object. There are four main modes: numeric, character, complex, and logical (FALSE or TRUE).

Other modes exist but they do not represent data, for instance function or expression. The length is the number of elements of the object. To display the mode and the length of an object use the functions mode and length, respectively:

x <- 1; mode(x)
## [1] "numeric"
y <- "A"; mode(y)
## [1] "character"
z <- TRUE; mode(z)
## [1] "logical"
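The examples above only show mode. As a small supplementary sketch (the object names v and s are made up for illustration), length can be inspected the same way. Note that a single character string has length 1, however many characters it contains:

```r
v <- c(3, 5, 6)      # a numeric vector with three elements
length(v)            # 3
mode(v)              # "numeric"

s <- "hello"         # one character string
length(s)            # 1: the object holds a single string
nchar(s)             # 5: the number of characters in that string

mode(mean)           # "function": a mode that does not represent data
```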

R is quite different from most computer languages in that it often tries to figure out what you might want to do, even if it is not obvious. For example R can handle some strange calculations:

1/0
## [1] Inf
0/0
## [1] NaN

Here Inf stands for infinity and NaN stands for "not a number". Both can be used in further calculations:

exp(-Inf)
## [1] 0
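A few more calculations of this kind, as an illustrative sketch. The functions is.finite and is.nan are part of base R and test for these special values:

```r
1 / Inf            # 0
Inf + 1            # Inf
Inf - Inf          # NaN: the result is undefined
is.finite(1 / 0)   # FALSE
is.nan(0 / 0)      # TRUE
is.nan(1 / 0)      # FALSE: Inf is a number, just not a finite one
```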

Numeric values come in two forms, integer and double. By default R stores a number as a double; if you want to make sure an object is an integer, append L to the number:

n <- 2L
is.integer(n)
## [1] TRUE

Years ago this was very useful because an integer takes half the storage space of a double (4 bytes instead of 8). These days, with gigabyte-sized memory, it is rarely needed.
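The difference between the two forms can be checked with typeof, and the storage needed with object.size. This is only a sketch; the names n_int and n_dbl are invented here, and the exact byte counts reported may vary slightly between R versions:

```r
typeof(2)            # "double": plain numbers are stored as doubles
typeof(2L)           # "integer"
is.integer(2)        # FALSE, even though the value is a whole number
mode(2L)             # "numeric": both forms share the same mode

# Compare the memory used by one million integers and one million doubles
n_int <- integer(1e6)
n_dbl <- numeric(1e6)
object.size(n_int)
object.size(n_dbl)   # roughly twice the size of n_int
```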

R can also handle complex numbers:

z <- 1i
u <- 1+1i
v <- 1-1i
z^2
## [1] -1+0i
u+v
## [1] 2+0i
u*v
## [1] 2+0i

The real and the imaginary parts are extracted with

Re(v)
## [1] 1
Im(v)
## [1] -1

Two other standard functions for complex numbers are Conj, the complex conjugate, and Mod, the modulus:

\[ \begin{aligned} &z = x+iy\\ &\bar{z} = x-iy\\ \end{aligned} \]

v
## [1] 1-1i
Conj(v)
## [1] 1+1i

\[ \begin{aligned} &z = x+iy\\ &\text{Modulus} = \sqrt{x^2+y^2}\\ \end{aligned} \]

Mod(1+1i)
## [1] 1.414214
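The two formulas are connected: a number times its conjugate equals the squared modulus. The following sketch checks this numerically, using all.equal because floating point results need not match exactly. It also shows that a negative number has to be made complex before taking its square root (sqrt(-1) on its own returns NaN with a warning):

```r
v <- 1 - 1i
v * Conj(v)                            # 2+0i
all.equal(Mod(v)^2, Re(v * Conj(v)))   # TRUE
Arg(1i)                                # 1.570796, that is pi/2
sqrt(as.complex(-1))                   # 0+1i
```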

Objects of type character are identified with quotes:

y <- "A"

Sometimes you want the quote character " itself to be part of the string. This can be done with the escape character \:

"color=\"red\""
## [1] "color=\"red\""
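The backslashes in this output are only part of how R prints the string, not part of the string itself. A short sketch (s and s2 are illustrative names) makes this visible with cat and nchar, and shows single quotes as an alternative delimiter:

```r
s <- "color=\"red\""
print(s)            # shows the escaped form, as above
cat(s, "\n")        # shows the actual characters: color="red"
nchar(s)            # 11: the backslashes are not counted

# Single quotes as delimiters avoid the escapes altogether
s2 <- 'color="red"'
identical(s, s2)    # TRUE
```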

Vectors

The basic data unit of R is a vector. One can create a vector with the combine command c:

x <- c(3, 5, 6, 3, 4, 5)
x
## [1] 3 5 6 3 4 5

If you want a vector of characters, again use quotes:

x <- c("A", "A", "B", "C")
x
## [1] "A" "A" "B" "C"

For logical values:

x <- c(FALSE, FALSE, TRUE)
x
## [1] FALSE FALSE  TRUE

Note that there are no quotes: "FALSE" would be the word FALSE, not the logical value.

Note: this also works:

x <- c(F, F, T)
x
## [1] FALSE FALSE  TRUE

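but I recommend writing FALSE and TRUE, because F and T are ordinary variables that are sometimes used for other things (for example F = Female), whereas TRUE and FALSE are reserved words that cannot be overwritten.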
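To see why the abbreviations are risky, here is a minimal sketch. Assigning to T creates a new variable that hides the built-in one until it is removed again:

```r
T <- 0            # allowed: T is just a variable that initially holds TRUE
isTRUE(T)         # FALSE now
rm(T)             # remove our copy; T refers to the built-in value again
isTRUE(T)         # TRUE

# TRUE <- 0       # would stop with an error: TRUE is a reserved word
```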

The symbol R uses for missing values is NA (not available). Again, no quotes:

x <- c(3, 5, NA, 3, 4, 5)
x
## [1]  3  5 NA  3  4  5
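Missing values carry through calculations, so most summaries of such a vector are NA as well unless the missing values are removed. A brief sketch using the base R function is.na and the na.rm argument of mean:

```r
x <- c(3, 5, NA, 3, 4, 5)
is.na(x)                  # FALSE FALSE  TRUE FALSE FALSE FALSE
sum(is.na(x))             # 1: the number of missing values
mean(x)                   # NA: the mean is unknown if one value is unknown
mean(x, na.rm = TRUE)     # 4: the mean of the five observed values
```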

Sometimes you want to create an object without any value:

x <- NULL
x
## NULL
c(x, 1)
## [1] 1

Note that the NULL is gone! This is useful when we are building up a vector (maybe inside a function) but don’t know ahead of time how large it will be.
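As an illustration of this pattern, here is a hypothetical helper (even_squares is a name made up for this sketch) that collects only those squares that are even, so the length of the result is not known in advance:

```r
# Hypothetical helper: collect the squares of 1, ..., n that are even
even_squares <- function(n) {
  out <- NULL                               # start with an empty object
  for (i in 1:n) {
    if (i^2 %% 2 == 0) out <- c(out, i^2)   # append only when the square is even
  }
  out
}

even_squares(6)   # 4 16 36
length(NULL)      # 0: NULL has no elements
```

For long vectors, growing an object one element at a time is slow, so this is best kept for small jobs.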


Finally, dates and times are always tricky:

Sys.time()
## [1] "2018-08-24 13:50:06 -04"
Sys.Date()
## [1] "2018-08-24"

Type Conversion

Consider the following:

x <- c(3, 5, 6, 3, "A", 5)
x
## [1] "3" "5" "6" "3" "A" "5"

In this case the vector is a mixture of numeric and character. But R vectors can never be such a mixture, so R (by itself!) decides to make it a character vector. This is called type conversion (R's own documentation calls it coercion), and R does a lot of this, usually in a good way.
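The conversion follows a fixed order: when types are mixed, c converts everything to the most flexible type present, going from logical to integer to double to complex to character. A short sketch with typeof, which reports the storage type of the result:

```r
typeof(c(2L, 3.5))    # "double": the integer is converted to double
typeof(c(3.5, 1i))    # "complex"
typeof(c(1i, "A"))    # "character": anything can be written as a string
```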

There are a number of routines that

  1. test for a data type
  2. convert to a data type

Their names start with is. or with as., respectively:

x <- c(3, 5, NA, 3, 4, 5)
is.numeric(x)
## [1] TRUE
y <- c(2, 1, 5, 2)
y
## [1] 2 1 5 2
x <- as.character(y)
x
## [1] "2" "1" "5" "2"
as.numeric(x)
## [1] 2 1 5 2
x <- c("2", "1", "#", "2", "A")
as.numeric(x)
## [1]  2  1 NA  2 NA
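The last conversion also produces a warning ("NAs introduced by coercion") that is not shown in the output above. One way to find out which entries could not be converted, sketched here with an illustrative variable y, is to combine as.numeric with is.na:

```r
x <- c("2", "1", "#", "2", "A")
y <- suppressWarnings(as.numeric(x))   # silence the coercion warning
is.na(y)                               # FALSE FALSE  TRUE FALSE  TRUE
x[is.na(y)]                            # "#" "A": the entries that failed
```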

Consider this:

x <- c(1, 2, 5, FALSE, 4, TRUE)
x
## [1] 1 2 5 0 4 1
as.character(x)
## [1] "1" "2" "5" "0" "4" "1"

so FALSE gets turned into 0, TRUE into 1.
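This conversion is often put to good use: because TRUE counts as 1 and FALSE as 0, summing a logical vector counts how often a condition holds, and its mean gives the proportion. A small sketch:

```r
x <- c(3, 5, 6, 3, 4, 5)
x > 4          # FALSE  TRUE  TRUE FALSE FALSE  TRUE
sum(x > 4)     # 3: the number of elements larger than 4
mean(x > 4)    # 0.5: the proportion of elements larger than 4
```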

But also

x <- c("1", "2" ,"5", FALSE, "4", TRUE)
x
## [1] "1"     "2"     "5"     "FALSE" "4"     "TRUE"
as.numeric(x)
## [1]  1  2  5 NA  4 NA

Here FALSE is first turned into the character string "FALSE". as.numeric cannot turn that string into a number, so the result is NA.

R has almost 100 is. and as. functions built in!
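To see which ones exist in your own installation, the functions of the base package can be listed by name pattern. This is a sketch only; the counts depend on the R version, and the lists include methods for specific classes, so no numbers are given here:

```r
# Names in the base package that start with "is." or "as."
is_funs <- ls("package:base", pattern = "^is\\.")
as_funs <- ls("package:base", pattern = "^as\\.")

head(is_funs)      # the first few test functions
length(is_funs)    # how many is. functions there are
length(as_funs)    # how many as. functions there are
```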

Exercise

Before running these in R, try and think about the answer:

What is the result of

c(1, FALSE)
c("A", FALSE)
c(1L, FALSE)
-1 < FALSE
1 == "1"

Dates

The default format is yyyy-mm-dd:

mydates <- as.Date(c("2018-01-01", "2018-06-13"))
mydates
## [1] "2018-01-01" "2018-06-13"
mydates[2]-mydates[1]
## Time difference of 163 days
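Dates written in another format can be read by passing a format string to as.Date, and format turns a date back into text. The sketch below uses the standard codes %d (day), %m (month) and %Y (four-digit year); the variable d is just an example:

```r
d <- as.Date("06/13/2018", format = "%m/%d/%Y")   # US-style month/day/year
d                                        # "2018-06-13"
format(d, "%d.%m.%Y")                    # "13.06.2018"
d + 30                                   # "2018-07-13": adding a number adds days
as.numeric(d - as.Date("2018-01-01"))    # 163: the difference as a plain number
```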

Factor

A very common data type in statistics is the factor. A factor is a vector with a fixed set of possible values (called levels) and possibly an ordering.

Here is an example of their usage. Say we have a list of students, identified by their year:

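students <- c("Junior", "Sophomore", "Sophomore", "Senior", "Junior",
  "Senior", "Senior", "Sophomore", "Sophomore", "Junior")
students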
##  [1] "Junior"    "Sophomore" "Sophomore" "Senior"    "Junior"   
##  [6] "Senior"    "Senior"    "Sophomore" "Sophomore" "Junior"

Let’s count how many of each we have:

table(students)
## students
##    Junior    Senior Sophomore 
##         3         3         4

There are two problems with this table:

  1. the ordering is wrong
  2. the Freshman class is missing.

We can fix both of these by turning the vector into a factor:

students.fac <- factor(students, 
  levels=c("Freshman", "Sophomore", "Junior", "Senior"),
  ordered=TRUE)
table(students.fac)
## students.fac
##  Freshman Sophomore    Junior    Senior 
##         0         4         3         3

Here is another difference:

c(students, "Senior")
##  [1] "Junior"    "Sophomore" "Sophomore" "Senior"    "Junior"   
##  [6] "Senior"    "Senior"    "Sophomore" "Sophomore" "Junior"   
## [11] "Senior"
c(students, "Graduate")
##  [1] "Junior"    "Sophomore" "Sophomore" "Senior"    "Junior"   
##  [6] "Senior"    "Senior"    "Sophomore" "Sophomore" "Junior"   
## [11] "Graduate"
c(students.fac, "Senior")
##  [1] "3"      "2"      "2"      "4"      "3"      "4"      "4"     
##  [8] "2"      "2"      "3"      "Senior"
c(students.fac, "Graduate")
##  [1] "3"        "2"        "2"        "4"        "3"        "4"       
##  [7] "4"        "2"        "2"        "3"        "Graduate"

So we can easily add an element to students, but if we do the same with students.fac it gets very confused! Strangely enough, it doesn’t even work when the added item is in the list of levels!

Notice also that with c(students.fac, "Senior") we don’t get the list of students but a list of numbers (as characters) plus "Senior". This is because internally R stores factors as integers, and when we add "Senior" these get converted to character.

Here is what you can do:

lvls <- levels(students.fac)
x <- factor(c(as.character(students.fac), "Senior", "Graduate"),
     levels=c(lvls, "Graduate"), ordered=TRUE)
x
##  [1] Junior    Sophomore Sophomore Senior    Junior    Senior    Senior   
##  [8] Sophomore Sophomore Junior    Senior    Graduate 
## Levels: Freshman < Sophomore < Junior < Senior < Graduate
table(x)
## x
##  Freshman Sophomore    Junior    Senior  Graduate 
##         0         4         3         4         1
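The integer storage mentioned above can be looked at directly. The following sketch uses a small made-up factor (sizes, with invented levels) rather than the student data. as.integer returns the internal codes, as.character the labels, and for an ordered factor the comparison operators respect the order of the levels:

```r
sizes <- factor(c("low", "high", "medium", "low"),
                levels = c("low", "medium", "high"), ordered = TRUE)

as.integer(sizes)      # 1 3 2 1: positions in the list of levels
levels(sizes)          # "low" "medium" "high"
as.character(sizes)    # "low" "high" "medium" "low": the original labels
sizes < "high"         # TRUE FALSE  TRUE  TRUE: uses the level order
```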

Exercise

ltrs <-  c("a", "b", "b", "c", "c", "c")
f1 <- factor(ltrs)
f1
## [1] a b b c c c
## Levels: a b c
levels(f1) <- c("c", "b", "a")

What does f1 now look like?