Everything in R is an object. All objects have two intrinsic attributes: mode and length. The mode is the basic type of the elements of the object. There are four main modes: numeric, character, complex, and logical (FALSE or TRUE).
Other modes exist but they do not represent data, for instance function or expression. The length is the number of elements of the object. To display the mode and the length of an object use the functions mode and length, respectively:
x <- 1; mode(x)
## [1] "numeric"
y <- "A"; mode(y)
## [1] "character"
z <- TRUE; mode(z)
## [1] "logical"
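The examples above only call mode. As a small sketch, the following applies both mode and length to a few objects; the variable names v and s are only illustrative.

```r
# A vector with five elements: mode is the type of its elements,
# length is the number of elements
v <- c(10, 20, 30, 40, 50)
mode(v)      # "numeric"
length(v)    # 5

# A single character string is a vector of length 1,
# no matter how many letters it contains
s <- "hello"
mode(s)      # "character"
length(s)    # 1

# Functions are objects too, with a mode that does not represent data
mode(sum)    # "function"
```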
R differs from many computer languages in that it often tries to figure out what you might want to do, even if it is not obvious. For example, R can handle some strange calculations:
1/0
## [1] Inf
0/0
## [1] NaN
Here Inf stands for infinity, and NaN stands for "not a number". Both can be used in calculations:
exp(-Inf)
## [1] 0
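To tell these special values apart in practice, R provides the test functions is.finite and is.nan. A short sketch, with an illustrative vector x:

```r
x <- c(1/0, -1/0, 0/0, 5)
is.finite(x)   # only the last element is finite
is.nan(x)      # only the third element is NaN
Inf - Inf      # NaN: the difference of two infinities is undefined
Inf > 1e308    # TRUE: Inf is larger than any finite number
```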
Numeric comes in two forms, integer and double. To make sure an object is stored as an integer, add the suffix L:
n <- 2L
is.integer(n)
## [1] TRUE
Years ago this was very useful because an integer takes half the storage space of a double (4 bytes instead of 8). These days, with gigabytes of memory, it is rarely needed.
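The difference between the two forms can be inspected with typeof and object.size. The sketch below is only an illustration; the exact byte counts reported by object.size depend on the R build, so only the ratio is computed.

```r
n <- 2L        # the L suffix requests an integer
d <- 2         # without it, R stores a double
typeof(n)      # "integer"
typeof(d)      # "double"
is.integer(d)  # FALSE, even though the value is a whole number

# Arithmetic mixing integer and double gives a double
typeof(n + d)  # "double"

# Compare the memory used by one million integers and one million doubles
int_size <- object.size(integer(1e6))
dbl_size <- object.size(numeric(1e6))
ratio <- as.numeric(dbl_size) / as.numeric(int_size)
ratio          # close to 2
```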
R can also handle complex numbers:
z <- 1i
u <- 1+1i
v <- 1-1i
z^2
## [1] -1+0i
u+v
## [1] 2+0i
u*v
## [1] 2+0i
The real and the imaginary parts are extracted with
Re(v)
## [1] 1
Im(v)
## [1] -1
Two other standard functions for complex numbers are Conj, the complex conjugate, and Mod, the modulus:
\[ \begin{aligned} &z = x+iy\\ &\bar{z} = x-iy\\ \end{aligned} \]
v
## [1] 1-1i
Conj(v)
## [1] 1+1i
\[ \begin{aligned} &z = x+iy\\ &\text{Modulus} = \sqrt{x^2+y^2}\\ \end{aligned} \]
Mod(1+1i)
## [1] 1.414214
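The conjugate and the modulus are linked by the identity z times its conjugate equals the squared modulus. A quick numerical check, using an illustrative value of z:

```r
z <- 3 + 4i
Mod(z)         # 5, since sqrt(3^2 + 4^2) = 5
z * Conj(z)    # the squared modulus, returned as a complex number

# all.equal compares numbers up to a small rounding tolerance
all.equal(Re(z * Conj(z)), Mod(z)^2)
```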
Objects of type character are identified with quotes:
y <- "A"
Sometimes you want the " itself to be treated as a character. This can be done with the escape character \:
"color=\"red\""
## [1] "color=\"red\""
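Note that the backslashes appear only in the printed representation. As a sketch, writeLines and nchar show what the string actually contains; s and s2 are illustrative names.

```r
s <- "color=\"red\""
print(s)        # shows the escaped form, with backslashes
writeLines(s)   # shows the actual characters: color="red"
nchar(s)        # 11, the backslashes are not part of the string

# Single quotes around the string avoid the need for escaping
s2 <- 'color="red"'
identical(s, s2)   # TRUE
```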
The basic data unit of R is a vector. One can create a vector with the function c (short for combine):
x <- c(3, 5, 6, 3, 4, 5)
x
## [1] 3 5 6 3 4 5
If you want a vector of characters again use quotes:
x <- c("A", "A", "B", "C")
x
## [1] "A" "A" "B" "C"
For logical values:
x <- c(FALSE, FALSE, TRUE)
x
## [1] FALSE FALSE TRUE
Note that there are no quotes. "FALSE" would be the word FALSE, not the logical value.
Note: this also works:
x <- c(F, F, T)
x
## [1] FALSE FALSE TRUE
but I recommend writing FALSE and TRUE, because F and T are ordinary variables that are sometimes used for other things (F=Female), whereas FALSE and TRUE are reserved words.
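The following sketch illustrates the risk: T can be overwritten like any other variable, whereas an attempt to assign to TRUE fails. The eval(parse(...)) wrapper is used here only so that the error can be caught and the script keeps running.

```r
T <- 0            # allowed: T is just a variable
isTRUE(T)         # FALSE, T no longer means TRUE
rm(T)             # remove our variable so that T refers to TRUE again
isTRUE(T)         # TRUE

# TRUE itself is a reserved word and cannot be reassigned;
# try() catches the resulting error
res <- try(eval(parse(text = "TRUE <- 0")), silent = TRUE)
inherits(res, "try-error")   # TRUE
```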
The symbol R uses for missing values is NA (not available). Again, no quotes:
x <- c(3, 5, NA, 3, 4, 5)
x
## [1] 3 5 NA 3 4 5
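Missing values propagate through most calculations, so it helps to know how to detect and skip them. A minimal sketch using the vector above:

```r
x <- c(3, 5, NA, 3, 4, 5)
is.na(x)                # TRUE only in the third position
sum(is.na(x))           # 1 missing value
mean(x)                 # NA, because one value is unknown
mean(x, na.rm = TRUE)   # 4, the mean of the five observed values
x == NA                 # all NA: use is.na(), not ==, to test for NA
```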
Sometimes you want to create an object without any value:
x <- NULL
x
## NULL
c(x, 1)
## [1] 1
Note that the NULL is gone! This is useful when we are building up a vector (maybe inside a function), but we don't know ahead of time how large it will be.
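As a sketch of this pattern, the hypothetical function divisors below (not part of R) starts from NULL and appends each divisor of n as it is found, since the number of divisors is not known in advance.

```r
# Hypothetical helper: collect all divisors of n
divisors <- function(n) {
  out <- NULL                           # start with an empty object
  for (k in seq_len(n)) {
    if (n %% k == 0) out <- c(out, k)   # append k to what we have so far
  }
  out
}
divisors(12)   # 1 2 3 4 6 12
length(NULL)   # 0: NULL has no elements
```

Growing a vector with c is convenient for short vectors. For very long ones it can be slow, because each call to c creates a new copy.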
Finally, dates and times are always tricky:
Sys.time()
## [1] "2018-08-24 13:50:06 -04"
Sys.Date()
## [1] "2018-08-24"
Consider the following:
x <- c(3, 5, 6, 3, "A", 5)
x
## [1] "3" "5" "6" "3" "A" "5"
In this case the vector is a mixture of numeric and character. But R vectors can never be such a mixture, so R (by itself!) decides to make it a character vector. This is called type conversion (or coercion), and R does a lot of this, usually in a good way.
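When c combines different types, R converts everything to the most general type present, in the order logical, integer, double, complex, character. The sketch below uses typeof to show the result for a few illustrative combinations:

```r
typeof(c(TRUE, 2L))    # "integer"
typeof(c(2L, 3.5))     # "double"
typeof(c(3.5, 1i))     # "complex"
typeof(c(1i, "a"))     # "character"
```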
There are a number of routines that either test whether an object is of a certain type or convert it to another type. Their names start with is. or with as., respectively:
x <- c(3, 5, NA, 3, 4, 5)
is.numeric(x)
## [1] TRUE
y <- c(2, 1, 5, 2)
y
## [1] 2 1 5 2
x <- as.character(y)
x
## [1] "2" "1" "5" "2"
as.numeric(x)
## [1] 2 1 5 2
x <- c("2", "1", "#", "2", "A")
as.numeric(x)
## [1] 2 1 NA 2 NA
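When as.numeric cannot interpret an entry it returns NA and issues a warning. One possible way to find the offending entries is sketched below; the names num and bad are illustrative.

```r
x <- c("2", "1", "#", "2", "A")
num <- suppressWarnings(as.numeric(x))   # silence the coercion warning
bad <- is.na(num) & !is.na(x)            # entries that failed to convert
x[bad]                                   # "#" "A"
which(bad)                               # 3 5
```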
Consider this:
x <- c(1, 2, 5, FALSE, 4, TRUE)
x
## [1] 1 2 5 0 4 1
as.character(x)
## [1] "1" "2" "5" "0" "4" "1"
So FALSE gets turned into 0, and TRUE into 1.
But also
x <- c("1", "2" ,"5", FALSE, "4", TRUE)
x
## [1] "1" "2" "5" "FALSE" "4" "TRUE"
as.numeric(x)
## [1] 1 2 5 NA 4 NA
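Here FALSE and TRUE are first turned into the character strings "FALSE" and "TRUE" when the vector is created. as.numeric cannot interpret these strings as numbers, so they become NA.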
R has almost 100 is. and as. functions built in!
Exercise
Before running these in R, try and think about the answer:
What is the result of
c(1, FALSE)
c("A", FALSE)
c(1L, FALSE)
-1 < FALSE
1 == "1"
Returning to dates: the default format for a date is yyyy-mm-dd:
mydates <- as.Date(c("2018-01-01", "2018-06-13"))
mydates
## [1] "2018-01-01" "2018-06-13"
mydates[2]-mydates[1]
## Time difference of 163 days
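Dates written in another layout can be read by passing a format string to as.Date. The sketch below uses an illustrative US-style date; the month name produced by %B depends on the locale, so no output is shown for that line.

```r
# %m = month, %d = day, %Y = four-digit year
d <- as.Date("06/13/2018", format = "%m/%d/%Y")
d                        # printed in the default yyyy-mm-dd form
format(d, "%d %B %Y")    # day, full month name, year

# A difference of dates can be turned into a plain number of days
days <- as.numeric(d - as.Date("2018-01-01"))
days                     # 163

d + 30                   # adding a number shifts the date by that many days
```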
A very common data type in Statistics is a factor. These are vectors with a fixed number of different values (called levels) and possibly an ordering.
Here is an example of their usage. Say we have a list of students, identified by their year:
students
## [1] "Junior" "Sophomore" "Sophomore" "Senior" "Junior"
## [6] "Senior" "Senior" "Sophomore" "Sophomore" "Junior"
Let’s count how many of each we have:
table(students)
## students
##    Junior    Senior Sophomore
##         3         3         4
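There are two problems with this table: the classes appear in alphabetical order rather than in an order of our choosing, and the class Freshman, which has no students, is missing altogether.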
We can fix both of these by turning the vector into a factor:
students.fac <- factor(students,
levels=c("Freshman", "Junior", "Sophomore", "Senior"),
ordered=TRUE)
table(students.fac)
## students.fac
##  Freshman    Junior Sophomore    Senior
##         0         3         4         3
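To see what the factor stores, one can look at its levels and at the integer codes behind them. In the sketch below the students vector is reconstructed from the printed output above, because the text does not show how it was created.

```r
# Reconstructed from the printed output; the original definition is not shown
students <- c("Junior", "Sophomore", "Sophomore", "Senior", "Junior",
              "Senior", "Senior", "Sophomore", "Sophomore", "Junior")
students.fac <- factor(students,
  levels = c("Freshman", "Junior", "Sophomore", "Senior"),
  ordered = TRUE)

levels(students.fac)       # the four levels, in the order given
as.integer(students.fac)   # integer codes: Junior is 2, Sophomore 3, Senior 4

# Because the factor is ordered, its elements can be compared
students.fac[1] < students.fac[4]   # TRUE: Junior precedes Senior in this ordering

# Levels without observations are still counted
table(students.fac)[["Freshman"]]   # 0
```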
Here is another difference:
c(students, "Senior")
## [1] "Junior" "Sophomore" "Sophomore" "Senior" "Junior"
## [6] "Senior" "Senior" "Sophomore" "Sophomore" "Junior"
## [11] "Senior"
c(students, "Graduate")
## [1] "Junior" "Sophomore" "Sophomore" "Senior" "Junior"
## [6] "Senior" "Senior" "Sophomore" "Sophomore" "Junior"
## [11] "Graduate"
c(students.fac, "Senior")
## [1] "2" "3" "3" "4" "2" "4" "4"
## [8] "3" "3" "2" "Senior"
c(students.fac, "Graduate")
## [1] "2" "3" "3" "4" "2" "4"
## [7] "4" "3" "3" "2" "Graduate"
So we can easily add an element to students, but if we do the same with students.fac it gets very confused! Strangely enough, it doesn't even work when the added item is in the list of levels!
Notice also that with c(students.fac, "Senior") we don't get the list of students but a list of numbers (as characters) plus "Senior". This is because internally R stores factors as integers, and when we add "Senior" these integers get converted to character.
Here is what you can do:
lvls <- levels(students.fac)
x <- factor(c(as.character(students.fac), "Senior", "Graduate"),
levels=c(lvls, "Graduate"), ordered=TRUE)
x
## [1] Junior Sophomore Sophomore Senior Junior Senior Senior
## [8] Sophomore Sophomore Junior Senior Graduate
## Levels: Freshman < Junior < Sophomore < Senior < Graduate
table(x)
## x
##  Freshman    Junior Sophomore    Senior  Graduate
##         0         3         4         4         1
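If this is needed often, the steps above can be wrapped in a small function. The helper append_factor below is hypothetical (it is not part of R) and is only one possible way to do it: it converts the factor to character, appends the new values, and rebuilds the factor with any new levels placed at the end.

```r
# Hypothetical helper: append values to a factor, adding new levels at the end
append_factor <- function(f, new_values) {
  # union() keeps the existing levels in order and adds unseen values after them
  lvls <- union(levels(f), new_values)
  factor(c(as.character(f), new_values),
         levels = lvls, ordered = is.ordered(f))
}

# Usage with a small illustrative factor
f <- factor(c("lo", "hi"), levels = c("lo", "hi"), ordered = TRUE)
g <- append_factor(f, c("hi", "max"))
levels(g)         # "lo" "hi" "max"
as.character(g)   # "lo" "hi" "hi" "max"
```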
Exercise
ltrs <- c("a", "b", "b", "c", "c", "c")
f1 <- factor(ltrs)
f1
## [1] a b b c c c
## Levels: a b c
levels(f1) <- c("c", "b", "a")
What does f1 now look like?