You can get a free version of R for your computer from a number of sources, for example here. Click on your operating system and choose base.
Next you should install a nice front end for R called RStudio.
Once you have started a session the first thing you see is some text, and then the > sign. This is the R prompt; it means R is waiting for you to do something. Sometimes the prompt changes to a different symbol, as we will see.
R has a nice recall feature: use the up and down arrows to step through earlier commands. Also, typing history() shows you the most recent things entered.
R has a decent help facility; type help(). If you need help on a specific command use it as an argument, for example help(mean) or ?mean. Many functions have examples; type example(mean) to run them.
R is case-sensitive, so a and A are two different things.
Often during a session you create objects that you need only for a short time. When you no longer need them, use rm() to get rid of them.
Example:
x=1:10
sum(x^2)
rm(x)
R has a number of data structures:
A vector is the most basic structure, used for a simple set of numbers. Say we want the numbers 1.5, 3.6, 5.1 and 4.0 in an R vector called x, then we can type x=c(1.5,3.6,5.1,4.0). Here “c” stands for concatenate, meaning “put together”.
There are various ways to generate a vector, here are some examples:
x=1:10
x=10:1
x=1:20*2
x=c(1:10,1:10*2)
Sometimes you need parentheses, because the colon operator is evaluated before subtraction:
n=10
1:n-1
1:(n-1)
The rep (“repeat”) command is very useful:
rep(1,10)
rep(1:3,10)
rep(1:3,rep(3,3))
To find out how many elements a vector has, use the length command, for example length(x).
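As a small sketch, length also makes the effect of the parentheses above easy to see. The variable names a and b are only used for this illustration:

```r
n <- 10
a <- 1:n-1      # same as (1:n)-1, the numbers 0 to 9
b <- 1:(n-1)    # the numbers 1 to 9
length(a)       # 10
length(b)       # 9
length(rep(1:3, 10))  # 30
```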
The elements of a vector are accessed with the bracket notation:
x=1:10*5
x[3]
x[1:3]
x[c(1,3,8)]
x[-3]
x[-c(1,2,5)]
Instead of numbers a vector can also consist of characters (letters, numbers, symbols etc.). These are identified by quotes:
x=c("A","B","7","%")
A vector is either numeric or character, but never both. You can turn one into the other (if possible) as follows:
x=1:10
as.character(x)
x=c("1","5")
as.numeric(x)
R does a lot of automatic type conversion. Say you type
x=c("A","B",6:10,"%")
Now this is a mix of numbers and characters. The numbers can be turned into characters but not vice versa, so the result is a character vector:
"A" "B" "6" "7" "8" "9" "10" "%"
Automatic type conversion is very useful but also dangerous. Sometimes R turns an object into a different type and you don’t notice, so your code still expects the old type.
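Here is a minimal sketch of how such a silent conversion can bite, and how to check for it with class(). The values are made up for this illustration:

```r
x <- c(1, 2, "3")     # one quoted element turns the whole vector into characters
class(x)              # "character"
is.numeric(x)         # FALSE
# sum(x) would now stop with an error, because x is not numeric
y <- as.numeric(x)    # convert back explicitly
sum(y)                # 6
```

Checking class(x) before doing arithmetic is a cheap way to catch this kind of surprise.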
The elements of a vector can have names:
Say we want the ages of three people in a vector called Age, together with their names:
Age=c(37,25,29)
names(Age)=c("Joe","Jack","Jim")
If a vector has names, these can be used to access the elements:
Age["Jack"]
The elements of a vector can also be logical values, TRUE or FALSE:
x=c(TRUE, TRUE, FALSE, TRUE)
Often these are generated by conditions:
x=c(1,5,4,3,7,6)
x<5
These can then be used to select parts of a vector:
x[x<5]
These conditions can be rather complex, combined with AND (&) and OR (|):
x=c(2,3,7,5,4,7,6,5,2,5,10)
x[x>4 & x<=6]
x[x>6 | x<=3]
Another special type is NA, indicating a missing value
x=c(28,21,24,NA,23)
This can pose a problem because some functions don’t know what to do with missing values, for example mean(x) returns NA instead of a number. One way to deal with that is to type mean(x[!is.na(x)]). Note what this does:
is.na(x) tests each value of x; if it is NA it yields TRUE, otherwise FALSE
!is.na(x) turns the TRUE-FALSE values around
x[!is.na(x)] keeps only the elements of x that are not NA
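The steps above can be run one at a time to see what each piece returns. The last line uses the na.rm argument of mean, which is not discussed in the text and is shown here only as an alternative:

```r
x <- c(28, 21, 24, NA, 23)
mean(x)                 # NA
is.na(x)                # FALSE FALSE FALSE  TRUE FALSE
x[!is.na(x)]            # 28 21 24 23
mean(x[!is.na(x)])      # 24
mean(x, na.rm = TRUE)   # 24, the same result
```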
So far the expression on the left of the = sign was always a simple item, for example y=x[!is.na(x)], but this need not be so. For example, say we want to replace the NAs with 0, then use
x[is.na(x)]=0
Say we want the following dataset in R, as a matrix:
| | Jack | Jim |
|---|---|---|
| Age | 25 | 29 |
| Income | 45000 | 32000 |
| IQ | 105 | 98 |
x <- cbind(c(25,45000,105),c(29,32000,98))
dimnames(x) <- list(c("Age","Income","IQ"),c("Jack","Jim"))
x
## Jack Jim
## Age 25 29
## Income 45000 32000
## IQ 105 98
cbind stands for “column bind”. Of course there is also an rbind for rows:
rbind(1:10,1:10*2,1:10*3)
To find out the dimensions of a matrix use the dim command.
Elements of a matrix are accessed with the [,] notation:
x=cbind(1:10,101:110)
x[1,2]
x[,1]
x[2,]
x[3:5,]
x[3:5,1]
Notice that if the result is either just one column or just one row, the data structure changes to a vector.
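A short sketch of this change of structure follows. The drop=FALSE argument is not mentioned in the text; it is shown here as one way to keep the result a matrix if that is what you need:

```r
x <- cbind(1:10, 101:110)
is.matrix(x[3:5, ])          # TRUE, still a matrix with 3 rows and 2 columns
is.matrix(x[, 1])            # FALSE, a single column becomes a vector
dim(x[, 1])                  # NULL, vectors have no dimensions
dim(x[, 1, drop = FALSE])    # 10 1, the result stays a matrix
```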
A matrix can also be generated with the following commands:
matrix(0,2,5)
diag(5)
The entries of a matrix can also be characters:
x=cbind(c("A","B","C"),1:3)
A very useful command for matrices is “apply”. Say we have the following matrix
x=matrix(1:40,4,10)
and we want to find the sums of the columns. We could type
sum(x[,1])
sum(x[,2])
and so on, but much better is
apply(x,2,sum)
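The second argument of apply says which dimension to work over: 1 for rows, 2 for columns. A small sketch, where colSums is a built-in shortcut that is not part of the text above:

```r
x <- matrix(1:40, 4, 10)
col_sums <- apply(x, 2, sum)   # one sum per column, 10 values
row_sums <- apply(x, 1, sum)   # one sum per row, 4 values
col_sums[1]                    # 10, the sum of 1:4
row_sums[1]                    # 190
all(col_sums == colSums(x))    # TRUE
```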
R has just about all standard linear algebra methods implemented. For example say we want to solve the following linear system:
2x+3y+4z=9
x+y+z=1
x+3y=5
Do this:
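x=c(2,1,1)   # coefficients of x in the three equations
y=c(3,1,3)   # coefficients of y
z=c(4,1,0)   # coefficients of z
A=cbind(x,y,z)
y=c(9,1,5)   # right-hand side (this reuses the name y)
solve(A,y)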
solve(A) finds the inverse matrix of A
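It is a good habit to check a solution by plugging it back in. The sketch below does this for the system above; the names b and sol are chosen here for the illustration, and %*% is matrix multiplication:

```r
A <- cbind(x = c(2, 1, 1), y = c(3, 1, 3), z = c(4, 1, 0))
b <- c(9, 1, 5)          # right-hand side of the three equations
sol <- solve(A, b)
sol                      # x = -4, y = 3, z = 2
A %*% sol                # reproduces 9, 1, 5
solve(A) %*% b           # same solution, computed via the inverse
```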
Often we have a mix of data types, numeric and character. Say we have the following dataset:
| | Jack | Jim |
|---|---|---|
| Age | 25 | 29 |
| Job | Teacher | Baker |
| Married | Yes | No |
For this we can use a data frame:
x <- data.frame(Age=c(25,29),
Job=c("Teacher","Baker"),
Married=c("Yes","No"))
dimnames(x)[[1]] <- c("Jack","Jim")
Data frames can be used almost like matrices. In addition we have the [[ ]] notation, which extracts a single column:
x[[1]]
Say we want to work for a while with this data. Then we can type
attach(x)
after which Age and Job are directly available
Age[1]
When we are done with x we can
detach(x)
As we will see many of the routines that work with a specific data frame start with attach() and end with detach(). Unfortunately when the routine crashes in between it never gets to the detach, and the next time you run it the data frame gets attached again, and again, and so on. You can check if there are multiple copies by typing
search()
If the same data frame shows up more than once, you can detach the copies one at a time.
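If you only need the columns for one or two expressions, the with function avoids the problem entirely, because nothing stays attached afterwards. This is an alternative that the text above does not use, sketched here on the same data:

```r
x <- data.frame(Age = c(25, 29),
                Job = c("Teacher", "Baker"),
                Married = c("Yes", "No"))
with(x, Age[1])       # 25, Age is looked up inside x
with(x, mean(Age))    # 27
"x" %in% search()     # FALSE, x was never attached
```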
search is also useful if you want to get a listing of the objects in one of the packages. Say we want to know what’s in the library MASS. If search lists MASS in position 3, then
ls(3)
gets us this listing.
Data frames will be our basic data structure in this class.
The main restriction on data frames is the same as for matrices: they have to be rectangular. If this is not so we can use the last of the data structures, the list.
Say we have the following dataset: The blood pressure of three patients on their visits to a doctor:
Jack: 110 112 127
Jim: 105 119
Joe: 132 135 125 132 135
To get this into R do the following:
BloodPressure=list(Jack=c(110,112,127),Jim=c(105,119),Joe=c(132,135,125,132,135))
List elements can be accessed in a number of ways:
BloodPressure[[1]]
BloodPressure$Jack
BloodPressure$Jack[2:3]
There is a version of the apply command for lists. Say we want to find the mean blood pressure of the three men:
lapply(BloodPressure,mean)
Notice that this results again in a list. If we want to turn it into a vector we can do this
as.numeric(lapply(BloodPressure,mean))
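A related function, sapply, does the lapply and the conversion to a vector in one step and keeps the names. It is not used in the text above and is shown here only as a sketch; the name m is chosen for the illustration:

```r
BloodPressure <- list(Jack = c(110, 112, 127),
                      Jim = c(105, 119),
                      Joe = c(132, 135, 125, 132, 135))
m <- sapply(BloodPressure, mean)   # named numeric vector
names(m)                           # "Jack" "Jim" "Joe"
m["Jim"]                           # 112
unlist(lapply(BloodPressure, mean))  # same values, also with names
```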
The above are the most important, but not the only data types. There are many specialized ones, and you can even create your own!
Data can be entered as seen above from the keyboard. In addition data can be read from a file and copy-pasted from other applications:
Say you have a file called “data.dat” with a set of numbers in the directory c:\mystuff. Read it into R with the command
x=scan("c:\\mystuff\\data.dat")
Notice the double backslashes for the Windows directory.
Alternatively you can open the application, use the mouse to highlight the data, type CTRL-C, switch to R and type
x=scan("clipboard")
Say the data consist of several columns, and you want to import them as a data frame, then use
read.table("c:\\mystuff\\data.dat")
To write data to a file we have the commands write, write.table and dump.
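As a sketch of how writing and reading fit together, the lines below write a small data frame to a file and read it back. The data frame and the use of a temporary file (instead of a real path such as the one above) are assumptions made for this illustration:

```r
df <- data.frame(Age = c(25, 29), Income = c(45000, 32000))
f <- tempfile(fileext = ".dat")         # temporary file, stands in for a real path
write.table(df, f, row.names = FALSE)   # writes a header line, then one line per row
df2 <- read.table(f, header = TRUE)     # header=TRUE turns the first line into column names
df2
unlink(f)                               # remove the temporary file again
```

Without header=TRUE, read.table would treat the column names as a row of data.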
R is a vector based language. Generally whatever you can do with numbers you can do with a vector, and it is then done with each element of the vector.
Say we have a numeric vector x, and we want to add 0.5 to each element. In most programming languages this would require a loop over the elements, one by one. In R all we need is
y=x+0.5
Similarly we can do any arithmetic we want:
y=2*x-1
y=x^2+x^3
y=1/x
Even more, ideally every function in R is written so that its arguments are vectors:
y=log(x)
y=exp(-x)+exp(x)
Some of this extends to matrices as well:
x=matrix(1:30,3,10)
x^2
log(x)
but this is not as general and does not always work.
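To make the element-by-element idea concrete, here is a small sketch that compares an explicit loop with the vectorized form. The values in x and the name y_loop are made up for this illustration:

```r
x <- c(1.2, 3.4, 4.7)               # hypothetical example values
y_loop <- numeric(length(x))        # empty result vector
for (i in seq_along(x)) y_loop[i] <- x[i] + 0.5   # the loop other languages need
y <- x + 0.5                        # the vectorized R version
identical(y, y_loop)                # TRUE
log(x)                              # log is applied to each element as well
```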
As we already said, R is also a programming language with all the usual parts. First, throughout this page let x=c(1,2,5,3,4,7,3,4,6,8,5)
The most common loop is a “for” loop:
y=rep(0,5)
for(i in 1:5) y[i]=mean(sample(x,3))
If a number of calculations should be done within a loop use { }:
y=matrix(0,5,2)
for(i in 1:5) {
y[i,1]=mean(sample(x,1))
y[i,2]=mean(sample(x,3))
}
Loops can be nested:
for(i in 1:5) {
for(j in 1:2) y[i,j]=mean(sample(x,c(1,3)[j]))
}
An if-else statement performs branched execution:
for(i in 1:11) {
if(x[i]<3) {
x[i]=x[i]^2
}
else {
x[i]=sqrt(x[i])
}
}
An extension of if-else is the case statement, which in R is the switch function.
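As a minimal sketch of such a case statement, the function below picks a calculation by name using R's switch. The function name center and its argument type are hypothetical, chosen only for this illustration:

```r
center <- function(x, type) {
  switch(type,
         mean = mean(x),        # used when type is "mean"
         median = median(x),    # used when type is "median"
         NA)                    # unnamed last entry is the default
}
x <- c(1, 2, 5, 3, 4, 7, 3, 4, 6, 8, 5)
center(x, "median")   # 4
center(x, "mean")     # 48/11, about 4.36
center(x, "mode")     # NA, no matching case
```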
Sometimes the above can be simplified:
y=ifelse(x<3,0,x)
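The same idea can replace a whole loop with an if-else inside. As a sketch, the lines below square the elements below 3 and take the square root of the others, once with a loop and once with ifelse, and check that both agree. The name y_loop is chosen for this illustration:

```r
x <- c(1, 2, 5, 3, 4, 7, 3, 4, 6, 8, 5)
y_loop <- x
for (i in seq_along(x)) {
  if (x[i] < 3) {
    y_loop[i] <- x[i]^2
  } else {
    y_loop[i] <- sqrt(x[i])
  }
}
y <- ifelse(x < 3, x^2, sqrt(x))   # one line, no loop
all.equal(y, y_loop)               # TRUE
```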
Writing your own functions is a very important part of using R, for example to do simulations and also to do data analysis. First you should install a good ASCII editor on your computer, for example notepad2.exe, a very nice editor for programming. In R you need to type the following so that R knows you want to use this editor:
options(editor="C:\\R\\notepad2.exe")
(change the path to the directory where notepad2 is)
Now when you type fix(myfun) R will open this editor and you can write your function. When you are done type ALT-f-s (to save) and ALT-f-x (to exit the editor). If your program has a syntax mistake R will give you an error message (with the line where the mistake is). Type myfun=edit() to fix the mistake.
As an illustration let’s write a little function that calculates the five-number summary of a vector:
function (x)
{
y=rep(0,5)
names(y)=c("Min","Q1","Median","Q3","Max")
y[c(1,5)]=c(min(x),max(x))
y[c(2,4)]=quantile(x,c(0.25,0.75))
y[3]=median(x)
y
}
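To try the function without the editor, you can also assign it to a name directly at the prompt. The sketch below is a compact variant of the same calculation, assuming the function is stored under the name myfun, and applies it to the vector x used on this page:

```r
myfun <- function(x) {
  # minimum, the three quartiles, and maximum in one vector
  y <- c(min(x), quantile(x, c(0.25, 0.5, 0.75)), max(x))
  names(y) <- c("Min", "Q1", "Median", "Q3", "Max")
  y
}
x <- c(1, 2, 5, 3, 4, 7, 3, 4, 6, 8, 5)
myfun(x)
##    Min     Q1 Median     Q3    Max
##    1.0    3.0    4.0    5.5    8.0
```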