You can get a free version of R for your computer from a number of sources, for example here. Click on your operating system and then on base. The download is about 30MB and setup is fully automatic.
If you are using a Windows machine, one change you should make is to install R in the folder C:\R, not in C:\Program Files\R, which is the default.
Next you need the file with the data for the class: download the file R6665.RData. Just double-click the file and it will open R with all the data in place. The first time you will get some error messages about missing libraries; ignore these for now.
Once you have started a session the first thing you see is some text, and then the > sign. This is the R prompt; it means R is waiting for you to do something. Sometimes the prompt changes to a different symbol, as we will see.
ls() shows you a "listing" of the files (data, routines etc.)
One file that is not shown is .First. It is always executed at startup.
If you have worked for a while you might have things you need to save. Do that with File > Save Workspace. If you quit the program without saving, everything you did will be lost. R has a somewhat unusual file system: everything belonging to the same project (data, routines, graphs etc.) is stored in just one file, with the extension .RData. On my own computer I have a routine called sv() which reads
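sv <-
function() {
  # Windows only: line 4 of the "dir" output reads " Directory of <folder>",
  # so dropping the first 14 characters leaves the folder name
  a = substring(shell("dir", intern = TRUE)[4], 15)
  save.image(paste(a,"\\R6665.RData",sep=""))
}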
This program figures out which folder you started R in and saves the file there. I run it every time I have done some work worth saving.
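A variant that does not depend on the Windows dir command can be sketched with getwd(). This assumes that the working directory is still the folder R was started in, which holds unless setwd() has been called. The name sv2 and its file argument are made up for this illustration:

```r
# Hypothetical variant of sv(); assumes the working directory is the project folder
sv2 = function(file = "R6665.RData") {
  path = file.path(getwd(), file)  # build the full path without a shell call
  save.image(path)                 # save everything in the workspace
  invisible(path)                  # return the path without printing it
}
```

Typing sv2() then saves the workspace in the same way as sv().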
To quit R, type q() or click the x in the upper right corner. R will ask you whether you want to save the workspace; say yes if you have not done so already.
R has a nice recall feature: use the up and down arrows. Also, typing history() shows you the most recent things entered.
R has a decent help facility, type help(). If you need help on a specific command use it as an argument, for example help(mean) or ?mean. Many functions have examples, type example(mean).
R is case-sensitive, so a and A are two different things.
Often during a session you create objects that you need only for a short time. When you no longer need them, use rm() to get rid of them.
Example:
x=1:10
sum(x^2)
rm(x)
There are various ways to generate a vector, here are some examples:
x=1:10
x=10:1
x=1:20*2
x=c(1:10,1:10*2)
Sometimes you need parentheses:
n=10
1:n-1
1:(n-1)
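The two commands above give different results because the colon is evaluated before the subtraction. A short check (the names a and b are just for illustration):

```r
n = 10
a = 1:n-1      # read as (1:n)-1, so 0 1 2 ... 9
b = 1:(n-1)    # 1 2 ... 9
length(a)      # 10
length(b)      # 9
```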
The rep ("repeat") command is very useful:
rep(1,10)
rep(1:3,10)
rep(1:3,rep(3,3))
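Here is what these commands produce. The last line uses the each argument of rep, which is not used elsewhere in these notes, as an alternative to the third form:

```r
rep(1,10)               # 1 1 1 1 1 1 1 1 1 1
a = rep(1:3,10)         # 1 2 3 1 2 3 ..., thirty numbers in all
b = rep(1:3,rep(3,3))   # 1 1 1 2 2 2 3 3 3
d = rep(1:3,each=3)     # same result as b
```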
To find out how many elements a vector has use the length command, for example length(x).
The elements of a vector are accessed with the bracket notation:
x=1:10*5
x[3]
x[1:3]
x[c(1,3,8)]
x[-3]
x[-c(1,2,5)]
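For reference, these are the results of the commands above, with a negative index meaning "everything except":

```r
x = 1:10*5          # 5 10 15 20 25 30 35 40 45 50
x[3]                # 15
x[1:3]              # 5 10 15
x[c(1,3,8)]         # 5 15 40
x[-3]               # all elements except the third
x[-c(1,2,5)]        # 15 20 30 35 40 45 50
length(x[-3])       # 9
```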
Instead of numbers a vector can also consist of characters (letters, numbers, symbols etc.). These are identified by quotes:
x=c("A","B","7","%")
A vector is either numeric or character, but never both. You can turn one into the other (if possible) as follows:
x=1:10
as.character(x)
x=c("1","5")
as.numeric(x)
R does a lot of automatic type conversion. Say you type
x=c("A","B",6:10,"%")
Now this is a mix of numbers and characters. The numbers can be turned into characters but not vice versa, so the result is a character vector:
"A" "B" "6" "7" "8" "9" "10" "%"
Automatic type conversion is very useful but also dangerous: sometimes R turns an object into a different type and you don't notice, so you keep working as if it still had the old type.
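A small illustration of how this can go wrong, with made-up numbers: a single quoted entry is enough to turn a numeric vector into a character vector, and arithmetic such as sum(x) then fails until the vector is converted back.

```r
x = c(10,20,"30")       # one quoted entry turns the whole vector into characters
class(x)                # "character"
y = as.numeric(x)       # convert back explicitly
sum(y)                  # 60
z = suppressWarnings(as.numeric(c("1","A")))   # "A" cannot be converted
is.na(z)                # FALSE TRUE
```

Checking class(x) is a quick way to see which type an object currently has.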
The elements of a vector can have names:
Say we want the ages of three people in a vector called Age, together with their names:
Age=c(37,25,29)
names(Age)=c("Joe","Jack","Jim")
If a vector has names, these can be used to access the info:
Age["Jack"]
A vector can also be logical, with the values TRUE and FALSE (T and F for short):
x=c(T,T,F,T,F)
Often these are generated by conditions:
x=c(1,5,4,3,7,6)
x<5
These can then be used to select parts of a vector:
x[x<5]
These conditions can be rather complex, with AND (&) and OR (|)
x=c(2,3,7,5,4,7,6,5,2,5,10)
x[x>4 & x<=6]
x[x>6 | x<=3]
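The results of the two selections, together with two related commands that are not used above: sum() counts how often a condition holds, and which() gives the positions.

```r
x = c(2,3,7,5,4,7,6,5,2,5,10)
x[x>4 & x<=6]       # 5 6 5 5
x[x>6 | x<=3]       # 2 3 7 7 2 10
sum(x>4 & x<=6)     # 4, because TRUE counts as 1 and FALSE as 0
which(x>6)          # 3 6 11, the positions where the condition holds
```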
Another special type is NA, indicating a missing value
x=c(28,21,24,NA,23)
This can pose a problem because some functions don't know what to do with missing values, for example mean(x) returns NA instead of a number. One way to deal with that is to type mean(x[!is.na(x)]). Note what this does:
is.na(x) tests each value of x, if it is NA it yields TRUE, otherwise FALSE
!is.na(x) turns the TRUE-FALSE values around
x[!is.na(x)] removes all the x where x is NA
So far the expression on the left of the = sign was always a simple item, for example y=x[!is.na(x)], but this need not be so. For example, say we want to replace the NA's with 0, then use
x[is.na(x)]=0
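The handling of missing values described above can be checked as follows. Many functions, mean among them, also have an na.rm argument that removes the missing values for you:

```r
x = c(28,21,24,NA,23)
mean(x)                 # NA
mean(x[!is.na(x)])      # 24
mean(x,na.rm=TRUE)      # 24, the same thing via the na.rm argument
sum(is.na(x))           # 1, the number of missing values
```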
| | Jack | Jim |
|---|---|---|
| Age | 25 | 29 |
| Income | 45000 | 32000 |
| IQ | 105 | 98 |
x=cbind(c(25,45000,105),c(29,32000,98))
dimnames(x)=list(c("Age","Income","IQ"),c("Jack","Jim"))
cbind stands for "column bind". Of course there is also an rbind ("row bind"):
rbind(1:10,1:10*2,1:10*3)
To find out the dimensions of a matrix use the dim command
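For the two matrices built above this looks as follows. nrow and ncol, not mentioned above, return the two dimensions separately:

```r
x = cbind(c(25,45000,105),c(29,32000,98))
dimnames(x) = list(c("Age","Income","IQ"),c("Jack","Jim"))
dim(x)              # 3 2, that is 3 rows and 2 columns
nrow(x)             # 3
ncol(x)             # 2
dim(rbind(1:10,1:10*2,1:10*3))   # 3 10
```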
Elements of a matrix are accessed with the [,] notation:
x=cbind(1:10,101:110)
x[1,2]
x[,1]
x[2,]
x[3:5,]
x[3:5,1]
Notice that if the result is either just one column or just one row the data structure changes to a vector
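If the matrix structure should be kept, the drop argument of the bracket notation can be used. A short sketch (the names a and b are just for illustration):

```r
x = cbind(1:10,101:110)
x[1,2]                   # 101
a = x[3:5,1]             # a plain vector: 3 4 5
dim(a)                   # NULL, vectors have no dimensions
b = x[3:5,1,drop=FALSE]  # drop=FALSE keeps a matrix with 3 rows and 1 column
dim(b)                   # 3 1
```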
A matrix can also be generated with the following commands:
matrix(0,2,5)
diag(5)
The entries of a matrix can also be characters:
x=cbind(c("A","B","C"),1:3)
A very useful command for matrices is "apply". Say we have the following matrix
x=matrix(1:40,4,10)
and we want to find the sums of the columns. We could type
sum(x[,1])
sum(x[,2])
etc, but much better is
apply(x,2,sum)
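The second argument of apply says whether to work on rows (1) or columns (2), and the third can be any function of a vector. For example:

```r
x = matrix(1:40,4,10)
apply(x,2,sum)      # column sums: 10 26 42 ... 154
apply(x,1,sum)      # row sums: 190 200 210 220
apply(x,2,max)      # largest value in each column: 4 8 12 ... 40
colSums(x)          # a built-in shortcut for the column sums
```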
R has just about all standard linear algebra methods implemented. For example say we want to solve the following linear system:
2x+3y+4z=9
x+y+z=1
x+3y=5
Do this:
x=c(2,1,1)
y=c(3,1,3)
z=c(4,1,0)
A=cbind(x,y,z)
b=c(9,1,5)
solve(A,b)
solve(A)
finds the inverse matrix of A
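It is a good habit to check a solution by multiplying it back. The sketch below repeats the system from above; %*% is the matrix product, and the names b and s are just for illustration:

```r
A = cbind(x=c(2,1,1),y=c(3,1,3),z=c(4,1,0))
b = c(9,1,5)
s = solve(A,b)              # x = -4, y = 3, z = 2
A %*% s                     # reproduces the right-hand side 9 1 5
round(solve(A) %*% A,10)    # the identity matrix, up to rounding error
```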
| | Jack | Jim |
|---|---|---|
| Age | 25 | 29 |
| Job | Teacher | Baker |
| Married | Yes | No |
x=data.frame(Age=c(25,29),Job=c("Teacher","Baker"),Married=c("Yes","No"))
dimnames(x)[[1]]=c("Jack","Jim")
Data frames can be used almost like matrices. In addition we have the [[ ]] notation:
x[[1]]
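A few more ways to get at the contents of a data frame, as a sketch using the same data. The $ notation and selection by condition are standard R but are not otherwise used at this point in the notes:

```r
x = data.frame(Age=c(25,29),Job=c("Teacher","Baker"),Married=c("Yes","No"))
x[[1]]              # the first column: 25 29
x$Age               # the same column, by name
x[["Age"]]          # again the same
x[2,"Age"]          # 29, matrix-style indexing
x[x$Age>26,]        # the rows where Age is larger than 26, here just the second
```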
Say we want to work for a while with this data. Then we can type
attach(x)
after which Age and Job are directly available
Age[1]
When we are done with x we can
detach(x)
As we will see, many of the routines that work with a specific data frame start with attach() and end with detach(). Unfortunately, when the routine crashes in between it never gets to the detach, and the next time you run it the data frame gets attached again, and again. You can check whether there are multiple copies by typing
search()
and if the same dataframe shows up you can detach them one at a time
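One way to guard against leftover copies, which is not what the routines in this class do, is to register the detach with on.exit() so that it runs even after a crash. Another is with(), which avoids attaching altogether. A minimal sketch, where meanAge is a made-up routine:

```r
x = data.frame(Age=c(25,29),Job=c("Teacher","Baker"))

# Hypothetical routine: on.exit() runs detach() even if an error occurs in between
meanAge = function(df) {
  attach(df)
  on.exit(detach(df))
  mean(Age)
}
meanAge(x)              # 27
"df" %in% search()      # FALSE, nothing is left attached

with(x,mean(Age))       # 27, without attaching anything
```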
search is also useful if you want to get a listing of the objects in one of the packages. Say I want to know what's in the library MASS. If search lists MASS in position 3, then
ls(3)
gets us this listing.
Data frames will be our basic data structure in this class.
The main restriction on data frames is the same as for matrices: they have to be rectangular. If this is not so we can use the last of the data structures, the list:
Jack: 110 112 127
Jim: 105 119
Joe: 132 135 125 132 135
To get this into R do the following:
BloodPressure=list(Jack=c(110,112,127),Jim=c(105,119),Joe=c(132,135,125,132,135))
List elements can be accessed in a number of ways:
BloodPressure[[1]]
BloodPressure$Jack
BloodPressure$Jack[2:3]
There is a version of the apply command for lists. Say we want to find the mean blood pressure of the three men:
lapply(BloodPressure,mean)
Notice that this results again in a list. If we want to turn it into a vector we can do this
as.numeric(lapply(BloodPressure,mean))
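An alternative that may be more convenient is sapply, which works like lapply but simplifies the result to a vector when it can and keeps the names:

```r
BloodPressure = list(Jack=c(110,112,127),Jim=c(105,119),Joe=c(132,135,125,132,135))
m = sapply(BloodPressure,mean)   # a named numeric vector instead of a list
m["Jim"]                         # 112
sapply(BloodPressure,length)     # 3 2 5, the number of readings per person
```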
Say you have a file called "data.dat" with a set of numbers in the directory c:\mystuff. Read it into R with the command
x=scan("c:\\mystuff\\data.dat")
Notice the double backslashes for the Windows directory.
Alternatively you can open the application, use the mouse to highlight the data, type CTRL-C, switch to R and type
x=scan("clipboard")
Say the data consist of several columns of data, and you want to import it as a data frame, then use
read.table("c:\\mystuff\\data.dat")
To write data to a file we have the commands write, write.table and dump
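A small round trip shows how reading and writing fit together. This is only a sketch: the data are made up and the files are temporary ones created with tempfile(), so nothing on your disk is overwritten.

```r
d = data.frame(Age=c(25,29),Income=c(45000,32000))
f = tempfile(fileext=".dat")        # a temporary file name
write.table(d,f,row.names=FALSE)    # column names go into the first line
d2 = read.table(f,header=TRUE)      # header=TRUE reads them back as names

v = c(1.2,3.4,5.1)
g = tempfile(fileext=".dat")
write(v,g)                          # plain numbers, no names
v2 = scan(g)                        # back into a numeric vector
```

Without header=TRUE, read.table would treat the first line as data.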
Example: Say we have the vector x with elements 1.2, 3.4, 5.1, 7.9, 4.7, and we want to add 0.5 to each element. In most programming languages this would require a loop over the elements, one by one. In R all we need is
y=x+0.5
Similarly we can do any arithmetic we want:
y=2*x-1
y=x^2+x^3
y=1/x
Even more, ideally every function in R is written so that its arguments are vectors:
y=log(x)
y=exp(-x)+exp(x)
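To see that the vectorized form really does the same as a loop, the two can be compared side by side (y1 and y2 are names chosen for this illustration):

```r
x = c(1.2,3.4,5.1,7.9,4.7)

# the loop most other languages would need
y1 = numeric(length(x))
for (i in 1:length(x)) y1[i] = x[i]+0.5

# the vectorized version
y2 = x+0.5
all(y1 == y2)       # TRUE
```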
Some of this extends to matrices as well:
x=matrix(1:30,3,10)
x^2
log(x)
but this is not as general and does not always work.
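One case where the element-wise behaviour may not be what is wanted is matrix multiplication, and some functions collapse a whole matrix into a single number. A sketch with a small made-up matrix:

```r
m = matrix(1:4,2,2)
m^2             # squares each entry: columns (1,4) and (9,16)
m %*% m         # the matrix product: columns (7,10) and (15,22)
sum(m)          # 10, a single number for the whole matrix
```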