The basic functions to display information on the screen are print and cat:
options(digits=4)
x <- rnorm(5, 100, 30)
print(x)
## [1] 117.96 82.75 84.21 86.88 68.30
cat(x)
## 118 82.75 84.21 86.88 68.3
Both of these have certain advantages:
print(x, digits=6)
## [1] 117.9571 82.7533 84.2148 86.8807 68.3047
cat("The mean is ", round(mean(x), 1), "\n")
## The mean is 88
The "\n" (newline) is needed so that the cursor moves to the next line. This is also sometimes referred to as a carriage return.
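By default cat separates its arguments with a single space. A small sketch of how the sep argument changes this; the value 88.1 is just an illustrative number, and capture.output is used only to show the printed line as a string:

```r
m <- 88.1                                # illustrative value, not computed from x

cat("The mean is", m, "\n")              # default: sep = " " between all arguments
cat("The mean is ", m, "\n", sep = "")   # sep = "": we control every space ourselves

# capture.output() returns what cat would have printed, one string per line
line <- capture.output(cat("The mean is ", m, "\n", sep = ""))
line
## [1] "The mean is 88.1"
```

With sep="" no space is printed before the newline, which matters when the output is written to a file.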
Another advantage of cat is that one can have different rounding for different numbers:
x <- 1100; y <- 0.00123334
print(c(x, y), 4)
## [1] 1.100e+03 1.233e-03
cat(x, " ", round(y, 4), "\n")
## 1100 0.0012
Notice that in the case of print R switches to scientific notation. This default behavior can be changed with
options(scipen=999)
print(c(x, y), 4)
## [1] 1100.000000 0.001233
options(scipen=0)
print(c(x, y), 4)
## [1] 1.100e+03 1.233e-03
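Changing a global option affects everything that is printed afterwards, so it can be convenient to restore the previous setting automatically, or to avoid the global change altogether. A minimal sketch, assuming the same x and y as above; the names old, fixed1 and fixed2 are only illustrative:

```r
x <- 1100; y <- 0.00123334

# options() returns the previous values, so they can be restored later
old <- options(scipen = 999)
fixed1 <- format(c(x, y), digits = 4)
options(old)                       # back to whatever was set before

# format() can also be told directly not to use scientific notation
fixed2 <- format(c(x, y), digits = 4, scientific = FALSE)
fixed2
```

Both fixed1 and fixed2 are character vectors, so they are meant for display, not for further calculations.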
Sometimes you need a high level of control over the output, for example when writing data to a file that will then be read by a computer program that wants things just so. For this you can use the sprintf function.
sprintf("%f", pi)
## [1] "3.141593"
Here the f stands for floating point, the most common type. Also note that the result of a call to sprintf is a character vector.
Here are some variations:
sprintf("%.3f", pi) # everything before the ., 3 digits after
## [1] "3.142"
sprintf("%1.0f", pi) # minimum width 1, 0 digits after the .
## [1] "3"
sprintf("%5.1f", pi) # total width 5, 1 digit after the .
## [1] "  3.1"
sprintf("%05.1f", pi) # same but fill with 0
## [1] "003.1"
sprintf("%+f", pi) # all with + in front
## [1] "+3.141593"
sprintf("% f", pi) # space in front
## [1] " 3.141593"
sprintf("%e", pi) # in scientific notation, small e
## [1] "3.141593e+00"
sprintf("%E", pi) # or large E
## [1] "3.141593E+00"
sprintf("%g", 1e6*pi) # picks %e or %f style, 6 significant digits
## [1] "3.14159e+06"
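sprintf is vectorized and can combine several values in one format string, which is handy for writing aligned lines. A short sketch; the item names and amounts below are made up for illustration:

```r
item  <- c("apples", "kiwis")      # hypothetical data
count <- c(12L, 3L)
price <- c(1.5, 0.25)

# %-8s : string, left-justified in a field of width 8
# %3d  : integer in a field of width 3
# %6.2f: floating point, width 6, 2 digits after the decimal point
lines <- sprintf("%-8s %3d %6.2f", item, count, price)
cat(lines, sep = "\n")
nchar(lines)                       # every line has the same width
## [1] 19 19
```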
Here is another example. In Statistics we often find a p value. These should generally be quoted to three digits. But when the p value is very small (roughly below \(10^{-3}\)) R often switches to scientific notation. If you want to avoid that, do this:
x <- 1100; pval <- 0.00123334
c(x, pval)
## [1] 1.100e+03 1.233e-03
sprintf("%.3f", c(x, pval))
## [1] "1100.000" "0.001"
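A common convention is to report very small p values as "<0.001" instead of "0.000". A minimal sketch of such a formatter; fmt_p is a hypothetical helper name, not a base R function:

```r
# fmt_p: hypothetical helper that formats p values to three digits
# and reports anything below 0.001 as "<0.001"
fmt_p <- function(p) ifelse(p < 0.001, "<0.001", sprintf("%.3f", p))

fmt_p(c(0.00123334, 0.00004, 0.25))
## [1] "0.001"  "<0.001" "0.250"
```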
Often the easiest thing to do is to copy the data to the clipboard and then simply scan it into R:
x <- scan("clipboard")
Note: if you are using a Mac you need to use
x <- scan(pipe("pbpaste"))
Use the argument sep=";" to change the symbol that is used as a separator. The default is white space; other common choices include comma, semicolon, and newline (\n).
scan assumes that the data is numeric. If it is not, use the argument what="char".
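To try out these arguments without a clipboard or a file, scan can also read from a string via its text argument. A small sketch; the strings below stand in for data that would normally be copied from somewhere else:

```r
# numeric data separated by semicolons
nums <- scan(text = "3.5;4;10.25", sep = ";", quiet = TRUE)
nums
## [1]  3.50  4.00 10.25

# character data separated by commas
words <- scan(text = "red,green,blue", what = "char", sep = ",", quiet = TRUE)
words
## [1] "red"   "green" "blue"
```

The argument quiet=TRUE only suppresses the message that says how many items were read.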
I need to read data from the clipboard so often that I wrote a little routine for it:
getx <- function(sep="") {
  options(warn=-1) # It might give a warning, I don't care
  x <- scan("clipboard", what="character", sep=sep)
  # always read as character
  if(all(!is.na(as.numeric(x)))) # are all elements numeric?
    x <- as.numeric(x) # then make it numeric
  options(warn=0) # reset warning
  x
}
Notice some features:
the routine always reads the data as a character vector, whether it is character or numeric.
it then tries to turn it into numeric. If that works, fine, otherwise it stays character. This is done with as.numeric(x), which returns NA if it can’t turn an entry into numeric, so is.na(as.numeric(x)) returns TRUE if x can’t be made numeric.
when as.numeric fails to turn a character string into a number R prints a warning. This is good in general to warn you that you are doing something strange. Here, though, it is expected behavior and we don't need the warning. The routine suppresses it by setting options(warn=-1), and sets the option back to the default afterwards.
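A possible variation on the same idea: suppressWarnings silences the warning only for the one expression, so no global option has to be changed and reset. The helper name as_numeric_if_possible is hypothetical, and it works on a character vector that has already been read:

```r
# as_numeric_if_possible: hypothetical helper, a variation on the
# conversion step inside getx
as_numeric_if_possible <- function(x) {
  num <- suppressWarnings(as.numeric(x))  # NA where conversion fails
  if (all(!is.na(num))) num else x        # keep characters otherwise
}

as_numeric_if_possible(c("1", "2.5", "10"))
## [1]  1.0  2.5 10.0
as_numeric_if_possible(c("1", "two", "3"))
## [1] "1"   "two" "3"
```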
If the data is in a stand-alone file saved on your hard drive you can also read it from there:
x <- scan("c:/folder/file.R")
If it is a webpage use
x <- scan(url("http://somesite.html"))
Notice the use of / in writing folders. A single \ does not work, even on Windows, because inside an R string it is the escape character (as in \n). \\ would work but is more work to type!
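The following sketch compares the two ways of writing the same path, using the placeholder path from above. It also uses file.path, which builds a path from its pieces with / as the separator:

```r
p1 <- "c:/folder/file.R"                    # forward slashes
p2 <- "c:\\folder\\file.R"                  # each \\ is stored as one backslash
p3 <- file.path("c:", "folder", "file.R")   # pieces joined with /

nchar(p2)          # the doubled backslashes count once each
## [1] 16
identical(p1, p3)
## [1] TRUE
```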
scan has a lot of arguments:
args(scan)
## function (file = "", what = double(), nmax = -1L, n = -1L, sep = "",
## quote = if (identical(sep, "\n")) "" else "'\"", dec = ".",
## skip = 0L, nlines = 0L, na.strings = "NA", flush = FALSE,
## fill = FALSE, strip.white = FALSE, quiet = FALSE, blank.lines.skip = TRUE,
## multi.line = TRUE, comment.char = "", allowEscapes = FALSE,
## fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
## NULL
the most useful are
Example: A non-standard input format.
Consider the file at
x <- scan("http://academic.uprm.edu/wrolke/esma6835/sales.txt",
  what="char", sep="\n")
for(i in 1:5) cat(x[i],"\n")
## 152 278,11 202 998,05 89 060,44
## 1 803 360,69 24 608,24 49 004,89
## 812 679,54 186 289,80 95 946,42
## 171 266,69 208 691,32 28 503,93
## 56 646,34 41 287,10 15 483,96
These are sales data for some store. We want to find the mean sales amount. So we need to read the data into R, but there are some issues:
the data delimits the decimals European-style, using a comma instead of a period.
for easier readability the millions and the thousands are separated by a space.
so the first number really is 152278.11.
How can we do this? To start we need to read the data as a single character string:
x <- paste0(
  scan("http://academic.uprm.edu/wrolke/esma6835/sales.txt",
    what="char", sep="\n"), collapse="")
Let’s see what we have, at least at the beginning:
substring(x, 1, 100)
## [1] " 152 278,11 202 998,05 89 060,44 1 803 360,69 24 608,24 49 004,89 812 679,"
Next we can replace the , with .:
x <- gsub(",", "\\.", x)
substring(x, 1, 100)
## [1] " 152 278.11 202 998.05 89 060.44 1 803 360.69 24 608.24 49 004.89 812 679."
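Notice the \\ in front of the period. It is not strictly necessary here, because . is a special character only in the pattern (the first argument, a regular expression), not in the replacement; gsub(",", ".", x) gives the same result. If you ever need to match a literal period in the pattern, though, write it as \\.: one backslash escapes the period for the regular expression, and it has to be doubled inside an R string.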
Next notice that the numbers are always separated by at least two spaces, so we can split them up with
x <- strsplit(x, "  ")[[1]] # split on two consecutive spaces
x[1:10]
## [1] "" "" "152 278.11" ""
## [5] " 202 998.05" "" "" "89 060.44"
## [9] "1 803 360.69" ""
Now we can remove any spaces:
x <- gsub(" ", "", x)
x[1:10]
## [1] "" "" "152278.11" "" "202998.05"
## [6] "" "" "89060.44" "1803360.69" ""
and get rid of the "":
x <- x[x!=""]
x[1:10]
## [1] "152278.11" "202998.05" "89060.44" "1803360.69" "24608.24"
## [6] "49004.89" "812679.54" "186289.80" "95946.42" "171266.69"
Almost done:
x <- as.numeric(x)
mean(x)
## [1] 198450
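The steps above can be collected into one small function and tried on data held in memory. This is only a sketch: parse_sales is a hypothetical name, the two input lines are rebuilt from the first six numbers shown above with three spaces between them, and the split uses the regular expression " {2,}" (two or more spaces) so that fewer empty strings appear:

```r
gap <- strrep(" ", 3)   # three spaces between numbers
lines <- c(paste("152 278,11", "202 998,05", "89 060,44", sep = gap),
           paste("1 803 360,69", "24 608,24", "49 004,89", sep = gap))

# parse_sales: hypothetical helper that repeats the steps from the text
parse_sales <- function(lines) {
  x <- paste(lines, collapse = strrep(" ", 2))  # one long string
  x <- gsub(",", ".", x, fixed = TRUE)          # decimal comma -> period
  x <- strsplit(x, " {2,}")[[1]]                # split on runs of 2+ spaces
  x <- gsub(" ", "", x, fixed = TRUE)           # drop thousands separators
  as.numeric(x[x != ""])                        # remove "" and convert
}

sales <- parse_sales(lines)
sales
mean(sales)
```

With fixed=TRUE the comma and the space are treated as plain text, so no escaping is needed.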
The standard command to read data from a table into a data frame is read.table.
x <- read.table("c:/folder/file.R")
It has many of the same arguments as scan (for example sep). It also has the argument header=FALSE. If the first line of your file holds the column names, use header=TRUE. Row names are handled similarly, with the argument row.names.
Example:
Say the following data is saved in a file named student.data.R:
ID | Age | GPA | Gender |
---|---|---|---|
63368 | 22 | 2.9 | Male |
75382 | 22 | 2.6 | Female |
43337 | 18 | 2.7 | Male |
56341 | 18 | 2.8 | Male |
43988 | 19 | 3.9 | Male |
47648 | 21 | 2.6 | Female |
10959 | 19 | 3.3 | Male |
57902 | 25 | 2.6 | Female |
48890 | 20 | 3.6 | Female |
18430 | 22 | 3.2 | Female |
Now we can use
read.table("c:/folder/student.data.R",
header=TRUE, row.names = 1)
The row.names=1 tells R to use the first column as row names.
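Like scan, read.table has a text argument, so the effect of header and row.names can be checked without a file. A small sketch using the first three students from the table above, assuming the values are separated by white space:

```r
txt <- c("ID Age GPA Gender",
         "63368 22 2.9 Male",
         "75382 22 2.6 Female",
         "43337 18 2.7 Male")

# each element of txt is treated as one line of the file
students <- read.table(text = txt, header = TRUE, row.names = 1)
students
rownames(students)      # the ID column has become the row names
## [1] "63368" "75382" "43337"
names(students)         # and is no longer a regular column
## [1] "Age"    "GPA"    "Gender"
```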
Say you have a few data sets and routines you want to send to someone else. The easiest thing to do is use dump and source.
dump(c("data1", "data2", "fun1"), "c:/folder/mystuff.R")
Now to read in the stuff simply use
source("c:/folder/mystuff.R")
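Here is a self-contained round trip that can be run anywhere. It is a sketch under a few assumptions: a temporary file replaces the placeholder path, the objects data1 and fun1 are made up, and two environments (src and dst) are used only so that the example does not touch your workspace:

```r
src <- new.env()                              # plays the role of the sender
src$data1 <- c(1.5, 2.5, 4)
src$fun1  <- function(v) sum(v) / length(v)

tmp <- tempfile(fileext = ".R")               # stands in for c:/folder/mystuff.R
dump(c("data1", "fun1"), file = tmp, envir = src)

dst <- new.env()                              # plays the role of the receiver
source(tmp, local = dst)                      # recreate the objects in dst
ls(dst)
## [1] "data1" "fun1"
dst$fun1(dst$data1)
```

The file written by dump is ordinary R code, so it can also be opened and checked in a text editor.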
I often have to transfer stuff from one R project to another, so I wrote myself these two routines:
dp <- function (x) dump(x, "clipboard")
sc <- function () source("clipboard")
There are routines to read all sorts of file formats. The most important one is likely read.csv, which can read Excel files saved in the comma delimited format.
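A quick way to see what read.csv expects is to write a small data frame with write.csv and read it back. A minimal sketch; the data frame reuses two students from the earlier table and the file is a temporary one:

```r
df <- data.frame(ID = c(63368, 75382), Age = c(22, 22), GPA = c(2.9, 2.6))

tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE)   # comma delimited, header line first
readLines(tmp)                          # the raw text of the file

back <- read.csv(tmp)                   # header = TRUE is the default here
names(back)
## [1] "ID"  "Age" "GPA"
```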
There are a number of packages written to help with data I/O. We will discuss some of them later.
R can also be used to create, copy, move and delete files and folders on your hard drive. The routines are
dir.create(…)
dir.exists(…)
file.create(…)
file.exists(…)
file.remove(…)
file.rename(from, to)
file.append(file1, file2)
file.copy(from, to)
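A short sketch of how these routines work together. Everything happens in a fresh temporary folder, so nothing on your drive is touched; the file names are made up, and unlink (not in the list above) is used at the end to remove the folder:

```r
base <- tempfile("io_demo")           # a fresh path inside tempdir()
dir.create(base)
f1 <- file.path(base, "a.txt")
f2 <- file.path(base, "b.txt")
f3 <- file.path(base, "c.txt")

file.create(f1)                       # empty file a.txt
cat("hello\n", file = f1)             # write one line to it
file.copy(f1, f2)                     # b.txt is a copy of a.txt
file.append(f1, f2)                   # a.txt now holds two lines
file.rename(f2, f3)                   # b.txt becomes c.txt

listing <- dir(base)                  # "a.txt" "c.txt"
n_lines <- length(readLines(f1))      # 2
had_b   <- file.exists(f2)            # FALSE, it was renamed

file.remove(f1, f3)                   # delete the two files
unlink(base, recursive = TRUE)        # delete the folder itself
gone <- !dir.exists(base)             # TRUE
```

Each of these functions returns TRUE or FALSE for success or failure, which is worth checking in a script.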
You can also get a listing of the files in a folder:
head(dir("c:/R"))
## [1] "bin" "CHANGES" "COPYING" "doc" "etc" "include"
For the current working directory (by default the folder from which R was started) use
head(dir(getwd()))
## [1] "_book" "_bookdown.yml" "_bookdown_files"
## [4] "_main.Rmd" "_main_files" "Additional Material"