WARNING: In what follows I will only discuss a FEW of the issues involved with environments, and I will simplify them greatly. For a much more detailed discussion see http://adv-r.had.co.nz/Environments.html.
Let’s start with this:
search()
## [1] ".GlobalEnv" "package:knitr" "package:stats"
## [4] "package:graphics" "package:grDevices" "package:utils"
## [7] "package:datasets" ".MyEnv" "package:moodleR"
## [10] "package:wolfr" "package:shiny" "package:Rcpp"
## [13] "package:grid" "package:ggplot2" "package:methods"
## [16] "Autoloads" "package:base"
These are the environments currently attached to my R session. In some ways you can think of this as a folder tree, like:
paste0(search(), "/", collapse="")
## [1] ".GlobalEnv/package:knitr/package:stats/package:graphics/package:grDevices/package:utils/package:datasets/.MyEnv/package:moodleR/package:wolfr/package:shiny/package:Rcpp/package:grid/package:ggplot2/package:methods/Autoloads/package:base/"
This has the following effect. Say you type
x <- runif(10)
mean(x)
## [1] 0.4867
What has R just done? First it created an object called “x”, and stored it in the folder “.GlobalEnv”. We can check:
ls()
## [1] "x"
Next R starts looking for an object called mean. To do that it again first looks into “.GlobalEnv”, but we already know it is not there.
Next R looks into “package:knitr”, which we can do with
ls(2, pattern="mean")
## character(0)
and again no luck. This continues until we get to “package:base”:
ls(17, pattern="mean")
## [1] "mean" "mean.Date" "mean.default" "mean.difftime"
## [5] "mean.POSIXct" "mean.POSIXlt"
and there it is!
If an object is not in any of these environments R will give an error:
ddgdg
## Error in eval(expr, envir, enclos): object 'ddgdg' not found
This makes it clear that a routine that is part of a library can only be found if that library is loaded.
One difference between a folder tree and this is that R starts looking at the top (in .GlobalEnv) and then works its way down.
There is an easy way to find out in which environment an object is located, with the routine where in the library pryr:
library(pryr)
where("x")
## <environment: R_GlobalEnv>
where("mean")
## <environment: base>
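If pryr is not installed, the same lookup can be sketched in base R. The helper name find_on_search_path is made up for this illustration; it simply walks search() from top to bottom and stops at the first entry that holds the name, which is the order described above. (Base R also has find(), which does something similar.)

```r
# Hypothetical helper: walk the search path in order and return the
# first entry that holds an object with the given name.
find_on_search_path <- function(name) {
  for (entry in search()) {
    env <- as.environment(entry)
    # inherits = FALSE: look only in this one environment, not its parents
    if (exists(name, envir = env, inherits = FALSE)) {
      return(entry)
    }
  }
  NULL  # not found anywhere on the search path
}

where_mean <- find_on_search_path("mean")   # "package:base" unless something masks mean
where_none <- find_on_search_path("ddgdg")  # NULL, no such object
```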
Another important consequence of this is that R stops when it finds an object, even if the one you want is in a later environment. Here is an example:
my.data <- data.frame(x=1:10)
attach(my.data)
## The following object is masked _by_ .GlobalEnv:
##
## x
mean(x)
## [1] 0.4867
rm(x)
mean(x)
## [1] 5.5
search()[1:3]
## [1] ".GlobalEnv" "my.data" "package:pryr"
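So here is what happens: after attach there are two objects called x, one in .GlobalEnv and one in my.data, which sits at position 2 of the search path. The first mean(x) finds the x in .GlobalEnv and stops there, so we get the mean of the random numbers. After rm(x) that one is gone, so R moves on, finds the x in my.data, and we get 5.5.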
Notice that R gives a warning when we attach the data frame, telling us that there are now two x’s.
The rules that R uses to find things are called scoping rules.
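If you want to be sure which x is used, you can say so explicitly instead of relying on the search order. Here is a small sketch; the numbers are made up, and local() is only used so that the example does not leave objects behind in .GlobalEnv.

```r
res <- local({
  x <- c(0.2, 0.4)                  # a plain vector called x
  my.data <- data.frame(x = 1:10)   # a data frame with a column called x
  c(
    plain  = mean(x),               # the vector: 0.3
    dollar = mean(my.data$x),       # the column, named explicitly: 5.5
    with   = with(my.data, mean(x)) # inside with() the column is found first: 5.5
  )
})
res
```

Neither my.data$x nor with() needs attach, so there is nothing to detach afterwards.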
Let’s clean up before we continue:
detach(2)
How does this work when we run a function? To find out we can write a little function:
show.env <- function(){
x <- 1
print(list(ran.in=environment(),
parent=parent.env(environment()),
objects=ls.str(environment())))
}
show.env()
## $ran.in
## <environment: 0x0000000018b40920>
##
## $parent
## <environment: R_GlobalEnv>
##
## $objects
## x : num 1
This tells us that R ran the function in an environment with a very strange name. The name is just the memory address of the environment, and R creates a fresh one every time the function is called. We can also see that its parent environment was .GlobalEnv and that x is an object in it.
This means that any object created inside a function is known only there; it does not overwrite any objects outside the function. One consequence is that if we need to create some temporary objects we can use simple names like x or i, even if these already exist outside of the function.
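A tiny illustration of this point (the function name square_plus_one is just an example):

```r
x <- 10                        # an x in the global environment

square_plus_one <- function(v) {
  x <- v^2                     # this x exists only while the function runs
  x + 1
}

result  <- square_plus_one(2)  # 5
x_after <- x                   # still 10, the global x was not touched
```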
Now where does show.env live?
environment(show.env)
## <environment: R_GlobalEnv>
Obvious, because that is where we created it!
How about a function inside a function?
show.env <- function() {
f <- function(){
print(list(ran.in=environment(),
parent=parent.env(environment()),
objects=ls.str(environment())))
}
f()
x <- 1
print(list(ran.in=environment(),
parent=parent.env(environment()),
objects=ls.str(environment())))
}
show.env()
## $ran.in
## <environment: 0x0000000017139948>
##
## $parent
## <environment: 0x0000000017139830>
##
## $objects
##
## $ran.in
## <environment: 0x0000000017139830>
##
## $parent
## <environment: R_GlobalEnv>
##
## $objects
## f : function ()
## x : num 1
As we expect, the parent environment of f is the runtime environment of show.env.
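One practical consequence, sketched with made-up names: the inner function can use objects of the outer function, because R continues the search in the parent environment when it does not find a name locally.

```r
outer_fun <- function() {
  y <- 2
  inner_fun <- function() {
    # y is not defined in here, so R looks in the parent environment,
    # which is the environment of this call to outer_fun
    y * 10
  }
  inner_fun()
}

outer_result <- outer_fun()   # 20
```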
Sometimes we want to save an object created inside a function to the global environment:
f <- function() {
a<-1
assign("a", a, envir=.GlobalEnv)
}
ls()
## [1] "f" "my.data" "show.env"
f()
ls()
## [1] "a" "f" "my.data" "show.env"
One place where this is useful is if you have a routine like a simulation that runs for a long time and you want to save intermediate results.
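Here is a rough sketch of that idea. The names run_sim, sim_partial and target are invented for the example, and mean(runif(100)) stands in for whatever the slow step really is. By default the partial results go to .GlobalEnv; in the demo call I pass a scratch environment instead so that nothing is left behind.

```r
run_sim <- function(n_runs, target = globalenv()) {
  results <- numeric(n_runs)
  for (i in seq_len(n_runs)) {
    results[i] <- mean(runif(100))   # stand-in for one slow simulation step
    # save everything finished so far, so it survives an interruption
    assign("sim_partial", results[seq_len(i)], envir = target)
  }
  invisible(results)
}

store <- new.env()                   # scratch environment for the demo
run_sim(5, target = store)
n_saved <- length(store$sim_partial) # 5
```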
As we just saw, environments can come about by loading libraries, by attaching data frames (also lists) and (at least for a short while) by running a function. In fact we can also make our own:
test_env <- new.env()
attach(test_env)
search()[1:3]
## [1] ".GlobalEnv" "test_env" "package:pryr"
Now we can add stuff to our environment using the list notation:
test_env$a <- 1
test_env$fun <- function(x) x^2
ls(2)
## character(0)
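Where are a and fun? Oops, attach made a copy of test_env at the time we called it, so anything added afterwards is not in the attached copy. We have to attach test_env again: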
attach(test_env)
ls(2)
## [1] "a" "fun"
search()[1:3]
## [1] ".GlobalEnv" "test_env" "test_env"
Note that we had to attach the environment again for the two new objects to become available, but now test_env is on the search path twice. It would have been better to detach it first.
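The reason is that attach does not put the environment itself on the search path but a copy of its contents. A short sketch with made-up names (demo_env, demo_copy) shows this:

```r
demo_env <- new.env()
demo_env$first_obj <- 1
attach(demo_env, name = "demo_copy")        # copies first_obj onto the search path
demo_env$second_obj <- 2                    # goes into demo_env only
visible <- ls("demo_copy")                  # "first_obj"; second_obj is not in the copy
detach("demo_copy", character.only = TRUE)  # clean up
```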
Actually, let’s detach both copies completely:
detach(2)
detach(2)
search()
## [1] ".GlobalEnv" "package:pryr" "package:knitr"
## [4] "package:stats" "package:graphics" "package:grDevices"
## [7] "package:utils" "package:datasets" ".MyEnv"
## [10] "package:moodleR" "package:wolfr" "package:shiny"
## [13] "package:Rcpp" "package:grid" "package:ggplot2"
## [16] "package:methods" "Autoloads" "package:base"
Why would you want to make a new environment? I have one called .MyEnv that is created at startup. It has a set of small functions that I like to have available at all times but I don’t want to “see” them when I run ls().
ls(".MyEnv")
## [1] "dp" "h" "hh" "ht" "ip" "mcat" "s" "sc" "sr" "trw"
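How .MyEnv is built is not shown here. One way such an environment could be set up, for example in a startup file, is sketched below; the names helper_env and sq_demo are invented for the illustration.

```r
# Hypothetical startup code; the helper sq_demo is invented for this sketch
helper_env <- new.env()
helper_env$sq_demo <- function(x) x^2
attach(helper_env, name = ".helper_env")   # copy the helpers onto the search path
rm(helper_env)                             # the attached copy is all we need

in_global <- "sq_demo" %in% ls(globalenv(), all.names = TRUE)  # FALSE, not listed by ls()
sq_value  <- sq_demo(3)                                        # 9, but still usable
```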
If an object is part of a package that is installed on your computer you can also use it without loading the package with the :: operator. As an example consider the package mailR, which has the function send.mail to send emails from within R:
args(mailR::send.mail)
## function (from, to, subject = "", body = "", encoding = "iso-8859-1",
## html = FALSE, inline = FALSE, smtp = list(), authenticate = FALSE,
## send = TRUE, attach.files = NULL, debug = FALSE, ...)
## NULL
Some R texts suggest avoiding attach altogether and always using ::. The reason is that what works on your computer with its specific setup may not work on someone else’s. My preference is to use :: if I use a function from a package just once but to attach the package if I use the function several times.
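A small sketch of both ideas, using stats only because it comes with every R installation; the second package name is deliberately one that should not exist.

```r
# Works whether or not the package is attached, as long as it is installed
med <- stats::median(c(1, 3, 5))   # 3

# requireNamespace() tells us whether a package is installed, without attaching it
has_pkg <- requireNamespace("notARealPackage123", quietly = TRUE)   # FALSE
```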
As we have already seen, packages/libraries are at the heart of R. Mostly it is where we can find routines already written for various tasks. The main repository is at https://cran.r-project.org/web/packages/. Currently there are over 14500!
In fact, that is a problem: for any one task there are likely a dozen packages that would work. Finding the one that works for you is not easy!
Once you decide which one you want you can download it by clicking on the Packages tab in RStudio, selecting Install and typing the name. Occasionally RStudio won’t find it; then you can do it manually:
install.packages("pckname")
Useful arguments are
Notice that this only downloads and installs the package; you still have to load it into R:
library(pckname)
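The two steps can be combined so that a package is installed only if it is missing. This is just a sketch; the helper name load_or_install is made up, and I call it with stats because that is always there, so nothing is downloaded.

```r
# Hypothetical helper: install a package only if needed, then attach it
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)            # only runs if the package is not installed
  }
  library(pkg, character.only = TRUE)
}

load_or_install("stats")
stats_attached <- "package:stats" %in% search()
```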
If you install a new version of R you want to update all the packages:
update.packages(ask=FALSE)
Note that sometimes this fails after a major upgrade, and you have to update each package one by one. The last time this happened was after the upgrade from Ver 3.4.0 to 3.5.0.
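If you have to go one by one, it helps to know which packages were built under an older version of R. A possible sketch is below; treating "built under an older R" as the criterion is my assumption, and the reinstall line is left as a comment so that nothing is downloaded by accident.

```r
# Which installed packages were built under an older version of R?
pkgs  <- installed.packages()[, c("Package", "Built"), drop = FALSE]
built <- package_version(pkgs[, "Built"], strict = FALSE)
stale <- pkgs[which(built < getRversion()), "Package"]

# Candidates for reinstalling, one at a time if necessary:
# install.packages(stale[1])
```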
It has been said that as soon as your project has two functions, make a library. While that might be a bit extreme, putting a collection of routines and data sets into a common library certainly is worthwhile. Here are the main steps to do so:
First we need a couple of libraries. If you are using RStudio (and you really should when creating a library), you likely have them already. If not get them as usual:
## install.packages("devtools")
library(devtools)
## devtools::install_github("klutometis/roxygen")
library(roxygen2)
First let’s make a new folder for our project and a folder called R inside of it:
create("../testlib")
Open an explorer window and go to the folder testlib
Open the file DESCRIPTION. It looks like this:
Package: testlib
Title: What the Package Does (one line, title case)
Version: 0.0.0.9000
Authors@R: person("First", "Last", email = "first.last@example.com", role = c("aut", "cre"))
Description: What the package does (one paragraph).
Depends: R (>= 3.5.0)
License: What license is it under?
Encoding: UTF-8
LazyData: true
and so we can change it to
Package: testlib
Title: Test Library
Version: 0.0.0.9000
Authors@R: person("W", "R", email = "w.r@gmail.com", role = c("aut", "cre"))
Description: Lets us learn how to make our own libraries
Depends: R (>= 3.5.0)
License: Free
Encoding: UTF-8
LazyData: true
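Each field has to be on its own line. A quick way to check how R reads such a file is read.dcf. The sketch below writes a shortened copy of the fields to a temporary file just for the demonstration; on the real project you would call read.dcf on the DESCRIPTION file itself.

```r
desc_lines <- c(
  "Package: testlib",
  "Title: Test Library",
  "Version: 0.0.0.9000",
  "Depends: R (>= 3.5.0)",
  "Encoding: UTF-8"
)
desc_file <- tempfile("DESCRIPTION")   # temporary stand-in for the real file
writeLines(desc_lines, desc_file)

desc        <- read.dcf(desc_file)     # character matrix, one column per field
fields      <- colnames(desc)
pkg_version <- desc[1, "Version"]
```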
Next we have to put the functions we want to have in our library into the R folder:
f1 <- function(x) x^2
f2 <- function(x) sqrt(x)
dump("f1", "../testlib/R/f1.R")
dump("f2", "../testlib/R/f2.R")
Let’s change the working directory to testlib and check what we have in there:
setwd("../testlib")
dir()
## [1] "Data" "DESCRIPTION" "man" "NAMESPACE"
## [5] "R" "testlib.Rproj"
dir("R")
## [1] "f1.R" "f2.R"
Often we also want some data sets as part of the library:
test.x <- 1:10
test.y <- c(2, 3, 7)
use_data(test.x, test.y)
dir("Data")
## [1] "test.x.rda" "test.y.rda"
Notice that this saves the data in the .rda format, which is good because this format can be read by R very fast.
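The same format can be written and read by hand with save and load from base R. A small sketch, using a temporary file and a separate environment so nothing in the project is touched:

```r
test.x   <- 1:10
rda_file <- tempfile(fileext = ".rda")
save(test.x, file = rda_file)                # write the object in .rda format

loaded    <- new.env()
obj_names <- load(rda_file, envir = loaded)  # load returns the names of the restored objects
```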
In the next step we need to add comments to the functions.
Eventually these are the things that will appear in the help files. For f1 they are
#' f1 Function
#'
#' This function finds the square.
#' @param x a number
#' @keywords square
#' @return a numeric value
#' @export
#' @examples
#' f1(2)
and the corresponding one for f2.
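Assuming f2 is documented the same way, the complete file R/f2.R might look like this (the wording of the comments is of course up to you):

```r
#' f2 Function
#'
#' This function finds the square root.
#' @param x a non-negative number
#' @keywords root
#' @return a numeric value
#' @export
#' @examples
#' f2(4)
f2 <- function(x) sqrt(x)
```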
Now we need to process the documentation:
document()
One step left. You need to do this one from the parent working directory that contains the testlib folder.
setwd("..")
install("testlib")
Note that if you now look into the folder where R keeps its libraries (C:/R/library on my computer; .libPaths() shows yours) there will be a folder testlib, which is this library.
Let’s check:
library(testlib)
search()[1:4]
ls(2)
f1(2)
f2(2)
And that’s it!
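Now there will be two folders with the name testlib: the source folder that we made with create and filled with our files, and the folder inside R’s library folder that install made.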
These two are NOT the same and only the second one is an actual R library. In essence the install command takes the first folder and turns it into a library that it puts in the place where R can find it.
I have several libraries that I often change, so I wrote a small routine to make it easy:
#' make.library
#'
#' This function creates a library called name in folder
#' @param name name of library
#' @param folder folder with library files
#' @export
#' @examples
#' make.library("moodlr", folder="c:/files")
make.library <- function (name, folder)
{
library(devtools)
library(roxygen2)
olddir <- getwd()
on.exit(setwd(olddir)) # go back, even if a step below fails
setwd(folder) # go where you need to be
document() # make lib
setwd("..")
install(name)
}
So when I make a change to one of the routines in (say) wolfr all I need to do is run
make.library(name="wolfr",
folder="c:/wolfgang/R/mylibs")
Note that ultimately a library is a folder. You can send someone a library by sending them the folder (usually as a compressed zip file).
The easiest way to check whether there are any issues with your package is to run
devtools::check("libname")
This will give a lot of info on your package, including any notes, warnings and errors:
Errors: are just that, something that would likely prevent the package from running on someone else’s computer, even if it runs on yours.
Warnings: things that could be problematic.
Notes: things that you could leave alone but should consider fixing.
All of these should be cleaned up before uploading the library.
If the library is just for yourself (or your friends) this is not necessary but it is still highly recommended. A “clean” library typically will work much better!
If you want to upload your library to CRAN, you need to do some additional checking. To start, CRAN does not allow libraries that yield errors or warnings, and sometimes even notes, so those need to be fixed. If there is a Note that you can’t or don’t want to fix you need to let the CRAN reviewer know why. This is done by creating a file called cran-comments.md and simply explaining why you don’t want to change what is causing the Note.
Packages need to run on different operating systems. Here are some checks you can do to assure that:
devtools::check_rhub("libname")
devtools::check_win_devel("libname")
Finally you are ready to upload your package to CRAN. This is done by running
release()
You will be asked a number of questions, and if you can answer yes to all of them it will upload the package. Then someone (an actual person!) will have a look at it, and if there are any issues send you an email. If not you are done!