Environments and Libraries

WARNING: In what follows I will only discuss a FEW of the issues involved with environments, and I will simplify them greatly. For a much more detailed discussion see http://adv-r.had.co.nz/Environments.html.

Let’s start with this:

search()
##  [1] ".GlobalEnv"        "package:knitr"     "package:stats"    
##  [4] "package:graphics"  "package:grDevices" "package:utils"    
##  [7] "package:datasets"  ".MyEnv"            "package:moodleR"  
## [10] "package:wolfr"     "package:shiny"     "package:Rcpp"     
## [13] "package:grid"      "package:ggplot2"   "package:methods"  
## [16] "Autoloads"         "package:base"

These are the environments currently attached to my R session; together they form the search path. In some ways you can think of this as a folder tree, like:

paste0(search(), "/", collapse="")
## [1] ".GlobalEnv/package:knitr/package:stats/package:graphics/package:grDevices/package:utils/package:datasets/.MyEnv/package:moodleR/package:wolfr/package:shiny/package:Rcpp/package:grid/package:ggplot2/package:methods/Autoloads/package:base/"

This has the following effect. Say you type

x <- runif(10)
mean(x)
## [1] 0.4867

What has R just done? First it created an object called “x”, and stored it in the folder “.GlobalEnv”. We can check:

ls()
## [1] "x"

Next R starts looking for an object called mean. To do that it again first looks into “.GlobalEnv”, but we already know it is not there.

Next R looks into “package:knitr”, which we can do with

ls(2, pattern="mean")
## character(0)

and again no luck. This continues until we get to “package:base”:

ls(17, pattern="mean")
## [1] "mean"          "mean.Date"     "mean.default"  "mean.difftime"
## [5] "mean.POSIXct"  "mean.POSIXlt"

and there it is!

If an object is not in any of these environments R will give an error:

ddgdg
## Error in eval(expr, envir, enclos): object 'ddgdg' not found

This makes it clear that a routine that is part of a library can only be found if that library is loaded.

One difference between a folder tree and this is that R starts looking at the top (in .GlobalEnv) and then works its way down.

There is an easy way to find out in which environment an object is located, with the routine where in the library pryr:

library(pryr)
where("x")
## <environment: R_GlobalEnv>
where("mean")
## <environment: base>
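
To see what such a search involves, here is a minimal sketch in base R that walks down the search path and stops at the first hit. The name find.on.path is made up for this illustration, and this is not necessarily how pryr implements where:

```r
# find.on.path is a hypothetical helper, not part of base R or pryr
find.on.path <- function(name) {
  for (i in seq_along(search())) {
    # inherits = FALSE: look only in this one environment, not in its parents
    if (exists(name, envir = as.environment(i), inherits = FALSE)) {
      return(search()[i])
    }
  }
  NA_character_  # not found anywhere on the search path
}

find.on.path("mean")    # the search path entry that holds mean
find.on.path("ddgdg")   # NA, because no such object exists
```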

Another important consequence of this is that R stops when it finds an object, even if the one you want is in a later environment. Here is an example:

my.data <- data.frame(x=1:10)
attach(my.data)
## The following object is masked _by_ .GlobalEnv:
## 
##     x
mean(x)
## [1] 0.4867
rm(x)
mean(x)
## [1] 5.5
search()[1:3]
## [1] ".GlobalEnv"   "my.data"      "package:pryr"

So here is what happens:

  • the first time we call mean(x) R finds an x (the original one) in .GlobalEnv, and so calculates its mean.
  • after removing this x, the next time we call mean(x) it looks into the data frame my.data, finds a variable called x, and now calculates its mean.

Notice that R prints a message when we attach the data frame, telling us that there are now two x’s and that the one in .GlobalEnv takes precedence.
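
One way to avoid this kind of surprise is not to attach the data frame at all and instead to tell R explicitly where to look, for example with with() or the $ notation. A small sketch (it starts from scratch, so it does not depend on what is currently attached):

```r
x <- runif(10)                    # an x in .GlobalEnv
my.data <- data.frame(x = 1:10)   # another x inside a data frame

with(my.data, mean(x))   # uses the x in my.data, so this is 5.5
mean(my.data$x)          # same thing
```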

The rules that R uses to find things are called scoping rules.

Let’s clean up before we continue:

detach(2)

Runtime environments

How does this work when we run a function? To find out we can write a little function:

show.env <- function(){
  x <- 1
  print(list(ran.in=environment(),
       parent=parent.env(environment()),
       objects=ls.str(environment())))
  
}
show.env()
## $ran.in
## <environment: 0x0000000018b40920>
## 
## $parent
## <environment: R_GlobalEnv>
## 
## $objects
## x :  num 1

This tells us that R ran the function in an environment without a name; it is identified only by its memory address, because R creates a fresh environment every time a function is called. We can also see that its parent environment was .GlobalEnv and that x is an object in it.

This means that any object created inside a function is only known there; it does not overwrite any objects outside the function. One consequence is that if we need to create some temporary objects we can use simple names like x or i, even if these already exist outside of the function.
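
Here is a small sketch of this; the function name use.local.x is made up for the illustration:

```r
x <- 10                  # an x in .GlobalEnv

use.local.x <- function() {
  x <- 1                 # a new x that lives only in the runtime environment
  x + 1
}

use.local.x()            # 2
x                        # still 10
```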

Now where does show.env live?

environment(show.env)
## <environment: R_GlobalEnv>

Obvious, because that is where we created it!

How about a function inside a function?

show.env <- function() {
  f <- function(){
  print(list(ran.in=environment(),
       parent=parent.env(environment()),
       objects=ls.str(environment())))
  
  }
  f()
  x <- 1
  print(list(ran.in=environment(),
       parent=parent.env(environment()),
       objects=ls.str(environment())))
  
}
show.env()
## $ran.in
## <environment: 0x0000000017139948>
## 
## $parent
## <environment: 0x0000000017139830>
## 
## $objects
## 
## $ran.in
## <environment: 0x0000000017139830>
## 
## $parent
## <environment: R_GlobalEnv>
## 
## $objects
## f : function ()  
## x :  num 1

As we expect, the parent environment of f is the runtime environment of show.env.
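
A practical consequence is that an inner function can use objects defined in the function around it, because R continues its search in the parent environment. A small sketch with the made-up names outer.fun and inner.fun:

```r
outer.fun <- function() {
  y <- 2
  inner.fun <- function() {
    # y is not defined here, so R looks in the parent environment,
    # which is the runtime environment of outer.fun
    y * 10
  }
  inner.fun()
}

outer.fun()   # 20
```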

Sometimes we want to save an object created inside a function to the global environment:

f <- function() {
  a<-1
  assign("a", a, envir=.GlobalEnv)
  
}
ls()
## [1] "f"        "my.data"  "show.env"
f()
ls()
## [1] "a"        "f"        "my.data"  "show.env"

One place where this is useful is if you have a routine like a simulation that runs for a long time and you want to save intermediate results.
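
A minimal sketch of that idea follows. The names run.sim and sim.partial are made up, and the call to runif is just a stand-in for whatever slow calculation the simulation does:

```r
# run.sim and sim.partial are hypothetical names for this illustration
run.sim <- function(n.steps) {
  results <- numeric(n.steps)
  for (i in seq_len(n.steps)) {
    results[i] <- mean(runif(100))    # stand-in for a slow calculation
    # copy what we have so far into .GlobalEnv
    assign("sim.partial", results[seq_len(i)], envir = .GlobalEnv)
  }
  results
}

out <- run.sim(5)
length(sim.partial)   # 5
```

If the routine is interrupted, sim.partial still holds everything that was finished up to that point.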


As we just saw, environments can come about by loading libraries, by attaching data frames (also lists) and (at least for a short while) by running a function. In fact we can also make our own:

test_env <- new.env()
attach(test_env)
search()[1:3]
## [1] ".GlobalEnv"   "test_env"     "package:pryr"

Now we can add stuff to our environment using the list notation:

test_env$a <- 1
test_env$fun <- function(x) x^2
ls(2)
## character(0)

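Where are a and fun? Oops, attach made a copy of test_env as it was when we attached it, so objects added later do not show up on the search path. We have to attach it again: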

attach(test_env)
ls(2)
## [1] "a"   "fun"
search()[1:3]
## [1] ".GlobalEnv" "test_env"   "test_env"

Note that we had to attach the environment again for the two new objects to be found, but now there are two copies of it on the search path. It would have been better to detach the old one first.
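
If all we want is to use the objects in an environment, we do not need attach at all. Here is a small sketch that builds the environment from scratch and then reaches into it directly:

```r
test_env <- new.env()
test_env$a <- 1
test_env$fun <- function(x) x^2

ls(test_env)                  # "a" "fun"
test_env$fun(3)               # 9
with(test_env, fun(a + 1))    # 4, both fun and a are looked up in test_env
```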

Actually, let’s detach both copies completely:

detach(2)
detach(2)
search()
##  [1] ".GlobalEnv"        "package:pryr"      "package:knitr"    
##  [4] "package:stats"     "package:graphics"  "package:grDevices"
##  [7] "package:utils"     "package:datasets"  ".MyEnv"           
## [10] "package:moodleR"   "package:wolfr"     "package:shiny"    
## [13] "package:Rcpp"      "package:grid"      "package:ggplot2"  
## [16] "package:methods"   "Autoloads"         "package:base"

Why would you want to make a new environment? I have one called .MyEnv that is created at startup. It has a set of small functions that I like to have available at all times but I don’t want to “see” them when I run ls().

ls(".MyEnv")
##  [1] "dp"   "h"    "hh"   "ht"   "ip"   "mcat" "s"    "sc"   "sr"   "trw"

If an object is part of a package that is installed on your computer you can also use it without loading the package with the :: operator. As an example consider the package mailR, which has the function send.mail to send emails from within R:

args(mailR::send.mail)
## function (from, to, subject = "", body = "", encoding = "iso-8859-1", 
##     html = FALSE, inline = FALSE, smtp = list(), authenticate = FALSE, 
##     send = TRUE, attach.files = NULL, debug = FALSE, ...) 
## NULL

Some R texts suggest avoiding attach altogether and always using ::. The reason is that what works on your computer with its specific setup may not work on someone else’s. My preference is to use :: if I use a function from a package just once, but to attach the package if I use the function several times.

Packages

As we have already seen, packages/libraries are at the heart of R. Mostly it is where we can find routines already written for various tasks. The main repository is at https://cran.r-project.org/web/packages/. Currently there are over 14500!

In fact, that is a problem: for any one task there are likely a dozen packages that would work. Finding the one that works for you is not easy!

Once you decide which one you want you can download it by clicking on the Packages tab in RStudio, selecting Install and typing the name. Occasionally RStudio won’t find it; then you can do it manually:

install.packages("pckname")

Useful arguments are

  • lib: the folder on your hard drive where you want to store the package (usually c:/R/lib).
  • repos: the place on the internet where the package is located (if it is not given, R pops up a list to choose from).
  • dependencies=TRUE will also download any additional packages required.

Notice that this only downloads and installs the package; you still have to load it into R:

library(pckname)
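
In scripts that are meant to run on other computers these two steps are often combined. Here is a minimal sketch; ensure.package is a made-up name, and it assumes that a CRAN mirror is available if the package has to be installed:

```r
# ensure.package is a hypothetical helper, not part of R
ensure.package <- function(pkg) {
  # requireNamespace returns FALSE if the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
  # character.only = TRUE because pkg holds the name as a string
  library(pkg, character.only = TRUE)
}

ensure.package("stats")   # already installed, so it is only attached
```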

If you install a new version of R you will want to update all the packages:

update.packages(ask=FALSE)

Note that sometimes after a major upgrade this fails, and you have to update each package one by one. The last time this happened was after the upgrade from Ver 3.4.0 to 3.5.0.

Creating your own library

It has been said that as soon as your project has two functions, make a library. While that might be a bit extreme, putting a collection of routines and data sets into a common library certainly is worthwhile. Here are the main steps to do so:

First we need a couple of libraries. If you are using RStudio (and you really should when creating a library), you likely have them already. If not get them as usual:

## install.packages("devtools")
library(devtools)
## install.packages("roxygen2")
library(roxygen2)

Next let’s make a new folder for our project, with a folder called R inside of it:

create("../testlib")

Open an explorer window and go to the folder testlib.

Open the file DESCRIPTION. It looks like this:


Package: testlib
Title: What the Package Does (one line, title case)
Version: 0.0.0.9000
Authors@R: person("First", "Last", email = "", role = c("aut", "cre"))
Description: What the package does (one paragraph).
Depends: R (>= 3.5.0)
License: What license is it under?
Encoding: UTF-8
LazyData: true


and so we can change it to


Package: testlib
Title: Test Library
Version: 0.0.0.9000
Authors@R: person("W", "R", email = "", role = c("aut", "cre"))
Description: Lets us learn how to make our own libraries.
Depends: R (>= 3.5.0)
License: Free
Encoding: UTF-8
LazyData: true


Next we have to put the functions we want to have in our library into the R folder:

f1 <- function(x) x^2
f2 <- function(x) sqrt(x)
dump("f1", "../testlib/R/f1.R")
dump("f2", "../testlib/R/f2.R")

Let’s change the working directory to testlib and check what we have in there:

setwd("../testlib")
dir()
## [1] "Data"          "DESCRIPTION"   "man"           "NAMESPACE"    
## [5] "R"             "testlib.Rproj"
dir("R")
## [1] "f1.R" "f2.R"

Often we also want some data sets as part of the library:

test.x <- 1:10
test.y <- c(2, 3, 7)
use_data(test.x, test.y)
dir("Data")
## [1] "test.x.rda" "test.y.rda"

Notice that this saves the data in the .rda format, which is good because this format can be read by R very fast.

In the next step we need to add comments to the functions.

Eventually these are the things that will appear in the help files. For f1 they are

#' f1 Function  
#'  
#' This function finds the square.  
#' @param x a number  
#' @keywords square  
#' @return a numeric value
#' @export  
#' @examples  
#' f1(2)

and the corresponding one for f2.
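
For f2 the complete file R/f2.R might look like this. The wording of the comments is of course up to you; this is just one possibility:

```r
#' f2 Function
#'
#' This function finds the square root.
#' @param x a non-negative number
#' @keywords square root
#' @return a numeric value
#' @export
#' @examples
#' f2(4)
f2 <- function(x) sqrt(x)
```

Note that the comment block sits directly above the function it describes.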

Now we need to process the documentation:

document()

One step left. You need to do this one from the parent working directory that contains the testlib folder.

setwd("..")
install("testlib")

Note that if you now look into the folder C:/R/library there will be a folder testlib, which is this library.

Let’s check:

library(testlib)
search()[1:4]
ls(2)
f1(2)
f2(2)

And that’s it!

Now there will be two folders with the name testlib:

  • the one we just created
  • another one in the default library folder. On Windows machines that is usually ../R/library and on Macs /Library/Frameworks/R.framework/Resources/library .

These two are NOT the same and only the second one is an actual R library. In essence the install command takes the first folder and turns it into a library that it puts in the place where R can find it.
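
You can ask R where it keeps its libraries and where a specific one is installed. The sketch below uses the package stats, which comes with every R installation; once testlib is installed the same call should work with "testlib":

```r
.libPaths()                        # the folder(s) where R looks for libraries

lib.folder <- find.package("stats")
lib.folder                         # the folder of this specific library
dir(lib.folder)                    # an installed library is a folder with files
```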


I have several libraries that I often change, so I wrote a small routine to make it easy:

#' make.library
#'
#' This function creates a library called name in folder
#' @param name  name of library
#' @param folder folder with library files
#' @export
#' @examples
#' make.library("moodlr", folder="c:/files")
make.library <- function (name, folder) 
{
    library(devtools)  
    library(roxygen2)
    olddir <- getwd() 
    on.exit(setwd(olddir)) # go back at the end, even if a step fails
    setwd(folder) # go where you need to be
    document()  # update help files and NAMESPACE
    setwd("..")
    install(name) # turn the folder into an installed library
}

so when I make a change to one of the routines in (say) wolfr all I need to do is run

make.library(name="wolfr",
    folder="c:/wolfgang/R/mylibs")

Note that ultimately a library is a folder. You can send someone a library by sending them the folder (usually as a compressed zip file).

Testing a library

The easiest way to check whether there are any issues with your package is to run

devtools::check("libname")

This will give a lot of info on your package, including any errors, warnings and notes:

  • Errors: are just that, something that would likely prevent the package from running on someone else’s computer, even if it runs on yours.

  • Warnings: things that could be problematic.

  • Notes: things that you could leave alone but should consider fixing.

All of these should be cleaned up before uploading the library.

If the library is just for yourself (or your friends) this is not necessary but it is still highly recommended. A “clean” library typically will work much better!

CRAN

If you want to upload your library to CRAN, you need to do some additional checking. To start, CRAN does not allow libraries that yield errors or warnings, and sometimes even notes, so those need to be fixed. If there is a Note that you can’t or don’t want to fix you need to let the CRAN reviewer know why. This is done by creating a file called cran-comments.md and simply explaining why you don’t want to change what is causing the Note.

Packages need to run on different operating systems. Here are some checks you can do to ensure that:

devtools::check_rhub("libname")
devtools::check_win_devel("libname")

Finally you are ready to upload your package to CRAN. This is done by running

release()

You will be asked a number of questions, and if you can answer yes to all of them it will upload the package. Then someone (an actual person!) will have a look at it, and if there are any issues they will send you an email. If not you are done!