programming.utf8.md

Programming in R

Some General Advice on Computer Programming

Start small

Even if ultimately you want to write a “big” program, first write a small one, a special case.

then add to it, piece by piece
test often

and don’t add more until you have a working program

break it up in pieces (modularity)

If your program gets too big, break it up into several smaller self-contained programs

Comment, Comment, Comment

write lots of comments. This is not only so that someone else can understand your program, but so that your future-self can as well!

In R comments are done with the hashtag # . Notice that if you have comments over several lines each line needs a hashtag.

Finding Errors, Debugging

There are essentially two type of errors:

syntax
run-time

A syntax error is a bad piece of code, such as missing braces etc. You will get an error message from R when you try to read such code into R, for example with source. In general these are fairly easy to find, R will tell you the line where (more or less) the error is.

run-time errors occur when at some point in the execution an illegal command is to be executed. If that happens R generally stops the program and returns an error.

Finding a run-time error can be very time consuming. Here are some simple things to do:

add print statements

In order to find out where in the execution a problem occurs you might add a print statement in a few places. Sometime that should print out some values of interest, sometimes it might just be print(“A”) print(“B”) etc. so you can pin-point where the program crashes.

when an error occurs depending on a random number, use set.seed so it will always occur at the same spot:

set.seed(1111)

R has some routines to help with debugging, such traceback, debug and browser. For some explanations of these read https://support.rstudio.com/hc/en-us/articles/205612627-Debugging-with-RStudio

Functional Programming

In R everything is an object, and objects can be passed to and returned from functions. This includes functions!

Consider the example from earlier:

centre <- function(x, type) {
  switch(type,
         mean = mean(x),
         median = median(x),
         trimmed = mean(x, trim = .1))
}
x <- rcauchy(10)
c(centre(x, "mean"), centre(x, "median"), centre(x, "trimmed"))

## [1] 0.5397 0.1549 0.3793

We could just as easily have written it this way:

centre.alt1 <- function(x, fun) {
  fun(x)
}
c(centre.alt1(x, mean), centre.alt1(x, median))

## [1] 0.5397 0.1549

How about the last one, the trimmed mean? Here we also need to pass an argument (trim=0.1) on to function inside centre.alt. One way would be to define the function explicitly:

centre.alt1(x, function(x) {mean(x, trim = 0.1)})

## [1] 0.3793

Notice that here the function we created doesn’t have a name. It is called an anonymous function.

Or we can use the … syntax:

centre.alt2 <- function(x, fun, ...) {
  fun(x, ...)
}
centre.alt2(x, mean, trim=0.1)

## [1] 0.3793

Finally, a function can also be the output of a function:

centre.alt3 <- function(type) {
  switch(type,
         mean = mean,
         median = median,
         trimmed = function(x) {mean(x, trim = .1)})
}
c(centre.alt3("mean")(x), centre.alt3("median")(x), 
  centre.alt3("trimmed")(x))

## [1] 0.5397 0.1549 0.3793

Sometimes it is better to return a list of functions:

centre.alt4 <- function(type) {
  list(Mean = mean,
       Median = median,
       Trimmed = function(x) {mean(x, trim = .1)})
}
c(centre.alt4()$Mean(x), centre.alt4()$Median(x), 
  centre.alt4()$Trimmed(x))

## [1] 0.5397 0.1549 0.3793

Here is a way to run all of them at once:

lapply(centre.alt4(), function(f) f(x))

## $Mean
## [1] 0.5397
## 
## $Median
## [1] 0.1549
## 
## $Trimmed
## [1] 0.3793

Turning a String into a Function.

We saw before the use of get to call a function using its name as a character string:

get("mean")(rnorm(10))

## [1] -0.1449

This works fine with an object that already is a function (like mean), but let’s say we want to have the function body as a character string, not its name.

We can use the following (very famous or infamous!) construct:

txt <- "exp(-x^2)"
fun <- function(x) eval(parse(text=txt))
is.function(fun)

## [1] TRUE

x <- 0.5
c(exp(-x^2), fun(x))

## [1] 0.7788 0.7788

Why would we want to do that? We could just define a new function:

f <- function(x) exp(-x^2)
f(0.5)

## [1] 0.7788

this works fine here but what if we want to do that inside another function? Maybe one that reads a character expression from a file and wants to turn it into a function. In a while we will discuss how to write interactive web applications with R shiny, and there a user might just type a mathematical expression into a box, which R then can evaluate and maybe draw the graph.

What happens if the expression typed as a character string is not a syntctically correct function?

# expression is missing closing )
fun <- function(x) eval(parse(text="exp(-x^2"))
fun(2)

## Error in parse(text = "exp(-x^2"): <text>:2:0: unexpected end of input
## 1: exp(-x^2
##    ^

We can use the try routine to rewrite this so that it does not yield an error. It returns a character vector if the calculation inside results in error, and so we can check for that:

fun <- function(x, txt) {
  tmp <- try(parse(text=txt), silent=TRUE)
  if(is.character(tmp)) 
    return("Bad Expression!")
  else  return(eval(tmp))
}  
fun(2, "exp(-x^2")

## [1] "Bad Expression!"

fun(2, "exp(-x^2)")

## [1] 0.01832

Computing on the Language

Consider this example:

x <- seq(-1, 1, length=250)
square <- x^2
plot(x, square, type = "l")

Notice something strange? In the graph we have not only the values of x and square, also the names “x” and “square”, used as labels. How is this done? Could we use the name of the variables also (for example) in the title of the graph?

The R functions we need for this are deparse and substitute:

myplot <- function(x, y) {
  yname <- deparse((substitute(y)))
  plot(x, y, type="l", main=yname)
}
myplot(x, square)

What are substitute and deparse doing? Let’s take a closer look:

substitute does the following: it takes as its argument a string without quotes. If asked it replaces some of it with other things. Then it returns the result as an expression, which can be evaluated. Rather then evaluate the expression we want to turn it into a character string, and deparse does just that.

# just return argument
substitute(2*x)

## 2 * x

# replace something before returning
substitute(2*x, list(x=3))

## 2 * 3

#  and then evaluate it
eval(substitute(2*x, list(x=3)))

## [1] 6

# or turn it into a character string
deparse(substitute(2*x, list(x=3)))

## [1] "2 * 3"

# works for more than one variable as well
substitute(x+y, list(x=1, y=3))

## 1 + 3

eval(substitute(x+y, list(x=1, y=3)))

## [1] 4

deparse(substitute(x+y, list(x=1, y=3)))

## [1] "1 + 3"

In there is nothing to substitute one can also use the quote function:

quote(2+3)

## 2 + 3

eval(quote(2+3))

## [1] 5

deparse(quote(2+3))

## [1] "2 + 3"

The name of quote seems wrong, it does not actually put quotes on anything!

Exercise

Consider a function that just returns the argument as a character string:

f <- function(x) deparse(substitute(x))
f(1:10)

## [1] "1:10"

Let’s break the function up into two pieces:

f1 <- function(x) substitute(x)
f2 <- function(x) deparse(f1(x))

Does this the same?

Notice the call to the function rm:

ls()

## character(0)

x <- 1
ls()

## [1] "x"

rm(x)
ls()

## character(0)

x <- 1
rm("x")
ls()

## character(0)

Apparently it does not matter whether we call rm with or without quotes, it works either way. There are a number such routines, for example library.

How does this work?

Let’s write one:

f <- function(expr) {
  sexpr <- substitute(expr) 
  if(!is.character(sexpr))
    sexpr <- deparse(sexpr)
  sexpr
}
f(z)

## [1] "z"

f("z")

## [1] "z"

so either way now we have a character string.

We have previously used a nice function in R to draw graphs of functions called curve:

curve(x^2*sin(2*pi*x), 0, 1)

notice how the function is entered here, without quotes. So it is not a character string.

Let’s write our own curve routine. In addition we are going to show the function as the title! Finally, our routine will also allow us to pass two parameters to the function.

mycurve <- function(fn, # function to be drawn
              A=0, B=1, # range of graph
              np=250, # points to plot
              par1=0, par2=0 # parameters of fn
              ) {
  sexpr <- deparse(substitute(fn)) 
# get function as character string
  x <- seq(A, B, length=np) 
# make an equal-spaced grid
  f <- function(x) eval(parse(text=sexpr), 
            list(x=x, par1=par1, par2=par2))
#  now as function, with x and parameters passed to it  
  y <- f(x)
  sexpr <- gsub("par1", par1, sexpr) 
# replace text with value
  sexpr <- gsub("par2", par2, sexpr) # but as text! 
  plot(x, y, type="l", 
       main=parse(text=sexpr),
       ylab="f")
}

some examples:

mycurve(x^par1*sin(par2*pi*x), par1=2, par2=4)

Notice it even uses \(\pi\), not pi!

mycurve(dnorm(x, par1, par2), A=0, B=20, par1=10, par2=3)

There is a strange difference in the way the list and data.frame functions work:

first <- 1:4
second <- letters[1:4]
list(first, second)

## [[1]]
## [1] 1 2 3 4
## 
## [[2]]
## [1] "a" "b" "c" "d"

data.frame(first, second)

##   first second
## 1     1      a
## 2     2      b
## 3     3      c
## 4     4      d

so data.frame uses the variable names as column names (with deparse(substitute) ) , but list does not!

Consider this:

predictor <- 1:10
response <- 20 + predictor + rnorm(10, 0, 1)
plot(predictor, response)
fit <- lm(response~predictor)
abline(fit)

summary(fit)

## 
## Call:
## lm(formula = response ~ predictor)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0608 -0.7966 -0.0824  0.7413  1.2514 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   19.965      0.637    31.3  1.2e-09
## predictor      1.045      0.103    10.2  7.5e-06
## 
## Residual standard error: 0.933 on 8 degrees of freedom
## Multiple R-squared:  0.928,  Adjusted R-squared:  0.919 
## F-statistic:  103 on 1 and 8 DF,  p-value: 7.49e-06

Notice the first part of the printout, Call. This is what we typed into R, but how does R know that? It uses the function match.call:

f <- function(x, y, z) {
  cl <- match.call()
  print(cl)
  x+y+z
}
f(1, 2, 3)

## f(x = 1, y = 2, z = 3)

## [1] 6

and it even adds the names of the arguments!

This can be used in a number of ways, for example again as text in graphs, to show how something was done. Another common usage is in update functions, which can build on the already run function.

Unusual Functions

Infix functions

In R every object is either a data set or a function. As a general principle, if it does something it is a function. This is true even for things that don’t look that way, for example the addition operation +:

1 + 2

## [1] 3

In fact we can write this explicitly as a function:

'+'(1, 2)

## [1] 3

Another example is subsetting:

x <- letters
x[3:5]

## [1] "c" "d" "e"

`[`(x, 3:5)

## [1] "c" "d" "e"

Because these types of functions are sitting between their arguments they are called infix operators, in contrast to prefix operators that have the name of the function first.

Because they are functions we can write our own. User-written prefix functions always have to start and end with %.

Example

Let’s write a power operator that calculates \(x^y\).

`%pow%` <- function(x, y) x^y
c(2, 2, 3, 3) %pow% c(2, 1/2, 3, 1/3)

## [1]  4.000  1.414 27.000  1.442

Writing an infix function makes the most sense if it is some basic arithmetic operation, where the format

x operator y

is sensible.

Exercise

We already encountered one such infix function. What was it?

Replacement Functions

Consider the following code:

x <- 1:3
names(x) <- c("a", "b", "c")

Nothing strange here, right? Except, look at it again: we are assigning values to a function call!

How is this possible?

The above code is actually short for this one:

x <- "names<-"(x, value=letters[1:3])

and yes, the <- needs to be inside the quotes.

In general functions never change the objects outside of them

## a b c 
## 1 2 3

f <- function(x) {
  x <- "A"
  x
}
f(x)

## [1] "A"

## a b c 
## 1 2 3

This function (names) however does just that, it changes the names of x. This type of function is called a replacement function.

And yes, we can write our own:

Example

Let’s write a function that changes the order of the objects, maybe reverses it:

`change.order<-` <- 
  function(x, value) {
    x <- x[value]
    x
  }
x <- letters[1:5]
x

## [1] "a" "b" "c" "d" "e"

change.order(x) <- c(2, 4, 3, 1, 5)
x

## [1] "b" "d" "c" "a" "e"

Case Study: Binary Arithmetic

In our everyday world we use 10 digits to express numbers, a system called decimal. There is however nothing special about 10 (except that we have 10 fingers and toes!) In fact, computers only use 2 digits, 0 and 1 (or electricity is on and off), a system called binary. Every decimal number has an equivalent binary number, and one can do arithmetic in either system.

Note that we will restrict ourselves to integers only, but in general this works for any real or even complex number.

Here are the first numbers: \[ \begin{aligned} &0=0 \\ &1=1 \\ &2=10 \\ &3=11 \\ &4=100 \\ &5=101 \\ &6=110 \\ &7=111 \\ &8=1000 \\ \end{aligned} \] We want to implement a system that allows us to work with binary numbers in R.

First we need to figure out how to represent these numbers in R. The easiest way is as a sequence of 0’s and 1’s, like (1, 0, 1) (equivalent to 5).

We begin by writing some routines that convert a number in the decimal system to binary and vice versa:

How can we turn a decimal into a binary? Notice the following

\[ \begin{aligned} &0=0=0\times2^0 \\ &1=1=1\times2^0 \\ &2=10=1\times2^1+0\times2^0 \\ &3=11=1\times2^1+1\times2^0 \\ &4=100=1\times2^2+0\times2^1+0\times2^0 \\ &5=101=1\times2^2+0\times2^1+1\times2^0 \\ &6=110=1\times2^2+1\times2^1+1\times2^0 \\ &7=111=1\times2^2+1\times2^1+1\times2^0 \\ &8=1000=1\times2^3+0\times2^2+0\times2^1+0\times2^0 \\ \end{aligned} \] so the idea is to write x in the form \(\sum i_k 2^k\) where \(i_k \in \left\{0, 1 \right\}\)

Let’s try an example:

\[ 26 = \\ 2^4+2^3+2^1 = \\ (1, 1, 0, 1, 0) \] why does this start with \(2^4\)? Because \(2^4\) is the largest power of 2 still less or equal to 26, or \(4 =\max\left\{i \in N: 2^i<26\right\}\).

For a general number m, how can we find out what this largest i is? We have

\[ \begin{aligned} &2^i \le m \\ &i\log(2)\le \log(m) \\ &i \le \log(m)/\log(2) \\ \end{aligned} \] and in fact if we use log base 2 we have

\[ i=\text{floor}(\log(m, base=2)) \] Once we have the first such i, we can simply subtract \(2^i\) from m and start all over again, until there is nothing left:

decimal.2.binary <- function(x) {
    if(x==0) return(0)  # simple cases 
    if(x==1) return(1)  
    i <- floor(log(x, base=2)) # largest power of 2 less than x
    bin.x <- rep(1, i+1) # we will need i+1 0'1 and 1's, 
                         # first is always a 1
    x <- x-2^i # done with largest 2^i
    for(j in (i-1):0) {
       if(2^j>x) 
         bin.x[j+1] <- 0
       else {
         bin.x[j+1] <- 1
         x <- x-2^j
       }
    }
    bin.x[length(bin.x):1]
# Order needs to be reversed    
}
decimal.2.binary(7)

## [1] 1 1 1

decimal.2.binary(8)

## [1] 1 0 0 0

decimal.2.binary(26)

## [1] 1 1 0 1 0

Of course, the other way around is much simpler:

binary.2.decimal <- function(x) sum(x*2^(length(x):1-1))
binary.2.decimal(c(1, 1, 1))

## [1] 7

binary.2.decimal(c(1, 0, 0, 0))

## [1] 8

binary.2.decimal(c(1, 1, 0, 1, 0))

## [1] 26

binary.2.decimal(decimal.2.binary(126))

## [1] 126

How does addition work? essentially we add the numbers piece by piece from right to left, with a 1 carried over whenever there are two 1’s. For example

\[ \begin{aligned} \text{ }&0010 \text{ }\text{ }\text{ }(2)\\ +\text{ }&0110 \text{ }\text{ }\text{ }(6) \\ =\text{ }&1000 \text{ }\text{ }\text{ }(8)\\ \end{aligned} \] A general algorithm for addition is given here:

It uses some strange words, but those are not important to us. Here is the corresponding R function:

binary_addition <- function(x, y) {
# First make x and y of equal length and with one extra 
# slot in case it's needed for carry over
# Fill x and y with 0's as needed. 
  n <- length(x)
  m <- length(y)
  N <- max(n, m)+1
  x <- c(rep(0, N-n), x)
  y <- c(rep(0, N-m), y)  
  s <- rep(0, N) #  for result
  ca <- 0 # for carry over term
  for(i in N:1) {
      n <- x[i]+y[i]+ca
      if(n<=1) {# no carry over
        s[i] <- n
        ca <- 0
      }  
      else {# with carry over
        s[i] <- 0
        ca <- 1
      }
    }
  if(s[1]==0) s <- s[-1]# leading 0 removed if necessary
  s
}
binary_addition(c(1, 0), c(1, 1, 0))

## [1] 1 0 0 0

Let’s turn this into an infix addition operator. We could just call it %+% but instead I will call it %+b% so the chances of this name already being used by R somewhere else are small, and it will be easier to remember later what it is doing.

'%+b%' <- function(x, y) binary_addition(x, y)
x <- c(1, 0) #  2
y <- c(1, 1, 0) #  6
x %+b% y # 2+6=8

## [1] 1 0 0 0

Notice a big problem with how we have defined binary numbers: how are we going to define a vector of them, as a vector of vectors? Lists might work, but would be a bit ugly. A better idea might be as a character string like "0101". But before we can handle that we will need to learn how to work with strings.

Let’s write two more functions:

is.binary should check whether a vector is (can be) a binary number. For this it has to consist entirely of 0’s and 1’s:

is.binary <- function(x) {
  if(all(x==0)) return(TRUE)
  x <- x[x!=0]
  x <- x[x!=1]
  if(length(x)==0) return(TRUE)
  return(FALSE)
}
is.binary(c(1, 0, 1, 1))

## [1] TRUE

is.binary(1)

## [1] TRUE

is.binary(0)

## [1] TRUE

is.binary(c(1, 2, 1, 1))

## [1] FALSE

as.binary should turn vectors into a binary number. For this we will use the following rules:

\(0 \rightarrow 0\), \(x \ne 0 \rightarrow 1\)
FALSE \(\rightarrow 0\), TRUE \(\rightarrow 1\)
Anything else NA

as.binary <- function(x) {
  if(is.logical(x)) return(as.numeric(x))
  if(is.numeric(x)) return(ifelse(x==0, 0, 1))
  NA
}
as.binary(c(0, 1, 2, 1, 2))

## [1] 0 1 1 1 1

as.binary(1:4 > 2)

## [1] 0 0 1 1

as.binary(0)

## [1] 0

as.binary(c(1, 2, 1, "a"))

## [1] NA