Writing Functions

General Information

In R/RStudio you have several ways to write your own functions:

  • In the R console type
myfun <- function(x) {
  out <- x^2
  out
}  
  • RStudio: click on File > New File > R Script. A new empty window pops up. Type fun, hit enter, and the following text appears:

name <- function(variables) {

}

Change the name to myfun and save the file as myfun.R with File > Save. Now type in the code. When done, click the Source button.

  • fix: In the R console run
fix(myfun)

Now a window with an editor pops up and you can type in the code. When you are done, click on Save. If there is a syntax error, DON’T run fix again; instead run

myfun <- edit()
  • Open any code editor outside of RStudio, type in the code, save it as myfun.R, go to the console and run
source('../some.folder/myfun.R')

Which of these is best? In large part that depends on your preferences. In my case, if I expect to need that function just for a bit I use the fix option. If I expect to need that function again later I start with the RStudio script method, but will likely soon open the .R file outside RStudio because most code editors have many useful features not available in RStudio.

If myfun is open in RStudio there are some useful keyboard shortcuts. If the cursor is on some line in the RStudio editor you can hit

  • CTRL-Enter run current line or selection
  • CTRL-ALT-B run from beginning to line
  • CTRL-Shift-Enter run complete chunk
  • CTRL-Shift-P rerun previous

Testing

As always you can test whether an object is a function:

x <- 1
f <- function(x) x
is.function(x)
## [1] FALSE
is.function(f)
## [1] TRUE

The get function takes a character string and returns the object with that name, here a function (if it exists):

get("f")(4)
## [1] 4
get("g")(4)
## Error in get("g"): object 'g' not found
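To avoid this error when the name may not exist, one option is to check first with exists. The following is a minimal sketch; the helper name call_if_exists is made up for illustration and is not part of R:

```r
# Hypothetical helper: call a function by name only if it exists
call_if_exists <- function(name, ...) {
  # mode = "function" ignores non-function objects of the same name
  if (!exists(name, mode = "function")) return(NULL)
  get(name, mode = "function")(...)
}

f <- function(x) x
call_if_exists("f", 4)
## [1] 4
call_if_exists("no.such.function", 4)
## NULL
```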

Arguments

There are several ways to specify arguments in a function:

calc.power <- function(x, y, n=2) x^n + y^n

Here n has a default value; x and y do not.

If the arguments are not named they are matched in order:

calc.power(2, 3) 
## [1] 13

or they can be explicitly named in any order:

calc.power(y=2, x=3)
## [1] 13
calc.power(n=3, 2, 3)
## [1] 35

This, however, is not recommended as it can be very confusing.


R does partial matching of arguments:

f <- function(first, second) {
  first + second
}
f(fi=1, s=3)
## [1] 4

but this is not good programming style.

Default arguments can be defined in terms of others:

f <- function(first, second=2*first) {
  first + second
}
f(1)
## [1] 3

If an argument does not have a default, you can test whether it was supplied with missing:

f <- function(first, second) {
  if(!missing(second))
      out <- first + second
  else out <- first
  out
}
f(1)
## [1] 1
f(1, s=3)
## [1] 4

There is a special argument ..., used to pass arguments on to other functions:

f <- function(x, which, ...) {
  f1 <- function(x, mult) mult*x 
  f2 <- function(x, pow) x^pow
  if(which==1)
    out <- f1(x, ...)
  else
    out <- f2(x, ...)
  out
}
f(1:3, 1, mult=2)
## [1] 2 4 6
f(1:3, 2, pow=3)
## [1]  1  8 27

This is one of the most useful programming structures in R!
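A common use is a wrapper that forwards extra arguments to an existing function without listing them. Here is a small sketch, where my.mean is a hypothetical wrapper around the base function mean:

```r
# Hypothetical wrapper: forwards any extra arguments to mean()
my.mean <- function(x, ...) {
  extras <- list(...)            # capture the extra arguments as a list
  cat("Extra arguments:", names(extras), "\n")
  mean(x, ...)                   # pass them on unchanged
}
my.mean(c(1, 2, NA, 3), na.rm = TRUE)
## Extra arguments: na.rm 
## [1] 2
```

The wrapper never needs to know that mean has an na.rm argument; it just passes along whatever it receives.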

Note that this example also shows that in R functions can call other functions. Many programming languages have so-called sub-routines; in R this concept does not exist, functions are just functions.

Functions can even call themselves:

f <- function() {
  cat("A")
  if(sample(1:5, 1)>1) f()
  cat("\n")
  
}
f()
## AAAAAAA

This is called recursion and is a very powerful programming technique, although for reasons of memory management it is not as useful in R as in other languages.
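A deterministic illustration of the same idea is the factorial. This is only a sketch (the name fact is made up here; base R already provides factorial):

```r
# Recursive factorial: n! = n * (n-1)!, with 0! = 1! = 1 as the stopping rule
fact <- function(n) {
  if (n <= 1) return(1)   # base case ends the recursion
  n * fact(n - 1)         # the function calls itself with a smaller argument
}
fact(5)
## [1] 120
```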

Lazy Evaluation

R uses a concept called lazy evaluation. This means that an argument is not evaluated until it is used:

f <- function(first, second) {
  if(first<10)
    out <- first
  else
    out <- first + second
  out
}
f(5, "A")
## [1] 5
f(11, "A")
## Error in first + second: non-numeric argument to binary operator

This can be a source of bugs. One can override this behavior with the force function:

f <- function(first, second) {
  force(first+second)
  if(first<10)
    out <- first
  else
    out <- first + second
  out
}
f(5, "A")
## Error in first + second: non-numeric argument to binary operator

Note there is another simple way to accomplish the same thing: just use a statement like

test <- first+second

but force makes it clearer that the purpose here is to make sure first and second are of the correct type.
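Lazy evaluation also means that an argument that is never used is never evaluated at all, and an explicit type check is another way to fail early. A short sketch of both points; the function names first.only and add.checked are illustrative only:

```r
# second is never used, so the stop() call passed to it is never run
first.only <- function(first, second) {
  first
}
first.only(5, stop("this is never evaluated"))
## [1] 5

# Checking the types explicitly evaluates both arguments right away
add.checked <- function(first, second) {
  stopifnot(is.numeric(first), is.numeric(second))
  if (first < 10) first else first + second
}
tryCatch(add.checked(5, "A"), error = function(e) "error caught")
## [1] "error caught"
```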

Return Values

A function returns exactly one object (which may be NULL). It will automatically return the value of the last expression evaluated:

f <- function(x) {
  x^2
}
f(1:3)
## [1] 1 4 9

However, it is better programming style to have an explicit return object:

f <- function(x) {
  out <- x^2
  out
}
f(1:3)
## [1] 1 4 9

There is another way to specify what is returned:

f <- function(x) {
  return(x^2)
}
f(1:3)
## [1] 1 4 9

but this is usually used to return something early in the program:

f <- function(x) {
  if(!is.numeric(x))
    return("Works only for numeric!")
  out <- sum(x^2)
  out
}
f(1:3)
## [1] 14
f(letters[1:3])
## [1] "Works only for numeric!"

If you want to return more than one item use a list:

f <- function(x) {
  sq <- x^2
  sm <- sum(x)
  list(sq=sq, sum=sm)
}
f(1:3)
## $sq
## [1] 1 4 9
## 
## $sum
## [1] 6
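The individual items can then be pulled out of the returned list by name. A small sketch, using the made-up function name sq.and.sum for the same computation:

```r
sq.and.sum <- function(x) {
  list(sq = x^2, sum = sum(x))
}
res <- sq.and.sum(1:3)
res$sum        # access an element with $
## [1] 6
res[["sq"]]    # or with double brackets and the name as a string
## [1] 1 4 9
```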

on.exit

on.exit is called inside a function to register an expression that is evaluated whenever the function terminates.

The advantage of on.exit is that it gets called when the function exits, regardless of whether an error was thrown. This means that its main use is for cleaning up after risky behavior. Risky, in this context, usually means accessing resources outside of R (which consequently cannot be guaranteed to work).

Common examples include connecting to databases or files (where the connection must be closed when you are finished, even if there was an error), or saving a plot to a file (where the graphics device must be closed afterwards).
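As a sketch of the file case, the following hypothetical helper write.lines (the name and the temporary file are assumptions for illustration) closes its connection via on.exit; the second function shows that the registered expression still runs when an error occurs:

```r
# Hypothetical helper: write lines to a file, closing the connection on exit
write.lines <- function(lines, path) {
  con <- file(path, open = "w")
  on.exit(close(con))        # runs when the function exits, error or not
  writeLines(lines, con)
  invisible(path)
}
tmp <- tempfile(fileext = ".txt")
write.lines(c("a", "b"), tmp)
readLines(tmp)
## [1] "a" "b"

# The on.exit expression runs even though the function fails
risky <- function() {
  on.exit(cat("cleaning up\n"))
  stop("something went wrong")
}
try(risky(), silent = TRUE)
## cleaning up
```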

Exercise

What does this function return?

f <- function(x=y) {
  y <- 10
  x
}
f()

Basic Programming Structures in R

R has all the standard programming structures:

Conditionals (if-else)

f <- function(x) {
  if(x>0) y <- log(x)
  else y <- NA
  y
}
f(2)
## [1] 0.6931
f(-2)
## [1] NA

A useful variation on the if statement is switch:

centre <- function(x, type) {
  switch(type,
         mean = mean(x),
         median = median(x),
         trimmed = mean(x, trim = .1))
}
x <- rcauchy(10)
centre(x, "mean")
## [1] -0.9172
centre(x, "median")
## [1] 0.245
centre(x, "trimmed")
## [1] 0.1813
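If type matches none of the names, switch as written above silently returns NULL. One way to guard against this is an unnamed last argument, which acts as the default. A sketch, with centre2 as a made-up name for the modified function:

```r
centre2 <- function(x, type) {
  switch(type,
         mean = mean(x),
         median = median(x),
         trimmed = mean(x, trim = .1),
         stop("unknown type: ", type))   # unnamed last argument is the default
}
centre2(c(1, 2, 9), "median")
## [1] 2
tryCatch(centre2(c(1, 2, 9), "mode"), error = function(e) conditionMessage(e))
## [1] "unknown type: mode"
```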

A special R construct is ifelse, which works element by element on a vector:

x <- sample(1:10, size=7, replace = TRUE)
x
## [1] 10  4  2 10  7  3  8
ifelse(x<5, "Yes", "No")
## [1] "No"  "Yes" "Yes" "No"  "No"  "Yes" "No"

Exercise

What is the difference between these two functions?

f1 <- function(x) {
  if(x<10) return(0)
  x
}
f2 <- function(x) {
  ifelse(x<10, 0, x)
}

Loops

There are three standard loops in R:

  • for loop
y <- rep(0, 10)
for(i in 1:10) y[i] <- i*(i+1)/2
y
##  [1]  1  3  6 10 15 21 28 36 45 55

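Sometimes we don’t know the length of y when we write the code. Then we can loop over seq_along(y), which, unlike 1:length(y), also does the right thing (no iterations) when y is empty: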

for(i in seq_along(y)) y[i] <- i*(i+1)/2
y
##  [1]  1  3  6 10 15 21 28 36 45 55

If there is more than one statement inside a loop use curly braces:

for(i in seq_along(y)) {
  y[i] <- i*(i+1)/2
  if(y[i]>40) y[i] <- (-1)
}
y
##  [1]  1  3  6 10 15 21 28 36 -1 -1

You can nest loops:

A <- matrix(0, 4, 4)
for(i in 1:4) {
  for(j in 1:4)
    A[i, j] <- i*j
}
A
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    2    4    6    8
## [3,]    3    6    9   12
## [4,]    4    8   12   16
  • while loop

This is useful if we don’t know how often the loop needs to run.

Let’s say we want to simulate rolling three dice and count the number of repetitions needed until a triple (triple = all three dice equal). For a triple, x has three equal entries, so table(x) has length one:

k <- 1
x <- sample(1:6, size=3, replace=TRUE)
while (length(table(x))!=1) {
  k <- k+1
  x <- sample(1:6, size=3, replace=TRUE)
}
k
## [1] 29
  • repeat loop

Similar to a while loop, except that there is no loop condition; the loop runs until break is called, here with the check done at the end:

k <- 0
repeat {
  k <- k+1
  x <- sample(1:6, size=3, replace=TRUE)
  if(length(table(x))==1) break
}
k
## [1] 38

Notice that while and repeat loops could in principle run forever. I often include a counter that ensures the loop will eventually stop:

k <- 0
counter <- 0
repeat {
  k <- k+1
  counter <- counter+1
  x <- sample(1:6, size=3, replace=TRUE)
  if(length(table(x))==1 || counter>1000) break
}
k
## [1] 9
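The same safeguard can be packaged in a function. The sketch below swaps the repeat loop for a for loop with an early return, so the upper limit is built into the loop itself; the name rolls.until.triple and the argument max.iter are made up for illustration:

```r
# Hypothetical helper: count rolls of three dice until a triple,
# giving up after max.iter attempts
rolls.until.triple <- function(max.iter = 1000) {
  for (k in seq_len(max.iter)) {
    x <- sample(1:6, size = 3, replace = TRUE)
    if (length(unique(x)) == 1) return(k)   # all three dice equal
  }
  NA   # no triple within max.iter rolls
}
rolls.until.triple()
```

Returning NA when the limit is reached lets the caller tell "no triple found" apart from a genuine count.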

Useful statements for loops:

  • next immediately jumps to the next iteration
y <- rep(0, 10)
for(i in 1:10) {
  x <- round(runif(1, 1, 10))
  if(x<6) next
  y[i] <- x
}
y
##  [1]  0  9  0  6  0 10  0 10  9  6
  • break immediately terminates the loop
for(i in 1:10) {
  x <- round(runif(1, 1, 10))
  cat(x," ")
  if(x<3) break
}
## 3  10  2