In R/RStudio you have several ways to write your own functions:
myfun <- function(x) {
out <- x^2
out
}
name <- function(variables) {
}
change the name to myfun, save the file as myfun.R with File > Save. Now type in the code. When done click the Source button.
fix(myfun)
now a window with an editor pops up and you can type in the code. When you are done click on Save. If there is some syntax error DON’T run fix again, instead run
myfun <- edit()
source('../some.folder/myfun.R')
Which of these is best? In large part that depends on your preferences. In my case, if I expect to need that function just for a bit I use the fix option. If I expect to need that function again later I start with the first method, but likely soon open the .R file outside RStudio because most code editors have many useful features not available in RStudio.
If myfun is open in RStudio there are some useful keyboard shortcuts. If the curser is on some line in the RStudio editor you can hit
As always you can test whether an object is a function:
x <- 1
f <- function(x) x
is.function(x)
## [1] FALSE
is.function(f)
## [1] TRUE
The get function takes a character string and returns a function (if it exists)
get("f")(4)
## [1] 4
get("g")(4)
## Error in get("g"): object 'g' not found
There are several ways to specify arguments in a function:
calc.power <- function(x, y, n=2) x^n + y^n
here n has a default value, x and y do not.
if the arguments are not named they are matched in order:
calc.power(2, 3)
## [1] 13
or they can be explicitly named in any order:
calc.power(y=2, x=3)
## [1] 13
calc.power(n=3, 2, 3)
## [1] 35
This however is not recommend as it can be very confusing.
R does partial matching of arguments:
f <- function(first, second) {
first + second
}
f(fi=1, s=3)
## [1] 4
but this is not a good programming style.
Default arguments can be defined in terms of others:
f <- function(first, second=2*first) {
first + second
}
f(1)
## [1] 3
If an argument does not have a default it can be tested for
f <- function(first, second) {
if(!missing(second))
out <- first + second
else out <- first
out
}
f(1)
## [1] 1
f(1, s=3)
## [1] 4
There is a special argument …, used to pass arguments on to other functions:
f <- function(x, which, ...) {
f1 <- function(x, mult) mult*x
f2 <- function(x, pow) x^pow
if(which==1)
out <- f1(x, ...)
else
out <- f2(x, ...)
out
}
f(1:3, 1, mult=2)
## [1] 2 4 6
f(1:3, 2, pow=3)
## [1] 1 8 27
This is one of the most useful programming structures in R!
Note this example also shows that in R functions can call other functions. In many computer programs there are so called sub-routines, in R this concept does not exist, functions are just functions.
Functions can even call themselves:
f <- function() {
cat("A")
if(sample(1:5, 1)>1) f()
cat("\n")
}
f()
## AAAAAAA
this is called recursion and is a very powerful programming technique, although for reasons of memory management not as useful in R as in other languages.
R uses a concept called lazy evaluation. This means that an argument is not evaluated until it is used:
f <- function(first, second) {
if(first<10)
out <- first
else
out <- first + second
out
}
f(5, "A")
## [1] 5
f(11, "A")
## Error in first + second: non-numeric argument to binary operator
This can be a source of computer bugs. One can override this behavior with the force command:
f <- function(first, second) {
force(first+second)
if(first<10)
out <- first
else
out <- first + second
out
}
f(5, "A")
## Error in first + second: non-numeric argument to binary operator
Note there is another simple way to accomplish the same thing: just use a statement like
test <- first+second
but force makes it clearer that the purpose here is to make sure first and second are of the correct type.
A function can either return nothing or exactly one thing. It will automatically return the last object evaluated:
f <- function(x) {
x^2
}
f(1:3)
## [1] 1 4 9
however, it is better programming style to have an explicit return object:
f <- function(x) {
out <- x^2
out
}
f(1:3)
## [1] 1 4 9
There is another way to specify what is returned:
f <- function(x) {
return(x^2)
}
f(1:3)
## [1] 1 4 9
but this is usually used to return something early in the program:
f <- function(x) {
if(!any(is.numeric(x)))
return("Works only for numeric!")
out <- sum(x^2)
out
}
f(1:3)
## [1] 14
f(letters[1:3])
## [1] "Works only for numeric!"
If you want to return more than one item use a list:
f <- function(x) {
sq <- x^2
sm <- sum(x)
list(sq=sq, sum=sm)
}
f(1:3)
## $sq
## [1] 1 4 9
##
## $sum
## [1] 6
on.exit is a routine that you use inside a function and that gets called and executed whenever the function terminates.
The advantage of on.exit is that is gets called when the function exits, regardless of whether an error was thrown. This means that its main use is for cleaning up after risky behavior. Risky, in this context, usually means accessing resources outside of R (that consequently cannot be guaranteed to work).
Common examples include connecting to databases or files (where the connection must be closed when you are finished, even if there was an error), or saving a plot to a file (where the graphics device must be closed afterwards)
Exercise
What does this function return?
f <- function(x=y) {
y <- 10
x
}
f()
R has all the standard programming structures:
f <- function(x) {
if(x>0) y <- log(x)
else y <- NA
y
}
f(2)
## [1] 0.6931
f(-2)
## [1] NA
A useful variation on the if statement is switch:
centre <- function(x, type) {
switch(type,
mean = mean(x),
median = median(x),
trimmed = mean(x, trim = .1))
}
x <- rcauchy(10)
centre(x, "mean")
## [1] -0.9172
centre(x, "median")
## [1] 0.245
centre(x, "trimmed")
## [1] 0.1813
special R construct: ifelse
x <- sample(1:10, size=7, replace = TRUE)
x
## [1] 10 4 2 10 7 3 8
ifelse(x<5, "Yes", "No")
## [1] "No" "Yes" "Yes" "No" "No" "Yes" "No"
Exercise
What is the difference between these two functions:
f1 <- function(x) {
if(x<10) return(0)
x
}
f2 <- function(x) {
ifelse(x<10, 0, x)
}
there are three standard loops in R:
y <- rep(0, 10)
for(i in 1:10) y[i] <- i*(i+1)/2
y
## [1] 1 3 6 10 15 21 28 36 45 55
sometimes we don’t know the length of y ahead of time, then we can use
for(i in seq_along(y)) y[i] <- i*(i+1)/2
y
## [1] 1 3 6 10 15 21 28 36 45 55
If there is more than one statement inside a loop use curly braces:
for(i in seq_along(y)) {
y[i] <- i*(i+1)/2
if(y[i]>40) y[i] <- (-1)
}
y
## [1] 1 3 6 10 15 21 28 36 -1 -1
You can nest loops:
A <- matrix(0, 4, 4)
for(i in 1:4) {
for(j in 1:4)
A[i, j] <- i*j
}
A
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 2 4 6 8
## [3,] 3 6 9 12
## [4,] 4 8 12 16
this is useful if we don’t know how often the loop needs to run.
Let’s say we want to do a simulation of rolling three dice and we want to generate the event “number of repetitions needed until a triple” (triple = all three dice equal). If so x has the equal entries, so table(x) has length one:
k <- 1
x <- sample(1:6, size=3, replace=TRUE)
while (length(table(x))!=1) {
k <- k+1
x <- sample(1:6, size=3, replace=TRUE)
}
k
## [1] 29
similar to while loop, except that the check is done at the end
k <- 0
repeat {
k <- k+1
x <- sample(1:6, size=3, replace=TRUE)
if(length(table(x))==1) break
}
k
## [1] 38
Notice that a while and repeat loop could in principle run forever. I often include a counter that ensures the loop will eventually stop:
k <- 0
counter <- 0
repeat {
k <- k+1
counter <- counter+1
x <- sample(1:6, size=3, replace=TRUE)
if(length(table(x))==1 | counter>1000) break
}
k
## [1] 9
Useful functions for loops:
y <- rep(0, 10)
for(i in 1:10) {
x <- round(runif(1, 1, 10))
if(x<6) next
y[i] <- x
}
y
## [1] 0 9 0 6 0 10 0 10 9 6
for(i in 1:10) {
x <- round(runif(1, 1, 10))
cat(x," ")
if(x<3) break
}
## 3 10 2