1 Overview

In this tutorial, we’ll cover the basics of how to write your own functions in R. This skill will be useful when you inevitably want to do something in R that doesn’t already have a function. It can also be used to combine existing functions to make your life that much easier.

1.1 Packages (and versions) used in this document

## [1] "R version 4.0.2 (2020-06-22)"
##    Package Version
##  tidyverse   1.3.0
##      readr   1.3.1
##      dplyr   1.0.1

2 Basics

To write a function, you will use this basic format:

func_name <- function(argument1, argument2, etc) {
  body_of_function
}

For a simple example, let’s write a function that returns the difference between two numbers. We’ll call it difference(). It takes two arguments, x and y, subtracts y from x, and returns the result.

difference <- function(x, y) {
  x - y
}

Now let’s try out our function.

difference(x = 5, y = 4)
## [1] 1
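As a small aside, R matches arguments by name first and then by position, so the same function can be called in a few equivalent ways. The sketch below repeats the definition of difference() so that it runs on its own.

```r
# Same definition as above, repeated so this block runs on its own.
difference <- function(x, y) {
  x - y
}

by_position <- difference(5, 4)          # matched by position: x = 5, y = 4
by_name     <- difference(y = 4, x = 5)  # matched by name, so order does not matter
swapped     <- difference(4, 5)          # positions swapped: x = 4, y = 5

by_position
by_name
swapped
```

The first two calls give 1, while the third gives -1 because the direction of the subtraction is reversed.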

2.1 Functions inside of Functions

Let’s make it just a tad more complex and demonstrate that we can utilize functions inside of other functions.
What if we only want the magnitude of the difference and not the direction? We can write a new function to give us that.

diff_mag <- function(x, y) {
  abs(x - y)
}

In our new function, diff_mag(), we’re using an existing function (abs()) to return the absolute value of the difference between our two arguments. To make sure this is working, we can do two quick tests. If this function is working, then we should get the same result, regardless of the order of our inputs.

diff_mag(5, 4)
## [1] 1
diff_mag(4, 5)
## [1] 1

2.2 Default Values

Let’s add another layer of complexity: default values. If we were to try and run our function without an input for either the x or y arguments (as shown below), we would get an error.

diff_mag(x = 5)
diff_mag(y = 4)
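If you’d like to look at the error without halting a script, one option is to wrap the call in tryCatch() and keep the message. This is only a sketch; the variable name msg is ours, and the exact wording of the message can vary with your R version and locale.

```r
# Same definition as above, repeated so this block runs on its own.
diff_mag <- function(x, y) {
  abs(x - y)
}

# tryCatch() intercepts the error and returns its message as a string,
# so the rest of the script keeps running.
msg <- tryCatch(
  diff_mag(x = 5),
  error = function(e) conditionMessage(e)
)
msg
```

In an English locale, the message should say that argument "y" is missing, with no default.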

Because no default values are set for the arguments, the function won’t run unless both arguments are specified. That makes sense for this function, but for the sake of illustration, let’s set the default y-value to 0.

diff_mag2 <- function(x, y = 0) {
  abs(x - y)
}

Now that y has a default value, our function will run even if no y input is specified.

diff_mag2(x = 5)
## [1] 5

2.3 Return Statements

We can also specify a return() statement. By default, a function returns the value of the last expression it evaluates; however, that value may not be exactly what you want handed back, and return() lets you state the output explicitly.

diff_mag3 <- function(x, y = 0) {
  z <- abs(x - y)
  return(paste0("The absolute difference between ", x, " and ", y, " is ", z,"."))
}

For example, in the function above, the return() statement hands back a character string built with paste0() instead of the bare number stored in z.

diff_mag3(x = 5)
## [1] "The absolute difference between 5 and 0 is 5."

We could, however, have this set to simply return the value like earlier, with the following code.

diff_mag4 <- function(x, y = 0) {
  z <- abs(x - y)
  return(z)
}

diff_mag4(x = 5, y = 17)
## [1] 12
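If you want more than one piece of information back, one common approach is to return a named list. The sketch below is an illustration only; diff_mag5() and its element names (value, message) are hypothetical and not part of the functions defined above.

```r
# Hypothetical variant: return both the number and the sentence.
diff_mag5 <- function(x, y = 0) {
  z <- abs(x - y)
  list(
    value = z,
    message = paste0("The absolute difference between ", x, " and ", y, " is ", z, ".")
  )
}

result <- diff_mag5(x = 5, y = 17)
result$value    # the numeric result
result$message  # the formatted sentence
```

Each element can then be pulled out with $, so the caller decides which piece to use.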

2.4 ... argument

One very important argument to be aware of when writing your own functions is the ... argument. The ... argument lets a function accept any number of additional arguments. We can include it in our functions in various ways in order to achieve different results. For example, if we make it the only argument, we can recreate a simple version of the concatenate function, c(). We’ll call it v().

v <- function(...) {
  unlist(list(...))
}

v(1, 2, 3)
## [1] 1 2 3
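To get a feel for what ... actually captures, it can help to look at list(...) directly. The short sketch below is an illustration under our own naming: count_args() is a hypothetical helper, and v() is repeated so the block runs on its own.

```r
# Same definition as above, repeated so this block runs on its own.
v <- function(...) {
  unlist(list(...))
}

# Names supplied in the call travel through ... and end up on the result.
named <- v(a = 1, b = 2)
names(named)

# Hypothetical helper: list(...) holds one element per supplied argument.
count_args <- function(...) {
  length(list(...))
}
n_args <- count_args(1, "two", NA)
n_args
```

Here names(named) gives "a" and "b", and n_args is 3.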

If we place the ... argument after the other arguments in our function, we can use it to pass additional arguments, such as na.rm =, through to the functions called inside ours. To illustrate this, we’ll recreate the mean() function. First, we’ll do it without including ....

our_mean <- function(x) {
  n <- length(x)
  total <- sum(x)
  average <- total / n
  average
}

We’ll give this a quick test to see if it’s working.

vector <- c(1:99)
mean(vector)
## [1] 50
our_mean(vector)
## [1] 50

If we only used this one test, we might assume that our function, our_mean(), is working properly. Let’s give it another quick test. This time we’ll see how it handles missing data. Because we know there’s missing data, we’ll add na.rm = TRUE to ignore it.

vector2 <- c(1:99, NA)
mean(vector2, na.rm = TRUE)
## [1] 50

We can try to do this with our_mean(), but we’ll get an error, because our function has no na.rm argument and no ... to accept it.

our_mean(vector2, na.rm = TRUE)
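As a rough sketch of what goes wrong here, the block below captures the error message with tryCatch() and also shows what our_mean() returns when the missing value is left in. The names msg and with_na are ours, and the exact wording of the message may depend on your R version and locale.

```r
# Same definition as above, repeated so this block runs on its own.
our_mean <- function(x) {
  n <- length(x)
  total <- sum(x)
  total / n
}

vector2 <- c(1:99, NA)

# our_mean() has no na.rm argument and no ..., so the call is rejected.
msg <- tryCatch(
  our_mean(vector2, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
msg

# Without na.rm, the NA propagates through sum() and the result is NA.
with_na <- our_mean(vector2)
with_na
```

In an English locale, the message should complain about an unused argument.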

We need to go back to the drawing board and add ... to our function to allow the use of additional arguments. We’ll also tweak the n = portion of our function to account for the removal of any missing values.

our_mean2 <- function(x, ...) {
  n <- length(x) - sum(is.na(x))
  total <- sum(x, ...)
  average <- total / n
  average
}

This should fix the missing data issue for us.

mean(vector2, na.rm = TRUE)
## [1] 50
our_mean2(vector2, na.rm = TRUE)
## [1] 50
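Passing na.rm through ... works, but it isn’t the only design. An alternative, sketched below, is to give the function its own na.rm argument with a default, which makes the option visible in the function signature. The name our_mean3() is hypothetical, and this is just one way the idea could be written.

```r
# Hypothetical variant with an explicit na.rm argument instead of ...
our_mean3 <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]  # drop missing values before doing any arithmetic
  }
  sum(x) / length(x)
}

vector2 <- c(1:99, NA)
without_na <- our_mean3(vector2, na.rm = TRUE)
without_na  # 50, matching mean(vector2, na.rm = TRUE)
kept_na <- our_mean3(vector2)
kept_na     # NA, matching mean(vector2)
```

The trade-off is that ... can forward any argument that sum() understands, while an explicit argument documents exactly which option the function supports.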

3 Why Write Functions?

If you don’t plan to ever write a package in R, you may be wondering why you would ever want to write a function. Knowing how to write functions can save a lot of copying/pasting/editing time if you’re going to be doing something semi-repetitive but with different inputs.

For example, we’ll look at some data on male and female heights.

library(readr)  # provides read_csv(); also attached by library(tidyverse)
male_ht <- read_csv("https://raw.githubusercontent.com/KRR1114/public_data_files/master/male_ht.csv")
head(x = male_ht, n = 2)
## # A tibble: 2 x 13
##      X1 age    mean    sd `3rd` `5th` `10th` `25th` `50th` `75th` `90th` `95th`
##   <dbl> <chr> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1     1 Birt…  51.4  2.9   45.7  46.4   48.3   49.5   51.4   53.3   54.6   55.9
## 2     2 0–0.…  52.0  2.46  48.3  48.3   48.8   50.8   52.1   53.3   55.8   55.9
## # … with 1 more variable: `97th` <dbl>
female_ht <- read_csv("https://raw.githubusercontent.com/KRR1114/public_data_files/master/female_ht.csv")
head(x = female_ht, n = 2)
## # A tibble: 2 x 13
##      X1 age    mean    sd `3rd` `5th` `10th` `25th` `50th` `75th` `90th` `95th`
##   <dbl> <chr> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1     1 Birt…  50.6  2.8   45.1  45.7   47     48.9   50.8   52.1     54   54.6
## 2     2 0–0.…  51.3  2.28  47.3  48.2   48.3   49.5   50.8   53.3     54   55.2
## # … with 1 more variable: `97th` <dbl>

Here is some code that I recently wrote (altered to make it uglier to help illustrate my point) to change reported standard deviations to reflect various intra-class correlation (ICC) values.

male_ht$sd_icc80 <- sqrt(male_ht$sd^2 + 
                           (male_ht$sd^2 - 
                              .8*(male_ht$sd^2))/.8)

That code is ugly, but it’s not too bad... at least not until I tell you that we need to run it for ICC values of .8, .7, and .5 for both males and females. That would amount to 5 more copy/paste/edit sequences, and a good rule of thumb is to write a function if you’re going to need to copy and paste more than twice. So, let’s write a function!

In the following code, notice that we’re setting the default icc value to 1, we’re creating variables inside of the function, and that we’re making the code much clearer than the earlier chunk of code.

sd_from_icc <- function(sd, icc = 1) { 
  x <- sd^2
  sig_sq_err <- ((x - icc*(x))/icc)
  sig_sq_tot <- (x + sig_sq_err)
  new_sd <- sqrt(sig_sq_tot)
  return(new_sd)
}

Now that we have our new function, we might as well use it! On second thought…we should probably test it to make sure that our function works the same as our earlier code.

test <- sd_from_icc(sd = male_ht$sd, icc = .8)

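# all.equal() compares the vectors element by element, in order;
# setequal() would ignore order and duplicates.
all.equal(test, male_ht$sd_icc80)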
## [1] TRUE

Looks like it works!
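As an extra sanity check, the algebra inside sd_from_icc() simplifies: x + (x - icc*x)/icc equals x/icc, so the result should match sd / sqrt(icc). The sketch below checks that numerically. The vector sd_values is a stand-in that uses the sd values printed in the head() output above, not the full data set.

```r
# Same definition as above, repeated so this block runs on its own.
sd_from_icc <- function(sd, icc = 1) {
  x <- sd^2
  sig_sq_err <- ((x - icc * (x)) / icc)
  sig_sq_tot <- (x + sig_sq_err)
  new_sd <- sqrt(sig_sq_tot)
  return(new_sd)
}

# Stand-in values (assumption): the sd column as printed above.
sd_values <- c(2.9, 2.46, 2.8, 2.28)

# The step-by-step function and the simplified formula should agree.
matches_shortcut <- all.equal(sd_from_icc(sd_values, icc = .8), sd_values / sqrt(.8))
matches_shortcut

# With the default icc = 1, the error variance is 0 and sd comes back unchanged.
unchanged <- all.equal(sd_from_icc(sd_values), sd_values)
unchanged
```

Checking a function against an independent formula like this is a useful complement to comparing it with the original code.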

From all of this, you now know the syntax for writing functions, but did you pick up on the process we followed to write one? Let’s explicitly state the 5 steps for writing good (i.e., correct & understandable) functions:

  1. Start with a problem (changing sd by ICC)
  2. Get a working piece of code
  3. Rewrite to use temporary variables
  4. Rewrite for clarity
  5. Turn it into a function

We may have done 3 & 4 in conjunction with 5, but it all worked out the same.

Let’s go back and do that task I mentioned earlier (.8, .7, .5 for males and females). We can do it two different ways. First, we could just use our new function six times as shown below.

male_ht$sd_icc80 <- sd_from_icc(sd = male_ht$sd, icc = .8)
male_ht$sd_icc70 <- sd_from_icc(sd = male_ht$sd, icc = .7)
male_ht$sd_icc50 <- sd_from_icc(sd = male_ht$sd, icc = .5)

female_ht$sd_icc80 <- sd_from_icc(sd = female_ht$sd, icc = .8)
female_ht$sd_icc70 <- sd_from_icc(sd = female_ht$sd, icc = .7)
female_ht$sd_icc50 <- sd_from_icc(sd = female_ht$sd, icc = .5)

Or, if we recognize that some of the information is consistent in our uses (i.e., sd = in each group), we can write a function that prevents us from having to repeat the unchanging information. In the code below, you can see that we’re setting the default values to be equal to the inputs we used in the previous code. This makes it unnecessary for us to type that information out with each use.

m_sd_from_icc <- function(sd = male_ht$sd, icc = 1) { 
  x <- sd^2
  sig_sq_err <- ((x - icc*(x))/icc)
  sig_sq_tot <- (x + sig_sq_err)
  new_sd <- sqrt(sig_sq_tot)
  return(new_sd)
}

f_sd_from_icc <- function(sd = female_ht$sd, icc = 1) { 
  x <- sd^2
  sig_sq_err <- ((x - icc*(x))/icc)
  sig_sq_tot <- (x + sig_sq_err)
  new_sd <- sqrt(sig_sq_tot)
  return(new_sd)
}
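With the defaults in place, each call only needs the icc value. The sketch below shows how the six assignments might then look. To keep it runnable without downloading the CSV files, it uses small stand-in data frames built from the sd values printed earlier (an assumption for illustration), and it writes the two wrappers as thin calls to sd_from_icc() instead of repeating the body, which is our variation on the code above.

```r
# Stand-in data (assumption): the sd values printed in the head() output.
male_ht <- data.frame(sd = c(2.9, 2.46))
female_ht <- data.frame(sd = c(2.8, 2.28))

# Same calculation as sd_from_icc() above, repeated so this block runs alone.
sd_from_icc <- function(sd, icc = 1) {
  x <- sd^2
  sig_sq_err <- (x - icc * x) / icc
  sqrt(x + sig_sq_err)
}

# Thin wrappers: only the default for sd differs between them.
m_sd_from_icc <- function(sd = male_ht$sd, icc = 1) {
  sd_from_icc(sd = sd, icc = icc)
}
f_sd_from_icc <- function(sd = female_ht$sd, icc = 1) {
  sd_from_icc(sd = sd, icc = icc)
}

# Only the part that changes (icc) has to be typed.
male_ht$sd_icc80 <- m_sd_from_icc(icc = .8)
male_ht$sd_icc70 <- m_sd_from_icc(icc = .7)
male_ht$sd_icc50 <- m_sd_from_icc(icc = .5)

female_ht$sd_icc80 <- f_sd_from_icc(icc = .8)
female_ht$sd_icc70 <- f_sd_from_icc(icc = .7)
female_ht$sd_icc50 <- f_sd_from_icc(icc = .5)

male_ht
female_ht
```

Because default arguments are evaluated when the function is called, the wrappers pick up whatever male_ht and female_ht contain at that moment. That is convenient in a short script, but it also ties each function to an object in your workspace.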