June 15, 2022

How to Write Functions in R (with 18 Code Examples)

Functions are essential tools in R. Here’s what you need to know about creating and calling them — and more.

Functions are among the most frequently used objects in R. It's important to understand their purpose and syntax and to know how to create and use them. In this tutorial, we'll cover all of this and more: what an R function is, what types of functions exist in R, when we should use a function, the most popular built-in functions, how to create and call a user-defined function, how to call one function inside another one, and how to nest functions.

What Is a Function in R?

A function in R is an object containing multiple interrelated statements that are run together in a predefined order every time the function is called. Functions in R can be built-in or created by the user (user-defined). The main purpose of creating a user-defined function is to optimize our program: to avoid repeating the same block of code for a task that is performed frequently in a project, to protect us from the hard-to-debug errors that copy-pasting invites, and to make the code more readable. A good practice is to create a function whenever we need to run a certain set of commands more than twice.
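
As a small illustration of the repetition argument, the sketch below rescales several vectors to the range from 0 to 1. The function name rescale_01 and the sample data are made up for this example; they aren't part of base R.

```r
# Hypothetical sample data, invented for this illustration
heights <- c(150, 160, 170, 180)
weights <- c(50, 60, 80, 90)
ages <- c(20, 30, 40, 60)

# Without a function: the formula has to be copied for every vector,
# and each copy is a chance to forget to rename a variable
heights_scaled <- (heights - min(heights)) / (max(heights) - min(heights))

# With a function: the formula is written once and reused
rescale_01 <- function(x){
    (x - min(x)) / (max(x) - min(x))
}

print(rescale_01(weights))
print(rescale_01(ages))
```

If the formula ever needs to change, we now have a single place to edit instead of three.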

Built-in Functions in R

There are plenty of helpful built-in functions in R used for various purposes. Some of the most popular ones are:

  • min(), max(), mean(), median() – return the minimum / maximum / mean / median value of a numeric vector, respectively
  • sum() – returns the sum of a numeric vector
  • range() – returns the minimum and maximum values of a numeric vector
  • abs() – returns the absolute value of a number
  • str() – shows the structure of an R object
  • print() – displays an R object on the console
  • ncol() – returns the number of columns of a matrix or a dataframe
  • length() – returns the number of items in an R object (a vector, a list, etc.)
  • nchar() – returns the number of characters in a character object
  • sort() – sorts a vector in ascending or descending (decreasing=TRUE) order
  • exists() – returns TRUE or FALSE depending on whether or not a variable is defined in the R environment

Let's see some of the above functions in action:

vector <- c(3, 5, 2, 3, 1, 4)

print(min(vector))
print(mean(vector))
print(median(vector))
print(sum(vector))
print(range(vector))
print(str(vector))  ## str() displays the structure itself and returns NULL, which print() then shows
print(length(vector))
print(sort(vector, decreasing=TRUE))
print(exists('vector'))  ## note the quotation marks
[1] 1
[1] 3
[1] 3
[1] 18
[1] 1 5
 num [1:6] 3 5 2 3 1 4
NULL
[1] 6
[1] 5 4 3 3 2 1
[1] TRUE
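
The example above leaves out abs(), nchar(), and ncol() from the list. A minimal sketch of those three could look like this (the matrix m is just an example object created for the demonstration):

```r
# abs() works element-wise on a numeric vector
print(abs(c(-3, 0, 2.5)))

# nchar() counts the characters in a string
print(nchar('function'))

# ncol() needs a two-dimensional object; here, a small example matrix
m <- matrix(1:6, nrow=2)
print(ncol(m))
```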

Creating a Function in R

While applying built-in functions facilitates many common tasks, often we need to create our own function to automate the performance of a particular task. To declare a user-defined function in R, we use the keyword function. The syntax is as follows:

function_name <- function(parameters){
  function body 
}

Above, the main components of an R function are: function name, function parameters, and function body. Let's take a look at each of them separately.

Function Name

This is the name of the function object that will be stored in the R environment after the function definition and used for calling that function. It should be concise but clear and meaningful so that the user who reads our code can easily understand what exactly this function does. For example, if we need to create a function for calculating the circumference of a circle with a known radius, we'd better call this function circumference rather than function_1 or circumference_of_a_circle. (Side note: While commonly we use verbs in function names, it's ok to use just a noun if that noun is very descriptive and unambiguous.)

Function Parameters

Function parameters, sometimes called formal arguments, are the variables listed inside the parentheses of the function definition and separated by commas. They are set to actual values (called arguments) each time we call the function. For example:

circumference <- function(r){
    2*pi*r
}
print(circumference(2))
[1] 12.56637

Above, we created a function to calculate the circumference of a circle with a known radius using the formula $C = 2\pi r$, so the function has a single parameter, r. After defining the function, we called it with a radius of 2 (hence, with the argument 2).
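
To make the distinction between parameters and arguments more tangible, here is a short sketch that uses the base R function formals() to inspect the parameters of a function. It goes slightly beyond what we need for writing functions and is shown only for illustration.

```r
circumference <- function(r){
    2*pi*r
}

# Parameters (formal arguments) belong to the function definition
print(names(formals(circumference)))

# Arguments are the actual values supplied in a particular call
print(circumference(r=2))
```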

It's possible, even though rarely useful, for a function to have no parameters:

hello_world <- function(){
    'Hello, World!'
}
print(hello_world())
[1] "Hello, World!"

Also, some parameters can be set to default values (those related to a typical case) inside the function definition, which then can be reset when calling the function. Returning to our circumference function, we can set the default radius of a circle as 1, so if we call the function with no argument passed, it will calculate the circumference of a unit circle (i.e., a circle with a radius of 1). Otherwise, it will calculate the circumference of a circle with the provided radius:

circumference <- function(r=1){
    2*pi*r
}
print(circumference())
print(circumference(2))
[1] 6.283185
[1] 12.56637

Function Body

The function body is a set of commands inside the curly braces that are run in a predefined order every time we call the function. In other words, in the function body, we place what exactly we need the function to do:

sum_two_nums <- function(x, y){
    x + y
}
print(sum_two_nums(1, 2))
[1] 3

Note that the statements in the function body (in the above example, the only statement is x + y) are usually indented by 2 or 4 spaces, depending on the IDE where we run the code. The important thing is to be consistent with the indentation throughout the program. Indentation doesn't affect how the code runs and isn't obligatory, but it makes the code easier to read.

It's possible to drop the curly braces if the function body contains a single statement. For example:

sum_two_nums <- function(x, y) x + y
print(sum_two_nums(1, 2))
[1] 3

As we saw in all the examples above, it usually isn't necessary to include an explicit return statement when defining a function in R, since an R function automatically returns the last evaluated expression in its body. However, we can still add one using the syntax return(expression_to_be_returned). It is strictly required only when we need to exit the function before its last statement, but it also helps make the returned value obvious, for instance when a function computes more than one result. For example:

mean_median <- function(vector){
    mean <- mean(vector)
    median <- median(vector)
    return(c(mean, median))
}
print(mean_median(c(1, 1, 1, 2, 3)))
[1] 1.6 1.0

Note that in the return statement above, we actually return a vector containing the necessary results, and not just the variables separated by a comma (since the return() function can return only a single R object). Instead of a vector, we could also return a list, especially if the results to be returned are supposed to be of different data types.
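
As a sketch of the list-based alternative, the hypothetical function below (describe_vector is a name invented for this example) returns a numeric mean, a numeric median, and a text label in a single list. It also uses return() to leave the function early when the input is empty, which is one situation where an explicit return statement is genuinely needed.

```r
describe_vector <- function(x){
    # Early exit: return() stops the function right here
    if (length(x) == 0){
        return(NULL)
    }
    # A list can hold results of different types in one object
    list(mean=mean(x), median=median(x), label='numeric summary')
}

result <- describe_vector(c(1, 1, 1, 2, 3))
print(result$mean)
print(result$label)
print(is.null(describe_vector(c())))
```

The individual results are then accessed by name with the $ operator, which is often easier to read than indexing into a vector by position.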

Calling a Function in R

In all the above examples, we already called the created functions many times. To do so, we just wrote the function name and added the necessary arguments inside the parentheses. In R, function arguments can be passed by position, by name (so-called named arguments), by mixing position-based and name-based matching, or omitted altogether.

If we pass the arguments by position, we need to follow the same sequence of arguments as defined in the function:

subtract_two_nums <- function(x, y){
    x - y
}
print(subtract_two_nums(3, 1))
[1] 2

In the above example, x is equal to 3 and y is equal to 1, not vice versa.

If we pass the arguments by name, i.e., explicitly specify what value each parameter defined in the function takes, the order of the arguments doesn't matter:

subtract_two_nums <- function(x, y){
    x - y
}
print(subtract_two_nums(x=3, y=1))
print(subtract_two_nums(y=1, x=3))
[1] 2
[1] 2

Since we explicitly assigned x=3 and y=1, we can pass them either as x=3, y=1 or y=1, x=3 – the result will be the same.

It's possible to mix position- and name-based matching of the arguments. Let's look at the example of a function for calculating BMR (basal metabolic rate), i.e., the number of calories the body burns per day at rest, for women based on their weight (in kg), height (in cm), and age (in years). The formula used in the function is the Mifflin-St Jeor equation:

calculate_calories_women <- function(weight, height, age){
    (10 * weight) + (6.25 * height) - (5 * age) - 161
}

Now, let's calculate the calories for a woman 30 years old, with a weight of 60 kg and a height of 165 cm. However, for the age parameter, we'll pass the argument by name and for the other two parameters, we'll pass the arguments by position:

print(calculate_calories_women(age=30, 60, 165))
[1] 1320.25

In a case like the one above (when we mix matching by name and by position), the named arguments are extracted from the whole sequence of arguments and matched first, while the rest of the arguments are matched by position, i.e., in the same order as the remaining parameters appear in the function definition. However, this practice isn't recommended because it can lead to confusion.
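
The following sketch illustrates this matching rule and the kind of confusion it can cause. It reuses the calculate_calories_women function with the same values as before; only the order of the arguments changes.

```r
calculate_calories_women <- function(weight, height, age){
    (10 * weight) + (6.25 * height) - (5 * age) - 161
}

# The named argument can sit anywhere; 60 and 165 still fill
# weight and height, in that order
print(calculate_calories_women(60, age=30, 165))
print(calculate_calories_women(60, 165, age=30))

# Swapping the two positional values silently changes the result,
# which is why mixing the two styles is easy to get wrong
print(calculate_calories_women(age=30, 165, 60))
```

The first two calls both return 1320.25, while the last one returns 1714 without any warning, because 165 is now taken as the weight and 60 as the height.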

Finally, we can omit some (or all) of the arguments altogether. This is possible if we set some (or all) of the parameters to default values inside the function definition. Let's return to our calculate_calories_women function and set the default age of a woman to 30 years:

calculate_calories_women <- function(weight, height, age=30){
    (10 * weight) + (6.25 * height) - (5 * age) - 161
}
print(calculate_calories_women(60, 165))
[1] 1320.25

In the above example, we passed only two arguments to the function, despite it having three parameters in its definition. However, since one of the parameters has a default value assigned to it, when we pass two arguments to the function, R sets the missing third argument to its default value and makes the calculations accordingly, without throwing an error.
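
For contrast, here is a sketch of what happens in the two neighboring cases: overriding the default value, and omitting an argument that has no default. The tryCatch() call and the message text are used here only to keep the script running and are an illustrative choice, not part of the function itself.

```r
calculate_calories_women <- function(weight, height, age=30){
    (10 * weight) + (6.25 * height) - (5 * age) - 161
}

# The default can still be overridden, by position or by name
print(calculate_calories_women(60, 165, 40))
print(calculate_calories_women(60, 165, age=40))

# height has no default, so leaving it out raises an error;
# tryCatch() replaces that error with a message of our own
outcome <- tryCatch(
    calculate_calories_women(60),
    error=function(e) 'error: height is missing'
)
print(outcome)
```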

When calling a function, we usually assign the result of this operation to a variable, to be able to use it later:

circumference <- function(r){
    2*pi*r
}
circumference_radius_5 <- circumference(5)
print(circumference_radius_5)
[1] 31.41593

Using Functions Inside Other Functions

Inside the definition of an R function, we can use other functions. We've already seen such an example earlier, when we used the built-in mean() and median() functions inside a user-defined function mean_median:

mean_median <- function(vector){
    mean <- mean(vector)
    median <- median(vector)
    return(c(mean, median))
}

It's also possible to pass the output of calling one function directly as an argument to another function:

radius_from_diameter <- function(d){
    d/2
}

circumference <- function(r){
    2*pi*r
}

print(circumference(radius_from_diameter(4)))
[1] 12.56637

In the above piece of code, we first created two simple functions: one for calculating the radius of a circle given its diameter and one for calculating the circumference of a circle given its radius. Since originally we knew only the diameter of the circle (equal to 4), we called the radius_from_diameter function inside the call to the circumference function, so that the radius is calculated first from the provided diameter and is then used to calculate the circumference. While this approach can be useful in many cases, we should be careful with it and avoid nesting too many function calls inside one another since it can hurt code readability.
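
One possible way to keep such code readable, sketched below, is to store the intermediate result in a variable instead of nesting the calls. The variable names are chosen for this example only.

```r
radius_from_diameter <- function(d){
    d/2
}

circumference <- function(r){
    2*pi*r
}

# Nested call: compact, but read from the inside out
nested <- circumference(radius_from_diameter(4))

# Step by step: one call per line, read from top to bottom
radius <- radius_from_diameter(4)
stepwise <- circumference(radius)

print(nested == stepwise)
```

Both versions give the same result; the step-by-step form simply trades a little brevity for clarity once more than two or three calls are chained.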

Finally, functions can be nested, meaning that we can define a new function inside another function. Let's say that we need a function that sums up the circle areas of 3 non-intersecting circles:

sum_circle_ares <- function(r1, r2, r3){
    circle_area <- function(r){
        pi*r^2
    }
    circle_area(r1) + circle_area(r2) + circle_area(r3)
}

print(sum_circle_ares(1, 2, 3))
[1] 43.9823

Above, we defined the circle_area function inside the sum_circle_ares function. We then called that inner function three times (circle_area(r1), circle_area(r2), and circle_area(r3)) inside the outer function to calculate the area of each circle for further summing up those areas. Now, if we try to call the circle_area function outside the sum_circle_ares function, the program throws an error, because the inner function exists and works only inside the function where it was defined:

print(circle_area(10))
Error in circle_area(10): could not find function "circle_area"
Traceback:

1. print(circle_area(10))
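
The same scoping behavior can be checked without triggering an error by using the exists() function from the list of built-in functions. In this minimal sketch, outer_function and inner_helper are placeholder names, and we assume that no other object called inner_helper is defined in the session.

```r
outer_function <- function(){
    inner_helper <- function() 'inside'
    # Within the outer function, the inner one is visible
    exists('inner_helper')
}

print(outer_function())

# In the calling environment, the inner function is not defined
print(exists('inner_helper'))
```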

When nesting functions, we have to keep in mind two things:

  1. As with any function, the inner function should be used at least 3 times inside the outer function. Otherwise, it isn't worth creating.
  2. If we want to be able to use the function independently of the bigger function, we should create it outside the bigger function instead of nesting these functions. For example, if we were going to use the circle_area function outside the sum_circle_ares function, we would write the following code:

circle_area <- function(r){
    pi*r^2
}

sum_circle_ares <- function(r1, r2, r3){
    circle_area(r1) + circle_area(r2) + circle_area(r3)
}

print(sum_circle_ares(1, 2, 3))
print(circle_area(10))
[1] 43.9823
[1] 314.1593

Here, we again used the circle_area function inside the sum_circle_ares function. However, this time, we were also able to call it outside that function and get the result rather than an error.

Summary

In this tutorial, we learned quite a few aspects related to functions in R. In particular, we discussed the following:

  • Types of functions in R
  • Why and when we would need to create a function
  • Some of the most popular built-in functions in R and what they are used for
  • How to define a user-defined function
  • The main components of a function
  • The best practices for naming a function
  • When and how to set function parameters to default values
  • The function body and the nuances of its syntax
  • When we should explicitly include the return statement in the function definition
  • How to call an R function with named, positional, or mixed arguments
  • What happens when we mix positional and named arguments — and why this isn't a good practice
  • When we can omit some (or all) of the arguments
  • How to apply functions inside other functions
  • How to pass a function call as an argument to another function
  • When and how to nest functions

With these skills and information in hand, you're ready to start creating and using functions in R.

Elena Kosourova

Elena is a petroleum geologist and community manager at Dataquest. You can find her chatting online with data enthusiasts and writing tutorials on data science topics. Find her on LinkedIn.