How to Append Rows to a DataFrame in R (with 7 Code Examples)
In this tutorial, we'll cover how to append to a DataFrame in R — one of the most common modifications performed on a DataFrame.
A DataFrame is one of the essential data structures in the R programming language. It's known to be very flexible because it can contain many data types and is easy to modify. One of the most typical modifications performed on a DataFrame in R is adding new observations (rows) to it.
In this tutorial, we'll discuss the different ways of appending one or more rows to a DataFrame in R.
Before starting, let's create a simple DataFrame to experiment with:
super_sleepers_initial <- data.frame(animal=c('koala', 'hedgehog'),
country=c('Australia', 'Italy'),
avg_sleep_hours=c(21, 18),
stringsAsFactors=FALSE)
print(super_sleepers_initial)
    animal   country avg_sleep_hours
1    koala Australia              21
2 hedgehog     Italy              18
Note: when creating the above DataFrame, we added the optional parameter stringsAsFactors=FALSE. In R versions earlier than 4.0.0, this parameter defaults to TRUE, which converts every character column to the factor data type; since R 4.0.0, the default is FALSE. Setting it to FALSE explicitly keeps character columns as character regardless of the R version and hence avoids undesired side effects, such as NA values (with an "invalid factor level" warning) when we later assign a new string value to a factor column. As an experiment, you can set this parameter to TRUE in the above piece of code, run this and the subsequent code cells, and observe the results.
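A minimal sketch of the difference, assuming two hypothetical one-column DataFrames, df_chr and df_fct, built only for this comparison: with stringsAsFactors=TRUE the column becomes a factor, and assigning a value that isn't one of its existing levels stores NA instead of the value.

```r
# Same data, built without and with the character-to-factor conversion
df_chr <- data.frame(animal = c('koala', 'hedgehog'), stringsAsFactors = FALSE)
df_fct <- data.frame(animal = c('koala', 'hedgehog'), stringsAsFactors = TRUE)

print(class(df_chr$animal))  # "character"
print(class(df_fct$animal))  # "factor"

# 'sloth' is not one of the factor levels, so R stores NA and warns
# "invalid factor level, NA generated" (the warning is suppressed here)
suppressWarnings(df_fct[nrow(df_fct) + 1, ] <- list('sloth'))
print(df_fct$animal)

# The character column simply accepts the new value
df_chr[nrow(df_chr) + 1, ] <- list('sloth')
print(df_chr$animal)
```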
Appending a Single Row to a DataFrame in R
Using rbind()
To append one row to a DataFrame in R, we can use the built-in rbind() function, which stands for "row-bind". The basic syntax is the following:
dataframe <- rbind(dataframe, new_row)
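Note that in the above syntax, new_row will most probably be a list rather than a vector, unless all the columns of our DataFrame are of the same data type (which isn't frequently the case). A vector can hold items of only one data type, so combining, for example, strings and numbers with c() converts all of them to strings.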
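To see why a list is usually the safer choice, here is a minimal sketch, assuming a hypothetical DataFrame sleepers with the same three columns as super_sleepers_initial: a vector passed to rbind() drags the numeric column to character, while a list keeps each item's own type.

```r
sleepers <- data.frame(animal = c('koala', 'hedgehog'),
                       country = c('Australia', 'Italy'),
                       avg_sleep_hours = c(21, 18),
                       stringsAsFactors = FALSE)

# A vector holds a single type: c() turns 17 into the string "17",
# so after rbind() the avg_sleep_hours column is coerced to character
with_vector <- rbind(sleepers, c('sloth', 'Peru', 17))
print(class(with_vector$avg_sleep_hours))

# A list keeps each item's own type, so avg_sleep_hours stays numeric
with_list <- rbind(sleepers, list('sloth', 'Peru', 17))
print(class(with_list$avg_sleep_hours))
```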
Let's reconstruct a new DataFrame super_sleepers from the initial one super_sleepers_initial and append one more row to it:
# Reconstructing the super_sleepers DataFrame
super_sleepers <- super_sleepers_initial
super_sleepers <- rbind(super_sleepers, list('sloth', 'Peru', 17))
print(super_sleepers)
    animal   country avg_sleep_hours
1    koala Australia              21
2 hedgehog     Italy              18
3    sloth      Peru              17
The new row was appended to the end of the DataFrame.
It's important to keep in mind that here and in all the subsequent examples, the new row (or rows) must reflect the structure of the DataFrame to which it's appended. Here, this means that the length of the list has to be equal to the number of columns in the DataFrame, and the order of the data types of the items in the list has to match the order of the data types of the DataFrame variables. Otherwise, R will either throw an error (for example, when the list has fewer items than the DataFrame has columns) or silently coerce a column to a different data type (for example, a numeric column becomes character if we pass a string in its place), which can cause problems later on.
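Since a type mismatch doesn't necessarily stop the code, it can help to check a new row before appending it. The sketch below assumes a hypothetical helper, row_matches(), which is not part of base R: it compares the number of items and the class of each item with the DataFrame's columns.

```r
sleepers <- data.frame(animal = c('koala', 'hedgehog'),
                       country = c('Australia', 'Italy'),
                       avg_sleep_hours = c(21, 18),
                       stringsAsFactors = FALSE)

# Hypothetical helper: TRUE if new_row has one item per column and
# each item has the same class as the corresponding column
row_matches <- function(df, new_row) {
  length(new_row) == ncol(df) &&
    all(mapply(function(column, value) identical(class(column), class(value)),
               df, new_row))
}

row_matches(sleepers, list('sloth', 'Peru', 17))           # TRUE: matching structure
row_matches(sleepers, list('sloth', 'Peru'))               # FALSE: one item missing
row_matches(sleepers, list('sloth', 'Peru', 'seventeen'))  # FALSE: wrong type in the last item

# Without such a check, rbind() silently converts the numeric column to character
coerced <- rbind(sleepers, list('sloth', 'Peru', 'seventeen'))
print(class(coerced$avg_sleep_hours))
```

Comparing classes is a deliberately strict choice: it would also reject, say, an integer 17L for a double column, which R itself would accept without complaint.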
Using nrow()
Another way to append a single row to an R DataFrame is by using the nrow() function. The syntax is as follows:
dataframe[nrow(dataframe) + 1,] <- new_row
This syntax literally means that we calculate the number of rows in the DataFrame (nrow(dataframe)), add 1 to this number (nrow(dataframe) + 1), and then assign new_row to the row at that index of the DataFrame (dataframe[nrow(dataframe) + 1,]), i.e., as a new last row.
Just as before, our new_row will most probably have to be a list rather than a vector, unless all the columns of the DataFrame are of the same data type, which is not common.
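The index arithmetic can be traced step by step. A minimal sketch, assuming a hypothetical DataFrame sleepers rebuilt from the same two rows as super_sleepers_initial:

```r
sleepers <- data.frame(animal = c('koala', 'hedgehog'),
                       country = c('Australia', 'Italy'),
                       avg_sleep_hours = c(21, 18),
                       stringsAsFactors = FALSE)

n_rows <- nrow(sleepers)   # 2 existing rows
new_index <- n_rows + 1    # 3: the first index past the last existing row

# Assigning to a row index that doesn't exist yet extends the DataFrame by one row;
# the list keeps 'sloth' and 'Peru' as character and 17 as numeric
sleepers[new_index, ] <- list('sloth', 'Peru', 17)
print(sleepers)
```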
Let's add one more "super-sleeper" to our table:
super_sleepers[nrow(super_sleepers) + 1,] <- list('panda', 'China', 10)
print(super_sleepers)
    animal   country avg_sleep_hours
1    koala Australia              21
2 hedgehog     Italy              18
3    sloth      Peru              17
4    panda     China              10
Again, the new row was appended to the end of the DataFrame.
Using add_row() of tidyverse
What if, instead, we want to add a new row not to the end of the DataFrame but at some specific index of it? For example, we found out that tigers sleep 16 hours daily, i.e., more than pandas in our rating, so we need to insert this observation as the second-to-last row of the DataFrame. Base R has no dedicated function for inserting a row at a given position (we would have to split the DataFrame and bind the pieces back together), but we can use the add_row() function of the tidyverse R package (add_row() comes from tibble, one of the packages that tidyverse loads). We may need to install tidyverse first, if it isn't installed yet, by running install.packages("tidyverse"):
library(tidyverse)
super_sleepers <- super_sleepers %>% add_row(animal='tiger',
                                             country='India',
                                             avg_sleep_hours=16,
                                             .before=4)
print(super_sleepers)
    animal   country avg_sleep_hours
1    koala Australia              21
2 hedgehog     Italy              18
3    sloth      Peru              17
4    tiger     India              16
5    panda     China              10
Here, we should note the following:
- We passed the names of the columns in the same order as in the existing DataFrame and assigned the corresponding new values to them (add_row() matches the values to the columns by name).
- After that sequence, we added the optional .before parameter and specified the necessary index. If we didn't do that, the row would be added by default to the end of the DataFrame. Alternatively, we could use the optional .after parameter and assign to it the index of the row after which to insert the new observation.
- We used the assignment operator <- to save the modifications, since add_row() returns a new DataFrame rather than modifying the current one in place.
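For comparison, here is a minimal base-R sketch of the same insertion, assuming no tidyverse is available. The function insert_rows_before() is a hypothetical helper written for this illustration, not a base R function: it splits the DataFrame at the target position and binds the pieces back together with rbind().

```r
sleepers <- data.frame(animal = c('koala', 'hedgehog', 'sloth', 'panda'),
                       country = c('Australia', 'Italy', 'Peru', 'China'),
                       avg_sleep_hours = c(21, 18, 17, 10),
                       stringsAsFactors = FALSE)

# Hypothetical helper (not part of base R): insert the rows of new_rows
# so that they start at position `before`, similar to add_row(.before = ...)
insert_rows_before <- function(df, new_rows, before) {
  positions <- seq_len(nrow(df))
  top <- df[positions < before, , drop = FALSE]      # rows that stay above
  bottom <- df[positions >= before, , drop = FALSE]  # rows that move below
  result <- rbind(top, new_rows, bottom)
  rownames(result) <- NULL                           # renumber rows 1..n
  result
}

tiger <- data.frame(animal = 'tiger', country = 'India', avg_sleep_hours = 16,
                    stringsAsFactors = FALSE)
sleepers <- insert_rows_before(sleepers, tiger, before = 4)
print(sleepers)
```

The row names are reset at the end because rbind() would otherwise carry over (and de-duplicate) the row names of the three pieces.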
Appending Multiple Rows to a DataFrame in R
Using rbind()
Often, we need to append not one but multiple rows to an R DataFrame. The simplest method here is again to use the rbind() function. More precisely, in this case, we need to combine two DataFrames: the initial one and the one containing the rows to be appended.
dataframe_1 <- rbind(dataframe_1, dataframe_2)
To see how it works, let's reconstruct a new DataFrame super_sleepers from the initial one super_sleepers_initial, print it out to recall what it looks like, and append two new rows to its end:
# Reconstructing the super_sleepers DataFrame
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n') # printing blank lines between the two outputs
# Creating a new DataFrame with the necessary rows
super_sleepers_2 <- data.frame(animal=c('squirrel', 'panda'),
country=c('Canada', 'China'),
avg_sleep_hours=c(15, 10),
stringsAsFactors=FALSE)
# Appending the rows of the new DataFrame to the end of the existing one
super_sleepers <- rbind(super_sleepers, super_sleepers_2)
print(super_sleepers)
    animal   country avg_sleep_hours
1    koala Australia              21
2 hedgehog     Italy              18


    animal   country avg_sleep_hours
1    koala Australia              21
2 hedgehog     Italy              18
3 squirrel    Canada              15
4    panda     China              10
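As a reminder, the new rows must reflect the structure of the DataFrame to which they are appended: both DataFrames must have the same number of columns with the same column names, and the data types of the corresponding columns should be the same. The column order, however, may differ, because rbind() matches the columns of DataFrames by name rather than by position.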
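A minimal sketch of this name matching, assuming two hypothetical two-column DataFrames, df_1 and df_2, that list the same columns in a different order, plus a third one, df_3, with a mismatched column name:

```r
# Same columns, but listed in a different order in the second DataFrame
df_1 <- data.frame(animal = c('koala', 'hedgehog'), avg_sleep_hours = c(21, 18),
                   stringsAsFactors = FALSE)
df_2 <- data.frame(avg_sleep_hours = c(15, 10), animal = c('squirrel', 'panda'),
                   stringsAsFactors = FALSE)

# rbind() lines the columns up by name and keeps the column order of df_1
combined <- rbind(df_1, df_2)
print(combined)

# A different column name, on the other hand, makes rbind() stop with an error;
# tryCatch() captures the error message instead of halting the script
df_3 <- data.frame(animal = 'sloth', hours = 17, stringsAsFactors = FALSE)
result <- tryCatch(rbind(df_1, df_3), error = function(e) conditionMessage(e))
print(result)
```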
Using nrow()
Alternatively, we can use the nrow() function to append multiple rows to a DataFrame in R. However, this approach is not recommended here because the syntax becomes very clumsy and difficult to read: we need to calculate both the start and the end index of the new rows, which requires much more manipulation with the nrow() function. Technically, the syntax is as follows:
dataframe_1[(nrow(dataframe_1) + 1):(nrow(dataframe_1) + nrow(dataframe_2)),] <- dataframe_2
Let's reconstruct our super_sleepers from super_sleepers_initial and append to it the rows of the already existing DataFrame super_sleepers_2:
# Reconstructing the super_sleepers DataFrame
super_sleepers <- super_sleepers_initial
super_sleepers[(nrow(super_sleepers) + 1):(nrow(super_sleepers) + nrow(super_sleepers_2)),] <- super_sleepers_2
print(super_sleepers)
    animal   country avg_sleep_hours
1    koala Australia              21
2 hedgehog     Italy              18
3 squirrel    Canada              15
4    panda     China              10
We obtained the same DataFrame as in the previous example and observed that the previous approach is much more elegant.
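If the nrow() approach is needed anyway, the index arithmetic becomes easier to read when the positions of the new rows are computed once. A minimal sketch, assuming hypothetical DataFrames sleepers and sleepers_2 rebuilt from the data above; new_positions is just an illustrative variable name:

```r
sleepers <- data.frame(animal = c('koala', 'hedgehog'),
                       country = c('Australia', 'Italy'),
                       avg_sleep_hours = c(21, 18),
                       stringsAsFactors = FALSE)
sleepers_2 <- data.frame(animal = c('squirrel', 'panda'),
                         country = c('Canada', 'China'),
                         avg_sleep_hours = c(15, 10),
                         stringsAsFactors = FALSE)

# Positions of the new rows: the current row count plus 1, 2, ..., nrow(sleepers_2)
new_positions <- nrow(sleepers) + seq_len(nrow(sleepers_2))
print(new_positions)  # 3 4

# Unlike (n + 1):(n + m), seq_len() yields an empty index (integer(0)) when
# sleepers_2 has no rows, instead of the decreasing sequence c(3, 2)
sleepers[new_positions, ] <- sleepers_2
print(sleepers)
```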
Using add_row() of tidyverse
Finally, we can again use the add_row() function of the tidyverse package. This approach is more flexible since we can either append the new rows at the end of the current DataFrame or insert them before/after a certain row specified by its index.
Let's insert the observations for sloth and tiger between hedgehog and squirrel in the current DataFrame super_sleepers:
library(tidyverse)
super_sleepers <- super_sleepers %>% add_row(animal=c('sloth', 'tiger'),
                                             country=c('Peru', 'India'),
                                             avg_sleep_hours=c(17, 16),
                                             .before=3)
print(super_sleepers)
    animal   country avg_sleep_hours
1    koala Australia              21
2 hedgehog     Italy              18
3    sloth      Peru              17
4    tiger     India              16
5 squirrel    Canada              15
6    panda     China              10
Note that to obtain the above result, we could also use .after=2 instead of .before=3.
Conclusion
In this tutorial, we learned how to append a single row (or multiple rows) to a DataFrame in R, or how to insert it (or them) at a specific index of the DataFrame. In particular, we considered 3 approaches: using the rbind() or nrow() functions of base R, or the add_row() function of the tidyverse R package. We paid special attention to the best use cases for each method and the nuances that have to be taken into account in various situations.