June 14, 2022

How to Add a Column to a DataFrame in R (with 18 Code Examples)

In this tutorial, you'll learn one of the most common operations used for manipulating DataFrames in R: adding columns.

A DataFrame is one of the basic data structures of the R programming language. It is also a very versatile data structure since it can store multiple data types, be easily modified, and easily updated.

Specifically, we'll look at how to add a column to a DataFrame using only base R.

What Is a DataFrame in R?

Technically speaking, a DataFrame in R is a specific case of a list of vectors of the same length, where different vectors can be (and usually are) of different data types. Since a DataFrame has a tabular, 2-dimensional form, it has columns (variables) and rows (data entries).
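To make the "list of vectors" idea concrete, here is a minimal sketch. The DataFrame df and its columns id, name, and flag are hypothetical names used only for this illustration:

```r
# A small DataFrame with columns of different data types (hypothetical example data)
df <- data.frame(id = 1:3,
                 name = c('a', 'b', 'c'),
                 flag = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)

# A DataFrame is a list under the hood
print(is.list(df))

# Each column is a vector with its own data type...
col_classes <- sapply(df, class)
print(col_classes)

# ...and all the columns have the same length
col_lengths <- lengths(df)
print(col_lengths)

# The tabular shape: number of rows and number of columns
print(dim(df))
```

Because each column is just a list element, adding a column amounts to adding one more vector of the right length.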

Adding a Column to a DataFrame in R

We may want to add a new column to an R DataFrame for various reasons: to calculate a new variable based on the existing ones, to keep an existing column alongside a copy of it in a different format, to append an empty or placeholder column to fill in later, or to add a column containing completely new information.

Let's explore different ways of adding a new column to a DataFrame in R. For our experiments, we'll be mostly using the same DataFrame called super_sleepers which we'll reconstruct each time from the following initial DataFrame:

super_sleepers_initial <- data.frame(rating=1:4, 
                                     animal=c('koala', 'hedgehog', 'sloth', 'panda'), 
                                     country=c('Australia', 'Italy', 'Peru', 'China'))
print(super_sleepers_initial)
  rating   animal   country
1      1    koala Australia
2      2 hedgehog     Italy
3      3    sloth      Peru
4      4    panda     China

Our task will be to add to this DataFrame a new column called avg_sleep_hours representing the average time in hours that each of the above animals sleeps per day, according to the following scheme:

Animal   | Avg hrs of sleep per day
koala    | 21
hedgehog | 18
sloth    | 17
panda    | 10

For some examples, we'll experiment with adding two other columns: avg_sleep_hours_per_year and has_tail.

Now, let's dive in.

Adding a Column to a DataFrame in R Using the $ Symbol

Since a DataFrame in R is a list of vectors where each vector represents an individual column of that DataFrame, we can add a column to a DataFrame just by adding the corresponding new vector to this "list". The syntax is as follows:

dataframe_name$new_column_name <- vector

Let's reconstruct our super_sleepers DataFrame from the initial super_sleepers_initial DataFrame (we'll do so for each subsequent experiment) and add to it a column called avg_sleep_hours represented by the vector c(21, 18, 17, 10):

# Reconstructing the super_sleepers DataFrame
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')  # printing an empty line

# Adding a new column avg_sleep_hours to the super_sleepers DataFrame
super_sleepers$avg_sleep_hours <- c(21, 18, 17, 10)
print(super_sleepers)
  rating   animal   country
1      1    koala Australia
2      2 hedgehog     Italy
3      3    sloth      Peru
4      4    panda     China

  rating   animal   country avg_sleep_hours
1      1    koala Australia              21
2      2 hedgehog     Italy              18
3      3    sloth      Peru              17
4      4    panda     China              10

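Note that the number of items in the vector must match the current number of rows in the DataFrame. A shorter vector is accepted only if its length divides the number of rows evenly, in which case R recycles the values; otherwise, the program throws an error: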

# Reconstructing the super_sleepers DataFrame
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')

# Attempting to add a new column avg_sleep_hours to the super_sleepers DataFrame
# with the number of items in the vector NOT EQUAL to the number of rows in the DataFrame
super_sleepers$avg_sleep_hours <- c(21, 18, 17)
print(super_sleepers)
  rating   animal   country
1      1    koala Australia
2      2 hedgehog     Italy
3      3    sloth      Peru
4      4    panda     China

Error in $<-.data.frame(*tmp*, avg_sleep_hours, value = c(21, 18, : replacement has 3 rows, data has 4
Traceback:

1. $<-(*tmp*, avg_sleep_hours, value = c(21, 18, 17))

2. $<-.data.frame(*tmp*, avg_sleep_hours, value = c(21, 18, 
 . 17))

3. stop(sprintf(ngettext(N, "replacement has %d row, data has %d", 
 .     "replacement has %d rows, data has %d"), N, nrows), domain = NA)
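If the new values come from elsewhere and their length isn't guaranteed, we may prefer to check it ourselves rather than rely on the error. The following is only a sketch of two possible guards; the variable names new_values and err_msg are made up for this illustration:

```r
super_sleepers <- data.frame(rating = 1:4,
                             animal = c('koala', 'hedgehog', 'sloth', 'panda'),
                             stringsAsFactors = FALSE)
new_values <- c(21, 18, 17)   # one value short on purpose

# Option 1: compare the length with the number of rows before assigning
if (length(new_values) == nrow(super_sleepers)) {
  super_sleepers$avg_sleep_hours <- new_values
} else {
  message('Expected ', nrow(super_sleepers), ' values, got ', length(new_values))
}

# Option 2: attempt the assignment and capture the error message instead of stopping
err_msg <- tryCatch({
  super_sleepers$avg_sleep_hours <- new_values
  NA_character_   # returned only if the assignment succeeded
}, error = function(e) conditionMessage(e))
print(err_msg)

# Either way, the DataFrame is left without the new column
print(names(super_sleepers))
```

The explicit check in Option 1 is stricter than R itself, since it also rejects shorter vectors that R would silently recycle.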

Instead of assigning a vector, we can assign a single value, whether numeric or character, for all the rows of a new column:

# Reconstructing the super_sleepers DataFrame
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')

# Adding a new column avg_sleep_hours to the super_sleepers DataFrame and assigning it to 0
super_sleepers$avg_sleep_hours <- 0
print(super_sleepers)
  rating   animal   country
1      1    koala Australia
2      2 hedgehog     Italy
3      3    sloth      Peru
4      4    panda     China

  rating   animal   country avg_sleep_hours
1      1    koala Australia               0
2      2 hedgehog     Italy               0
3      3    sloth      Peru               0
4      4    panda     China               0

In this case, the new column serves as a placeholder for the real values of the specified data type (in the above case, numeric) that we can insert later.
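A placeholder doesn't have to be 0. As a sketch of one alternative, we can use R's typed missing values, NA_real_ and NA_character_, so that unfilled rows are clearly marked as missing while the column keeps its intended type:

```r
super_sleepers <- data.frame(rating = 1:4,
                             animal = c('koala', 'hedgehog', 'sloth', 'panda'),
                             stringsAsFactors = FALSE)

# Typed missing values fix the column type while the real data is still unknown
super_sleepers$avg_sleep_hours <- NA_real_       # numeric placeholder
super_sleepers$has_tail        <- NA_character_  # character placeholder

# Later, fill in a real value for the rows we know about
super_sleepers$avg_sleep_hours[super_sleepers$animal == 'koala'] <- 21

print(super_sleepers)
print(sapply(super_sleepers, class))
```

Whether 0, 'Unknown', or NA is the better placeholder depends on how the column will be used later; NA has the advantage that it can't be mistaken for a real measurement.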

Alternatively, we can calculate a new column based on the existing ones. Let's first add the avg_sleep_hours column to our DataFrame and then calculate a new column avg_sleep_hours_per_year from it. We want to know how many hours these animals sleep on average per year:

# Reconstructing the super_sleepers DataFrame
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')

# Adding a new column avg_sleep_hours to the super_sleepers DataFrame
super_sleepers$avg_sleep_hours <- c(21, 18, 17, 10)
print(super_sleepers)
cat('\n\n')

# Adding a new column avg_sleep_hours_per_year calculated from avg_sleep_hours
super_sleepers$avg_sleep_hours_per_year <- super_sleepers$avg_sleep_hours * 365
print(super_sleepers)
  rating   animal   country
1      1    koala Australia
2      2 hedgehog     Italy
3      3    sloth      Peru
4      4    panda     China

  rating   animal   country avg_sleep_hours
1      1    koala Australia              21
2      2 hedgehog     Italy              18
3      3    sloth      Peru              17
4      4    panda     China              10

  rating   animal   country avg_sleep_hours avg_sleep_hours_per_year
1      1    koala Australia              21                     7665
2      2 hedgehog     Italy              18                     6570
3      3    sloth      Peru              17                     6205
4      4    panda     China              10                     3650

Also, it's possible to copy a column from one DataFrame to another using the following syntax: df1$new_col <- df2$existing_col. Let's replicate such a situation:

# Creating the super_sleepers_1 dataframe with the only column rating
super_sleepers_1 <- data.frame(rating=1:4)
print(super_sleepers_1)
cat('\n\n')

# Copying the animal column from super_sleepers_initial to super_sleepers_1
# Note that in the new DataFrame, the column is called ANIMAL instead of animal 
super_sleepers_1$ANIMAL <- super_sleepers_initial$animal
print(super_sleepers_1)
  rating
1      1
2      2
3      3
4      4

  rating   ANIMAL
1      1    koala
2      2 hedgehog
3      3    sloth
4      4    panda

One drawback of using the $ operator to append a column to a DataFrame is that a column name containing white spaces or special symbols can't be written directly after $. Any name that contains something other than letters (upper- or lowercase), numbers, dots, or underscores has to be wrapped in backticks. Also, the $ operator adds only one column per assignment.

Adding a Column to a DataFrame in R Using Square Brackets

Another way of adding a new column to an R DataFrame is more "DataFrame-style" rather than "list-style": by using bracket notation. Let's see how it works:

# Reconstructing the super_sleepers DataFrame
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')

# Adding a new column avg_sleep_hours to the super_sleepers DataFrame:
super_sleepers['avg_sleep_hours'] <- c(21, 18, 17, 10)
print(super_sleepers)
  rating   animal   country
1      1    koala Australia
2      2 hedgehog     Italy
3      3    sloth      Peru
4      4    panda     China

  rating   animal   country avg_sleep_hours
1      1    koala Australia              21
2      2 hedgehog     Italy              18
3      3    sloth      Peru              17
4      4    panda     China              10

In the piece of code above, we can substitute this line:

super_sleepers['avg_sleep_hours'] <- c(21, 18, 17, 10)

with this one:

super_sleepers[['avg_sleep_hours']] <- c(21, 18, 17, 10)

or with this one:

super_sleepers[,'avg_sleep_hours'] <- c(21, 18, 17, 10)

The result will be identical; these are just three versions of the same syntax.
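We can check this equivalence ourselves. The sketch below applies each bracket form to a copy of the same one-column DataFrame and compares the results with all.equal(); the names base_df, hours, with_single, with_double, and with_comma are made up for this illustration:

```r
base_df <- data.frame(rating = 1:4)
hours <- c(21, 18, 17, 10)

# Single brackets with a column name
with_single <- base_df
with_single['avg_sleep_hours'] <- hours

# Double brackets with a column name
with_double <- base_df
with_double[['avg_sleep_hours']] <- hours

# Single brackets with an empty row index and a column name
with_comma <- base_df
with_comma[, 'avg_sleep_hours'] <- hours

# Compare the three resulting DataFrames
print(isTRUE(all.equal(with_single, with_double)))
print(isTRUE(all.equal(with_single, with_comma)))
```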

As with the previous method, we can assign a single value instead of a vector to the new column:

# Reconstructing the super_sleepers DataFrame
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')

# Adding a new column avg_sleep_hours to the super_sleepers DataFrame and assigning it to 'Unknown'
super_sleepers['avg_sleep_hours'] <- 'Unknown'
print(super_sleepers)
  rating   animal   country
1      1    koala Australia
2      2 hedgehog     Italy
3      3    sloth      Peru
4      4    panda     China

  rating   animal   country avg_sleep_hours
1      1    koala Australia         Unknown
2      2 hedgehog     Italy         Unknown
3      3    sloth      Peru         Unknown
4      4    panda     China         Unknown

As an alternative, we can calculate a new column based on the existing ones:

# Reconstructing the super_sleepers DataFrame
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')

# Adding a new column avg_sleep_hours to the super_sleepers DataFrame
super_sleepers['avg_sleep_hours'] <- c(21, 18, 17, 10)
print(super_sleepers)
cat('\n\n')

# Adding a new column avg_sleep_hours_per_year calculated from avg_sleep_hours
super_sleepers['avg_sleep_hours_per_year'] <- super_sleepers['avg_sleep_hours'] * 365
print(super_sleepers)
  rating   animal   country
1      1    koala Australia
2      2 hedgehog     Italy
3      3    sloth      Peru
4      4    panda     China

  rating   animal   country avg_sleep_hours
1      1    koala Australia              21
2      2 hedgehog     Italy              18
3      3    sloth      Peru              17
4      4    panda     China              10

  rating   animal   country avg_sleep_hours avg_sleep_hours_per_year
1      1    koala Australia              21                     7665
2      2 hedgehog     Italy              18                     6570
3      3    sloth      Peru              17                     6205
4      4    panda     China              10                     3650
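One detail worth keeping in mind here: single brackets return a one-column DataFrame, while double brackets and the $ operator return a plain vector. The arithmetic works either way, but the type of the result differs. A short sketch (the names one_col_df, one_col_vec, and per_year are made up for this illustration):

```r
super_sleepers <- data.frame(animal = c('koala', 'hedgehog', 'sloth', 'panda'),
                             avg_sleep_hours = c(21, 18, 17, 10),
                             stringsAsFactors = FALSE)

one_col_df  <- super_sleepers['avg_sleep_hours']    # single brackets: a DataFrame
one_col_vec <- super_sleepers[['avg_sleep_hours']]  # double brackets: a vector

print(class(one_col_df))
print(class(one_col_vec))

# Double brackets and $ give the same vector
print(identical(one_col_vec, super_sleepers$avg_sleep_hours))

# Multiplying keeps the container type: DataFrame in, DataFrame out
print(class(one_col_df * 365))
per_year <- one_col_vec * 365
print(per_year)
```

This difference doesn't matter for bracket assignment, but it becomes important when a column is passed to a function that treats DataFrames and vectors differently.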

We can also copy a column from another DataFrame:

# Creating the super_sleepers_1 dataframe with the only column rating
super_sleepers_1 <- data.frame(rating=1:4)
print(super_sleepers_1)
cat('\n\n')

# Copying the animal column from super_sleepers_initial to super_sleepers_1
# Note that in the new DataFrame, the column is called ANIMAL instead of animal 
super_sleepers_1['ANIMAL'] <- super_sleepers_initial['animal']
print(super_sleepers_1)
  rating
1      1
2      2
3      3
4      4

  rating   ANIMAL
1      1    koala
2      2 hedgehog
3      3    sloth
4      4    panda

The advantage of using square brackets over the $ operator to append a column to a DataFrame is that we can add a column whose name contains white spaces or any special symbols.
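As a quick illustration, the sketch below adds a column whose name contains spaces and parentheses. The column name 'avg sleep hours (per day)' is made up for this example:

```r
super_sleepers <- data.frame(rating = 1:4,
                             animal = c('koala', 'hedgehog', 'sloth', 'panda'),
                             stringsAsFactors = FALSE)

# The quoted name may contain spaces and special symbols
super_sleepers['avg sleep hours (per day)'] <- c(21, 18, 17, 10)
print(names(super_sleepers))

# Reading the column back: quote the name inside brackets...
from_brackets <- super_sleepers[['avg sleep hours (per day)']]

# ...or wrap it in backticks after the $ operator
from_dollar <- super_sleepers$`avg sleep hours (per day)`

print(identical(from_brackets, from_dollar))
```

Such names are allowed, but they have to be quoted every time they are used, so names made of letters, numbers, dots, and underscores are usually more convenient.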

Adding a Column to a DataFrame in R Using the cbind() Function

The third way of adding a new column to an R DataFrame is by applying the cbind() function, which stands for "column-bind" and can also be used for combining two or more DataFrames. This function is convenient when we need to add several columns in a single call. Its basic syntax is as follows:

df <- cbind(df, new_col_1, new_col_2, ..., new_col_N)

The piece of code below adds the avg_sleep_hours column to the super_sleepers DataFrame:

# Reconstructing the super_sleepers DataFrame
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')

# Adding a new column avg_sleep_hours to the super_sleepers DataFrame
super_sleepers <- cbind(super_sleepers, 
                        avg_sleep_hours=c(21, 18, 17, 10))
print(super_sleepers)
  rating   animal   country
1      1    koala Australia
2      2 hedgehog     Italy
3      3    sloth      Peru
4      4    panda     China

  rating   animal   country avg_sleep_hours
1      1    koala Australia              21
2      2 hedgehog     Italy              18
3      3    sloth      Peru              17
4      4    panda     China              10

The next piece of code adds two new columns – avg_sleep_hours and has_tail – to the super_sleepers DataFrame at once:

# Reconstructing the super_sleepers DataFrame
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')

# Adding two new columns avg_sleep_hours and has_tail to the super_sleepers DataFrame
super_sleepers <- cbind(super_sleepers, 
                        avg_sleep_hours=c(21, 18, 17, 10), 
                        has_tail=c('no', 'yes', 'yes', 'yes'))
print(super_sleepers)
  rating   animal   country
1      1    koala Australia
2      2 hedgehog     Italy
3      3    sloth      Peru
4      4    panda     China

  rating   animal   country avg_sleep_hours has_tail
1      1    koala Australia              21       no
2      2 hedgehog     Italy              18      yes
3      3    sloth      Peru              17      yes
4      4    panda     China              10      yes
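Since cbind() can also combine DataFrames, the new columns may arrive already packed into a DataFrame of their own. Here is a sketch under that assumption; extra_info and combined are names made up for this illustration:

```r
super_sleepers <- data.frame(rating = 1:4,
                             animal = c('koala', 'hedgehog', 'sloth', 'panda'),
                             stringsAsFactors = FALSE)

# The columns to add, stored in a separate DataFrame with the same number of rows
extra_info <- data.frame(avg_sleep_hours = c(21, 18, 17, 10),
                         has_tail = c('no', 'yes', 'yes', 'yes'),
                         stringsAsFactors = FALSE)

# cbind() glues the columns together row by row, in the given order
combined <- cbind(super_sleepers, extra_info)
print(combined)
```

Note that cbind() matches rows purely by position, not by any key column, so both DataFrames must list the animals in the same order.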

Apart from adding multiple columns at once, another advantage of using the cbind() function is that it allows assigning the result of this operation (i.e., adding one or more columns to an R DataFrame) to a new DataFrame leaving the initial one unchanged:

# Reconstructing the super_sleepers DataFrame
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')

# Creating a new DataFrame super_sleepers_new based on super_sleepers with two new columns avg_sleep_hours and has_tail
super_sleepers_new <- cbind(super_sleepers,
                            avg_sleep_hours=c(21, 18, 17, 10),
                            has_tail=c('no', 'yes', 'yes', 'yes'))
print(super_sleepers_new)
cat('\n\n')
print(super_sleepers)
  rating   animal   country
1      1    koala Australia
2      2 hedgehog     Italy
3      3    sloth      Peru
4      4    panda     China

  rating   animal   country avg_sleep_hours has_tail
1      1    koala Australia              21       no
2      2 hedgehog     Italy              18      yes
3      3    sloth      Peru              17      yes
4      4    panda     China              10      yes

  rating   animal   country
1      1    koala Australia
2      2 hedgehog     Italy
3      3    sloth      Peru
4      4    panda     China

As with the previous two approaches, inside the cbind() function, we can assign a single value to the whole new column:

# Reconstructing the super_sleepers DataFrame
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')

# Adding a new column avg_sleep_hours to the super_sleepers DataFrame and assigning it to 0.999
super_sleepers <- cbind(super_sleepers, 
                        avg_sleep_hours=0.999)
print(super_sleepers)
  rating   animal   country
1      1    koala Australia
2      2 hedgehog     Italy
3      3    sloth      Peru
4      4    panda     China

  rating   animal   country avg_sleep_hours
1      1    koala Australia           0.999
2      2 hedgehog     Italy           0.999
3      3    sloth      Peru           0.999
4      4    panda     China           0.999

We can also calculate the new column based on the existing ones:

# Reconstructing the super_sleepers DataFrame
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')

# Adding a new column avg_sleep_hours to the super_sleepers DataFrame
super_sleepers <- cbind(super_sleepers, 
                        avg_sleep_hours=c(21, 18, 17, 10))
print(super_sleepers)
cat('\n\n')

# Adding a new column avg_sleep_hours_per_year calculated from avg_sleep_hours
super_sleepers <- cbind(super_sleepers, 
                        avg_sleep_hours_per_year=super_sleepers$avg_sleep_hours * 365)
print(super_sleepers)
  rating   animal   country
1      1    koala Australia
2      2 hedgehog     Italy
3      3    sloth      Peru
4      4    panda     China

  rating   animal   country avg_sleep_hours
1      1    koala Australia              21
2      2 hedgehog     Italy              18
3      3    sloth      Peru              17
4      4    panda     China              10

  rating   animal   country avg_sleep_hours avg_sleep_hours_per_year
1      1    koala Australia              21                     7665
2      2 hedgehog     Italy              18                     6570
3      3    sloth      Peru              17                     6205
4      4    panda     China              10                     3650

We can also copy a column from another DataFrame:

# Creating the super_sleepers_1 DataFrame with the only column rating
super_sleepers_1 <- data.frame(rating=1:4)
print(super_sleepers_1)
cat('\n\n')

# Copying the animal column from super_sleepers_initial to super_sleepers_1
# Note that in the new DataFrame, the column is still called animal despite setting the new name ANIMAL 
super_sleepers_1 <- cbind(super_sleepers_1, 
                          ANIMAL=super_sleepers_initial['animal'])
print(super_sleepers_1)
  rating
1      1
2      2
3      3
4      4

  rating   animal
1      1    koala
2      2 hedgehog
3      3    sloth
4      4    panda

However, unlike with the $ operator and square brackets, there are two nuances to keep in mind here:

  1. We can't create a new column and calculate one more column based on the new one inside the same cbind() function, because all the arguments are evaluated against the DataFrame as it was before the call. For example, the piece of code below will throw an error.
# Reconstructing the super_sleepers DataFrame
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')

# Attempting to add a new column avg_sleep_hours to the super_sleepers DataFrame 
# AND another new column avg_sleep_hours_per_year based on it
super_sleepers <- cbind(super_sleepers, 
                        avg_sleep_hours=c(21, 18, 17, 10), 
                        avg_sleep_hours_per_year=super_sleepers['avg_sleep_hours'] * 365)
print(super_sleepers)
  rating   animal   country
1      1    koala Australia
2      2 hedgehog     Italy
3      3    sloth      Peru
4      4    panda     China

Error in [.data.frame(super_sleepers, "avg_sleep_hours"): undefined columns selected
Traceback:

1. cbind(super_sleepers, avg_sleep_hours = c(21, 18, 17, 10), avg_sleep_hours_per_year = super_sleepers["avg_sleep_hours"] * 
 .     365)

2. super_sleepers["avg_sleep_hours"]

3. [.data.frame(super_sleepers, "avg_sleep_hours")

4. stop("undefined columns selected")
  2. When we copy a column from another DataFrame and try to give it a new name inside the cbind() function, this new name will be ignored, and the new column will be called exactly as it was called in the original DataFrame. For example, in the piece of code below, the new name ANIMAL was ignored, and the new column was called animal, just as in the DataFrame from which it was copied:
# Creating the super_sleepers_1 DataFrame with the only column rating
super_sleepers_1 <- data.frame(rating=1:4)
print(super_sleepers_1)
cat('\n\n')

# Copying the animal column from super_sleepers_initial to super_sleepers_1
# Note that in the new DataFrame, the column is still called animal despite setting the new name ANIMAL 
super_sleepers_1 <- cbind(super_sleepers_1, 
                          ANIMAL=super_sleepers_initial['animal'])
print(super_sleepers_1)
  rating
1      1
2      2
3      3
4      4

  rating   animal
1      1    koala
2      2 hedgehog
3      3    sloth
4      4    panda
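Both nuances can be worked around. The following is a sketch of one possible approach, not the only one: for the first nuance, we store the values in a standalone vector before calling cbind(), and for the second, we pass the column as a vector (using $ or double brackets) rather than as a one-column DataFrame, so that the new name is respected:

```r
super_sleepers_initial <- data.frame(rating = 1:4,
                                     animal = c('koala', 'hedgehog', 'sloth', 'panda'),
                                     country = c('Australia', 'Italy', 'Peru', 'China'),
                                     stringsAsFactors = FALSE)

# Nuance 1: keep the new values in a vector and reuse that vector in the same call
avg_sleep_hours <- c(21, 18, 17, 10)
super_sleepers <- cbind(super_sleepers_initial,
                        avg_sleep_hours = avg_sleep_hours,
                        avg_sleep_hours_per_year = avg_sleep_hours * 365)
print(super_sleepers)

# Nuance 2: pass a vector instead of a one-column DataFrame to keep the name ANIMAL
super_sleepers_1 <- data.frame(rating = 1:4)
super_sleepers_1 <- cbind(super_sleepers_1,
                          ANIMAL = super_sleepers_initial$animal)
print(names(super_sleepers_1))
```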

Conclusion

In this tutorial, we discussed the various reasons why we may need to add a new column to an R DataFrame and what kind of information it can store. Then, we explored three different ways of doing so: using the $ symbol, square brackets, and the cbind() function. We considered the syntax of each of those approaches and its possible variations, the pros and cons of each method, possible additional functionalities, the most common pitfalls and errors, and how to avoid them. Also, we learned how to add multiple columns to an R DataFrame at once.

It's worth noting that the discussed approaches are not the only ways to add a column to a DataFrame in R. For example, for the same purpose, we can use the mutate() or add_column() functions. However, to apply these functions, we need to install and load additional R packages (dplyr and tibble, respectively), and they aren't required for the basic operation discussed in this tutorial. By contrast, the $ symbol, square brackets, and the cbind() function are all available in base R without installing anything.

Elena Kosourova

Elena is a petroleum geologist and community manager at Dataquest. You can find her chatting online with data enthusiasts and writing tutorials on data science topics. Find her on LinkedIn.
