March 15, 2018

# R Fundamentals: Building a Simple Grade Calculator

R is one of the most popular languages for statistical analysis, data science, and reporting. At Dataquest, we have been adding R courses (you can learn more in our recent update). For a comparison of R and Python, check out our analysis here. In this tutorial, we'll teach you the basics of R by building a simple grade calculator. While we do not assume any R-specific knowledge, you should be familiar with general programming concepts. You'll learn how to:

• Make calculations
• Use specific functions to answer questions

This tutorial is based on part of our newly released introductory R course. The course is entirely free and includes a certificate of completion. Go here to start the course.

Let's say you're a high school senior and want to calculate your grade point average (GPA). A GPA represents the average value of the accumulated final scores earned in all your classes. You are taking seven classes, with exams, homework, and projects all equally weighted. We'll assume that the GPA is measured on a 0-100 scale. In your math class, you've scored a `92` on exams, `87` on homework, and `85` on projects. To calculate the average math score, we could write the following:

``````
# Math
(92 + 87 + 85)/3``````

We could calculate an average like this by hand. However, if we had to calculate the averages for a thousand students, hand calculations wouldn't be an effective use of our time. Instead, we'll use programming to ask a computer to carry out the calculations.

## Performing calculations

We'll start by using R as a basic calculator. We previously wrote the following to calculate the final grade for math class:

``(92 + 87 + 85)/3``

This entire line of code is called an expression. We write expressions in a text file called a script. A script is a set of instructions we're giving the computer. After writing our expressions in a script, the interpreter will run the code and display the results of the expression in a new window. Let's run the `print()` statement with our math score as an expression in between the `()`:

``print((92 + 87 + 85)/3)``

Running the expression, the interpreter will output the following value in a new window:

``[1] 88``

Note: The `[1]` will make sense when you dive deeper into vectors, but you don't need to understand it for this tutorial.

In our expression, both the calculation and the `print()` statement have pairs of matching parentheses. To make this clearer, here's the same calculation split across two lines:

``````
print(
  (92 + 87 + 85)/3)``````

Running this expression will produce the same result as writing everything on one line. Every starting parenthesis needs a closing parenthesis. Let's try removing the closing parenthesis:

``print((92 + 87 + 85)/3``

If there is a mistake in your code, the interpreter will tell you there's an error, and what that error is. In our case, the interpreter returns: `Error in parse(text = x, srcfile = src): <text>:7:0: unexpected end of input`.

The text `unexpected end of input` means that the input to the R interpreter (our code) ended before the expression was complete; here, it was missing a closing parenthesis `)`. You can try playing around with the above expression and seeing what other kinds of errors you get.
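If you'd like to look at the parser's message without stopping a script, one option is to hand the broken code to `parse()` as text and catch the error. This is only a sketch: the names `broken_code` and `message_text` are made up for illustration, and the position reported in the message (the `<text>:7:0` part above) depends on where the code sits.

```r
# The same expression as above, with the closing parenthesis missing.
broken_code <- "print((92 + 87 + 85)/3"

# parse() only checks the syntax; tryCatch() turns the error into a string.
message_text <- tryCatch(
  {
    parse(text = broken_code)
    "no syntax error"
  },
  error = function(e) conditionMessage(e)
)

print(message_text)
```

Adding the missing `)` to `broken_code` should make the same snippet report `no syntax error` instead.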

## Performing multiple calculations

Now that we've seen our results using the `print()` statement, let's dive deeper into how the R interpreter runs our code. It:

1. Scans and looks for syntax errors.
2. Interprets and runs each line of code, from top to bottom.
3. Exits when the last line of code is run.

We've written one expression calculating your final score in math class. To understand the sequential way R code is interpreted, let's also add an expression calculating your chemistry score. In Chemistry, your scores were `90`, `81`, and `92`. What happens if we run both calculations on separate lines?

``````print((92 + 87 + 85)/3)
print((90 + 81 + 92)/3)``````

Running this code, the R interpreter will display:

``````[1] 88
[1] 87.66667``````

Does R always display two lines if we write two lines of code? What if we break up our code into multiple lines?

``````
print(
  (92 + 87 + 85)/3)
print(
  (90 + 81 + 92)/3)
``````

The R interpreter will still display the same values:

``````[1] 88
[1] 87.66667``````

Notice how R interprets our code: each `print()` statement corresponds to its own line in the result.

If we wanted to calculate the average scores for writing and art, we can write these expressions on each subsequent line:

• Writing: `84`, `95`, `79`
• Art: `95`, `86`, `93`
``````
print((92 + 87 + 85)/3) # Math
print((90 + 81 + 92)/3) # Chemistry
print((84 + 95 + 79)/3) # Writing
print((95 + 86 + 93)/3) # Art``````

Running these expressions would display the following results:

``````[1] 88
[1] 87.66667
[1] 86
[1] 91.33333``````

## Performing calculations using arithmetic operators

`+` and `/` are called arithmetic operators. Arithmetic operators are used to carry out mathematical operations. The most common ones are addition (`+`), subtraction (`-`), multiplication (`*`), division (`/`), and exponentiation (`**` or `^`).

For those who are unfamiliar with exponentiation, it is a way of multiplying a number by itself a specific number of times, using the `**` or `^` operator. If we wanted to multiply the value 4 by itself 3 times, this would look like the following using the multiplication `*` operator:

``4 * 4 * 4``

While multiplying 4 by itself three times using the multiplication operator isn't too cumbersome, if we wanted to multiply the value 4 by itself 20 times, using the multiplication operator isn't the most efficient method. Instead, we can express the calculation as an exponent:

``4**20``

Running `4**20` will return:

``[1] 1.099512e+12``
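As a quick reference, here is a small sketch that applies each arithmetic operator to the same pair of numbers. The values `4` and `3` and the variable name `same_result` are chosen only for illustration.

```r
# Each line uses one arithmetic operator on the same two numbers.
print(4 + 3)   # addition
print(4 - 3)   # subtraction
print(4 * 3)   # multiplication
print(4 / 3)   # division
print(4 ^ 3)   # exponentiation, the same as 4 * 4 * 4
print(4 ** 3)  # ** is accepted as another spelling of ^

# Both exponent spellings give the same value.
same_result <- (4 ** 20) == (4 ^ 20)
print(same_result)
```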

Now that we understand arithmetic operators, let's calculate the final scores for our last three classes: history, music and physical education:

• history: 77, 85, 90
• music: 92, 90, 91
• physical education: 85, 88, 95
``````
print((77 + 85 + 90)/3) # History
print((92 + 90 + 91)/3) # Music
print((85 + 88 + 95)/3) # Physical Education``````

The interpreter then would display:

``````
[1] 84
[1] 91
[1] 89.33333``````

## Performing calculations with order of operations

Now that we've learned how to use arithmetic operators to calculate the average scores for each class, let's return to our average calculation for math:

``````
print((92 + 87 + 85)/3)``````

What if we deleted the parentheses surrounding `92 + 87 + 85`?

``````
print(92 + 87 + 85/3)``````

This will display:

``[1] 207.3333``

By deleting the parentheses surrounding `92 + 87 + 85`, we make the R interpreter perform a different calculation. When using multiple operators, there are rules that determine the order in which calculations are performed. A simple way to control the order of your calculations is to put parentheses around the calculation you want performed first. This is useful for a more complex calculation like this:

``````
print((92 + 87 + 85 + 67 + 92 + 84)/6 - (77 + 90 + 98)/3)``````

In this scenario, we've put parentheses around `92 + 87 + 85 + 67 + 92 + 84` and `77 + 90 + 98`. We're telling the interpreter to execute the addition operators before executing the division. The R interpreter follows the order of operations rules in mathematics. An easy way to remember this is PEMDAS:

• Parentheses
• Exponent
• Multiplication or Division
• Addition or Subtraction
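The effect of these rules can be checked directly. The sketch below compares the math calculation with and without parentheses; the variable names are made up for illustration.

```r
# Without parentheses, the division runs before the additions.
no_parentheses <- 92 + 87 + 85/3

# Writing out the grouping R applies on its own gives the same value.
implied_grouping <- 92 + 87 + (85/3)

# Parentheses force the additions to happen first.
with_parentheses <- (92 + 87 + 85)/3

# Exponents are evaluated before multiplication: 2 * (3 ^ 2).
exponent_first <- 2 * 3 ^ 2

print(no_parentheses == implied_grouping)
print(with_parentheses)
print(exponent_first)
```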

Let's take a look at an example without the parentheses. For `92 + 87 + 85/3`, based on PEMDAS, the R interpreter will calculate the division first: `85/3` is evaluated to about `28.33333`, and only then are `92`, `87`, and that quotient added together, giving `207.3333`.

Now, let's re-add the parentheses to our expression. For `(92 + 87 + 85)/3`, the R interpreter will calculate the expression in a different sequence: the addition inside the parentheses is evaluated first, giving `264`, and that sum is then divided by `3`, giving `88`.

Here are the final scores for each class:

• math: 88
• chemistry: 87.66667
• writing: 86
• art: 91.33333
• history: 84
• music: 91
• physical_education: 89.33333

Let's calculate the overall average while keeping PEMDAS in mind. After calculating the overall average, in the same expression, subtract this overall average from the math score:

``print(88 - ((88 + 87.66667 + 86 + 91.33333 + 84 + 91 + 89.33333)/7))``
``[1] -0.1904757``

In the previous exercises, we made multiple calculations using operators. Later on, when we're writing hundreds of lines of code, it's good programming practice to organize our code. We can organize our code by inserting comments. Comments are notes that help people — including yourself — understand the code. The R interpreter recognizes comments, treats them as plain text and will not attempt to execute them. There are two main types of comments we can add to our code:

• inline comment
• single-line comment

Inline comment

An inline comment is useful whenever we want to annotate, or add more detail to, a specific statement. To add an inline comment at the end of a statement, start with the hash character (`#`) and then add the comment:

``print((92 + 87 + 85)/3) # Finding the math score``

While we don't need to add a space after the hash character (`#`), this is considered good style and makes our comments cleaner and easier to read.

Single-line comment

A single-line comment spans the full line and is useful when we want to separate our code into sections. To specify that we want a line of text to be treated as a comment, start the line with the hash character (`#`):

``````# Here, we're finding the average of our scores. Then, subtracting this average from the math score.
print(88 - ((88 + 87.66667 + 86 + 91.33333 + 84 + 91 + 89.33333)/7) )``````

``````# Adding some comments.
print(88 - ((88 + 87.66667 + 86 + 91.33333 + 84 + 91 + 89.33333)/7)) # Adding more comments.``````

## Assigning values to a variable

Using R to make simple calculations is useful. However, a more robust approach would be to store these values for later use. This process of storing values is called variable assignment. A variable in R is like a named storage unit that can hold values. The process of assigning a variable requires two steps:

1. Naming the variable.
2. Assigning the value to the name using `<-`.

When naming a variable, there are a few rules you must follow:

• A variable name consists of letters, numbers, dots, or underscores.
• We can begin a variable name with a letter or a dot. If it begins with a dot, the dot cannot be followed by a number.
• We cannot begin a variable name with a number or an underscore.
• No other special characters are allowed.
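One way to try these rules out is with R's built-in `make.names()` function, which rewrites a string into a valid name. A string that comes back unchanged was already valid. This is only a sketch: the candidate names below are made up for illustration, and the check also flags reserved words such as `if`, which cannot be used as variable names either.

```r
# Candidate names, made up for illustration.
candidates <- c("math", "math_score", ".math", "math2",
                "2math", "_math", ".2math", "math-score")

# make.names() repairs invalid names, so an unchanged string
# means the name already followed the rules.
is_valid <- candidates == make.names(candidates)

print(data.frame(name = candidates, valid = is_valid))
```

The first four candidates should be reported as valid and the last four as invalid.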

For example, `math`, `math_score`, `.math`, and `math2` are allowed as variable names, while `2math`, `_math`, and `.2math` are not.

Let's return to our math score calculation, `(92 + 87 + 85)/3`. The result of this calculation is `88`. To store `88` in a variable called `math`, let's write the following expression:

``math <- 88``

And then if we tried to `print()` math, like this:

``print(math)``

This would display: `[1] 88`

Variables can hold more than a value we've already calculated. We can also assign the value of an expression directly:

``math <- (92 + 87 + 85)/3``

And then if we tried to print math, like this:

``print(math)``

This would display the same result as our original calculation: `[1] 88`

We've stored our math grade in a variable. As a reminder, here are the classes and grades:

• chemistry: 87.66667
• writing: 86
• art: 91.33333
• history: 84
• music: 91
• physical_education: 89.33333

Let's store our other scores in variables.

``````
math <- 88
chemistry <- 87.66667
writing <-  86
art <- 91.33333
history <- 84
music <- 91
physical_education <- 89.33333
``````

## Performing calculations using variables

Now that we've stored our grades for each class in a variable, we can use these variables to find the grade point average. Let's look at our math and chemistry scores:

``````math <- 88
chemistry <- 87.66667``````

When performing a calculation, variables and values are treated the same. Using our `math` and `chemistry` variables, `88 + 87.66667` is the same as `math + chemistry`. When performing calculations using variables, the PEMDAS rule still applies. If we wanted to see how much better you did in math than in chemistry, we can use the subtraction `-` arithmetic operator to find the difference:

``````math <- 88
chemistry <- 87.66667
print(math - chemistry)``````

This displays:

``[1] 0.33333``

If we wanted to find the average score between math and chemistry, we can use the `+`,`/`,`()` operators on the two variables:

``(math + chemistry)/2``

This displays:

``[1] 87.83334``

After we make these calculations, we can also store the result of these expressions in a variable. If we wanted to store the average of math and chemistry in a variable called `average`, it would look like this:

``````
average <- (math + chemistry)/2``````

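Displaying the average would return the same value, `87.83334`.

Now let's use all seven of your final scores to calculate your grade point average and store it in a variable called `gpa`: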

• `math <- 88`
• `chemistry <- 87.66667`
• `writing <- 86`
• `art <- 91.33333`
• `history <- 84`
• `music <- 91`
• `physical_education <- 89.33333`
``````
## Classes
math <- 88
chemistry <- 87.66667
writing <-  86
art <- 91.33333
history <- 84
music <- 91
physical_education <- 89.33333
## Calculation
gpa <- (math + chemistry + writing + art + history + music + physical_education)/7``````

Then, let's subtract your `gpa` from history to see if history is below the average. Store this difference in `history_difference`.

``history_difference <- history - gpa``
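To check the result, we could print both values. The sketch below repeats the assignments from above so that it runs on its own; the final comparison is an extra step added here for illustration.

```r
# Final scores for each class, as stored above.
math <- 88
chemistry <- 87.66667
writing <- 86
art <- 91.33333
history <- 84
music <- 91
physical_education <- 89.33333

# Grade point average across the seven classes.
gpa <- (math + chemistry + writing + art + history + music + physical_education)/7

# How far the history score sits from the average.
history_difference <- history - gpa

print(gpa)                 # roughly 88.19
print(history_difference)  # roughly -4.19

# A negative difference means history is below the average.
print(history_difference < 0)
```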

## Creating vectors

From our previous example, calculating your grade point average using variables is useful. However, in data science, we often work with thousands of data points. If you had the score of each individual homework assignment, exam, or project for each class, our data set would get large.

Returning to our math and chemistry example, the current variables are `math <- 88` and `chemistry <- 87.66667`. Rather than store these two values in two variables, we need a storage unit that can store multiple values. In R, we can use a vector to store these values. A vector is a storage container that can store a sequence of values. We can then name a vector using a variable.

To create a vector, you'll be using `c()`. In R, `c()` is known as a function. Similar to the `print()` statement, the `c()` function takes in inputs; it accepts multiple inputs and stores these values in one place. The `c()` function doesn't perform any arithmetic operation on the values; it just stores those values. You can read more about the `c()` function here. Here are the steps to creating a vector:

1. Identify the values you want to store in a vector and place these values within the `c()` function. Separate these values using a comma (`,`).
2. Assign the vector to a name of your choice using `<-`.

Let's create a vector that contains your math and chemistry scores. The math score was `88` and the chemistry score was `87.66667`.

``math_chemistry <- c(88,87.66667)``

We could also create the vector using your variable names as well:

``math_chemistry <- c(math,chemistry)``

If we were to `print(math_chemistry)`, it would look like this:

``[1] 88.00000 87.66667``

On the other hand, if we tried to store a sequence of values, like this:

``math_chemistry <- 88, 87.66667``

The R interpreter will only try to assign 88 to `math_chemistry` but will not be able to interpret the comma after 88: `Error: unexpected ',' in "math_chemistry <- 88,"`
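Because `c()` simply combines its inputs, it can also take an existing vector and add more values to it. Here is a minimal sketch; the name `three_scores` is made up for illustration.

```r
math <- 88
chemistry <- 87.66667
writing <- 86

# Combine two single values into one vector.
math_chemistry <- c(math, chemistry)

# c() also accepts a vector as an input and appends further values.
three_scores <- c(math_chemistry, writing)

print(three_scores)
print(length(three_scores))  # number of values stored
```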

Let's store our final scores in a vector using the following variables:

``````
math <- 88
chemistry <- 87.66667
writing <-  86
art <- 91.33333
history <- 84
music <- 91
physical_education <- 89.33333``````
``final_scores <- c(math, chemistry, writing, art, history, music, physical_education)``

## Calculating the mean

Now that we've stored your grades in a vector, we can calculate the grade point average. In a previous exercise, you used an arithmetic operator to calculate your grade point average:

``(88 + 87.66667 + 86 + 91.33333 + 84 + 91 + 89.33333)/7``

While this solution works, it isn't scalable. Now that you've created a vector, we have an easier way of calculating the grade point average: the `mean()` function. The `mean()` function takes an input (the vector) and calculates the average of that input. The interpreter will then display the result. Let's apply the `mean()` function to our `math_chemistry` vector:

``````math_chemistry <- c(88,87.66667)
mean(math_chemistry)``````

This would return:

``[1] 87.83334``

We can then store the result of `mean(math_chemistry)` in a variable for later use:

``average_score <- mean(math_chemistry)``

Let's apply the `mean()` function on your final grades vector!

``````## Vector of Final Scores
final_scores <- c(math, chemistry, writing, art, history, music, physical_education)
## Calculating the mean
gpa <- mean(final_scores)``````
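To confirm that `mean()` matches the calculation we wrote out by hand, we can compare the two. This sketch types the scores in directly so that it runs on its own. The name `by_hand` is made up for illustration, and `all.equal()` is used because decimal values can differ by tiny rounding amounts.

```r
# Vector of final scores
final_scores <- c(88, 87.66667, 86, 91.33333, 84, 91, 89.33333)

# The average written out with arithmetic operators.
by_hand <- (88 + 87.66667 + 86 + 91.33333 + 84 + 91 + 89.33333)/7

# The same average using mean().
gpa <- mean(final_scores)

print(gpa)
print(isTRUE(all.equal(gpa, by_hand)))
```

If another class were added, only the vector would need to change; the `mean()` call would stay the same.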

## Performing operations on vectors

Previously, you calculated your grade point average using the `mean()` function and a vector. In data science, there are always multiple questions you can answer with your data. Let's dig deeper into our `final_scores` vector and ask it a few more questions:

• What was the highest score?
• What was the lowest score?
• How many classes did you take?

To answer these questions, you'll need a few more functions:

• `min()`: Finds the smallest value within the vector
• `max()`: Finds the largest value within the vector
• `length()`: Finds the total number of values the vector holds
• `sum()`: Takes the sum of all the values in the vector (Note: Will not be used in this tutorial.)

You can apply these functions similar to how you applied the `mean()` function. To find the max score in our `math_chemistry` vector, we'll apply the `max()` function on this vector:

``````math_chemistry <- c(88,87.66667)
max(math_chemistry)``````

This displays: `[1] 88`

• Which class did you score highest in? Use `max()`.
• Which class did you score lowest in? Use `min()`.
• How many classes did you take? Use `length()`.
``````
final_scores <- c(math, chemistry, writing, art, history, music, physical_education)
## Highest Score
highest_score <- max(final_scores)
print(highest_score)
## Lowest Score
lowest_score <- min(final_scores)
print(lowest_score)
## Number of Classes
num_classes <- length(final_scores)
print(num_classes)``````
``````
[1] 91.33333
[1] 84
[1] 7``````
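`max()` and `min()` return the scores themselves, not the class names. One possible way to get the class names is to give each value in the vector a name and use `which.max()` and `which.min()`. Named vectors and these two functions go beyond what this tutorial covers, so treat this as a sketch; the names `highest_class` and `lowest_class` are made up for illustration.

```r
# A named vector: each score is labeled with its class.
final_scores <- c(
  math = 88,
  chemistry = 87.66667,
  writing = 86,
  art = 91.33333,
  history = 84,
  music = 91,
  physical_education = 89.33333
)

# which.max() and which.min() return the position of the largest and
# smallest value; names() reads off the label at that position.
highest_class <- names(which.max(final_scores))
lowest_class <- names(which.min(final_scores))

print(highest_class)  # "art"
print(lowest_class)   # "history"
```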

## Next steps

If you'd like to learn more, this tutorial is based on our R Fundamentals course, which is part of our Data Analyst in R track. Building upon the concepts in this tutorial, you'll learn:

• More complex ways to manipulate a vector:
    • Indexing into a vector
    • Filtering out different values in a vector
    • Different behaviors of a vector
• Make university recommendations using matrices:
    • Slicing and re-organizing a matrix
    • Sorting a matrix
• Analyze college graduate data using a dataframe:
    • The different data types that go into a dataframe
    • Selecting and subsetting specific values in a dataframe
    • Adding conditions into dataframe selections
• Using lists to store a variety of values:
    • Indexing into a list
    • Adding and subtracting values from a list
    • Merging lists