R Programming

R Programming Cheat Sheet

This cheat sheet provides a quick reference for essential R programming commands, helping you perform data manipulation, visualization, and statistical analysis with confidence. It covers foundational topics like installing packages and understanding R's data structures, alongside advanced tasks such as building models and applying machine learning techniques.

Each section includes concise syntax and practical examples to illustrate how R commands are used in real-world scenarios. You'll find guidance on working with vectors, lists, matrices, and data frames, performing common data wrangling tasks like filtering and summarizing, and creating visualizations such as histograms, bar plots, and boxplots. The cheat sheet also highlights R's capabilities for statistical analysis with commands like mean, lm, and cor.

Designed for clarity and accessibility, this resource is ideal for data analysts, statisticians, and programmers seeking to enhance their workflows in R. Whether you're exploring data, developing algorithms, or building reproducible reports, this cheat sheet ensures you can quickly apply R's powerful tools to your projects.


Table of Contents

Basics

INSTALL.PACKAGES, LIBRARY, ASSIGNMENT (<-), PRINT, CLASS

Data Structures

C, LIST, MATRIX, DATA.FRAME, DF$A, DF

Data Manipulation

FILTER, SELECT, MUTATE, SUMMARIZE, ARRANGE

Data Visualization

PLOT, BARPLOT, HIST, BOXPLOT

Statistics & Probability

MEAN, MEDIAN, SD, COR, LM

Programming

IF, FOR, WHILE, FUNCTION, APPLY

Machine Learning

MATRICES, LINEAR MODEL, VISUALIZE, RESIDUALS

File IO

READ.CSV, WRITE.CSV, READRDS, SAVERDS, LIST.FILES

Basics

    Syntax for

    How to use

    Explained

    Install Package

    install.packages("dplyr")

    Installs the dplyr package.

    Load Package

    library(dplyr)

    Loads the dplyr package into the current R session.

    Assignment

    x <- 5

    Assigns value 5 to the variable x.

    Print Output

    print(x)

    Prints the value of x to the console.

    Literals and Data Types

    TRUE, 125L, 12.5, "Hello"

    Examples of logical, integer, numeric, and character literals in R. The L suffix marks an integer; a bare 125 is stored as numeric (double).
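
The class() function listed in the table of contents is a quick way to check how R stores a literal. The sketch below uses only base R; the variable name literal_classes is illustrative and not part of the original cheat sheet.

```r
# Collect the class of each literal into a character vector.
literal_classes <- sapply(list(TRUE, 125L, 125, 12.5, "Hello"), class)
print(literal_classes)
# "logical" "integer" "numeric" "numeric" "character"

# The L suffix is what separates an integer from a double.
is.integer(125L)  # TRUE
is.integer(125)   # FALSE
```

Most arithmetic works the same on integers and doubles, so the distinction mainly matters when a function checks types explicitly.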

    Extracting Numbers from Strings

    library(dplyr)
    library(readr)
    data_frame <- mutate(data_frame, column = parse_number(column))

    Uses parse_number to extract numeric values from string columns.

    Basic String Indexing

    str_sub("Dataquest is awesome", 1, 9)

    Extracts “Dataquest” as a substring by specifying start and end positions (str_sub comes from the stringr package).

    Data Structures

      Syntax for

      How to use

      Explained

      Create Vector

      c(1, 2, 3)

      Combines elements into a vector.

      Create List

      list(a=1, b="two")

      Creates a list with named elements.

      Create Matrix

      matrix(1:6, nrow=2)

      Creates a matrix with 2 rows and 3 columns.

      Create Data Frame

      data.frame(a=1:3, b=4:6)

      Creates a data frame with columns a and b.

      Access Element

      df$a
      df[1, 1]

      Two alternative ways to access data: df$a returns column a by name, and df[1, 1] returns the element in row 1, column 1.
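
As a small illustration of element access across the structures above, the following base R sketch builds each one and pulls out a value. The object names (df, lst, m) are placeholders chosen for this example.

```r
# Data frame: access by column name or by [row, column] position.
df <- data.frame(a = 1:3, b = 4:6)
col_a <- df$a          # whole column a: 1 2 3
first_cell <- df[1, 1] # row 1, column 1: 1
b_row2 <- df[2, "b"]   # row 2 of column b: 5

# List: $name or [[name]] returns the element itself.
lst <- list(a = 1, b = "two")
lst_b <- lst$b         # "two"
lst_a <- lst[["a"]]    # 1

# Matrix: filled column by column, indexed as [row, column].
m <- matrix(1:6, nrow = 2)
m_23 <- m[2, 3]        # 6
dim(m)                 # 2 3
```

Single brackets on a list (lst["a"]) return a one-element list rather than the value, which is a common source of confusion.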

      Loading stringr package

      library(stringr)

      Loads the stringr library to work with strings in R.

      Opening a JSON File

      library(jsonlite)
      f <- fromJSON('filename.json')

      Reads a JSON file into an R object using the jsonlite package; JSON shaped as an array of records is returned as a data frame.

      Creating a List

      new_list <- list("data scientist", c(50000,40000), "programming experience")

      Defines a list containing diverse data types.

      Data Manipulation

        Syntax for

        How to use

        Explained

        Filter Rows

        filter(df, a > 2)

        Filters rows where column a is greater than 2.

        Select Columns

        select(df, a, b)

        Selects specific columns by name.

        Mutate Columns

        mutate(df, c = a + b)

        Adds a new column c as sum of a and b.

        Summarize Data

        summarize(df, avg=mean(a))

        Calculates mean of column a and returns as avg.

        Arrange Rows

        arrange(df, desc(a))

        Sorts rows by column a in descending order.
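
To make the behaviour of the five verbs concrete without requiring any packages, here is a sketch of rough base R stand-ins. These are not dplyr itself, and the sample data frame is invented for illustration.

```r
# Hypothetical sample data.
df <- data.frame(a = 1:4, b = c(10, 20, 30, 40))

filtered <- df[df$a > 2, ]           # like filter(df, a > 2)
selected <- df[, c("a", "b")]        # like select(df, a, b)
mutated  <- transform(df, c = a + b) # like mutate(df, c = a + b)
avg      <- mean(df$a)               # like summarize(df, avg = mean(a)): 2.5
arranged <- df[order(-df$a), ]       # like arrange(df, desc(a))

print(mutated)
```

The dplyr versions read more naturally in a pipeline and return tibbles or data frames consistently, whereas mean(df$a) here returns a bare number.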

        Importing Data

        data <- read_csv("name_of_file_with_data.csv")

        Imports dataset into R using the read_csv function from readr.

        Summing Values Across Rows

        df %>% mutate(new_column_name = rowSums(.[1:3]))

        Sums specified columns for each row and adds as a new column.

        Summing Values Across Columns

        df %>% bind_rows(summarise(., across(everything(), sum)))

        Sums every column and appends the totals as a new row (assumes all columns are numeric).

        Importing CSV files

        dataframe <- read_csv("name_of_the_dataset.csv")

        Read CSV files into R using readr's read_csv() for efficient data import.

        Data Visualization

          Syntax for

          How to use

          Explained

          Creating a Basic Plot

          data %>% ggplot()

          Initialize a basic ggplot2 chart without specifying any aesthetics.

          Creating Subplots

          data %>% ggplot(aes(x = variable_1, y = variable_2)) +
            geom_line() + facet_wrap(~variable_3)

          Plots subsets of data in separate facets.

          Creating Bar Chart

          data_frame %>% ggplot(aes(x = variable_1, 
          y = variable_2)) + geom_col()

          Create a bar chart using ggplot2, mapping variables to x and y axes.

          Plotting multiple columns

          data %>% ggplot(aes(x = variable_1)) + 
          geom_line(aes(y = variable_2)) + 
          geom_line(aes(y = variable_3))

          Plots multiple columns on the same axes using ggplot2.

          Scatterplots

          ggplot(data = uber_trips, aes(x = distance, y = cost)) +
            geom_point()

          Generate scatterplots to visualize bivariate relationships in ggplot2.

          Scatterplots with Labels

          ggplot(data = df, aes(x = predictor, y = response)) +
            geom_point() +
            scale_y_continuous(labels = scales::comma)

          Create scatterplots with y-axis labels formatted using commas instead of scientific notation.

          Scatterplot with Comma Labels

          ggplot(data = df, aes(x = predictor, y = response)) +
            scale_y_continuous(labels = scales::comma) +
            geom_point()

          Plots a scatterplot with y-axis labels in comma format.

          Scatterplot with Groups

          ggplot(data = df, aes(x = predictor, y = response)) + geom_point() +
            facet_wrap(~ categorical_variable, ncol = 2)

          Creates scatterplots of response vs predictor, grouped by a categorical variable.

          Vertical Bar Chart

          ggplot(data = df, aes(x = col)) + geom_bar()

          Creates a vertical bar chart to visualize counts of data.

          Grouped Bar Plot

          ggplot(data = df, aes(x = col_1, fill = col_2)) +
            geom_bar(position = "dodge")

          Creates a grouped bar plot to compare frequency distributions of categorical variables.

          Statistics & Probability

            Syntax for

            How to use

            Explained

            Mean

            mean(x)

            Calculates the mean of vector x.

            Median

            median(x)

            Calculates the median of vector x.

            Weighted Mean

            weighted_avg <- weighted.mean(x = distribution, w = weights)

            Computes the weighted mean of a numerical vector using specific weights.

            Standard Deviation

            sd(x)

            Calculates the standard deviation of x.

            Correlation

            cor(x, y)

            Calculates correlation between x and y.

            Linear Model

            lm(y ~ x, data=df)

            Fits a linear regression model.
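
The summary functions above can be tried together on a tiny data set. In this sketch the vectors x and y are made up so that y is exactly 2x + 1, which makes the expected results easy to check by hand.

```r
# Invented data: y is an exact linear function of x.
x <- c(1, 2, 3, 4, 5)
y <- c(3, 5, 7, 9, 11)

x_mean   <- mean(x)    # 3
x_median <- median(x)  # 3
x_sd     <- sd(x)      # sqrt(2.5), about 1.58 (sample standard deviation)
xy_cor   <- cor(x, y)  # 1, because the relationship is perfectly linear

# Fit y ~ x and read off the intercept and slope.
df  <- data.frame(x = x, y = y)
fit <- lm(y ~ x, data = df)
coefs <- coef(fit)     # intercept 1, slope 2 (up to floating-point error)
print(coefs)
```

With real data the correlation will rarely be exactly 1, and summary(fit) is the usual next step for standard errors and R-squared.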

            Types of Variables

            # Example Variables: Age (Quantitative), Gender (Qualitative)

            Classify variables as Quantitative (numerical) or Qualitative (categorical).

            P-Value Decision Threshold

            if (p_value < 0.05) {
               print('Reject null hypothesis')
            } else {
               print('Fail to reject null hypothesis')
            }

            Decide on hypothesis rejection using a common p-value threshold of 0.05.

            Chi-Squared Distribution

            pchisq(3.84, df = 1)

            Calculates the cumulative probability for a chi-squared distribution with specific degrees of freedom.

            Chi-Squared Test

            pchisq(q = 10, df = 5)

            Calculate cumulative probability for a chi-squared statistic of 10 with 5 degrees of freedom.
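
Because pchisq() returns the lower-tail (cumulative) probability by default, a p-value for a chi-squared statistic is usually taken from the upper tail instead. A minimal sketch, reusing the 3.84 statistic with 1 degree of freedom from above:

```r
# Lower tail: P(X <= 3.84), roughly 0.95.
lower <- pchisq(3.84, df = 1)

# Upper tail: P(X > 3.84), roughly 0.05. This is the p-value.
upper <- pchisq(3.84, df = 1, lower.tail = FALSE)

# qchisq() inverts the calculation: the 95th percentile is about 3.84.
critical <- qchisq(0.95, df = 1)

print(c(lower = lower, upper = upper, critical = critical))
```

The two tails always sum to 1, so 1 - pchisq(q, df) gives the same p-value as lower.tail = FALSE.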

            Multi-category Chi-squared Test

            data <- table(income$sex, income$high_income)
            chisq.test(data)

            Builds a contingency table from two categorical columns, then performs a chi-squared test of independence on it.
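
A self-contained sketch of the contingency-table workflow follows. The income data frame here is fabricated purely for illustration (100 rows with made-up counts); only the column names are taken from the entry above.

```r
# Hypothetical data: 50 rows per sex, with different shares of high income.
income <- data.frame(
  sex = rep(c("F", "M"), each = 50),
  high_income = c(rep(TRUE, 15), rep(FALSE, 35),
                  rep(TRUE, 30), rep(FALSE, 20))
)

# Cross-tabulate the two categorical columns.
tab <- table(income$sex, income$high_income)
print(tab)

# Chi-squared test of independence (Yates' correction is applied to 2x2 tables by default).
result <- chisq.test(tab)
print(result$p.value)

# Apply the common 0.05 threshold.
decision <- if (result$p.value < 0.05) "Reject null hypothesis" else "Fail to reject null hypothesis"
print(decision)
```

The test object also exposes result$expected, which is useful for checking that expected counts are large enough for the approximation to hold.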

            Computing Mode in R

            compute_mode <- function(vector) {
              counts_df <- tibble(vector) %>%
                group_by(vector) %>%
                summarise(frequency = n()) %>%
                arrange(desc(frequency))
              counts_df$vector[1]
            }

            Defines a function to calculate the mode of a given vector using dplyr functions.
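
For comparison, the mode can also be computed without dplyr. The sketch below is a base R alternative rather than the function defined above; mode_base is a hypothetical name, and on ties it returns whichever tied value appears first in the input.

```r
# Base R mode: count occurrences of each distinct value, keep the most frequent.
mode_base <- function(x) {
  ux <- unique(x)                        # distinct values, in order of appearance
  ux[which.max(tabulate(match(x, ux)))]  # value with the highest count
}

num_mode <- mode_base(c(3, 1, 2, 2, 3, 2))  # 2
chr_mode <- mode_base(c("a", "b", "b"))     # "b"
print(num_mode)
print(chr_mode)
```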

            Calculate Z-score

            z_score <- function(value, vector)
            { (value - mean(vector)) / sd(vector)}

            This calculates the Z-score for a value relative to a vector's distribution.
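
A short usage sketch for the Z-score function, with an invented vector of scores. Standardizing the whole vector at once should give values with mean 0 and standard deviation 1.

```r
z_score <- function(value, vector) {
  (value - mean(vector)) / sd(vector)
}

# Hypothetical scores: mean 3, sample sd sqrt(2.5).
scores <- c(1, 2, 3, 4, 5)

z_top <- z_score(5, scores)   # (5 - 3) / sqrt(2.5), about 1.26
print(z_top)

# The same formula is vectorized, so it standardizes every score in one call.
all_z <- z_score(scores, scores)
print(all_z)
```

Base R's scale() performs the same standardization and returns a matrix, which can be convenient for data frames with several columns.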

            Simulate Coin Toss

            set.seed(1)
            coin_toss <- function() {
               if (runif(1) <= 0.5) {
                 'HEADS'
               } else {
                 'TAILS'
               }
            }

            Simulates a random coin toss using R's uniform random numbers.
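
Repeating the toss many times shows the proportion of heads settling near 0.5. This sketch assumes a run of 10,000 tosses, a number chosen only for illustration.

```r
set.seed(1)
coin_toss <- function() {
  if (runif(1) <= 0.5) "HEADS" else "TAILS"
}

# replicate() calls coin_toss() 10,000 times and collects the results.
tosses <- replicate(10000, coin_toss())
prop_heads <- mean(tosses == "HEADS")

print(table(tosses))
print(prop_heads)  # close to 0.5, not exactly 0.5
```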

            Addition Rule for Probability

            P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

            Formula to calculate probabilities of unions of events, adjusting for overlap in non-exclusive cases.
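
The addition rule can be checked numerically on a fair six-sided die. In this sketch the events A (even roll) and B (roll greater than 3) are chosen for illustration, and all outcomes are assumed equally likely.

```r
omega <- 1:6          # sample space of a fair die
A <- c(2, 4, 6)       # even roll
B <- c(4, 5, 6)       # roll greater than 3

# Probability of an event under equally likely outcomes.
p <- function(event) length(event) / length(omega)

lhs <- p(union(A, B))                     # P(A ∪ B) = 4/6
rhs <- p(A) + p(B) - p(intersect(A, B))   # 3/6 + 3/6 - 2/6 = 4/6

print(c(lhs = lhs, rhs = rhs))
```

Without the subtraction, outcomes 4 and 6 would be counted twice, giving 1 instead of 4/6.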

            Independent Events

            P(A ∩ B) = P(A) * P(B)

            For independent events, the probability that both occur is the product of their individual probabilities.

            Product Rule in Experiments

            total_outcomes <- a * b

            Calculate the total outcomes for two independent experiments using the product rule.
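
One way to see the product rule is to enumerate every combined outcome. The sketch below assumes two experiments, a coin flip and a die roll, picked only as an example.

```r
coin <- c("H", "T")   # 2 outcomes
die  <- 1:6           # 6 outcomes

# Product rule: a * b.
total_outcomes <- length(coin) * length(die)   # 12

# expand.grid() lists every (coin, die) pair explicitly.
outcomes <- expand.grid(coin = coin, die = die)
print(nrow(outcomes))                          # 12
head(outcomes)
```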

            Uniform Distribution

            # Assuming all outcomes have an equal chance
            outcomes <- c(1, 2, 3, 4, 5, 6)
            probabilities <- rep(1/6, 6)
            result <- paste('Outcome:', outcomes, 'Probability:', probabilities)
            print(result)

            Demonstrates a uniform distribution for a die roll, where all outcomes are equally likely.

            Conditional Probability Calculation

            P_A_given_B <- P_A_and_B / P_B

            Computes P(A|B) from the joint probability P(A ∩ B) and the probability of B.

            Conditional Probability

            P_A_given_B <- length(intersect(A, B)) / length(B)

            Compute P(A∣B) using set cardinalities.

            Conditional Probability Definition

            P_A_given_B <- 1 - P_Ac_given_B

            P(A|B) and the complement P(Ac|B) sum to 1, so each can be computed from the other.
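
The set-based and complement-based forms of conditional probability can be checked against each other on a fair die. The events below are illustrative, and equally likely outcomes are assumed.

```r
omega <- 1:6
A <- c(2, 4, 6)            # even roll
B <- c(4, 5, 6)            # roll greater than 3
Ac <- setdiff(omega, A)    # complement of A: 1, 3, 5

# P(A|B) by counting outcomes inside B.
P_A_given_B  <- length(intersect(A, B)) / length(B)    # 2/3
P_Ac_given_B <- length(intersect(Ac, B)) / length(B)   # 1/3

# The same value recovered from the complement.
P_A_from_complement <- 1 - P_Ac_given_B                # 2/3

print(c(P_A_given_B, P_Ac_given_B, P_A_from_complement))
```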

            Independence

            P_A_and_B <- P_A * P_B

            Defines independent events: joint probability equals product of individual probabilities.
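
Independence can be tested by comparing the joint probability with the product of the marginals. In this sketch on a fair die, the events are chosen so that one pair is independent and another is not; all three event definitions are illustrative.

```r
omega <- 1:6
p <- function(event) length(event) / length(omega)

A <- c(2, 4, 6)       # even roll, P = 1/2
B <- c(1, 2, 3, 4)    # roll of at most 4, P = 2/3
C <- c(4, 5, 6)       # roll greater than 3, P = 1/2

# A and B: joint is 2/6 and the product is 1/2 * 2/3 = 1/3, so they are independent.
indep_AB <- isTRUE(all.equal(p(intersect(A, B)), p(A) * p(B)))

# A and C: joint is 2/6 but the product is 1/4, so they are not independent.
indep_AC <- isTRUE(all.equal(p(intersect(A, C)), p(A) * p(C)))

print(c(A_and_B = indep_AB, A_and_C = indep_AC))
```

all.equal() is used instead of == because fractions such as 1/3 are not represented exactly in floating point.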

            Programming

              Syntax for

              How to use

              Explained

              If Statement

              if (x > 0)
                  print("positive")

              Executes code if condition is true.

              For Loop

              for (i in 1:3)
                  print(i)

              Iterates over a sequence.

              While Loop

              while (x < 5)
                  x <- x + 1

              Repeats code while the x < 5 condition is true.
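
The three control-flow forms above can be combined in a short runnable sketch. The function name classify and the counters are hypothetical, introduced only for this example.

```r
# if / else if / else: return a label for a number.
classify <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

# for: accumulate 1 + 2 + 3.
total <- 0
for (i in 1:3) {
  total <- total + i
}

# while: increment x until the condition x < 5 fails.
x <- 0
steps <- 0
while (x < 5) {
  x <- x + 1
  steps <- steps + 1
}

print(classify(-2))   # "negative"
print(total)          # 6
print(c(x, steps))    # 5 5
```

Braces are optional for one-line bodies, as in the entries above, but they make it harder to accidentally leave a second statement outside the loop.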

              Syntax for functions

              function_name <- function(input) {
                  # Code to manipulate the input
                  return(output)
              }

              Defines a reusable function structure in R.

              Define Function

              f <- function(a, b) a + b

              Defines a function with two arguments.

              Apply Function

              apply(m, 1, sum)

              Applies sum to each row of matrix m; a second argument of 2 applies it to each column instead.
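
The second argument of apply() selects the dimension to iterate over. A minimal sketch on a 2 x 3 matrix:

```r
# Filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, 1, sum)   # one result per row: 9 12
col_sums <- apply(m, 2, sum)   # one result per column: 3 7 11

print(row_sums)
print(col_sums)
```

For plain sums, rowSums(m) and colSums(m) give the same numbers and are faster; apply() is more useful when the function is something custom.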

              Exponentiation

              3^5

              Calculates 3 raised to the power of 5.

              Creating Dates

              ymd('20/04/21')

              Parses a string written in year-month-day order into a Date object using ymd() from the lubridate package.

              Define Window Frame

              ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

              A SQL window-frame clause rather than R syntax: it includes one row before and one row after the current row in a window computation.

              Machine Learning

                Syntax for

                How to use

                Explained

                Fitting a Linear Model

                lm_fit <- lm(response ~ predictor, data = df)

                Fit a linear regression model with a response and a predictor variable.

                Visualize Residuals

                library(ggplot2)
                ggplot(data.frame(residuals = lm_fit$residuals), aes(x = residuals)) + geom_histogram()

                Visualize the distribution of residuals to check the linear model's fit.
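
Alongside the histogram, a few numeric checks on the residuals are often helpful. This sketch uses the built-in mtcars data set with mpg as the response and wt as the predictor, a pairing picked for illustration and not taken from the cheat sheet.

```r
# Fit a simple linear model on a built-in data set.
lm_fit <- lm(mpg ~ wt, data = mtcars)

# residuals() is the accessor equivalent of lm_fit$residuals.
res <- residuals(lm_fit)

# With an intercept in the model, least-squares residuals average to zero.
res_mean <- mean(res)

# Observed values decompose into fitted values plus residuals.
reconstructed <- fitted(lm_fit) + res

print(summary(res))
print(res_mean)
```

A residual histogram that is roughly symmetric around zero is consistent with the model's assumptions; strong skew or several modes suggest a missing predictor or a non-linear relationship.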

                Hyperparameter Grid Search

                library(caret)
                # train_control is assumed to be a trainControl() object defined earlier
                knn_grid <- expand.grid(k = 1:20)
                knn_model <- train(
                    tidy_price ~ accommodates + bathrooms + bedrooms, 
                    data = training_data,
                    method = "knn", 
                    trControl = train_control, 
                    preProcess = c("center", "scale"), 
                    tuneGrid = knn_grid
                )
                plot(knn_model)

                Performs grid search to optimize k for k-NN model and visualizes results.

                Naive Bayes Algorithm

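                P(Spam | w1, ..., wn) ∝ P(Spam) × Π_i P(wi | Spam)

                Scores a message as spam from the prior P(Spam) and the conditional probability of each word given spam, treating the words as conditionally independent.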
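
The proportionality above can be worked through with small numbers. Every probability in this sketch is invented for illustration (a two-word message and made-up priors); in practice they would be estimated from labelled training messages.

```r
# Hypothetical priors.
p_spam <- 0.4
p_ham  <- 0.6

# Hypothetical per-word conditional probabilities for a two-word message.
p_word_given_spam <- c(free = 0.5, meeting = 0.2)
p_word_given_ham  <- c(free = 0.1, meeting = 0.3)

# Unnormalized scores: prior times the product of word probabilities.
score_spam <- p_spam * prod(p_word_given_spam)   # 0.4 * 0.5 * 0.2 = 0.04
score_ham  <- p_ham  * prod(p_word_given_ham)    # 0.6 * 0.1 * 0.3 = 0.018

# Normalizing the two scores gives a posterior probability of spam.
posterior_spam <- score_spam / (score_spam + score_ham)   # about 0.69

# Classification only needs the comparison, not the normalization.
label <- if (score_spam > score_ham) "spam" else "ham"
print(c(score_spam = score_spam, score_ham = score_ham, posterior_spam = posterior_spam))
print(label)
```

For long messages the product of many small probabilities can underflow, so implementations commonly sum log-probabilities instead.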

                File IO

                  Syntax for

                  How to use

                  Explained

                  Read CSV

                  read.csv("file.csv")

                  Reads a CSV file into a data frame.

                  Write CSV

                  write.csv(df, "file.csv")

                  Writes a data frame to a CSV file.

                  Read RDS

                  readRDS("file.rds")

                  Reads an RDS file into R.

                  Write RDS

                  saveRDS(df, "file.rds")

                  Saves an object as an RDS file.

                  List Files

                  list.files()

                  Lists files in the current directory.
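
A round trip through both file formats ties the commands above together. The sketch writes to temporary files so nothing in the working directory is touched; the data frame and path names are placeholders.

```r
# Hypothetical data to save.
df <- data.frame(a = 1:3, b = c(1.5, 2.5, 3.5))

# Temporary file paths, so the example cleans up with the R session.
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

# CSV round trip. row.names = FALSE avoids writing an extra index column.
write.csv(df, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)

# RDS round trip. RDS preserves R types and attributes exactly.
saveRDS(df, rds_path)
from_rds <- readRDS(rds_path)

# list.files() accepts a directory and an optional regex pattern.
saved <- list.files(tempdir(), pattern = "\\.(csv|rds)$")
print(saved)
```

CSV is the better choice for sharing data with other tools, while RDS is convenient for caching R objects such as fitted models between sessions.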