Analyzing messy data can be daunting, if not impossible. Most of the datasets you come across will require some amount of cleaning before you can start analyzing and making sense of the data.
As you progress through your data analyst or data scientist career, much of your work (often cited as around 80%) will be cleaning data to make analysis easier, or possible at all. This lesson will teach you what you need to know to clean your data using string manipulation and relational data techniques.
Throughout this lesson and subsequent lessons, you will gradually grow your skills to ensure you are prepared to land your first job in data!
In this lesson, you will practice string manipulation and learn to work with relational data as you perform data cleaning in R to prepare six data frames for analysis. As you start this lesson, you will learn about tidy data, a way of structuring data so that it is well organized for analysis.
As you proceed through the lesson, you'll learn additional string manipulation methods, such as splitting and subsetting strings. You will also learn relational data concepts such as inner joins, outer joins, and keys, and use them to create a single data frame from multiple related data frames.
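To give a rough feel for splitting, subsetting, keys, and joins, here is a minimal sketch in base R. The data frames, column names, and values are invented for illustration and are not the NYC schools data, and the lesson itself may use different functions or packages.

```r
# Hypothetical toy data; every name and value here is made up for illustration.
scores <- data.frame(
  school_id = c("A01", "A02", "B03"),
  avg_sat = c(1100, 1250, 1180),
  stringsAsFactors = FALSE
)
demographics <- data.frame(
  school_id = c("A01", "A02", "C04"),
  enrollment = c("255 students", "404 students", "310 students"),
  stringsAsFactors = FALSE
)

# Splitting: break each string at the space and keep the first piece as a number.
pieces <- strsplit(demographics$enrollment, " ")
demographics$n_students <- as.numeric(sapply(pieces, `[`, 1))

# Subsetting: take the first character of each ID as a group code.
scores$group <- substr(scores$school_id, 1, 1)

# Inner join: keep only rows whose key (school_id) appears in both data frames.
inner <- merge(scores, demographics, by = "school_id")

# Full outer join: keep every row from both sides, filling gaps with NA.
outer <- merge(scores, demographics, by = "school_id", all = TRUE)

print(inner)
print(outer)
```

In this sketch, `school_id` plays the role of the key. The inner join keeps only the schools present in both data frames, while the outer join keeps every school and marks missing values as `NA`.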
While learning about string manipulation and relational data, you'll work with New York City high school data and start to analyze which factors influence SAT scores the most. You'll use our code running system with answer checking, so you can make sure you've mastered each concept before moving on to the next.
1. Importing the Cleaned NYC Schools Data into R
2. Tidy Data and Efficient Analysis
3. Parsing Numbers from Strings
4. Extracting Numeric Data From Strings: Creating New Variables
5. Splitting Strings
6. Subsetting Strings
7. Relational Data: Keys and Joins
8. Inner Joins
9. Outer Joins
10. Using Joins to Create a Single Data Frame
11. Next Steps