Analyzing messy data is a daunting task, if not an impossible one. Most datasets you come across will need some cleaning before you can start analyzing them and making sense of the data.
As you progress through your career as a data analyst or data scientist, much of your work (by a common rule of thumb, as much as 80%) will be cleaning data so that analysis becomes easier, or possible at all. This lesson teaches you what you need to know to reshape your data and compute correlations between columns of a data frame in R.
Throughout this lesson and the ones that follow, you will gradually build the skills you need to land your first job in data. In this lesson, you will continue learning techniques for data cleaning and analysis in R as you work with real-world data. You will start by learning how to visualize relationships between variables using scatter plots.
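To give a sense of what a scatter plot looks like in code, here is a minimal sketch using the ggplot2 package. The data frame `schools`, its column names, and its values are invented for illustration and are not the dataset used in the lesson; the lesson itself may build its plots differently.

```r
library(ggplot2)

# Toy data invented for illustration; NOT the actual NYC schools dataset.
schools <- data.frame(
  frl_percent   = c(20, 35, 50, 65, 80),
  avg_sat_score = c(1450, 1380, 1290, 1210, 1150)
)

# One point per school: x is a candidate explanatory variable, y is the outcome.
p <- ggplot(schools, aes(x = frl_percent, y = avg_sat_score)) +
  geom_point()

print(p)
```

If the points drift downward from left to right, as they do in this toy data, the two variables tend to move in opposite directions.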
As you proceed through the lesson, you will learn how to reshape data for visualization and analysis. You will also learn how to do a correlation analysis and interpret correlation matrices to identify interesting relationships between columns.
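As a rough preview of these two ideas, the sketch below reshapes a small data frame from wide to long form and then computes a correlation matrix. It assumes the tidyr package and uses `pivot_longer()`, the current tidyr function for this kind of reshape (the older `gather()` does the same job and may be what the lesson uses). The `schools` data frame, its column names, and its values are made up for illustration.

```r
library(tidyr)

# Toy data invented for illustration; NOT the actual NYC schools dataset.
schools <- data.frame(
  school        = c("A", "B", "C", "D", "E"),
  frl_percent   = c(20, 35, 50, 65, 80),
  ell_percent   = c(5, 12, 9, 20, 25),
  avg_sat_score = c(1450, 1380, 1290, 1210, 1150)
)

# Reshape: collect the two percentage columns into a name/value pair,
# so each row holds one school-variable combination.
long <- pivot_longer(
  schools,
  cols      = c(frl_percent, ell_percent),
  names_to  = "demographic",
  values_to = "percent"
)

# Correlation matrix: Pearson's r for every pair of numeric columns.
numeric_cols <- schools[, c("frl_percent", "ell_percent", "avg_sat_score")]
cor_mat <- cor(numeric_cols, use = "pairwise.complete.obs")

print(long)
print(round(cor_mat, 2))
```

The long form is convenient for plotting several variables against SAT scores in one faceted chart, while the correlation matrix works on the original wide columns. Each cell of the matrix ranges from -1 to 1, and the diagonal is always 1 because every column is perfectly correlated with itself.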
While learning about correlations and reshaping data, you'll work with New York City high school data and begin to analyze which factors influence SAT scores the most. For each concept, you'll use our code-running system with answer checking so you can confirm you've mastered it before moving on to the next.
1. Analyzing New York City Public Schools Data
2. Visualizing Relationships Between Variables Using Scatter Plots
3. Reshaping Data for Visualization
4. Gathering Data into Columns
5. Comparing the Strength of Relationships Among Pairs of Variables
6. Correlation Analysis: Measuring the Strength of Relationships Between Variables
7. Creating and Interpreting Correlation Matrices
8. Identifying Interesting Relationships
9. Next Steps