Machine Learning Project Walkthrough: Data Cleaning
In this first lesson of the machine learning project course, you will explore the first part of the data science life cycle: data cleaning. Data analysts and data scientists alike report that data cleaning is the skill they use most often in their work. This lesson builds on the data cleaning skills you learned in our Python Data Cleaning Advanced course.
To explore the data cleaning part of the data science life cycle, we will work with financial lending data from Lending Club. Lending Club is a marketplace for personal loans that matches borrowers seeking a loan with investors looking to lend money and earn a return.
Throughout this lesson, you will examine each column in the loan dataset and use pandas to drop the columns that are not relevant to building a model that predicts whether a borrower will pay off their loan on time.
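As a rough illustration of that workflow, here is a minimal sketch of dropping columns with pandas. The toy data and the column names (`url`, `member_id`, and so on) are assumptions made up for this example, not the actual Lending Club schema, and the choice of which columns count as irrelevant is likewise hypothetical.

```python
import pandas as pd

# Hypothetical toy data; these column names are illustrative only.
loans = pd.DataFrame({
    "loan_amnt": [5000, 2500, 10000],
    "url": ["https://example.com/1", "https://example.com/2", "https://example.com/3"],
    "member_id": [101, 102, 103],
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid"],
})

# Columns assumed (for this sketch) to carry no predictive information.
irrelevant_columns = ["url", "member_id"]

# drop returns a new DataFrame; the original is left unchanged.
cleaned = loans.drop(columns=irrelevant_columns)

print(cleaned.columns.tolist())
```

Passing a list to `columns=` removes several columns in one call, which keeps each group of dropped columns easy to review.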
As you work through each concept, you'll apply what you've learned directly in your browser, so there's no need to set up your own machine for the exercises. The Python environment in this course includes answer checking, so you can make sure you've mastered each concept before moving on to the next.
2. Introduction to the data
3. Reading in to Pandas
4. First group of columns
5. First group of columns
6. Second group of features
7. Second group of features
8. Third group of features
9. Third group of features
10. Target column
11. Binary classification
12. Binary classification
13. Removing single value columns
14. Next steps