Machine Learning Project Walkthrough: Data Cleaning

In this first lesson of the machine learning project course, you will explore the first stage of the data science life cycle: data cleaning. Data analysts and data scientists alike report that data cleaning is the skill they use most often in their data science work. This lesson builds on the data cleaning skills you have learned in our Python Data Cleaning Advanced course.

To facilitate exploring the data cleaning aspect of the data science life cycle, we will be working with financial lending data from Lending Club. Lending Club is a marketplace for personal loans that matches borrowers who are seeking a loan with investors looking to lend money and make a return.

Throughout this lesson, you will examine each column in the loan dataset and use pandas to drop the columns that are not relevant to building a model that predicts whether a borrower will pay off their loan on time.
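As a rough illustration of what dropping irrelevant columns looks like in pandas, here is a minimal sketch. It assumes a tiny, made-up stand-in for the Lending Club data: the column names mimic the style of the real dataset, but the values are invented, and the choice of which columns count as irrelevant is only an example, not the lesson's actual selection.

```python
import pandas as pd

# Made-up sample rows standing in for the Lending Club loan data.
loans = pd.DataFrame({
    "id": [1077501, 1077430, 1077175],
    "member_id": [1296599, 1314167, 1313524],
    "loan_amnt": [5000, 2500, 2400],
    "int_rate": ["10.65%", "15.27%", "15.96%"],
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid"],
})

# Example choice: identifier columns say nothing about whether a borrower
# repays, so they are treated as irrelevant here.
irrelevant_columns = ["id", "member_id"]

# DataFrame.drop(columns=...) returns a new frame without those columns.
loans = loans.drop(columns=irrelevant_columns)
print(loans.columns.tolist())
```

In practice, each column is judged by reading its description and asking whether it would be known at the time the loan is issued and whether it could plausibly inform the repayment outcome.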

As you work through each concept, you’ll apply what you’ve learned directly in your browser, so there’s no need to set up your own machine for the exercises. The Python environment in this course includes answer checking, so you can make sure you’ve fully mastered each concept before moving on to the next one.


  • Learn about data cleaning for machine learning tasks.
  • Learn what feature selection is for machine learning.

Lesson Outline

  1. Introduction
  2. Introduction to the data
  3. Reading into pandas
  4. First group of columns
  5. First group of columns
  6. Second group of features
  7. Second group of features
  8. Third group of features
  9. Third group of features
  10. Target column
  11. Binary classification
  12. Binary classification
  13. Removing single value columns
  14. Next steps
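The later steps in the outline (target column, binary classification, and removing single value columns) can be sketched in pandas roughly as follows. This is a hedged illustration on invented data, not the lesson's code: it assumes that only loans with a known outcome ("Fully Paid" or "Charged Off") are kept, that those statuses map to 1 and 0, and that `policy_code` is a column holding one repeated value.

```python
import pandas as pd

# Made-up sample rows; real data would typically be loaded with pd.read_csv.
loans = pd.DataFrame({
    "loan_amnt": [5000, 2500, 2400, 10000],
    "policy_code": [1, 1, 1, 1],  # same value in every row
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid", "Current"],
})

# Assumption: keep only loans whose outcome is known, since a "Current"
# loan has not yet been either repaid or charged off.
known = loans["loan_status"].isin(["Fully Paid", "Charged Off"])
loans = loans[known].copy()

# Turn the target into a binary label: 1 = paid off, 0 = charged off.
loans["loan_status"] = loans["loan_status"].map({"Fully Paid": 1, "Charged Off": 0})

# Columns with a single unique value (missing values ignored) cannot help
# a model distinguish one loan from another, so drop them.
single_value_cols = [col for col in loans.columns if loans[col].nunique(dropna=True) == 1]
loans = loans.drop(columns=single_value_cols)

print(single_value_cols)
print(loans)
```

Filtering before checking for single-value columns matters: a column can become constant only after the unknown-outcome rows are removed.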
