MISSION 133

Machine Learning Project Walkthrough: Data Cleaning

In this first lesson of the machine learning project course, you will explore the first part of the data science life cycle: data cleaning. Data analysts and data scientists alike report that data cleaning is the skill they use most often in their work. This lesson builds on the data cleaning skills you learned in our Python Data Cleaning Advanced course.

To explore this stage of the life cycle, you will work with financial lending data from Lending Club. Lending Club is a marketplace for personal loans that matches borrowers seeking a loan with investors looking to lend money and earn a return.

Throughout this lesson, you will examine each column in the loan dataset and use pandas to drop the columns that are not relevant to building a model that predicts whether a borrower will pay off their loan on time.
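As a rough illustration of the column-dropping step described above, here is a minimal pandas sketch. The tiny DataFrame and its column names (id, url, loan_amnt, loan_status) are made up for this example and are not taken from the lesson's dataset; which columns count as irrelevant is also an assumption.

```python
import pandas as pd

# Toy stand-in for the loan data. The column names and values are
# hypothetical and only illustrate the idea.
loans = pd.DataFrame({
    "id": [1001, 1002, 1003],
    "url": ["https://example.com/1001",
            "https://example.com/1002",
            "https://example.com/1003"],
    "loan_amnt": [5000, 2500, 12000],
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid"],
})

# Columns assumed (for this sketch) to carry no information about repayment.
irrelevant_cols = ["id", "url"]

# drop() returns a new DataFrame and leaves the original untouched.
loans_clean = loans.drop(columns=irrelevant_cols)

print(loans_clean.columns.tolist())
```

Keeping the list of dropped columns in a named variable makes it easy to review the decision for each group of columns later.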

As you work through each concept, you'll apply what you've learned directly in your browser, so there's no need to set up your own machine for the exercises. The Python environment in this course includes answer checking, so you can make sure you've mastered each concept before moving on to the next.

Objectives

  • Learn about data cleaning for machine learning tasks.
  • Learn what feature selection is for machine learning.

Mission Outline

1. Introduction
2. Introduction to the data
3. Reading in to Pandas
4. First group of columns
5. First group of columns
6. Second group of features
7. Second group of features
8. Third group of features
9. Third group of features
10. Target column
11. Binary classification
12. Binary classification
13. Removing single value columns
14. Next steps
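
Two of the later steps in the outline, preparing a binary target and removing single value columns, can be sketched in pandas roughly as follows. This is only an illustration under assumptions: the column names, the status labels ("Fully Paid", "Charged Off", "Current"), and the 1/0 encoding are hypothetical choices for the example, not details confirmed by this page.

```python
import pandas as pd

# Toy data with hypothetical column names and values.
loans = pd.DataFrame({
    "loan_amnt": [5000, 2500, 12000, 7000],
    "pymnt_plan": ["n", "n", "n", "n"],       # same value in every row
    "policy_code": [1, 1, None, 1],           # one distinct value plus a missing entry
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid", "Current"],
})

# Binary classification: keep only rows with a final outcome and
# encode the target as 1 (paid off) or 0 (not paid off).
status_map = {"Fully Paid": 1, "Charged Off": 0}
loans = loans[loans["loan_status"].isin(list(status_map))].copy()
loans["loan_status"] = loans["loan_status"].map(status_map)

# Single value columns: a column with one distinct non-null value
# cannot help a model tell the two outcomes apart.
single_value_cols = [
    col for col in loans.columns
    if loans[col].nunique(dropna=True) == 1
]
loans = loans.drop(columns=single_value_cols)

print(single_value_cols)
print(loans.columns.tolist())
```

Counting distinct values with missing entries excluded treats a column like policy_code, which has one real value and some gaps, as a single value column too.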

Course Info:

Intermediate

The median completion time for this course is 6.6 hours.

This course requires a premium subscription and includes three Machine Learning Project Walkthroughs. It is the 25th course in the Data Scientist in Python path.
