In the data cleaning for machine learning lesson, you removed the columns that contained redundant information, weren't useful for modeling, required too much processing to be made useful, or leaked information from the future.
In this lesson, we'll prepare the data for machine learning by handling missing values, converting categorical columns to numeric columns, and removing any other extraneous columns we encounter along the way.
Handling missing data before training and testing a machine learning model on a data set is important because the mathematics underlying most machine learning models assumes the data is numerical and contains no missing values.
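To make these two preparation steps concrete, here is a minimal sketch using pandas. The toy DataFrame and its column names are hypothetical and are not taken from the actual Lending Club data set. Dropping rows with missing values is only one possible strategy, shown here for simplicity.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names and values are illustrative only.
loans = pd.DataFrame({
    "loan_amnt": [5000.0, 2500.0, np.nan, 10000.0],
    "home_ownership": ["RENT", "OWN", "RENT", "MORTGAGE"],
    "loan_status": [1, 0, 1, 1],
})

# Count the missing values in each column.
null_counts = loans.isnull().sum()

# One simple option: drop every row that contains a missing value.
cleaned = loans.dropna(axis=0)

# Convert the categorical text column into 0/1 dummy columns.
dummies = pd.get_dummies(cleaned["home_ownership"], prefix="home_ownership", dtype=int)

# Replace the original text column with its dummy columns.
numeric = pd.concat([cleaned.drop(columns="home_ownership"), dummies], axis=1)

print(null_counts)
print(numeric.dtypes)
```

After these steps, every remaining column is numeric and contains no missing values, which is the form most models expect as input.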
To explore the feature preparation stage of the data science life cycle, we'll work with financial lending data from Lending Club, a marketplace for personal loans that matches borrowers seeking a loan with investors looking to lend money and earn a return.
As you work through each concept, you'll apply what you've learned directly in your browser, so there's no need to set up your own machine for the exercises. The Python environment in this course includes answer checking, so you can make sure you've mastered each concept before moving on to the next.
2. Handling missing values
3. Text columns
4. Converting text columns
5. First 5 categorical columns
6. The reason for the loan
7. Categorical columns
8. Dummy variables
9. Next steps