In the previous lesson on preparing features for machine learning, we prepared a dataset by removing columns that had data leakage issues, contained redundant information, or required additional processing to turn into useful features. We also cleaned features that had formatting issues and converted categorical columns to dummy variables. The goal of this preparation was to generate features that can be fed into a machine learning algorithm, which will predict whether or not a loan will be paid off on time.
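
As a reminder of what converting a categorical column to dummy variables means, here is a minimal pure-Python sketch. The helper name `to_dummies` is hypothetical, and the rows are made-up toy values whose column names are only assumed to resemble Lending Club's; in practice, pandas offers this transformation as `pandas.get_dummies`.

```python
def to_dummies(rows, column):
    """Replace one categorical column with a 0/1 indicator column per category.

    `to_dummies` is a hypothetical helper for illustration only.
    """
    # Collect the distinct categories in a stable (sorted) order.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every other column unchanged.
        new_row = {key: value for key, value in row.items() if key != column}
        # Add one indicator column per category: 1 if this row has it, else 0.
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


# Hypothetical toy rows; real Lending Club data has many more columns.
loans = [
    {"loan_amnt": 5000, "home_ownership": "RENT"},
    {"loan_amnt": 12000, "home_ownership": "OWN"},
    {"loan_amnt": 8000, "home_ownership": "MORTGAGE"},
]

for row in to_dummies(loans, "home_ownership"):
    print(row)
```

Each category becomes its own numeric column, so an algorithm that only accepts numbers can still use the information, and exactly one indicator is 1 in every row.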

In this lesson of the machine learning project course, we will use our clean dataset to build and train machine learning models that predict whether a loan will be paid off on time. We will use the error metrics we learned about in the Logistic Regression lesson to assess the quality of our models. We will also use an algorithm called Random Forest, which can handle nonlinear data and learn complex conditionals.
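
This paragraph does not say which error metrics the lesson will rely on. As an assumption for illustration, the sketch below computes two that are common for this kind of problem, the true positive rate and the false positive rate, alongside plain accuracy. It assumes the label 1 means a loan was paid off on time and 0 means it was not, so a false positive is a loan the model approves that is not repaid. The function name `error_rates` and the labels are hypothetical.

```python
def error_rates(y_true, y_pred):
    """Return (accuracy, true positive rate, false positive rate) for 0/1 labels.

    Assumed convention: 1 = loan paid off on time, 0 = not paid off.
    """
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("need at least one label")
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # good loans correctly approved
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # bad loans wrongly approved
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # good loans wrongly rejected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # bad loans correctly rejected

    accuracy = (tp + tn) / len(pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # share of good loans we approve
    fpr = fp / (fp + tn) if fp + tn else 0.0  # share of bad loans we approve
    return accuracy, tpr, fpr


# Made-up, imbalanced labels: 8 loans repaid, 2 not.
actual = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
# A naive "model" that predicts every loan will be repaid.
predicted = [1] * len(actual)

accuracy, tpr, fpr = error_rates(actual, predicted)
print(f"accuracy={accuracy:.2f} tpr={tpr:.2f} fpr={fpr:.2f}")
# prints "accuracy=0.80 tpr=1.00 fpr=1.00"
```

The 80% accuracy looks respectable, yet the false positive rate of 1.0 means this model would approve every loan that is not repaid. This is why accuracy alone can mislead when one class dominates the data.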

To build our models and make predictions, we will work with financial lending data from Lending Club. Lending Club is a marketplace for personal loans that matches borrowers seeking a loan with investors looking to lend money and earn a return.

  • Learn how to choose an error metric.
  • Learn how to train and test your model using common machine learning algorithms.

Lesson Outline

  1. Recap
  2. Picking an error metric
  3. Class imbalance
  4. Logistic Regression
  5. Cross Validation
  6. Penalizing the classifier
  7. Manual penalties
  8. Random forests
  9. Next Steps

