MISSION 94

Introduction to Random Forests

Over the past three lessons of this decision trees course, we’ve learned about decision trees and looked at ways to reduce overfitting. The most powerful tool for reducing decision tree overfitting is the random forest algorithm. In this lesson, we'll learn how to construct and apply random forests.

In this lesson, you will discover why random forests are so effective at reducing decision tree overfitting. You’ll learn what ensemble algorithms are and how they work, and how a random forest uses bagging to make an ensemble more effective by introducing variation into each decision tree model.
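The two ideas named above, bagging and ensembling, can be sketched in a few lines of plain Python. This is a minimal illustration and not the course's own code: the toy rows, the `train_stump` helper (a single age threshold standing in for a real decision tree), and the other function names are all hypothetical. It covers only bootstrap sampling and majority voting, not the random feature selection that a full random forest also uses.

```python
import random
from collections import Counter

# Toy (age, label) rows, invented for illustration; label 1 means "high income".
ROWS = [(22, 0), (25, 0), (30, 0), (35, 0), (45, 1), (50, 1), (55, 1), (60, 1)]


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: the 'bag' in bagging."""
    return [rng.choice(rows) for _ in rows]


def train_stump(sample):
    """Hypothetical stand-in for a decision tree: a single age threshold."""
    high = [age for age, label in sample if label == 1]
    low = [age for age, label in sample if label == 0]
    if not high or not low:
        # Only one class was drawn, so always predict that class.
        only_label = sample[0][1]
        return lambda age: only_label
    # Split halfway between the mean age of each class.
    threshold = (sum(high) / len(high) + sum(low) / len(low)) / 2
    return lambda age: 1 if age >= threshold else 0


def majority_vote(predictions):
    """Return the label predicted by the most models."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(models, age):
    """Ask every model for a prediction, then combine them by majority vote."""
    return majority_vote([model(age) for model in models])


rng = random.Random(1)  # fixed seed so the sketch is repeatable
# An odd number of models avoids ties in a two-class vote.
models = [train_stump(bootstrap_sample(ROWS, rng)) for _ in range(5)]
print(ensemble_predict(models, 48))
```

Because each stump sees a different bootstrap sample, each one learns a slightly different threshold. That variation is what lets the vote smooth out the mistakes of any single model.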

While exploring when to use random forests and how to use them most effectively, you'll continue to work with United States Census data from 1994 to predict whether someone makes above or below 50,000 USD per year based on factors such as marital status, age, type of work, and other reported data.

As you work through each concept, you’ll apply what you’ve learned directly in your browser, so there's no need to use your own machine for the exercises. The Python environment in this course includes answer checking, so you can make sure you've mastered each concept before moving on to the next.

Objectives

  • Learn how to ensemble decision trees to improve prediction quality.
  • Learn how to introduce variation with bagging.
  • Learn how to reduce overfitting with random forests.

Mission Outline

1. Introduction
2. Combining Model Predictions With Ensembles
3. Combining Our Predictions
4. Combining Our Predictions
5. Why Ensembling Works
6. Introducing Variation With Bagging
7. Selecting Random Features
8. Random Subsets in scikit-learn
9. Practice Putting it All Together
10. Tweaking Parameters to Increase Accuracy
11. Reducing Overfitting
12. When to Use Random Forests
13. Takeaways

Course Info:

Intermediate

The median completion time for this course is 6.4 hours.

This course requires a premium subscription. It has four missions and one guided project, and it is the 22nd course in the Data Scientist in Python path.
