MISSION 92

Applying Decision Trees

Over the past two lessons of this decision trees course, we learned how decision trees are constructed. We used a modified version of ID3, which is a bit simpler than the most common tree-building algorithms, C4.5 and CART. The basics are the same, however, so we can apply what we learned about how decision trees work to any tree construction algorithm. In this lesson, we'll learn when to use decision trees and how to use them most effectively.

The previous lessons covered implementing decision trees by hand. Conceptual knowledge of how a decision tree works and how to build one is handy, but robust implementations are already available, so you don't have to build a decision tree from scratch. In this lesson, we'll use the scikit-learn package to construct a decision tree and make predictions with it.
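As a rough illustration of what constructing a tree and predicting with scikit-learn can look like, here is a minimal sketch. The synthetic data from `make_classification` is a stand-in chosen for this example, not the course's data set, and the parameter values are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption for illustration only).
X, y = make_classification(n_samples=200, n_features=5, random_state=1)

# Fit a decision tree; random_state makes the result reproducible.
clf = DecisionTreeClassifier(random_state=1)
clf.fit(X, y)

# Predict class labels (0 or 1) for the first five rows.
predictions = clf.predict(X[:5])
print(predictions)
```

The same `fit` and `predict` calls apply to any numeric feature matrix, so categorical columns would need to be converted to numbers first.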

As you explore when to use decision trees and how to use them most effectively, you'll continue working with 1994 United States Census data to predict whether someone earns above or below 50,000 USD per year, based on factors such as marital status, age, type of work, and other reported data.

As you work through each concept, you'll apply what you've learned directly in your browser, so there's no need to use your own machine for the exercises. The Python environment in this course includes answer checking, so you can make sure you've fully mastered each concept before moving on to the next.

Objectives

  • Learn how to train a decision tree model using scikit-learn.
  • Learn how to evaluate error using AUC.
  • Learn how to reduce overfitting with decision trees.
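The three objectives above can be sketched together in one short example. This is only an assumed outline, not the course's own code: the data is synthetic, `train_and_score` is a hypothetical helper, `max_depth=4` is an arbitrary choice, and scoring AUC from predicted probabilities is one reasonable option among several.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy stand-in data (an assumption for illustration only).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4, flip_y=0.1, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)


def train_and_score(max_depth=None):
    """Hypothetical helper: fit a tree and return (train AUC, test AUC)."""
    clf = DecisionTreeClassifier(max_depth=max_depth, random_state=1)
    clf.fit(X_train, y_train)
    # AUC is computed from the predicted probability of the positive class.
    train_auc = roc_auc_score(y_train, clf.predict_proba(X_train)[:, 1])
    test_auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    return train_auc, test_auc


# An unrestricted tree can memorize the training set.
deep_train_auc, deep_test_auc = train_and_score()
# Limiting depth is one way to reduce overfitting.
shallow_train_auc, shallow_test_auc = train_and_score(max_depth=4)

print("deep:   ", deep_train_auc, deep_test_auc)
print("shallow:", shallow_train_auc, shallow_test_auc)
```

A large gap between the training AUC and the test AUC is a common sign of overfitting; comparing that gap for the two trees shows what restricting depth changes.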

Mission Outline

1. Introduction to the Data Set
2. Using Decision Trees With scikit-learn
3. Splitting the Data into Train and Test Sets
4. Evaluating Error With AUC
5. Computing Error on the Training Set
6. Decision Tree Overfitting
7. Reducing Overfitting With a Shallower Tree
8. Tweaking Parameters to Adjust AUC
9. Tweaking Tree Depth to Adjust AUC
10. Underfitting in Simplistic Trees
11. The Bias-Variance Tradeoff
12. Exploring Decision Tree Variance
13. Pruning Leaves to Prevent Overfitting
14. Knowing When to Use Decision Trees
15. Takeaways


Course Info:

Intermediate

The median completion time for this course is 6.4 hours.

This course requires a premium subscription. This course has four missions and one guided project. It is the 22nd course in the Data Scientist in Python path.
