In the previous courses of the data scientist path, we discussed machine learning, a couple of different algorithms, and when to use each one. We also learned about two types of machine learning: supervised and unsupervised.

In this lesson, and throughout this course, we'll learn about another supervised machine learning algorithm: the decision tree. A decision tree algorithm automatically constructs a tree of decisions that tells us what outcome to predict in a given situation. One of the major advantages of decision trees is that they can pick up nonlinear interactions between variables in the data that a linear regression on the raw features can't detect.
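To make the nonlinear-interaction point concrete, here is a minimal sketch using NumPy on a tiny XOR-style dataset (a made-up example, not census data), where the label is 1 only when exactly one of two features is 1. An ordinary least-squares fit on the two raw features can't separate the classes, while a two-level tree of threshold splits reproduces the labels exactly. The tree here is written by hand rather than learned, and the function name `tree_predict` is hypothetical.

```python
import numpy as np

# XOR-style toy data: the label is 1 only when exactly one feature is 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Ordinary least-squares linear regression with an intercept column.
X_design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
linear_preds = X_design @ coef


def tree_predict(row):
    """Hypothetical hand-written depth-2 tree: split on feature 0, then feature 1."""
    if row[0] <= 0.5:
        return 1.0 if row[1] > 0.5 else 0.0
    return 0.0 if row[1] > 0.5 else 1.0


tree_preds = np.array([tree_predict(row) for row in X])

print("linear:", np.round(linear_preds, 2))
print("tree:  ", tree_preds)
```

The linear fit ends up predicting 0.5 for every row, because no weighted sum of the two raw features captures "one or the other but not both." Adding an explicit interaction feature (x0 * x1) would let linear regression fit this data; a decision tree finds the interaction through its nested splits without that extra step.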

Throughout this course, we'll be working with census data on individuals in the United States. In this lesson, you'll learn about concepts such as entropy and information gain while building a model to predict how much money individuals make.
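Entropy and information gain are the quantities a decision tree uses to decide where to split. Below is a minimal sketch in plain Python, assuming binary splits and base-2 logarithms (so entropy is measured in bits): entropy is -sum(p * log2(p)) over the class proportions p, and information gain is the parent group's entropy minus the size-weighted entropy of the child groups. The function names and the toy income/age data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(labels, goes_left):
    """Parent entropy minus the size-weighted entropy of the two child groups.

    goes_left[i] is True when row i is sent to the left child.
    """
    left = [lab for lab, flag in zip(labels, goes_left) if flag]
    right = [lab for lab, flag in zip(labels, goes_left) if not flag]
    total = len(labels)
    weighted_children = sum(
        (len(group) / total) * entropy(group) for group in (left, right) if group
    )
    return entropy(labels) - weighted_children


# Hypothetical toy data: 1 = high income, 0 = not high income.
high_income = [0, 0, 0, 1, 1, 1, 0, 1]
age = [22, 25, 31, 45, 52, 38, 19, 60]

print(entropy(high_income))  # 4 rows of each class -> 1.0
gain_35 = information_gain(high_income, [a <= 35 for a in age])
gain_30 = information_gain(high_income, [a <= 30 for a in age])
print(gain_35, gain_30)
```

Splitting at age 35 separates the toy rows into two pure groups, so it recovers the full 1.0 bit of entropy; the split at age 30 leaves one mixed group and gains less. A tree-building algorithm compares gains like these across candidate splits and picks the largest.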

As you work through each concept, you'll apply what you've learned directly in your browser, so there's no need to set up your own machine for the exercises. The Python environment in this course includes answer checking, so you can make sure you've mastered each concept before moving on to the next one.


  • Learn the basics of decision trees.
  • Learn the importance of entropy and information gain.

Lesson Outline

1. Introduction
2. Overview of the Data Set
3. Converting Categorical Variables
4. Splitting Data
5. Creating Splits
6. Decision Trees as Flows of Data
7. Splitting Data to Make Predictions
8. Overview of Data Set Entropy
9. Overview of Data Set Entropy
10. Information Gain
11. Information Gain
12. Finding the Best Split
13. Build the Whole Tree
14. Next Steps
15. Takeaways
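The later steps in this outline (converting categorical variables, finding the best split, and building the whole tree) can be sketched end to end in plain Python. This is a hedged, ID3-style sketch under several assumptions of our own: each column is split once at its median (real libraries such as scikit-learn evaluate many candidate thresholds), string categories are mapped to integer codes in sorted order, and recursion stops when a node is pure, when no split improves information gain, or when a maximum depth is reached. All function names, column names, and data rows are hypothetical.

```python
import math
import statistics
from collections import Counter


def encode_categorical(rows, column):
    """Replace string categories in `column` with integer codes (sorted order)."""
    categories = sorted({row[column] for row in rows})
    codes = {cat: i for i, cat in enumerate(categories)}
    return [{**row, column: codes[row[column]]} for row in rows]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def split_gain(rows, column, target):
    """Information gain from splitting `rows` at the median of `column`."""
    threshold = statistics.median([row[column] for row in rows])
    left = [row[target] for row in rows if row[column] <= threshold]
    right = [row[target] for row in rows if row[column] > threshold]
    parent = [row[target] for row in rows]
    weighted = sum(len(g) / len(rows) * entropy(g) for g in (left, right) if g)
    return entropy(parent) - weighted, threshold


def build_tree(rows, columns, target, depth=0, max_depth=3):
    """Recursively pick the column with the highest gain and split on it."""
    labels = [row[target] for row in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return {"label": majority}

    best = None  # (gain, column, threshold)
    for column in columns:
        gain, threshold = split_gain(rows, column, target)
        if best is None or gain > best[0]:
            best = (gain, column, threshold)
    gain, column, threshold = best
    if gain <= 0:  # no split helps, so make this node a leaf
        return {"label": majority}

    left = [r for r in rows if r[column] <= threshold]
    right = [r for r in rows if r[column] > threshold]
    return {
        "column": column,
        "threshold": threshold,
        "left": build_tree(left, columns, target, depth + 1, max_depth),
        "right": build_tree(right, columns, target, depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow splits from the root until reaching a leaf."""
    while "label" not in tree:
        branch = "left" if row[tree["column"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["label"]


# Hypothetical census-like rows (not real data).
rows = [
    {"age": 22, "workclass": "Private", "high_income": 0},
    {"age": 25, "workclass": "Self-emp", "high_income": 0},
    {"age": 47, "workclass": "Private", "high_income": 1},
    {"age": 52, "workclass": "Self-emp", "high_income": 1},
    {"age": 38, "workclass": "State-gov", "high_income": 0},
    {"age": 61, "workclass": "Private", "high_income": 1},
]
rows = encode_categorical(rows, "workclass")
tree = build_tree(rows, ["age", "workclass"], "high_income")
print(tree)
print([predict(tree, r) for r in rows])
```

On these six toy rows, splitting `age` at its median (42.5) yields two pure groups, so the sketch builds a single split on `age` with a leaf on each side. Median-splitting integer codes treats categories as if they were ordered, which is a simplification of this sketch rather than a property of decision trees in general.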