Introduction to Decision Trees

In the previous courses of the data scientist path, we discussed machine learning, looked at several algorithms, and considered when to use each one. We also learned about two types of machine learning: supervised and unsupervised.

In this lesson, and throughout this course, we’ll learn about another supervised machine learning algorithm: the decision tree algorithm. It enables us to automatically construct a decision tree that tells us which outcome to predict in a given situation. One of the major advantages of decision trees is that they can pick up nonlinear interactions between variables in the data that linear regression can’t detect.

Throughout this decision trees course, we’ll be working with census data for individuals in the United States. In this lesson, you’ll learn about concepts such as entropy and information gain while building a model to predict how much money individuals make.
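To give a rough feel for the two concepts before the lesson covers them, here is a minimal sketch using the standard Shannon entropy formula (in bits). The function names `entropy` and `information_gain`, the toy labels, and the split are all hypothetical and chosen for illustration; they are not taken from the census data or the course exercises.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(labels, groups):
    """Entropy of `labels` minus the size-weighted entropy of the split groups."""
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted


# Hypothetical toy target: 1 = high income, 0 = low income
incomes = [0, 0, 1, 1, 0, 1, 0, 1]

# Hypothetical split of those same eight rows on some feature
left = [0, 0, 0, 1]
right = [1, 1, 0, 1]

print(entropy(incomes))
print(information_gain(incomes, [left, right]))
```

An evenly mixed target like this one has the highest possible entropy for two classes. A split is useful to the extent that it leaves each group less mixed than the original, and information gain measures that reduction.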

As you work through each concept, you’ll apply what you’ve learned directly in your browser, so there’s no need to set up your own machine for the exercises. The course’s Python environment includes answer checking, so you can make sure you’ve mastered each concept before moving on to the next.

Objectives

  • Learn the basics of decision trees
  • Learn the importance of entropy and information gain

Lesson Outline

  1. Introduction
  2. Overview of the Data Set
  3. Converting Categorical Variables
  4. Splitting Data
  5. Creating Splits
  6. Decision Trees as Flows of Data
  7. Splitting Data to Make Predictions
  8. Overview of Data Set Entropy
  9. Overview of Data Set Entropy
  10. Information Gain
  11. Information Gain
  12. Finding the Best Split
  13. Build the Whole Tree
  14. Next Steps
  15. Takeaways
