While exploring logistic regression, we briefly mentioned overfitting and the problems it can cause. In this lesson, we'll look at how to identify overfitting and what you can do to avoid it. We'll work with a dataset about cars that contains seven numerical features that could affect a car's fuel efficiency.

In this lesson, we will discuss two observable sources of error in a model that we can indirectly control: bias and variance. We'll also discuss overfitting at a deeper level, explore a good way to detect whether a model is overfitting, and look at an example of a model that shows signs of it. You will also be introduced to related terminology that you'll see in other literature as you read more about overfitting. For more information about the bias-variance tradeoff, see our blog post on the topic.

As you work through each concept, you’ll get to apply what you’ve learned from within your browser so that there's no need to use your own machine to do the exercises. The Python environment inside of this course includes answer checking so you can ensure that you've fully mastered each concept before learning the next concept.

Objectives

  • Learn how to detect overfitting in a model.
  • Understand the bias-variance tradeoff.

Lesson Outline

1. Introduction
2. Bias and Variance
3. Bias-variance tradeoff
4. Multivariate models
5. Cross validation
6. Plotting cross-validation error vs. cross-validation variance
7. Conclusion
8. Next steps
9. Takeaways
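
As a preview of the cross-validation steps in the outline, here is a minimal sketch of how cross-validation error and its variance across folds might be computed. It is an illustration under stated assumptions, not the lesson's own code: the data is synthetic (it is not the cars dataset), the model is a single-feature least-squares line, and the names `fit_line` and `k_fold_mse` are hypothetical helpers written for this example.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_mse(xs, ys, k=5, seed=1):
    """Return the held-out mean squared error for each of the k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes into the same fold.
    folds = [indices[i::k] for i in range(k)]
    errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        squared = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in fold]
        errors.append(statistics.fmean(squared))
    return errors


# Synthetic stand-in data (assumption): fuel efficiency falls with weight.
rng = random.Random(0)
weights = [rng.uniform(1500, 5000) for _ in range(100)]
mpg = [45 - 0.006 * w + rng.gauss(0, 2) for w in weights]

fold_errors = k_fold_mse(weights, mpg, k=5)
print("mean CV error:", statistics.fmean(fold_errors))
print("variance of CV error:", statistics.pvariance(fold_errors))
```

Comparing these two numbers across models of increasing complexity is one way to look for overfitting: a model whose fold-to-fold variance grows while its average error stops improving is a candidate for having fit noise rather than signal.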