In the evaluating model performance lesson, we learned about train/test validation, a simple technique for estimating a machine learning model's accuracy on data it wasn't trained on.

In this lesson, we'll focus on more robust techniques such as holdout validation and k-fold cross validation. In holdout validation, we usually use a 50/50 split instead of the 75/25 split from train/test validation. Because both halves contain the same number of observations, this removes set size as a potential source of variation in model performance.
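To make that concrete, here's a minimal sketch of holdout validation on a hypothetical two-column DataFrame (the "feature" and "target" column names and the linear regression model are placeholders, not the lesson's actual data): shuffle the rows, split them 50/50, train on each half while testing on the other, then average the two errors.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical dataset with one numeric feature and a numeric target.
df = pd.DataFrame({
    "feature": np.random.rand(200),
    "target": np.random.rand(200),
})

# Shuffle the rows, then split them into two equally sized halves.
shuffled = df.sample(frac=1, random_state=1)
half = len(shuffled) // 2
first_half = shuffled.iloc[:half]
second_half = shuffled.iloc[half:]

def train_and_test(train, test):
    # Fit on the training half, return RMSE on the test half.
    model = LinearRegression()
    model.fit(train[["feature"]], train["target"])
    predictions = model.predict(test[["feature"]])
    return mean_squared_error(test["target"], predictions) ** 0.5

# Train on one half and test on the other, then swap roles and average.
rmse_one = train_and_test(first_half, second_half)
rmse_two = train_and_test(second_half, first_half)
print(np.mean([rmse_one, rmse_two]))
```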

In k-fold cross validation, a larger proportion of the data is used for training while we rotate through different subsets of the data as the test set, avoiding the pitfalls of a single train/test split.
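Conceptually, the rotation looks like the following sketch, which assigns each row of a hypothetical DataFrame to one of k folds and holds out a different fold for testing on each iteration (the column names and the linear regression model are assumptions for illustration only):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical dataset, as in the holdout sketch above.
df = pd.DataFrame({
    "feature": np.random.rand(200),
    "target": np.random.rand(200),
})

# Shuffle the rows and assign each one to one of k folds.
k = 5
df = df.sample(frac=1, random_state=1).reset_index(drop=True)
df["fold"] = np.arange(len(df)) % k

rmses = []
for fold in range(k):
    # Hold out one fold for testing; train on the remaining k - 1 folds.
    train = df[df["fold"] != fold]
    test = df[df["fold"] == fold]
    model = LinearRegression()
    model.fit(train[["feature"]], train["target"])
    predictions = model.predict(test[["feature"]])
    rmses.append(mean_squared_error(test["target"], predictions) ** 0.5)

# The average error across all k folds estimates overall model performance.
print(np.mean(rmses))
```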

We can use scikit-learn to quickly perform k-fold cross validation and select an appropriate number of folds when assessing the quality of a model.
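As a rough preview of what scikit-learn offers, the sketch below uses KFold and cross_val_score to compare average error across a few values of k. The synthetic feature matrix and target here are placeholders, not the lesson's dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic feature matrix X and target vector y for illustration.
rng = np.random.default_rng(1)
X = rng.random((200, 3))
y = rng.random(200)

model = LinearRegression()

# Compare the average and spread of RMSE across several choices of k.
for k in [3, 5, 10]:
    kf = KFold(n_splits=k, shuffle=True, random_state=1)
    mses = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=kf)
    rmses = np.sqrt(np.abs(mses))
    print(k, rmses.mean(), rmses.std())
```

The standard deviation of the fold errors hints at the bias-variance tradeoff covered later in the lesson: larger k values tend to give each model more training data but fewer test observations per fold.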

As you work through each concept, you’ll get to apply what you’ve learned from within your browser so that there's no need to use your own machine to do the exercises. The Python environment inside of this course includes answer checking so you can ensure that you've fully mastered each concept before learning the next concept.

Objectives

  • Learn how cross-validation lets us more accurately understand model performance.
  • Learn the difference between holdout and k-fold cross validation.
  • Learn how to perform cross-validation in scikit-learn.

Lesson Outline

1. Introduction
2. Holdout Validation
3. K-Fold Cross Validation
4. First iteration
5. Function for training models
6. Performing K-Fold Cross Validation Using Scikit-Learn
7. Exploring Different K Values
8. Bias-Variance Tradeoff
9. Next steps
10. Takeaways