MISSION 187

Model Selection and Tuning

In the lesson on feature selection, we worked to improve our Kaggle predictions by creating and selecting the features used to train our machine learning model.

Kaggle is a site where people create algorithms and compete against machine learning practitioners around the world. Your algorithm wins the competition if it's the most accurate on a particular data set. Kaggle and this Kaggle Fundamentals course give you a fun way to practice your machine learning skills.

In this lesson, we're going to focus on optimizing the model itself to boost the accuracy of our predictions. To do this, we'll look at a process known as model selection. Model selection is important because it helps you choose the algorithm that gives the best predictions for your data.

In this lesson and the lessons that follow, we'll continue working with RMS Titanic passenger data to predict which passengers survived the Titanic disaster.

As you work through each concept, you’ll get to apply what you’ve learned from within your browser, so there's no need to use your own machine to do the exercises. The Python environment inside this course includes answer checking, so you can make sure you've fully mastered each concept before moving on to the next.

Objectives

  • Learn how the k-nearest neighbors and random forest algorithms work.
  • Learn about hyperparameters and how to select the hyperparameters that give the best prediction.
  • Learn how to compare different algorithms to improve the accuracy of your predictions.
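
To give a rough feel for what selecting a hyperparameter involves, here is a minimal sketch in plain Python. It is not the lesson's own code: the data points are made up for illustration (they are not the Titanic data), and the helper names `knn_predict` and `leave_one_out_accuracy` are hypothetical. The sketch tries several values of k for a small k-nearest neighbors classifier and keeps the one with the best leave-one-out accuracy.

```python
from collections import Counter

# Made-up toy data for illustration only (NOT the Titanic data set):
# two-dimensional points with a binary label.
X = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 3),
     (6, 6), (6, 7), (7, 6), (7, 7), (4, 4)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]


def knn_predict(train_X, train_y, point, k):
    """Return the majority label among the k training points closest to `point`."""
    by_distance = sorted(
        zip(train_X, train_y),
        # Squared Euclidean distance is enough for ranking neighbors.
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], point)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k):
    """Hold out each point in turn, predict it from the rest, and average the hits."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


# A tiny "grid" of candidate hyperparameter values (odd k avoids tied votes
# when there are only two labels).
candidate_ks = [1, 3, 5]
scores = {k: leave_one_out_accuracy(X, y, k) for k in candidate_ks}
best_k = max(scores, key=scores.get)

print(scores)
print("best k:", best_k)
```

The same loop extends to comparing different algorithms: score each candidate with the same validation scheme and keep the one that performs best on held-out data.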

Mission Outline

1. Introducing Model Selection
2. Training a Baseline Model
3. Training a Model using K-Nearest Neighbors
4. Exploring Different K Values
5. Automating Hyperparameter Optimization with Grid Search
6. Submitting K-Nearest Neighbors Predictions to Kaggle
7. Introducing Random Forests
8. Tuning our Random Forest Model with Grid Search
9. Submitting Random Forest Predictions to Kaggle
10. Next Steps
11. Takeaways


Course Info:

Intermediate

The median completion time for this course is 5.9 hours.

This course requires a premium subscription and includes three missions and one guided project. It is the 28th course in the Data Scientist in Python path.
