Feature Preparation, Selection and Engineering

In the previous lesson, we made our first submission to Kaggle, a site where machine learning practitioners around the world compete to build the most accurate algorithm for a particular data set. Together, Kaggle and this Kaggle Fundamentals course give you a fun way to practice your machine learning skills.

In this lesson, we'll focus on the features used in the model to boost the accuracy of our predictions. We'll start with feature selection, which matters because it helps exclude features that are poor predictors as well as features that are closely related to each other.

In this lesson and the lessons that follow, we'll continue working with RMS Titanic passenger data to predict which passengers survived the disaster.

As you work through each concept, you'll apply what you've learned directly in your browser, so there's no need to set up your own machine for the exercises. The Python environment inside this course includes answer checking, so you can make sure you've mastered each concept before moving on to the next.


  • Learn how to determine which features in your model are the most relevant to your predictions.
  • Learn ways to reduce the number of features used to train your model and avoid overfitting.
  • Learn techniques to create new features to improve the accuracy of your model.
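
As a rough illustration of the last point, binning turns a continuous column into a small set of categories that a model can treat as a new feature. The sketch below uses only the Python standard library; the bin edges, labels, and the bin_age helper are hypothetical choices made for this example and are not taken from the course, which may well use a library routine instead.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels, chosen only for illustration.
# Each label covers the range from its edge up to (not including) the next edge;
# the last label covers everything from 60 upward.
AGE_EDGES = [0, 5, 12, 18, 35, 60]
AGE_LABELS = ["Infant", "Child", "Teenager", "Young Adult", "Adult", "Senior"]


def bin_age(age):
    """Map a numeric age to a category label; None becomes "Missing"."""
    if age is None:
        return "Missing"
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts how many edges are <= age, so subtracting 1
    # gives the index of the bin whose lower edge the age has reached.
    index = bisect_right(AGE_EDGES, age) - 1
    return AGE_LABELS[index]


# A few made-up ages, including a missing value.
ages = [22, 38, None, 4, 71]
age_categories = [bin_age(a) for a in ages]
print(age_categories)
```

Keeping missing values in their own "Missing" category, rather than dropping those rows, is one possible design choice; it lets the model see whether the absence of a value carries any signal.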

Mission Outline

1. Introduction
2. Preparing More Features
3. Determining the Most Relevant Features
4. Training a Model Using Relevant Features
5. Submitting our Improved Model to Kaggle
6. Engineering a New Feature Using Binning
7. Engineering Features From Text Columns
8. Finding Correlated Features
9. Final Feature Selection Using RFECV
10. Training a Model Using our Optimized Columns
11. Submitting our Model to Kaggle
12. Next Steps
13. Takeaways


Course Info:


The median completion time for this course is 5.9 hours.

This course requires a premium subscription and includes three missions and one guided project. It is the 28th course in the Data Scientist in Python path.

