In the machine learning workflow, once we've selected the model we want to use (such as a linear regression model), the next important step is selecting appropriate features for that model. In this lesson, we'll explore three criteria for selecting features: how strongly each feature correlates with the target column, how strongly features correlate with one another, and how much each feature varies.

You'll learn how concepts such as correlation help you identify features (also called attributes) that are good predictors of the value in the target column. You'll also learn how to rescale all the features to the same range to help ensure that some features aren't weighted more heavily than others.
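As a rough preview of these ideas, here is a minimal sketch in plain Python. The column names and values are made up for illustration and are not taken from the lesson's dataset. The helper functions (`pearson`, `min_max_rescale`, `variance`) are hypothetical names written for this sketch, and min-max scaling is only one possible way to rescale features.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def min_max_rescale(xs):
    """Rescale values to the range [0, 1] (assumes they are not all equal)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def variance(xs):
    """Population variance of a sequence."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


# Made-up toy data: two feature columns and a target column (all hypothetical).
houses = {
    "living_area": [850, 1200, 1500, 1800, 2400],
    "garage_area": [200, 300, 380, 460, 600],
    "sale_price": [105000, 150000, 181000, 215000, 290000],
}
target = "sale_price"
features = [name for name in houses if name != target]

# 1. Correlation between each feature and the target column.
target_corr = {name: pearson(houses[name], houses[target]) for name in features}

# 2. Correlation between two features; a high value suggests they carry
#    overlapping information.
feature_pair_corr = pearson(houses["living_area"], houses["garage_area"])

# 3. Rescale the features to [0, 1] so their variances are comparable,
#    then compute the variance of each rescaled feature.
rescaled = {name: min_max_rescale(houses[name]) for name in features}
rescaled_variance = {name: variance(vals) for name, vals in rescaled.items()}

print(target_corr)
print(feature_pair_corr)
print(rescaled_variance)
```

In this sketch, features with a correlation close to 1 or -1 against the target are candidate predictors, a strongly correlated pair of features is a candidate for dropping one of the two, and a feature whose rescaled variance is close to zero carries little information.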

While exploring how to select features for a well-performing model, we'll be working with a dataset that describes characteristics of houses sold between 2006 and 2010 in the city of Ames (located in the American state of Iowa).

As you work through each concept, you'll apply what you've learned directly in your browser, so there's no need to set up anything on your own machine to do the exercises. The Python environment in this course includes answer checking, so you can make sure you've fully mastered each concept before moving on to the next one.

Objectives

  • Learn to choose appropriate features for your model.
  • Learn to generate a heatmap for your data.

Lesson Outline

1. Missing Values
2. Correlating Feature Columns With Target Column
3. Correlation Matrix Heatmap
4. Train And Test Model
5. Removing Low Variance Features
6. Final Model
7. Next Steps
8. Takeaways