MISSION 239

Processing and Transforming Features

To understand how linear regression works for machine learning, we've so far used only features from the training data set that contained no missing values and were already in a convenient numeric representation. In this lesson, we'll explore how to transform some of the remaining features so we can use them in our model. Broadly, the process of transforming existing features and creating new ones is known as feature engineering. Feature engineering is a bit of an art, and knowledge of the specific domain (in this case, real estate) can help you create better features.

In this lesson, we'll explore what dummy coding is, when you should use it, and why it's a critical technique for dealing with categorical data. We'll also cover what it means to impute values in a dataset and discuss when imputation makes sense.
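To give a feel for dummy coding before the lesson starts, here is a minimal sketch in plain Python. It turns one categorical column into a set of 0/1 columns, one per category. The `dummy_code` helper, the `house_style` column, and its values are hypothetical and made up for illustration; the lesson itself may use a library routine such as pandas `get_dummies` instead.

```python
def dummy_code(values):
    """Return one 0/1 column per distinct category found in `values`."""
    categories = sorted(set(values))
    return {
        f"is_{category}": [1 if v == category else 0 for v in values]
        for category in categories
    }


# Hypothetical categorical column, invented for this example.
house_style = ["1Story", "2Story", "1Story", "SLvl"]

dummies = dummy_code(house_style)
for name, column in dummies.items():
    print(name, column)
```

Each row has a 1 in exactly one of the new columns, so the text labels are replaced by numbers a linear regression model can use without implying any ordering among the categories.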

At first, we'll focus only on columns that contain no missing values but still aren't in the proper format to use in a linear regression model. In the latter half of this lesson, we'll explore some ways to deal with missing values.
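As a preview of what dealing with missing values can look like, the sketch below fills the gaps in a numeric column with the mean of the observed values. This is only one common imputation strategy, chosen here as an assumption; the `impute_mean` helper and the `lot_frontage` values are hypothetical, and missing entries are represented as `None`.

```python
def impute_mean(values):
    """Replace None entries with the mean of the non-missing entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to average over, so return the column unchanged.
        return list(values)
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical numeric column with two missing entries.
lot_frontage = [60.0, None, 80.0, None, 70.0]

filled = impute_mean(lot_frontage)
print(filled)
```

Mean imputation keeps every row available for training, at the cost of shrinking the column's variance, which is one reason it only makes sense when a modest share of the values is missing.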

As you work through each concept, you’ll apply what you’ve learned directly in your browser, so there's no need to use your own machine for the exercises. The Python environment in this course includes answer checking, so you can make sure you've mastered each concept before moving on to the next.

Objectives

  • Learn to transform the training set for a machine learning model.
  • Learn the basics of feature engineering.
  • Learn how to deal with missing data.

Mission Outline

1. Introduction
2. Categorical Features
3. Dummy Coding
4. Transforming Improper Numerical Features
5. Missing Values
6. Imputing Missing Values
7. Next Steps
8. Takeaways


Course Info:

Intermediate

The median completion time for this course is 7.23 hours.

This course requires a premium subscription and includes five missions and one guided project. It is the 11th course in the Data Analyst in R path.


Take a Look Inside