MISSION 140

Multivariate K-Nearest Neighbors

In the lesson on evaluating model performance, we used a k-nearest neighbors model with just one feature to predict the optimal rental price. Choosing a different feature may improve a model's accuracy, but a single feature doesn't reflect the true power of the k-nearest neighbors algorithm. To create more accurate models, you'll eventually need to use multiple features.

In this lesson, we'll learn how to combine multiple attributes in our model to predict Airbnb rental prices more accurately. We'll cover how to handle missing values and what to consider when doing so. We'll also cover normalizing columns and why it's useful for the modeling process. In addition, you'll build on your understanding of Euclidean distance in the univariate case (one feature) and extend it to the multivariate case (multiple features).
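
As a rough illustration of the ideas named above, here is a minimal sketch of normalizing columns and computing a multivariate Euclidean distance in plain Python. The column names (`accommodates`, `bathrooms`), the sample values, and the helper functions `normalize` and `euclidean_distance` are hypothetical and chosen only for this example; the lesson itself may use different data and tools.

```python
import math

def normalize(column):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    mean = sum(column) / len(column)
    # Population standard deviation; assumes the column is not constant.
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
    return [(x - mean) / std for x in column]

def euclidean_distance(row_a, row_b):
    """Distance between two rows that hold the same features in the same order."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(row_a, row_b)))

# Hypothetical listings described by two features.
accommodates = [2, 4, 6]
bathrooms = [1.0, 1.0, 2.0]

# Normalize each column so no feature dominates because of its scale.
norm_accommodates = normalize(accommodates)
norm_bathrooms = normalize(bathrooms)

# Rebuild rows from the normalized columns and compare two listings.
rows = list(zip(norm_accommodates, norm_bathrooms))
distance = euclidean_distance(rows[0], rows[1])
```

With one feature per row, `euclidean_distance` reduces to the absolute difference between two values, which is the univariate case.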

Near the end of this mission, we'll introduce you to `scikit-learn`, one of the most popular machine learning libraries for Python. It provides implementations of many major machine learning algorithms behind a simple, unified workflow, which lets data scientists be very productive when training and testing different models on a new dataset.
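
To give a sense of that workflow, the following sketch fits a k-nearest neighbors regressor on two features, makes predictions, and computes the mean squared error (MSE). The feature values and prices are made up for illustration and are assumed to be already normalized; the choice of `n_neighbors=2` is likewise arbitrary.

```python
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical, already-normalized features: [accommodates, bathrooms].
train_features = [[-1.0, -0.5], [0.0, -0.5], [1.0, 1.0], [0.5, 0.0]]
train_prices = [80.0, 100.0, 200.0, 150.0]

test_features = [[-0.9, -0.5], [0.9, 0.9]]
test_prices = [90.0, 180.0]

# Instantiate the model, fit it on the training data, then predict.
knn = KNeighborsRegressor(n_neighbors=2, algorithm="brute")
knn.fit(train_features, train_prices)
predictions = knn.predict(test_features)

# Compare predictions against the known prices.
mse = mean_squared_error(test_prices, predictions)
```

The same instantiate, fit, predict pattern applies to other scikit-learn models, so swapping in a different algorithm usually means changing only the line that creates the model.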

As with all our courses, you'll apply what you're learning in our in-browser app, which also checks your answers so you can be sure you've fully mastered each concept.

Objectives

  • Learn how to use multiple variables in machine learning models.
  • Learn how to prepare columns by normalizing and handling missing values.

Mission Outline

1. Recap
2. Removing features
3. Handling missing values
4. Normalize columns
5. Euclidean distance for multivariate case
6. Introduction to scikit-learn
7. Fitting a model and making predictions
8. Calculating MSE using scikit-learn
9. Using more features
10. Using all features
11. Next steps
12. Takeaways

Course Info:

Beginner

The median completion time for this course is 7 hours.

This course requires a premium subscription. It includes five missions and one guided project, and it is the 17th course in the Data Scientist in Python path.
