Multivariate K-Nearest Neighbors
In the evaluating model performance lesson, we built a k-nearest neighbors model that relied on just one feature to predict optimal rental price. While choosing a different feature may improve a model's accuracy, a single feature doesn't reflect the true power of the k-nearest neighbors algorithm. Eventually, you'll find that building more accurate models requires using multiple features.
In this lesson, we’ll learn how to combine multiple attributes in our model to more accurately predict Airbnb rental prices. You’ll learn how to handle missing values and the important considerations that come with them. We’ll also cover normalizing columns and why normalization is useful for the modeling process. Finally, you’ll build on your understanding of Euclidean distance in the univariate case (one feature) and extend it to the multivariate case (multiple features).
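The extension from one feature to many is small: instead of an absolute difference, you square the difference along each feature, sum, and take the square root. Here is a minimal sketch using hypothetical `(accommodates, bathrooms)` values rather than the actual Airbnb dataset:

```python
import numpy as np

# Univariate case: Euclidean distance reduces to the absolute
# difference between the two feature values.
def euclidean_univariate(a, b):
    return abs(a - b)

# Multivariate case: square each per-feature difference, sum,
# then take the square root.
def euclidean_multivariate(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))

# Two hypothetical listings described by (accommodates, bathrooms).
listing_a = [3, 1.0]
listing_b = [5, 2.0]

print(euclidean_univariate(3, 5))                    # distance on one feature
print(euclidean_multivariate(listing_a, listing_b))  # distance on both features
```

Note that with only one feature, `euclidean_multivariate` gives the same answer as `euclidean_univariate`, which is why the univariate formula is just a special case.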
Near the end of this lesson, we’ll introduce you to the most popular Python machine learning library, `scikit-learn`. This library implements all the major machine learning algorithms behind a simple, unified workflow, which allows data scientists to be incredibly productive when training and testing different models on a new dataset.
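That unified workflow boils down to three steps: instantiate a model, `fit` it to training data, and `predict` on new data. A minimal sketch with a k-nearest neighbors regressor and hypothetical feature/price values (not the actual Airbnb data):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Hypothetical training data: (accommodates, bathrooms) -> price.
X_train = np.array([[2, 1.0], [4, 1.5], [3, 1.0], [6, 2.0]])
y_train = np.array([80.0, 150.0, 100.0, 210.0])

# The scikit-learn workflow: instantiate, fit, predict.
model = KNeighborsRegressor(n_neighbors=2)
model.fit(X_train, y_train)

X_test = np.array([[3, 1.0], [5, 2.0]])
predictions = model.predict(X_test)

# Evaluate with mean squared error against (hypothetical) true prices.
y_test = np.array([95.0, 180.0])
mse = mean_squared_error(y_test, predictions)
print(predictions, mse)
```

The same instantiate/fit/predict pattern applies to nearly every estimator in the library, which is what makes swapping models in and out so quick.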
As with all our courses, you will be asked to apply what you’re learning in our in-browser app, which will also check your answers so you can ensure you’ve fully mastered each concept.
- Learn how to use multiple variables in machine learning models.
- Learn how to prepare columns by normalizing and handling missing values.
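The column-preparation steps above can be sketched briefly. This example uses a tiny hypothetical DataFrame standing in for the Airbnb listings data, drops rows with missing values, and applies min-max normalization so no single feature dominates the distance calculation:

```python
import pandas as pd

# A tiny hypothetical frame standing in for the Airbnb listings data.
df = pd.DataFrame({
    "accommodates": [2, 4, 3, 6],
    "bathrooms": [1.0, None, 1.5, 2.0],
    "price": [80.0, 150.0, 100.0, 210.0],
})

# Handling missing values: drop any row with a missing feature.
clean = df.dropna()

# Normalizing columns: rescale each feature to the 0-1 range
# (min-max normalization).
features = clean[["accommodates", "bathrooms"]]
normalized = (features - features.min()) / (features.max() - features.min())

print(normalized)
```

Dropping rows is only one strategy; the lesson also covers when removing an entire feature column is the better choice.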
- Removing features
- Handling missing values
- Normalizing columns
- Euclidean distance for the multivariate case
- Introduction to scikit-learn
- Fitting a model and making predictions
- Calculating MSE using scikit-learn
- Using more features
- Using all features
- Next steps