MISSION 484

Multivariate K-Nearest Neighbors

At the beginning of the course, we learned about the k-nearest neighbors algorithm and used a single feature to predict the rent price. While learning the caret library, we actually used two features to predict tidy_price. We didn't go into the specifics of using two features, but in this mission we'll explore how adding new features to the algorithm changes the prices it predicts for new listings.

Deciding on a proper rental price for a listing is a complex task that requires much more information than how many people the listing can accommodate. Rental listings have many qualities, and similar listings are likely to share them, so incorporating more of this information can help when creating a prediction.

By adding more features, we'll work to keep the distances between our listing and similar listings small while increasing the distances to listings that are not similar to ours. Together, these two changes help improve prediction quality.
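To make this idea concrete, here is a minimal sketch in base R. It assumes two made-up listings and two hypothetical feature names, accommodates and bedrooms; the values and the helper function euclidean_distance() are illustrations only and do not come from the course dataset or from caret.

```r
# Hypothetical listings: the column names and values are made up for illustration.
listings <- data.frame(
  id = c("A", "B"),
  accommodates = c(3, 3),
  bedrooms = c(1, 4)
)

# The listing we want to price (also hypothetical).
our_listing <- c(accommodates = 3, bedrooms = 1)

# Euclidean distance from every row of `data` to `target`,
# computed over the columns named in `features` only.
euclidean_distance <- function(data, target, features) {
  values <- as.matrix(data[, features, drop = FALSE])
  # Subtract the target's value from each feature column.
  diffs <- sweep(values, 2, target[features])
  sqrt(rowSums(diffs^2))
}

# With one feature, both listings look identical to ours.
one_feature <- euclidean_distance(listings, our_listing, "accommodates")

# With two features, listing A stays close while listing B moves away.
two_features <- euclidean_distance(
  listings, our_listing, c("accommodates", "bedrooms")
)

print(one_feature)
print(two_features)
```

With only accommodates, listings A and B are both at distance 0 from ours, so the algorithm cannot tell them apart. Once bedrooms is included, A remains at distance 0 while B moves to distance 3, which is the separation between similar and dissimilar listings described above.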

Objectives

  • Learn to add more features to our model with caret
  • Learn about the role of Euclidean distance in machine learning

Mission Outline

  1. Generalizing the distance formula
  2. More data cleaning
  3. Handling missing values
  4. Normalization
  5. Fitting multiple models
  6. Comparing model performance
  7. Long format data
  8. group_by() and summarize()
  9. The power of piping
  10. Next steps
  11. Takeaways

Course Info:

Intermediate

The median completion time for this course is 10 hours.

This course requires a premium subscription. It includes five missions and one guided project, and it is part of the Data Analyst in R path.
