**MISSION 30**

# An Introduction to K-Nearest Neighbors

This course explores data science topics that are not covered in the Data Scientist path but are important to know: Naive Bayes and k-nearest neighbors.

In this lesson, we'll learn about a specific machine learning technique called k-nearest neighbors. Machine learning is the process of discovering patterns in existing data in order to make predictions. We'll use k-nearest neighbors to identify the NBA player who is most similar to LeBron James.

Rather than starting with scikit-learn's implementation of k-nearest neighbors, you'll first implement the algorithm yourself so you can see what happens behind the scenes and understand it more deeply. Knowing what happens behind the scenes is critical for understanding why a function produces a particular output, especially in machine learning.

In addition to identifying the NBA player who is most like LeBron James, we'll walk you through the entire machine learning workflow, from selecting features to testing the model, so you can get an idea of what it's like to work through the kind of problem a data scientist solves.

As you work through each concept, you’ll apply what you’ve learned directly in your browser, so there's no need to use your own machine for the exercises. The Python environment in this course includes answer checking, so you can make sure you've mastered each concept before moving on to the next.

#### Objectives

#### Lesson Outline

1. Introduction to the Data

2. Understanding the kNN Algorithm

3. Finding Similar Rows With Euclidean Distance

4. Normalizing Columns

5. Finding the Nearest Neighbor

6. Generating Training and Testing Sets

7. Using sklearn

8. Computing Error
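
As a rough preview of outline items 3 to 5, the sketch below normalizes two columns, measures Euclidean distance, and picks the nearest neighbor of one row. It is a minimal illustration only, not the lesson's own code: the rows are made-up toy values rather than real NBA statistics, and the names `players`, `normalize`, `euclidean_distance`, and `nearest_neighbor` are hypothetical.

```python
import math

# Toy, made-up rows: (name, points, assists). Not real NBA statistics.
players = [
    ("target", 25.0, 7.0),
    ("player_a", 24.0, 6.0),
    ("player_b", 10.0, 2.0),
    ("player_c", 27.0, 1.0),
]


def normalize(rows):
    """Rescale each numeric column to mean 0 and standard deviation 1."""
    columns = list(zip(*[row[1:] for row in rows]))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
        for col, m in zip(columns, means)
    ]
    return [
        (row[0],) + tuple((v - m) / s for v, m, s in zip(row[1:], means, stds))
        for row in rows
    ]


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor(rows, target_name):
    """Return the name of the row closest to the target row."""
    target = next(r for r in rows if r[0] == target_name)
    others = [r for r in rows if r[0] != target_name]
    closest_row = min(others, key=lambda r: euclidean_distance(r[1:], target[1:]))
    return closest_row[0]


closest = nearest_neighbor(normalize(players), "target")
print(closest)
```

Normalizing first keeps a column with a larger numeric range (here, points) from dominating the distance simply because of its scale.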