**MISSION 40**

# K-Means Clustering

In the last module, we discussed unsupervised machine learning and briefly introduced K-Means clustering, an algorithm that groups data points together so we can identify patterns in the resulting groups. That module covered only the tip of the iceberg. In this mission, we'll dig deeper into K-Means clustering by exploring how NBA players differ from one another.

K-Means clustering is a popular centroid-based clustering algorithm, and it is the one we will use. The “K” in K-Means refers to the number of clusters we want to segment our data into. The key part of K-Means (and of several other clustering techniques) is that we have to specify what “k” is ourselves. There are advantages and disadvantages to this, but one advantage is that we can pick the “k” that makes the most sense for our use case.
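To make the idea of "centroid-based" concrete, here is a minimal sketch of the loop this mission walks through: assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat. It is an illustration only, not scikit-learn's implementation. The function names (`assign_clusters`, `update_centroids`, `k_means`) are hypothetical, the player numbers are made up, and the starting centroids are picked by hand so the result is reproducible, whereas in practice they are usually chosen at random.

```python
import math


def assign_clusters(points, centroids):
    """Step 1: label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Step 2: move each centroid to the mean of the points assigned to it."""
    updated = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            updated.append(old)  # an empty cluster keeps its previous centroid
            continue
        updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return updated


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate Step 2 and Step 1 until the labels stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign_clusters(points, centroids)
    for _ in range(max_iterations):
        centroids = update_centroids(points, labels, centroids)
        new_labels = assign_clusters(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Made-up (points per game, assist/turnover ratio) pairs, for illustration only.
players = [(8.0, 2.0), (9.0, 2.2), (10.0, 1.8), (20.0, 3.0), (21.0, 3.2), (22.0, 2.8)]

# k = 2 here because we pass two starting centroids (chosen by hand, an assumption).
labels, centroids = k_means(players, initial_centroids=[players[0], players[3]])
print(labels)
print(centroids)
```

With these starting centroids, the three low-scoring players end up in one cluster and the three high-scoring players in the other. Passing a different number of starting centroids changes "k", which is exactly the choice the algorithm leaves to us.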

Knowing how to implement the algorithm on a dataset is not enough: understanding what happens behind the scenes is critical, so that you know what assumptions are being made and can debug quickly. In addition to learning about K-Means clustering, you will also learn about the mathematics at work when you run K-Means through a machine learning library such as scikit-learn.

As you work through each concept, you’ll get to apply what you’ve learned from within your browser, so there's no need to use your own machine to do the exercises. The Python environment in this course includes answer checking so you can make sure you've mastered each concept before moving on to the next one.

#### Objectives

#### Mission Outline

1. Clustering NBA Players

2. Point Guards

3. Points Per Game

4. Assist Turnover Ratio

5. Visualizing the Point Guards

6. Clustering Players

7. The Algorithm

8. Visualize Centroids

9. Setup (continued)

10. Step 1 (Euclidean Distance)

11. Step 1 (Continued)

12. Visualizing Clusters

13. Step 2

14. Repeat Step 1

15. Repeat Step 2 and Step 1

16. Challenges of K-Means

17. Conclusion

18. Takeaways