In this module, you will use the k-means clustering algorithm to get familiar with the basics of clustering. K-means uses Euclidean distance to group similar data points into clusters. You will use the KMeans class from scikit-learn to cluster U.S. senators based on how they voted.
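To make the idea of Euclidean distance concrete, here is a minimal sketch. The vote vectors are made up for illustration (1 for a yes vote, 0 for a no vote) and are not taken from the course dataset; the helper name `euclidean_distance` is also just an assumption for this example.

```python
import math

# Hypothetical vote records, one number per bill: 1 = yes, 0 = no.
# These values are invented for illustration only.
senator_a = [1, 0, 1, 1]
senator_b = [1, 0, 0, 1]
senator_c = [0, 1, 0, 0]

def euclidean_distance(u, v):
    """Square root of the summed squared differences between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

dist_ab = euclidean_distance(senator_a, senator_b)
dist_ac = euclidean_distance(senator_a, senator_c)

print(dist_ab)  # 1.0 (the two records differ on one bill)
print(dist_ac)  # 2.0 (the two records differ on all four bills)
```

Senators with similar voting records end up a small distance apart, which is what lets a clustering algorithm group them together.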
In past courses, we've looked at regression and classification. These are both types of supervised machine learning. In supervised learning, you train an algorithm to predict an unknown variable from known variables. Another major type of machine learning is called unsupervised learning. In unsupervised learning, we aren't trying to predict anything. Instead, we're finding patterns in data.
One of the main unsupervised learning techniques is clustering. We use clustering to explore a dataset and understand the connections between its rows and columns. It is a key way to explore unfamiliar data, and a widely used machine learning technique.
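As a rough preview of what clustering looks like in code, the sketch below runs scikit-learn's `KMeans` on a tiny, invented vote matrix. The data, the choice of two clusters, and the `random_state` value are all assumptions made for this example; the real dataset is introduced later in the module.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented vote matrix: each row is a senator, each column a bill
# (1 = yes, 0 = no). This is NOT the course dataset.
votes = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
])

# Ask for two clusters; random_state is fixed only to make the run repeatable.
kmeans_model = KMeans(n_clusters=2, n_init=10, random_state=1)
labels = kmeans_model.fit_predict(votes)

# One cluster label per senator. The label numbers themselves are arbitrary;
# what matters is which rows share a label.
print(labels)
```

Notice that we never told the model which group each senator belongs to. It finds the grouping from the voting patterns alone, which is what makes this unsupervised learning.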
As you work through each concept, you'll apply what you've learned right in your browser; there's no need to use your own machine for the exercises. The Python environment in this course includes answer checking, so you can make sure you've mastered each concept before moving on to the next.
1. Clustering overview
2. The dataset
3. Exploring the data
4. Distance between Senators
5. Initial clustering
6. Initial clustering
7. Exploring the clusters
8. Exploring Senators in the wrong cluster
9. Plotting out the clusters
10. Finding the most extreme
11. Next steps