In the previous courses of this path, we learned how to perform basic data analysis and data visualization. We also learned about some fundamental statistical metrics, such as the mean and the median, and we plotted histograms, bar graphs, and line plots.

In the second lesson of our Statistics Fundamentals course, we'll focus on the details of gathering data for analysis. As usual, we'll work with a real-world dataset. Before we dive into the technical details and start working with the data, we'll begin by getting a sense of what statistics is.

You will start this lesson by learning how to differentiate between a population and a sample, one of the foundational concepts of statistics. You will also learn how to use cluster sampling, stratified sampling, and proportional stratified sampling to select data from a population when conducting statistical tests for your data science work.
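To make the population/sample distinction and the two flavors of stratified sampling concrete, here is a minimal sketch in plain Python (standard library only). It assumes a hypothetical toy population of 100 `(group, value)` records split 50/30/20 across three groups; the function `stratified_sample` and its parameters are illustrative names, not the course's code or dataset (the lesson outline mentions `dplyr` for its own strata work).

```python
import random
from collections import Counter, defaultdict
from statistics import mean


def stratified_sample(population, stratum_of, n, proportional=True, seed=None):
    """Return a stratified random sample of roughly n items (hypothetical helper).

    stratum_of maps an item to its stratum label. With proportional=True,
    each stratum contributes in proportion to its share of the population;
    with proportional=False, every stratum contributes n // (number of strata).
    """
    rng = random.Random(seed)

    # Split the population into strata: groups of items sharing a label.
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)

    sample = []
    for members in strata.values():
        if proportional:
            # Rounding means the total can differ slightly from n.
            k = round(n * len(members) / len(population))
        else:
            k = n // len(strata)
        # Simple random sampling without replacement inside each stratum.
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


# Hypothetical toy population: (group, value) pairs split 50/30/20.
population = (
    [("A", i) for i in range(50)]
    + [("B", 100 + i) for i in range(30)]
    + [("C", 200 + i) for i in range(20)]
)

if __name__ == "__main__":
    sample = stratified_sample(population, stratum_of=lambda item: item[0], n=10, seed=1)
    # Proportional allocation mirrors the 50/30/20 split: 5 from A, 3 from B, 2 from C.
    print(dict(Counter(group for group, _ in sample)))
    # The sample mean is an estimate of the population mean, not an exact copy of it.
    print(mean(v for _, v in population), mean(v for _, v in sample))
```

Passing `proportional=False` instead gives each group the same number of slots, which can over-represent small groups; the proportional version keeps each group's share of the sample equal to its share of the population.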

Beyond learning how to perform each sampling method, you will learn how each one works, so you can make informed decisions about which method to use when sampling data.

Knowing what to sample, and how, determines how well your data represents the population you care about. If you want to learn different sampling methods and how to choose which data points to examine, this lesson is the place to start!

As you work through each concept, you'll apply what you've learned directly in your browser; there's no need to use your own machine to do the exercises. The coding environment built into this course includes answer checking to ensure you've mastered each concept before moving on to the next.


  • Learn more about sampling.
  • Learn how to perform stratified sampling and cluster sampling.

Lesson Outline

1. Stratified Sampling
2. Creating Strata
3. Creating and Analyzing Strata with `dplyr`
4. Proportional Stratified Sampling
5. Many Proportional Stratified Samples
6. Alternative Approach
7. Choosing the Right Strata
8. Cluster Sampling
9. Sampling in Data Science Practice
10. Descriptive and Inferential Statistics
11. Next Steps
12. Takeaways
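Outline item 8 covers cluster sampling. Unlike stratified sampling, which draws some items from every group, cluster sampling picks a few whole groups at random and keeps all of their members. The sketch below is a hypothetical illustration in plain Python: the `cluster_sample` function and the six-team toy population are assumptions made for this example, not the course's code or dataset.

```python
import random
from collections import defaultdict


def cluster_sample(population, cluster_of, n_clusters, seed=None):
    """Pick n_clusters clusters at random and keep every item in them (hypothetical helper)."""
    rng = random.Random(seed)

    # Group items by their natural cluster (for example, a team, school, or city).
    clusters = defaultdict(list)
    for item in population:
        clusters[cluster_of(item)].append(item)

    # Randomness happens at the cluster level, not at the item level.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [item for label in chosen for item in clusters[label]]
    return chosen, sample


# Hypothetical population: 6 teams with 4 players each, as (team, player) pairs.
population = [(f"team_{t}", f"player_{t}_{p}") for t in range(6) for p in range(4)]

if __name__ == "__main__":
    chosen, sample = cluster_sample(population, cluster_of=lambda item: item[0], n_clusters=2, seed=7)
    print(chosen)       # two team labels drawn at random
    print(len(sample))  # 8: both chosen teams are included in full
```

Cluster sampling is convenient when data naturally comes in groups that are cheaper to collect whole, but members of the same cluster often resemble each other, so the result can be less representative than a stratified sample of the same size.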