In the previous lessons of this Statistics Fundamentals course, we focused on collecting data, understanding its structure, and understanding how it's measured.

Collecting data is just the starting point in a data analysis workflow. We rarely collect data just for the sake of collecting it. We collect data to analyze it, and we analyze it for different purposes:

  • To describe phenomena in the world (science)
  • To make better decisions (industries)
  • To improve systems (engineering)
  • To describe different aspects of our society (data journalism, for example)

Our capacity to understand a dataset just by looking at it in table form is limited, and it decreases dramatically as the dataset grows. To analyze data, we need ways to simplify it.

One way to simplify a dataset is to select a variable, count how many times each unique value occurs, and represent the frequencies (the number of times each unique value occurs) in a table. This is called a frequency distribution table, or, for short, a frequency table or frequency distribution. Throughout this lesson, our focus will be on learning the details behind this form of simplifying data.
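As a rough illustration of the idea, the sketch below builds a frequency table with Python's standard-library `Counter`. The `positions` list is made-up sample data, not taken from the course dataset, and the printed layout is just one possible way to display the table.

```python
from collections import Counter

# Hypothetical sample: the values of one categorical variable
# (player positions) from a small, made-up dataset.
positions = ["G", "F", "C", "G", "F", "G", "C", "G"]

# Count how many times each unique value occurs.
frequencies = Counter(positions)

# Represent the frequencies as a table, most frequent value first.
print(f"{'Position':<10}{'Frequency':>9}")
for value, count in frequencies.most_common():
    print(f"{value:<10}{count:>9}")
```

The frequencies always add up to the number of observations, which is a quick sanity check on any frequency table. With pandas, `Series.value_counts()` produces the same counts.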


Lesson Objectives

  • Learn what frequency distributions are.
  • Learn how to generate frequency distribution tables.
  • Learn how to generate grouped frequency distribution tables.
  • Learn what proportions, percentages, and percentiles are.

Lesson Outline

1. Simplifying Data
2. Frequency Distribution Tables
3. Sorting Frequency Distribution Tables
4. Sorting Tables for Ordinal Variables
5. Proportions and Percentages
6. Percentiles and Percentile Ranks
7. Finding Percentiles with pandas
8. Grouped Frequency Distribution Tables
9. Information Loss
10. Readability for Grouped Frequency Tables
11. Frequency Tables and Continuous Variables
12. Next Steps
13. Takeaways