MISSION 99

Chi-squared tests

In this mission, we'll be learning about the chi-squared test for categorical data. This test enables us to determine the statistical significance of observing a set of categorical values.

When we compare an observed distribution with an expected one, we might sense that something looks off. However, we don't yet have a way to quantify how different the observed and expected values are, or to determine whether the difference between the two groups is statistically significant and worth investigating further.

This is where a chi-squared test can help. The chi-squared test enables us to quantify the difference between sets of observed and expected categorical values.
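As a rough illustration, the chi-squared statistic adds up, over every category, the squared difference between the observed and expected counts divided by the expected count: χ² = Σ (observed − expected)² / expected. The sketch below uses made-up counts (100 observations over two categories where an even 50/50 split was expected); the helper name `chi_squared` is hypothetical and not taken from the course.

```python
# Hypothetical counts for illustration only: 100 observations over two
# categories, where we expected an even 50/50 split.
observed = [60, 40]
expected = [50, 50]

def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(chi_squared(observed, expected))
```

Each category contributes (10²) / 50 = 2, so the statistic here is 4.0. A perfect match between observed and expected counts would give 0, and larger gaps give larger values.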

In this mission, you will discover the formula for the chi-squared test statistic and build intuition for why and how it quantifies the difference between sets of observed and expected categorical values. You will also learn about p-values, which help us determine whether a difference between observed and expected values is likely due to chance or reflects a real, meaningful difference.
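One way to build intuition for a p-value is to simulate it: draw many random samples under the assumption that the expected proportions are true, compute the chi-squared statistic for each, and count how often a simulated statistic is at least as large as the observed one. The sketch below assumes the same made-up 60/40 counts and a 50/50 expectation; `simulated_p_value`, the 10,000 trials, and the fixed seed are illustrative choices, not the course's exact code.

```python
import random

def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def simulated_p_value(observed, probs, trials=10_000, seed=0):
    """Monte Carlo estimate of a p-value: the share of samples drawn under the
    expected category probabilities `probs` whose chi-squared statistic is at
    least as large as the observed one."""
    rng = random.Random(seed)
    n = sum(observed)
    expected = [p * n for p in probs]
    observed_stat = chi_squared(observed, expected)
    k = len(probs)
    at_least_as_extreme = 0
    for _ in range(trials):
        # Draw n category labels according to the expected probabilities.
        draws = rng.choices(range(k), weights=probs, k=n)
        counts = [0] * k
        for category in draws:
            counts[category] += 1
        if chi_squared(counts, expected) >= observed_stat:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials

p = simulated_p_value([60, 40], [0.5, 0.5])
print(p)
```

For these made-up counts the estimate should land roughly between 0.05 and 0.06. A common convention is to treat p-values below 0.05 as statistically significant, so this result sits near the borderline.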

We’ll also cover what degrees of freedom are and what role they play in statistics.
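To hint at why degrees of freedom matter, the sketch below assumes a goodness-of-fit test with k categories, where the degrees of freedom are k − 1 (once k − 1 counts are known, the last one is fixed by the total). It computes the right-tail probability of the chi-squared distribution using closed forms that exist only for 1 and 2 degrees of freedom; the helper names are hypothetical, and in practice SciPy's `scipy.stats.chi2.sf(statistic, df)` (or `scipy.stats.chisquare` for the whole test) handles any number of degrees of freedom.

```python
import math

def degrees_of_freedom(num_categories):
    # Goodness-of-fit test with fully specified expected counts:
    # k - 1 counts are free, the last is determined by the total.
    return num_categories - 1

def chi2_p_value(statistic, df):
    """Right-tail probability of the chi-squared distribution.
    Sketch only: closed forms for df = 1 and df = 2."""
    if df == 1:
        # Chi-squared with 1 df is the square of a standard normal variable.
        return math.erfc(math.sqrt(statistic / 2))
    if df == 2:
        # Chi-squared with 2 df is an exponential distribution with mean 2.
        return math.exp(-statistic / 2)
    raise NotImplementedError("this sketch only covers df = 1 and df = 2")

stat = 4.0
for k in (2, 3):
    df = degrees_of_freedom(k)
    print(k, df, round(chi2_p_value(stat, df), 4))
```

For the same statistic of 4.0, the p-value rises from about 0.046 with 1 degree of freedom to about 0.135 with 2, which is why the degrees of freedom must be known before a chi-squared value can be interpreted.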

As you work through each concept, you’ll apply what you’ve learned using our interactive Python environment and answer-checking, so that you practice writing Python and get feedback on your new statistics skills as you learn.

Objectives

  • Learn to determine the statistical significance of observing a set of categorical values.
  • Learn to generate and work with the chi-squared distribution.
  • Learn the basics of degrees of freedom.

Mission Outline

1. Observed and expected frequencies
2. Calculating differences
3. Updating the formula
4. Generating a distribution
5. Statistical significance
6. Smaller samples
7. Sampling distribution equality
8. Degrees of freedom
9. Increasing degrees of freedom
10. Using SciPy
11. Next steps
12. Takeaways


Course Info:

Intermediate

The median completion time for this course is 6.49 hours.

This course requires a Basic subscription. It includes six missions and one guided project. This is the 17th course in the Data Analyst in Python and Data Scientist in Python paths.

