MISSION 441

Categorical Data And The Chi-Squared Test

In this mission, we'll learn about the chi-squared test for categorical data. The chi-squared test is a statistical hypothesis test whose null hypothesis is that the observed frequencies of a categorical variable match its expected frequencies.

When we compare two distributions, we might sense that something looks off. However, we don't yet have a way to quantify how different the observed and expected values are, or to determine whether the difference between the two groups is statistically significant and worth investigating further.

This is where a chi-squared test can help. The chi-squared test enables us to quantify the difference between sets of observed and expected categorical values.
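For reference, the standard chi-squared test statistic sums, over all k categories, the squared difference between each observed count O_i and expected count E_i, scaled by the expected count. Squaring keeps positive and negative differences from cancelling each other out, and dividing by E_i puts each category's deviation on a comparable scale:

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```

A value of 0 means the observed counts match the expected counts exactly; larger values indicate a bigger overall mismatch.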

In this mission, you will discover the formula for the chi-squared test statistic and build intuition for why and how it quantifies the difference between sets of categorical values. You will also learn about p-values, which help us determine whether the difference between observed and expected categorical values is likely due to chance or reflects a meaningful underlying difference.
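As a minimal sketch of how a statistic and a p-value fit together, the Python below computes the chi-squared statistic (the sum of (O - E)^2 / E over categories) for a hypothetical two-category example, 100 coin flips with 58 heads and 42 tails. The function names and data are illustrative assumptions, not the mission's code. With exactly two categories there is 1 degree of freedom, and in that case the chi-squared variable is the square of a standard normal variable, so the p-value has a closed form using `math.erfc`. For more categories you would normally rely on a library routine, such as `scipy.stats.chisquare` in Python or `chisq.test()` and `pchisq()` in R.

```python
import math


def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(statistic):
    """Upper-tail probability of a chi-squared distribution with 1 degree of freedom.

    With df = 1 the statistic is the square of a standard normal Z, so
    P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical data: 100 coin flips, categories are (heads, tails).
observed = [58, 42]
expected = [50, 50]  # a fair coin under the null hypothesis

stat = chi_squared_statistic(observed, expected)
p = p_value_df1(stat)
print(f"chi-squared = {stat:.2f}, p-value = {p:.4f}")
```

A p-value around 0.11 here would mean that a mismatch at least this large happens roughly 11% of the time with a fair coin, so it would not usually be treated as statistically significant at the common 0.05 threshold.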

We’ll also cover what degrees of freedom are and the role they play in statistics.
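As a brief worked example, assume a goodness-of-fit test with k categories, a fixed total count N, and expected counts taken directly from a hypothesized distribution (no parameters estimated from the data). Once k - 1 of the observed counts are known, the last one is forced by the total, so only k - 1 counts are free to vary:

```latex
O_k = N - \sum_{i=1}^{k-1} O_i, \qquad \text{df} = k - 1
```

For a two-category variable such as heads and tails, this gives df = 1.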

As you work through each concept, you’ll get to apply what you’ve learned using our interactive Python environment and answer-checking, so that you’re getting practice writing Python and getting feedback about your new statistics skills as you learn. 

Objectives

  • Learn to determine the statistical significance of observing a set of categorical values.
  • Learn to generate and work with the chi-squared distribution.
  • Learn to use R functions for the chi-squared distribution.
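One way to generate the chi-squared distribution is by simulation: repeatedly draw samples under the null hypothesis, compute the statistic for each, and look at how those simulated statistics are distributed. The sketch below does this with Python's standard library. It is an illustrative assumption, not necessarily the mission's exact approach, and the function names (`simulate_null_distribution`, `simulated_p_value`) and parameters are hypothetical. The p-value is estimated as the fraction of simulated statistics at least as large as the observed one.

```python
import random


def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_null_distribution(probs, n, trials=10_000, seed=0):
    """Simulate chi-squared statistics assuming the null hypothesis is true.

    probs: hypothesized probability of each category (should sum to 1).
    n: number of observations in each simulated sample.
    trials: number of simulated samples.
    """
    rng = random.Random(seed)  # seeded for reproducible results
    k = len(probs)
    expected = [p * n for p in probs]
    stats = []
    for _ in range(trials):
        # Draw n category labels according to the null probabilities.
        draws = rng.choices(range(k), weights=probs, k=n)
        observed = [draws.count(i) for i in range(k)]
        stats.append(chi_squared_statistic(observed, expected))
    return stats


def simulated_p_value(observed_stat, null_stats):
    """Fraction of simulated statistics at least as extreme as the observed one."""
    return sum(s >= observed_stat for s in null_stats) / len(null_stats)


# Hypothetical example: a fair coin flipped 100 times per sample.
null_stats = simulate_null_distribution([0.5, 0.5], n=100)
print(f"mean of simulated statistics: {sum(null_stats) / len(null_stats):.2f}")
print(f"simulated p-value for 2.56: {simulated_p_value(2.56, null_stats):.3f}")
```

The mean of the simulated statistics should land close to k - 1 (here, 1), which matches the degrees of freedom of the corresponding chi-squared distribution. Because coin counts are discrete, the simulated p-value can differ somewhat from the value given by the continuous chi-squared distribution.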

Mission Outline

1. Looking At Categorical Data
2. Observed Data vs Expected Data
3. Dealing With Cancellation
4. Some Statistical Insights
5. Developing a Null Hypothesis
6. Importance of Sample Size
7. Considering More Categories
8. Adjusting the Distribution Under the Null
9. Next Steps
10. Takeaways

Course Info:

Intermediate

The median completion time for this course is 6.49 hours.

This course requires a Basic subscription. It includes four missions and one guided project. It is the 10th course in the Data Analyst in R path.
