Multi-category chi-squared tests

In the last lesson, we looked at the gender frequencies of people included in a dataset on US income and calculated a chi-squared value indicating how the observed frequencies in a single categorical column varied from the US population as a whole.

In this lesson, we’ll look at how to make this same technique applicable to cross tables that show how two categorical columns interact. In other words, we’ll look at how to apply the chi-squared test across more than one category at a time.

You’ll learn concepts such as expected value and statistical significance. To help you implement the chi-squared test across multiple columns, you’ll use the SciPy library for Python. SciPy is both a Python-based ecosystem for mathematics, science, and engineering and the name of the core library within it. That library provides many user-friendly, efficient routines, including the `scipy.stats.chisquare()` function, which makes it easy to compute the chi-squared statistic when you run a multi-category chi-squared test.
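As a rough preview of what that looks like in practice, here is a minimal sketch. The 2x2 table of counts is made up for illustration and is not taken from the income dataset. It computes the expected values from the row and column totals, passes the flattened tables to `scipy.stats.chisquare()`, and compares the result with `scipy.stats.chi2_contingency()`, a related SciPy function that works on the cross table directly.

```python
import numpy as np
from scipy.stats import chisquare, chi2_contingency

# Hypothetical cross table (illustrative counts only):
# rows are two gender categories, columns are two income brackets.
observed = np.array([
    [20, 30],
    [30, 20],
])

# Expected count for each cell = row total * column total / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# chisquare() works on flat arrays. By default it assumes k - 1 degrees of
# freedom (k = number of cells), so ddof is set to bring that down to
# (rows - 1) * (cols - 1), which is the right value for a cross table.
n_rows, n_cols = observed.shape
table_dof = (n_rows - 1) * (n_cols - 1)
ddof = observed.size - 1 - table_dof
chi2_stat, p_value = chisquare(observed.ravel(), expected.ravel(), ddof=ddof)

# chi2_contingency() derives the expected values and degrees of freedom
# itself; correction=False turns off the Yates continuity correction so the
# statistic matches the manual calculation above.
stat2, p2, dof2, expected2 = chi2_contingency(observed, correction=False)

print(chi2_stat, p_value)
print(stat2, p2, dof2)
```

Both calls should report the same statistic and p-value for this table; the lesson builds up each of these steps by hand before relying on the library.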

As you work through each concept, you’ll apply what you’ve learned directly in your browser, so there’s no need to set anything up on your own machine to do the exercises. The Python environment in this course includes answer checking, so you can confirm that you’ve mastered each concept before moving on to the next.

Objectives

• Learn how to extend chi-squared to multiple categories.
• Learn how to calculate the statistical significance of multi-category chi-squared tests.

Lesson Outline

1. Multiple categories
2. Calculating expected values
3. Calculating chi-squared
4. Finding statistical significance
5. Cross tables
6. Finding expected values
7. Caveats
8. Next steps
9. Takeaways