In this lesson and the Building a Spam Filter guided project that follows it, we'll focus on an application of conditional probability. To understand how a spam filter works, we'll discuss an algorithm called Naive Bayes, which, as the name suggests, is based on Bayes' theorem.

The spam filter we'll build is aimed specifically at mobile phone spam. It will analyze new messages and classify each one as spam or not spam, which can help keep spam from bothering mobile phone users.

This lesson explores the theoretical aspect of the algorithm and is dedicated to helping you understand how the algorithm works.

The Naive Bayes algorithm comes in a few variations. In this lesson, we'll learn the multinomial version. Explaining the mathematical differences between the versions is beyond the scope of this lesson, but every Naive Bayes algorithm builds on the (naive) conditional independence assumption that we cover here.

Even though there is no data set in this lesson, you will still have the opportunity to complete exercises as you go. By the end of the lesson, you should feel confident working with the Naive Bayes algorithm.

As you work through the Naive Bayes algorithm, you'll apply what you've learned directly in your browser, so there's no need to use your own machine for the exercises. The R environment in this course includes answer checking, so you can make sure you've mastered each concept before moving on to the next.


  • Learn multinomial Naive Bayes.
  • Learn the assumption of conditional independence.
  • Learn about additive smoothing.

Lesson Outline

1. A Spam Filter
2. Naive Bayes Motivation and Overview
3. Using Bayes' Theorem
4. Using Proportionality
5. Classifying One-Word Messages
6. Classifying Multiple Word Messages
7. Conditional Independence
8. A General Equation
9. Edge Case: Words Not In Vocabulary
10. Additive Smoothing
11. Multinomial Naive Bayes
12. Next Steps
13. Takeaways