MISSION 432

The Naive Bayes Algorithm

In this lesson and the Building a Spam Filter guided project, we'll focus on an application of conditional probability. To understand how a spam filter works, we'll discuss an algorithm called Naive Bayes, which, as the name suggests, is based on Bayes' theorem.

Specifically, we'll build a spam filter for mobile phone messages. The filter will analyze new messages and tell whether they are spam, so that spam is less likely to bother mobile phone users.

This lesson explores the theoretical aspect of the algorithm and is dedicated to helping you understand how the algorithm works.

The Naive Bayes algorithm has a few variations.

In this lesson, we learn the multinomial Naive Bayes version of the algorithm. Explaining the mathematical differences between the various versions is out of the scope of this lesson, but all the Naive Bayes algorithms build on the (naive) conditional independence assumption that we cover in this lesson.
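To give a rough feel for what the conditional independence assumption means in practice, here is a minimal sketch of a multinomial-style Naive Bayes scorer with additive smoothing. It is an illustration under stated assumptions, not the course's own implementation: the toy messages, the `train`, `score`, and `classify` helper names, and the choice of `alpha=1` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy training data (not from the course): (message, label) pairs.
training = [
    ("secret prize money", "spam"),
    ("win money now", "spam"),
    ("see you at lunch", "ham"),
    ("lunch money for you", "ham"),
]


def train(messages):
    """Count words per label and collect the vocabulary."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary


def score(message, label, word_counts, label_counts, vocabulary, alpha=1):
    """Return a value proportional to P(label | message).

    Naive assumption: words are treated as conditionally independent
    given the label, so per-word probabilities are multiplied together.
    """
    total_words = sum(word_counts[label].values())
    result = label_counts[label] / sum(label_counts.values())  # prior P(label)
    for word in message.lower().split():
        if word not in vocabulary:
            continue  # assumption of this sketch: ignore words never seen in training
        # Additive smoothing: alpha keeps a zero count from zeroing the product.
        result *= (word_counts[label][word] + alpha) / (
            total_words + alpha * len(vocabulary)
        )
    return result


def classify(message, model):
    """Pick the label with the larger score; ties go to 'ham'."""
    spam = score(message, "spam", *model)
    ham = score(message, "ham", *model)
    return "spam" if spam > ham else "ham"


model = train(training)
print(classify("win secret prize", model))
print(classify("see you at lunch", model))
```

The scores are only proportional to the posterior probabilities, which is enough for comparing the two labels. A fuller implementation would typically sum log-probabilities instead of multiplying, to avoid numerical underflow on long messages.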

Even though there is no dataset in this lesson, you will still have the opportunity to complete exercises as you go. By the end of this lesson, you should feel confident working with the Naive Bayes algorithm.

As you work through the Naive Bayes algorithm, you'll apply what you've learned directly in your browser, so there's no need to use your own machine to do the exercises. The Python environment inside this course includes answer checking, so you can make sure you've mastered each concept before moving on to the next.

Objectives

  • Learn multinomial Naive Bayes.
  • Learn the assumption of conditional independence.
  • Learn about additive smoothing.

Mission Outline

1. A Spam Filter
2. Naive Bayes Overview
3. Using Bayes' Algorithm
4. Optimizing the Algorithm
5. A One-Word Message
6. Multiple Words
7. The Independence Assumption
8. A General Formula
9. Edge Cases
10. Additive Smoothing
11. Multinomial Naive Bayes
12. Next Steps
13. Takeaways

Course Info:

Intermediate

The median completion time for this course is 6.49 hours.

This course requires a Basic subscription. It includes four missions and one guided project. This course is the 18th course in the Data Analyst in Python path.
