Guided Project: Building a Spam Filter with Naive Bayes

In this course, we learned about conditional probability and different techniques that enable us to better estimate probabilities. We also learned about Bayes' theorem and the Naive Bayes algorithm.

In this guided project, we'll create a spam filter for SMS messages. To do that, we'll use a data set of 5,572 SMS messages that have already been classified by humans. The data set was put together by Tiago A. Almeida and José María Gómez Hidalgo, and it can be downloaded from the UCI Machine Learning Repository.

Working on guided projects will give you hands-on experience with real-world examples, so we encourage you not only to complete them, but also to take the time to really understand the concepts.

These projects are meant to be challenging to better prepare you for the real world, so don't be discouraged if you have to refer back to previous missions. If you haven't worked with Jupyter Notebook before or need a refresher, we recommend completing our Jupyter Notebook Guided Project before continuing.

As with all guided projects, we encourage you to experiment and extend your project, taking it in unique directions to make it a more compelling addition to your portfolio!


  • Use conditional probability concepts in a practical setting.
  • Add business value using conditional probability and Naive Bayes.

Mission Outline

1. Exploring the Data Set
2. Training and Test Set
3. Letter Case and Punctuation
4. Creating the Vocabulary
5. The Final Training Set
6. Calculating Constants First
7. Calculating Parameters
8. Classifying A New Message
9. Measuring the Spam Filter's Accuracy
10. Next Steps
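
To give a feel for how steps 3 through 9 of the outline fit together, here is a minimal sketch in plain Python. It is not the project's solution: the toy messages are hypothetical (they are not taken from the SMS data set), the helper names `tokenize` and `classify` are made up for illustration, and the use of Laplace smoothing with `alpha = 1` is an assumption.

```python
import re
from collections import Counter

# Hypothetical toy data, NOT taken from the real SMS data set.
training = [
    ("spam", "WINNER!! Claim your free prize now"),
    ("spam", "Free entry: win money now"),
    ("ham", "Are we still meeting for lunch today?"),
    ("ham", "Call me when you are free"),
]
test = [
    ("spam", "Claim your free money"),
    ("ham", "Call me for lunch"),
]

def tokenize(message):
    """Letter case and punctuation: strip non-word characters, lowercase, split."""
    return re.sub(r"\W", " ", message).lower().split()

# Creating the vocabulary: every unique word seen in the training messages.
vocabulary = {word for _, msg in training for word in tokenize(msg)}

# Calculating constants: priors, words per class, smoothing parameter.
labels = ("spam", "ham")
alpha = 1  # Laplace smoothing (an assumption of this sketch)
priors = {
    label: sum(1 for lab, _ in training if lab == label) / len(training)
    for label in labels
}
word_counts = {label: Counter() for label in labels}
for label, msg in training:
    word_counts[label].update(tokenize(msg))
n_words = {label: sum(word_counts[label].values()) for label in labels}

# Calculating parameters: P(word | label) for every word in the vocabulary.
params = {
    label: {
        word: (word_counts[label][word] + alpha)
        / (n_words[label] + alpha * len(vocabulary))
        for word in vocabulary
    }
    for label in labels
}

def classify(message):
    """Classifying a new message: compare prior times product of P(word | label)."""
    scores = dict(priors)
    for word in tokenize(message):
        if word in vocabulary:  # words never seen in training are ignored
            for label in labels:
                scores[label] *= params[label][word]
    if scores["spam"] > scores["ham"]:
        return "spam"
    if scores["ham"] > scores["spam"]:
        return "ham"
    return "needs human classification"

# Measuring accuracy: share of test messages classified correctly.
correct = sum(1 for label, msg in test if classify(msg) == label)
accuracy = correct / len(test)
print(f"Accuracy: {accuracy:.2%}")
```

On a real data set, multiplying many small probabilities can underflow, so summing log-probabilities is a common alternative; the sketch keeps plain products to stay close to the formula.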

Course Info:


The median completion time for this course is 6.49 hours.

This course requires a Basic subscription. It includes four missions and one guided project. It is the 18th course in the Data Analyst in Python path.

