In this course, we learned about conditional probability and several techniques for estimating probabilities more accurately. We also learned about Bayes' theorem and the Naive Bayes algorithm.

In this guided project, we'll create a spam filter for SMS messages. To do that, we'll use a data set of 5,572 SMS messages that humans have already classified. The data set was put together by Tiago A. Almeida and José María Gómez Hidalgo, and it can be downloaded from the UCI Machine Learning Repository.

Working on guided projects will give you hands-on experience with real-world examples, so we encourage you not only to complete them, but also to take the time to really understand the concepts.

These projects are meant to be challenging to better prepare you for the real world, so don't be discouraged if you have to refer back to previous missions. 

As with all guided projects, we encourage you to experiment and extend your project, taking it in unique directions to make it a more compelling addition to your portfolio!

Objectives

  • Use conditional probability concepts in a practical setting.
  • Add business value using conditional probability and Naive Bayes.

Lesson Outline

1. Exploring the Data Set
2. Training, Cross-Validation and Test Sets
3. Data Cleaning
4. Creating the Vocabulary
5. Calculating Constants First
6. Calculating Parameters
7. Classifying a New Message
8. Accuracy
9. Hyperparameter Tuning and Cross-Validation
10. Test Set Performance
11. Next Steps
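
To make the outline more concrete, here is a minimal sketch of how steps 3 to 7 (cleaning, vocabulary, constants, parameters, classification) might fit together for a multinomial Naive Bayes filter. It is an illustration under assumptions, not the project's solution: the four training messages are made up rather than taken from the SMS data set, the smoothing value `ALPHA = 1` is an assumed choice, and the names `tokenize`, `p_word_given`, and `classify` are hypothetical helpers.

```python
import re
from collections import Counter

# Made-up toy messages for illustration only (NOT from the SMS data set).
training = [
    ("spam", "WINNER!! Claim your free prize now"),
    ("spam", "Free entry, win money now"),
    ("ham", "Are we still meeting for lunch today"),
    ("ham", "Call me when you are free"),
]

ALPHA = 1  # Laplace smoothing parameter (assumed value)
LABELS = ("spam", "ham")


def tokenize(message):
    """Data cleaning: replace non-word characters, lowercase, split on whitespace."""
    return re.sub(r"\W", " ", message).lower().split()


# Vocabulary: every unique word seen in the training messages.
vocabulary = sorted({word for _, msg in training for word in tokenize(msg)})
vocabulary_set = set(vocabulary)

# Constants: P(label) and the total number of words per label.
all_labels = [label for label, _ in training]
p_label = {label: all_labels.count(label) / len(all_labels) for label in LABELS}

word_counts = {label: Counter() for label in LABELS}
for label, msg in training:
    word_counts[label].update(tokenize(msg))
n_words = {label: sum(counts.values()) for label, counts in word_counts.items()}


def p_word_given(word, label):
    """Parameter: smoothed estimate of P(word | label)."""
    numerator = word_counts[label][word] + ALPHA
    denominator = n_words[label] + ALPHA * len(vocabulary)
    return numerator / denominator


def classify(message):
    """Compare P(label) * product of P(word | label) across both labels."""
    scores = dict(p_label)
    for word in tokenize(message):
        if word in vocabulary_set:  # words outside the vocabulary are ignored
            for label in LABELS:
                scores[label] *= p_word_given(word, label)
    if scores["spam"] > scores["ham"]:
        return "spam"
    if scores["ham"] > scores["spam"]:
        return "ham"
    return "undecided"


print(classify("win a free prize now"))
print(classify("are we meeting today"))
```

Multiplying raw probabilities is fine for short SMS messages; for longer texts, summing log-probabilities is a common way to avoid numerical underflow.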