Introduction to Natural Language Processing

In this lesson, we'll learn some of the basic building blocks of natural language processing. When we feed a computer written text, it has no idea what that text means. Before a computer can begin making inferences from text, we need to convert the text to a numerical representation. With that representation, the computer can infer grammatical rules from examples, which is akin to how a person learns a first language.

Generally speaking, natural language processing is the study of enabling computers to understand human languages. This field may involve teaching computers to automatically score essays, infer grammatical rules, or determine the emotions associated with text.

In this lesson, you'll learn important natural language processing concepts such as bag of words, tokenization, stop words and more.
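To make these terms concrete, here is a minimal sketch of tokenization, stop word removal, and a bag-of-words matrix using only the Python standard library. The headlines, the stop word list, and the `tokenize` helper are all made up for illustration; they are not the course's data or code.

```python
from collections import Counter

# Hypothetical headlines, invented for illustration (not the course data).
headlines = [
    "Show HN: A new way to learn Python",
    "Why Python is the best way to learn",
]

# A tiny hand-picked stop word list; real lists are much longer.
STOP_WORDS = {"a", "is", "the", "to"}

def tokenize(text):
    """Lowercase the text, replace punctuation with spaces, split on
    whitespace, and drop stop words."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower()
    )
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]

# One token list per headline.
tokenized = [tokenize(h) for h in headlines]

# The vocabulary is the sorted set of unique tokens; each becomes a column.
vocabulary = sorted({tok for toks in tokenized for tok in toks})

# Each row counts how often every vocabulary word occurs in one headline.
bag_of_words = []
for toks in tokenized:
    counts = Counter(toks)
    bag_of_words.append([counts[word] for word in vocabulary])

print(vocabulary)
# ['best', 'hn', 'learn', 'new', 'python', 'show', 'way', 'why']
print(bag_of_words)
# [[0, 1, 1, 1, 1, 1, 1, 0], [1, 0, 1, 0, 1, 0, 1, 1]]
```

Each row is now a numerical representation of one headline, which is the form a predictive model can work with.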

To practice these concepts, you will work with data from Hacker News to predict the number of upvotes an article has received based on its headline.

As you work through each concept, you'll apply what you've learned directly in your browser, so there's no need to use your own machine to do the exercises. The course's Python environment includes answer checking, so you can make sure you've mastered each concept before moving on to the next one.


  • Learn the basics of Natural Language Processing.
  • Learn how to use Natural Language Processing to make predictions.

Lesson Outline

1. Introduction
2. Overview of the Data
3. Tokenizing the Headlines
4. Preprocessing Tokens to Increase Accuracy
5. Assembling a Matrix of Unique Words
6. Counting Token Occurrences
7. Removing Columns to Increase Accuracy
8. Splitting the Data Into Train and Test Sets
9. Making Predictions With fit()
10. Calculating Prediction Error
11. Next Steps
12. Takeaways
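
The later steps of the outline (splitting the data, fitting a model, and calculating prediction error) can be sketched roughly as follows. This is only an illustration under stated assumptions: the rows and upvote counts are invented, and `MeanBaseline` is a hypothetical stand-in model that merely mirrors the `fit()` / `predict()` naming; the lesson itself may use a different model and library.

```python
import random

# Hypothetical (bag-of-words row, upvotes) pairs, invented for illustration.
rows = [
    ([1, 0, 1], 10),
    ([0, 1, 1], 4),
    ([1, 1, 0], 7),
    ([0, 0, 1], 3),
    ([1, 0, 0], 12),
]

class MeanBaseline:
    """Stand-in model (an assumption, not the course's model): it ignores
    the features and always predicts the mean training target."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Shuffle, then keep 80% of the rows for training and the rest for testing.
random.seed(1)
shuffled = rows[:]
random.shuffle(shuffled)
split = int(len(shuffled) * 0.8)
train, test = shuffled[:split], shuffled[split:]

X_train = [features for features, _ in train]
y_train = [upvotes for _, upvotes in train]
X_test = [features for features, _ in test]
y_test = [upvotes for _, upvotes in test]

# Fit on the training set only, then measure error on the held-out test set.
model = MeanBaseline().fit(X_train, y_train)
predictions = model.predict(X_test)
error = mean_squared_error(y_test, predictions)
print(error)
```

Evaluating on rows the model never saw during fitting is what makes the error a fair estimate of how it would do on new headlines.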
