MISSION 227

Guided Project: Analyzing Wikipedia Pages

Use threads and processes to analyze Wikipedia pages more quickly.

Objectives

  • How to use parallel computing to quickly analyze Wikipedia pages.
  • How to process and strip HTML pages in Python.
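
As a rough illustration of the two objectives above, here is a minimal sketch using only the Python standard library. It is not the project's own code: the names TagAndTextCollector, analyze_page, analyze_pages, and the sample_pages strings are hypothetical, and the sample strings stand in for real Wikipedia HTML.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser


class TagAndTextCollector(HTMLParser):
    """Collects tag names and text content from one HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Count every opening tag by name.
        self.tags[tag] += 1

    def handle_data(self, data):
        # Keep the text found between tags.
        self.text_parts.append(data)


def analyze_page(html):
    """Return (tag counts, stripped text) for a single HTML string."""
    parser = TagAndTextCollector()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind once the tags are gone.
    text = " ".join(" ".join(parser.text_parts).split())
    return parser.tags, text


def analyze_pages(pages, workers=4):
    """Analyze pages concurrently and merge the per-page tag counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(analyze_page, pages))
    total_tags = Counter()
    for tags, _ in results:
        total_tags.update(tags)
    texts = [text for _, text in results]
    return total_tags, texts


# Hypothetical sample pages (assumption: not real Wikipedia data).
sample_pages = [
    "<html><body><p>Hello <b>world</b></p></body></html>",
    "<html><body><p>Second</p><p>page</p></body></html>",
]
total_tags, texts = analyze_pages(sample_pages)
```

Threads mainly help when the work waits on I/O, such as reading files from disk. For CPU-bound parsing in CPython, swapping in concurrent.futures.ProcessPoolExecutor is the usual alternative. This sketch also keeps the contents of script and style elements as text, which a real analysis would probably want to filter out.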

Mission Outline

1. Introducing Wikipedia Data
2. Reading In The Data
3. Remove Extraneous Markup
4. Finding Common Tags
5. Finding Common Words
6. Next Steps


Course Info:

Optimizing Code Performance On Large Datasets

Intermediate

The median completion time for this course is 4.7 hours.

This course requires a premium subscription and includes four missions and one guided project. It is the fourth course in the Data Engineer path.
