MISSION 227

Guided Project: Analyzing Wikipedia Pages

Use threads and processes to analyze Wikipedia pages more quickly.

Objectives

  • How to use parallel computing to quickly analyze Wikipedia pages.
  • How to process and strip HTML pages in Python.

Mission Outline

1. Introducing Wikipedia Data
2. Reading In The Data
3. Remove Extraneous Markup
4. Finding Common Tags
5. Finding Common Words
6. Next Steps
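
The outline above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual solution: the names `TagAndTextCollector`, `analyze_page`, and `analyze_pages`, the in-memory `sample_pages`, and the five-letter minimum word length are all assumptions made for the example. A real run would read the HTML from files rather than from strings.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
import re


class TagAndTextCollector(HTMLParser):
    """Hypothetical helper: records tag names and visible text for one page."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when it is not script or style content.
        if self._skip_depth == 0:
            self.text_parts.append(data)


def analyze_page(html):
    """Strip markup from one page and count its tags and words."""
    parser = TagAndTextCollector()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    # Assumption: only words of five or more letters are counted.
    words = re.findall(r"[a-z]{5,}", text.lower())
    return Counter(parser.tags), Counter(words)


def analyze_pages(pages, max_workers=4):
    """Analyze pages in a thread pool and merge the per-page counts."""
    tag_counts, word_counts = Counter(), Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for tags, words in pool.map(analyze_page, pages):
            tag_counts.update(tags)
            word_counts.update(words)
    return tag_counts, word_counts


# Stand-in data; real pages would be read from disk.
sample_pages = [
    "<html><body><div><p>Bacon bacon spam</p></div></body></html>",
    "<html><body><p>Bacon and eggs</p><script>var hidden = 1;</script></body></html>",
]
tag_counts, word_counts = analyze_pages(sample_pages)
print(tag_counts.most_common(3))
print(word_counts.most_common(3))
```

Threads tend to help most with I/O-bound work such as reading files. Parsing is CPU-bound, so swapping in `ProcessPoolExecutor` (with the call placed under an `if __name__ == "__main__":` guard) may be the better fit for that step.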

Course Info:

Optimizing Code Performance On Large Datasets

Intermediate

The average completion time for this course is 10 hours.

This course requires a premium subscription and includes four missions and one guided project. It is the fourth course in the Data Engineer path.
