MISSION 61

Transformations and Actions

In an earlier Spark lesson in this PySpark course, we briefly touched on transformations and actions, and on how these two kinds of methods affect the execution of code.

In this lesson, we'll dive deeper into how those mechanisms work, and explore a wider range of the functions built into the Spark core.
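The difference between the two mechanisms can be sketched without Spark at all. The snippet below is only a plain-Python analogy, not Spark code: Python's built-in `map()` is lazy in the same way a Spark transformation is, and `list()` plays the role of an action such as `collect()`. The helper `shout` and the sample lines are made up for illustration.

```python
# Plain-Python analogy for lazy transformations vs. actions (no Spark needed).
calls = []  # records every time the mapping function actually runs


def shout(line):
    """Hypothetical helper: upper-case a line and log that it was called."""
    calls.append(line)
    return line.upper()


lines = ["to be", "or not to be"]  # made-up sample data

# "Transformation": map() only builds a lazy iterator; shout() has not run yet.
# The rough PySpark counterpart would be rdd.map(shout).
upper_lines = map(shout, lines)
calls_before_action = len(calls)

# "Action": list() forces evaluation, much like rdd.collect() would in PySpark.
result = list(upper_lines)
calls_after_action = len(calls)

print(calls_before_action, calls_after_action, result)
```

Nothing is computed until a result is actually requested, which is the behavior this lesson explores in Spark itself.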

While learning about transformations and actions, you will clean up and reformat a file that contains the entire text of Shakespeare's play Hamlet. Shakespeare is well known for his unique writing style and is arguably one of the most influential writers in history, and Hamlet is one of his most popular plays. After cleaning and reformatting the data, you will perform text analysis on it.

You'll learn these concepts using functions such as `map()` and `flatMap()`, as well as other functions that help you transform your data using Spark.
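As a rough illustration of how `map()` and `flatMap()` differ, here is a plain-Python sketch that mimics their behavior on a small list. It does not use Spark; the sample tab-separated lines are invented, and the comments name the PySpark calls each step is meant to resemble.

```python
from itertools import chain

# Made-up tab-separated lines standing in for rows of a TSV file.
lines = ["hamlet\tact1\tscene1", "hamlet\tact2"]

# map-like: one output element per input element, so each line becomes a list.
# Roughly what rdd.map(lambda line: line.split("\t")) would yield.
mapped = [line.split("\t") for line in lines]

# flatMap-like: each line's pieces are flattened into a single sequence.
# Roughly what rdd.flatMap(lambda line: line.split("\t")) would yield.
flat_mapped = list(chain.from_iterable(line.split("\t") for line in lines))

print(mapped)
print(flat_mapped)
```

The map-like result keeps one nested list per line, while the flatMap-like result is a single flat list of fields.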

As you learn about transformations and actions, you’ll be able to apply what you’ve learned from within your browser, so there's no need to use your own machine to do the exercises. The Python environment in this course includes answer checking so you can ensure that you've fully mastered each concept before moving on to the next.

Objectives

  • Learn how to read TSV files into Spark.
  • Learn to apply lambda functions over RDD objects.

Mission Outline

1. Introduction to the Data
2. The Map Method
3. Beyond Lambda Functions
4. The FlatMap Method
5. Filter Using a Named Function
6. Actions
7. Next Steps
8. Takeaways

Course Info:

Intermediate

The median completion time for this course is 6 hours.

This course requires a premium subscription and includes five missions and one installation tutorial. It is the 31st course in the Data Scientist In Python path.
