In the Spark lesson of this PySpark course, we briefly touched on transformations and actions, and how these two kinds of operations affect the execution of code.

In this lesson, we'll dive deeper into how those mechanisms work, and explore a wider range of the functions built into the Spark core.

While learning about transformations and actions, you will clean up and reformat a file that contains the entire text of Shakespeare's play Hamlet. Shakespeare is well known for his unique writing style and is arguably one of the most influential writers in history. Hamlet is one of his most popular plays. After cleaning and reformatting the data, you will perform text analysis on it.

You'll learn these concepts using functions such as `map()` and `flatMap()`, as well as other functions that help you transform your data with Spark.
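To make the difference between the two functions concrete before the lesson starts, here is a minimal plain-Python sketch of the idea. It does not use Spark: `map_like` and `flat_map_like` are hypothetical helpers written for illustration, and the two sample lines are made up rather than taken from the lesson's file. Generators stand in for lazy transformations, and `list()` stands in for an action that forces the work to happen.

```python
from itertools import chain

# Made-up, tab-separated sample lines (not the lesson's actual data file).
lines = [
    "hamlet\tTo be, or not to be",
    "hamlet\tThat is the question",
]


def map_like(func, records):
    """Analogy for map(): exactly one output element per input element."""
    return (func(record) for record in records)


def flat_map_like(func, records):
    """Analogy for flatMap(): func returns a sequence per input element,
    and the sequences are flattened into one stream of elements."""
    return chain.from_iterable(func(record) for record in records)


# Nothing is computed yet: both results are lazy, like transformations.
fields_lazy = map_like(lambda line: line.split("\t"), lines)
words_lazy = flat_map_like(lambda line: line.split("\t")[1].split(" "), lines)

# list() plays the role of an action here and triggers the computation.
split_fields = list(fields_lazy)  # one list of fields per line
words = list(words_lazy)          # one flat list of words across all lines

print(split_fields)
print(words)
```

In this sketch, the map-style result keeps one entry per input line, while the flatMap-style result can contain more entries than there were lines, because each line contributes several words.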

As you learn about transformations and actions, you'll be able to apply what you've learned from within your browser; there's no need to use your own machine to do the exercises. The Python environment in this course includes answer checking, so you can make sure you've fully mastered each concept before moving on to the next.

Objectives

  • Learn how to read TSV files into Spark.
  • Learn to apply lambda functions over RDD objects.

Lesson Outline

1. Introduction to the Data
2. The Map Method
3. Beyond Lambda Functions
4. The FlatMap Method
5. Filter Using a Named Function
6. Actions
7. Next Steps
8. Takeaways