COURSE

Spark and Map-Reduce

Learn how to use Apache Spark and the map-reduce technique to clean and analyze large datasets.

By the end of this course, you'll be able to:

  • Understand how the map-reduce framework breaks a task into pieces that many computers can run in parallel.
  • Use Spark to process and transform large, raw files.
  • Use Spark SQL and Spark DataFrames to make large, unstructured datasets easier to work with.
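
To make the first objective concrete, here is a minimal sketch of the map-reduce idea in plain Python, with no Spark required. It counts words in two lines of text. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample lines are assumptions chosen for illustration; they are not part of the course material or of Spark's API. In a real cluster, the map and reduce steps would run on many machines at once, whereas this sketch runs them in a single process.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine all values for each key into a single count."""
    return {
        key: reduce(lambda a, b: a + b, values)
        for key, values in groups.items()
    }


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle(pairs))


# Hypothetical sample input, used only to exercise the sketch.
lines = ["to be or not to be", "that is the question"]
counts = word_count(lines)
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both steps can be split across workers without changing the result, which is the property that frameworks such as Spark rely on.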
Course Info:

Intermediate

The average completion time for this course is 10 hours.

This course requires a premium subscription and includes four paid missions, one challenge, and one installation tutorial. It is the 31st course in the Data Scientist In Python path.

Learn to use Spark and Map-Reduce

Introduction to Spark

Learn the basics of Spark by analyzing guests on The Daily Show.

Project: Spark Installation and Jupyter Notebook Integration

Learn how to set up PySpark and integrate it with Jupyter Notebook.

Transformations and Actions

Learn more about transformations and actions while cleaning up the text of Hamlet.

Challenge: Transforming Hamlet into a Data Set

Practice using Spark to transform the text of Hamlet into a usable data set.

Spark DataFrames

Learn the basics of Spark DataFrames by working with census data.

Spark SQL

Learn the basics of Spark SQL by working with census data.