Spark and Map-Reduce

Find yourself working with massive data sets regularly? Learn how to use Apache Spark and the map-reduce technique to clean and analyze “big data” in this Apache Spark and PySpark course.

Big data is all around us, and Spark is quickly becoming an in-demand big data tool that employers look for in applicants who will work with large data sets. If you want to build skills that employers value, this introductory Spark course is a good place to start.

In this course, you will learn what Apache Spark is and when it is advantageous to use it. You’ll learn concepts such as Resilient Distributed Datasets (RDDs), Spark SQL, and Spark DataFrames, as well as the difference between pandas and Spark DataFrames.

You will also learn how to install Spark and PySpark, a Python API that allows you to interact with Spark using Python code. We will also walk you through how to integrate PySpark with Jupyter Notebook so you can analyze large datasets from the comfort of a Jupyter notebook.

In this course, you’ll be working with a variety of real-world data sets, including the text of Hamlet, census data, and guest data from The Daily Show.

We also offer a free tutorial on Apache Spark.

By the end of this course, you’ll be able to:

  • Understand the map-reduce framework for breaking down tasks for many computers to run.
  • Understand how to use Spark to process and transform large, raw files.
  • Use Spark SQL and Spark DataFrames to work more easily with large, unstructured datasets.
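
To give a feel for the map-reduce idea named in the first objective above, here is a minimal sketch in plain Python rather than Spark. It counts words in two sample lines. The sample text and the helper names (map_phase, shuffle, reduce_phase) are assumptions made for illustration and are not taken from the course materials.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample input; the course itself works with the text of Hamlet.
lines = [
    "to be or not to be",
    "that is the question",
]

def map_phase(line):
    """Emit a (word, 1) pair for every word in one line."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as happens between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(values):
    """Combine all values for one key into a single count."""
    return reduce(lambda a, b: a + b, values)

# Each line is mapped independently, which is what lets the work be split
# across many computers in a real cluster.
mapped = [pair for line in lines for pair in map_phase(line)]

# Each key's values are then reduced to one result.
word_counts = {word: reduce_phase(values) for word, values in shuffle(mapped).items()}
```

In PySpark, the RDD methods flatMap, map, and reduceByKey play roughly these roles, with Spark handling the distribution across machines.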

Spark and Map-Reduce Lessons List

Introduction to Spark

Learn the basics of Spark by analyzing guests on The Daily Show.

Project: Spark Installation and Jupyter Notebook Integration

Learn how to set up PySpark and integrate it with Jupyter Notebook.

Transformations and Actions

Learn more about transformations and actions while cleaning up the text of Hamlet.

Challenge: Transforming Hamlet into a Data Set

Practice using Spark to transform the text of Hamlet into a usable data set.

Spark DataFrames

Learn the basics of Spark DataFrames by working with census data.

Spark SQL

Learn the basics of Spark SQL by working with census data.