About this course

Find yourself working with massive data sets regularly? Learn how to use Apache Spark and the map-reduce technique to clean and analyze “big data” in this Apache Spark and PySpark course.

Big data is all around us, and Spark is quickly becoming an in-demand tool that employers look for in applicants who will work with large datasets. If you want to build cutting-edge skills that employers value, this introductory Spark course is highly recommended.

In this course, you will learn what Apache Spark is and when it is advantageous to use it. You’ll learn concepts such as Resilient Distributed Datasets (RDDs), Spark SQL, Spark DataFrames, and the difference between pandas DataFrames and Spark DataFrames.

You will also learn how to install Spark and PySpark, a Python API that lets you interact with Spark using Python code. We will also walk you through integrating PySpark with Jupyter Notebook so you can analyze large datasets interactively from a notebook.

In this course, you’ll be working with a variety of real-world data sets, including the text of Hamlet, census data, and guest data from The Daily Show.

We also offer a free tutorial on Apache Spark.

By the end of this course, you’ll be able to:

  • Understand the map-reduce framework, which breaks a task into pieces that many computers can run in parallel.
  • Understand how to use Spark to process and transform large, raw files.
  • Use Spark SQL and Spark DataFrames to work easily with large datasets.
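To make the map-reduce idea concrete, here is a minimal plain-Python sketch of a word count, the classic map-reduce example. It is not Spark code and is not taken from the course: the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical, and the two short strings stand in for chunks of text that separate machines would process. In PySpark, the analogous steps would typically use the RDD methods `flatMap`, `map`, and `reduceByKey`.

```python
from collections import defaultdict
from functools import reduce


def map_phase(chunk):
    # Hypothetical map step: emit a (word, 1) pair for every word in one chunk.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    # Hypothetical shuffle step: group emitted values by key,
    # as a framework would do between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Hypothetical reduce step: combine all values for each key into one count.
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


# Each string stands in for a chunk that a different machine might process.
chunks = ["to be or not to be", "that is the question"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts["to"], counts["be"], counts["question"])  # → 2 2 1
```

Because each chunk is mapped independently, the map phase is the part a framework like Spark can spread across many computers; only the grouped pairs need to be brought together for the reduce step.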


