Big data is all around us, and Spark is quickly becoming one of the big data tools employers most want to see.
In this course, you’ll learn the advantages of Apache Spark and core concepts such as Resilient Distributed Datasets (RDDs), Spark SQL, and Spark DataFrames, including how Spark DataFrames differ from pandas DataFrames.
You’ll also learn how to install Spark and PySpark, the Python API for Spark, and how to integrate PySpark with Jupyter Notebook so you can analyze large datasets.
Best of all, you’ll learn by doing: you’ll practice and get feedback directly in the browser. You’ll work with a variety of real-world datasets, including the text of Hamlet, census data, and guest data from The Daily Show.
Dataquest has helped thousands of people start new careers in data. If you put in the work and follow our course, you'll master data skills and grow your career.
We believe so strongly in our courses that we offer a full satisfaction guarantee. If you complete a career course on Dataquest and aren't satisfied with your outcome, we'll give you a refund.