MISSION 264

Pipeline Tasks

In the previous mission, we learned about functional programming. We briefly discussed the requirements of tasks, and how tasks combine to form the building blocks of a data pipeline.

In this mission, we will build on the functional programming concepts we learned and construct a real pipeline from scratch. The goal of our pipeline is to take log lines from a text file and produce a summary CSV file of unique HTTP request types and their counts. Each line of the text file comes from an NGINX access log.
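To make the end goal concrete, here is a minimal sketch of what such a pipeline computes, written as one plain function rather than the task-based design the mission builds up to. The function name and the assumption that the log uses NGINX's default quoted-request format (e.g. `"GET /index.html HTTP/1.1"`) are illustrative, not the course's own code:

```python
import csv
from collections import Counter

def count_request_types(log_path, csv_path):
    """Count HTTP request methods in an NGINX access log and
    write a summary CSV of (request_type, count) rows."""
    counts = Counter()
    with open(log_path) as log_file:
        for line in log_file:
            # In the default NGINX format, the request line is the
            # first double-quoted field: "METHOD /path HTTP/x.y"
            parts = line.split('"')
            if len(parts) > 1:
                method = parts[1].split(' ')[0]
                counts[method] += 1
    with open(csv_path, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(['request_type', 'count'])
        for method, count in counts.items():
            writer.writerow([method, count])
```

The mission breaks this single function into smaller, reusable tasks connected by generators.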

To begin learning how to construct a real pipeline from scratch, we're going to learn about a special iterable type in Python called a generator. Then, we will use generators to build a high-performance data pipeline.

Learning how to construct a data pipeline is a critical skill for any data engineer. Because it's difficult to understand how to build one without hands-on practice, you'll apply what you've learned directly in your browser; there's no need to use your own machine for the exercises. The Python environment inside this course includes answer-checking to ensure you've fully mastered each concept before moving on to the next.

Objectives

  • Learn to build task functions in a pipeline.
  • Learn how to write real-world functions using the functional paradigm.

Mission Outline

1. Overview
2. Generators in Python
3. Generator Comprehension
4. Manipulating Generators in Tasks
5. Data Cleaning in Parse Log
6. Write to CSV
7. Chaining Iterators
8. Counting Unique Request Types
9. Task Reusability
10. Next Steps
11. Takeaways
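Several of the outline steps above (manipulating generators in tasks, chaining iterators, and counting unique request types) can be previewed with a small sketch. The function names and the sample log lines are illustrative assumptions, not the course's own code:

```python
def parse_log(lines):
    """Task: extract the HTTP method from each raw NGINX log line."""
    for line in lines:
        # The request line is the first double-quoted field.
        parts = line.split('"')
        if len(parts) > 1:
            yield parts[1].split(' ')[0]

def count_unique(items):
    """Task: count occurrences of each unique item."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts

# Chaining: the output iterator of one task feeds the next.
lines = [
    '1.2.3.4 - - [x] "GET / HTTP/1.1" 200 1',
    '1.2.3.4 - - [x] "POST /form HTTP/1.1" 200 1',
    '1.2.3.4 - - [x] "GET /about HTTP/1.1" 200 1',
]
print(count_unique(parse_log(lines)))  # {'GET': 2, 'POST': 1}
```

Because each task consumes and produces a plain iterable, tasks stay reusable: `count_unique` works on any iterable, not just parsed log lines.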

Building a Data Pipeline

Course Info:

Intermediate

The median completion time for this course is 6.3 hours.

This course requires a premium subscription. It has four missions and one guided project, and it is the seventh course in the Data Engineer Path.
