In the previous lesson, we learned about functional programming. We briefly discussed the requirements of tasks, and how tasks combine so we can push forward with building a data pipeline.
In this lesson, we will build on the functional programming concepts we learned and construct a real pipeline from scratch. The goal of our pipeline will be to take log lines from a text file and create a summary CSV file of unique HTTP request types and their associated counts. Each line in the text file comes from an NGINX log file.
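As a preview of where we're headed, here is a minimal sketch of that end result, assuming a simplified NGINX "combined" log format where the HTTP method is the first token inside the quoted request field. The sample lines and the `counts.csv` file name are hypothetical; in the lesson the lines will come from a real log file.

```python
import csv
from collections import Counter

# Hypothetical sample lines in a simplified NGINX combined log format;
# in the lesson these would be read lazily from a log file.
log_lines = [
    '127.0.0.1 - - [01/Jan/2017:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 512',
    '127.0.0.1 - - [01/Jan/2017:00:00:02 +0000] "POST /submit HTTP/1.1" 201 64',
    '127.0.0.1 - - [01/Jan/2017:00:00:03 +0000] "GET /about.html HTTP/1.1" 200 256',
]

# The request method is the first word inside the quoted request field.
methods = (line.split('"')[1].split()[0] for line in log_lines)
counts = Counter(methods)

# Write a summary CSV of unique request types and their counts.
with open("counts.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["request_type", "count"])
    for method, count in counts.items():
        writer.writerow([method, count])

print(dict(counts))  # {'GET': 2, 'POST': 1}
```

Note that `methods` is a generator expression, so the log lines are processed one at a time rather than materialized into an intermediate list; that laziness is the core idea we develop in this lesson.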
To get there, we're first going to learn about a special kind of iterable in Python called a generator. Then, we will use generators to build a high-performing data pipeline.
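To give a taste of what's coming, a generator produces values lazily, one at a time, instead of building an entire list in memory. A minimal sketch:

```python
# A generator function: `yield` suspends execution and hands back
# one value at a time instead of returning a full list.
def squares(limit):
    for n in range(limit):
        yield n * n

gen = squares(5)
print(next(gen))   # 0
print(next(gen))   # 1
print(list(gen))   # remaining values: [4, 9, 16]
```

Because values are computed on demand, a generator can process a log file far larger than available memory, which is exactly why they suit data pipelines.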
Learning how to construct a data pipeline is a critical skill for any data engineer. Because it's so critical, and because it's difficult to understand without hands-on practice, you'll get to apply what you've learned from within your browser; there's no need to use your own machine for the exercises. The Python environment inside this course includes answer-checking to ensure you've fully mastered each concept before moving on to the next.
- Learn to build task functions in a pipeline.
- Learn how to write real-world functions using the functional paradigm.
- Generators in Python
- Generator Comprehension
- Manipulating Generators in Tasks
- Data Cleaning in Parse Log
- Write to CSV
- Chaining Iterators
- Counting Unique Request Types
- Task Reusability
- Next Steps