In the previous mission, we learned about functional programming. We briefly discussed the requirements of tasks, and how tasks can be combined so we can push forward with building a data pipeline.
In this mission, we will build on the functional programming concepts we learned and construct a real pipeline from scratch. The goal of our pipeline will be to take log lines from a text file and produce a summary CSV file of unique HTTP request types and their associated counts. The lines in the text file come from an NGINX log.
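To make the goal concrete, here is a minimal sketch of what the finished pipeline will do, using a few hypothetical sample lines in NGINX's combined log format (the sample lines, the `request_counts.csv` filename, and the parsing shortcut are all illustrative assumptions, not the mission's actual code):

```python
import csv
from collections import Counter

# Hypothetical sample lines in NGINX combined log format.
log_lines = [
    '127.0.0.1 - - [01/Jan/2017:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 512',
    '127.0.0.1 - - [01/Jan/2017:00:00:02 +0000] "POST /login HTTP/1.1" 200 128',
    '127.0.0.1 - - [01/Jan/2017:00:00:03 +0000] "GET /about.html HTTP/1.1" 200 256',
]

# The HTTP request type is the first token inside the quoted request string.
methods = (line.split('"')[1].split()[0] for line in log_lines)
counts = Counter(methods)

# Write the summary CSV: one row per unique request type.
with open("request_counts.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["request_type", "count"])
    for method, count in counts.items():
        writer.writerow([method, count])
```

The real pipeline will read from a file and split the work into reusable tasks, but the input and output are the same shape as this sketch.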
To begin learning how to construct a real pipeline from scratch, we're going to learn about special iterable types in Python called generators. Then, we will use these generators to build a high-performing data pipeline.
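As a preview of the idea, a generator is defined like a function but uses `yield`, which lets it produce values one at a time instead of building a whole list in memory (the `lazy_squares` name below is just an illustrative example):

```python
def lazy_squares(numbers):
    """A generator function: `yield` produces one value at a time."""
    for n in numbers:
        yield n * n

gen = lazy_squares([1, 2, 3])
print(next(gen))   # the first value, computed only when requested
print(list(gen))   # the remaining values
```

Because values are computed on demand, generators can process files far larger than available memory, which is what makes them a good fit for a log-processing pipeline.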
Learning how to construct a data pipeline is a critical skill for any data engineer. Because it's so critical, and because it's difficult to understand without hands-on practice, you'll get to apply what you've learned from within your browser; there's no need to use your own machine for the exercises. The Python environment inside this course includes answer-checking to ensure you've fully mastered each concept before learning the next.
2. Generators in Python
3. Generator Comprehension
4. Manipulating Generators in Tasks
5. Data Cleaning in Parse Log
6. Write to CSV
7. Chaining Iterators
8. Counting Unique Request Types
9. Task Reusability
10. Next Steps