The Dataquest Download
Level up your data and AI skills, one newsletter at a time.
Hello, Dataquesters!
Here’s what we have in store for you in this edition:
Top tutorial: Say goodbye to “it worked on my machine” problems. Learn to set up PostgreSQL with Docker. Read the blog
Community highlights: Peak traffic analysis, predicting heart disease, and Python plotting tips. Join the discussion
Resource spotlight: Learn how to transform and analyze large-scale data with core RDD operations. Read the blog
Say Goodbye to “It Worked on My Machine” Problems
Tired of environment issues derailing your projects? This tutorial shows how Docker can help you build consistent, portable setups for your data tools. Learn how to spin up a PostgreSQL database inside a container, connect to it, persist data, and manage everything with ease. No permanent installs required.
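As a rough illustration of the workflow described above, the sketch below assembles a `docker run` command for the official `postgres` image from Python. The container name, volume name, password, port, and image tag are placeholder assumptions, not values taken from the tutorial. The command is only built and printed here, not executed.

```python
import shlex


def build_postgres_run_command(
    container_name="dq-postgres",  # hypothetical container name
    password="change-me",          # placeholder; use a real secret in practice
    host_port=5432,                # port exposed on your machine
    volume_name="dq-pgdata",       # hypothetical named volume for persistence
    image="postgres:16",           # assumed image tag
):
    """Return the argument list for a detached PostgreSQL container."""
    return [
        "docker", "run",
        "--detach",
        "--name", container_name,
        "--env", f"POSTGRES_PASSWORD={password}",
        # Map the chosen host port to PostgreSQL's default port in the container.
        "--publish", f"{host_port}:5432",
        # Keep the database files in a named volume so they outlive the container.
        "--volume", f"{volume_name}:/var/lib/postgresql/data",
        image,
    ]


command = build_postgres_run_command()
print(shlex.join(command))
```

On a machine with Docker installed, the same list could be passed to `subprocess.run(command, check=True)`. Because the data lives in a named volume, removing the container should leave the database files in place.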
From the Community
Peak Patterns in Urban Traffic: Ifeoma’s project features a clear title, focused analysis, clean code, and concise conclusions, making it both insightful and easy to follow.
Predicting Heart Disease: Steve’s well-structured project combines strong EDA, visualizations, and thoughtful reflections on model results and limitations.
New Community Moderator Intern: Linky has been promoted to Community Moderator Intern. Learn more about the internship program and how you can get involved.
Estimating Memory Usage by Data: Anna shares a smart way to estimate memory needs before loading large datasets, helping you manage resources more efficiently.
Removing Horizontal Grid Lines in Python: Linky explains how and when to remove horizontal grid lines in matplotlib bar plots for cleaner visuals.
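For readers curious about the memory-estimation idea mentioned above, here is a minimal back-of-the-envelope sketch. It is not necessarily the approach Anna describes; it simply assumes every column has a fixed-width numeric type and multiplies the row count by the bytes per row. The column names and row count are made up for illustration.

```python
# Bytes per value for common fixed-width column types.
BYTES_PER_VALUE = {
    "int8": 1, "int16": 2, "int32": 4, "int64": 8,
    "float32": 4, "float64": 8, "bool": 1,
}


def estimate_memory_bytes(n_rows, column_types):
    """Estimate the in-memory size of a table with fixed-width columns."""
    bytes_per_row = sum(BYTES_PER_VALUE[t] for t in column_types.values())
    return n_rows * bytes_per_row


# Hypothetical dataset: 10 million rows, three numeric columns.
columns = {"user_id": "int64", "rating": "float32", "is_active": "bool"}
estimate = estimate_memory_bytes(10_000_000, columns)
print(f"Estimated size: {estimate / 1024**2:.1f} MiB")
```

Treat the result as a lower bound: string columns, indexes, and library overhead are not covered by this sketch and can add substantially to real usage.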
DQ Resources
Work with RDDs in PySpark: Learn how to transform and analyze large-scale data with core RDD operations, understand DAGs for optimized processing, and discover when RDDs still make sense in modern workflows. Learn more
Automate and Monitor ETL Pipelines Locally (Part I): Build a fully functional ETL pipeline running locally with Apache Airflow and Docker. Automate data tasks, monitor them through a visual UI, and quickly identify and fix any issues. No more manual runs or missed jobs. Learn more
Launch a Scalable, Cloud-Hosted ETL Pipeline (Part II): Deploy your ETL workflow to the cloud using AWS. The production-ready Airflow setup includes cloud storage (S3), a relational database (RDS), IAM roles, and secure infrastructure, and is built to scale and run reliably. Learn more
What We're Reading
Data Science on Google Cloud: This high-level guide walks you through the entire data science workflow—ingestion, processing, modeling, and activation. Learn how tools like Dataflow, Vertex AI, and Looker support each phase and drive real-world results.
Which Python Vowel Check Is Fastest: Think checking for vowels is simple? Think again. Professor Austin Henley puts 11 different Python strategies to the test, from loops to regex to recursion, to find the fastest approach. The results might surprise you.
Claude 4’s Leaked Prompt Reveals AI’s Guardrails: A leaked system prompt from Claude 4 reveals how tightly these models are controlled. From formatting rules to behavior limits, this post uncovers how AI personalities are engineered behind the scenes.
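If the vowel-check comparison above piques your interest, a minimal sketch of how such a benchmark might be set up with the standard library follows. The two functions are just two of many possible strategies and are not necessarily the ones tested in the article; the sample string and repetition count are arbitrary assumptions, and timings will vary by machine.

```python
import re
import timeit

VOWELS = set("aeiouAEIOU")
VOWEL_PATTERN = re.compile(r"[aeiouAEIOU]")


def has_vowel_loop(text):
    """Check each character against a set of vowels."""
    for ch in text:
        if ch in VOWELS:
            return True
    return False


def has_vowel_regex(text):
    """Search for any vowel with a precompiled regular expression."""
    return VOWEL_PATTERN.search(text) is not None


# Arbitrary test input: a long vowel-free run with one vowel at the end.
sample = "rhythm" * 100 + "a"

for func in (has_vowel_loop, has_vowel_regex):
    seconds = timeit.timeit(lambda: func(sample), number=1_000)
    print(f"{func.__name__}: {seconds:.4f} s")
```

Try changing where the vowel sits in `sample`, or the string length, to see how sensitive each approach is to the input.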
Give 20%, Get $20: Time to Refer a Friend!
Now is the perfect time to share Dataquest with a friend. Gift a 20% discount, and for every friend who subscribes, earn a $20 bonus. Redeem your bonuses for digital gift cards or prepaid cards, or donate them to charity. Your choice! Click here
High-fives from Vik, Celeste, Anna P, Anna S, Anishta, Bruno, Elena, Mike, Daniel, and Brayan.