Celebrate 11 Years of Dataquest – Free Access Week + Unlimited Learning
The Dataquest Download
Level up your data and AI skills, one newsletter at a time.
Hello, Dataquesters!
Here’s what we have for you in this edition:
Webinar Recordings: The 2-part series on predicting tech salaries using data from the 2023 Stack Overflow Developer Survey is now available. Clean, model, and interpret real-world salary data. Watch now
Top Read: Create your own data lab using Docker—an isolated, professional-grade workspace to run tools like PySpark and PostgreSQL without breaking your system. Read the blog
From the Community: Explore Hacker News posting trends, learn AI-powered study strategies, see how others visualize their data, and pick up tips on paired t-tests, data engineering, and Python functions. Join the discussion
What We’re Reading: Smart ways to combine AI with learning, OpenAI’s new policy updates, and a fun look at why language models love em dashes (a little too much). Learn more
Webinar Recordings
Learn how to predict tech salaries using real data from the 2023 Stack Overflow Developer Survey. In this hands-on walkthrough, you’ll clean and prepare a massive survey dataset (80K+ responses), engineer features, and build a regression model to uncover what actually impacts developer pay.
- Watch Part 1: Data cleaning, feature engineering, and LLM-powered debugging
- Watch Part 2: Model training, evaluation, and salary insights
We recommend starting with the Machine Learning in Python path (also free this week) to build a strong foundation before diving into the project.
Top Read
Running data tools like PySpark or PostgreSQL can get messy when your local setup conflicts with dependencies or operating system versions. That’s where a Docker-based lab environment comes in. It gives you a clean, isolated space to experiment safely while mirroring real-world data engineering workflows.
In this tutorial, you’ll learn how to create a self-contained development lab for your Dataquest tutorials using Docker. You’ll see how this setup keeps your system clean, ensures every library works with the correct version, and saves you from hours of troubleshooting environment errors.
By the end, you’ll have a dedicated workspace that runs consistently across Windows, macOS, and Linux—just like the environments professional data teams use for reliable, reproducible work.
From the Community
Exploring Hacker News Data: Melanie analyzed Hacker News posts to determine the best time to post. This is an excellent beginner data analysis project, with clean code, helpful comments, and a clear narrative that perfectly answers the research question.
Learning Data with AI: Neha encourages you to share how you’re using AI to learn data skills faster or build smarter workflows. Real examples, use cases, exact prompts, tools, or step-by-step processes are all welcome. What tasks do you still prefer to do manually?
Show & Tell Your Data Story: Raisa shared her approach to creating insightful visuals for her current project. She invites you to do the same. Share a dashboard, notebook, script, or chart you’re proud of to showcase your hard work and inspire your peers.
Paired T-Tests and Descriptive Analysis in Healthcare: Sam asks for best practices in coding paired t-tests and descriptive analyses for clinical research projects. Suggestions on code efficiency, reproducibility, and clearer presentation are welcome.
Tracking Learning Progress Across Platforms and Resources: Tarun is looking for strategies to organize, visualize, and evaluate his learning journey in a more structured way. Check out the great suggestions shared in the thread, and contribute your own methods for tracking progress.
Learning Data Engineering with Dataquest—Where to Start: Suheb asks for advice on the best order to study topics on Dataquest if he wants to focus on data engineering rather than data analysis.
Returning Multiple Values from a Python Function: James explains and demonstrates why a tuple is often the preferred way to return multiple values from a Python function.
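To make the tuple pattern from that last thread concrete, here is a minimal sketch. It is not James's code: the function name `min_max_mean` and the sample numbers are made up for illustration. It shows a function returning several values as one tuple, which the caller can either unpack into separate names or keep as a single object.

```python
from typing import List, Tuple


def min_max_mean(values: List[float]) -> Tuple[float, float, float]:
    """Return the smallest value, the largest value, and the mean.

    Hypothetical helper, written only to illustrate tuple returns.
    """
    if not values:
        raise ValueError("values must not be empty")
    # Comma-separated expressions after `return` are packed into one tuple.
    return min(values), max(values), sum(values) / len(values)


# Made-up sample data.
salaries = [50000.0, 60000.0, 70000.0]

# Unpack the tuple into three names in a single assignment...
low, high, average = min_max_mean(salaries)

# ...or keep the result as one tuple and index into it later.
result = min_max_mean(salaries)
```

Tuples suit this job because they are lightweight and cannot be modified after the function returns. If the number of returned fields grows, a named tuple or dataclass may be easier to read.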
What We’re Reading
The Best Way to Use AI for Learning: AI is changing how we learn, but using it effectively takes more than asking for quick summaries. This article shows how to combine AI tools with visual note-taking and reflection to deepen understanding and retain complex ideas longer.
OpenAI Updates Usage Policies: OpenAI rolled out a unified policy covering all its tools under one rulebook. The update reinforces existing principles like no harm, deception, or exploitation, and clarifies that AI can’t be used to give licensed legal or medical advice without human oversight. Asking questions? Still fine. Running an AI-only law firm? Not fine.
Why Language Models Use So Many Em-Dashes: Language models lean on em-dashes so heavily that humans who use them worry about being mistaken for AI. Surprisingly, it’s tough to prompt models to avoid them, and researchers aren’t sure why this habit persists.
Give 20%, Get $20: Time to Refer a Friend!
Now is the perfect time to share Dataquest with a friend. Gift a 20% discount, and for every friend who subscribes, earn a $20 bonus. Redeem your bonuses for digital gift cards or prepaid cards, or donate them to charity. Your choice! Click here
High-fives from Vik, Celeste, Anna P, Anna S, Anishta, Bruno, Elena, Mike, Daniel, and Brayan.