The Dataquest Download

Level up your data and AI skills, one newsletter at a time.

Each week, the Dataquest Download brings the latest behind-the-scenes developments at Dataquest directly to your inbox. Discover our top tutorial of the week to boost your data skills, get the scoop on any course changes, and pick up a useful tip to apply in your projects. We also spotlight standout projects from our students and share their personal learning journeys.

Hello, Dataquesters!

Here’s what we have for you in this edition:

Webinar Recordings: The 2-part series on predicting tech salaries using data from the 2023 Stack Overflow Developer Survey is now available. Clean, model, and interpret real-world salary data.

Top Read: Create your own data lab using Docker: an isolated, professional-grade workspace to run tools like PySpark and PostgreSQL without breaking your system.

From the Community: Explore Hacker News posting trends, learn AI-powered study strategies, see how others visualize their data, and pick up tips on paired t-tests, data engineering, and Python functions.

What We’re Reading: Smart ways to combine AI with learning, OpenAI’s new policy updates, and a fun look at why language models love em dashes (a little too much).

Webinar Recordings

Learn how to predict tech salaries using real data from the 2023 Stack Overflow Developer Survey. In this hands-on walkthrough, you’ll clean and prepare a massive survey dataset (80K+ responses), engineer features, and build a regression model to uncover what actually impacts developer pay.

  • Watch Part 1: Data cleaning, feature engineering, and LLM-powered debugging
  • Watch Part 2: Model training, evaluation, and salary insights

We recommend starting with the Machine Learning in Python path (also free this week) to build a strong foundation before diving into the project.

Top Read

Running data tools like PySpark or PostgreSQL can get messy when your local setup conflicts with dependencies or operating system versions. That’s where a Docker-based lab environment comes in. It gives you a clean, isolated space to experiment safely while mirroring real-world data engineering workflows.

In this tutorial, you’ll learn how to create a self-contained development lab for your Dataquest tutorials using Docker. You’ll see how this setup keeps your system clean, ensures every library works with the correct version, and saves you from hours of troubleshooting environment errors.

By the end, you’ll have a dedicated workspace that runs consistently across Windows, macOS, and Linux—just like the environments professional data teams use for reliable, reproducible work.

From the Community

Exploring Hacker News Data: Melanie analyzed Hacker News posts to determine the best time to post. This is an excellent beginner data analysis project, with clean code, helpful comments, and a clear narrative that perfectly answers the research question.

Learning Data with AI: Neha encourages you to share how you’re using AI to learn data skills faster or build smarter workflows. Real examples, use cases, exact prompts, tools, or step-by-step processes are all welcome. What tasks do you still prefer to do manually?

Show & Tell Your Data Story: Raisa shared her approach to creating insightful visuals for her current project. She invites you to do the same. Share a dashboard, notebook, script, or chart you’re proud of to showcase your hard work and inspire your peers.

Paired T-Tests and Descriptive Analysis in Healthcare: Sam asks for best practices in coding paired t-tests and descriptive analyses for clinical research projects. Suggestions on code efficiency, reproducibility, and clearer presentation are welcome.
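
For readers new to the topic, here is a minimal sketch of what a paired t-test computes, written with only the Python standard library. It is not taken from Sam's thread: the function name `paired_t_statistic` and the sample readings are hypothetical, made up purely for illustration. It returns the t statistic and degrees of freedom only; for a p-value you would likely reach for a statistics library such as SciPy (`scipy.stats.ttest_rel`).

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for paired measurements.

    Hypothetical helper for illustration: each subject contributes one
    value to `before` and one to `after`, in the same order.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # The test works on the per-subject differences, not the raw values.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Made-up readings for four subjects, before and after a treatment.
before = [120, 130, 125, 140]
after = [115, 128, 120, 133]
t_stat, dof = paired_t_statistic(before, after)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

Keeping the calculation in one small function, with the input checks up front, is one simple way to make an analysis like this easier to rerun and review.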

Tracking Learning Progress Across Platforms and Resources: Tarun is looking for strategies to organize, visualize, and evaluate his learning journey in a more structured way. Check out the great suggestions shared in the thread, and contribute your own methods for tracking progress.

Learning Data Engineering with Dataquest—Where to Start: Suheb asks for advice on the best order to study topics on Dataquest if he wants to focus on data engineering rather than data analysis.

Returning Multiple Values from a Python Function: James explains and demonstrates why a tuple is often the preferred way to return multiple values from a Python function.
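
As a quick illustration of the idea (a sketch of the general pattern, not James's own code), a function can return several values as a tuple and the caller can unpack them in one line. The function name `min_max_mean` is hypothetical.

```python
def min_max_mean(values):
    """Return the smallest value, largest value, and mean as a tuple."""
    if not values:
        raise ValueError("values must not be empty")
    # The comma-separated expression below builds a single tuple.
    return min(values), max(values), sum(values) / len(values)


# Tuple unpacking assigns each returned value to its own name.
lowest, highest, average = min_max_mean([3, 7, 5])
print(lowest, highest, average)
```

Because tuples are immutable and unpack cleanly, the caller gets each result under a clear name without needing a custom class or a dictionary.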

What We’re Reading

The Best Way to Use AI for Learning: AI is changing how we learn, but using it effectively takes more than asking for quick summaries. This article shows how to combine AI tools with visual note-taking and reflection to deepen understanding and retain complex ideas longer.

OpenAI Updates Usage Policies: OpenAI rolled out a unified policy covering all its tools under one rulebook. The update reinforces existing principles, such as bans on harm, deception, and exploitation, and clarifies that AI can’t be used to give licensed legal or medical advice without human oversight. Asking questions? Still fine. Running an AI-only law firm? Not fine.

Why Language Models Use So Many Em-Dashes: Language models lean on em-dashes so heavily that people who use them worry about being mistaken for AI. Surprisingly, it’s tough to prompt models to avoid them, and researchers aren’t sure why the habit persists.

Give 20%, Get $20: Time to Refer a Friend!

Now is the perfect time to share Dataquest with a friend. Gift a 20% discount, and for every friend who subscribes, earn a $20 bonus. Redeem your bonuses as digital gift cards or prepaid cards, or donate them to charity. Your choice!

High-fives from Vik, Celeste, Anna P, Anna S, Anishta, Bruno, Elena, Mike, Daniel, and Brayan.

2025-11-05

What really drives developer salaries?

Predict tech salaries, build a Docker lab for data work, explore AI learning tips, and see standout community projects this week.