The Dataquest Download

Level up your data and AI skills, one newsletter at a time.

Each week, the Dataquest Download brings the latest behind-the-scenes developments at Dataquest directly to your inbox. Discover our top tutorial of the week to boost your data skills, get the scoop on any course changes, and pick up a useful tip to apply in your projects. We also spotlight standout projects from our students and share their personal learning journeys.

Hello, Dataquesters!

Here’s what we have for you in this edition:

Webinar Recordings: The 2-part series on predicting tech salaries using data from the 2023 Stack Overflow Developer Survey is now available. Clean, model, and interpret real-world salary data. Watch now

Top Read: Create your own data lab using Docker—an isolated, professional-grade workspace to run tools like PySpark and PostgreSQL without breaking your system. Read the blog

From the Community: Explore Hacker News posting trends, learn AI-powered study strategies, see how others visualize their data, and pick up tips on paired t-tests, data engineering, and Python functions. Join the discussion

What We’re Reading: Smart ways to combine AI with learning, OpenAI’s new policy updates, and a fun look at why language models love em dashes (a little too much). Learn more

Webinar Recordings

Learn how to predict tech salaries using real data from the 2023 Stack Overflow Developer Survey. In this hands-on walkthrough, you’ll clean and prepare a massive survey dataset (80K+ responses), engineer features, and build a regression model to uncover what actually impacts developer pay.

  • Watch Part 1: Data cleaning, feature engineering, and LLM-powered debugging
  • Watch Part 2: Model training, evaluation, and salary insights

We recommend starting with the Machine Learning in Python path (also free this week) to build a strong foundation before diving into the project.

Top Read

Running data tools like PySpark or PostgreSQL can get messy when their dependencies conflict with your local setup or operating system version. That’s where a Docker-based lab environment comes in. It gives you a clean, isolated space to experiment safely while mirroring real-world data engineering workflows.

In this tutorial, you’ll learn how to create a self-contained development lab for your Dataquest tutorials using Docker. You’ll see how this setup keeps your system clean, ensures every library works with the correct version, and saves you from hours of troubleshooting environment errors.

By the end, you’ll have a dedicated workspace that runs consistently across Windows, macOS, and Linux—just like the environments professional data teams use for reliable, reproducible work.

From the Community

Exploring Hacker News Data: Melanie analyzed Hacker News posts to determine the best time to post. This is an excellent beginner data analysis project, with clean code, helpful comments, and a clear narrative that perfectly answers the research question.

Learning Data with AI: Neha encourages you to share how you’re using AI to learn data skills faster or build smarter workflows. Real examples, use cases, exact prompts, tools, or step-by-step processes are all welcome. What tasks do you still prefer to do manually?

Show & Tell Your Data Story: Raisa shared her approach to creating insightful visuals for her current project. She invites you to do the same. Share a dashboard, notebook, script, or chart you’re proud of to showcase your hard work and inspire your peers.

Paired T-Tests and Descriptive Analysis in Healthcare: Sam asks for best practices in coding paired t-tests and descriptive analyses for clinical research projects. Suggestions on code efficiency, reproducibility, and clearer presentation are welcome.
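If you want a starting point for the thread, here is a minimal sketch of a paired t-test using only Python's standard library. The function name `paired_t_statistic` and the sample numbers are made up for illustration and do not come from Sam's post. In a real project you would more likely reach for `scipy.stats.ttest_rel`, which also returns a p-value.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t statistic, degrees of freedom) for two paired samples."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # A paired test works on the per-subject differences, not the raw values.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error, n - 1


# Hypothetical measurements for four patients, before and after treatment.
before = [120, 130, 125, 140]
after = [115, 128, 120, 135]

t, df = paired_t_statistic(before, after)
print(f"t = {t:.2f}, df = {df}")  # t = -5.67, df = 3
```

Keeping the calculation in one small function, with the input checks up front, makes the analysis easier to rerun and review.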

Tracking Learning Progress Across Platforms and Resources: Tarun is looking for strategies to organize, visualize, and evaluate his learning journey in a more structured way. Check out the great suggestions shared in the thread, and contribute your own methods for tracking progress.

Learning Data Engineering with Dataquest—Where to Start: Suheb asks for advice on the best order to study topics on Dataquest if he wants to focus on data engineering rather than data analysis.

Returning Multiple Values from a Python Function: James explains and demonstrates why a tuple is often the preferred way to return multiple values from a Python function.
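As a quick illustration of the idea (our own sketch, not James's code), a function can return several values separated by commas. Python packs them into a tuple, and the caller can unpack them in one line. The function name `min_max_mean` is hypothetical.

```python
def min_max_mean(values):
    """Return the smallest value, the largest value, and the mean as a tuple."""
    total = sum(values)
    # The commas build a tuple; no parentheses are required.
    return min(values), max(values), total / len(values)


# Unpack the tuple into three names in a single assignment.
lowest, highest, average = min_max_mean([3, 7, 8])
print(lowest, highest, average)  # 3 8 6.0
```

Tuples suit this pattern because they are lightweight and immutable, so the returned group of values cannot be changed by accident.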

What We’re Reading

The Best Way to Use AI for Learning: AI is changing how we learn, but using it effectively takes more than asking for quick summaries. This article shows how to combine AI tools with visual note-taking and reflection to deepen understanding and retain complex ideas longer.

OpenAI Updates Usage Policies: OpenAI rolled out a unified policy covering all its tools under one rulebook. The update reinforces existing principles, such as prohibiting harm, deception, and exploitation, and clarifies that AI can’t be used to give licensed legal or medical advice without human oversight. Asking questions? Still fine. Running an AI-only law firm? Not fine.

Why Language Models Use So Many Em-Dashes: Language models lean on em-dashes so heavily that people who use them worry about being mistaken for AI. Surprisingly, it’s tough to prompt models to avoid them, and researchers aren’t sure why this habit persists.

Give 20%, Get $20: Time to Refer a Friend!

Now is the perfect time to share Dataquest with a friend. Gift a 20% discount, and for every friend who subscribes, earn a $20 bonus. Redeem your bonuses for digital gift cards or prepaid cards, or donate them to charity. Your choice! Click here

High-fives from Vik, Celeste, Anna P, Anna S, Anishta, Bruno, Elena, Mike, Daniel, and Brayan.

