The Dataquest Download

Level up your data and AI skills, one newsletter at a time.

Each week, the Dataquest Download brings the latest behind-the-scenes developments at Dataquest directly to your inbox. Discover our top tutorial of the week to boost your data skills, get the scoop on any course changes, and pick up a useful tip to apply in your projects. We also spotlight standout projects from our students and share their personal learning journeys.

Hello, Dataquesters!

Here’s what we have for you in this edition:

Top Read: Learn how to use ChromaDB for scalable semantic search. Load thousands of arXiv embeddings, use ANN indexing, and keep queries fast at real-world sizes. Read the blog

From the Community: Excel-powered sales cleaning and reporting, a Power BI learning report with actionable insights, and a deep dive on building more stable conversational chatbots. Join the discussion

What We’re Reading: MIT’s Project Iceberg on AI automation across 151M jobs, a three-year ChatGPT retrospective, chatting with chess games using Python plus LLMs, and the 1B-token recipe for better pre-training mixes. Learn more

In the previous embeddings tutorial, we built a semantic search system that matched research papers by meaning rather than keywords. It worked well for 500 papers. But that approach relied on brute-force comparisons, which slow down dramatically as your dataset grows. At 5,000 papers, performance drops. At 50,000 or 500,000, it becomes unusable.

This tutorial shows how ChromaDB solves that scaling problem. You’ll load thousands of arXiv embeddings, build a vector database with ANN indexing, and run semantic searches that stay fast even with large collections. It’s your next step toward production-ready search systems.
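To make the scaling problem concrete, here is a minimal sketch of the brute-force approach described above, written in plain Python. It is not ChromaDB's API or the tutorial's code: the function names (`cosine_similarity`, `brute_force_search`) and the random 8-dimensional vectors standing in for arXiv embeddings are assumptions made for illustration only.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def brute_force_search(query, embeddings, top_k=3):
    """Score the query against every stored vector, then keep the best top_k.

    Each query costs one similarity computation per stored vector,
    so the work grows linearly with the size of the collection.
    """
    scored = [
        (cosine_similarity(query, vec), idx)
        for idx, vec in enumerate(embeddings)
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(idx, score) for score, idx in scored[:top_k]]


# Hypothetical stand-in data: 500 random vectors instead of real embeddings.
rng = random.Random(0)
papers = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(500)]

# Use paper 42 as the query; it should come back as its own best match.
query = papers[42]
results = brute_force_search(query, papers, top_k=3)
print(results)
```

Every query here touches all 500 vectors, and the same loop would touch all 500,000 in a larger collection. An ANN index avoids that full scan by searching only a promising subset of vectors, usually at the cost of occasionally missing an exact nearest neighbor.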

From the Community

Sales Data Cleaning and Manufacturer Analysis: In his portfolio-level individual project, Israel used advanced Excel formulas to transform a highly unstructured dataset into a clean, analysis-ready table and built a dynamic, end-to-end workflow for automated sales reporting.

Dataquest Learning Report: Nisha’s Power BI project explores the data from multiple angles and features high-quality, insightful visualizations of Dataquest lesson completions, along with actionable recommendations to improve completion rates.

Developing and Training AI-powered Conversational Chatbots: Kritika examines the key factors that shape the performance of AI-driven conversational chatbots and contribute to more stable behavior in real-world interactions.

What We're Reading

MIT’s Project Iceberg: This study maps the entire U.S. workforce (151M jobs and 32,000 skills) and finds that 11.7% of jobs are already automatable with current AI tools. This is the clearest data-driven case yet for why reskilling and upskilling matter now.

Three Years of ChatGPT—A Retrospective (2022–2025): This article breaks down the current AI landscape: what’s working, where the challenges lie, and why productivity gains haven’t fully arrived (yet).

Talk to Your Chess Games with Python + LLMs: PyBay 2025 recently published all of its sessions on YouTube, covering a wide range of Python-related topics. In this video, the presenter showcases a program that connects ChatGPT to a chess engine, so you can ask in conversation why certain engine lines work or don’t work.

The 1 Billion Token Challenge—Finding the Perfect Pre-training Mix: This article breaks down how the right blend of training data can boost model quality without inflating dataset size. It highlights a practical strategy for mixing sources that produces strong results with far less data, which makes it a compelling read for anyone interested in how modern LLMs are really built.

Give 20%, Get $20: Time to Refer a Friend!

Now is the perfect time to share Dataquest with a friend. Gift a 20% discount, and for every friend who subscribes, earn a $20 bonus. Redeem your bonuses for digital gift cards or prepaid cards, or donate them to charity. Your choice! Click here

High-fives from Vik, Celeste, Anna P, Anna S, Anishta, Bruno, Elena, Mike, Daniel, and Brayan.
