The Dataquest Download

Level up your data and AI skills, one newsletter at a time.

Each week, the Dataquest Download brings the latest behind-the-scenes developments at Dataquest directly to your inbox. Discover our top tutorial of the week to boost your data skills, get the scoop on any course changes, and pick up a useful tip to apply in your projects. We also spotlight standout projects from our students and share their personal learning journeys.

Hello, Dataquesters!

Here’s what we have for you in this edition:

Top Read: How to chunk long documents for high-quality semantic search. Compare strategies, measure recall vs. relevance, and pick the right approach for your content.

Webinar Recording: Build a TensorFlow model to predict IPO listing gains, from data exploration to preprocessing and modeling.

From the Community: An interactive Excel e-commerce dashboard, practical fixes for pandas’ SettingWithCopyWarning, a question about tracking mesoscale convective systems with Python, and a note-taking workflow using Jupyter notebooks in VS Code.

What We’re Reading: Three lessons that turn metrics into action, leadership takeaways on AI and observability on Google Cloud, and how agentic AI is reshaping enterprise work.

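Top Read

Embeddings worked smoothly when each document was a short abstract. Full research papers, technical docs, and long-form guides are a different story. They are too large to embed well as a single vector: they can exceed an embedding model’s input limit, and one vector that covers many topics tends to match specific queries poorly. That’s where search quality starts to break down unless you chunk intelligently.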

In this tutorial, you’ll learn how to split long documents into chunks that work well for vector search. You’ll implement multiple chunking strategies, evaluate them systematically, and understand the tradeoffs between recall, relevance, and performance. By the end, you’ll know how to choose a chunking approach that fits your content and your search goals.
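One of the simplest strategies to start with is fixed-size chunking with overlap. The sketch below is a minimal illustration, not the tutorial’s own code. It assumes whitespace-separated words as a rough stand-in for model tokens, and the function name `chunk_words` and its default sizes are hypothetical choices made for this example.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words, so text near a chunk
    boundary also appears in the neighboring chunk.
    Assumption: words approximate tokens; a real pipeline would
    likely count tokens with the embedding model's tokenizer.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far each window moves forward
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window already reaches the end of the text,
        # so we don't emit a trailing chunk made only of overlap.
        if start + chunk_size >= len(words):
            break
    return chunks


# Usage: a toy 10-word "document", 4-word chunks, 1 word of overlap
doc = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(doc, chunk_size=4, overlap=1):
    print(chunk)
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
```

Overlap gives a sentence that straddles a boundary a better chance of appearing intact in one of the neighboring chunks, at the cost of storing and embedding some text twice. Each chunk would then be embedded and indexed on its own, which is where the recall, relevance, and performance tradeoffs come into play.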

Webinar Recording

Watch the recording to learn how to build a deep learning model in TensorFlow that predicts IPO listing gains, applying skills in data exploration, visualization, preprocessing, and modeling.

From the Community

E-commerce Analytics Interactive Excel Dashboard: Israel’s solo Excel project stands out for its wide variety of visualizations, which clearly present global product sales performance and support analysis from multiple perspectives.

Dealing with SettingWithCopyWarning in Pandas: Alla shared two useful resources on resolving SettingWithCopyWarning, which pandas raises when you assign values to what may be a copy of a DataFrame or Series slice rather than the original object.

Python-based Mesoscale Convective System Tracker Application: Femi, a Python beginner, asks a domain-specific meteorology question: how to use a Python tool to track mesoscale convective systems.

Using a VSCode Jupyter Notebook to Retain Learned Concepts: Tomaz shared his approach to learning data science by taking notes in a Jupyter Notebook within Visual Studio Code, helping him keep track of what he has learned and document his daily work.

What We're Reading

Three Machine Learning Lessons: Many teams sit on mountains of data yet still feel unsure about what to do next. This piece reveals how small shifts in mindset and practice can turn ordinary metrics into insights that actually move people to act.

The Future of AI, LLMs, and Observability on Google Cloud: Discover seven key insights for leaders from a discussion with Google’s Director of AI, Dr. Ali Arsanjani, and Datadog’s VP of Engineering, Sajid Mehmood.

Inside the Agentic AI shift: A new report from Thoughtworks and WIRED explores how enterprises are using AI agents to drive real results, manage risks, and stay ahead in the next wave of AI disruption.

Give 20%, Get $20: Time to Refer a Friend!

Now is the perfect time to share Dataquest with a friend. Gift a 20% discount, and earn a $20 bonus for every friend who subscribes. Redeem your bonuses for digital gift cards or prepaid cards, or donate them to charity. Your choice!

High-fives from Vik, Celeste, Anna P, Anna S, Anishta, Bruno, Elena, Mike, Daniel, and Brayan.

2025-12-10

Stop embedding entire docs (do this instead)

Learn document chunking for better vector search and explore community tips from Excel dashboards to pandas fixes, plus insights on AI and ML.