The Dataquest Download

Level up your data and AI skills, one newsletter at a time.

Each week, the Dataquest Download brings the latest behind-the-scenes developments at Dataquest directly to your inbox. Discover our top tutorial of the week to boost your data skills, get the scoop on any course changes, and pick up a useful tip to apply in your projects. We also spotlight standout projects from our students and share their personal learning journeys.

Hello, Dataquesters!

Here’s what we have for you in this edition:

Top Read: Take your embedding skills further by learning how to measure semantic similarity and build a functional AI-powered search engine, no keyword matching required.

From the Community: Explore multidisciplinary NLP research in Telugu, traffic-pattern analysis, clean-code best practices, domain-specific R projects, and discussions of data type pitfalls and global healthcare innovation. You can also share your ideas for beginner Python projects and recommend forums for interview prep.

What We’re Reading: LLM poisoning explained, plus a practical talk on building a personalized “Spotify Wrapped” using Elasticsearch and time-series analysis.

In our previous tutorial, we generated embeddings for 500 arXiv papers—now it’s time to put them to work. In this tutorial, you’ll learn how to measure meaning, not keywords, by comparing vectors using three essential similarity metrics. Then you’ll build a functional semantic search engine that ranks papers based on relevance, not literal matches. If you want to move from “I have embeddings” to “I can build real AI search systems,” this is your next step.
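The tutorial summary above doesn't name its three metrics. As a rough sketch, assuming they are the commonly used dot product, Euclidean distance, and cosine similarity, the Python below compares vectors with each one and ranks documents by cosine similarity to a query. The paper titles and three-dimensional vectors are made-up placeholders, not the 500 arXiv embeddings from the tutorial, and `search` is a hypothetical helper name.

```python
import math
from typing import List, Mapping, Sequence, Tuple


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; larger generally means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors; smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the vector lengths (range -1 to 1)."""
    norm_product = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    if norm_product == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot_product(a, b) / norm_product


def search(query_vec: Sequence[float],
           docs: Mapping[str, Sequence[float]],
           top_k: int = 2) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank documents by cosine similarity, highest first."""
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Placeholder embeddings; real ones come from an embedding model and
# usually have hundreds of dimensions.
papers = {
    "Transformers for text classification": [0.9, 0.1, 0.0],
    "Graph neural networks for molecules": [0.1, 0.8, 0.3],
    "Attention mechanisms in NLP": [0.8, 0.2, 0.1],
}
query = [0.85, 0.15, 0.05]  # stand-in for an embedded search query

for title, score in search(query, papers):
    print(f"{score:.3f}  {title}")
```

If the embeddings are normalized to unit length, ranking by `dot_product` gives the same order as cosine similarity; ranking by `euclidean_distance` also gives the same order, but you sort ascending because smaller distances mean closer matches.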

From the Community

The Metrical Poetry in Telugu: Boddu shared a research project featuring a new Python library and a website that implement metrical poetry in Telugu, a language of India. The project is an excellent example of multidisciplinary work, combining computer science and linguistics to explore the characteristics of a language.

Beating the Queue—Exploring the I-94 Dataset: Melanie’s project showcases a meaningful and eye-catching title, an in-depth exploration of the effects of weather conditions, weekdays, and holiday seasons on traffic, an easy-to-read narrative, and a concise summary of the key factors that help predict traffic patterns.

Forums for Data Science Interview Discussions: Sagar is looking for dedicated communities and platforms where data scientists actively share their interview experiences and lessons learned, to make interview preparation easier and to get a sense of what to expect from companies.

Looking for Project Ideas for Python Beginners: Suheb is asking for ideas for small, simple Python projects to build right after learning the basics, as a way to practice new skills and strengthen understanding.

Writing Clean and Readable Python Code: Join your peers in a discussion about best practices for writing Python code that is not only technically correct but also easy to read, understand, and maintain—both for its author and for current and future colleagues.

Data Science Resources and Real-World Projects: Artur shared a collection of helpful data science resources (t-test functions, linear regression, and Quarto), along with three real-world, domain-specific research projects in R on bioinformatics topics that he has personally worked on.

Python Data Type Conversion Pitfalls: Check out this discussion to explore the kinds of mistakes that can occur when converting data in Python from one type to another (such as strings to numbers or floats to integers), and how to prevent such issues.

Building in Healthcare, Longevity Science, and Workforce Development: Venkatesh is working on ambitious global projects aimed at making healthcare affordable for all, combating aging, and creating opportunities for underrepresented entrepreneurs—and is open to collaboration.
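The data type conversion discussion above mentions strings to numbers and floats to integers. As an illustration that is not drawn from the thread itself, the Python below shows a few well-known pitfalls and one possible defensive approach. `parse_number` and `parse_bool` are hypothetical helper names, and treating commas as thousands separators is an assumption that does not hold in locales where the comma is the decimal mark.

```python
import math
from typing import Optional

# Pitfall 1: int() will not parse a string that contains a decimal point.
try:
    int("3.7")
except ValueError as err:
    print("int('3.7') failed:", err)

# Pitfall 2: int() on a float truncates toward zero instead of rounding.
print(int(3.9), int(-3.9))      # 3 -3
# round() uses round-half-to-even, so .5 values do not always round up.
print(round(2.5), round(3.5))   # 2 4

# Pitfall 3: any non-empty string is truthy, including "False".
print(bool("False"))            # True


def parse_number(text: str) -> Optional[float]:
    """Convert a string to float, or return None if it is not a usable number.

    Assumes ',' is only ever a thousands separator (locale-dependent!).
    Rejects NaN and infinity, which float() otherwise accepts.
    """
    cleaned = text.strip().replace(",", "")
    try:
        value = float(cleaned)
    except ValueError:
        return None
    if math.isnan(value) or math.isinf(value):
        return None
    return value


def parse_bool(text: str) -> bool:
    """Map common true/false spellings to a bool; raise on anything else."""
    normalized = text.strip().lower()
    if normalized in {"true", "yes", "1"}:
        return True
    if normalized in {"false", "no", "0"}:
        return False
    raise ValueError(f"not a boolean: {text!r}")


print(parse_number("1,234.5"), parse_number("abc"), parse_bool("False"))
```

Returning `None` for bad input (rather than raising) makes it easy to count or inspect failed conversions in a dataset before deciding how to handle them.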

What We're Reading

LLM poisoning: Researchers are discovering that even small changes to a model’s training data can secretly influence how it behaves. This article explains how “LLM poisoning” works, why it matters for the safety of AI systems, and what steps might prevent it in the future.

Building my own (accurate!) Spotify Wrapped: The EuroPython 2025 conference recently made all of its sessions available on YouTube. In this session, the speaker builds her own version of “Spotify Wrapped,” using Elasticsearch to analyze trends and insights in her own user-generated music data. Working through queries, filters, aggregations, visualizations, and time-series analysis, she shows how search analytics can be applied to everyday use cases.

Give 20%, Get $20: Time to Refer a Friend!


Now is the perfect time to share Dataquest with a friend. Gift a 20% discount, and earn a $20 bonus for every friend who subscribes. Redeem your bonuses for digital gift cards or prepaid cards, or donate them to charity. Your choice!

High-fives from Vik, Celeste, Anna P, Anna S, Anishta, Bruno, Elena, Mike, Daniel, and Brayan.

2025-11-20

Build a real semantic search engine

Learn semantic similarity, build an AI search engine, explore community NLP and traffic insights, and read fresh takes on LLM poisoning.