The Dataquest Download
Level up your data and AI skills, one newsletter at a time.
Hello, Dataquesters!
Here’s what we have in store for you in this edition:
Top Read: Learn how to combine SQL and Python in PySpark to analyze decades of U.S. Census data with both flexibility and speed. Learn more
Project Walkthrough: Watch how to uncover traffic patterns on I-94 using real-world Python data viz. Watch now
From the Community: See how learners are visualizing life expectancy trends, predicting insurance costs, and choosing the right charts. Join the discussion
New Resources: Analyze Crunchbase startup funding data using Python and SQL. Learn efficient data processing and turn large CSVs into a fast, queryable SQLite database. Start the project
Use SQL or Python? With PySpark, You Don’t Have to Choose
Spark SQL lets you run familiar SQL queries on massive datasets using PySpark, without sacrificing performance. In this tutorial, you’ll analyze U.S. Census data across four decades, learn how to register DataFrames as SQL views, and build end-to-end analysis pipelines using both SQL and DataFrame syntax. Ideal for data pros who want flexibility without giving up speed.
Project Walkthrough
In this project, you’ll step into the role of a data analyst and uncover what causes heavy traffic on Minnesota’s I-94 highway. In this hands-on session, you’ll explore real-world traffic data with Python, pandas, matplotlib, and seaborn, visualizing patterns across time, weather, and more.
From the Community
Life Expectancy and GDP Variation: Aleks used Power BI to explore global data from 1800 to 2010, creating powerful visuals that bring historical trends to life.
Predicting Insurance Costs: Steve’s machine learning project stands out for its detailed analysis, well-commented steps, stunning plots, and thoughtful exploration of model results.
Choosing the Right Chart for Your Data: Linky shared a helpful guide on how to select the best type of chart for different data tasks.
Using sqlite3 in Python: Linky explains how to use Python’s built-in sqlite3 library to run SQL queries and load results into pandas DataFrames in Jupyter Notebook.
Using the n() Function in R: Raisa gives a clear explanation of how to correctly apply the n() function within dplyr verbs in R.
DQ Resources
Analyzing Startup Fundraising Deals from Crunchbase: Learn how to process large CSV files efficiently by chunking, optimizing memory, and using encoding strategies, then turn the data into a fast, queryable SQLite database. Learn more
Answering Business Questions Using SQL: Learn how to analyze a digital music store’s data using SQL. This hands-on project with the Chinook database walks through real-world SQL tasks like tracking sales, evaluating employees, and spotting growth opportunities using CTEs, subqueries, and more. Learn more
Customer Segmentation Using K-Means Clustering: Discover how to group customers by behavior and demographics using K-means clustering. Learn to identify key segments, unlock actionable insights, and target your marketing more effectively. Learn more
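To make the chunk-then-load idea from the Crunchbase project a little more concrete, here is a minimal sketch using only Python's standard library. The helper name `load_csv_in_chunks`, the table name `deals`, and the sample rows are hypothetical; the actual project may well use pandas and a different schema. The closing query uses a CTE, in the spirit of the Chinook project.

```python
import csv
import io
import itertools
import sqlite3


def load_csv_in_chunks(csv_file, conn, table, chunk_size=2):
    """Insert CSV rows into a SQLite table a few rows at a time (hypothetical helper)."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')

    total = 0
    while True:
        # Only chunk_size rows are held in memory at once.
        chunk = list(itertools.islice(reader, chunk_size))
        if not chunk:
            break
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', chunk)
        conn.commit()
        total += len(chunk)
    return total


# Made-up sample data standing in for a large CSV file on disk.
sample = io.StringIO("company,amount\nacme,100\nbeta,250\nacme,50\ngamma,75\nbeta,25\n")

conn = sqlite3.connect(":memory:")
loaded = load_csv_in_chunks(sample, conn, "deals")

# CTE that totals funding per company; values were stored as text, so cast first.
query = """
WITH totals AS (
    SELECT company, SUM(CAST(amount AS INTEGER)) AS total
    FROM deals
    GROUP BY company
)
SELECT company, total FROM totals ORDER BY total DESC
"""
top = conn.execute(query).fetchall()
conn.close()

print(loaded)
print(top)
```

With a real file, the `io.StringIO` object would be replaced by an open file handle and `chunk_size` raised to something in the thousands.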
What We're Reading
AI-Assisted Container Deployments with Amazon ECS: Learn how Amazon Q and the MCP Server bring AI into container deployment, offering automation tips for those diving into cloud-based workflows.
Interactive Data Exploration with Rerun: Explore a new Python library, Rerun, that makes it easier to debug and visualize computer vision pipelines with OpenCV, complete with code walkthroughs.
LLMs Don’t Just Predict the Next Word: This opinionated yet insightful read argues that LLMs are evolving into goal-driven agents, thanks to RLHF and instruction tuning. Worth a look if you’re learning how AI models really work.
Give 20%, Get $20: Time to Refer a Friend!
Now is the perfect time to share Dataquest with a friend. Gift a 20% discount, and for every friend who subscribes, earn a $20 bonus. Redeem your bonuses for digital gift cards or prepaid cards, or donate them to charity. Your choice! Click here
High-fives from Vik, Celeste, Anna P, Anna S, Anishta, Bruno, Elena, Mike, Daniel, and Brayan.