How much have I spent on Amazon? That’s a scary question, but if you want to know the answer, here’s how you can find it…and a lot more!

Pandas is a Python library that can make data analysis much simpler. In this tutorial, we’ll use Python and pandas to analyze video game data.

Use this tutorial to learn how to create your first Jupyter Notebook, pick up important terminology, and see how easily notebooks can be shared and published online.

Learn how Dataquest’s philosophy sets our platform apart from other data science learning tools, and what we’ve learned from years of teaching data science.

If you’ve already mastered the basics of iterating through Python lists, take it to the next level and learn to use for loops in pandas, numpy, and more!

Learn about machine learning in Python and build your very first ML model from scratch to predict Airbnb prices using k-nearest neighbors.

Learn data cleaning for machine learning by cleaning and preparing loan data from LendingClub for a predictive analytics project.

Learn to do some text analysis in this Python tutorial, and test hypotheses using confidence intervals to ensure your conclusions are significant.

Learn text classification with linear regression in Python, using the spaCy package, in this free machine learning tutorial.

Data cleaning might not be the reason you got interested in data science, but if you’re going to be a data scientist, no skill is more crucial. Learn how to clean data with Python and pandas in our new course.

If you’re doing data science in Python, notebooks are a powerful tool. This free Jupyter Notebooks tutorial will help you get the best out of Jupyter.

At Dataquest, we strongly advocate portfolio projects as a means of getting a first data science job. In this blog post, we’ll walk you through an example portfolio project. The project is part of our Statistics Intermediate: Averages and Variability course, and it assumes familiarity with: sampling (populations, samples, sample representativity); frequency distributions; box plots […]

At Dataquest, we strongly advocate portfolio projects as a means of getting your first data science job. In this blog post, we’ll walk you through an example portfolio project. The project is part of our Statistics Fundamentals course, and it assumes some familiarity with: sampling (simple random sampling, populations, samples, parameters, statistics); variables; frequency distributions […]

In celebration of Women’s History Month, I wanted to better understand the scale of the Women’s Marches that occurred in January 2017. Shortly after the marches, Vox published a map visualizing the estimated turnout across the entire country. This map is excellent at displaying: locations with the highest relative turnouts; hubs and clusters of where […]

Pandas plotting methods provide an easy way to plot pandas objects. Often though, you’d like to add axis labels, which involves understanding the intricacies of Matplotlib syntax. Thankfully, there’s a way to do this entirely using pandas. Let’s start by importing the required libraries: import pandas as pd import numpy as np import matplotlib.pyplot as […]
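As a rough illustration of the idea in the excerpt above, here is a minimal sketch that labels a plot without any separate pyplot calls. It relies on the fact that `DataFrame.plot` returns an Axes object, and sets the labels on that object. The data, column names, and label text are invented for this example, and the full article may take a different route.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical data, invented purely for illustration.
df = pd.DataFrame(
    {"year": [2015, 2016, 2017, 2018], "sales": [10, 14, 9, 17]}
)

# DataFrame.plot returns the Axes it drew on; the title is passed
# directly to plot, and the axis labels are set on the returned object.
ax = df.plot(x="year", y="sales", title="Sales by year")
ax.set_xlabel("Year")
ax.set_ylabel("Sales (units)")
```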

In this tutorial, we walk through several methods of combining data tables (concatenation) using pandas and Python, working with labor market data.
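For readers who want a quick feel for concatenation before opening the tutorial, the sketch below stacks two small tables with `pd.concat`. The tables, column names, and values are hypothetical stand-ins, not the labor market data the tutorial uses.

```python
import pandas as pd

# Hypothetical monthly tables with made-up values (not the tutorial's data).
jan = pd.DataFrame({"state": ["CA", "TX"], "unemployment_rate": [4.2, 3.9]})
feb = pd.DataFrame({"state": ["CA", "TX"], "unemployment_rate": [4.1, 4.0]})

# Stack the rows of both tables; ignore_index=True renumbers the
# combined rows 0..n-1 instead of repeating each table's own index.
combined = pd.concat([jan, feb], ignore_index=True)
print(combined)
```

Passing `ignore_index=True` is a choice made here for a clean row index; leaving it out keeps each table's original index values.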

In this tutorial, we’ll learn to work with Excel files in Python using pandas, covering everything from setting up your computer to moving and visualizing data.

In this tutorial, learn how to use regular expressions and the pandas library to manage large data sets during data analysis.

Kaggle is a site where people create algorithms and compete against machine learning practitioners around the world. Your algorithm wins the competition if it’s the most accurate on a particular data set. Kaggle is a fun way to practice your machine learning skills. This tutorial is based on part of our free, four-part course: Kaggle […]

The pandas workflow is a common favorite among data analysts and data scientists. It works well when: the data fits in memory (a few gigabytes but not terabytes); the data is relatively static (doesn’t need to be loaded into memory every minute because the data has changed) […]