A collection of the best places to find free data sets for data visualization, data cleaning, machine learning, and data processing projects.
Understand Dataquest’s teaching principles and make an informed choice about using the platform on your journey learning data science.
Public education, at least in the US, exposes you to only one way of learning: a top-down hierarchy and assignments. While some people thrive in this environment, I am not one of them. After college, I was thrown into the world with a 2.1 GPA, no real skills, and no idea of what to do next. […]
I learned machine learning through competing in Kaggle competitions. I entered my first competitions in 2011, with almost no data science knowledge. I soon ended up in fifth place out of a hundred or so in a stock trading competition. Over the next year, I won several competitions on automated essay scoring and bond price […]
Data science is one of the most buzzed-about fields right now, and data scientists are in extreme demand. And with good reason: data scientists are doing everything from creating self-driving cars to automatically captioning images. Given all the interesting applications, it makes sense that data science is a very sought-after career. Data science […]
If you’ve ever wanted to learn Python online with streaming data, or data that changes quickly, you may be familiar with the concept of a data pipeline. Data pipelines allow you to transform data from one representation to another through a series of steps. Data pipelines are a key part of data engineering, which we teach […]
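As a rough illustration of "a series of steps", here is a minimal sketch of a pipeline built from plain Python generators. The step names (`parse_lines`, `to_numbers`, `total_by_name`) and the sample records are hypothetical and are not taken from the post itself; each step consumes the previous step's output and hands a new representation to the next.

```python
def parse_lines(lines):
    """Step 1: split each raw 'name,value' line into a (name, value-string) pair."""
    for line in lines:
        name, value = line.strip().split(",")
        yield name, value


def to_numbers(records):
    """Step 2: convert the value string of each record to a float."""
    for name, value in records:
        yield name, float(value)


def total_by_name(records):
    """Step 3: aggregate the numeric values per name into a dictionary."""
    totals = {}
    for name, value in records:
        totals[name] = totals.get(name, 0.0) + value
    return totals


# Hypothetical raw input, standing in for lines arriving from a file or stream.
raw = ["a,1.5\n", "b,2\n", "a,0.5\n"]

# Chain the steps: raw text -> string pairs -> numeric pairs -> totals.
totals = total_by_name(to_numbers(parse_lines(raw)))
print(totals)
```

Because the first two steps are generators, records flow through one at a time, which is one simple way to handle data that keeps arriving.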
Yes, you read correctly — this post will only give you 1 tip. I know most posts like this have 5 or more tips. I once saw a post with 15 tips, but I may have been daydreaming at the time. You’re probably wondering what makes this 1 tip so special. “Vik”, you may ask, […]
This is the fifth and final post in a series of posts on how to build a Data Science Portfolio. In the previous posts in our portfolio series, we talked about how to build a storytelling project, how to create a data science blog, how to create a machine learning project, and how to construct […]
When I launched Dataquest a little under two years ago, one of the first things I did was write a blog post about why. At the time, if you wanted to become a data scientist, you were confronted with dozens of courses on sites like edX or Coursera with no easy path to getting a […]
You’ve done it. You just spent months learning how to analyze data and make predictions. You’re now able to go from raw data to well-structured insights in a matter of hours. After all that effort, you feel like it’s time to take the next step, and get your first data science job. Unfortunately for […]
We covered a lot of ground in Part 1 of our pandas tutorial. We went from the basics of pandas DataFrames to indexing and computations. If you’re still not confident with pandas, you might want to check out the Dataquest pandas course. In this tutorial, we’ll dive into one of the most powerful aspects of […]
Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages, and makes importing and analyzing data much easier. Pandas builds on packages like NumPy and matplotlib to give you a single, convenient place to do most of your data analysis […]
Don’t miss our FREE NumPy cheat sheet at the bottom of this post. NumPy is a commonly used Python data analysis package. By using NumPy, you can speed up your workflow, and interface with other packages in the Python ecosystem, like scikit-learn, that use NumPy under the hood. NumPy was originally developed in the mid […]
In this post, you’ll learn to query, update, and create SQLite databases in Python, and how to use the pandas package to speed up your workflow.
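To give a flavor of the create, update, and query steps mentioned here, the following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `airlines` table and its rows are hypothetical examples, not data from the post.

```python
import sqlite3

# Create a throwaway in-memory database (nothing is written to disk).
conn = sqlite3.connect(":memory:")

# Create: define a hypothetical table and insert two example rows.
conn.execute(
    "CREATE TABLE airlines (id INTEGER PRIMARY KEY, name TEXT, active TEXT)"
)
conn.executemany(
    "INSERT INTO airlines (name, active) VALUES (?, ?)",
    [("Test Air", "Y"), ("Example Jet", "N")],
)

# Update: mark one airline as active, using placeholders for the values.
conn.execute(
    "UPDATE airlines SET active = ? WHERE name = ?", ("Y", "Example Jet")
)
conn.commit()

# Query: read the rows back as a list of tuples.
rows = conn.execute("SELECT name, active FROM airlines ORDER BY id").fetchall()
print(rows)

conn.close()
```

With pandas installed, the same `SELECT` could be loaded into a DataFrame by passing the query string and an open connection to `pandas.read_sql_query`; that step is left out here to keep the sketch dependency-free.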