How to Generate FiveThirtyEight Graphs in Python

If you read data science articles, you may have already stumbled upon FiveThirtyEight’s content. Naturally, you were impressed by their awesome visualizations. You wanted to make your own awesome visualizations and so asked Quora and Reddit how to do it. You received some answers, but they were rather vague. You still can’t get the graphs done yourself. In this post, we’ll help you. Using Python’s matplotlib and pandas, we’ll see that it’s rather easy to... »
Alexandru Olteanu in tutorials

Machine Learning Fundamentals: Predicting Airbnb Prices

Machine learning is easily one of the biggest buzzwords in tech right now. Over the past three years, Google searches for “machine learning” have increased by over 350%. But understanding machine learning can be difficult: you either use pre-built packages that act like ‘black boxes’, where you pass in data and magic comes out the other end, or you have to deal with high-level maths and linear algebra. This tutorial is designed to... »
Josh Devlin in tutorials

What's New in v1.29: New Mission Interface, PayPal and more!

Our version 1.29 release is here and includes lots of new features to help enhance your learning experience. Over the past few months we’ve been tirelessly talking to students like you to learn how we can improve the mission interface. With this release, we are unveiling the results of this hard work. Other big changes in 1.29 include: improved feedback for incorrect answers; we now accept PayPal, so international students can subscribe more easily; Four... »
Josh Devlin in updates

Python Cheat Sheet for Data Science: Intermediate

The tough thing about learning data science is remembering all the syntax. While at Dataquest we advocate getting used to consulting the Python documentation, sometimes it’s nice to have a handy reference, so we’ve put together this cheat sheet to help you out! This cheat sheet is the companion to our Python Basics Data Science Cheat Sheet. If you’re interested in learning Python, we have a free Python Programming:... »
Josh Devlin in resources and guides

How to get your first job as a data scientist

Many aspiring data scientists focus on doing Kaggle competitions as a way to build their portfolios. Kaggle is an excellent way to practice, but it should only be one of many avenues you use to work on data science projects. This is because Kaggle competitions only focus on a narrow part of data science work. To be more specific: Kaggle mostly deals with machine learning, which is only one aspect of data science. When you... »
Josh Devlin in careers

SQL Intermediate: PostgreSQL, Subqueries and more!

If you’re in the early phases of learning SQL and have completed one or more introductory-level courses, you’ve probably learned most of the fundamentals and possibly even some high-level database concepts. As you prepare to embark on the next phase of learning SQL, it’s important to understand not only SQL itself, but also the engine that makes it all possible: the database. In most introductory-level courses, you’ll typically use some sort of embedded database... »
Eric Sundby in tutorials, postgresql, and sql

Using pandas with large data

Tips for reducing memory usage by up to 90%. When working with small data (under 100 megabytes) in pandas, performance is rarely a problem. When we move to larger data (100 megabytes to multiple gigabytes), performance issues can make run times much longer and cause code to fail entirely due to insufficient memory. While tools like Spark can handle large data sets (100 gigabytes to multiple terabytes), taking full advantage of their capabilities usually requires... »
Josh Devlin in tutorial
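
As a rough illustration of the kind of memory saving the excerpt above alludes to, the sketch below downcasts integer columns and converts a repetitive text column to the category dtype. The sample data, the column names, and the `shrink` helper are invented for illustration and are not taken from the tutorial; actual savings depend heavily on the data.

```python
import pandas as pd


def shrink(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with smaller dtypes."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            # Pick the smallest integer type that can hold the column's values.
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_string_dtype(out[col]) and out[col].nunique() < 0.5 * len(out):
            # Few distinct strings: store each value once and keep integer codes.
            out[col] = out[col].astype("category")
    return out


# Made-up sample data: 3,000 rows with only three distinct values per column.
df = pd.DataFrame({
    "year": [2015, 2016, 2017] * 1000,
    "team": ["NYY", "BOS", "LAD"] * 1000,
})

before = df.memory_usage(deep=True).sum()   # bytes, including string contents
small = shrink(df)
after = small.memory_usage(deep=True).sum()

print(f"before: {before} bytes, after: {after} bytes")
```

Passing `deep=True` matters here: without it, pandas does not count the memory held by the Python strings themselves, so the saving on text columns would be understated.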

Python Cheat Sheet for Data Science: Basics

It’s common when first learning Python for Data Science to have trouble remembering all the syntax that you need. While at Dataquest we advocate getting used to consulting the Python documentation, sometimes it’s nice to have a handy reference, so we’ve put together this cheat sheet to help you out! This cheat sheet is the companion to our Python Intermediate Data Science Cheat Sheet. If you’re interested in... »
Josh Devlin in resources and guides

Should I learn Python 2 or 3?

One of the biggest sources of confusion and misinformation for people wanting to learn Python is which version they should learn. Should I learn Python 2.x or Python 3.x? Indeed, this is one of the questions we are asked most often at Dataquest, where we teach Python as part of our Data Science curriculum. This post gives some context behind the question, explains the perspective, and tells you which version you should... »
Josh Devlin in python, data, and science

Understanding SettingWithCopyWarning in pandas

SettingWithCopyWarning is one of the most common hurdles people run into when learning pandas. A quick web search will reveal scores of Stack Overflow questions, GitHub issues and forum posts from programmers trying to wrap their heads around what this warning means in their particular situation. It’s no surprise that many struggle with this; there are so many ways to index pandas data structures, each with its own particular nuance, and even pandas itself does... »
Benjamin Pryke in python
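
A minimal sketch of the situation the excerpt above describes, assuming a small made-up DataFrame (the column names are illustrative, not from the article). It contrasts chained indexing, which is the pattern that typically triggers the warning in pandas versions that emit it, with a single `.loc` assignment.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"price": [10, 25, 40], "sold": [False, False, False]})

# Chained indexing: df[df["price"] > 20] produces a temporary copy, so
#     df[df["price"] > 20]["sold"] = True
# assigns into that copy and can leave df unchanged. This is the pattern
# SettingWithCopyWarning is meant to flag.

# A single .loc call selects the rows and the column in one step,
# so the assignment is applied to df itself.
df.loc[df["price"] > 20, "sold"] = True

print(df)
```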

Web Scraping with Python and BeautifulSoup

To source data for data science projects, you’ll often rely on SQL and NoSQL databases, APIs, or ready-made CSV data sets. The problem is that you can’t always find a data set on your topic, databases are not kept current and APIs are either expensive or have usage limits. If the data you’re looking for is on a web page, however, then the solution to all these problems is web scraping. In this tutorial we’ll... »
Alexandru Olteanu in tutorials and python

The tips and tricks I used to succeed on Kaggle

I learned machine learning through competing in Kaggle competitions. I entered my first competitions in 2011, with almost no data science knowledge. I soon ended up in fifth place out of a hundred or so in a stock trading competition. Over the next year, I won several competitions on automated essay scoring and bond price prediction, and placed well in others. Kaggle competitions require a unique blend of skill, luck, and teamwork to win. The... »
Vik Paruchuri in data, science, and kaggle

What's the difference between a data analyst, scientist and engineer?

Data is increasingly shaping the systems that we interact with every day. Whether you’re using Siri, searching Google, or browsing your Facebook feed, you’re consuming the results of data analysis. Given its transformational ability, it’s no wonder that so many data-related roles have been created in the past few years. The responsibilities of these roles range from predicting the future, to finding patterns in the world around you, to building systems that manipulate millions of... »
James Lee in careers

SQL Basics: Working with Databases

SQL, pronounced “sequel” (or ess-cue-ell, if you prefer), is a very important tool for data scientists to have in their repertoire. You may well have heard the name and wondered what it is, how it works and whether you should learn it. To put it simply, SQL (Structured Query Language) is the language of databases and almost all companies use databases to store their data. Because of this, no matter whether you prefer to use... »
James Coe in tutorials, sqlite, and sql
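
To give a feel for what "the language of databases" looks like in practice, here is a small sketch that runs SQL against an in-memory SQLite database using Python's built-in `sqlite3` module. The `employees` table and its rows are invented for illustration and do not come from the tutorial.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, made up for this example.
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ada", "data", 120), ("Grace", "data", 135), ("Linus", "web", 110)],
)

# A basic query: pick columns, filter rows, and sort the result.
rows = conn.execute(
    "SELECT name, salary FROM employees WHERE department = ? ORDER BY salary DESC",
    ("data",),
).fetchall()

conn.close()
print(rows)
```

The `?` placeholders let the database driver substitute values safely instead of building the query string by hand.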