Pandas Cheat Sheet - Python for Data Science

Pandas is arguably the most important Python package for data science. Not only does it give you lots of methods and functions that make working with data easier, but it has also been optimized for speed, which gives you a significant advantage over working with numeric data using Python’s built-in functions. The printable version of this cheat sheet. It’s common when first learning pandas to have trouble remembering all the functions and methods that you... »
Josh Devlin in resources and guides

1 tip for effective data visualization in Python

Yes, you read correctly – this post will only give you 1 tip. I know most posts like this have 5 or more tips. I once saw a post with 15 tips, but I may have been daydreaming at the time. You’re probably wondering what makes this 1 tip so special. “Vik”, you may ask, “I’ve been reading posts that have 7 tips all day. Why should I spend the time and effort to read... »
Vik Paruchuri in tutorials

What is Data Engineering?

This is the first in a series of posts on Data Engineering. If you like this and want to know when the next post in the series is released, you can subscribe at the bottom of the page. From helping cars drive themselves to helping Facebook tag you in photos, data science has attracted a lot of buzz recently. Data scientists have become extremely sought after, and for good reason – a skilled data scientist... »
Vik Paruchuri in careers

How to present your data science portfolio on Github

This is the fifth and final post in a series of posts on how to build a Data Science Portfolio. In the previous posts in our portfolio series, we talked about how to build a storytelling project, how to create a data science blog, how to create a machine learning project, and how to construct a portfolio. In this post, we’ll discuss how to present and share your portfolio. You’ll learn how to showcase your... »
Vik Paruchuri in tutorials, python, portfolio, and project

The Six Elements of the Perfect Data Science Learning Tool

When I launched Dataquest a little under two years ago, one of the first things I did was write a blog post about why. At the time, if you wanted to become a data scientist, you were confronted with dozens of courses on sites like edX or Coursera with no easy path to getting a job. I saw many promising students give up on learning data science because they got stuck in a loop of... »
Vik Paruchuri in motivation and making-of

Machine Learning Walkthrough Part One: Preparing the Data

Cleaning and preparing data is a critical first step in any machine learning project. In this blog post, Dataquest student Daniel Osei takes us through examining a dataset, selecting columns for features, exploring the data visually, and then encoding the features for machine learning. This post is based on a Dataquest ‘Monthly Challenge’, where our students are given a free-form task to complete. After first reading about Machine Learning on Quora in 2015, Daniel became... »
Josh Devlin in tutorials

How to get a data science job

You’ve done it. You just spent months learning how to analyze data and make predictions. You’re now able to go from raw data to well structured insights in a matter of hours. After all that effort, you feel like it’s time to take the next step, and get your first data science job. Unfortunately for you, this is where the process starts to get much harder. There’s no clear path to go from having data... »
Vik Paruchuri in careers

Pandas Tutorial: Data analysis with Python: Part 2

We covered a lot of ground in Part 1 of our pandas tutorial. We went from the basics of pandas DataFrames to indexing and computations. If you’re still not confident with Pandas, you might want to check out the Dataquest pandas Course. In this tutorial, we’ll dive into one of the most powerful aspects of pandas – its grouping and aggregation functionality. With this functionality, it’s dead simple to compute group summary statistics, discover patterns,... »
Vik Paruchuri in tutorials and python

What's New in v1.10: Answer diffs, Improved Q&A!

Along with our two new data visualization courses (Exploratory Data Visualization and Storytelling Through Data Visualization), our latest release includes two major features designed to make your life easier: enhanced Q&A and answer diffing. Introducing: Output & Variable Diffing. When you’re learning to code, it can be frustrating to be stuck on an exercise and not know what’s wrong. Our new variable and answer diffing is here to help: This new feature: Shows you... »
Josh Devlin in updates

Python Web Scraping Tutorial using BeautifulSoup

When performing data science tasks, it’s common to want to use data found on the internet. You’ll usually be able to access this data in CSV format, or via an Application Programming Interface (API). However, there are times when the data you want can only be accessed as part of a web page. In cases like this, you’ll want to use a technique called web scraping to get the data from the web page into a... »
Vik Paruchuri in tutorials and python

I barely graduated college, and that's okay

I didn’t do very well in high school. My grade point average was around a 2.5 out of 4. I did well in some subjects that I was interested in, like math, computer science, and history, but everything else was a wash. The less homework a class required me to do, the better my grade ended up being. In most classes I ended up watching the wall clock slowly tick towards the time when we... »
Vik Paruchuri in updates

What's New in Dataquest v1.9: Console, hotkeys and more!

Whenever you send us feedback or an idea for a feature, we read and catalogue your suggestions. We then use this to help plan features and improvements for Dataquest. Today we’re excited to launch two of our most-requested features: Hotkeys and a Python Console. Introducing the Python console. Many of you have told us that you’d like to be able to explore the datasets while you work through the missions. To help you do this,... »
Josh Devlin in updates

Pandas Tutorial: Data analysis with Python: Part 1

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages, and makes importing and analyzing data much easier. Pandas builds on packages like NumPy and matplotlib to give you a single, convenient place to do most of your data analysis and visualization work. In this introduction, we’ll use Pandas to analyze data on video game reviews from IGN, a popular... »
Vik Paruchuri in tutorials and python

NumPy Tutorial: Data analysis with Python

NumPy is a commonly used Python data analysis package. By using NumPy, you can speed up your workflow and interface with other packages in the Python ecosystem, like scikit-learn, that use NumPy under the hood. NumPy was originally developed in the mid-2000s, and arose from an even older package called Numeric. This longevity means that almost every data analysis or machine learning package for Python leverages NumPy in some way. In this tutorial, we’ll... »
Vik Paruchuri in tutorials, python, and numpy

28 Jupyter Notebook tips, tricks and shortcuts

This post is based on a post that originally appeared on Alex Rogozhnikov’s blog, ‘Brilliantly Wrong’. We have expanded the post and will continue to do so over time. If you have a suggestion, please let us know in the comments. Thanks to Alex for graciously letting us republish his work here. Jupyter Notebook. Jupyter notebook, formerly known as the IPython notebook, is a flexible tool that helps you create readable analyses, as you... »
Josh Devlin in resources and guides