Python Cheat Sheet for Data Science: Basics

It’s common when first learning Python for Data Science to have trouble remembering all the syntax that you need. While at Dataquest we advocate getting used to consulting the Python documentation, sometimes it’s nice to have a handy reference, so we’ve put together this cheat sheet to help you out! This cheat sheet is the companion to our Python Intermediate Data Science Cheat Sheet. If you’re interested in...
Josh Devlin in resources and guides

Should I learn Python 2 or 3?

One of the biggest sources of confusion and misinformation for people wanting to learn Python is which version they should learn. Should I learn Python 2.x or Python 3.x? Indeed, this is one of the questions we are asked most often at Dataquest, where we teach Python as part of our Data Science curriculum. This post gives some context behind the question, explains our perspective, and tells you which version you should...
Josh Devlin in python, data, and science

Understanding SettingWithCopyWarning in pandas

SettingWithCopyWarning is one of the most common hurdles people run into when learning pandas. A quick web search will reveal scores of Stack Overflow questions, GitHub issues and forum posts from programmers trying to wrap their heads around what this warning means in their particular situation. It’s no surprise that many struggle with this; there are so many ways to index pandas data structures, each with its own particular nuance, and even pandas itself does...
Benjamin Pryke in python

Web Scraping with Python and BeautifulSoup

To source data for data science projects, you’ll often rely on SQL and NoSQL databases, APIs, or ready-made CSV data sets. The problem is that you can’t always find a data set on your topic, databases aren’t always kept current, and APIs are often expensive or have usage limits. If the data you’re looking for is on a web page, however, then the solution to all these problems is web scraping. In this tutorial we’ll...
Alexandru Olteanu in tutorials and python

The tips and tricks I used to succeed on Kaggle

I learned machine learning through competing in Kaggle competitions. I entered my first competitions in 2011, with almost no data science knowledge. I soon ended up in fifth place out of a hundred or so in a stock trading competition. Over the next year, I won several competitions on automated essay scoring and bond price prediction, and placed well in others. Kaggle competitions require a unique blend of skill, luck, and teamwork to win. The...
Vik Paruchuri in data, science, and kaggle

What's the difference between a data analyst, scientist and engineer?

Data is increasingly shaping the systems that we interact with every day. Whether you’re using Siri, searching Google, or browsing your Facebook feed, you’re consuming the results of data analysis. Given its transformational ability, it’s no wonder that so many data-related roles have been created in the past few years. The responsibilities of these roles range from predicting the future, to finding patterns in the world around you, to building systems that manipulate millions of...
James Lee in careers

SQL Basics: Working with Databases

SQL, pronounced “sequel” (or ess-cue-ell, if you prefer), is a very important tool for data scientists to have in their repertoire. You may well have heard the name and wondered what it is, how it works and whether you should learn it. To put it simply, SQL (Structured Query Language) is the language of databases and almost all companies use databases to store their data. Because of this, no matter whether you prefer to use...
James Coe in tutorials, sqlite, and sql
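The teaser stops before any SQL appears. As a minimal sketch, assuming Python’s built-in sqlite3 module and a hypothetical `employees` table invented here purely for illustration (it is not taken from the post), the basic create, insert, and query cycle looks like this:

```python
import sqlite3

# An in-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical example table; the schema and rows are made up for illustration.
cur.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "Data", 85000), ("Ben", "Data", 78000), ("Cy", "Sales", 60000)],
)
conn.commit()

# Filter rows with WHERE; the ? placeholder keeps values out of the SQL string.
high = cur.execute(
    "SELECT name FROM employees WHERE salary > ? ORDER BY name", (70000,)
).fetchall()

# Aggregate with GROUP BY: headcount and average salary per department.
rows = cur.execute(
    "SELECT department, COUNT(*), AVG(salary) FROM employees "
    "GROUP BY department ORDER BY department"
).fetchall()

print(high)
print(rows)
conn.close()
```

The same SQL statements would run largely unchanged against other relational databases; only the connection code is specific to sqlite3.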

Getting Started with Kaggle: House Prices Competition

Founded in 2010, Kaggle is a Data Science platform where users can share, collaborate, and compete. One key feature of Kaggle is “Competitions”, which offers users the ability to practice on real-world data and to test their skills with, and against, an international community. This guide will teach you how to approach and enter a Kaggle competition, including exploring the data, creating and engineering features, building models, and submitting predictions. We’ll use Python 3...
Adam Massachi in tutorials, python, and kaggle

What's New in v1.19: Multiscreen, Concepts, Dataset Preview and More!

Our version 1.19 release includes new features designed to improve your learning experience. The first thing you may notice is a new look. We’ve made some design tweaks, including a new mission-text font which we think you’ll agree makes everything easier to read. Other big changes in v1.19 include: Multiscreen, so you can work more efficiently in-mission; Concepts, designed to help you take your learning to the next level; and In-Mission Dataset Preview and Download. You...
Josh Devlin in updates

How to become a data scientist

Data science is one of the most buzzed-about fields right now, and data scientists are in extreme demand. And with good reason: data scientists are doing everything from creating self-driving cars to automatically captioning images. Given all the interesting applications, it makes sense that data science is a very sought-after career. If you’re reading this post, I’m assuming that you’d like...
Vik Paruchuri in resources and guides

NumPy Cheat Sheet - Python for Data Science

NumPy is the library that gives Python its ability to work with data at speed. Originally launched in 1995 as ‘Numeric’, NumPy is the foundation on which many important Python data science libraries are built, including pandas, SciPy and scikit-learn. It’s common when first learning NumPy to have trouble remembering all the functions and methods that you need, and while at Dataquest we advocate getting used to consulting...
Josh Devlin in resources and guides
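To make the “data at speed” claim concrete, here is a small sketch (the one-million-element `prices` array and the 8% rate are hypothetical, chosen only for illustration) comparing a pure-Python loop with the equivalent vectorized NumPy expression. Both produce the same numbers; the vectorized form moves the per-element loop into NumPy’s compiled code, which is typically much faster on large arrays.

```python
import numpy as np

# Hypothetical data: one million prices and a flat 8% tax rate.
prices = np.arange(1_000_000, dtype=np.float64)
tax_rate = 0.08

# Pure-Python version: the interpreter runs the loop body once per element.
with_tax_loop = [p * (1 + tax_rate) for p in prices.tolist()]

# NumPy version: one vectorized expression; the loop runs in compiled code.
with_tax_vec = prices * (1 + tax_rate)

print(np.allclose(with_tax_loop, with_tax_vec))  # True
```

Timing the two lines (for example with the `timeit` module) is an easy way to see the difference on your own machine.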

Turbocharge Your Data Acquisition using the data.world Python Library

When working with data, a key part of your workflow is finding and importing data sets. Being able to quickly locate data, understand it and combine it with other sources can be difficult. One tool to help with this is data.world, where you can search for, copy, analyze, and download data sets. In addition, you can upload your data to data.world and use it to collaborate with others. In this tutorial, we’re going to show...
Josh Devlin in python, tutorials, and project

Building An Analytics Data Pipeline In Python

If you’ve ever wanted to work with streaming data, or data that changes quickly, you may be familiar with the concept of a data pipeline. Data pipelines allow you to transform data from one representation to another through a series of steps. Data pipelines are a key part of data engineering, which we teach in our new Data Engineer Path. A common use case for a data pipeline is figuring out information about the visitors to...
Vik Paruchuri in python and tutorials
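A minimal sketch of the “series of steps” idea, assuming plain Python generators as the steps and a hypothetical comma-separated `ip,path,status` record format (invented for illustration, not the format used in the post). Each step consumes the previous step’s output lazily, so records flow through one at a time rather than being loaded all at once.

```python
from collections import Counter

# Hypothetical raw input: one "ip,path,status" record per line.
RAW_LINES = [
    "10.0.0.1,/home,200",
    "10.0.0.2,/about,404",
    "10.0.0.1,/blog,200",
    "10.0.0.3,/home,200",
]

def parse(lines):
    """Step 1: turn each raw text line into a dict."""
    for line in lines:
        ip, path, status = line.strip().split(",")
        yield {"ip": ip, "path": path, "status": int(status)}

def successful(records):
    """Step 2: keep only requests that returned HTTP 200."""
    for record in records:
        if record["status"] == 200:
            yield record

def count_by_ip(records):
    """Step 3: aggregate the remaining records into visits per IP."""
    return Counter(record["ip"] for record in records)

# Chain the steps: each generator pulls from the previous one on demand.
visits = count_by_ip(successful(parse(RAW_LINES)))
print(visits.most_common())
```

Because every step only needs an iterable, `RAW_LINES` could be swapped for an open file or another stream without changing the step functions.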

What's New in v1.14: Data Engineering Path & Performance Improvements!

Our latest Dataquest release has over 20 new features, including many major performance improvements and the launch of our much-anticipated data engineering path. New Path: Data Engineering. The first course in our Data Engineering Path is here! Data engineering is a broad field which includes working with big data, architecting distributed systems, creating reliable pipelines, combining data sources, and collaborating with data science teams and building the right solutions for them. If you’d like to find...
Josh Devlin in updates