Eric: “I wanted something practical”

Eric Sales De Andrade came to Dataquest via Quora. “I read a response from Vik and he seemed to know what he was writing about.” At the time, he worked in data mining: “But it was just putting stuff in a database. I wanted to get real with data.” He had originally tried DataCamp and a machine […]

Mike: “I wanted to grow my skills”

Mike Roberts didn’t plan on becoming a data scientist. After getting a degree in physics, he gave art management and professional poker playing a shot before becoming a BI analyst. He joined Dataquest to beef up those BI skills, but quickly learned he could switch to a much more interesting role within data science. For […]

Christian: “Consistency is Key”

After working in Business Intelligence, Christian L’Heureux took a break from data science. When he returned, as part of his MBA program, he found a changed world. He knew he needed to get up to speed quickly on the new landscape, especially Python. After trying DataCamp and Codecademy, he found Dataquest. “I liked the way […]

Using Box Plots to Explore Women’s Height Data

I’ve recently been working on the Digital Panopticon, a digital history project that has brought together (and created) massive amounts of data about British prisoners and convicts in the long 19th century, including several datasets which include heights for women. Adult height is strongly influenced by environmental factors in childhood, one of the most important […]

Five Reasons for Historians to Learn R

In which I do some cheerleading for the R Project for Statistical Computing. 1. You’re almost certain to find it worth the effort Often, in the endless “should academics learn to code” debate, it’s not clear to newcomers what you can actually use this code for once you’ve invested a lot of time in learning […]

Learning From Bank Data: Women Across the World

This post looks at the World Bank World Development Indicators (WDI). This massive collection has data in several categories: demographic, education, work, poverty, health. It includes both country-level data and various aggregates by different criteria: geographical regions, income levels, etc. The UK Data Service has a useful guide as well as access to the data. […]

Jupyter Notebook for Beginners: A Tutorial

The Jupyter Notebook is an incredibly powerful tool for interactively developing and presenting data science projects. A notebook integrates code and its output into a single document that combines visualisations, narrative text, mathematical equations, and other rich media. The intuitive workflow promotes iterative and rapid development, making notebooks an increasingly popular choice at the heart […]

Python Regular Expressions Cheat Sheet

The tough thing about learning data is remembering all the syntax. While at Dataquest we advocate getting used to consulting the Python documentation, sometimes it’s nice to have a handy reference, so we’ve put together this cheat sheet to help you out! This cheat sheet is based on Python 3’s documentation on regular expressions. If […]

Visualizing Women’s Marches: Part 2

This post is the second in a series on visualizing the Women’s Marches from January 2017. In the first post, we explored the intensive data collection and data cleaning process necessary to produce clean pandas dataframes. Data Enrichment Because we eventually want to be able to build maps visualizing the marches, we need latitude and […]

Exploring Women’s Army Auxiliary Corps Data

Today I want to go on an excursion in “catalogues as data”. The UK National Archives’ Discovery catalogue is an excellent resource for this activity, because a) it has a lot of records that have document descriptions at ‘item’ or ‘piece’ level in the catalogue, containing quite structured information (like dates, places, occupations) that can […]

Visualizing Women’s Marches: Part 1

In celebration of Women’s History Month, I wanted to better understand the scale of the Women’s Marches that occurred in January 2017. Shortly after the marches, Vox published a map visualizing the estimated turnout across the entire country. This map is excellent at displaying: locations with the highest relative turnouts; hubs and clusters of where […]

5 Ways to Find Interesting Data Sets

Editor’s note: This post was written as part of a collaboration with Enigma, a public data company. Author India Kerle is a data curator at Enigma. There is a canon of open datasets used widely in data science projects; you’ve likely come across something making use of the Iris Flower classic or New York’s […]

R Fundamentals: Building a Simple Grade Calculator

R is one of the most popular languages for statistical analysis, data science, and reporting. At Dataquest, we have been adding R courses (you can learn more in our recent update). For a comparison of R and Python, check out our analysis here. In this tutorial, we’ll teach you the basics of R by building […]

Data Science Terms and Jargon: A Glossary

Getting started in data science can be overwhelming, especially when you consider the variety of concepts and techniques a data scientist needs to master in order to do her job effectively. Even the term “data science” can be somewhat nebulous, and as the field gains popularity it seems to lose definition. To help those new […]

How to Install and Configure Docker Swarm on Ubuntu

Docker Swarm is a clustering tool that turns a group of Docker hosts into a single virtual server. Docker Swarm ensures availability and high performance for your application by distributing it across the Docker hosts inside a cluster. Docker Swarm also allows you to increase the number of container instances for the same […]

Fixing Education: Motivation and Freedom

Public education, at least in the US, only ever exposes you to one way of learning—top-down hierarchy, and assignments. While some people thrive in this environment, I am not one of them. After college, I was thrown into the world with a 2.1 GPA, no real skills, and no idea of what to do next. […]

10 Data Science Projects You Can Join Today

Editor’s note: This post was written as part of a collaboration with data.world, a site for sharing and hosting data. Authors Shannon Peifer and Gabriela Swider are on the data.world team. Finding the right data can be difficult. And even once you have it, how do you collaborate with others to make sense of it? […]

Introduction to AWS for Data Scientists

These days, many businesses use cloud based services; as a result various companies have started building and providing such services. Amazon began the trend, with Amazon Web Services (AWS). While AWS began in 2006 as a side business, it now makes $14.5 billion in revenue each year. Other leaders in this area include: Google—Google Cloud […]

Introduction to Functional Programming in Python

Most of us have been introduced to Python as an object-oriented language; a language exclusively using classes to build our programs. While classes, and objects, are easy to start working with, there are other ways to write your Python code. Languages like Java can make it hard to move away from object-oriented thinking, but Python […]

How to Write a Bootcamp Review that Actually Helps People

Editor’s note: This post was written as part of a collaboration with SwitchUp, an online platform for researching and reviewing technology learning programs. Erica Freedman is a Content and Client Services Specialist at SwitchUp. Data Science is a rapidly growing industry. From university programs to week-long cohorts, it can be difficult to decide where to […]

Want a Job in Data? Learn This.

Why mastering a 50-year-old programming language is the key to getting a data science job. SQL is old. There, I said it. I first heard about SQL in 1997. I was in high school, and as part of a computing class we were working with databases in Microsoft Access. The computers we used were outdated, […]

Write for Dataquest

The Dataquest blog is read by over 100,000 readers each month — this is an opportunity for you to get your work seen, and grow your platform. You don’t have to be a professional writer to join our Community Writers Program. We will work with you to revise, refine, and publish your post to share […]

Introduction to Python Ensembles

Stacking models in Python efficiently Ensembles have rapidly become one of the hottest and most popular methods in applied machine learning. Virtually every winning Kaggle solution features them, and many data science pipelines have ensembles in them. Put simply, ensembles combine predictions from different models to generate a final prediction, and the more models we […]

Postgres Internals: Building a Description Tool

In previous blog posts, we have described the Postgres database and ways to interact with it using Python. Those posts provided the basics, but if you want to work with databases in production systems, then it is necessary to know how to make your queries faster and more efficient. To understand what efficiency means in […]

Priya: “Dataquest helped me to help others”

Priya Iyer decided to learn data science so that she could better help people. Her startup, Tulalens, operated for two years and raised $100k towards helping women in urban slums. Tulalens helped the women launch small businesses that sold iron-rich foods, and shared information on iron-deficiency anemia. Priya and her partners realized that they could be […]

Luiz: “Dataquest helped me learn on my own schedule”

Luiz Zanini was working as a Mechatronics Engineer when he decided he needed a radical career change. He was frustrated by corporate life, and knew he’d be happier as a programmer. When researching new career paths, Data Scientist stood out—he really liked Python and dreamt of a digital nomad lifestyle. “I made a list of the […]

Learning Curves for Machine Learning

Diagnose Bias and Variance to Reduce Error When building machine learning models, we want to keep error as low as possible. Two major sources of error are bias and variance. If we managed to reduce these two, then we could build more accurate models. But how do we diagnose bias and variance in the first […]

Adding Axis Labels to Plots With pandas

Pandas plotting methods provide an easy way to plot pandas objects. Often though, you’d like to add axis labels, which involves understanding the intricacies of Matplotlib syntax. Thankfully, there’s a way to do this entirely using pandas. Let’s start by importing the required libraries: import pandas as pd import numpy as np import matplotlib.pyplot as […]

Pandas Concatenation Tutorial

You’d be hard pressed to find a data science project that doesn’t require multiple data sources to be combined. Often, data analysis calls for appending new rows to a table, pulling in additional columns, or, in more complex cases, merging distinct tables on a common key. All of these tricks are handy to […]

Using Excel with pandas

Excel is one of the most popular and widely-used data tools; it’s hard to find an organization that doesn’t work with it in some way. From analysts, to sales VPs, to CEOs, various professionals use Excel for both quick stats and serious data crunching. With Excel being so pervasive, data professionals must be familiar with […]

Regular Expressions for Data Scientists

As data scientists, diving headlong into huge heaps of data is part of the mission. Sometimes, this includes massive corpuses of text. For instance, suppose we were asked to figure out who’s been emailing whom in the scandal of the Panama Papers — we’d be sifting through 11.5 million documents! We could do that manually […]

Setting Up the PyData Stack on Windows

The speed of modern electronic devices allows us to crunch large amounts of data at home. However, these devices require the right software in order to reach peak performance. Luckily, it’s now easier than ever to set up your own data science environment. One of the most popular stacks for data science is PyData, a […]

How to Start a Data Science Meetup

Meetups are great tools: you’re able to meet people in the field, keep up on industry news, and learn how to ‘talk the talk.’ Before I started attending meetups, I wasn’t aware of just how much I didn’t know and still had to learn, let alone what was missing in how I wrote code and […]
