Tag Archives for " Pandas "

Data Science Portfolio Project: Where to Advertise an E-learning Product

At Dataquest, we strongly advocate portfolio projects as a means of getting a first data science job. In this blog post, we’ll walk you through an example portfolio project. The project is part of our Statistics Intermediate: Averages and Variability course, and it assumes familiarity with: Sampling (populations, samples, sample representativity) Frequency distributions Box plots […]

Data Science Portfolio Project: Is Fandango Still Inflating Ratings?

At Dataquest, we strongly advocate portfolio projects as a means of getting your first data science job. In this blog post, we’ll walk you through an example portfolio project. The project is part of our Statistics Fundamentals course, and it assumes some familiarity with: Sampling (simple random sampling, populations, samples, parameters, statistics) Variables Frequency distributions […]

jupyter-notebooks-tutorial

Jupyter Notebook for Beginners: A Tutorial

The Jupyter Notebook is an incredibly powerful tool for interactively developing and presenting data science projects. A notebook integrates code and its output into a single document that combines visualisations, narrative text, mathematical equations, and other rich media. The intuitive workflow promotes iterative and rapid development, making notebooks an increasingly popular choice at the heart […]

Visualizing Women’s Marches: Part 1

In celebration of Women’s History Month, I wanted to better understand the scale of the Women’s Marches that occurred in January 2017. Shortly after the marches, Vox published a map visualizing the estimated turnout across the entire country. This map is excellent at displaying: locations with the highest relative turnouts hubs and clusters of where […]

Adding Axis Labels to Plots With pandas

Pandas plotting methods provide an easy way to plot pandas objects. Often though, you’d like to add axis labels, which involves understanding the intricacies of Matplotlib syntax. Thankfully, there’s a way to do this entirely using pandas. Let’s start by importing the required libraries: import pandas as pd import numpy as np import matplotlib.pyplot as […]

Kaggle Fundamentals: The Titanic Competition

Kaggle is a site where people create algorithms and compete against machine learning practitioners around the world. Your algorithm wins the competition if it’s the most accurate on a particular data set. Kaggle is a fun way to practice your machine learning skills. This tutorial is based on part of our free, four-part course: Kaggle […]

SQL Fundamentals

The pandas workflow is a common favorite among data analysts and data scientists. The workflow looks something like this: The pandas workflow works well when: the data fits in memory (a few gigabytes but not terabytes) the data is relatively static (doesn’t need to be loaded into memory every minute because the data has changed) […]

Machine Learning Fundamentals: Predicting Airbnb Prices

Machine learning is easily one of the biggest buzzwords in tech right now. Over the past three years, Google searches for “machine learning” have increased by over 350%. But understanding machine learning can be difficult — you either use pre-built packages that act like ‘black boxes’ where you pass in data and magic comes out […]

Pandas Cheat Sheet — Python for Data Science

Pandas is arguably the most important Python package for data science. Not only does it give you lots of methods and functions that make working with data easier, but it has been optimized for speed which gives you a significant advantage compared with working with numeric data using Python’s built-in functions. It’s common when first […]