Tag Archives for " Tutorials "

Advanced Jupyter Notebooks: A Tutorial

Lying at the heart of modern data science and analysis, Jupyte project lifecycle. Whether you’re rapidly prototyping ideas, demonstrating your work, or producing fully fledged reports, notebooks can provide an efficient edge over IDEs or traditional desktop applications. Following on from “Jupyter Notebook for Beginners: A Tutorial“, this guide will take you on a journey […]

11 Design Tips for Data Visualization

Here’s an important fact that’s easy to forget: our data is only as helpful as it is understandable. Most of the time, that means creating some kind of data visualization. And while a simple bar graph might cut it for internal work, making your data both visually understandable and visually attractive can help it get […]

Python Dictionary Tutorial

Python offers a variety of data structures to hold our information — the dictionary being one of the most useful. Python dictionaries quick, easy to use, and flexible. As a beginning programmer, you can use this Python tutorial to become familiar with dictionaries and their common uses so that you can start incorporating them immediately into […]

Basic Statistics in Python: Probability

When studying statistics, you will inevitably have to learn about probability. It is easy lose yourself in the formulas and theory behind probability, but it has essential uses in both working and daily life. We’ve previously discussed some basic concepts in descriptive statistics; now we’ll explore how statistics relates to probability. Prerequisites: Similar to the […]

Python Generators

Python generators are a powerful, but misunderstood tool. They’re often treated as too difficult a concept for beginning programmers to learn — creating the illusion that beginners should hold off on learning generators until they are ready. I think this assessment is unfair, and that you can use generators sooner than you think. In this […]

Jupyter Notebook for Beginners: A Tutorial

The Jupyter Notebook is an incredibly powerful tool for interactively developing and presenting data science projects. A notebook integrates code and its output into a single document that combines visualisations, narrative text, mathematical equations, and other rich media. The intuitive workflow promotes iterative and rapid development, making notebooks an increasingly popular choice at the heart […]

R Fundamentals: Building a Simple Grade Calculator

R is one of the most popular languages for statistical analysis, data science, and reporting. At Dataquest, we have been adding R courses (you can learn more in our recent update). For a comparison of R and Python, check out our analysis here. In this tutorial, we’ll teach you the basics of R by building […]

How to Install and Configure Docker Swarm on Ubuntu

Docker Swarm is a clustering tool that turns a group of Docker hosts into a single virtual server. Docker Swarm ensures availability and high performance for your application by distributing it over the number of Docker hosts inside a cluster. Docker Swarm also allows you to increase the number of container instance for the same […]

Introduction to Functional Programming in Python

Most of us have been introduced to Python as an object-oriented language; a language exclusively using classes to build our programs. While classes, and objects, are easy to start working with, there are other ways to write your Python code. Languages like Java can make it hard to move away from object-oriented thinking, but Python […]

Pandas Concatenation Tutorial

You’d be hard pressed to find a data science project which doesn’t require multiple data sources to be combined together. Often times, data analysis calls for appending new rows to a table, pulling additional columns in, or in more complex cases, merging distinct tables on a common key. All of these tricks are handy to […]

Using Excel with pandas

Excel is one of the most popular and widely-used data tools; it’s hard to find an organization that doesn’t work with it in some way. From analysts, to sales VPs, to CEOs, various professionals use Excel for both quick stats and serious data crunching. With Excel being so pervasive, data professionals must be familiar with […]

Regular Expressions for Data Scientists

As data scientists, diving headlong into huge heaps of data is part of the mission. Sometimes, this includes massive corpuses of text. For instance, suppose we were asked to figure out who’s been emailing whom in the scandal of the Panama Papers — we’d be sifting through 11.5 million documents! We could do that manually […]

Setting Up the PyData Stack on Windows

The speed of modern electronic devices allows us to crunch large amounts of data at home. However, these devices require the right software in order to reach peak performance. Luckily, it’s now easier than ever to set up your own data science environment. One of the most popular stacks for data science is PyData, a […]

SQL Fundamentals

The pandas workflow is a common favorite among data analysts and data scientists. The workflow looks something like this: The pandas workflow works well when: the data fits in memory (a few gigabytes but not terabytes) the data is relatively static (doesn’t need to be loaded into memory every minute because the data has changed) […]

Loading Data into Postgres using Python and CSVs

An introduction to Postgres with Python Data storage is one of (if not) the most integral parts of a data system. You will find hundreds of articles online detailing how to write insane SQL analysis queries, how to run complex machine learning algorithms on petabytes of training data, and how to build statistical models on […]

How to Generate FiveThirtyEight Graphs in Python

If you read data science articles, you may have already stumbled upon FiveThirtyEight’s content. Naturally, you were impressed by their awesome visualizations. You wanted to make your own awesome visualizations and so asked Quora and Reddit how to do it. You received some answers, but they were rather vague. You still can’t get the graphs […]

Machine Learning Fundamentals: Predicting Airbnb Prices

Machine learning is easily one of the biggest buzzwords in tech right now. Over the past three years, Google searches for “machine learning” have increased by over 350%. But understanding machine learning can be difficult — you either use pre-built packages that act like ‘black boxes’ where you pass in data and magic comes out […]

SQL Intermediate: PostgreSQL, Subqueries, and more!

If you’re in the early phases of learning SQL and have completed one or more introductory-level courses, you’ve probably learned most of the basic fundamentals and possibly even some high-level database concepts. As you prepare to embark on the next phase of learning SQL, it’s important to not only understand SQL itself, but also the […]

Web Scraping with Python and BeautifulSoup

To source data for data science projects, you’ll often rely on SQL and NoSQL databases, APIs, or ready-made CSV data sets. The problem is that you can’t always find a data set on your topic, databases are not kept current and APIs are either expensive or have usage limits. If the data you’re looking for […]

SQL Basics: Working with Databases

SQL, pronounced “sequel” (or ess-cue-ell, if you prefer), is a very important tool for data scientists to have in their repertoire. You may well have heard the name and wondered what it is, how it works and whether you should learn it. To put it simply, SQL (Structured Query Language) is the language of databases […]

Building An Analytics Data Pipeline In Python

If you’ve ever wanted to work with streaming data, or data that changes quickly, you may be familiar with the concept of a data pipeline. Data pipelines allow you transform data from one representation to another through a series of steps. Data pipelines are a key part of data engineering, which we teach in our […]

Preparing and Cleaning Data for Machine Learning

Cleaning and preparing data is a critical first step in any machine learning project. In this blog post, Dataquest student Daniel Osei takes us through examining a dataset, selecting columns for features, exploring the data visually and then encoding the features for machine learning. After first reading about Machine Learning on Quora in 2015, Daniel […]

Pandas Tutorial: Data analysis with Python: Part 1

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages, and makes importing and analyzing data much easier. Pandas builds on packages like NumPy and matplotlib to give you a single, convenient, place to do most of your data analysis […]

NumPy Tutorial: Data analysis with Python

Don’t miss our FREE NumPy cheat sheet at the bottom of this post NumPy is a commonly used Python data analysis package. By using NumPy, you can speed up your workflow, and interface with other packages in the Python ecosystem, like scikit-learn, that use NumPy under the hood. NumPy was originally developed in the mid […]

digitalocean-docker

DigitalOcean and Docker for Data Science

Creating a cloud-based data science environment for faster analysis There are times when working on data science problems with your local machine just doesn’t cut it anymore. Maybe your computer is old, and can’t work with larger datasets. Or maybe you want to be able to access your work from anywhere, and collaborate with others. […]

Show Buttons
Share On Facebook
Share On Twitter
Share On Linkedin
Share On Reddit
Hide Buttons