In this beginner Python tutorial, we’ll take a look at mutable and immutable data types, and learn how to keep dictionaries and lists from being modified by our functions.
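As a rough illustration of the idea in this teaser, the sketch below shows one way a function can avoid modifying the list or dictionary passed to it: copy the argument first. The function names and sample data are hypothetical and are not taken from the tutorial itself.

```python
import copy


def add_bonus_in_place(scores):
    # Lists are mutable, so this changes the caller's list object.
    for i in range(len(scores)):
        scores[i] += 5
    return scores


def add_bonus_safely(scores):
    # Work on a shallow copy so the caller's list is left untouched.
    result = list(scores)
    for i in range(len(result)):
        result[i] += 5
    return result


def rename_user_safely(profile, new_name):
    # deepcopy also protects nested mutable values inside the dictionary.
    updated = copy.deepcopy(profile)
    updated["name"] = new_name
    updated["tags"].append("renamed")
    return updated


original_scores = [70, 80, 90]
safe_scores = add_bonus_safely(original_scores)
print(original_scores)  # [70, 80, 90]
print(safe_scores)      # [75, 85, 95]

add_bonus_in_place(original_scores)
print(original_scores)  # [75, 85, 95]

profile = {"name": "Ada", "tags": ["admin"]}
new_profile = rename_user_safely(profile, "Grace")
print(profile)      # {'name': 'Ada', 'tags': ['admin']}
print(new_profile)  # {'name': 'Grace', 'tags': ['admin', 'renamed']}
```

A shallow copy is enough for a flat list of numbers. A dictionary that holds other lists or dictionaries generally needs copy.deepcopy, as assumed here.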
Data cleaning might not be the reason you got interested in data science, but if you’re going to be a data scientist, no skill is more crucial. Learn how to clean data with Python and pandas in our new course.
Learn to do a complete data analysis project using only basic Python to find out what genre of apps an app developer should focus on.
Last week, we launched a totally revamped version of our introductory Python course. Now, we’re doing the same for our intermediate Python course. Say hello to Python for Data Science: Intermediate! The new course is available now, and just like the introductory Python course, it’s completely free. This course has been carefully designed to build […]
Python is an amazingly versatile programming language. You can use it to build websites, machine learning algorithms, and even autonomous drones. A huge percentage of programmers in the world use Python, and for good reason. It gives you the power to create almost anything. But — and this is a big but — you have […]
Creating a cloud-based data science environment for faster analysis
There are times when working on data science problems with your local machine just doesn’t cut it anymore. Maybe your computer is old and can’t work with larger datasets. Or maybe you want to be able to access your work from anywhere and collaborate with others. […]
Editor’s note: This post was updated in May 2018. At Dataquest, we provide an easy-to-use environment to start learning data science. This environment comes preconfigured with the latest version of Python, well-known data science libraries, and a runnable code editor. It allows brand new data scientists, and experienced ones, to start running […]
The Python scientific stack is fairly mature, and there are libraries for a variety of use cases, including machine learning and data analysis. Data visualization is an important part of being able to explore data and communicate results, but in the past Python’s visualization tools have lagged a bit behind those of other tools such as R. Luckily, many new […]
At Dataquest, we’ve released an interactive course on Spark, with a focus on PySpark. We explore the fundamentals of Map-Reduce and how to utilize PySpark to clean, transform, and munge data. In this post, we’ll dive into how to install PySpark locally on your own computer and how to integrate it into the Jupyter Notebook […]
Overview
After lots of ground-breaking work led by the UC Berkeley AMP Lab, Apache Spark was developed to utilize distributed, in-memory data structures to improve data processing speeds over Hadoop for most workloads. In this post, we’re going to cover the architecture of Spark and basic transformations and actions using a real dataset. If you […]
Application Programming Interfaces, or APIs, are commonly used to retrieve data from remote websites. Sites like Reddit, Twitter, and Facebook all offer certain data through their APIs. To use an API, you make a request to a remote web server, and retrieve the data you need. But why use an API instead of a static […]
Art is a messy business. Over centuries, artists have created everything from simple paintings to complex sculptures, and art historians have been cataloging everything they can along the way. The Museum of Modern Art, or MoMA for short, is considered one of the most influential museums in the world and recently released a dataset of […]
In this post, we’ll be using the K-nearest neighbors algorithm to predict how many points NBA players scored in the 2013-2014 season. Along the way, we’ll learn about Euclidean distance and figure out which NBA players are the most similar to LeBron James. If you want to follow along, you can grab the dataset in […]
Python has some powerful tools that enable you to do natural language processing (NLP). In this tutorial, we’ll learn how to do some basic NLP in Python.
Looking at the data
We’ll be looking at a dataset consisting of submissions to Hacker News from 2006 to 2015. The data was taken from here. Arnaud […]
The Python Counter Class
The Counter class in Python is part of the collections module. Counter provides a fast way to count how many times each unique item appears in a list. The Counter class can also be extended to represent probability mass functions and suites of Bayesian hypotheses. A counter is a map […]
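For a quick feel of what this looks like in practice, here is a minimal sketch using collections.Counter from the standard library. The word list is made up for illustration, and the probability step is only one simple way to turn counts into a probability mass function, not necessarily the approach the post takes.

```python
from collections import Counter

# Hypothetical sample data.
words = ["spark", "python", "spark", "pandas", "python", "spark"]

# Counter maps each unique item to the number of times it appears.
counts = Counter(words)
print(counts["spark"])        # 3
print(counts["hadoop"])       # 0 (missing keys count as zero)
print(counts.most_common(2))  # [('spark', 3), ('python', 2)]

# Dividing each count by the total gives a simple probability mass function.
total = sum(counts.values())
pmf = {word: count / total for word, count in counts.items()}
print(pmf["spark"])           # 0.5
```

Because Counter is a dict subclass, the usual dictionary methods such as items() and values() work on it as well.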
Sentiment analysis is a field dedicated to extracting subjective emotions and feelings from text. One common use of sentiment analysis is to figure out if a text expresses negative or positive feelings. Written reviews are great datasets for doing sentiment analysis because they often come with a score that can be used to train an […]
Clustering is a powerful way to split up datasets into groups based on similarity. A very popular clustering algorithm is K-means clustering. In K-means clustering, we divide data up into a fixed number of clusters while trying to ensure that the items in each cluster are as similar as possible. In this post, we’ll explore […]