Learn text classification with linear regression in Python, using the spaCy package, in this free machine learning tutorial.
Error metrics are short and useful summaries of the quality of our data. We dive into four common regression metrics and discuss their use cases.
This Python data science tutorial uses a real-world data set to teach you how to diagnose and reduce bias and variance in machine learning.
Editor’s note: This post was updated in May 2018. At Dataquest, we provide an easy-to-use environment to start learning data science. This environment comes preconfigured with the latest version of Python, well-known data science libraries, and a runnable code editor. It allows brand-new data scientists, and experienced ones, to start running […]
Overview: After a great deal of groundbreaking work led by the UC Berkeley AMP Lab, Apache Spark was developed to use distributed, in-memory data structures to improve data processing speeds over Hadoop for most workloads. In this post, we’re going to cover the architecture of Spark and basic transformations and actions using a real dataset. If you […]
In this post, we’ll be using the K-nearest neighbors algorithm to predict how many points NBA players scored in the 2013-2014 season. Along the way, we’ll learn about Euclidean distance and figure out which NBA players are the most similar to LeBron James. If you want to follow along, you can grab the dataset in […]
Python has some powerful tools that enable you to do natural language processing (NLP). In this tutorial, we’ll learn how to do some basic NLP in Python. Looking at the data: We’ll be looking at a dataset consisting of submissions to Hacker News from 2006 to 2015. The data was taken from here. Arnaud […]
Sentiment analysis is a field dedicated to extracting subjective emotions and feelings from text. One common use of sentiment analysis is to figure out if a text expresses negative or positive feelings. Written reviews are great datasets for doing sentiment analysis because they often come with a score that can be used to train an […]
Clustering is a powerful way to split up datasets into groups based on similarity. A very popular clustering algorithm is K-means clustering. In K-means clustering, we divide data up into a fixed number of clusters while trying to ensure that the items in each cluster are as similar as possible. In this post, we’ll explore […]