Learn to do some text analysis in this Python tutorial, and test hypotheses using confidence intervals to ensure your conclusions are significant.
Learn text classification with linear regression in Python using the spaCy package in this free machine learning tutorial.
In this beginner Python tutorial, we’ll take a look at mutable and immutable data types, and learn how to keep dictionaries and lists from being modified by our functions.
Poisson Regression can be a really useful tool if you know how and when to use it. In this tutorial we’re going to take a long look at Poisson Regression, what it is, and how R programmers can use it in the real world. Specifically, we’re going to cover: What Poisson Regression actually is and when we […]
Take the first step into image analysis in Python by using k-means clustering to analyze the dominant colors in an image in this free data science tutorial.
In this tutorial, we will learn about the powerful time series tools in the pandas library. Originally developed for financial time series such as daily stock market prices, the robust and flexible data structures in pandas can be applied to time series data in any domain, including business, science, engineering, public health, and many others. […]
In recent weeks, the devastating wildfires sweeping parts of the US state of California have featured prominently in the news. While most wildfires are started accidentally by humans, weather conditions like wind and drought can exacerbate fires’ spread and intensity. Improved understanding of historical wildfire trends and causes can inform fire management and […]
Math is like an octopus: it has tentacles that can reach out and touch just about every subject. And while some subjects only get a light brush, others get wrapped up like a clam in the tentacles’ vice-like grip. Data science falls into the latter category. If you want to do data science, you’re going […]
Scikit-learn is a free machine learning library for Python. It features various algorithms like support vector machines, random forests, and k-nearest neighbors, and it also supports Python numerical and scientific libraries like NumPy and SciPy. In this tutorial we will learn to code in Python and apply machine learning with the help of the scikit-learn library, which […]
This post was written by Carolina Bento. She leads Data Analytics teams that empower companies to make data-driven decisions, and currently manages the Product Analytics team at eero. This article was originally posted on Medium, and has been reposted with permission. We learn a lot of interesting and useful concepts in school but sometimes it’s not […]
Learn to use Python dictionaries to store, sort, and access data in this in-depth tutorial analyzing craft beer data to master dictionary techniques.
Error metrics are short and useful summaries of the quality of our data. We dive into four common regression metrics and discuss their use cases.
Explore statistics for data science by learning what probability is, along with normal distributions and the z-score, all within the context of analyzing wine data.
Learn how to do descriptive statistics in Python with this in-depth tutorial that covers the basics (mean, median, and mode) and more advanced topics.
This post is a short overview of a dozen Unix-like operating system command line tools which can be useful for data science tasks. The list does not include any general file management commands (pwd, ls, mkdir, rm, …) or remote session management tools (rsh, ssh, …), but is instead made up of utilities which would […]
Python generators are a powerful but misunderstood tool. They’re often treated as too difficult a concept for beginning programmers to learn, creating the illusion that beginners should hold off on learning generators until they are ready. I think this assessment is unfair, and that you can use generators sooner than you think. In this […]
The data science life cycle is generally composed of the following components: data retrieval, data cleaning, data exploration and visualization, and statistical or predictive modeling. While these components are helpful for understanding the different phases, they don’t help us think about our programming workflow. Often, the entire data science life cycle ends up as an arbitrary […]
Advancing your skills is an important part of being a data scientist. When starting out, you mostly focus on learning a programming language, proper use of third-party tools, displaying visualizations, and the theoretical understanding of statistical algorithms. The next step is to test your skills on more difficult data sets. Sometimes these data sets […]