Take the first step into image analysis in Python by using k-means clustering to analyze the dominant colors in an image in this free data science tutorial.
Deep learning is a type of machine learning that’s growing at an almost frightening pace. Nearly every projection has the deep learning industry expanding massively over the next decade. This market research report, for example, expects deep learning to grow 71x in the US and more than that globally over the next ten years. There’s […]
Python offers a variety of data structures to hold our information — the dictionary being one of the most useful. Python dictionaries quick, easy to use, and flexible. As a beginning programmer, you can use this Python tutorial to become familiar with dictionaries and their common uses so that you can start incorporating them immediately into […]
Error metrics are short and useful summaries of the quality of our data. We dive into four common regression metrics and discuss their use cases.
Getting into Machine Learning and AI is not an easy task, but is a critical part of data science programs. Many aspiring professionals and enthusiasts find it hard to establish a proper path into the field, given the enormous amount of resources available today. The field is evolving constantly and it is crucial that we […]
Learn how to do descriptive statistics in Python with this in-depth tutorial that covers the basics (mean, median, and mode) and more advanced topics.
Python generators are a powerful, but misunderstood tool. They’re often treated as too difficult a concept for beginning programmers to learn — creating the illusion that beginners should hold off on learning generators until they are ready. I think this assessment is unfair, and that you can use generators sooner than you think. In this […]
The data science life cycle is generally comprised of the following components: data retrieval data cleaning data exploration and visualization statistical or predictive modeling While these components are helpful for understanding the different phases, they don’t help us think about our programming workflow. Often, the entire data science life cycle ends up as an arbitrary […]
Advancing your skills is an important part of being a data scientist. When starting out, you mostly focus on learning a programming language, proper use of third party tools, displaying visualizations, and the theoretical understanding of statistical algorithms. The next step is to test your skills on more difficult data sets. Sometimes these data sets […]
Ed Hawkins, a climate scientist, tweeted the following animated visualization in 2017 and captivated the world: This visualization shows the deviations from the average temperature between 1850 and 1900. It was reshared millions of times over Twitter and Facebook and a version of it was even shown at the opening ceremony for the Rio Olympics. […]
The Jupyter Notebook is an incredibly powerful tool for interactively developing and presenting data science projects. A notebook integrates code and its output into a single document that combines visualisations, narrative text, mathematical equations, and other rich media. The intuitive workflow promotes iterative and rapid development, making notebooks an increasingly popular choice at the heart […]
These days, many businesses use cloud based services; as a result various companies have started building and providing such services. Amazon began the trend, with Amazon Web Services (AWS). While AWS began in 2006 as a side business, it now makes $14.5 billion in revenue each year. Other leaders in this area include: Google—Google Cloud […]
Most of us have been introduced to Python as an object-oriented language; a language exclusively using classes to build our programs. While classes, and objects, are easy to start working with, there are other ways to write your Python code. Languages like Java can make it hard to move away from object-oriented thinking, but Python […]
Stacking models in Python efficiently Ensembles have rapidly become one of the hottest and most popular methods in applied machine learning. Virtually every winning Kaggle solution features them, and many data science pipelines have ensembles in them. Put simply, ensembles combine predictions from different models to generate a final prediction, and the more models we […]
In previous blog posts, we have described the Postgres database and ways to interact with it using Python. Those posts provided the basics, but if you want to work with databases in production systems, then it is necessary to know how to make your queries faster and more efficient. To understand what efficiency means in […]
This Python data science tutorial uses a real-world data set to teach you how to diagnose and reduce bias and variance in machine learning.