How to Start a Data Science Meetup

Meetups are great tools: you’re able to meet people in the field, keep up on industry news, and learn how to ‘talk the talk.’ Before I started attending meetups, I wasn’t aware of just how much I didn’t know and still had to learn, let alone what was missing in how I wrote code and performed my analyses. Attending meetups has helped my career more than I expected, both in what I’ve learned, and... »
Enrique Bustamante in community

Kaggle Fundamentals: The Titanic Competition

Kaggle is a site where people create algorithms and compete against machine learning practitioners around the world. Your algorithm wins the competition if it’s the most accurate on a particular data set. Kaggle is a fun way to practice your machine learning skills. This tutorial is based on part of our free, four-part course: Kaggle Fundamentals. This interactive course is the most comprehensive introduction to Kaggle’s Titanic competition ever made. The course includes a certificate... »
Josh Devlin in tutorials, python, and kaggle
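To make "most accurate on a particular data set" concrete, here is a minimal sketch of scoring two submissions by accuracy. The labels and the `accuracy` helper are made up for illustration; they are not taken from the course or from Kaggle's own scoring code.

```python
def accuracy(predictions, actual):
    """Return the fraction of predictions that match the actual labels."""
    if len(predictions) != len(actual):
        raise ValueError("predictions and actual must have the same length")
    if not actual:
        raise ValueError("cannot score an empty data set")
    correct = sum(p == a for p, a in zip(predictions, actual))
    return correct / len(actual)


# Hypothetical survival labels (1 = survived, 0 = did not), invented for this sketch.
actual = [1, 0, 0, 1, 1]
submission_a = [1, 0, 1, 1, 0]
submission_b = [1, 0, 0, 1, 0]

# Score each submission and pick the one with the highest accuracy.
scores = {
    "a": accuracy(submission_a, actual),
    "b": accuracy(submission_b, actual),
}
best = max(scores, key=scores.get)
```

Under these assumed labels, the submission with more matching predictions comes out as `best`.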

Five Essential Traits of a Data Scientist

Trillions of pixels have been deployed to answer the question ‘What makes a good data scientist?’ Most of these articles have focused on the skills and tools of data science, while almost none have discussed the personalities that make good, even great, data scientists. A Google search for ‘data science skills’ returns 38 million results; ‘data scientist traits’ yields an anemic 938,000 results. Given the range of free, nearly-free, and paid training on the internet, just... »
Charles Rice in resources

SQL Fundamentals

The pandas workflow is a common favorite among data analysts and data scientists. The workflow looks something like this: The pandas workflow works well when: the data fits in memory (a few gigabytes but not terabytes); the data is relatively static (doesn’t need to be loaded into memory every minute because the data has changed); only a single person is accessing the data (shared access to memory is difficult); security isn’t important (security is critical... »
Srini Kadamati in tutorials

Loading Data into Postgres using Python and CSVs

An introduction to Postgres with Python. Data storage is one of the most integral parts (if not the most integral part) of a data system. You will find hundreds of articles online detailing how to write insane SQL analysis queries, how to run complex machine learning algorithms on petabytes of training data, and how to build statistical models on thousands of rows in a database. The only problem is: no one mentions how you get the data stored in... »
Spiro Sideris in tutorials

Explore Happiness Data Using Python Pivot Tables

One of the biggest challenges when facing a new data set is knowing where to start and what to focus on. Being able to quickly summarize hundreds of rows and columns can save you a lot of time and frustration. A simple tool you can use to achieve this is a pivot table, which helps you slice, filter, and group data at the speed of inquiry and represent the information in a visually appealing way.... »
Michal Weizman in python
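As a rough illustration of the pivot-table idea in the excerpt above, the sketch below groups a small table by two keys and averages a value column with pandas' `DataFrame.pivot_table`. The rows and column names (`region`, `year`, `score`) are invented for the example and are not the happiness data the post uses.

```python
import pandas as pd

# Made-up rows for illustration only; not the data set from the post.
df = pd.DataFrame({
    "region": ["Europe", "Europe", "Europe", "Asia", "Asia", "Asia"],
    "year": [2015, 2015, 2016, 2015, 2016, 2016],
    "score": [7.0, 6.0, 7.0, 5.0, 5.5, 4.5],
})

# One row per region, one column per year, each cell the mean score
# of all rows that share that (region, year) pair.
summary = df.pivot_table(
    index="region",
    columns="year",
    values="score",
    aggfunc="mean",
)

print(summary)
```

Swapping `aggfunc` for another aggregation such as `"max"` or `"count"` changes what each cell summarizes without changing the shape of the table.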

How to Generate FiveThirtyEight Graphs in Python

If you read data science articles, you may have already stumbled upon FiveThirtyEight’s content. Naturally, you were impressed by their awesome visualizations. You wanted to make your own awesome visualizations and so asked Quora and Reddit how to do it. You received some answers, but they were rather vague. You still can’t get the graphs done yourself. In this post, we’ll help you. Using Python’s matplotlib and pandas, we’ll see that it’s rather easy to... »
Alexandru Olteanu in tutorials

Machine Learning Fundamentals: Predicting Airbnb Prices

Machine learning is easily one of the biggest buzzwords in tech right now. Over the past three years, Google searches for “machine learning” have increased by over 350%. But understanding machine learning can be difficult: you either use pre-built packages that act like ‘black boxes’, where you pass in data and magic comes out the other end, or you have to deal with high-level maths and linear algebra. This tutorial is designed to... »
Josh Devlin in tutorials

What's New in v1.29: New Mission Interface, PayPal and more!

Our version 1.29 release is here and includes lots of new features to help enhance your learning experience. Over the past few months we’ve been tirelessly talking to students like you to learn how we can improve the mission interface. With this release, we are unveiling the results of this hard work. Other big changes in 1.29 include: Improved feedback for incorrect answers. We now accept PayPal so international students can subscribe more easily. Four... »
Josh Devlin in updates

Python Cheat Sheet for Data Science: Intermediate

The printable version of this cheat sheet. The tough thing about learning data science is remembering all the syntax. While at Dataquest we advocate getting used to consulting the Python documentation, sometimes it’s nice to have a handy reference, so we’ve put together this cheat sheet to help you out! This cheat sheet is the companion to our Python Basics Data Science Cheat Sheet. If you’re interested in learning Python, we have a free Python Programming:... »
Josh Devlin in resources and guides

How to get your first job as a data scientist.

Many aspiring data scientists focus on doing Kaggle competitions as a way to build their portfolios. Kaggle is an excellent way to practice, but it should only be one of many avenues you use to work on data science projects. This is because Kaggle competitions only focus on a narrow part of data science work. To be more specific: Kaggle mostly deals with machine learning, which is only one aspect of data science. When you... »
Josh Devlin in careers

SQL Intermediate: PostgreSQL, Subqueries and more!

If you’re in the early phases of learning SQL and have completed one or more introductory-level courses, you’ve probably learned most of the fundamentals and possibly even some high-level database concepts. As you prepare to embark on the next phase of learning SQL, it’s important to not only understand SQL itself, but also the engine that makes it all possible: the database. In most introductory-level courses, you’ll typically use some sort of embedded database... »
Eric Sundby in tutorials, postgresql, and sql

Using pandas with large data

Tips for reducing memory usage by up to 90%. When working in pandas with small data (under 100 megabytes), performance is rarely a problem. When we move to larger data (100 megabytes to multiple gigabytes), performance issues can make run times much longer and cause code to fail entirely due to insufficient memory. While tools like Spark can handle large data sets (100 gigabytes to multiple terabytes), taking full advantage of their capabilities usually requires... »
Josh Devlin in tutorial
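The excerpt above does not show which techniques the post uses, so the following is only a sketch of one common way to shrink a DataFrame: downcasting integers to the smallest type that fits and storing repetitive strings as a categorical. The column names and values are invented, and the savings on real data depend on its contents.

```python
import pandas as pd

# Invented example data: a small integer column and a low-cardinality string column.
df = pd.DataFrame({
    "count": list(range(1000)),        # typically stored as a 64-bit integer
    "team": ["home", "away"] * 500,    # repeated strings
})

# deep=True also counts the memory held by the string values themselves.
before = df.memory_usage(deep=True).sum()

slim = df.copy()
# Values 0..999 fit in an unsigned 16-bit integer, so pandas picks uint16.
slim["count"] = pd.to_numeric(slim["count"], downcast="unsigned")
# A categorical stores each distinct string once plus small integer codes.
slim["team"] = slim["team"].astype("category")

after = slim.memory_usage(deep=True).sum()

print(f"before: {before} bytes, after: {after} bytes")
```

Categoricals only help when a column has few distinct values relative to its length; a column of mostly unique strings can end up no smaller.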

Python Cheat Sheet for Data Science: Basics

The printable version of this cheat sheet. It’s common when first learning Python for Data Science to have trouble remembering all the syntax that you need. While at Dataquest we advocate getting used to consulting the Python documentation, sometimes it’s nice to have a handy reference, so we’ve put together this cheat sheet to help you out! This cheat sheet is the companion to our Python Intermediate Data Science Cheat Sheet. If you’re interested in... »
Josh Devlin in resources and guides