Vik Paruchuri
Author Archives: Vik Paruchuri

Python vs R: Head to Head Data Analysis

Which is better for data analysis? There have been dozens of articles written comparing Python and R from a subjective standpoint. This article aims to look at the languages more objectively. We’ll analyze a data set side by side in Python and R, and show what code is needed in both languages to achieve the […]

Fixing Education: Motivation and Freedom

Public education, at least in the US, only ever exposes you to one way of learning—top-down hierarchy, and assignments. While some people thrive in this environment, I am not one of them. After college, I was thrown into the world with a 2.1 GPA, no real skills, and no idea of what to do next. […]

The Tips and Tricks I used to succeed on Kaggle

I learned machine learning through competing in Kaggle competitions. I entered my first competitions in 2011, with almost no data science knowledge. I soon ended up in fifth place out of a hundred or so in a stock trading competition. Over the next year, I won several competitions on automated essay scoring and bond price […]

How to become a data scientist

Data science is one of the most buzzed-about fields right now, and data scientists are in extreme demand. And with good reason: data scientists are doing everything from creating self-driving cars to automatically captioning images. Given all the interesting applications, it makes sense that data science is a very sought-after career. Data science […]

Building An Analytics Data Pipeline In Python

If you’ve ever wanted to work with streaming data, or data that changes quickly, you may be familiar with the concept of a data pipeline. Data pipelines allow you to transform data from one representation to another through a series of steps. Data pipelines are a key part of data engineering, which we teach in our […]

What is Data Engineering?

This is the first in a series of posts on Data Engineering. If you like this and want to know when the next post in the series is released, you can subscribe at the bottom of the page. From helping cars drive themselves to helping Facebook tag you in photos, data science has attracted a […]

How to get a data science job

You’ve done it. You just spent months learning how to analyze data and make predictions. You’re now able to go from raw data to well-structured insights in a matter of hours. After all that effort, you feel like it’s time to take the next step, and get your first data science job. Unfortunately for […]

Pandas Tutorial: Data analysis with Python: Part 1

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages, and makes importing and analyzing data much easier. Pandas builds on packages like NumPy and matplotlib to give you a single, convenient place to do most of your data analysis […]

NumPy Tutorial: Data analysis with Python

NumPy is a commonly used Python data analysis package. By using NumPy, you can speed up your workflow, and interface with other packages in the Python ecosystem, like scikit-learn, that use NumPy under the hood. NumPy was originally developed in the mid […]

How Rob Hipps used Dataquest to land a data analysis job

To highlight how Dataquest has changed people’s lives, we’ve started a new blog series called User Stories where we interview our users to learn more about their personal journeys and how we’ve helped them get where they needed to go. In this post, we interview Rob Hipps, Data Analyst at 3M. Rob went through a Master’s […]

Learn Python the right way in 5 steps

Python is an amazingly versatile programming language. You can use it to build websites, machine learning algorithms, and even autonomous drones. A huge percentage of programmers in the world use Python, and for good reason. It gives you the power to create almost anything. But — and this is a big but — you have […]

Python for data science: Getting started

Python is becoming an increasingly popular language for data science, and with good reason. It’s easy to learn, has powerful data science libraries, and integrates well with databases and tools like Hadoop and Spark. With Python, we can perform the full lifecycle of data science projects, including reading data in, analyzing data, visualizing data, and […]

How Helena Tan went from Investment Banking to Data Science

To highlight how Dataquest has changed people’s lives, we’ve started a new blog series called User Stories where we interview our users to learn more about their personal journeys. In this post, we interview Helena Tan, Data Analyst at Fitbit. Helena worked in a variety of roles in private equity and investment banking after graduating […]

How Data Scientist Yassine Alouini Keeps His Skills Sharp

To highlight how Dataquest has changed people’s lives, we’ve started a new blog series called User Stories where we interview our users to learn more about their personal journeys and how we’ve helped them get where they needed to go. In this post, we interview Yassine Alouini, Data Scientist at Qucit. Yassine got into data science […]

DigitalOcean and Docker for Data Science

Creating a cloud-based data science environment for faster analysis. There are times when working on data science problems with your local machine just doesn’t cut it anymore. Maybe your computer is old, and can’t work with larger datasets. Or maybe you want to be able to access your work from anywhere, and collaborate with others. […]

Python data visualization: Comparing 7 tools

The Python scientific stack is fairly mature, and there are libraries for a variety of use cases, including machine learning, and data analysis. Data visualization is an important part of being able to explore data and communicate results, but has lagged a bit behind other tools such as R in the past. Luckily, many new […]

Machine Learning with Python: A Tutorial

Machine learning is a field that uses algorithms to learn from data and make predictions. Practically, this means that we can feed data into an algorithm, and use it to make predictions about what might happen in the future. This has a vast range of applications, from self-driving cars to stock price prediction. Not only […]

Tutorial: Introduction to Using APIs in Python

Application Programming Interfaces, or APIs, are commonly used to retrieve data from remote websites. Sites like Reddit, Twitter, and Facebook all offer certain data through their APIs. To use an API, you make a request to a remote web server, and retrieve the data you need. But why use an API instead of a static […]


Tutorial: K Nearest Neighbors in Python

In this post, we’ll be using the K-nearest neighbors algorithm to predict how many points NBA players scored in the 2013-2014 season. Along the way, we’ll learn about Euclidean distance and figure out which NBA players are the most similar to LeBron James. If you want to follow along, you can grab the dataset in […]
