May 10, 2024

How to Learn Data Science

Employment of data scientists is expected to grow 35% from 2022 to 2032. (BLS.gov)

So, how do you learn data science?

Let's discuss five reasons why most learning methods cause people to fail. Then, I will cover the correct learning method, which worked for me as I transitioned from a history teacher to a machine learning engineer.

Why do most data science learners fail?

High barriers to entry

Would-be data science learners often give up because they think learning data science is too expensive. Bootcamps and certification programs can come with huge price tags that scare people off. However, they may not realize that there are more affordable ways to learn.  

Boring curriculum

Affordable courses don't guarantee success. Many online courses contain dry video lectures or recordings of others writing code. This isn’t engaging, so learners end up tuning out. 

As a result, these courses have completion rates of 5% to 15%. Not great odds. 

Life gets in the way

Many data science courses have rigid schedules. Courses begin in a particular week, end in another, and may require live sessions at specific times. 

People who are learning online often have jobs, families, or other obligations. That can make keeping up with those schedules a challenge. 

The difficulty curve is off

If courses are too easy or too hard, most learners will eventually fall off the wagon. One common problem in online data science programs is that they’ve been built with pre-existing courses stuck together. One course might seem too easy, but the next is too hard. This happens because these courses weren’t built together.

Lessons don’t feel relevant

People tend to stick with their studies when they feel like they’re making progress toward their goals. But many data science courses are focused on lectures and drills, rather than doing real-world data science work. This can leave learners feeling like they are treading water. 

The right way to learn data science

Before I founded Dataquest, I taught myself the skills required to go from working in a non-technical job to working as a machine learning engineer.

Through that experience — and through the experiences of the hundreds of thousands of learners who’ve gained data science skills on Dataquest over the last eight years — I’ve come to understand a lot about what works and what doesn’t when it comes to learning data science. 

And it’s all built into the Dataquest experience. 

Dataquest’s “learning loop”

For learners to be successful, we need to feel like we’re making progress. The importance of this can’t be overstated. We need to feel like we can immediately use the skills we’re learning.

That’s why Dataquest is hands-on. You’ll be writing and running real code and working with real datasets from day one. 

In our side-by-side learning platform, you’ll read about a concept on the left side of the screen, then be challenged to write and run real code to apply what you’ve learned on the right side. 

This simple learning loop repeats through every single one of our courses. You learn something new and apply it to a real data science problem. Each screen builds on the previous screen and leads into the next one.

That means that as you’re learning, you’ll know you have grasped the material because you’re using it to do real data science work.

You’re not watching lectures. You’re not filling in the blanks or answering multiple-choice questions. You’re writing and running code exactly like you will in a real data science job. 

Tailored course paths

An important part of this approach is that our courses are carefully sequenced to ensure no gaps. One course always leads to the next one, and each has a very specific goal in mind.

For example, here are some of our career paths:

Many of our students love these paths because they contain everything they need to know to obtain a position. Yeah, really! To become a data analyst, every ounce of information you need is within the Data Analyst career path.

Plus, there are no prerequisites. Anybody can do it!

You can get started for free by clicking on any of the links above.

Fun, real-world projects 

While all of our courses get you working hands-on with real data, we also know that it’s critical to synthesize the skills as you learn.

That’s why most of our courses end with guided projects that challenge you to answer real data science questions using the skills you’ve learned in previous courses. 

These projects are fun learning tools that help cement your new knowledge, but they’ll also help you when it’s time for your job search, as you can include them in your project portfolio. (Hiring managers love it when you do this, by the way.)

Example projects in the data science path include: 

How long does it take to learn data science?

Every student is different, so this question can be tough to answer. Learning on Dataquest is self-paced, and some students move much more quickly than others, so I will walk through a typical progression to give you a baseline.

First, let's make some conservative assumptions:

  • You’re busy, and can only dedicate around five hours per week to your studies.
  • You have no previous programming experience.
  • You have no math training (beyond high-school algebra).
  • You’ve decided to study on Dataquest to speed up your learning journey.

A lot of learners spend far more than five hours per week studying, so this is a pretty conservative estimate of what you can get done in a year.

If you put in at least five hours a week, we expect that in a year, you could finish the Data Analyst path, or get more than halfway through the Data Scientist path, and would be qualified for a variety of entry-level data analysis and data science jobs.

Let’s take a closer look at what that year would look like, what you would learn on the Data Scientist in Python path, and how you could best take advantage of your learning time.

Data Science Learning Plan

January-February: Learning Python

The first eight weeks of your year would likely be spent learning Python. You might be able to get through our introductory and intermediate Python courses a little bit faster if you rush, but building a solid Python foundation is important for almost everything that comes afterwards. It’s worth taking a little extra time here to be sure you can understand and apply all the concepts.

The good news is that even if you started these eight weeks with zero coding experience, you’re going to end them as a programmer. After these courses, you’ll be able to confidently apply most of the important concepts of Python programming (from basics like functions and for loops to more advanced concepts like regular expressions and list comprehensions), and you’ll also be comfortable working with Jupyter Notebooks, an important tool for data scientists who use Python.
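To make those concepts concrete, here is a minimal sketch that combines a function, a for loop, a regular expression, and a list comprehension. The post titles and the `count_words` helper are invented for illustration and are not taken from the course material.

```python
import re

# Hypothetical sample data: made-up post titles, not from a real dataset.
titles = [
    "Ask HN: How do I learn Python?",
    "Show HN: A tiny SQL engine",
    "Why data cleaning matters",
]


def count_words(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


# For loop: call the function once per title.
word_counts = []
for title in titles:
    word_counts.append(count_words(title))

# Regular expression: match titles that start with "Ask HN" or "Show HN".
hn_pattern = re.compile(r"^(Ask|Show) HN")

# List comprehension: keep only the titles that match the pattern.
hn_titles = [title for title in titles if hn_pattern.search(title)]

print(word_counts)
print(hn_titles)
```

The for loop could itself be written as a list comprehension; it is spelled out here only to show both forms side by side.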

As you learn those skills and techniques, you will also get a great introduction to the fundamentals of data analysis in Python. All of our courses have you working with real-world data, and as part of these courses you’ll apply what you’ve learned in guided projects that analyze which app store profiles lead to more app downloads and what successful Hacker News posts have in common.

These two classes alone won’t be enough to get you a job in data science, but by the end of the eight weeks, you’ll find that you’ve learned enough to do some basic data analysis on your own, and probably code some other things, too! Just these eight weeks would be enough to give you some skills that might help you save some time on analytical tasks in your current job.

These first eight weeks are also a good time for you to establish a presence in our data science learning community. There, you can get help from your fellow students, as well as our data scientists and our career counselor. If you get stuck, this community is a great way to get yourself unstuck fast, and that's important, particularly early in the learning process.

These courses are the foundation upon which the rest of your data science “house” will be built, so being extra thorough here will pay dividends later. If you don't understand something, ask!

March-May: Data Cleaning, Data Analysis, and Data Visualization

These twelve weeks are where the rubber really starts to meet the road in applying your new Python skills to accomplish typical data science tasks. You’ll go through five courses here, and each one of them is crucial for doing data science.

In the first course, Pandas and NumPy Fundamentals, you’ll learn how to use the pandas library, a crucial tool for real-world data analysis tasks. You’ll also learn about NumPy, another useful Python package, and you’ll learn to make them play nicely together. Then you’ll apply that learning with a guided project analyzing real-world eBay car sales data.

From there, you’ll move into two courses about data visualization. The first, Exploratory Data Visualization, will teach you how to use the matplotlib package together with pandas to do exploratory visualizations that will help you make sense of your data and guide you in your analysis.

The second, Storytelling Through Data Visualization, will teach you more about how to make aesthetic, readable charts using Seaborn to ensure that you know how to communicate your data clearly to others (a crucial skill in any data science job). In these courses, you’ll synthesize what you’ve learned in guided projects analyzing topics like the gender gap in college degrees and geographical flight patterns (all using real-world data, of course).

Finally, you’ll move into two courses on data cleaning, one of the most un-sexy but essential skills in any data scientist’s toolkit. You’ll learn how to explore and clean datasets and how to combine multiple datasets into a single, clean source, and you’ll work through guided projects analyzing data from NYC high schools and a survey about Star Wars.

By mid-May (your 20th week), you’ll have acquired many of the foundational data science skills, and you should be well-equipped to start taking on your own data science projects. You might not be ready for a full-time data science job just yet, but you’ll know enough to be able to solve real-world problems with data science in a way that might impact your current job.

For example, Dataquest student Curtly Critchlow was able to take an Excel data analysis nightmare that took him a full week of work each month and turn it into a project that took just a few minutes after he finished our Pandas and NumPy course.

During these weeks, though, you may encounter a psychological phenomenon sometimes referred to as ‘The Dip’. This happens often in the course of learning a new skill; once you get beyond the beginner phase, big gains come a bit more slowly, and the novelty of studying something new has worn off. The result can be a bit of a dip in your natural level of motivation.

But don’t worry: we’ll help you fight the dip! All of our courses use interesting, real-world data to combat this effect by keeping you interested in the analysis, and you’ll be solving different and interesting problems in each course.

May-July: Learning the Command Line and SQL

As we get towards the middle of our year of data science, it’s time to cover some skills that are hugely important for working in data science: operating with the command line and working with SQL.

In the first two courses, you’ll learn to work with the command line. You’ll get comfortable navigating around without the use of a GUI, and working with Python scripts and packages from the command line. Then you’ll move on to more advanced topics, with a focus on processing text in the command line. 

From there, you'll start digging into our three SQL courses. In the first, you’ll learn the basics, like how to explore and analyze data in SQL, and how to use SQLite with Python. Then you’ll move into more intermediate topics like querying across multiple tables, and you’ll begin getting practice answering business questions using SQL. Finally, you’ll dig into the advanced stuff, like PostgreSQL and using database indexes to speed up your SQL queries.
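The SQL ideas in this part of the plan can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch, not course material: the `customers` and `orders` tables, their rows, and the index name are all invented for illustration, and it uses SQLite rather than PostgreSQL for the index so that it runs without a database server.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables and rows, invented for this example.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 20.0), (2, 1, 35.0), (3, 2, 15.0)],
)

# An index on the join column; on large tables this can speed up lookups.
cur.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Query across two tables: total spend per customer, highest first.
cur.execute(
    """
    SELECT c.name, SUM(o.total) AS spend
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY spend DESC
    """
)
spend_by_customer = cur.fetchall()
print(spend_by_customer)

conn.close()
```

With only three rows the index makes no measurable difference; it is included to show where such a statement would go.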

And while your new SQL skills will be crucial for working with most of the databases out there, there are plenty of other data sources you’ll want to work with, so after the SQL courses you’ll move into a course on APIs and web scraping that’ll teach you how to query APIs and scrape data from websites that don’t have APIs.

To cement these skills, you’ll answer some more real-world business questions with SQL, and dive into data from the CIA World Factbook.

At this point in your study, it’s a great time to start thinking about portfolio projects. Having a GitHub or some other portfolio page with compelling projects is key to landing a job in data science, and Dataquest is full of guided projects that you can absolutely use for a portfolio. You’ll have worked through some of these already, so this is a good moment to look back and think about adding some polish to your favorites so you’ve got some cool projects ready when you start applying for jobs.

July-October: Learn Statistics

By this point, you’ll have the programming skills to do a lot of data analysis, but you still need a solid understanding of statistics and probability to get the most out of them. In the next section of your year of Dataquest, you’ll take a sequence of four courses aimed at giving you a solid stats foundation and helping you apply these concepts in Python.

You’ll start with the basics, like learning different sampling techniques for taking good samples from your data. Then you’ll start looking at distributions, measuring variability, and locating and comparing values with z-scores. Finally, you’ll learn more about probability and dig into advanced topics like significance testing and the chi-squared test.
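As one illustration of these ideas, a z-score expresses how many standard deviations a value lies from the mean. The sketch below uses Python's standard `statistics` module; the rental counts are invented, and the choice of the population standard deviation is an assumption made for the example.

```python
import statistics

# Hypothetical sample: invented daily rental counts.
rentals = [10, 12, 14, 16, 18]

mean = statistics.mean(rentals)
# Population standard deviation; statistics.stdev gives the sample version.
std_dev = statistics.pstdev(rentals)


def z_score(value):
    """Return how many standard deviations value lies from the mean."""
    return (value - mean) / std_dev


print(z_score(18))
```

A positive z-score means the value is above the mean, a negative one means it is below, and zero means it equals the mean.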

As usual, as you work through these courses, you’ll be using real-world data to answer interesting questions, like how a bike-sharing company can anticipate rental patterns. And you’ll be able to apply your new skills to cool guided projects like figuring out winning Jeopardy strategies and determining whether a movie ratings site’s ratings are biased.

It might be possible to work through these courses in less than the three months we've allotted, but we'd suggest that you take your time here and really be sure you understand everything. While it's easy to check references if you forget how to do something in Python or SQL, misunderstanding the math that underlies your analysis could have more serious consequences, and it's harder to catch. 

This is a great time to branch out a bit more and start making some connections in the data science community. Our own community is a good place to start if you're not already active there. You may also want to work on building your brand as a data scientist by getting yourself out there in other ways, like by writing a tutorial for the Dataquest blog.

If you're interested in data analyst positions, you can begin applying for jobs at any point during these months. Having Python, SQL, and statistics skills will qualify you for most data analyst positions.

October-December: Dig Into Machine Learning

By this point, you’ll have the programming skills to do a lot of data analysis, and the statistics skills to understand what's happening under the hood. But if you want to work as a data scientist, you need to add one more big skill to your skill set: machine learning.

Over these months, you'll start digging into our machine learning course offerings. You'll start with the fundamentals of machine learning, and then you'll learn about some important calculus and linear algebra concepts that underpin key machine learning algorithms.

You'll likely get through our linear regression course, and depending on your speed, you may make it through subsequent courses like machine learning intermediate, decision trees, and deep learning.
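For a flavor of what linear regression involves, the sketch below fits a straight line to a handful of points using the standard closed-form least-squares formulas. The data points are invented for illustration, and a course or library would typically handle this with dedicated tooling rather than hand-written sums.

```python
# Hypothetical data points, invented for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares estimates for the line y = slope * x + intercept.
covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
variance_sum = sum((x - mean_x) ** 2 for x in xs)
slope = covariance_sum / variance_sum
intercept = mean_y - slope * mean_x

print(slope, intercept)
```

Because these points lie exactly on a line, the fit recovers that line; with noisy data the same formulas give the line that minimizes the squared vertical distances to the points.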

Learning at this relatively slow pace, it's unlikely you'll make it all the way to the end of our Data Scientist path in this time frame, and there are still quite a few courses on cool topics like Natural Language Processing and Apache Spark in your future.

But once you've gotten to this point, you should be ready to start applying for entry-level data science jobs. You have the programming skills, the statistics knowledge, and a strong foothold in machine learning. And while you should always plan to continue learning, these skills — and the project portfolio you've built while acquiring them — already make you a compelling candidate. 

If you're not sure how to start the job hunt, reading through our Data Science Career Guide would be a great start!

This Is Just the Beginning of Your Data Science Journey

Assuming you’ve stuck to just five hours per week, you're likely to have completed the Data Analyst path and gotten a strong foothold in the machine learning section of the Data Scientist path by the end of your first year of study.

At this point, you’ll be well-qualified to apply for data analyst positions. We have many students like Pol Brigneti, who’ve finished our Data Analyst path and found full-time data analyst positions. If that’s your choice, then you don't need to worry much about machine learning skills, and you can spend the extra three months building cool projects, applying for jobs, and adding new skills to your skill set. (For example, since you've been learning Python, it couldn't hurt to also learn the fundamentals of R in case you come across jobs that prefer it.)

You'll also be ready to start applying for entry-level data science positions and internships too, although there's plenty more to learn in the realm of machine learning and more advanced topics that are covered later in our data science path. 

And remember, this is a pretty conservative estimate. Spending a little more time each week studying will get you further, faster. At around 10 hours per week, we estimate you’d be able to finish the entire Data Scientist path in a year.

Even if you don’t aspire to go all the way through the Data Scientist path, it pays to keep learning while you’re searching for jobs, and even after you find employment. That’s what Miguel told us after he got his full-time job just halfway through our Data Scientist path. “Even though I’m starting a job in January, I’m still going to be active, and I’ll keep on studying, because obviously I want to reach other paths.”

“And I still think Dataquest is the best option,” he told us. “If I had to choose only one, I’d choose Dataquest.”

Vik Paruchuri

About the author

Vik is the CEO and Founder of Dataquest.