A Year Learning Data Science at Dataquest
What does a year of Dataquest actually get you? We sell yearly subscriptions, so that’s a question we get asked a lot. Every student is different, but this article is one answer.
Specifically, this article will give you an idea of what you can expect to learn, and what kind of jobs you might be able to apply for, as you work through a year of learning data science with us.
Of course, one of the advantages of an online course like Dataquest’s is that you can work at your own pace and tailor your study to your background, skipping courses with content you’re already comfortable with. For the purposes of this article, we’re going to make some conservative assumptions:
- You’re busy, and can only dedicate around five hours per week to your studies.
- You have no previous programming experience.
- You have no math training (beyond high-school algebra).
- You’ve signed up for a Premium subscription (which gives you access to all courses, Premium support, office hours consultations, and more).
A lot of our students work faster than that, so this is a pretty conservative estimate of what you can get done in a year. But if you put in at least five hours a week, we expect that in a year, you could finish the Data Analyst path, or get more than halfway through the Data Scientist path, and would be qualified for a variety of entry-level data analysis and data science jobs.
Let’s take a closer look at what that year would look like, what you would learn on the Data Scientist in Python path, and how you could best take advantage of your subscription over the course of the year. (Your experience on the Data Analyst path, in either Python or R, would be very similar.)
A Year of Dataquest
- January-February (Weeks 1-8): Learn Python
- March-May (Weeks 9-20): Data Cleaning, Data Analysis, and Data Visualization
- May-July (Weeks 21-28): Command Line, Version Control, and Git
- July-October (Weeks 29-40): Learn SQL, APIs, and Web Scraping
- October-December (Weeks 41-50): Statistics for Data Science
- Continuing Your Data Science Journey
January-February (Weeks 1-8): Learn Python
The first eight weeks of your year would likely be spent learning Python. You might be able to get through our introductory and intermediate Python courses a little faster if you rush, but building a solid Python foundation is important for almost everything that comes afterwards. It’s worth taking a little extra time here to be sure you can understand and apply all the concepts.
The good news is that even if you started these eight weeks with zero coding experience, you’re going to end them as a programmer. After these courses, you’ll be able to confidently apply most of the important concepts of Python programming (from basics like functions and for loops to more advanced concepts like regular expressions and list comprehensions), and you’ll also be comfortable working with Jupyter Notebooks, an important tool for data scientists who use Python.
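To give a rough taste of what "applying" these concepts looks like, here is a minimal sketch that combines a function, a list comprehension, and a regular expression. The headlines are made-up sample data, and the `count_matches` helper is a hypothetical name chosen for illustration, not code from any Dataquest course.

```python
import re

# Made-up sample headlines, for illustration only
headlines = [
    "Show HN: My weekend Python project",
    "Ask HN: How do you learn SQL?",
    "Python 3.12 released",
]

def count_matches(pattern, texts):
    """Count how many texts contain a match for a regex pattern (case-insensitive)."""
    return sum(1 for text in texts if re.search(pattern, text, flags=re.IGNORECASE))

# List comprehension: keep only the "Ask HN" posts
ask_posts = [h for h in headlines if h.startswith("Ask HN")]

# \b marks word boundaries, so "python" must appear as a whole word
python_count = count_matches(r"\bpython\b", headlines)

print(ask_posts)     # ['Ask HN: How do you learn SQL?']
print(python_count)  # 2
```

Small pieces like these (a loop hidden inside `sum`, a filter written as a comprehension, a pattern match) are the building blocks the later data analysis courses lean on.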
Along the way, you’ll also get a great introduction to the fundamentals of data analysis in Python. All of our courses have you working with real-world data, and as part of these courses you’ll apply what you’ve learned in guided projects analyzing which app store profiles lead to more app downloads and what successful Hacker News posts have in common with each other.
These two courses alone won’t be enough to get you a job in data science, but by the end of the eight weeks, you’ll find that you’ve learned enough to do some basic data analysis on your own, and probably code some other things, too! These eight weeks alone could give you skills that save you time on analytical tasks in your current job.
You can take advantage of Premium help and support at any time during your studies, but these first eight weeks will be an especially good time to reach out if you encounter any roadblocks, need a second pair of eyes on your code, or simply want to figure out whether your understanding of a concept is correct. These courses are the foundation upon which the rest of your data science “house” will be built, so being extra thorough here will pay dividends later.
March-May (Weeks 9-20): Data Cleaning, Data Analysis, and Data Visualization
These twelve weeks are where the rubber really starts to meet the road in applying your new Python skills to accomplish typical data science tasks. You’ll go through four courses here, and each one of them is crucial for doing data science.
In the first course, Pandas and NumPy Fundamentals, you’ll learn how to use the pandas library, a crucial tool for real-world data analysis tasks. You’ll also learn about NumPy, another useful Python package, and you’ll learn to make the two play nicely together. Then you’ll apply that learning in a guided project analyzing real-world eBay car sales data.
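As a rough illustration of pandas and NumPy working together, the sketch below cleans a text price column with pandas string methods, applies a NumPy function directly to a pandas column, and aggregates by group. The listings are hypothetical, loosely in the spirit of a used-car dataset; they are not the actual eBay data or the course's code.

```python
import numpy as np
import pandas as pd

# Hypothetical listings (not real eBay data)
cars = pd.DataFrame({
    "brand": ["volkswagen", "bmw", "volkswagen", "audi"],
    "price": ["$5,000", "$12,500", "$3,200", "$9,900"],
})

# pandas: strip "$" and "," from the text column, then convert it to integers
cars["price"] = (
    cars["price"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(int)
)

# NumPy: a vectorized function applied directly to a pandas column
cars["log_price"] = np.log10(cars["price"])

# pandas: mean price per brand
mean_by_brand = cars.groupby("brand")["price"].mean()
print(mean_by_brand)
```

Passing `regex=False` treats `"$"` as a literal character rather than a regex anchor, which is usually what you want when stripping currency symbols.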
From there, you’ll move into two courses about data visualization. The first, Exploratory Data Visualization, will teach you how to use the matplotlib package together with pandas to create exploratory visualizations that help you make sense of your data and guide your analysis. The second, Storytelling Through Data Visualization, will teach you more about how to make aesthetic, readable charts using Seaborn, so that you know how to communicate your data clearly to others (a crucial skill in any data science job). In these courses, you’ll synthesize what you’ve learned in guided projects analyzing topics like the gender gap in college degrees and geographical flight patterns (all using real-world data, of course).
Finally, you’ll move into a course on data cleaning, one of the most unglamorous but essential skills in any data scientist’s toolkit. You’ll learn how to explore and clean datasets and how to combine multiple datasets into a single, clean source, and you’ll work through guided projects analyzing data from NYC high schools and a survey about Star Wars.
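Combining datasets usually comes down to a join on a shared key. Here is a minimal pandas sketch, assuming two small hypothetical tables keyed by a made-up `school_id` column (the real NYC schools data has its own columns and quirks):

```python
import pandas as pd

# Two small, hypothetical tables that share a "school_id" key
sat = pd.DataFrame({
    "school_id": ["A1", "B2", "C3"],
    "sat_score": [1200, None, 1350],
})
survey = pd.DataFrame({
    "school_id": ["A1", "B2", "D4"],
    "safety_score": [7.5, 6.8, 8.1],
})

# Basic cleaning: drop schools with a missing SAT score
sat_clean = sat.dropna(subset=["sat_score"])

# Left join: keep every cleaned SAT row; schools missing from the survey get NaN
combined = sat_clean.merge(survey, on="school_id", how="left")
print(combined)
```

A left join is used here so no SAT rows are silently dropped; `how="inner"` would instead keep only schools present in both tables. Which one is right depends on the question you are asking.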
By mid-May (your 20th week), you’ll have acquired many of the foundational data science skills, and you should be well-equipped to start taking on your own data science projects. You might not be ready for a full-time data science job just yet, but you’ll know enough to be able to solve real-world problems with data science in a way that might impact your current job.
For example, Dataquest student Curtly Critchlow was able to take an Excel data analysis nightmare that took him a full week of work each month and turn it into a project that took just a few minutes after he finished our Pandas and NumPy course.
During these weeks, though, you may encounter a psychological phenomenon sometimes referred to as ‘The Dip’. This happens often in the course of learning a new skill; once you get beyond the beginner phase, big gains come a bit more slowly, and the novelty of studying something new has worn off. The result can be a bit of a dip in your natural level of motivation.
But don’t worry: we’ll help you fight the dip! All of our courses use interesting, real-world data to combat this effect by keeping you interested in the analysis, and you’ll be solving different and interesting problems in each course.
These twelve weeks would also be a good time for you to get more involved with our Members-Only Slack community, where you can network and collaborate with other students and get help from peers and data scientists alike. The energy of collective learning can be a great motivator, and by this point, you’ll have learned enough to start helping other students. Teaching others what you’ve learned is a great way to reinforce your own learning.
May-July (Weeks 21-28): Command Line, Version Control, and Git
As we get towards the middle of our year of data science, it’s time to cover some skills that are hugely important for working in data science: operating with the command line and using Git to develop projects collaboratively.
In the first two courses, you’ll learn to work with the command line. You’ll get comfortable navigating around without the use of a GUI, and working with Python scripts and packages from the command line. Then you’ll move on to more advanced topics including searches with grep, building shell pipelines, and using some new tools like Jupyter console. You’ll also get some more training in data cleaning using a tool called csvkit. To cement these skills, you’ll work on real-world projects like analyzing years of Hacker News headlines to see what words, domains, and submission times are most likely to result in a highly-upvoted post.
From there, you’ll move into our course on Git and Version Control, where you’ll learn why version control is important and how you can use Git both locally and with remotes like GitHub (the biggest public code repository, and a platform for code sharing and collaboration used by software developers all over the world). You’ll also learn project management techniques, like merging branches and resolving merge conflicts, that will make it easier to work as part of a collaborative data science team. And of course, you’ll get Git installed and set up your own GitHub account.
At this point in your study, it’s a great time to start thinking about portfolio projects. Having a GitHub profile or some other portfolio page with compelling projects is key to landing a job in data science, and Dataquest is full of guided projects that you can absolutely use for a portfolio. You’ll have worked through some of these already, so this is a good moment to look back and think about adding some polish to your favorites so you’ve got some cool projects ready when you start applying for jobs.
July-October (Weeks 29-40): Learn SQL, APIs, and Web Scraping
Over these twelve weeks, you’ll take four more courses, all of them focused on helping you work with data sources more efficiently.
You’ll start with three SQL courses. In the first, you’ll learn the basics, like how to explore and analyze data in SQL, and how to use SQLite with Python. Then you’ll move into more intermediate topics like querying across multiple tables, and you’ll begin getting practice answering business questions using SQL. Finally, you’ll dig into the advanced stuff, like PostgreSQL and using database indexes to speed up your SQL queries.
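To sketch what "using SQLite with Python", querying across tables, and adding an index look like together, here is a small example using Python's built-in `sqlite3` module. The tables and figures are hypothetical, and the course's own datasets (and its PostgreSQL material) will look different.

```python
import sqlite3

# In-memory database with two hypothetical tables
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'USA'), (2, 'Canada'), (3, 'USA');
INSERT INTO invoices VALUES (1, 1, 10.0), (2, 1, 5.5), (3, 2, 8.0), (4, 3, 2.5);
""")

# An index on the join column can let SQLite look up matching rows
# instead of scanning the whole invoices table
cur.execute("CREATE INDEX idx_invoices_customer ON invoices (customer_id);")

# Business question: total sales per country, largest first
query = """
SELECT c.country, SUM(i.total) AS sales
FROM invoices AS i
JOIN customers AS c ON c.id = i.customer_id
GROUP BY c.country
ORDER BY sales DESC;
"""
rows = cur.execute(query).fetchall()
print(rows)  # [('USA', 18.0), ('Canada', 8.0)]
conn.close()
```

On a table this small the index makes no measurable difference; its payoff shows up on large tables that are joined or filtered on the indexed column often.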
And while your new SQL skills will be crucial for working with most of the databases out there, there are plenty of other data sources you’ll want to work with, so after the SQL courses you’ll move into a course on APIs and web scraping that’ll teach you how to query APIs and scrape data from websites that don’t have APIs.
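Scraping boils down to fetching a page's HTML and pulling structured values out of it. The sketch below skips the download step and parses a hard-coded, made-up HTML snippet with Python's standard-library `html.parser`, so it runs offline; the `CellParser` class is a hypothetical helper, and the course itself may well use different tools (third-party libraries such as BeautifulSoup are a common choice).

```python
from html.parser import HTMLParser

# Stand-in for a downloaded page; a real scraper would fetch this over HTTP first
html = """
<table id="rates">
  <tr><td class="country">France</td><td class="rate">1.2</td></tr>
  <tr><td class="country">Japan</td><td class="rate">0.9</td></tr>
</table>
"""

class CellParser(HTMLParser):
    """Collect the text of every <td> element, paired with its class attribute."""

    def __init__(self):
        super().__init__()
        self.current_class = None  # class of the <td> we are inside, if any
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.current_class = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "td":
            self.current_class = None

    def handle_data(self, data):
        # Ignore whitespace between tags and text outside table cells
        if self.current_class and data.strip():
            self.cells.append((self.current_class, data.strip()))

parser = CellParser()
parser.feed(html)
countries = [text for cls, text in parser.cells if cls == "country"]
print(countries)  # ['France', 'Japan']
```

Keying on the `class` attribute is the design choice that matters here: scrapers typically rely on the page's own markup to locate values, which is also why they break when a site changes its HTML.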
To cement these skills, you’ll answer some more real-world business questions with SQL, and dive into data from the CIA World Factbook.
If you haven’t already, this would be a great time to schedule one-on-one office hours with one of our data scientists and discuss your career plans. It’s a chance to get a second set of eyes on your resume, get some advice about building your portfolio, or just get some input on the types of roles you should apply for.
That might sound premature, but don’t sell yourself short – by this point, you have almost all of the key skills you need for entry-level data analyst positions. And while it’s scary to apply for jobs before you feel ready, the payoff can be massive. That was certainly the case for Miguel Couto, a Dataquest student who applied for jobs about halfway through our Data Scientist path even though he didn’t think he was ready. He ended up getting three full-time job offers and is now working happily as a data analyst.
October-December (Weeks 41-50): Statistics for Data Science
By this point, you’ll have the programming skills to do a lot of data analysis, but you still need a solid understanding of statistics and probability to get the most out of them. So in the final section of your year of Dataquest, you’ll take a sequence of three courses aimed at giving you a solid stats foundation and helping you apply these concepts in Python.
You’ll start with the basics, like learning different sampling techniques for taking good samples from your data. Then you’ll start looking at distributions, measuring variability, and locating and comparing values with z-scores. Finally, you’ll learn more about probability and dig into advanced topics like significance testing and the chi-squared test.
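Two of the formulas mentioned above are short enough to write out by hand. A z-score is $z = (x - \mu)/\sigma$, the number of standard deviations a value sits from the mean; the chi-squared statistic is $\chi^2 = \sum (O - E)^2 / E$ over categories of observed ($O$) and expected ($E$) counts. The minimal sketch below uses only Python's `statistics` module and made-up numbers (hypothetical daily bike rentals and category counts):

```python
import statistics

# Hypothetical daily bike rentals
rentals = [120, 135, 150, 160, 185]

mean = statistics.mean(rentals)
sd = statistics.pstdev(rentals)  # population standard deviation

def z_score(value):
    """How many standard deviations a value sits above (+) or below (-) the mean."""
    return (value - mean) / sd

print(round(z_score(185), 2))  # 1.58

# Chi-squared statistic: sum over categories of (observed - expected)^2 / expected
observed = [30, 20]  # hypothetical counts in two categories
expected = [25, 25]  # counts expected under the null hypothesis
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_squared)  # 2.0
```

This computes only the test statistic. Turning it into a p-value for a significance test requires the chi-squared distribution with the right degrees of freedom (1 here), which libraries such as SciPy (`scipy.stats.chisquare`) handle for you. Whether to use the population (`pstdev`) or sample (`stdev`) standard deviation depends on whether your data is the whole population or a sample from it.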
As usual, as you work through these courses, you’ll be using real-world data to answer interesting questions, like how a bike-sharing company can anticipate rental patterns. And you’ll be able to apply your new skills to cool guided projects like figuring out winning Jeopardy strategies and determining whether a movie ratings site’s ratings are biased.
This is a great time to branch out a bit more and start making some connections in the data science community. Our Members Slack is a great place to start. You may also want to work on building your brand as a data scientist by getting yourself out there in other ways, like by writing a tutorial for the Dataquest blog.
This statistics and probability section is the final one you’ll complete in your year of Dataquest study, assuming you’ve stuck to just five hours per week. It also represents the end of the Data Analyst path. At this point, you’ll be well-qualified to apply for data analyst positions; we have many students like Pol Brigneti, who’ve finished our Data Analyst path and found full-time data analyst positions. If that’s your choice, then you’ve got two extra weeks in the year that you can use to polish projects, tackle a few more guided projects for your portfolio, and start applying for jobs.
Continuing Your Data Science Journey
There’s still plenty more learning on our Data Scientist path, too. If you continue studying, in the final two weeks of the year you’ll get to dig into the hottest topic in data science: machine learning.
And remember, this is a pretty conservative estimate. Spending a little more time each week studying will get you further, faster. At around 10 hours per week, we estimate you’d be able to finish the entire Data Scientist path in a year.
Even if you don’t aspire to go all the way through the Data Scientist path, it pays to keep learning while you’re searching for jobs, and even after you find employment. That’s what Miguel told us after he got his full-time job just halfway through our Data Scientist path. “Even though I’m starting a job in January, I’m still going to be active, and I’ll keep on studying, because obviously I want to reach other paths.”
“And I still think Dataquest is the best option,” he told us. “If I had to choose only one, I’d choose Dataquest.”
Charlie is a student of data science, and also a content marketer at Dataquest.