Everyone's journey to becoming a data scientist is different, and the learning curve will vary depending on many factors, including time availability, prior knowledge, and the tools you use. One learner shares her story about how she became a data scientist in 6 months with Dataquest. Here's how her journey began:
As the title suggests, this is an analysis project on my Dataquest journey that allowed me to learn Data Science in less than 6 months. I'm really excited to have finished this project just in time before the new year came! What's a better way to send off 2020 than a thorough look-back at my focus of the year?
It has been a year of grief, but in the Dataquest community, I see people from all over the world trying their best to learn and make progress every day. So my motivation for this project is not only to revisit my journey but also to encourage beginners by giving them a peek into the road ahead. But please keep in mind that the time and effort needed to complete this path depend heavily on personal circumstances. I will explain mine later in this article.
This project is also inspired by the people in this community, especially @otavios.s's amazing project, "I hope this is not a problem, but I scraped the Community." I was introduced to Selenium and ChromeDriver thanks to his project. Yes, I also scraped the DQ website to get the full Data Scientist curriculum, and I hope that's okay...
Before I go into the details of how I went from zero coding skills to data scientist in 6 months, I want to first share my findings.
The questions that get answered in this project:
- How many days did it take me to finish this path? (The timespan, including intervals when I didn't study.)
  - 175 days, from June 19th, 2020 to December 11th, 2020.
- What were my best and average learning streaks?
  - My best learning streak was 20 days, and my average was 6.6875 days. From my personal experience, it's important to get into the groove and keep going. I had a week-long break in October, and it took another week to get back to the same learning efficiency as before.
- How much time did I spend in total?
  - I spent 306.4 hours in total finishing the path. This means that if I had studied 24/7, I could have finished in roughly 13 days. Instead, it took me 175 days. I'm sure the robots are laughing at us humans.
- How many hours did I spend on average in the weeks I studied?
  - Assuming I studied 5 days a week on average, the 24 weeks I did study come to 120 study days. That works out to about 2.6 hours a day on Dataquest, or roughly 3 hours. That sounds about right, but note that it's a rough estimate. I also spent quite some time in the community and reading up on curriculum materials, and that time is not counted in this project.
- What's the average time spent to finish a lesson?
  - 111.43 minutes, in other words close to 2 hours. That looks like a dauntingly long time to finish a lesson, but it includes time spent on guided projects, which are a lot more time-consuming than regular lessons. It's not uncommon to spend days on a guided project. I wish I had more granular data on time spent per lesson so I could see the averages for projects and non-project missions separately, but I don't know if that data even exists.
- What are the speed bumps in the curriculum?
  - Steps 2, 4, 5, and 6 took more weeks to finish than the others. Among them, Steps 2 and 6 have the most missions, and Step 2 also has the most guided projects, so their size explains the extra weeks. That leaves Steps 4 and 5 as the most time-consuming steps relative to their size. Between the two, Step 4 took longer than Step 5, which matches my memory pretty well. In Step 4, the time-consuming part was SQL, and in Step 5, it was the courses on probability.
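The headline numbers above can be rechecked with a few lines of arithmetic. This is only a rough sketch, not code from the project: the variable names are made up for illustration, the 165-lesson count is the curriculum size quoted later in this post, and the 120 study days rest on the 5-days-a-week assumption stated above.

```python
from datetime import date

START = date(2020, 6, 19)
END = date(2020, 12, 11)
TOTAL_HOURS = 306.4
LESSONS = 165        # lessons in the path, guided projects included
STUDY_DAYS = 24 * 5  # assumption: 24 study weeks at 5 days each

span_days = (END - START).days                  # calendar days from start to finish
nonstop_days = TOTAL_HOURS / 24                 # days needed if studying around the clock
hours_per_study_day = TOTAL_HOURS / STUDY_DAYS  # average hours on a study day
minutes_per_lesson = TOTAL_HOURS * 60 / LESSONS # average minutes per lesson

print(span_days)                      # 175
print(round(nonstop_days, 1))         # 12.8
print(round(hours_per_study_day, 2))  # 2.55
print(round(minutes_per_lesson, 1))   # 111.4
```

The per-lesson figure lands a hair below the 111.43 minutes quoted above, presumably because the 306.4-hour total is itself rounded.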
Now, a little context about my personal learning situation:
I started the Data Scientist path in Python on June 19th, 2020, and I finished it on December 11th, 2020. I didn't spend a lot of time in the last two weeks, and what I did spend went mostly to finishing the last two guided projects (which count as two lessons) and extracurricular projects. That's probably why I didn't get any learning progress emails after the end of November.
I used to be a digital marketing account manager and had almost zero coding skills. I learned Python fundamentals from a data science course on Udemy for a couple of weeks right before I decided to switch to Dataquest.
I finished Andrew Ng's Machine Learning course on Coursera a few weeks before starting the path. I learned basic Octave during that course.
I'm currently unemployed, so I have a lot of spare time for learning.
A closer look at the project
A) Data collection (email parsing and web scraping)
The data I used in this project are collected from two sources:
- The progress data in this project comes from the weekly accomplishment email I get from Dataquest on Mondays if I made enough progress the previous week. It consists of:
  - date: receiving date of the email (always a Monday)
  - missions_completed: number of lessons completed
  - missions_increase_pct: percentage increase/decrease in the number of lessons completed compared to the previous week
  - minutes_spent: minutes spent on learning
  - minutes_increase_pct: percentage increase/decrease in the minutes spent compared to the previous week
  - learning_streak(days): number of consecutive days spent on learning
  - best_streak: best learning streak
  - To get the weekly emails, I first created a label in my Gmail to group the emails I wanted and then went to Google Takeout to download them. You can choose the file format in the process; what I downloaded was a .mbox file. Python has a library for parsing this type of file called mailbox. You will find the code used in this project in the GitHub link at the end of the post.
(A screenshot of the weekly accomplishment email is below)
- The curriculum data in this project comes from the Dataquest dashboard for the Data Scientist path. It consists of 8 Steps, 32 courses, and 165 lessons including 22 guided projects in hierarchical order. As mentioned at the beginning of the post, I used Selenium and ChromeDriver for the first time. The dashboard page where the curriculum information resides contains a grid of steps and collapsible lists of courses and missions; there was auto-login and a lot of clicking involved. I will probably write another article on scraping this page later.
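As a rough illustration of the email-parsing step, here is a minimal sketch using Python's standard mailbox module. It is not the code from the project (that lives in the GitHub repository). The function name read_email_dates and the file name in the usage note are made up for the example, and it only pulls out each email's date and subject, since the layout of the email body isn't shown in this post.

```python
import mailbox
from email.utils import parsedate_to_datetime


def read_email_dates(mbox_path):
    """Return (received_date, subject) pairs for each message in an .mbox file."""
    rows = []
    # create=False: raise an error instead of silently creating an empty file
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            raw_date = message["Date"]
            if raw_date is None:
                continue  # skip messages without a Date header
            received = parsedate_to_datetime(raw_date)
            rows.append((received.date(), message["Subject"]))
    finally:
        box.close()
    rows.sort(key=lambda row: row[0])  # oldest email first
    return rows
```

Calling read_email_dates("dataquest.mbox") on a Takeout export (the file name is hypothetical) would give one row per weekly email. Extracting fields such as missions_completed would then mean searching each message body, which depends on how the email is actually laid out.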
B) Data Imputation
The weekly email dataset in this project is very small, with only 16 rows containing data from 16 weeks. But my learning span was in fact 26 weeks. There were weeks where I didn't study at all, but still, for such a small dataset, I can't really afford to lose 10 weeks of data.
Luckily, on the profile page, Dataquest provides the learning curve throughout a path. So I came up with an imputation strategy: fill in the blanks where possible, plot the existing data, compare it with the Dataquest-generated learning curve, and combine that with my personal experience (e.g., pictures and memories of taking vacations and slacking 🙂 ) to impute the missing counts of lessons completed. Then I imputed the minutes spent from the average minutes spent on a lesson. The project covers this in more detail.
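The last part of that strategy can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the project's actual code: the function name impute_weeks and its arguments are hypothetical, the lesson counts for the gap weeks are assumed to have been estimated by hand already, and the average is taken over the weeks that did have an email.

```python
from datetime import timedelta


def impute_weeks(observed, estimated_missions, start, end):
    """Fill weekly gaps between two Mondays, start and end (inclusive).

    observed: {monday: (missions_completed, minutes_spent)} taken from the emails
    estimated_missions: {monday: missions_completed} hand-estimated for gap weeks
    """
    total_missions = sum(missions for missions, _ in observed.values())
    total_minutes = sum(minutes for _, minutes in observed.values())
    # Average minutes per lesson, based only on weeks with real data
    avg_minutes = total_minutes / total_missions if total_missions else 0

    rows = []
    week = start
    while week <= end:
        if week in observed:
            missions, minutes = observed[week]
            source = "email"
        else:
            # Gap week: use the hand estimate (0 if none) and scale by the average
            missions = estimated_missions.get(week, 0)
            minutes = round(missions * avg_minutes)
            source = "imputed"
        rows.append((week, missions, minutes, source))
        week += timedelta(weeks=1)
    return rows
```

Keeping a source column makes it easy to tell real rows from imputed ones later, for example when plotting.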
While I think the imputation was pretty successful (in serving the needs in this project), I wish we could have more data on our learning journey from Dataquest.
C) Visualizations in this project:
I used Plotly to plot all the visualizations in this project. I'm pretty happy with the Hours Spent vs Missions Completed plot below. It helped me make quite a few interesting observations and answered the curriculum-related questions at the beginning of this post. Again, you can read the details in the GitHub link at the end of the post.
To share the plots in posts like this one, I also tried out Chart Studio. The plots below are from the Chart Studio cloud and embedded using Chart Studio-generated HTML.
- My learning curve
- Hours spent weekly and the corresponding number of lessons completed and the steps they belong to
- Number of lessons and guided projects in each learning Step
- Full curriculum table of the Data Scientist in Python path on Dataquest
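For readers curious how a chart like the hours-versus-lessons plot might be put together, here is a minimal sketch. It is an assumption about one possible layout, not the code behind the plots above: it builds a Plotly-style figure description as a plain dictionary (weekly hours as bars, lessons completed as a line on a second axis), and the function name hours_vs_missions_figure is made up. It also leaves out the per-Step grouping shown in the real plot.

```python
def hours_vs_missions_figure(weeks, hours, missions):
    """Build a Plotly-style figure spec: weekly hours as bars, lessons as a line."""
    return {
        "data": [
            {"type": "bar", "name": "Hours spent", "x": weeks, "y": hours},
            {
                "type": "scatter",
                "mode": "lines+markers",
                "name": "Lessons completed",
                "x": weeks,
                "y": missions,
                "yaxis": "y2",  # draw against the right-hand axis
            },
        ],
        "layout": {
            "title": {"text": "Hours Spent vs Missions Completed"},
            "xaxis": {"title": {"text": "Week"}},
            "yaxis": {"title": {"text": "Hours spent"}},
            "yaxis2": {
                "title": {"text": "Lessons completed"},
                "overlaying": "y",  # share the plot area with the first axis
                "side": "right",
            },
        },
    }
```

With Plotly installed, passing the returned dictionary to plotly.graph_objects.Figure and calling show() on the result should render the chart.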
Apart from answering all the questions at the beginning of this project, I also want to tell beginners that what I've done in this project is mostly data collection, data cleaning, and imputation, which you will learn in the first four Steps. That means you will be equipped to do all of this halfway through the Data Scientist path!
P.S.: If anyone has more questions regarding this project or the DQ data scientist path, feel free to ask me in the comments or reach me at [email protected]. I will try my best to answer your question.
Click here to view the full project.
This article was written by Vera Tsien. You can find her on GitHub.