A Data CEO’s Guide to Becoming a Data Scientist From Scratch
If you want to know how to become a data scientist, then you’re in the right place. I’ve been where you are, and now I want to help.
A decade ago, I was just a college graduate with a history degree. Since then, I've worked as a machine learning engineer and a data science consultant, and I'm now the CEO of Dataquest.
If I could do everything over, I would follow the steps I’m going to share with you in this article. It would have fast-tracked my career, saved me thousands of hours, and prevented a few gray hairs.
The Wrong and Right Way
When I was learning, I tried to follow various online data science guides, but I ended up bored and without any actual data science skills to show for my time.
The guides were like a teacher at school handing me a bunch of books and telling me to read them all — a learning approach that never appealed to me. It was frustrating and self-defeating.
Over time, I realized that I learn most effectively when I'm working on a problem I'm interested in.
And then it clicked.
Instead of learning a checklist of data science skills, I decided to focus on building projects around real data. Not only did this learning method motivate me, it also mirrored the work I’d do in an actual data scientist role.
I created this guide to help aspiring data scientists who are in the same position I was in. In fact, that’s also why I created Dataquest. Our data science courses are designed to take you from beginner to job-ready in less than 8 months using actual code and real-world projects.
However, a series of courses isn’t enough. You need to know how to think, study, plan, and execute effectively if you want to become a data scientist. This actionable guide contains everything you need to know.
How to Become a Data Scientist:
- Step 1: Question Everything
- Step 2: Learn The Basics
- Step 3: Build Projects
- Step 4: Share Your Work
- Step 5: Learn From Others
- Step 6: Push Your Boundaries
Now, let’s go over each of these one by one.
Step 1: Question Everything
The data science and data analytics field is appealing because you get to answer interesting questions using actual data and code. These questions can range from "Can I predict whether a flight will be on time?" to "How much does the U.S. spend per student on education?"
To answer these questions, you need to develop an analytical mindset.
The best way to develop this mindset is to start by analyzing news articles. First, find a news article that discusses data. Here are two great examples: "Can Running Make You Smarter?" and "Is Sugar Really Bad for You?"
Then, think about the following:
- How they reach their conclusions given the data they discuss
- How you might design a study to investigate further
- What questions you might want to ask if you had access to the underlying data
Some articles, like this one on gun deaths in the U.S. and this one on online communities supporting Donald Trump, actually have the underlying data available for download. This allows you to explore even deeper. You could do the following:
- Download the data, and open it in Excel or an equivalent tool
- See what patterns you can find in the data by eyeballing it
- Decide whether you think the data supports the conclusions of the article, and why or why not
- List the additional questions you think you could use the data to answer
Here are some good places to find data-driven articles:
After a few weeks of reading articles, reflect on whether you enjoyed coming up with questions and answering them. Becoming a data scientist is a long road, and you need to be very passionate about the field to make it all the way.
Data scientists constantly come up with questions and answer them using mathematical models and data analysis tools, so this step is great for understanding whether you'll actually like the work.
If You Lack Interest, Analyze Things You Enjoy
Perhaps you don't enjoy the process of coming up with questions in the abstract, but maybe you enjoy analyzing health or finance data. Find what you're passionate about, and then start viewing that passion with an analytical mindset.
Personally, I was very interested in stock market data, which motivated me to build a model to predict the market.
If you want to put in the months of hard work necessary to learn data science, working on something you’re passionate about will help you stay motivated when you face setbacks.
Step 2: Learn The Basics
Once you've figured out how to ask the right questions, you're ready to start learning the technical skills necessary to answer them. I recommend learning data science by studying the basics of programming in Python.
Python is a programming language that has consistent syntax and is often recommended for beginners. It’s also versatile enough for extremely complex data science and machine learning-related work, such as deep learning or artificial intelligence using big data.
Many people worry about which programming language to choose, but here are the key points to remember:
- Data science is about answering questions and driving business value, not about tools
- Learning the concepts is more important than learning the syntax
- Building projects and sharing them is what you'll do in an actual data science role, and learning this way will give you a head start
Super important note: The goal isn’t to learn everything; it’s to learn just enough to start building projects.
Where You Should Learn
Here are a few great places to learn:
- Dataquest — I started Dataquest to make learning Python for data science or data analysis easier, faster, and more fun. We offer basic Python fundamentals courses, all the way to an all-in-one path consisting of all courses you need to become a data scientist.
- Learn Python the Hard Way — a book that teaches Python concepts from the basics to more in-depth programs.
- The Python Tutorial — a free tutorial provided by the main Python site.
The key is to learn the basics and start answering some of the questions you came up with over the past few weeks browsing articles.
Step 3: Build Projects
As you're learning the basics of coding, you should start building projects that answer interesting questions and showcase your data science skills.
The projects you build don't have to be complex. For example, you could analyze Super Bowl winners to find patterns.
The key is to find interesting datasets, ask questions about the data, then answer those questions with code. If you need help finding datasets, check out this post for a good list of places to find them.
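As a minimal sketch of that loop (find data, ask a question, answer it with code), the snippet below asks one question of a tiny table: which team has won most often? The rows and team names are made-up placeholders, not real Super Bowl results, and `count_wins` is a hypothetical helper named only for illustration.

```python
from collections import Counter

# Placeholder rows (year, winner). These are invented for illustration,
# not real Super Bowl results; swap in a real dataset when you have one.
games = [
    (2001, "Team A"),
    (2002, "Team B"),
    (2003, "Team A"),
    (2004, "Team C"),
    (2005, "Team A"),
    (2006, "Team B"),
]


def count_wins(rows):
    """Return a Counter mapping each winner to its number of wins."""
    return Counter(winner for _year, winner in rows)


wins = count_wins(games)

# Question: which teams have won most often?
for team, total in wins.most_common():
    print(f"{team}: {total}")
```

Even a project this small follows the same shape as a larger one: load the data, state the question, and let the code produce the answer.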
As you're building projects, remember that:
- Most data science work is data cleaning.
- The most common machine learning technique is linear regression.
- Everyone starts somewhere. Even if you feel like what you're doing isn't impressive, it's still worth working on.
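To make the first two points concrete, here is a hedged sketch that cleans a handful of messy records and then fits a straight line to what remains. The records (hours studied and exam scores) are invented for illustration, `clean` and `fit_line` are hypothetical helper names, and the fit is plain ordinary least squares written out by hand rather than any particular library's implementation.

```python
# Hypothetical raw records: (hours_studied, exam_score) as messy strings.
raw_rows = [
    ("1", "52"),
    ("2", " 54 "),    # stray whitespace
    ("3", ""),        # missing score
    ("4", "58"),
    ("n/a", "61"),    # unparseable hours
    ("5", "60"),
]


def clean(rows):
    """Keep only rows where both fields parse as numbers."""
    cleaned = []
    for x_text, y_text in rows:
        try:
            cleaned.append((float(x_text), float(y_text)))
        except ValueError:
            continue  # skip rows with missing or malformed values
    return cleaned


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


points = clean(raw_rows)
slope, intercept = fit_line(points)
print(f"kept {len(points)} of {len(raw_rows)} rows")
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
```

Notice that more lines go to cleaning than to modeling, which is roughly how real projects tend to feel.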
Where to Find Project Ideas
Not only does building projects help you practice your skills and understand real data science work, it also helps you build a portfolio to show potential employers.
Here are some more detailed guides on building projects on your own:
Additionally, most of Dataquest’s courses contain interactive projects that you can complete while you’re learning. Here are just a few examples:
- Prison Break — Have some fun, and analyze a dataset of helicopter prison escapes using Python and Jupyter Notebook.
- Exploring Hacker News Posts — Work with a dataset of submissions to Hacker News, a popular technology site.
- Exploring eBay Car Sales Data — Use Python to work with a scraped dataset of used cars from eBay Kleinanzeigen, a classifieds section of the German eBay website.
- Star Wars Survey — Work with Jupyter Notebook to analyze data on the Star Wars movies.
- Analyzing NYC High School Data — Discover the SAT performance of different demographics using scatter plots and maps.
- Predicting the Weather Using Machine Learning — Learn how to prepare data for machine learning, work with time series data, measure error, and improve your model performance.
Add Project Complexity
After building a few small projects, it's time to kick it up a notch! Adding layers of complexity to your projects is how you learn more advanced topics. At this step, however, it's crucial to work in an area you're interested in.
My interest was the stock market, so all my advanced projects had to do with predictive modeling. As your skills grow, you can make the problem more complex by adding nuances like minute-by-minute prices and more accurate predictions. Check out this article on Python projects for more inspiration.
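One hedged way to add complexity gradually is to set a simple baseline first, so you can tell whether a fancier model is actually an improvement. The sketch below scores a naive "the next price equals the last price" forecast using mean absolute error. The prices are invented, and `naive_forecast` and `mean_absolute_error` are hypothetical helper names; this is not the model I built, just an illustration of the idea.

```python
def naive_forecast(prices):
    """Predict each next price as the previously observed price."""
    return prices[:-1]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented closing prices, for illustration only.
prices = [100.0, 102.0, 101.0, 105.0, 107.0]

predicted = naive_forecast(prices)  # forecasts for day 2 onward
actual = prices[1:]                 # what actually happened on those days

baseline_mae = mean_absolute_error(actual, predicted)
print(f"Baseline MAE: {baseline_mae:.2f}")
```

Any model you build afterward, whether on daily or minute-by-minute prices, can be compared against this number.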
Step 4: Share Your Work
Once you've built a few data science projects, share them with others on GitHub! Sharing your work helps in several ways:
- It makes you think about how to best present your projects, which is what you'd do in a data science role.
- It allows your peers to view your projects and provide feedback.
- It allows employers to view your projects.
Helpful resources about project portfolios:
- How To Present Your Data Science Portfolio on GitHub
- Data Science Portfolios That Will Get You the Job
Start a Simple Blog
Along with uploading your work to GitHub, you should also think about publishing a blog. When I was learning data science, writing blog posts helped me do the following:
- Capture interest from recruiters
- Learn concepts more thoroughly (the process of teaching really helps you learn)
- Connect with peers
Here are some good topics for blog posts:
- Explaining data science and programming concepts
- Discussing your projects and walking through your findings
- Discussing how you’re learning data science
Here’s an example of a visualization I made on my blog many years ago that shows how much each Simpsons character likes the others:
Step 5: Learn From Others
After you've started to build an online presence, it's a good idea to start engaging with other data scientists. You can do this in-person or in online communities. Here are some good online communities:
Here at Dataquest, we have an online community that learners can use to receive feedback on projects, discuss tough data-related problems, and build relationships with data professionals.
Personally, I was very active on Quora and Kaggle when I was learning, which helped me immensely. Engaging in online communities is a good way to do the following:
- Find other people to learn with
- Enhance your profile and find opportunities
- Strengthen your knowledge by learning from others
You can also engage with people in-person through Meetups. In-person engagement can help you meet and learn from more experienced data scientists in your area.
Step 6: Push Your Boundaries
What kind of data scientists do companies want to hire? The ones who find critical insights that save them money or make their customers happier. You have to apply the same process to learning: keep searching for new questions to answer, and keep answering harder and more complex questions.
If you look back on your projects from a month or two ago, and you don’t see room for improvement, you probably aren't pushing your boundaries enough. You should be making strong progress every month, and your work should reflect that.
Here are some ways to push your boundaries and learn data science faster:
- Try working with a larger dataset
- Start a data science project that requires knowledge you don't have
- Try making your project run faster
- Teach what you did in a project to someone else
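As one small illustration of making a project run faster, the sketch below times two versions of the same lookup: one scans a list for every item, the other converts the list to a set first. The data and the function names `count_known_slow` and `count_known_fast` are hypothetical, and the exact timings will differ from machine to machine.

```python
import timeit


def count_known_slow(items, known):
    """Count items that appear in `known`, scanning the list each time."""
    return sum(1 for item in items if item in known)


def count_known_fast(items, known):
    """Same count, but build a set first so each lookup is a hash lookup."""
    known_set = set(known)
    return sum(1 for item in items if item in known_set)


# Invented data: multiples of 3 checked against a list of even numbers.
items = list(range(0, 2000, 3))
known = list(range(0, 2000, 2))

slow_result = count_known_slow(items, known)
fast_result = count_known_fast(items, known)
assert slow_result == fast_result  # a speed-up must not change the answer

slow_time = timeit.timeit(lambda: count_known_slow(items, known), number=20)
fast_time = timeit.timeit(lambda: count_known_fast(items, known), number=20)
print(f"list lookup: {slow_time:.4f}s, set lookup: {fast_time:.4f}s")
```

Checking that both versions return the same result before comparing times is the habit worth keeping; a faster wrong answer isn't progress.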
You’ve Got This!
Studying to become a data scientist or data engineer isn't easy, but the key is to stay motivated and enjoy what you're doing. If you're consistently building projects and sharing them, you'll build your expertise and get the data scientist job that you want.
I haven't given you an exact roadmap to learning data science, but if you follow this process, you'll get farther than you imagined you could. Anyone can become a data scientist if they're motivated enough.
After years of being frustrated with how conventional sites taught data science, I created Dataquest, a better way to learn data science online. Dataquest solves the problems of MOOCs, where you never know what course to take next, and you're never motivated by what you're learning.
Dataquest leverages the lessons I've learned from helping thousands of people learn data science, and it focuses on making the learning experience engaging. At Dataquest, you'll build dozens of projects, and you’ll learn all the skills you need to be a successful data scientist. Dataquest students have been hired at companies like Accenture and SpaceX.
Good luck becoming a data scientist!
Becoming a Data Scientist — FAQs
What are the data scientist qualifications?
Data scientists need to have a strong command of the relevant technical skills, which will include programming in Python or R, writing queries in SQL, building and optimizing machine learning models, and often some "workflow" skills like Git and the command line.
Data scientists also need strong problem-solving, data visualization, and communication skills. Whereas a data analyst will often be given a question to answer, a data scientist is expected to explore the data and find relevant questions and business opportunities that others may have missed.
While it is possible to find work as a data scientist with no prior experience, it's not a common path. Normally, people will work as a data analyst or data engineer before transitioning into a data scientist role.
What are the education requirements for a data scientist?
Most data scientist roles will require at least a Bachelor's degree. Degrees in technical fields like computer science and statistics may be preferred, as may advanced degrees like Ph.D.s and Master’s degrees. However, advanced degrees are generally not strictly required (even when the job posting says they are).
What employers are concerned about most is your skill-set. Applicants with less advanced or less technically relevant degrees can offset this disadvantage with a great project portfolio that demonstrates their advanced skills and experience doing relevant data science work.
What skills are needed to become a data scientist?
Specific requirements can vary quite a bit from job to job, and as the industry matures, more specialized roles will emerge. In general, though, the following skills are necessary for virtually any data science role:
- Programming in Python or R
- Probability and statistics
- Building and optimizing machine learning models
- Data visualization
- Big data
- Data mining
- Data analysis
Every data scientist will need to know the basics, but one role might require some more in-depth experience with Natural Language Processing (NLP), whereas another might need you to build production-ready predictive algorithms.
Is it hard to become a data scientist?
Yes — you should expect to face challenges on your journey to becoming a data scientist. This role requires fairly advanced programming skills and statistical knowledge, in addition to strong communication skills.
Anyone can learn these skills, but you'll need motivation to push yourself through the tough moments.
Choosing the right platform and approach to learning can also help make the process easier.
How long does it take to become a data scientist?
The length of time it takes to become a data scientist varies from person to person. At Dataquest, most of our students report reaching their learning goals in one year or less. How long the learning process takes you will depend on how much time you're able to dedicate to it.
Similarly, the job search process can vary in length depending on the projects you've built, your other qualifications, your professional background, and more.
Is data science a good career choice?
Yes — a data science career is a fantastic choice. Demand for data scientists is high, and the world is generating a massive (and increasing) amount of data every day.
We don't claim to have a crystal ball or know what the future holds, but data science is a fast-growing field with high demand and lucrative salaries.
What is the data scientist career path?
The typical data scientist career path usually begins in another data role, such as data analyst or data engineer. From there, people move into data science roles via internal promotion or a job change.
From there, more experienced data scientists can look for senior data scientist roles.
Experienced data scientists with management skills can move into director of data science and similar director and executive-level roles.
What salaries do data scientists make?
Salaries vary widely based on location and the experience level of the applicant. On average, however, data scientists make very comfortable salaries. As of 2022, the average data scientist salary in the US was more than $120,000 USD per year.
And other data science roles also command high salaries:
Which certification is best for data science?
Many assume that a data science certification or completion of a data science bootcamp is something that hiring managers are looking for in qualified candidates, but this isn’t true.
Hiring managers are looking for a demonstration of the skills required for the job. And unfortunately, a data analytics or data science certificate isn’t the best showcase of your skills.
The reason for this is simple.
There are dozens of bootcamps and data science certification programs out there. Many places offer them — from startups to universities to learning platforms. Because there are so many, employers have no way of knowing which ones are the most rigorous.
While an employer may view a certificate as an example of an eagerness to continue learning, they won’t see it as a demonstration of skills or abilities. The best way to showcase your skills properly is with projects and a robust portfolio.