If you’re looking for your first job in data science, you’re probably already aware that having a portfolio of projects you can show employers is crucially important. If you’ve got no prior experience in the field, a portfolio is what shows them that you’re actually capable of doing the job.
And while the data science job market is white hot, competition for entry-level positions can be fierce. Edouard Harris, co-founder of data science mentorship firm SharpestMinds, told us that there’s a simple mistake lots of people make with their portfolio when they’re applying for that first data science job.
The Problem with Your Data Science Portfolio
“The problem is that folks, if left to their own devices, will not necessarily work on the right sorts of things,” Edouard said.
“One of the most common things that happens is students will come to us and say, ‘For my project, I would like to do a stock market prediction thing.’ Our response: ‘That's cool. But the problem is that before you, approximately 10 jillion people have had the same idea and done the same thing.’”
“Worse is that there are dozens and dozens of [stock market prediction] tutorials online that already exist, with datasets and recommendations for analysis and all that stuff.” So while putting together a stock market prediction project might be great practice, it probably won’t accomplish what you want it to in a portfolio or job application.
“The point of a portfolio,” Edouard said, “is being able to prove that you did work in a way that can be easily verified. If you choose to show off as a piece in your portfolio something that is commonly done and has existing tutorials out there already, it is very difficult for me as a hiring manager to evaluate whether you have actually done a bunch of work or whether you've simply followed along with a generic tutorial.”
The most common tutorials and data sets are well-known among data science students, Edouard says, and they’re well-known among hiring managers, too. When a recruiter sees similar projects over and over again, they’re going to assume that everyone is following the same tutorial. Since a portfolio is meant to showcase what you can do, filling it with common tutorial projects has the opposite of the intended effect: it leads hiring managers to believe that you can’t do anything beyond following step-by-step directions.
(Incidentally, this is one reason why Dataquest’s guided projects are open-ended and don’t include much hand-holding. We want you to do your own work and take the projects in your own unique directions so that they reflect what you’ve actually learned, not just what you’ve been told to type into a Jupyter Notebook.)
Data Science Project Tips
If you’re looking to impress prospective employers, one tip Edouard offered is to build something that’s web-accessible and visual. “The optimal situation is: you're at a meetup, you're in discussions with someone, you cleverly steer the conversation in the direction of this cool thing that you built. Then you can take out your phone and be like: ‘Check this out.’”
“That sends a really good signal,” Edouard said. In addition to showing that you know how to work with data and present it visually, “it sends a signal that you [know] enough to set up a server. That's a non-trivial amount of work. To make the interface at least pretty enough that a human can kind of use it. These are really valuable things.”
Making your project visual and available on the web means it’s portable and accessible to anyone. Remember, the people hiring data scientists aren’t necessarily data scientists or even programmers themselves. Great code on your GitHub probably isn’t going to impress a non-technical hiring manager or CEO you meet at a conference, but if you’ve got a simple web interface that allows them to view and maybe interact with some data analysis you’ve done, that’s easy for anyone to understand.
For example, consider this webpage that showcases an analysis of train timings in the Netherlands. Even someone who knows nothing about data science can quickly look at this and understand something about which cities in the Netherlands are the most connected. And while your project may not be this visually attractive (nor does it need to be, although great design never hurts), having something like this that’s web-accessible and visually comprehensible can be a huge asset. Plus, it’s much, much easier to show off at a meetup or conference than a big wall of text and code in a Jupyter Notebook.
Another way to set yourself apart is to use an original dataset. Employers have seen the datasets used for Kaggle competitions in dozens of other projects already, and they’ve probably seen projects using a lot of the other popular public datasets, too. Original data will stand out, and it shows you have a valuable skill: the ability to collect data.
If you’re not sure where to find data, web scraping or finding some APIs in an industry that’s relevant to the jobs you’re applying for is a good place to start (we have a hands-on course that covers APIs and web scraping, by the way).
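To make that a little more concrete, here is a minimal sketch of pulling records from a JSON API using only Python’s standard library. The URL, the "results" key, and the field names are all placeholders invented for illustration; a real API will have its own structure, so check its documentation before adapting this.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """Download a URL and decode its body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def to_rows(payload, fields, list_key="results"):
    """Flatten a JSON payload into plain rows, keeping only `fields`.

    Assumes (hypothetically) that the records live in a list under
    `list_key`. Fields missing from a record become None.
    """
    return [
        {name: item.get(name) for name in fields}
        for item in payload.get(list_key, [])
    ]


# Hypothetical usage; example.com is a placeholder, not a real data source:
# payload = fetch_json("https://example.com/api/postings")
# rows = to_rows(payload, ["title", "company"])
```

Keeping the download and the flattening in separate functions means you can test the flattening step on a saved sample response without hitting the network every time.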
Working with your own original data also shows that you’re capable of data cleaning. It’s not the sexiest skill, but data cleaning is an important part of every data science job, and if you only work with squeaky-clean datasets provided by third parties, hiring managers may wonder if you’re capable of working with “dirtier” data you need to clean yourself.
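As a rough illustration of what that cleaning can involve, the sketch below tidies a few rows of the kind a scraper might produce: stray whitespace, inconsistent capitalization, duplicates, and missing or non-numeric values. The column names and every value in the sample are made up for this example, and the rules it applies (drop rows without a name, keep missing delays as None) are just one reasonable set of choices.

```python
import csv
import io

# Invented sample data standing in for freshly scraped rows.
RAW_CSV = """city,departures,avg_delay_min
 northtown ,120,3.5
Northtown,120,3.5
Eastport,n/a,4.1
Westfield,95,
,80,2.0
"""


def clean_rows(raw_text):
    """Normalize names, drop unusable rows, and remove duplicates."""
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(raw_text)):
        city = (row["city"] or "").strip().title()
        if not city:
            continue  # a row without a city name cannot be used
        try:
            departures = int(row["departures"])
        except (TypeError, ValueError):
            continue  # drop non-numeric counts such as "n/a"
        delay_text = (row["avg_delay_min"] or "").strip()
        delay = float(delay_text) if delay_text else None  # keep gaps explicit
        key = (city, departures, delay)
        if key in seen:
            continue  # skip rows identical to one already kept
        seen.add(key)
        cleaned.append(
            {"city": city, "departures": departures, "avg_delay_min": delay}
        )
    return cleaned


rows = clean_rows(RAW_CSV)
```

Writing down each cleaning decision as a comment like this also gives you something concrete to talk through when a hiring manager asks how you handled messy data.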
If you’re really excited by one particular job, you can even try to find some company-specific data to analyze for a project. Public data sources like financial statements are available for many companies, or you could also consider mining a company’s social media mentions and doing some sentiment analysis. Even if you don’t end up getting a job with that company, your project is likely to stand out to others in the same industry, since it demonstrates that you already know how to do industry-relevant data analysis work.
If you don’t want to go to that much trouble, there are enough public data sources in the world that you can find all sorts of interesting datasets to work with. Before you start, do a bit of Googling to confirm that your chosen dataset wasn’t used for high-profile tutorial articles, Kaggle competitions, or anywhere else that’s likely to mean hiring managers will have seen it before. You don’t need to be the first person to have ever touched the dataset, but you don’t want the hiring manager to look at your resume and think ‘This again…’ because they’ve seen it so many times before.
Above all, just remember what the point of a portfolio actually is: your portfolio shows the hiring manager that you’re capable of doing the job you’re applying for. That means that whatever it contains, your portfolio needs to be your own work, and it needs to demonstrate the specific skills that hiring managers for your position want to see. Particularly if you’re looking for your first job in the industry, your portfolio is the only “experience” you have, so the more unique and relevant you can make it in the eyes of hiring managers, the more likely you are to hear back about an interview.