How to Create a Project Portfolio for Data Science Job Applications
Want a job in the data field? Start building your data science portfolio as early as possible! For entry-level positions, a strong project portfolio is where you can truly shine and stand out from other applicants.
If you lack previous experience in the data science field, your portfolio of projects will likely determine whether you get that crucial interview callback. Moreover, these projects often play a pivotal role during the interview process itself.
What is a Portfolio in Data Science?
In the data science world, professionals use terms like "projects" and "portfolio" in different ways. Some envision a portfolio as a carefully designed package of projects, such as a custom-built website.
For our purposes, we'll define a data science portfolio as:
The collection of projects you're showcasing in your job application, regardless of their presentation format.
We'll discuss presentation methods later in this article, but first, let's explore why building a data science portfolio is crucial for your career.
(This article is part of our in-depth Data Science Career Guide. To read the other articles, please refer to the table of contents or the links at the end of this post.)
Why Data Science Projects Are Crucial for Your Portfolio
Employers won't pay you to do something you've never done before. This fundamental rule applies to all industries, including data science. It's logical: would you dine at a restaurant where the chef had never cooked before? Or board a plane with a pilot who'd never flown? Probably not.
Whether you're transitioning into data science from full-time study, another career, or seeking a different data science role, you need to demonstrate experience. Even for entry-level positions, you must show you can do the job before anyone will pay you to do it.
How to Prove Your Skills Without Professional Experience
Most entry-level applicants have little to no professional data science experience. So how can you prove you've got the necessary skills? The answer lies in building a strong data science portfolio with impressive projects.
A well-crafted portfolio of data science projects serves as a substitute for work experience in your job applications. It demonstrates to potential employers that you can indeed perform the data science work you're applying for.
The Impact of Data Science Portfolio Projects Throughout the Hiring Process
Projects are arguably the most crucial part of your application, as they appear at every stage of the hiring process:
- Mentioned on your resume
- Linked in your application
- Discussed during job interviews
Most recruiters we interviewed said they review projects and portfolios when screening candidates and discuss them during interviews. For instance, Michael Hupp, manager of data science and analytics at G2 Crowd, describes their interview process:
"We always ask them about their projects. We'll try and quiz them on the technical skills, but we also want to make sure they're able to talk about the project, and the results, in an understandable way."
During interviews, you might be asked to:
- Explain statistical choices in a data analysis project
- Walk a hiring manager through your code
- Discuss experiences working on group projects
- Describe challenges faced while completing a project
Recruiters use projects to assess various aspects of a candidate, including technical abilities and passion for the subject matter. Without prior professional experience, your data science portfolio projects will be your primary evidence of competence throughout the hiring process. Therefore, it's crucial to create high-quality, relevant projects that showcase your skills effectively.
Key Elements Your Portfolio Must Demonstrate
The specific skills your data science portfolio needs to showcase depend on the job you're targeting. For instance:
- Aspiring data analysts in marketing should highlight marketing-related analytical skills
- Prospective machine learning engineers should present impressive machine learning projects
Regardless of the role you're pursuing, remember this crucial mantra: your portfolio should prove you can do the work.
Essential Skills to Showcase
Demonstrating your ability to do the work goes beyond just proving technical proficiency. For most data-related positions, your portfolio should exhibit:
- Effective communication skills
- Collaboration abilities
- Technical competence
- Data reasoning capabilities
- Initiative and motivation
The Importance of Uniqueness
The word "you" in "prove you can do the work" matters. Your data science portfolio projects should be unique and original.
Edouard Harris, co-founder of SharpestMinds, explains:
"The point of a portfolio, and in large part the guiding principle of the entire application process, is being able to prove that you did work in a way that can be easily verified."
He further advises against showcasing commonly done projects with existing tutorials:
"If you choose to show off something that is commonly done and has existing tutorials out there already, it is very difficult for me as a hiring manager to evaluate whether you have actually done a bunch of work and thought, or whether you've simply followed along with a generic tutorial."
Therefore, when you build a data science portfolio, focus on creating original projects that demonstrate your unique skills and problem-solving abilities. This approach will help you stand out and prove your capabilities to potential employers.
Essential Projects to Include in Your Data Science Portfolio
A strong data science portfolio typically consists of 3-5 projects that highlight your job-relevant skills. The key is to demonstrate that you can perform the work required in your target role. The more closely your portfolio reflects the day-to-day tasks of the jobs you're applying for, the more convincing it will be.
Refael "Rafi" Zikavashvili, CEO and co-founder of Pramp, advises:
"Don't pick just random projects to work on and add it to your resume or portfolio. Solve a problem that relates to the companies that you're interested in."
Three Key Factors to Consider When Choosing Projects
When selecting projects for your data science portfolio, consider these three crucial factors:
1. Types of Tasks
Consider the specific tasks you'll need to perform in your target job. Ask yourself:
- Will you be doing extensive data cleaning?
- Is machine learning a key component?
- How important is data visualization?
- Will you be working on natural language processing?
- Are you expected to build dashboards and analytics tools?
Ensure your portfolio projects incorporate these relevant tasks.
2. Subject Areas
Align your projects with the industry or field you're targeting:
- For marketing positions, focus on projects that answer marketing-related questions
- If aiming for mobile app development, demonstrate your ability to extract product insights from app data
Showcasing knowledge or interest in subjects relevant to your target jobs can make your application stand out.
3. Types of Data Sets
Different industries often work with specific types of data. Demonstrate your experience with relevant data sets:
- If your target job involves time series data, include projects showcasing time series analysis skills
- For roles in text analytics, include projects working with unstructured text data
By working with data sets similar to those you'd encounter on the job, you prove your readiness for the role.
Remember, the goal of your data science portfolio is to demonstrate your ability to solve real-world problems relevant to your target industry. By carefully selecting projects that align with these three factors, you'll create a compelling portfolio that showcases your skills and increases your chances of landing your desired data science role.
Essential Projects for Your Portfolio
While tailoring your portfolio to specific job applications yields the best results, entry-level applicants often need to cast a wider net. Many entry-level positions require similar skills across industries. To create a strong foundation for your data science portfolio, include at least one project from each of these categories:
1. Data Cleaning Project
Data preparation, munging, or cleaning is often estimated to take up 60-80% of the work in most data science jobs, making it a crucial skill to demonstrate. Your data cleaning project should:
- Use a messy, uncleaned dataset
- Pose interesting analytical questions
- Clean the data and perform basic analysis to answer those questions
To increase difficulty and showcase additional skills:
- Collect your own data using APIs or web scraping
- Work with unstructured data instead of messy-but-structured datasets
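To make the cleaning step concrete, here is a minimal sketch using only Python's standard library. The data, column names, and the list of "missing" markers are all invented for illustration; a real project would more likely load a file and might use pandas instead.

```python
import csv
import io
from statistics import mean

# Hypothetical messy export: stray whitespace, inconsistent casing,
# several spellings of "missing", a thousands separator, and a duplicate row.
RAW = """city, price ,listed
 Boston ,"1,200",2021-03-01
boston,1200,2021-03-01
CHICAGO,N/A,2021-03-04
Denver, 950 ,2021-03-07
Chicago,,2021-03-09
"""

# Assumed spellings of "no value" for this made-up data set.
MISSING = {"", "n/a", "na", "null", "-"}


def clean_rows(text):
    """Return de-duplicated rows with normalized names and numeric prices."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    seen, cleaned = set(), []
    for raw in reader:
        # Header names carry stray spaces too, so strip keys and values.
        row = {key.strip(): (value or "").strip() for key, value in raw.items()}
        city = row["city"].title()
        price_text = row["price"].replace(",", "")
        price = None if price_text.lower() in MISSING else float(price_text)
        record = (city, price, row["listed"])
        if record not in seen:  # drop exact duplicates after normalization
            seen.add(record)
            cleaned.append({"city": city, "price": price, "listed": row["listed"]})
    return cleaned


rows = clean_rows(RAW)
known_prices = [r["price"] for r in rows if r["price"] is not None]
print(f"{len(rows)} rows kept, {len(rows) - len(known_prices)} missing a price")
print(f"mean listed price: {mean(known_prices):.2f}")
```

In a portfolio notebook, each of these decisions (what counts as missing, when two rows are duplicates) deserves a sentence of explanation, since that reasoning is what a reviewer is looking for.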
2. Data Storytelling and Visualization Project
Effective data storytelling and visualization are key components of any data science role. This project should:
- Take readers on an analytical journey
- Present conclusions understandable to non-technical audiences
- Showcase your data visualization and communication skills
Consider presenting your project in a Jupyter Notebook or R Markdown. To add complexity:
- Customize chart designs
- Include interactive elements
3. Group Project
Collaboration is crucial in data science. A group project demonstrates your ability to:
- Communicate effectively in a team setting
- Divide labor fairly
- Collaborate genuinely
- Manage projects using Git and GitHub
To make your portfolio stand out, consider contributing to open-source projects. Alina Chistyakova, Lead IT Recruiter at Spice IT Recruitment, says that "successful commits to well-known open-source projects" are one of the things that make a data science portfolio stand out to her.
Jeff Hall, HR Director at Kitware, adds:
"What really puts a plus in the column of candidates that apply here is having contributed to our specific open-source projects."
By including these three types of projects in your data science portfolio, you'll demonstrate a wide range of skills essential for entry-level positions across various industries. Remember to tailor your projects to your specific interests and target roles whenever possible to create a truly compelling portfolio.
Advanced Project Ideas to Enhance Your Portfolio
While the core projects mentioned earlier form a solid foundation, consider adding these advanced project types to make your data science portfolio stand out:
1. End-to-End System Building Project
Many data science roles involve building systems that can analyze data sets continuously as they arrive, rather than focusing on a single, static data set. This type of project demonstrates your ability to:
- Create systems that perform consistent analysis on new data inputs
- Develop user-friendly systems that others can easily understand and operate
A basic version of this project could include:
- Well-commented code that analyzes a public, regularly updated data set
- A comprehensive README file explaining how to use the system
- Command-line functionality for easy execution by other coders
To increase the complexity and showcase advanced skills, consider:
- Building interactive web dashboards
- Developing systems that handle real-time or streaming data
The primary goal is to demonstrate your ability to create reusable, understandable analytical systems that other programmers can easily work with and modify.
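As a rough illustration of the basic version described above, the sketch below wraps a small analysis in a command-line interface with `argparse`. The function names, the `--column` option, and the sample file are hypothetical choices for this example, not a prescribed layout.

```python
import argparse
import csv
import os
import statistics
import sys
import tempfile


def summarize(path, column):
    """Return count, mean, and median of one numeric column in a CSV file."""
    values = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            try:
                values.append(float(row[column]))
            except (KeyError, TypeError, ValueError):
                continue  # skip rows where the column is absent or non-numeric
    if not values:
        raise ValueError(f"no numeric values found in column {column!r}")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def main(argv=None):
    """Parse command-line arguments, print the summary, return an exit code."""
    parser = argparse.ArgumentParser(
        description="Summarize one numeric column of a CSV file."
    )
    parser.add_argument("csv_path", help="path to the latest data export")
    parser.add_argument("--column", default="value", help="column to summarize")
    args = parser.parse_args(argv)
    try:
        summary = summarize(args.csv_path, args.column)
    except (OSError, ValueError) as error:
        print(f"error: {error}", file=sys.stderr)
        return 1
    for name, number in summary.items():
        print(f"{name}: {number:g}")
    return 0


# Demo: write a tiny sample file and call main() the way a shell would.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "readings.csv")
    with open(sample, "w", newline="", encoding="utf-8") as handle:
        handle.write("day,value\nmon,10\ntue,14\nwed,not recorded\nthu,18\n")
    exit_code = main([sample, "--column", "value"])
```

In a real repository you would likely drop the demo and end the script with the usual `if __name__ == "__main__": sys.exit(main())` guard, so that others can rerun the analysis on each new data export with a single command. The README can then document that command.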
2. Explanatory Blog Post, Article, or Talk
Effective communication of complex technical concepts is a crucial skill for data scientists. Adding an explanatory piece to your portfolio can showcase your ability to break down intricate ideas for a broader audience. When creating this type of content:
- Choose a suitably complex topic that you thoroughly understand
- Explain the concept in simple, accessible terms
- Tailor the content to your target industry
For example, you could write a blog post detailing the inner workings of a machine learning algorithm commonly used in your target industry. This approach not only demonstrates your technical knowledge but also your ability to communicate effectively – a highly valued skill in the data science field.
By incorporating these advanced projects into your data science portfolio, you'll demonstrate a broader range of skills and stand out to potential employers. Remember to balance complexity with clarity, ensuring that your projects showcase both your technical prowess and your ability to communicate data-driven insights effectively.
How to Present Your Data Science Portfolio Projects
After deciding on the projects to include in your data science portfolio, the next crucial step is determining how to present them effectively. There are two primary approaches to showcasing your work:
- GitHub
- Personal portfolio websites
Our research, based on conversations with numerous recruiters, overwhelmingly suggests that having an active GitHub account is essential. If you're aiming for broad appeal in the data science job market, GitHub should be your starting point.
While some recruiters appreciate well-crafted project presentations on personal portfolio sites, others prefer to focus solely on a candidate's GitHub profile. Given this mixed feedback, we recommend prioritizing your GitHub presence.
Leveraging GitHub
If you're new to GitHub, start by familiarizing yourself with the basics.
When setting up your GitHub account, choose a professional username, as this will be visible to potential employers.
Best Practices for Presenting Projects on GitHub
- Use Jupyter Notebook or R Markdown: These formats are ideal for showcasing most data science projects.
- Keep code blocks concise: Intersperse them with clear, concise explanations of what the code does and why.
- Utilize text formatting: Use headings, subheadings, bold, italics, and code snippets to enhance readability.
- Follow coding best practices: Adhere to naming conventions, preferred styles, and maintain clean, efficient code.
- Add helpful comments: Include comments that make your code easy to understand at a glance.
Remember, good commenting practices not only make your code more accessible but also demonstrate your communication and teamwork skills – crucial attributes in collaborative data science environments.
Advanced GitHub Tips
To ensure your GitHub projects are professional and accessible, consider these important points:
Code and Data Management
- Use relative file paths for data access to ensure your code runs on any machine
- Include package and version information for all external libraries used
- Protect sensitive information like API keys (see our guide on how to share your portfolio safely)
- Verify you have the legal right to redistribute any included data
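The first three points can be sketched in a few lines of standard-library Python. The `data/listings.csv` path, the `EXAMPLE_API_KEY` variable name, and the package list are placeholders invented for this example.

```python
import os
import sys
from importlib import metadata
from pathlib import Path

# Resolve data relative to the project rather than to one machine's home
# directory. The "data/listings.csv" layout is a hypothetical example.
try:
    PROJECT_ROOT = Path(__file__).resolve().parent
except NameError:  # __file__ is undefined inside a notebook
    PROJECT_ROOT = Path.cwd()
DATA_FILE = PROJECT_ROOT / "data" / "listings.csv"


def get_api_key(name="EXAMPLE_API_KEY"):
    """Read a secret from the environment so it never lands in the repo."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first.")
    return key


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for package in packages:
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            versions[package] = None
    return versions


print("Python", sys.version.split()[0])
print("Data file:", DATA_FILE.relative_to(PROJECT_ROOT))
print(installed_versions(["pandas", "matplotlib"]))
```

Printing versions at the top of a notebook is one lightweight option; a pinned `requirements.txt` or environment file in the repository serves the same purpose and is easier for others to install from.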
Creating an Effective README File
Each project should have a README file in Markdown format that includes:
- A brief project overview
- Project goals and objectives
- Techniques and technologies used
- Summary of conclusions
- Installation and usage instructions
Maintaining an Active and Clean GitHub Profile
Remember that your entire GitHub profile is visible to potential employers. To make a good impression:
- Keep your account active with regular contributions
- Remove or archive old, abandoned projects
- Continuously improve and iterate on existing projects
Regularly updating your projects demonstrates your ongoing engagement and interest in data science work, making you a more attractive candidate to potential employers.
Maximizing GitHub Visibility
To ensure employers can easily find your GitHub profile:
- Include a clickable GitHub link on your resume (as mentioned in our resume guide)
- Add your GitHub URL to all professional social media profiles (LinkedIn, Twitter, etc.)
- Include your GitHub link in all job application forms
By following these guidelines, you'll create a compelling and professional GitHub presence that showcases your data science skills effectively. Remember, your GitHub profile is often the first in-depth look employers get at your work, so make sure it represents your abilities accurately and impressively.
Next-Level Portfolio Tip: Dedicated Project Sites
While GitHub is essential for showcasing your data science projects, creating dedicated portfolio websites can take your presentation to the next level. Although not all hiring managers will explore these sites, those who do may find them particularly impressive.
The Benefits of Visual and Interactive Data Science Projects
Edouard Harris from SharpestMinds emphasizes the importance of visual elements in your portfolio:
"In general, you want something that's visual. Ideally something that you have running on a server somewhere."
Harris highlights the advantages of having an accessible, interactive project:
"The optimal situation is: you're at a meetup [talking with someone in the industry], you cleverly steer the conversation in the direction of this cool thing that you built. Then you can take out your phone and be like: check this out. Play with it. It's right here."
Showcasing Advanced Skills Through Web-Based Projects
Creating web-based visual or interactive data projects demonstrates valuable skills beyond basic data analysis:
- Server setup and management
- User interface design
- Web development
These additional skills can significantly enhance your appeal to potential employers in the data science field.
Balancing Effort and Impact
While building dedicated project sites requires more time and effort than uploading notebooks to GitHub, the long-term benefits can be substantial, especially when networking at industry events. (Learn more about networking in data science)
Consider the difference in impact:
- Scrolling through GitHub code on a small screen at a conference (challenging and less engaging)
- Presenting a clear, visual, interactive data project (memorable and impressive)
Inspiration for Projects
For inspiration, check out these examples of visually striking and interactive data projects:
Note: These examples are highly polished and not expected from entry-level applicants. However, they illustrate the potential impact of visual and interactive projects compared to traditional GitHub repositories.
By creating dedicated project sites alongside your GitHub portfolio, you'll demonstrate a broader range of skills and create more memorable impressions during networking events and job interviews. This approach can set you apart in the competitive field of data science.
How to Build a Data Science Portfolio
Now that you understand the importance of a project portfolio, what to include, and how to present it, it's time to tackle the challenging part: building a data science portfolio.
While your chosen projects will depend on your interests and target job roles, here are some resources to help you get started:
- Dataquest's data science courses with open-ended guided projects
- Our Python Projects for Beginners article (most ideas are adaptable to R programming)
These resources can serve as a foundation for more complex projects. For example, you could complete a guided project and then apply similar analysis to a new dataset for your portfolio.
Finding Unique Data Sources
Choosing the right data is crucial for your project. To stand out, avoid overused datasets from popular platforms like Kaggle. The most popular Kaggle data sets have been used in hundreds of projects, and employers are likely tired of seeing them. Instead, consider these less common data sources:
- Data Portals: 551+ open data portals worldwide
- Data.gov: US government data on various topics
- AWS Open Data: Diverse datasets from web-crawling to satellite monitoring
- Data.world: A GitHub-like platform for datasets
- r/datasets: A subreddit for sharing and requesting datasets
- AcademicTorrents: Datasets from scientific research and publications
Creating Unique Datasets
To ensure your project is truly unique, consider creating your own dataset. Here are some methods:
1. Web Scraping and API Access
Learn these techniques with Dataquest's resources:
- APIs and Web Scraping course
- Web scraping with BeautifulSoup tutorial
- Using APIs tutorial
- Real-time Twitter data analysis tutorial
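To give a sense of what scraping involves, the sketch below turns an HTML table into a list of records. It uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra installs, and it parses an inline sample string instead of fetching a live page. The table contents are made up.

```python
from html.parser import HTMLParser

# Stand-in for a downloaded page; in a real project this string would come
# from an HTTP request to a site whose terms permit scraping.
SAMPLE_HTML = """
<table id="races">
  <tr><th>Race</th><th>Distance (km)</th></tr>
  <tr><td>Spring Classic</td><td>10</td></tr>
  <tr><td>Harbor Half</td><td>21.1</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the open cell.
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append([cell.strip() for cell in self._row])
            self._row = None


parser = TableParser()
parser.feed(SAMPLE_HTML)
parser.close()

# Treat the first row as the header and pair it with each data row.
header, *records = parser.rows
dataset = [dict(zip(header, record)) for record in records]
print(dataset)
```

Real pages are messier than this sample, which is why dedicated libraries are popular for the job. Whatever tool you use, check the site's terms of use before collecting data from it.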
2. Conducting Surveys or Manual Data Collection
While time-intensive, these methods can yield unique datasets.
3. Personal Data Analysis
Analyze your own data for unique insights.
While using personal data can be risky, it can lead to interesting projects with broader appeal if approached thoughtfully.
By utilizing these resources and creating unique datasets, you can develop standout projects for your data science portfolio that demonstrate your skills and creativity to potential employers.
Enhancing Your Projects with Compelling Visualizations
To make your data science portfolio stand out, focus on creating unique and professional visualizations. Upgrading your charts and graphs from the default look can significantly impact how recruiters perceive your work.
Improving Data Visualizations with Code and Design Principles
- Learn to customize charts programmatically: Tutorial on creating FiveThirtyEight-style charts in Python
- Apply fundamental design concepts: Basic design principles for data visualization
Essential Data Visualization Resources
- Data Viz Project's Chart Gallery: Explore diverse chart types to find the best fit for your data
- Hubspot's Data Visualization Design Guide: Tips for designing specific chart types
- Dataquest's Guide to Color in Data Visualization: In-depth advice on color selection
- Geckoboard's Data Visualization Tips: Printable poster for quick reference
Finding Inspiration for Your Data Science Portfolio Projects
To elevate your projects from good to great, explore these sources of inspiration:
- FiveThirtyEight: Leaders in data journalism, covering politics and sports
  - Access their published datasets to reverse-engineer their analyses
- Information is Beautiful Awards: Annual awards for outstanding data-based projects
- r/DataIsBeautiful: Subreddit featuring amateur and professional data visualizations
  - Share your own projects for community feedback
- Kaggle Competitions: Explore completed data science projects with full code access
- Data Science Teams at Major News Outlets: Check "data" sections of national newspapers for interesting projects and datasets
By utilizing these design resources and drawing inspiration from exemplary projects, you can create a data science portfolio that not only demonstrates your technical skills but also showcases your ability to communicate insights effectively through compelling visualizations.
Explore Our Complete Data Science Career Guide
This article is part of our comprehensive Data Science Career Guide. Continue your journey with these related articles:
- Introduction and Table of Contents
- Before You Apply: Considering Your Options
- How and Where to Find Data Science Jobs
- How to Write a Data Science Resume
- How to Create a Data Science Project Portfolio (you are here)
- How to Fill in Application Forms, When to Apply, and Other Considerations
- Preparing for Job Interviews in Data Science
- Assessing and Negotiating Job Offers