Last Updated: February 11, 2026

32 Best Free Datasets for Projects (2026)

Whether you're building your first data science project or adding to your existing portfolio of data analysis projects, finding high-quality, free data is not an easy task. The right dataset can make the difference between a project that showcases your skills and one that gets lost in data cleaning headaches.

We've curated spectacular free datasets for:

Data visualization projects
Machine learning projects
Data processing projects
Data cleaning projects
Data analytics projects
Government and demographic analysis
Academic and research projects
Personal data analysis

Plus, we've included powerful search tools to help you find exactly what you need.

How to Choose Quality Datasets

Before getting into our list, here's what makes a great dataset for data projects:

Clean and well-documented data saves you time. Look for datasets with clear column headers, data dictionaries, and minimal missing values. A messy dataset can be good practice for data cleaning, but it shouldn't be your first project.
Appropriate size and complexity matter. Start with datasets that have enough rows to be interesting (typically 1,000+ records) but won't overwhelm your system. As you build confidence, scale up to larger datasets.
Interesting questions drive engagement. The best datasets let you explore multiple angles and tell compelling stories. Look for data that makes you curious.
Reliable sources ensure accuracy. Government agencies, academic institutions, and established organizations typically provide trustworthy data. Community-contributed datasets can be excellent, but verify their credibility first.
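Before committing to a dataset, a quick profile can check two of the criteria above: enough rows, and how many values are missing in each column. The sketch below uses only Python's standard library. The set of strings treated as "missing" is an assumption, since every dataset encodes gaps differently.

```python
import csv

# Placeholder strings treated as missing; an assumption, since datasets differ.
MISSING_MARKERS = {"", "NA", "N/A", "null"}

def profile_csv(lines, min_rows=1000):
    """Summarize row count and missing values per column for CSV input.

    `lines` can be an open text file or any iterable of CSV lines.
    """
    reader = csv.DictReader(lines)
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for column, value in row.items():
            if column is None:
                continue  # extra fields beyond the header row
            # Short rows yield None; otherwise compare the stripped cell to the markers.
            if value is None or value.strip() in MISSING_MARKERS:
                missing[column] += 1
    return {"rows": rows, "columns": columns, "missing": missing,
            "big_enough": rows >= min_rows}
```

Pass it a file opened with `newline=""` (as the `csv` module recommends), then scan the `missing` counts for columns you'd have to drop or impute.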

Free Datasets Inside Structured Projects

If you'd rather work with datasets that are already packaged into complete project ideas, Dataquest offers several free, hands-on projects built around real-world data. You can download the datasets and use them in your own data science projects.

Exploring Hacker News Posts (Python): Analyze real Hacker News data to uncover trends in posts, comments, and engagement.
Analyzing Kickstarter Projects (SQL): Use real Kickstarter campaign data to identify factors that influence project success.
A ML Analysis of Stack Overflow Survey Data (Python): Build and evaluate machine learning models using real survey data from developers.
Predicting Condominium Sale Prices (R): Use real NYC housing data to build a linear regression model for price prediction.
Identifying Profitable App Profiles for the App Store (Python): Explore real mobile app store data to identify characteristics of profitable apps.
Predicting Listing Gains in the Indian IPO Market (Python): Apply deep learning techniques to real stock market data to predict IPO performance.

Each project includes a downloadable dataset and a clear analytical objective, making them a strong starting point if you're building your portfolio.

View all free projects with downloadable datasets

Public Datasets for Data Visualization Projects

For data visualization projects, you want clean data that tells a story. These sources provide well-maintained datasets perfect for creating compelling charts and dashboards.

1. FiveThirtyEight

FiveThirtyEight is a data journalism site that publishes datasets behind their stories on politics, sports, and culture. Their data is exceptionally clean and comes with context from published articles.

View FiveThirtyEight Datasets

Examples include airline safety data, historical US weather data, and usage patterns of study drugs. Each dataset connects to an article, giving you a model for how to present findings.

2. Our World in Data

Our World in Data provides research and data on global challenges like poverty, disease, and climate change. Their datasets come with ready-made visualizations you can study and improve upon.

View Our World in Data

This is a great resource for country-level comparisons and understanding how to visualize trends over time. The site covers literacy rates, economic progress, health outcomes, and more.

3. NASA Earth and Space Data

NASA maintains extensive free public datasets on both Earth science and space exploration. You can filter by format to find CSV datasets ready for analysis.

View NASA Earth Data
View NASA Space Data

The data ranges from satellite imagery to climate measurements, offering unique opportunities for scientific visualization projects.

4. Tableau Public Datasets

Tableau Public curates datasets specifically designed for data visualization practice. These cover health, social impact, climate, and government topics.

View Tableau Public Datasets

While Tableau Public is a visualization platform, their datasets work with any analytics tool and are particularly well-suited for creating professional dashboards.

Public Datasets for Machine Learning Projects

Machine learning projects require datasets with clear target variables and sufficient features for prediction. These repositories are curated explicitly for ML work and tend to have larger datasets.

5. Kaggle

Kaggle hosts datasets contributed by its data science community, plus competition datasets. With hundreds of thousands of datasets available, it's one of the largest repositories for machine learning projects.

View Kaggle Datasets

Popular examples include the Titanic survival dataset, house price predictions, and satellite image classification. Each dataset has a usability score and community discussions to help you get started. Building a strong Kaggle profile can even help during job interviews for data scientist roles.

6. UCI Machine Learning Repository

The UCI Machine Learning Repository is one of the oldest and most respected sources for ML datasets. Although the datasets are user-contributed, the vast majority are clean and well-documented.

View UCI Machine Learning Repository

Find datasets on email spam classification, wine characteristics, and solar flares. These tend to be smaller datasets perfect for learning core machine learning algorithms before scaling to larger projects.

7. OpenML

OpenML is a collaborative platform where you can share, explore, and compare machine learning experiments on thousands of datasets covering image classification, natural language processing, and social sciences.

View OpenML Datasets

The community aspect lets you benchmark your machine learning model performance against others and replicate experiments to learn from successful approaches.

8. TensorFlow Datasets

TensorFlow provides specialized datasets optimized for deep learning and artificial intelligence projects, including image, text, and audio data ready for model training.

View TensorFlow Datasets

Notable collections include CelebA (200,000+ celebrity images) and the Common Crawl corpus (multi-language web data spanning seven years). These are ideal for practicing neural networks and modern ML techniques.

Public Datasets for Data Processing Projects

Large-scale data processing projects need substantial, interesting datasets. Cloud providers host these specifically to encourage using their platforms, but you can download and process them locally too.

9. AWS Public Datasets

Amazon Web Services hosts massive datasets available for download or cloud processing. You'll need an AWS account, but the free tier lets you explore without charges.

View AWS Public Datasets

Examples include Google Books n-grams (common word patterns from millions of books), Common Crawl (5+ billion web pages), and Landsat satellite imagery. These are perfect for learning distributed computing with tools like Spark.

10. Google Cloud Public Datasets

Google Cloud Platform offers large datasets accessible through BigQuery. The first 1 TB of query processing each month is free, making it practical for learning SQL and working with big data.

View Google Public Datasets

Notable datasets include USA Names (Social Security applications from 1879-2015), GitHub activity (2.8 million public repositories), and historical weather data from 9,000 NOAA stations. These demonstrate real-world data scale.
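Because BigQuery's free tier is measured in bytes processed, it helps to estimate a query's cost before running it. The sketch below does the budget arithmetic and, optionally, asks BigQuery for a dry-run estimate through the `google-cloud-bigquery` client. Two assumptions to verify: that the tier is 1 TiB (2**40 bytes) per month, and that the USA Names table is named `bigquery-public-data.usa_names.usa_1910_current`.

```python
# Assumption: the free tier is 1 TiB (2**40 bytes) of query data processed per
# month. Check Google's current BigQuery pricing page before relying on it.
FREE_TIER_BYTES = 2 ** 40

def free_tier_share(bytes_processed, free_tier_bytes=FREE_TIER_BYTES):
    """Fraction of the monthly free tier a single query would use."""
    return bytes_processed / free_tier_bytes

def queries_per_month(bytes_processed, free_tier_bytes=FREE_TIER_BYTES):
    """How many identical queries fit inside the free tier."""
    return free_tier_bytes // bytes_processed

def dry_run_bytes(sql):
    """Ask BigQuery how many bytes `sql` would scan, without running it.

    Needs the google-cloud-bigquery package and configured credentials.
    """
    from google.cloud import bigquery  # optional dependency, imported lazily

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed

# Example query; the table name is an assumption, so confirm it in the console.
NAMES_SQL = """
SELECT name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
ORDER BY total DESC
LIMIT 10
"""
```

With credentials set up, `queries_per_month(dry_run_bytes(NAMES_SQL))` tells you how many times you could rerun that query in a month before paying anything. Selecting fewer columns is the main lever, because BigQuery bills by the columns a query scans.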

11. Wikipedia Datasets

Wikipedia offers complete dumps of article content, edit history, and metadata. This gives you massive text datasets for natural language processing and analyzing how information evolves.

View Wikipedia Datasets

The breadth of topics makes Wikipedia data valuable for text analysis, information retrieval, and understanding collaborative content creation at scale.
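Wikipedia dumps are far too large to load into memory at once, so the usual approach is to stream the XML. This sketch uses the standard library's `ElementTree.iterparse` to yield page titles one at a time and clears each finished page as it goes. It strips XML namespaces rather than hard-coding the dump's schema version, which varies between dumps.

```python
import io
import xml.etree.ElementTree as ET

def local_name(tag):
    """Drop the '{namespace}' prefix that MediaWiki export XML puts on every tag."""
    return tag.rsplit("}", 1)[-1]

def iter_page_titles(source):
    """Yield page titles from a MediaWiki XML dump without loading the whole file.

    `source` is a filename or a binary file object. For a compressed
    *.xml.bz2 dump, pass bz2.open(path, "rb") from the standard library.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        for child in elem:
            if local_name(child.tag) == "title":
                yield child.text
                break
        # Drop the finished page's children so memory use stays roughly flat.
        elem.clear()

# Tiny inline sample in the dump's shape, for a quick check.
sample = io.BytesIO(
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    b"<page><title>Alpha</title><revision><text>...</text></revision></page>"
    b"<page><title>Beta</title></page>"
    b"</mediawiki>"
)
print(list(iter_page_titles(sample)))  # ['Alpha', 'Beta']
```

The same loop can pull out `<text>` or `<timestamp>` elements instead, which is the starting point for NLP or edit-history analysis.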

Public Datasets for Data Cleaning Projects

Data cleaning projects benefit from real-world messiness. These aggregators can help you find a public dataset that requires research, cleaning, and thoughtful preprocessing.

12. Data.gov

Data.gov is the US government's open data platform with over 290,000 datasets from federal agencies. The data ranges from government budgets to school performance, often requiring significant cleaning and domain research.

View Data.gov Datasets

Examples include the Food Environment Atlas, school system finances, and chronic disease indicators. This government data represents real public sector information with all its complexity.

13. data.world

data.world functions as a social network for data people, where you can search, copy, analyze, and collaborate on datasets. It combines user-contributed data with federal government data provided through partnerships.

View data.world Datasets

A key differentiator is the ability to write SQL queries directly in their interface to explore and join multiple datasets before downloading. The free Community plan provides access to thousands of projects.
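If you want to practice the same kind of join locally after downloading, SQLite in Python's standard library is enough. This is not data.world's own query engine. It is a sketch with hypothetical tables (`schools` and `finances` keyed by a made-up `district_id`) that shows why a `LEFT JOIN` is useful when one source has gaps.

```python
import sqlite3

def join_demo(schools, finances):
    """Join two small tables on a shared key in an in-memory SQLite database.

    `schools` rows are (district_id, name); `finances` rows are
    (district_id, spending_per_pupil). Both schemas are hypothetical.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE schools (district_id TEXT PRIMARY KEY, name TEXT)")
        con.execute("CREATE TABLE finances (district_id TEXT, spending_per_pupil REAL)")
        con.executemany("INSERT INTO schools VALUES (?, ?)", schools)
        con.executemany("INSERT INTO finances VALUES (?, ?)", finances)
        # LEFT JOIN keeps districts with no finance record (spending comes back as None).
        return con.execute(
            """
            SELECT s.name, f.spending_per_pupil
            FROM schools AS s
            LEFT JOIN finances AS f ON f.district_id = s.district_id
            ORDER BY s.name
            """
        ).fetchall()
    finally:
        con.close()
```

An inner join would silently drop districts missing from the second table. The `LEFT JOIN` keeps them visible as `None`, so you can decide how to handle them.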

14. The World Bank Open Data

The World Bank funds development programs globally and releases extensive data monitoring these initiatives. Datasets include World Development Indicators, educational statistics, and project costs.

View World Bank Datasets

The data often has missing values and requires multiple clicks to access, making it realistic practice for working with international development data and understanding data quality issues.
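The World Bank also exposes its indicators through a public JSON API, which skips the clicking. As far as I know, the response is a two-element array of paging metadata followed by records whose `value` is `null` for years with no observation. The sketch below fetches one country's series and separates observed years from missing ones. `SP.POP.TOTL` (total population) is just an example indicator code. For `country="all"`, the results span several pages, which this sketch does not follow.

```python
import json
from urllib.request import urlopen

API = ("https://api.worldbank.org/v2/country/{country}/indicator/{indicator}"
       "?format=json&per_page=1000")

def fetch_indicator(country, indicator):
    """Download one indicator (e.g. "SP.POP.TOTL") for one country code."""
    with urlopen(API.format(country=country, indicator=indicator)) as resp:
        return json.load(resp)

def split_missing(payload):
    """Separate observed years from missing ones in a World Bank API response.

    The payload is [metadata, records]; each record has a "date" string and a
    "value" that is None when no observation exists for that year.
    """
    records = payload[1] if len(payload) > 1 and payload[1] else []
    observed = {r["date"]: r["value"] for r in records if r["value"] is not None}
    missing = sorted(r["date"] for r in records if r["value"] is None)
    return observed, missing
```

Calling `split_missing(fetch_indicator("USA", "SP.POP.TOTL"))` gives you both the usable series and an explicit list of gaps to report in your write-up.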

15. /r/datasets

The datasets subreddit is where Reddit's community shares interesting, unusual datasets. The scope varies widely since submissions are user-driven, but you'll find unique data you won't see elsewhere.

View /r/datasets

Notable examples include complete Reddit submission history, Jeopardy questions, and NYC property tax data. Sort by top posts of all time to find the most valuable contributions.

Free Datasets for Data Analytics Projects

Business analysts and data analytics professionals need datasets that support operational insights and business intelligence work.

16. Quandl (Nasdaq Data Link)

Quandl, now Nasdaq Data Link, specializes in financial and economic datasets. It offers both free and premium data covering real estate, economic indicators, and financial markets.

View Nasdaq Data Link

The platform is particularly valuable for time series analysis and financial modeling. Data is available in multiple formats and can be accessed via API for automated workflows.

17. Pew Research Center

Pew Research conducts extensive surveys on politics, social issues, and media. They release datasets publicly for secondary analysis after an embargo period.

View Pew Research Datasets

Topics include US politics, journalism and media, internet and tech, and religion. These datasets are excellent for understanding survey methodology and social science research.
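A key piece of survey methodology is weighting. Survey datasets like Pew's typically ship a weight variable so that the sample better reflects the population, and raw counts can mislead. This sketch computes a weighted share for one answer. The column names (`question`, `answer`, `weight`) are placeholders for whatever the dataset's codebook specifies. Blank answers are excluded here, and how to treat codes like "Refused" is a choice the codebook should guide.

```python
def weighted_share(rows, question, answer, weight="weight"):
    """Weighted share of respondents who gave `answer` to `question`.

    `rows` is an iterable of dicts, one per respondent. Blank answers are
    excluded from the denominator. Column names are placeholders.
    """
    num = den = 0.0
    for row in rows:
        value = row.get(question)
        if value in (None, ""):
            continue
        w = float(row[weight])
        den += w
        if value == answer:
            num += w
    return num / den if den else float("nan")
```

With two respondents answering "Yes" (weight 2) and "No" (weight 1), the unweighted share is 50% but the weighted share is two-thirds. That gap is exactly what survey weights are designed to capture.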

18. Bureau of Labor Statistics

The BLS provides economic data including unemployment rates, inflation, wages, and productivity. Most data can be filtered by time and geography.

View BLS Data

This is essential data for economic analysis and understanding labor market trends. The datasets are regularly updated, providing opportunities for time series analysis and forecasting.
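For time series work, the BLS Public Data API returns series as JSON, which saves you from downloading spreadsheets by hand. This sketch posts a request to the v2 endpoint and reshapes one series into chronological `(YYYY-MM, value)` pairs. `LNS14000000` is the BLS series ID for the seasonally adjusted unemployment rate. Request limits with and without a registration key are set by BLS, so check its API documentation before scripting many calls.

```python
import json
from urllib.request import Request, urlopen

BLS_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"

def fetch_series(series_ids, start_year, end_year, api_key=None):
    """POST a request for one or more series to the BLS Public Data API."""
    body = {"seriesid": list(series_ids),
            "startyear": str(start_year), "endyear": str(end_year)}
    if api_key:
        body["registrationkey"] = api_key
    req = Request(BLS_URL, data=json.dumps(body).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)

def _to_float(text):
    try:
        return float(text)
    except ValueError:
        return None  # placeholder for unavailable observations

def monthly_values(response, series_id):
    """Return [(YYYY-MM, value), ...] oldest first for one series.

    BLS lists the newest observations first and uses M01..M12 for months;
    M13 (an annual average in some series) is skipped here.
    """
    for series in response["Results"]["series"]:
        if series["seriesID"] != series_id:
            continue
        points = []
        for d in series["data"]:
            period, value = d["period"], _to_float(d["value"])
            if period.startswith("M") and period != "M13" and value is not None:
                points.append((f'{d["year"]}-{period[1:]}', value))
        return sorted(points)
    return []
```

`monthly_values(fetch_series(["LNS14000000"], 2020, 2024), "LNS14000000")` yields a clean monthly series you can plot or feed into a forecasting model.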

Government and Census Data

Government agencies provide some of the most reliable open data available. These sources are particularly strong for demographic, economic, and public health research.

19. US Census Bureau

The Census Bureau offers demographic data at state, city, and zip code levels. This data is exceptionally clean and comprehensive, ideal for geographic data visualizations.

View Census Data

The data is also accessible via API, and R packages like choroplethr make it easy to create maps and visualizations of population trends, income, education, and housing.
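If you work in Python rather than R, the Census Data API is easy to call directly. It returns a JSON array whose first row is the header, so turning it into records is a one-liner. In the sketch below, the dataset path `dec/pl` and the variable `P1_001N` (total population in the 2020 redistricting data) are examples. Variable names differ across datasets, so check each dataset's variable list.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def census_url(year, dataset, variables, geography):
    """Build a Census Data API query URL, e.g. dataset="dec/pl", geography="state:*"."""
    # Keep ',', ':' and '*' readable instead of percent-encoding them.
    query = urlencode({"get": ",".join(variables), "for": geography}, safe=",:*")
    return f"https://api.census.gov/data/{year}/{dataset}?{query}"

def rows_to_dicts(table):
    """Zip the header row onto the data rows (values arrive as strings)."""
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]

def fetch(url):
    with urlopen(url) as resp:
        return rows_to_dicts(json.load(resp))
```

For example, `fetch(census_url(2020, "dec/pl", ["NAME", "P1_001N"], "state:*"))` returns one dict per state. Convert the numeric fields with `int()` before mapping them.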

20. UK Data Service

The UK Data Service provides access to thousands of datasets on British society, covering topics from crime and education to transportation and health.

View UK Data Service

This is valuable for international comparisons and understanding how different countries structure their open data platforms. Many datasets are longitudinal, tracking changes over decades.

21. National Centers for Environmental Information

The NCEI (formerly the National Climatic Data Center) provides extensive climate data and weather records. This is crucial for anyone working on climate change analysis or environmental data science.

View NCEI Data

The datasets span historical weather patterns, severe weather events, and long-term climate trends, offering rich opportunities for time series analysis and climate modeling.

Academic and Research Datasets

Academic repositories provide peer-reviewed research data with detailed documentation. These are excellent for learning proper data citation and understanding research methodology.

22. Harvard Dataverse

Harvard Dataverse is an open data repository managed by Harvard's Institute for Quantitative Social Science. It contains over 75,000 datasets spanning 2,000+ databases across all research disciplines.

View Harvard Dataverse

This is a great resource for finding research data with proper documentation and understanding how academic researchers structure and share their data. All datasets include citation information and often link to published papers.

23. Stanford Large Network Dataset Collection

Stanford maintains a collection of network datasets including social networks, communication patterns, web graphs, and citation networks. This is essential for anyone learning graph analysis and network science.

View Stanford Network Data

The datasets are particularly valuable for learning about real-world network structures and practicing graph algorithms with actual social and information networks.
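Most files in the Stanford collection are plain edge lists: one edge per line as two node IDs, with header comments starting with `#`. A natural first step is a degree count, which already reveals the heavy-tailed structure of real networks. The sketch uses only the standard library. Some files list each undirected edge once and others list it in both directions, so read the file's header before interpreting the numbers.

```python
from collections import Counter

def degree_counts(lines, directed=False):
    """Count node degrees from an edge list (two whitespace-separated IDs per line).

    Lines starting with '#' are skipped. For directed graphs this counts
    out-degree; otherwise each edge adds 1 to both endpoints.
    """
    degree = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        degree[src] += 1
        if not directed:
            degree[dst] += 1
    return degree
```

For a gzipped download, open it in text mode with `gzip.open(path, "rt")` and call `degree_counts(f).most_common(10)` to see the best-connected nodes.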

24. World Health Organization (Global Health Observatory)

The WHO maintains comprehensive global health data through the Global Health Observatory, including COVID-19 pandemic data plus datasets on antimicrobial resistance, dementia, air pollution, and immunization.

View WHO Data

With over 1,000 health indicators, this is invaluable for anyone working in public health analytics or studying global health trends across countries and time periods.

25. Humanitarian Data Exchange

The Humanitarian Data Exchange is managed by the UN Office for the Coordination of Humanitarian Affairs. It provides open data on humanitarian crises, conflict zones, and disaster response.

View Humanitarian Data Exchange

This data source is unique in covering emergency response and humanitarian needs, offering perspective on how data supports critical decision-making in crisis situations.

26. Academic Torrents

Academic Torrents hosts datasets from scientific papers, making research data accessible. The site contains everything from the famous Enron email corpus to student learning factors and news article datasets.

View Academic Torrents

You'll need a BitTorrent client to download, but the datasets are often large and comprehensive. This is particularly good for finding datasets used in published research papers.

Personal Data Sources

Want something truly unique? Analyze your own data. These platforms let you download records of your personal activity and spending.

27. Amazon Purchase History

Amazon lets you download your complete order history, spending data, and browsing activity. This makes for an interesting personal data science project analyzing your own consumer behavior.

Access Amazon Reports

Sign into your Amazon account, navigate to Amazon Privacy Central and request your data. After Amazon processes your request (which can take a few hours to a few days), you'll receive an email with a download link. You can analyze spending patterns, category preferences, and purchasing seasonality.
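Once the export arrives, spending by month is a good first analysis. This sketch assumes an order-history CSV with an order-date column in ISO 8601 form and an amount column that may include `$` and thousands separators. The default column names (`Order Date`, `Total Owed`) are guesses, so open your file and pass the real headers.

```python
import csv
from collections import defaultdict
from datetime import datetime

def monthly_spending(lines, date_col="Order Date", amount_col="Total Owed"):
    """Sum order amounts by calendar month from an order-history CSV.

    Column names are assumptions; pass the ones in your export. The first
    10 characters of each date are parsed as YYYY-MM-DD.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        raw_amount = row[amount_col].replace("$", "").replace(",", "").strip()
        if not raw_amount:
            continue  # skip rows with no amount
        day = datetime.strptime(row[date_col][:10], "%Y-%m-%d")
        totals[day.strftime("%Y-%m")] += float(raw_amount)
    return dict(sorted(totals.items()))
```

The result is ordered by month, so it plots directly as a time series. Grouping by a category column instead of the month is a one-line change.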

28. Facebook Personal Data

Facebook provides tools to download your complete activity data, including posts, messages, photos, and engagement metrics.

Download Facebook Data

Select the data types you want and Facebook will compile them for download. This offers insight into social media usage patterns and personal digital footprint.

29. Netflix Viewing History

Netflix allows you to request your viewing data, though the process takes up to 30 days and the data provided is somewhat limited compared to other platforms.

Request Netflix Data

While more restricted, this data can still support projects analyzing personal entertainment preferences and viewing habits over time.

Powerful Dataset Search Tools

Can't find what you need? These search tools aggregate datasets from across the web.

30. Google Dataset Search

Google Dataset Search indexes over 25 million datasets from publishers worldwide. It's like Google Search specifically for finding data.

Use Google Dataset Search

The search offers filters to narrow results by download format, usage rights, topic, and last-updated date. This should be your first stop when looking for data on a specific topic.

31. GitHub Repositories

GitHub hosts numerous dataset collections, including the popular "Awesome Public Datasets" repository. While primarily a code platform, many projects share their data here so you can easily find a public dataset to work with.

View Awesome Public Datasets

You can also access GitHub's own data through their API to analyze repository activity, code evolution, and open source development patterns.
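The GitHub REST API returns a JSON object for each public repository, which is enough to start comparing projects. This sketch fetches one repository without authentication and keeps a few activity fields. Unauthenticated requests are rate-limited, so for anything beyond a handful of calls you'd add a personal access token in an `Authorization` header.

```python
import json
from urllib.request import Request, urlopen

FIELDS = ("full_name", "stargazers_count", "forks_count", "open_issues_count", "pushed_at")

def summarize(data):
    """Keep a few activity fields from a repository object (None if absent)."""
    return {key: data.get(key) for key in FIELDS}

def repo_summary(owner, repo):
    """Fetch one public repository from the GitHub REST API and summarize it."""
    req = Request(
        f"https://api.github.com/repos/{owner}/{repo}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urlopen(req) as resp:
        return summarize(json.load(resp))
```

For example, `repo_summary("awesomedata", "awesome-public-datasets")` summarizes the Awesome Public Datasets repository. Looping over a list of repositories gives you a small dataset of your own.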

32. Microsoft Azure Open Datasets

Microsoft Azure provides curated open datasets optimized for machine learning, including weather data, satellite imagery, and public domain datasets integrated with Azure services.

View Azure Open Datasets

While designed for use with Azure, these datasets can be downloaded and used with any analytics platform. The collection focuses on commonly used benchmark datasets for ML.

Building Your Data Science Portfolio

Now that you have access to quality datasets, it's time to build projects that showcase your skills. The key is choosing datasets that let you demonstrate specific competencies employers value.

For beginners, start with clean datasets from sources like FiveThirtyEight or UCI. Focus on completing end-to-end projects rather than getting lost in data cleaning.
For intermediate practitioners, tackle datasets from Kaggle competitions or Data.gov that require more preprocessing. Document your cleaning process to show real-world data wrangling skills.
For advanced work, use large-scale datasets from AWS or Google Cloud to demonstrate distributed computing skills, or combine multiple sources to show data integration capabilities.

All of our hands-on data science courses include guided projects using real, high-quality datasets designed to accelerate learning and build your portfolio. These projects walk you through the complete analysis process, from data exploration to presenting insights.

Next Steps

Ready to start your next data project? Pick a dataset that excites you, formulate interesting questions, and start exploring. The best portfolio projects demonstrate both technical skills and genuine curiosity about the data.

Remember: employers want to see your thinking process, not just polished results. Document your approach, explain your decisions, and share what you learned. That's what makes a portfolio irresistible.

Explore more in our 'Build a Data Science Portfolio' series:

Storytelling with Data
Building a Data Science Blog for Your Portfolio
Building a Machine Learning Project
Data Science Portfolios That Will Get You the Job
32 Best Free Datasets for Projects
How to Present Your Portfolio on GitHub
