32 Best Free Datasets for Projects (2025)
Whether you're building your first data science project or adding to your existing portfolio of data analysis projects, finding high-quality, free data is not an easy task. The right dataset can make the difference between a project that showcases your skills and one that gets lost in data cleaning headaches.
We've curated spectacular free datasets for:
- Data visualization projects
- Machine learning projects
- Data processing projects
- Data cleaning projects
- Data analytics projects
- Government and demographic analysis
- Academic and research projects
- Personal data analysis
Plus, we've included powerful search tools to help you find exactly what you need.
How to Choose Quality Datasets
Before getting into our list, here's what makes a great dataset for data projects:
- Clean and well-documented data saves you time. Look for datasets with clear column headers, data dictionaries, and minimal missing values. A messy dataset is good practice for data cleaning, but it can stall a first project before you ever reach the analysis itself.
- Appropriate size and complexity matter. Start with datasets that have enough rows to be interesting (typically 1,000+ records) but won't overwhelm your system. As you build confidence, scale up to larger datasets.
- Interesting questions drive engagement. The best datasets let you explore multiple angles and tell compelling stories. Look for data that makes you curious.
- Reliable sources ensure accuracy. Government agencies, academic institutions, and established organizations typically provide trustworthy data. Community-contributed datasets can be excellent, but verify their credibility first.
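The size and missing-value checks above can be run before you commit to a dataset. The following is a minimal sketch using only Python's standard library; the sample data and the `profile_csv` helper are hypothetical, and a real file would be opened from disk instead of read from a string.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file.
SAMPLE_CSV = """city,population,median_income
Springfield,116250,48000
Shelbyville,,51000
Ogdenville,32000,
Capital City,540000,62000
"""

def profile_csv(text):
    """Return the row count and the number of empty cells per column."""
    reader = csv.DictReader(io.StringIO(text))
    missing = {name: 0 for name in reader.fieldnames}
    rows = 0
    for row in reader:
        rows += 1
        for name in reader.fieldnames:
            value = row[name]
            if value is None or value.strip() == "":
                missing[name] += 1
    return rows, missing

rows, missing = profile_csv(SAMPLE_CSV)
print(f"{rows} rows")
for name, count in missing.items():
    print(f"{name}: {count} missing ({count / rows:.0%})")
```

A column that is mostly empty, or a file with only a few dozen rows, is usually a sign to keep looking.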
If you prefer a bit more hand-holding while learning, consider guided projects that combine quality datasets with step-by-step project ideas. These walk you through the complete analysis process with datasets that have already been vetted, so you can focus on building skills rather than worrying about data quality. Many are free to access!
Public Datasets for Data Visualization Projects
For data visualization projects, you want clean data that tells a story. These sources provide well-maintained datasets perfect for creating compelling charts and dashboards.
1. FiveThirtyEight
FiveThirtyEight is a data journalism site that publishes datasets behind their stories on politics, sports, and culture. Their data is exceptionally clean and comes with context from published articles.
Examples include airline safety data, historical US weather data, and data on study drug use. Each dataset connects to an article, giving you a model for how to present findings.
2. Our World in Data
Our World in Data provides research and data on global challenges like poverty, disease, and climate change. Their datasets come with ready-made visualizations you can study and improve upon.
This is a great resource for country-level comparisons and understanding how to visualize trends over time. The site covers literacy rates, economic progress, health outcomes, and more.
3. NASA Earth and Space Data
NASA maintains extensive free public datasets on both Earth science and space exploration. You can filter by format to find CSV datasets ready for analysis.
The data ranges from satellite imagery to climate measurements, offering unique opportunities for scientific visualization projects.
4. Tableau Public Datasets
Tableau Public curates datasets specifically designed for data visualization practice. These cover health, social impact, climate, and government topics.
While Tableau Public is a visualization platform, their datasets work with any analytics tool and are particularly well-suited for creating professional dashboards.
Public Datasets for Machine Learning Projects
Machine learning projects require datasets with clear target variables and sufficient features for prediction. These repositories are curated explicitly for ML work and tend to have larger datasets.
5. Kaggle
Kaggle hosts datasets contributed by its data science community, plus competition datasets. With hundreds of thousands of datasets available, it's one of the largest repositories for machine learning projects.
Popular examples include the Titanic survival dataset, house price predictions, and satellite image classification. Each dataset has a usability score and community discussions to help you get started. Building a strong Kaggle profile can even help during job interviews for data scientist roles.
6. UCI Machine Learning Repository
The UCI Machine Learning Repository is one of the oldest and most respected sources for ML datasets. While user-contributed, the vast majority are clean and well-documented.
Find datasets on email spam classification, wine characteristics, and solar flares. These tend to be smaller datasets perfect for learning core machine learning algorithms before scaling to larger projects.
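As an illustration of the kind of core algorithm a small tabular dataset is suited to, here is a minimal k-nearest-neighbors sketch in plain Python. The training points and labels are made up for the example and are not taken from any UCI dataset; in practice you would load features and labels from the downloaded file.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _features, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature training set with two well-separated classes.
train = [
    ((1.0, 1.1), "A"), ((1.2, 0.9), "A"), ((0.8, 1.0), "A"),
    ((5.0, 5.2), "B"), ((5.1, 4.9), "B"), ((4.8, 5.0), "B"),
]

print(knn_predict(train, (1.0, 1.0)))
print(knn_predict(train, (5.0, 5.0)))
```

Writing the algorithm by hand once makes it easier to understand what a library implementation is doing when you move to larger data.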
7. OpenML
OpenML is a collaborative platform where you can share, explore, and compare machine learning experiments on thousands of datasets covering image classification, natural language processing, and social sciences.
The community aspect lets you benchmark your machine learning model performance against others and replicate experiments to learn from successful approaches.
8. TensorFlow Datasets
TensorFlow provides specialized datasets optimized for deep learning and artificial intelligence projects, including image, text, and audio data ready for model training.
Notable collections include CelebA (200,000+ celebrity images) and the Common Crawl corpus (multi-language web data spanning seven years). These are ideal for practicing neural networks and modern ML techniques.
Public Datasets for Data Processing Projects
Large-scale data processing projects need substantial, interesting datasets. Cloud providers host these specifically to encourage using their platforms, but you can download and process them locally too.
9. AWS Public Datasets
Amazon Web Services hosts massive datasets available for download or cloud processing. You'll need an AWS account, but the free tier lets you explore without charges.
Examples include Google Books n-grams (common word patterns from millions of books), Common Crawl (5+ billion web pages), and Landsat satellite imagery. These are perfect for learning distributed computing with tools like Spark.
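To make "n-grams" concrete: an n-gram is a run of n consecutive words, and an n-gram dataset records how often each run occurs. The sketch below counts them for a single sentence; the `ngram_counts` helper and the sample text are illustrative assumptions, and the tokenization is deliberately simple.

```python
import re
from collections import Counter

def ngram_counts(text, n=2):
    """Count n-grams (tuples of n consecutive words) in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Zip the word list against shifted copies of itself to form n-grams.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(grams)

sample = "the cat sat on the mat and the cat slept"
counts = ngram_counts(sample, n=2)
print(counts.most_common(1))
```

The same counting logic is what a distributed job would apply to each chunk of a large corpus before merging the partial counts.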
10. Google Cloud Public Datasets
Google Cloud Platform offers large datasets accessible through BigQuery. The first 1 TB of query data processed each month is free, making it practical for learning SQL and working with big data.
Notable datasets include USA Names (Social Security applications from 1879-2015), GitHub activity (2.8 million public repositories), and historical weather data from 9,000 NOAA stations. These demonstrate real-world data scale.
11. Wikipedia Datasets
Wikipedia offers complete dumps of article content, edit history, and metadata. This gives you massive text datasets for natural language processing and analyzing how information evolves.
The breadth of topics makes Wikipedia data valuable for text analysis, information retrieval, and understanding collaborative content creation at scale.
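Because full dumps are far too large to load into memory at once, a common approach is to stream the XML one page at a time. The following is a rough sketch of that idea using the standard library; the tiny sample document, the namespace version in it, and the `iter_pages` helper are assumptions made for illustration, so check the structure of the dump you actually download.

```python
import io
import xml.etree.ElementTree as ET

# Tiny hypothetical stand-in for a dump file. Real dumps are far larger
# and declare an XML namespace, which local_name() strips below.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Alpha</title>
    <revision><text>First article body.</text></revision>
  </page>
  <page>
    <title>Beta</title>
    <revision><text>Second, slightly longer article body.</text></revision>
  </page>
</mediawiki>"""

def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]

def iter_pages(stream):
    """Yield (title, text) for each <page> element as it finishes parsing."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # release the page's children once processed

pages = list(iter_pages(io.BytesIO(SAMPLE_DUMP)))
for title, text in pages:
    print(title, len(text.split()))
```

For a real dump you would pass an open (and usually decompressed) file object instead of the in-memory sample.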
Public Datasets for Data Cleaning Projects
Data cleaning projects benefit from real-world messiness. These aggregators can help you find a public dataset that requires research, cleaning, and thoughtful preprocessing.
12. Data.gov
Data.gov is the US government's open data platform with over 290,000 datasets from federal agencies. The data ranges from government budgets to school performance, often requiring significant cleaning and domain research.
Examples include the Food Environment Atlas, school system finances, and chronic disease indicators. This government data represents real public sector information with all its complexity.
13. data.world
data.world functions as a social network for data people, where you can search, copy, analyze, and collaborate on datasets. It combines user-contributed data with partnerships providing federal government data.
A key differentiator is the ability to write SQL queries directly in their interface to explore and join multiple datasets before downloading. The free Community plan provides access to thousands of projects.
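If you have not joined datasets with SQL before, the pattern looks roughly like the sketch below. It uses Python's built-in `sqlite3` module rather than the data.world interface, and the two tables and their columns are hypothetical; the point is only the shape of a join followed by an aggregate.

```python
import sqlite3

# In-memory database with two small hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schools (district_id INTEGER, school TEXT, enrollment INTEGER);
    CREATE TABLE budgets (district_id INTEGER, district TEXT, budget_usd INTEGER);
    INSERT INTO schools VALUES (1, 'North High', 1200), (1, 'East Middle', 800),
                               (2, 'Lakeside High', 500);
    INSERT INTO budgets VALUES (1, 'Metro', 30000000), (2, 'Lakeside', 6000000);
""")

# Join schools to their district budget, then compute spending per student.
query = """
    SELECT b.district,
           SUM(s.enrollment) AS students,
           b.budget_usd / SUM(s.enrollment) AS usd_per_student
    FROM schools AS s
    JOIN budgets AS b ON b.district_id = s.district_id
    GROUP BY b.district, b.budget_usd
    ORDER BY b.district
"""
results = conn.execute(query).fetchall()
for district, students, usd_per_student in results:
    print(district, students, usd_per_student)
```

Exploring a join like this before downloading tells you early whether the two datasets share a usable key.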
14. The World Bank Open Data
The World Bank funds development programs globally and releases extensive data monitoring these initiatives. Datasets include World Development Indicators, educational statistics, and project costs.
The data often has missing values and requires multiple clicks to access, making it realistic practice for working with international development data and understanding data quality issues.
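One typical cleaning step for this kind of data is reshaping a wide table (one column per year) into long records while marking missing values explicitly. The sketch below assumes a simplified layout with year-named columns and treats both empty cells and `..` as missing; the sample rows, country codes, and the `wide_to_long` helper are hypothetical, so adjust them to the file you download.

```python
import csv
import io

# Hypothetical wide-format sample: one row per country, one column per year.
SAMPLE = """Country Name,Country Code,2019,2020,2021
Atlantis,ATL,71.2,..,72.0
Borduria,BOR,,64.5,65.1
"""

# Assumption: these markers mean "no value reported".
MISSING = {"", ".."}

def wide_to_long(text):
    """Return (country_code, year, value) records; value is None if missing."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, raw in row.items():
            if not column.isdigit():
                continue  # skip the name and code columns
            value = None if raw.strip() in MISSING else float(raw)
            records.append((row["Country Code"], int(column), value))
    return records

records = wide_to_long(SAMPLE)
gaps = [(code, year) for code, year, value in records if value is None]
print(len(records), "records,", len(gaps), "missing:", gaps)
```

Keeping missing values as explicit `None` entries, rather than dropping them, lets you decide later whether to interpolate, exclude, or report them.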
15. /r/datasets
The datasets subreddit is where Reddit's community shares interesting, unusual datasets. The scope varies widely since submissions are user-driven, but you'll find unique data you won't see elsewhere.
Notable examples include complete Reddit submission history, Jeopardy questions, and NYC property tax data. Sort by top posts of all time to find the most valuable contributions.
Free Datasets for Data Analytics Projects
Business analysts and data analytics professionals need datasets that support operational insights and business intelligence work.
16. Quandl (Nasdaq Data Link)
Quandl, now Nasdaq Data Link, specializes in financial and economic datasets. It offers both free and premium data covering real estate, economic indicators, and financial markets.
The platform is particularly valuable for time series analysis and financial modeling. Data is available in multiple formats and can be accessed via API for automated workflows.
17. Pew Research Center
Pew Research conducts extensive surveys on politics, social issues, and media. They release datasets publicly for secondary analysis after an embargo period.
Topics include US politics, journalism and media, internet and tech, and religion. These datasets are excellent for understanding survey methodology and social science research.
18. Bureau of Labor Statistics
The BLS provides economic data including unemployment rates, inflation, wages, and productivity. Most data can be filtered by time and geography.
This is essential data for economic analysis and understanding labor market trends. The datasets are regularly updated, providing opportunities for time series analysis and forecasting.
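A common first calculation on a monthly economic series is the year-over-year percent change, which is how inflation figures are usually quoted. Here is a minimal sketch; the index values are invented for the example and are not BLS data, and `year_over_year` is a hypothetical helper.

```python
def year_over_year(series):
    """Percent change of each month versus the same month one year earlier.

    `series` maps (year, month) to an index value. Months without a
    prior-year value are skipped.
    """
    changes = {}
    for (year, month), value in sorted(series.items()):
        previous = series.get((year - 1, month))
        if previous:
            changes[(year, month)] = round((value / previous - 1) * 100, 2)
    return changes

# Made-up index values for illustration only.
index = {
    (2022, 1): 200.0, (2022, 2): 202.0,
    (2023, 1): 210.0, (2023, 2): 208.06,
}
changes = year_over_year(index)
for (year, month), pct in changes.items():
    print(f"{year}-{month:02d}: {pct:+.2f}%")
```

Comparing against the same month of the previous year sidesteps seasonal swings that would distort a month-over-month comparison.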
Government and Census Data
Government agencies provide some of the most reliable open data available. These sources are particularly strong for demographic, economic, and public health research.
19. US Census Bureau
The Census Bureau offers demographic data at state, city, and zip code levels. This data is exceptionally clean and comprehensive, ideal for geographic data visualizations.
The data is also accessible via API, and R packages like choroplethr make it easy to create maps and visualizations of population trends, income, education, and housing.
20. UK Data Service
The UK Data Service provides access to thousands of datasets on British society, covering topics from crime and education to transportation and health.
This is valuable for international comparisons and understanding how different countries structure their open data platforms. Many datasets are longitudinal, tracking changes over decades.
21. National Centers for Environmental Information
The NCEI (formerly the National Climatic Data Center) provides extensive climate data and weather records. This is crucial for anyone working on climate change analysis or environmental data science.
The datasets span historical weather patterns, severe weather events, and long-term climate trends, offering rich opportunities for time series analysis and climate modeling.
Academic and Research Datasets
Academic repositories provide peer-reviewed research data with detailed documentation. These are excellent for learning proper data citation and understanding research methodology.
22. Harvard Dataverse
Harvard Dataverse is an open data repository managed by Harvard's Institute for Quantitative Social Science. It contains over 75,000 datasets spanning 2,000+ databases across all research disciplines.
This is a great resource for finding research data with proper documentation and understanding how academic researchers structure and share their data. All datasets include citation information and often link to published papers.
23. Stanford Large Network Dataset Collection
Stanford maintains a collection of network datasets including social networks, communication patterns, web graphs, and citation networks. This is essential for anyone learning graph analysis and network science.
The datasets are particularly valuable for learning about real-world network structures and practicing graph algorithms with actual social and information networks.
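Network datasets of this kind are often distributed as plain-text edge lists: one pair of node IDs per line, with comment lines starting with `#`. Assuming that layout, the sketch below loads a small hypothetical sample and runs a breadth-first search, one of the first graph algorithms worth practicing. The helper names are illustrative.

```python
from collections import defaultdict, deque

# Hypothetical sample in an edge-list layout: "from<TAB>to" per line.
SAMPLE_EDGES = """# Illustrative graph, not a real dataset
# FromNodeId\tToNodeId
0\t1
0\t2
1\t2
2\t3
4\t5
"""

def load_undirected(text):
    """Build an adjacency map, treating every edge as undirected."""
    graph = defaultdict(set)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        a, b = line.split()[:2]
        graph[a].add(b)
        graph[b].add(a)
    return graph

def bfs_distances(graph, start):
    """Return the hop count from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

graph = load_undirected(SAMPLE_EDGES)
dist = bfs_distances(graph, "0")
print("degree of node 2:", len(graph["2"]))
print("distances from node 0:", dist)
```

Nodes that never appear in the distance map (here, 4 and 5) belong to a different connected component.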
24. World Health Organization (Global Health Observatory)
The WHO maintains comprehensive global health data through the Global Health Observatory, including COVID-19 pandemic data plus datasets on antimicrobial resistance, dementia, air pollution, and immunization.
With over 1,000 health indicators, this is invaluable for anyone working in public health analytics or studying global health trends across countries and time periods.
25. Humanitarian Data Exchange
The Humanitarian Data Exchange is managed by the UN Office for the Coordination of Humanitarian Affairs. It provides open data on humanitarian crises, conflict zones, and disaster response.
This data source is unique in covering emergency response and humanitarian needs, offering perspective on how data supports critical decision-making in crisis situations.
26. Academic Torrents
Academic Torrents hosts datasets from scientific papers, making research data accessible. The site contains everything from the famous Enron email corpus to student learning factors and news article datasets.
You'll need a BitTorrent client to download, but the datasets are often large and comprehensive. This is particularly good for finding datasets used in published research papers.
Personal Data Sources
Want something truly unique? Analyze your own data. These platforms let you download your personal activity and spending patterns.
27. Amazon Purchase History
Amazon lets you download your complete order history, spending data, and browsing activity. This makes for an interesting personal data science project analyzing your own consumer behavior.
Sign into your Amazon account, navigate to Amazon Privacy Central and request your data. After Amazon processes your request (which can take a few hours to a few days), you'll receive an email with a download link. You can analyze spending patterns, category preferences, and purchasing seasonality.
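Once you have the export, spending by month and by category is a natural first cut. The sketch below is only an outline: the column names (`Order Date`, `Category`, `Total`), the date format, and the sample rows are assumptions, so rename them to match the headers in your own file.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical sample; real exports use their own column names and formats.
SAMPLE = """Order Date,Category,Total
2024-01-05,Books,25.50
2024-01-20,Grocery,40.00
2024-02-03,Books,12.25
2024-02-14,Electronics,99.99
"""

def spend_by(text, key):
    """Sum the Total column, grouped by whatever `key(row)` returns."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(text)):
        totals[key(row)] += float(row["Total"])
    return {group: round(amount, 2) for group, amount in totals.items()}

def month_of(row):
    """Turn an ISO date such as 2024-01-05 into the label 2024-01."""
    return datetime.strptime(row["Order Date"], "%Y-%m-%d").strftime("%Y-%m")

by_month = spend_by(SAMPLE, month_of)
by_category = spend_by(SAMPLE, lambda row: row["Category"])
print(by_month)
print(by_category)
```

Passing the grouping rule in as a function means the same helper covers seasonality, category preferences, or any other breakdown you want to try.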
28. Facebook Personal Data
Facebook provides tools to download your complete activity data, including posts, messages, photos, and engagement metrics.
Select the data types you want and Facebook will compile them for download. This offers insight into social media usage patterns and personal digital footprint.
29. Netflix Viewing History
Netflix allows you to request your viewing data, though the process takes up to 30 days and the data provided is somewhat limited compared to other platforms.
While more restricted, this data can still support projects analyzing personal entertainment preferences and viewing habits over time.
Powerful Dataset Search Tools
Can't find what you need? These search tools aggregate datasets from across the web.
30. Google Dataset Search
Google Dataset Search indexes over 25 million datasets from publishers worldwide. It's like Google Search specifically for finding data.
The search is powerful with extensive filters to narrow results by format, license, topic, and update frequency. This should be your first stop when looking for data on a specific topic.
31. GitHub Repositories
GitHub hosts numerous dataset collections, including the popular "Awesome Public Datasets" repository. While primarily a code platform, many projects share their data here so you can easily find a public dataset to work with.
You can also access GitHub's own data through their API to analyze repository activity, code evolution, and open source development patterns.
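As a starting point for the API route, the sketch below builds the URL for GitHub's REST repository endpoint and pulls a few activity fields out of the JSON it returns. The `fetch_repo` function makes a live, unauthenticated (and therefore rate-limited) request and is not called here; the demonstration runs on a hand-written payload whose values are made up. Helper names are illustrative.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"

def repo_url(owner, repo):
    """URL of the REST endpoint describing a single repository."""
    return f"{API_ROOT}/repos/{owner}/{repo}"

def summarize_repo(payload):
    """Pick a few activity fields out of a repository JSON object."""
    return {
        "name": payload["full_name"],
        "stars": payload["stargazers_count"],
        "forks": payload["forks_count"],
        "open_issues": payload["open_issues_count"],
    }

def fetch_repo(owner, repo):
    """Fetch live data for one repository (requires network access)."""
    request = urllib.request.Request(
        repo_url(owner, repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return summarize_repo(json.load(response))

# Offline demonstration with a hand-written payload (values are made up).
sample = json.loads(
    '{"full_name": "example/demo", "stargazers_count": 42,'
    ' "forks_count": 7, "open_issues_count": 3}'
)
summary = summarize_repo(sample)
print(repo_url("example", "demo"))
print(summary)
```

Separating the parsing from the network call keeps the analysis code testable without hitting the API on every run.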
32. Microsoft Azure Open Datasets
Microsoft Azure provides curated open datasets optimized for machine learning, including weather data, satellite imagery, and public domain datasets integrated with Azure services.
While designed for use with Azure, these datasets can be downloaded and used with any analytics platform. The collection focuses on commonly-used benchmark datasets for ML.
Building Your Data Science Portfolio
Now that you have access to quality datasets, it's time to build projects that showcase your skills. The key is choosing datasets that let you demonstrate specific competencies employers value.
- For beginners, start with clean datasets from sources like FiveThirtyEight or UCI. Focus on completing end-to-end projects rather than getting lost in data cleaning.
- For intermediate practitioners, tackle datasets from Kaggle competitions or Data.gov that require more preprocessing. Document your cleaning process to show real-world data wrangling skills.
- For advanced work, use large-scale datasets from AWS or Google Cloud to demonstrate distributed computing skills, or combine multiple sources to show data integration capabilities.
All of our data science courses include guided projects using real, high-quality datasets designed to accelerate learning and build your portfolio. These projects walk you through the complete analysis process, from data exploration to presenting insights.
Next Steps
Ready to start your next data project? Pick a dataset that excites you, formulate interesting questions, and start exploring. The best portfolio projects demonstrate both technical skills and genuine curiosity about the data.
Remember: employers want to see your thinking process, not just polished results. Document your approach, explain your decisions, and share what you learned. That's what makes a portfolio irresistible.