The Best Free Books for Learning Data Science
If you want to learn data science, obviously the most important thing you can do is get your hands on some real-world data and start coding. Our learning platform is based on helping you do exactly that. But what can you do to keep learning in those moments when you’re not sitting in front of a computer? Read some data science books!
As a student we recently spoke with pointed out, ebooks are a great way to immerse yourself in data science learning in those moments when you can’t actually get hands-on with code (on a bus ride, for example, or while waiting in line). You can even listen to them like podcasts if you use an ebook app with a “read aloud” feature, although you’ll have to put up with a computerized voice, and it will probably mangle any code it tries to read.
And thankfully, the data science community is very open and giving, so there are a ton of ebooks about data science that you can enjoy without paying a dime. Below, we’ve listed plenty of our favorites, but this is really just the tip of the iceberg, so if you ever get through all of these, rest assured there’s even more out there.
Note: Some of the links below are PDF links.
General Data Science Topics
The Elements of Data Analytic Style - This book by Johns Hopkins professor Jeff Leek is a useful guide for anyone involved with data analysis, and covers a lot of the little details you might miss in statistics lessons and textbooks. It’s a pay-what-you-want book, so while you can technically get this one for free, we recommend making a contribution if you can.
The Art of Data Science - Another pay-what-you-want book, this one takes a big-picture view of how to do data science rather than focusing on the technical nitty-gritty of statistical or programming techniques.
An Introduction to Data Science - This introductory textbook was written by Syracuse professor Jeffrey Stanton, and it covers a lot of the fundamentals of data science and statistics. It also covers some R programming, but sections of it are well worth reading even for those who are learning Python.
Social Media Mining - This textbook from Cambridge University Press won’t be relevant for every data science project, but if you do need to collect and analyze data from social media platforms, this is a well-rated guidebook. The book’s site also links to some free slide presentations on related topics.
The Data Science Handbook - This book is a collection of interviews with prominent data scientists. It doesn’t offer any technical or mathematical insight, but it’s a great read for anyone who’s thinking about data science as a career and wondering what it entails, what roles are out there, and whether it might be right for them.
Python Skills for Data Science
Python Data Science Handbook - An O’Reilly text by Jake VanderPlas that is also available as a series of Jupyter Notebooks on GitHub. It’s not for total beginners; it assumes some knowledge of Python programming basics (but don’t worry, we’ve got a free class you can take for that).
Automate the Boring Stuff with Python - This Python book for total beginners isn’t focused on data science specifically, but the introductory concepts it teaches are all relevant in data science, and some of the specific skills later in the book (like web scraping and working with Excel files and CSVs) will be of use to data scientists, too.
A Byte of Python (PDF link) - Like Automate the Boring Stuff, this is another well-liked Python-from-scratch ebook that teaches the basics of the language to total beginners. It’s not data-science-specific, but most of the concepts it covers are relevant to data scientists, and it has also been translated into a wide variety of languages, so it’s easily accessible to learners all over the globe.
Learn Python, Break Python - Yet another well-liked Python-for-beginners tome that encourages readers to learn Python by “breaking” it and watching how it handles errors and mistakes.
R Skills for Data Science
R Programming for Data Science - Roger D. Peng’s free text will teach you R for data science from scratch, covering the basics of R programming. This is a pay-what-you-want text, but if you do choose to chip in a bit of money, note that for $20 you can get it bundled with all of the datasets and code files the book uses.
An Introduction to Data Science (PDF link) - This introductory text was already listed above, but we’re listing it again in the R section as well, because it does cover quite a bit of R programming for data science.
Advanced R - This is precisely what it sounds like: a free online text that covers more advanced R topics.
Machine Learning
Neural Networks and Deep Learning - This free online book teaches the core principles behind neural networks and deep learning. It’s not the place to learn the technical intricacies of any particular library, and its code is written in the now-outdated Python 2.7 rather than Python 3, but there’s still a lot of valuable wisdom here.
Bayesian Reasoning and Machine Learning (PDF link) - A massive 680-page PDF that covers many important machine learning topics and was written for students who don’t necessarily have a formal background in computer science or advanced mathematics.
Understanding Machine Learning: From Theory to Algorithms - Looking for a thorough look at machine learning that runs from the fundamentals all the way through advanced machine learning theory? Look no further.
Deep Learning - This textbook from MIT Press is available online only in HTML format, but it covers everything from the basics up through current deep learning research.
Machine Learning Yearning - This upcoming book from Andrew Ng isn’t technically available, or even finished, but signing up for a mailing list will get you emailed copies of draft chapters. Ng says that where courses teaching technical skills can give you a “hammer”, this book’s aim is to teach you how to use that hammer correctly.
Natural Language Processing with Python - A great text for anyone interested in NLP, and the online version has been updated for Python 3 (the printed version of this book uses Python 2).
Statistics
Introduction to Probability (PDF link) - Precisely what it sounds like: an introductory textbook that teaches probability and statistics.
Statistical Inference for Data Science - A rigorous look at statistical inference for readers who are already somewhat comfortable with basic statistics topics and programming with R.
An Introduction to Statistical Learning (PDF link) - A great introduction to data-science-relevant statistical concepts and R programming.
The Elements of Statistical Learning - Another valuable statistics text that covers just about everything you might want to know, and then some (it’s over 750 pages long). Make sure you get the most up-to-date version of the book from the authors’ site (as of this writing, that’s the 2017 edition).
Data Mining and Analysis - This Cambridge University Press text will take you deep into the statistics and algorithms used for various types of data analysis.
Our classes teach you everything you need to become a data scientist in a hands-on, project-based format, but that doesn’t mean you won’t benefit from supplementing your learning with books like these. Especially if you’re trying to make quick progress, immersing yourself in the topic is always a great idea, and if you can pull out a PDF and get a little learning in while you’re standing in line at the grocery store, that’s only going to help in the long run.
If you do find any of these books valuable, we suggest purchasing them, via the author’s site if possible. It’s always good to reward people who give away great learning content for free, and many of these books are available in alternative formats or in print if you’re willing to pay.
Remember, also, that this is just the tip of the iceberg when it comes to free data science ebooks. There are hundreds more out there. We’ll be updating this article from time to time when we find new free books we like, but if you know of a great one that isn’t listed here, drop us a line at hello [at] dataquest.io and maybe we’ll add it to the list!
Charlie is a student of data science, and also a content marketer at Dataquest.