But Excel, while it is great, has its limitations. If you’re working with data every day, these limitations are significant. We spoke with Pacific Life data scientist Alyssa Columbus, who gave us nine reasons why you might find it worth your while to learn a programming language for data analysis, even if you’re already an Excel master and not trying to get a job in data science.
In the points below, Alyssa is specifically comparing Excel with doing data analysis in R, a popular programming language for data analysis. But most these advantages are also true of Python, the other popular coding language for working with data. We offer free introductory courses in both, and either would be a significant power upgrade over Excel.
1. More powerful data importing and manipulation
Unlike Excel, Alyssa says, “R can essentially read any type of data.” And data formats that it can’t read natively can still work: “There are also R packages specifically designed to read XML, JSON, SPSS, Excel, SAS, and STATA data files, and you can also scrape data from websites and execute SQL queries.”
Data manipulation – tasks like subsetting, merging, and recoding data – is also much easier in R. Anyone who’s spent a lot of time trying to merge and clean several large datasets in Excel for analysis can attest that it’s a difficult and time-consuming process. But R, and some popular packages like
plyr, make this process both simpler and faster.
2. Easier automation
The fact that Excel has a GUI (a user interface where you can click buttons rather than writing code) definitely makes it more approachable, but that can be a big hindrance when you’re trying to automate a process or run the same analysis multiple times.
Using a programming language can make this much faster. For example, if you needed to run the same analysis on a new set of sales data each week, doing this in Excel would require opening a different file manually each week and re-entering formulas and other elements needed for the analysis. But you could do that same analysis automatically in a language like R, writing a simple script that imports the new data and runs the same analysis each week, outputting the results in whatever format you’d like.
3. Easier working with lots of data
In Excel, projects are organized in sheets or tabs, and if you’ve ever dealt with Excel files that have tons of sheets or lots of data entries in each sheet, you know that it can get very slow very quickly. Working with enough data in Excel can sometimes even cause crashes. Programming languages like R, however, can handle large amounts of data much more quickly, and they can’t really crash in the same way that Excel can, so you don’t have to worry about losing your work.
4. More reproducible
Data analysis is most useful when you can explain what you’ve done to others, and others can easily reproduce your work to confirm it (or you can reproduce it yourself to double-check). But this is difficult in Excel; there’s no way to clearly document or illustrate the steps you took in your analysis, and re-doing it would entail re-opening the original Excel file and manually re-executing all of the steps you took (if you can remember them).
Reproducing results is much easier in a programming language like R. Re-running an analysis is as simple as pressing “Enter”, and it’s easy to add comments to your code that explain what’s happening at every step of the process, so that anyone can double-check your work.
5. Easier to find and fix errors
When you’ve made an error in Excel, figuring out what’s gone wrong can be difficult, since you might have to scroll through thousands of cells of data to find the answer, or attempt to manually re-trace your steps.
But when you make an error in a coding language like R, you’ll typically get an error message that explains what the computer thinks has gone wrong. And of course, you should also have comments explaining each line of your code, which makes it easier to go back and re-check each step looking for mistakes. Typically, programmers also use a system for version control, so if you experience an error you haven’t before, you’ll be able to compare your current code with its previous iteration to get a sense of what’s gone wrong.
This doesn’t mean that you’ll always be able to fix mistakes immediately. But “mistakes in data analysis are inevitable,” Alyssa says, “and it’s easier to find and fix those mistakes in R than in Excel.”
6. Open source accessibility
Excel is great, but it’s owned by Microsoft, which means you’re ultimately at the whims of the Washington-based company in terms of bugs, updates, and feature support. Unlike Excel, Alyssa says, “R is not a black box. You can examine R code for any function or computation you perform. You can even modify and improve key functions by changing the code.”
Python is also open source, and in the case of both languages, this also means that any developer (including you) can create packages to augment the language and add functionality or improve ease-of-use. Both languages have popular and widely-used packages and libraries that were created by third-party developers to make data analysis and visualization easier.
Excel does have some third-party add-ins, admittedly, but because it’s proprietary software, they’re not as powerful and it’s not as easy for you to add any functionality you might want or need.
7. Advanced statistics and machine learning capabilities
Both R and Python have more advanced statistical capabilities than Excel (and this is particularly true of R, which was designed with advanced statistical analyses in mind). Both languages also allow for the creation of machine learning models (often with the integration of machine learning packages and frameworks like
scikit-learn, and TensorFlow).
8. Advanced data visualization capabilities
Obviously, Excel can create a variety of charts, but programming languages can do more, and R in particular “has better, more advanced and state-of-the-art graphics capabilities with the
shiny packages,” Alyssa says.
The ability to create attractive and informative visualizations is particularly important in business contexts, since the people who make decisions at a company may not be familiar with statistical analysis or adept at reading complex charts. The easier you can make understanding your data, the more likely it is that your work will have a real impact.
9. Cross-platform stability
Scripts in programming languages like R and Python can be run on any platform without hiccups. You can be confident that your R script is going to work across Windows, Mac, and Linux machines, but the same isn’t always true of Excel files.
That’s not to say that you should entirely abandon Excel, of course! Alyssa pointed out that it does have some advantages over programming languages. Because of its graphical user interface, manual data entry is often easier in Excel, and if you just want to scan through a spreadsheet quickly, looking through an Excel file can be quicker and easier than doing that with code. Both Python and R do have ways to render sections of the data you’re working with in a visual table format so that it’s easier to see what you’re working with, but in Excel, this visual data presentation is the default rather than something you have to implement intentionally.
Still, it’s clear that if you’re doing a lot of data analysis, using a programming language offers some very substantial advantages over Excel. If you work with Excel a lot, learning even a little bit of R or Python programming could save you hours of time spent in Excel each week. If you’re only familiar with Excel, the idea of learning a programming language like R or Python can be intimidating.
But it’s not as difficult as you might think! At Dataquest, we believe that anyone can learn to code, and we offer free introductory classes in R and Python that you can take without any coding experience and without any math or engineering background.