April 21, 2021

11 Reasons To Learn Bash (A.K.A. Command Line)

Bash — the command-line language for Unix-based operating systems — allows you to control your computer like a developer. But it's not just a skill for software devs — learning bash can be valuable for anyone who works with data.

What is Bash?

In short, Bash is a Unix shell, which is a kind of command-line interface (CLI). You’ll also see it loosely called the terminal, the command line, or the shell. It's a command language that allows us to work with files on our computers in a way that's far more efficient and powerful than using a GUI (graphical user interface).

Making the switch from graphical user interfaces (GUIs) to a command-line interface can feel overwhelming. And while Dataquest makes learning the command line very straightforward, you might be wondering: why should I bother?

Here are a few reasons why you should be learning bash and using the command line:

1. Bash Skills Are Popular, and Pay Handsomely

According to Stack Overflow’s 2020 Developer Survey, bash/shell (i.e., the family of Linux command language interpreters) is the sixth most used language overall, ranking ahead of Python and R. It was also associated with higher salaries than either Python or R, according to the survey.

It also ranked high on the most-loved technologies list (53.7%), and lower on the most-dreaded technologies list (46%).

And while Stack Overflow’s survey covers software developers and engineers of all sorts, the command line is of particular relevance for data scientists because Bash/Shell correlates heavily with data science technologies like Python, IPython/Jupyter, TensorFlow and PyTorch. This is also supported by the most recent Python Developers Survey conducted by the Python Software Foundation.

2. Command Line Skills Help With Building Repeatable Data Processes

Part of a data scientist’s role is to make sure certain information is available regularly, often daily. Most of the time this data is acquired, processed and displayed in the same way.

The command line is well suited for this purpose because commands are easily automated and replicated.

Consider the following situation:

Your employer decides to invest in data analytics. Several data professionals will be joining the team. You are tasked with making sure that their machines have everything they need to get started.

If you can work with a CLI (command-line interface), you can write a few scripts that will install, configure and test everything automatically.

If you can’t, you’ll have to resort to a GUI and repeat the same mouse movements and clicks on every machine.

That’s just one example of how terminal skills can help make data science processes more scalable and repeatable.
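As a minimal sketch of the "test everything" part of such a script, the snippet below loops over a list of tools and reports which ones are missing. The function name check_tools and the tool list are hypothetical, chosen only for illustration; a real setup script would go on to install whatever is missing using the package manager available on each machine.

```shell
# Report which of the given commands are available on this machine.
# check_tools is a hypothetical helper name used for illustration.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  # The function succeeds (exit status 0) only when every tool was found.
  [ "$missing" -eq 0 ]
}

# Example list (an assumption); replace it with what your team needs.
check_tools ls cat grep
```

Because the check is a plain function, the same file can be copied to every new machine and run unchanged.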

3. Learning Bash Makes You More Flexible

In a data science role, you’ll often find you have more flexibility if you can use the terminal rather than having to rely on clicking through GUIs.

Since the command line is a program that runs other programs (hence the name “shell”), the interaction between programs is often easier to adjust in the command line.

Once you’ve mastered bash commands, it’s relatively easy to write scripts, and shell scripts make building all sorts of data pipelines and workflows much simpler.
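To give a feel for how programs are chained together, here is a small sketch that finds the most common value in the second column of a CSV file. The file name users.csv and its contents are made up for the example.

```shell
# Create a small sample file (hypothetical data, for illustration only).
workdir=$(mktemp -d)
printf 'alice,python\nbob,r\ncarol,python\ndave,sql\n' > "$workdir/users.csv"

# Pipeline: take the second column, sort it, count repeated values,
# sort the counts from highest to lowest, and keep the top entry's name.
top_language=$(cut -d, -f2 "$workdir/users.csv" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')
echo "$top_language"
```

Each program does one small job, and the pipe character passes its output to the next one. Swapping a single stage (say, a different column number in cut) changes the whole workflow without touching the rest.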

More broadly, knowing how to use the shell gives you a second option for interacting with your computer.

You can always use the GUI when you prefer, but the command line can provide you with more direct power and control for those times when you need it.

4. Working With Text Files is Easier

Text files are among the most common ways to store and handle data. Almost any data science project is going to involve some work with text files. Being able to handle text files quickly and efficiently is thus a very useful skill for a data scientist.

The shell has very powerful text processing tools like AWK and sed, which help with getting acquainted with files and facilitate data cleaning.

For example, the code below uses AWK to print the first and third columns of a file named a_csv_file, where the second field’s value is Dataquest, using a comma as a field separator.

awk 'BEGIN {FS=","} {if ($2=="Dataquest") {print $1, $3}}' a_csv_file

All it takes is one line of code!
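sed, the other tool mentioned above, is just as compact for simple data cleaning. The sketch below strips trailing whitespace and drops empty lines; the sample input is invented for the example.

```shell
# Hypothetical messy input: stray trailing spaces and an empty line.
messy=$(mktemp)
printf 'id,name  \n\n1,Dataquest \n' > "$messy"

# First expression removes trailing whitespace from each line;
# second expression deletes lines that are empty after that.
cleaned=$(sed -e 's/[[:space:]]*$//' -e '/^$/d' "$messy")
echo "$cleaned"
```

The original file is left untouched here; the cleaned text is only captured in a variable and printed.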

5. It’s Less Resource-Intensive

When you’re working with limited computing resources or simply want to maximize your speed, using the command line is virtually always going to be better than using a GUI, because a GUI must dedicate resources to rendering graphical output.

This is true both for working locally and remotely. When connecting remotely, GUIs consume much more bandwidth than terminals, wasting resources.

Moreover, latency, i.e. the “time interval between the stimulation and response”, will be higher when using a GUI, which can be particularly frustrating if you’re trying to control a mouse that’s a second or two behind your actual movements.

If you’re just typing in the command line, the latency is likely to be lower and it will also be easier to handle since you know precisely where your cursor is at any given time.

6. You Need Command Line Skills for the Cloud

Cloud services are often connected to and operated through a command-line interface.

This is particularly important for more advanced data science work like deep learning, where your local computing resources are likely to be insufficient for the tasks you’d like to perform. To quote from this 2018 article by Nucleus Research:

In last year’s research, fewer than 10 percent of [deep learning] projects were being run on premise. That trend has accelerated, with only 4 percent of projects running on-premise in 2018.

According to the same article, “96 percent of deep learning today is running in the cloud.”

If you’re interested in learning advanced techniques like deep learning, command line skills will be necessary for moving your data to and from the cloud efficiently.

7. Unix Shell Skills Transfer Well to Other Shells

There are just a few popular shells (bash, zsh, fish, ksh, tcsh, cmd, Windows PowerShell, etc.) and they are more alike than they are different, making it easy to switch between them.

For example, the bash commands that you learn in our command line courses will work on Unix-based machines like Macs and Linux computers. But many of the exact same commands also work on Windows in the Command Prompt and/or Windows PowerShell.

This cross-compatibility is particularly useful when you’re using online services that require some kind of command-line interface. Even if their system doesn't use bash, it'll use something similar enough that you'll be able to figure it out.

GUIs, on the other hand, exist in infinite varieties, and learning one won’t necessarily help you learn any others.

8. You Can Probably Type Faster Than You Click

Research shows that mouse use plateaus rather quickly, while keyboard use, despite its steep learning curve, can be more efficient.

251 experienced users of Microsoft Word were given a questionnaire assessing their choice of methods for the most frequently occurring commands. Contrary to our expectations, most experienced users rarely used the efficient keyboard shortcuts, favoring the use of icon toolbars instead.

A second study was done to verify that keyboard shortcuts are, indeed, the most efficient method. Six participants performed common commands using menu selection, icon toolbars, and keyboard shortcuts. The keyboard shortcuts were, as expected, the most efficient.

In other words: even if you feel like you’re working quickly through a GUI, there’s a good chance that at least for some tasks, you’ll be more efficient in the command line.

9. Auditing and Debugging Are Easier

Because it is so easy to track all of your activity on the command line, auditing and debugging are much easier.

You can easily look through the log to track every single action you took in the shell, whereas if a misclick leads to a mistake when you’re working with a GUI, there’s likely to be no record of it.

10. The Unix Shell is Available Everywhere

Although it’s only built in on Mac and Linux machines, Windows users can still join in on the fun with tools like WSL, Cygwin, and MinGW. (And, as previously mentioned, many of the bash commands you'll learn work in Windows' native shell options like the Command Prompt anyway.)

That means that the command line skills you learn in these courses will be usable on virtually every computer you encounter (including your personal machine, no matter what operating system you use).

11. The Command Line is Simpler Than You Think

There is a misconception that using the command line requires you to know several hundred commands. In fact, although there are hundreds of commands available, you’re likely to need just a tiny percentage of them to do most common data science tasks.

Still not convinced? We’ll leave you with a quote from the great (and free!) book The Linux Command Line:

When I am asked to explain the difference between Windows and Linux, I often use a toy analogy.

Windows is like a Game Boy. You go to the store and buy one all shiny new in the box. You take it home, turn it on, and play with it. Pretty graphics, cute sounds. After a while, though, you get tired of the game that came with it, so you go back to the store and buy another one. This cycle repeats over and over.

Finally, you go back to the store and say to the person behind the counter, “I want a game that does this!” only to be told that no such game exists because there is no “market demand” for it. Then you say, “But I only need to change this one thing!” The person behind the counter says you can’t change it. The games are all sealed up in their cartridges. You discover that your toy is limited to the games others have decided that you need.

Linux, on the other hand, is like the world’s largest Erector Set. You open it, and it’s just a huge collection of parts. There’s a lot of steel struts, screws, nuts, gears, pulleys, motors, and a few suggestions on what to build. So, you start to play with it. You build one of the suggestions and then another.

After a while you discover that you have your own ideas of what to make. You don’t ever have to go back to the store, as you already have everything you need. The Erector Set takes on the shape of your imagination. It does what you want. Your choice of toys is, of course, a personal thing, so which toy would you find more satisfying?

Ready to Learn the Command Line?

Now that you understand why it's valuable to learn the bash command-line interface, how can you actually do it?

The easiest way is to learn right in your browser with Dataquest's guided, interactive command-line courses. You'll learn all the commands you'll need to work efficiently, writing real bash commands within a few minutes of signing up.

Best of all? Our command line courses are free to try!

What Will You Learn?

In these two command line courses, you’ll learn to use the Unix terminal interface that’s built-in on Mac and Linux machines. Don’t worry, we’ll also provide Windows users with the tools required to take full advantage of the content.

In the first course, you’ll learn what the command line interface is, why it is important in the data science workflow, and how you can navigate and manage your computer by giving it instructions called commands. You’ll also learn about wildcards and how to use them together with commands like ls, mv, cp, mkdir, and many more for faster searches and workflows.
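As a rough sketch of how a wildcard combines with these commands, the snippet below moves every file ending in .csv into a new folder in a single step. The file and folder names are made up, and everything happens in a temporary directory so no real files are touched.

```shell
# Work in a throwaway directory so nothing real is touched.
sandbox=$(mktemp -d)
cd "$sandbox"

# Create a few empty files (hypothetical names).
touch jan.csv feb.csv notes.txt

# Make a folder, then move every file ending in .csv into it.
# The * wildcard matches any run of characters in a file name.
mkdir data
mv *.csv data/

# Copy the remaining text file in as well, then list the folder.
cp notes.txt data/
listing=$(ls data)
echo "$listing"
```

With two files the saving is small, but the same mv line works unchanged on two thousand.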

The second course is focused on basic text processing in the shell, using commands like head, cat, cut, and grep. It covers how you can combine these commands to create powerful chains of commands from simpler building blocks. You’ll also learn about multi-user systems and the power of output redirection.
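Here is a small taste of what chaining and output redirection look like in practice. This is only a sketch; the file scores.csv and its contents are invented for the example.

```shell
# Build a small sample file (hypothetical data).
tmp=$(mktemp -d)
printf 'name,team,score\nana,blue,7\nben,red,4\ncy,blue,9\n' > "$tmp/scores.csv"

# Peek at the first two lines of the file.
head -n 2 "$tmp/scores.csv"

# Chain commands: keep the rows containing "blue", take the name column,
# and redirect the result into a new file instead of the screen.
grep 'blue' "$tmp/scores.csv" | cut -d, -f1 > "$tmp/blue_names.txt"

# Show the contents of the new file.
cat "$tmp/blue_names.txt"
```

The pipe (|) sends one command's output to the next command, while the redirection (>) sends the final output to a file.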

And as is the case for all Dataquest courses, these new command line courses use an interactive command line environment and answer-checking to allow you to apply and check everything you’re learning from directly within your browser.

Bruno Cunha

About the author

Bruno is currently a content author and Data Scientist at Dataquest, teaching data science to thousands of students.