11 Reasons Why You Should Learn the Command Line

October 7, 2020
why learn command line data science

In a field as tightly connected to computer science as data science is, being able to control your computer like a developer can be a very valuable asset, even for a data analyst or data scientist. The Unix command-line interface (CLI; you’ll also see it called the terminal or bash, shell, etc.) allows us to do that and much more.

Making the switch from graphical user interfaces (GUIs) to a CLI can feel overwhelming, but we’re here to help you!

To give you a jump-start, here are a few reasons why you should be learning the command line.

1. Command Line Skills Are Popular, and Pay Handsomely

According to 2019’s Stack Overflow’s Developer Survey, bash/shell (i.e. the family of Linux command language interpreters) is the sixth most used language overall, ranking ahead of Python and R. It was also associated with higher salaries than either Python or R, according to the survey.

It also ranked high on the most-loved technologies list (59.5%), and lower on the most-dreaded technologies list (40%).

And while StackOverflow’s survey covers software developers and engineers of all sorts, the command line is of particular relevance for data scientists because Bash/Shell correlates heavily with Data Science technologies like Python, IPython/Jupyter, TensorFlow and PyTorch. This is also supported by the most recent Python Developers Survey conducted by Python Software Foundation.

2. Command Line Skills Help With Building Repeatable Data Processes

Part of a data scientist’s role is to make sure certain information is available regularly, often daily. Most of the time this data is acquired, processed and displayed in the same way.

The command line is well suited for this purpose because commands are easily automated and replicated.

Consider the following situation:

Your employer decides to invest in data analytics. Several data professionals will be joining the team. You are tasked with making sure that their machines have everything they need to get started.

If you can work with a CLI (command language interpreter), you can write a few scripts that will install, configure and test everything automatically.

If you can’t, you’ll have to resort to a GUI and make the same mouse and click movements, repeatedly, across several machines.

That’s just one example of how terminal skills can help make data science processes more scalable and repeatable.

3. Command Line Skills Make You More Flexible

In a data science role, you’ll often find you have more flexibility if you can use the terminal rather than having to rely on clicking through GUIs.

Since the command line is a program that runs other programs (hence the name “shell”), the interaction between programs is often easier to adjust in the command line.

Once you’ve mastered command line commands, it’s relatively easy to write scripts, and shell scripts make building all sorts of data pipelines and workflows much simpler.

More broadly, knowing how to use the shell gives you a second option for interacting with your computer.

You can always use the GUI when you prefer, but the command line can provide you with more direct power and control for those times when you need it.

4. Working With Text Files is Easier

Text files are among the most common ways methods to store and handle data, and almost any data science project is going to involve some work with text files. Being able to handle text files quickly and efficiently is thus a very useful skill for a data scientist.

The shell has very powerful text processing tools like AWK and sed, which help with getting acquainted with files and facilitate data cleaning.

For example, the code below uses AWK to print the first and third columns of a file named a_csv_file, where the second field’s value is Dataquest, using a comma as a field separator.

awk 'BEGIN {FS=","} {if ($2=="Dataquest") {print $1 $3} } a_csv_file'

All it takes is one line of code!

5. It’s Less Resource-Intensive

When you’re working with limited computing resources or simply want to maximize your speed, the using the command line is virtually always going to be better than using a GUI because using a GUI means resources must be dedicated to rendering the graphical output.

This is true both for working locally and remotely. When connecting remotely, GUIs consume much more bandwidth than terminals, wasting resources.

Moreover, latency, i.e. the “time interval between the stimulation and response”, will be higher when using a GUI, which can be particularly frustrating if you’re trying to control a mouse that’s a second or two behind your actual movements.

If you’re just typing in the command line, the latency is likely to be lower and it will also be easier to handle since you know precisely where your cursor is at any given time.

6. You Need Command Line Skills for the Cloud

Cloud services often are connected to and operated through a command line interface.

This is particularly important for more advanced data science work like deep learning, where your local computing resources are likely to be insufficient for the tasks you’d like to perform. To quote from this 2018 article by Nucleus Research:

In last year’s research, fewer than 10 percent of [deep learning] projects were being run on premise. That trend has accelerated, with only 4 percent of projects running on-premise in 2018.

According to the same article, “96 percent of deep learning today is running in the cloud.”

If you’re interested in learning advanced techniques like deep learning, command line skills will be necessary for moving your data to and from the cloud efficiently.

7. Unix Shell Skills Transfer Well to Other Shells

There are just a few popular shells (bash, zsh, fish, ksh, tcsh, cmd, Windows PowerShell, etc.) and they are more alike than they are different, making it easy to switch between them.

This is particularly useful when you’re using online services that require some kind of CLI. On the other hand, GUIs are endless, and learning one won’t necessarily help you learn any others.

8. You Can Probably Type Faster Than You Click

Research shows that mouse use plateaus rather quickly, while keyboard use, despite its steep learning curve, can be more efficient.

251 experienced users of Microsoft Word were given a questionnaire assessing their choice of methods for the most frequently occurring commands. Contrary to our expectations, most experienced users rarely used the efficient keyboard shortcuts, favoring the use of icon toolbars instead.

A second study was done to verify that keyboard shortcuts are, indeed, the most efficient method. Six participants performed common commands using menu selection, icon toolbars, and keyboard shortcuts. The keyboard shortcuts were, as expected, the most efficient.

In other words: even if you feel like you’re working quickly through a GUI, there’s a good chance that at least for some tasks, you’ll be more efficient in the command line.

9. Auditing and Debugging is Easier

Because it is so easy to track all of your activity on the command line, auditing and debugging is much easier.

You can easily look through the log to track every single action you took in the shell, whereas if a misclick leads to a mistake when you’re working with a GUI, there’s likely to be no record of it.

10. The Unix Shell is Available Everywhere

Although it’s only built-in on Mac and Linux machines, Windows users can still join in on the fun with tools like WSLCygwin and MinGW.

That means that the command line skills you learn in these courses will be usable on virtually every computer you encounter (including your personal machine, no matter what operating system you use).

11. The Command Line is Simpler Than You Think

There is a misconception that using the command line requires you to know several hundred commands. In fact, although there are hundreds of commands available for use, you’re likely to need just a tiny percentage of these commands to do most of common data science tasks.

Still not convinced? We’ll leave you with a quote from the great (and free!) book The Linux Command Line:

When I am asked to explain the difference between Windows and Linux, I often use a toy analogy.

Windows is like a Game Boy. You go to the store and buy one all shiny new in the box. You take it home, turn it on, and play with it. Pretty graphics, cute sounds. After a while, though, you get tired of the game that came with it, so you go back to the store and buy another one. This cycle repeats over and over. 

Finally, you go back to the store and say to the person behind the counter, “I want a game that does this!” only to be told that no such game exists because there is no “market demand” for it. Then you say, “But I only need to change this one thing!” The person behind the counter says you can’t change it. The games are all sealed up in their cartridges. You discover that your toy is limited to the games others have decided that you need.

Linux, on the other hand, is like the world’s largest Erector Set. You open it, and it’s just a huge collection of parts. There’s a lot of steel struts, screws, nuts, gears, pulleys, motors, and a few suggestions on what to build. So, you start to play with it. You build one of the suggestions and then another. 

After a while you discover that you have your own ideas of what to make. You don’t ever have to go back to the store, as you already have everything you need. The Erector Set takes on the shape of your imagination. It does what you want. Your choice of toys is, of course, a personal thing, so which toy would you find more satisfying?

Want to Learn the Command Line?

Now that we’ve convinced you to learn the command line, click here to check out our introductory command line course.It's one of two recently-reworked courses that dall into our Data Analyst in Python and Data Scientist in Python paths:

No prerequisite knowledge is required to take these courses.

What Will You Learn?

In these two command line courses, you’ll learn to use the Unix terminal interface that’s built-in on Mac and Linux machines. Don’t worry, we’ll also provide Windows users with the tools required to take full advantage of the content.

In the first course, you’ll learn what the command line interface is, why it is important in the data science workflow, and how you can navigate and manage your computer by giving it instructions called commands. You’ll also learn about wildcards and how to use them together with commands like lsmvcpmkdir and many more for faster searches and workflows.

The second course is focused on basic text processing in the shell, using commands like headcatcut and grep. It covers how you can combine these commands to create powerful chains of commands from simpler building blocks. You’ll also learn about multi-user systems and the power of output redirection.

And as is the case for all Dataquest courses, these new command line courses use an interactive command line environment and answer-checking to allow you to apply and check everything you’re learning from directly within your browser.


announcements, bash, cli, Command line, course launches, launches, shell, terminal

You may also like

Get started with Dataquest today - for free!

Or, visit our pricing page to learn about our Basic and Premium plans.