Why mastering a 50-year-old programming language is the key to getting a data science job.
SQL is old. There, I said it.
I first heard about SQL in 1997. I was in high school, and as part of a computing class we were working with databases in Microsoft Access. The computers we used were outdated, and the class was boring. Even then, it seemed that SQL was ancient.
SQL dates back almost 50 years to 1970 when Edgar Codd, a computer scientist working for IBM, wrote a paper describing a new system for organizing data in databases. By the end of the decade, several prototypes of Codd's system had been built, and a query language — the Structured Query Language (SQL) — was born to interact with these databases.
So why should someone who wants to get a job in data spend time learning this 'ancient' language? Why not spend all your time mastering Python/R, or focusing on 'sexier' data skills, like Deep Learning, Scala, and Spark?
While knowing the fundamentals of a more general-purpose language like Python or R is critical, ignoring SQL will make it much harder to get a job in data. Here are three key reasons why:
1. SQL is everywhere
Almost all of the biggest names in tech use SQL. Uber, Netflix, Airbnb: the list goes on. Even within companies like Facebook, Google, and Amazon, which have built their own high-performance database systems, data teams use SQL to query data and perform analysis.
But it's not just tech companies: Companies big and small use SQL. Data Scientist and former Dataquest student Vicknesh got his first job as a Data Analyst. He quickly found himself using SQL daily: "SQL is so pervasive, it permeates everything here. It’s like the SQL syntax persists through time and space. Everything uses SQL or a derivative of SQL."
2. SQL is in demand
If you want to get a job in data, your focus should be the skills that employers want. I analyzed 25,000 jobs advertised on Indeed, looking at key skills mentioned in job ads with 'data' in the title:
SQL was easily the most mentioned skill, appearing in 35.7% of ads. That is 1.39 times as many ads as Python, and over twice as many as R. Of greater interest is which skills are required for people who want to get their first job in data. Most entry-level jobs in data are Data Analyst roles. If we look at job ads with 'data analyst' in the title, the numbers are even more conclusive:
For data analysts, SQL is mentioned in the majority of ads, over three times as often as Python and R. Learning SQL will not only make you more qualified for these jobs, it will set you apart from the other candidates.
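For readers curious how a tally like this could be produced, here is a minimal sketch in Python using the built-in sqlite3 module. The job_ads table, its columns, and the five sample rows are made up for illustration; they are not the Indeed data analyzed above, and the simple LIKE match is only an assumption about how a skill mention might be detected.

```python
import sqlite3

# Hypothetical toy data: five invented job ads, not the Indeed dataset.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_ads (ad_id INTEGER PRIMARY KEY, title TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO job_ads VALUES (?, ?, ?)",
    [
        (1, "Data Analyst", "SQL and Excel required"),
        (2, "Data Scientist", "Python, SQL, machine learning"),
        (3, "Data Engineer", "Python and Spark"),
        (4, "Senior Data Analyst", "Strong SQL; R is a plus"),
        (5, "Software Engineer", "Java and SQL"),  # no 'data' in the title
    ],
)

# LIKE evaluates to 1 or 0 in SQLite, so SUM(...) counts the matching ads.
# SQLite's LIKE is case-insensitive for ASCII letters by default.
skill_share = conn.execute(
    """
    SELECT
        ROUND(100.0 * SUM(description LIKE '%sql%') / COUNT(*), 1) AS pct_sql,
        ROUND(100.0 * SUM(description LIKE '%python%') / COUNT(*), 1) AS pct_python
    FROM job_ads
    WHERE title LIKE '%data%'
    """
).fetchone()

print(skill_share)
```

On this toy data, the query reports SQL in 75.0% of ads with 'data' in the title and Python in 50.0%. A substring match like this would also count words such as 'NoSQL' or 'MySQL', and it can't reliably find a one-letter skill like R, so a real analysis would need more careful matching.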
3. SQL isn't going anywhere
SQL is more popular among data scientists and data engineers than Python or R. The fact that SQL is a language of choice is incredibly important.
Image: Stack Overflow Developer Survey 2017
Despite lots of hype around NoSQL, Hadoop, and other technologies, SQL remains one of the most popular languages. It was not only the second most popular language for data scientists/engineers in the 2017 Stack Overflow Developer Survey, but also the second most popular language amongst all developers for the last five years of the survey.
This gives aspiring data practitioners the confidence that they're not learning a dying language, but instead are learning the lingua franca of data.
So, what's the best way to learn SQL?
Now that we understand why we should learn SQL, the obvious question is 'how?'
There are literally thousands of SQL courses online, but most of them don't prepare you for using SQL in the real world. The best way to illustrate this is to look at the queries they teach you to write:
The queries above demonstrate the complexity of the SQL taught at the end of SQL courses by three of the more popular online learning sites. The problem is that real-world SQL doesn't look like that. Real-world SQL looks like this:
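As a rough illustration of that gap, here is a sketch in Python using the built-in sqlite3 module. It is not taken from any course; the customers, orders, and order_items tables and their rows are hypothetical. The first query is the kind of single-table SELECT many courses end on, and the second answers a business question (revenue and average order value per country) by joining three tables.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_items (
        order_id INTEGER, product TEXT, unit_price REAL, quantity INTEGER
    );

    INSERT INTO customers VALUES (1, 'USA'), (2, 'USA'), (3, 'Canada');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3), (13, 1);
    INSERT INTO order_items VALUES
        (10, 'a', 5.0, 2), (10, 'b', 3.0, 1),
        (11, 'a', 5.0, 1),
        (12, 'c', 20.0, 1),
        (13, 'b', 3.0, 4);
    """
)

# Course-style query: one table, one filter.
simple_rows = conn.execute(
    "SELECT customer_id, country FROM customers "
    "WHERE country = 'USA' ORDER BY customer_id"
).fetchall()

# Business-question query: total each order in a CTE, then join the
# totals back to customers and aggregate per country.
report = conn.execute(
    """
    WITH order_totals AS (
        SELECT o.order_id,
               o.customer_id,
               SUM(oi.unit_price * oi.quantity) AS order_total
        FROM orders AS o
        INNER JOIN order_items AS oi ON oi.order_id = o.order_id
        GROUP BY o.order_id, o.customer_id
    )
    SELECT c.country,
           COUNT(DISTINCT c.customer_id) AS customers,
           COUNT(ot.order_id) AS orders,
           SUM(ot.order_total) AS revenue,
           SUM(ot.order_total) / COUNT(ot.order_id) AS avg_order_value
    FROM customers AS c
    INNER JOIN order_totals AS ot ON ot.customer_id = c.customer_id
    GROUP BY c.country
    ORDER BY revenue DESC
    """
).fetchall()

print(simple_rows)
for row in report:
    print(row)
```

Even on three tiny tables, the second query needs a subquery, two joins, and several aggregates, which is closer to what day-to-day analysis tends to involve.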
When you're answering business questions with data, you often write SQL queries that need to combine data from lots of tables, wrangling it into its final form. Because most courses stop short of this, students find themselves unprepared for the jobs they want, just like this recent post from a data science forum:
What we're doing about it
Here at Dataquest, we believe that SQL competency is one of the key skills for anyone who wants to get a job in data. We're not suggesting you learn SQL instead of Python and/or R, but that you thoroughly learn SQL as your second language and become comfortable writing queries at a high level.
Over the last few months, we've undertaken to rewrite and extend our SQL curriculum to equip aspiring data analysts, data scientists, and data engineers for their new careers. We've already released four new SQL courses with more on the way. In our Data Analyst and Data Scientist paths:
and in our Data Engineering path:
Our interactive courses are written with the goal of equipping our students with the skills they need at the level they'll need. You won't spend time watching videos. Instead, you'll be writing your first queries in minutes, and be on your way to mastering the most important data skill.
While we start from zero, our courses go beyond the basics so you can become a SQL master. As an example, the 'real-life' SQL image above is taken from our SQL Intermediate course.
You can sign up and complete the first mission in each course for free, and we encourage you to try them out and let us know what you think.
We love SQL
I hope I've persuaded you that mastering SQL is key to starting your career in data. While it's easy to be distracted by the latest and greatest new language or framework, learning SQL will pay dividends on your path to breaking into the data industry.
It might just be the most important language you learn.