SQL Fundamentals Tutorial: Start Learning SQL Today!
Whatever you’re hoping to do with data, having SQL skills is likely to be key.
That’s because despite the fact that SQL is quite old, some version of it is used in the databases of just about every company on earth. Whether you’re interested in doing data science or simply accessing and analyzing a little bit of your company’s data, chances are SQL fundamentals are going to be required.
What is SQL?
We’ll start this SQL fundamentals tutorial with a very fundamental question: what is SQL, anyway? SQL (often pronounced “sequel”) is an abbreviation for Structured Query Language. It’s a kind of programming language that’s designed specifically for use querying and updating databases.
This is useful because it’s not always practical or advisable to store data in other formats like spreadsheets or CSVs. Imagine, for example, a company with terabytes of data that’s being updated regularly by people in different departments across the company. Trying to manage this with spreadsheets would be a logistical and security nightmare! Databases allow for the secure storage of huge amounts of data, and SQL is the tool we use to access and make changes to that data without needing to store it all in a file on our local machine.
In SQL, we express our requests to the database as “queries”. For example, we might send a query — an instruction — to a database to return a specific subset of data, like a particular table, or to update a particular value in the database.
Let’s imagine we want to get some data on salaries from our company’s database. We certainly don’t want to download all of the company’s data, so instead we’ll write a SQL query that gives us only the specific data we need. If we assume there’s a table in the company database called, salaries
, that query would probably look something like this:
SELECT * FROM salaries
The *
character here means “everything” — select every column from the salaries
table. With this single short query, we’ll be able to get just the table we need from our company database.
But SQL can do a lot more than query a full table. It can help us efficiently merge, filter, update, and analyze data to answer all sorts of questions. So let’s dive in and start learning how to write SQL queries!
Note: for many students, it may be easier learn SQL interactively and write queries in your browser rather than following along with a tutorial. You can do that for free in our interactive SQL Fundamentals course:
Learn SQL by doing
Go hands-on with SQL right now, using our interactive learning platform.
- Write real queries
- Work with real data
- Right in your browser!
(Start free)
SQL Fundamentals Tutorial: Introduction
In this tutorial, we’ll explore data from the American Community Survey on job outcome statistics based on college majors. While the original CSV version can be found on FiveThirtyEight’s Github, we’ll be using a slightly modified version of the data that’s stored as a database. (Remember, SQL is used for querying data stored in databases).
Specifically, we’ll be working with a of the data that contains the 2010-2012 data for recent college grads only. This data is stored as a SQLite database (SQLite is one of the most popular SQL-based database management systems).
If you’re familiar with how to query SQLite databases, you can download the database file here. If you’re not comfortable with getting a SQLite database set up and querying it on your local machine, it may be easier to work through our free SQL fundamentals course, which is interactive and allows you to write real SQL queries of this database using only your browser.
Previewing A Table Using SELECT
Whenever we’re working with a new data set, it’s helpful to start by previewing what we’re actually working with. For this tutorial, we’ll be working with the database file jobs.db
, which contains a single table named recent_grads
:
Here’s a visual representation of that structure:
Let’s start by displaying only the first five rows of our table. This will give us a good idea of what we’re working with, and limiting the query to five rows will ensure it’s quick and we don’t get overwhelmed with a huge table.
Like other programming languages, code in SQL has to adhere to a defined structure and vocabulary. To specify that we want to return the first 5 rows from recent_grads
, we need to run the following SQL query:
SELECT * FROM recent_grads LIMIT 5
In this query, we specify:
- the columns we wanted using
SELECT *
- the table we wanted to query using
FROM recent_grads
- the number of rows we wanted using
LIMIT 5
Here’s a visual breakdown of the different components of the query:
Now, let’s see what this query returns! Here’s what we’ll get when this query is run:
index | Rank | Major_code | Major | Major_category | Total | Sample_size | Men | Women | ShareWomen | Employed | Full_time | Part_time | Full_time_year_round | Unemployed | Unemployment_rate | Median | P25th | P75th | College_jobs | Non_college_jobs | Low_wage_jobs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2419 | PETROLEUM ENGINEERING | Engineering | 2339 | 36 | 2057 | 282 | 0.120564 | 1976 | 1849 | 270 | 1207 | 37 | 0.018381 | 110000 | 95000 | 125000 | 1534 | 364 | 193 |
1 | 2 | 2416 | MINING AND MINERAL ENGINEERING | Engineering | 756 | 7 | 679 | 77 | 0.101852 | 640 | 556 | 170 | 388 | 85 | 0.117241 | 75000 | 55000 | 90000 | 350 | 257 | 50 |
2 | 3 | 2415 | METALLURGICAL ENGINEERING | Engineering | 856 | 3 | 725 | 131 | 0.153037 | 648 | 558 | 133 | 340 | 16 | 0.024096 | 73000 | 50000 | 105000 | 456 | 176 | 0 |
3 | 4 | 2417 | NAVAL ARCHITECTURE AND MARINE ENGINEERING | Engineering | 1258 | 16 | 1123 | 135 | 0.107313 | 758 | 1069 | 150 | 692 | 40 | 0.050125 | 70000 | 43000 | 80000 | 529 | 102 | 0 |
4 | 5 | 2405 | CHEMICAL ENGINEERING | Engineering | 32260 | 289 | 21239 | 11021 | 0.341631 | 25694 | 23170 | 5180 | 16697 | 1672 | 0.061098 | 65000 | 50000 | 75000 | 18314 | 4440 | 972 |
As we can see, our query returned exactly what we wanted: every column of our selected table, but only the first five rows. If we changed the end of the query to read LIMIT 10
, it would return the first ten rows instead. If we remove the LIMIT
command altogether, it would return every row in the table.
A Quick Note About Capitalization
Before we move on, you may have noticed that we’re typing some things, like SELECT
, in all capital letters, while other things like the table name are written in lower case.
Depending on the specific database you’re querying and what sort of database management system it uses, capitalization of table names may or may not matter. However, to ensure that your queries will work and be easy to read with any system, it’s best to stick to these conventions:
- Statements or commands (like
SELECT
,LIMIT
, etc.) should be written in all-caps. - Table and column names should be written however they are stored in the database you’re querying (so if the database stores them in all lower case, they should be all lower case in your queries, too).
Now, on to our next statement: WHERE
!
Filtering Rows Using WHERE
If we take a look at the details of this data set, we can get a better idea of what data is here and what types of questions we might be able to answer. Spend some time getting familiar with what each column represents.
Based on an understanding of what each column represents, here are some questions we may have:
- Which majors had mostly female students? Which ones had mostly male students?
- Which majors had the largest spread (difference) between the 25th and 75th percentile starting salaries?
- Which engineering majors had the highest full time employment rates?
Let’s start by focusing on the first question. We need to translate this question into SQL so that we can get the answer we’re looking for from the database.
To determine which majors had mostly female students, we want the following subset of data:
- only the
Major
column - only the rows where
ShareWomen
is greater than0.5
(corresponding to 50%)
To return only the Major
column, we need to add the specific column name to the SELECT
statement part of the query (instead of using the *
operator to return all columns):
SELECT Major FROM recent_grads
This will return all of the values in the Major
column. We can specify multiple columns this way as well and the results table will preserve the order of the columns:
SELECT Major, Major_category FROM recent_grads
To return only the values where ShareWomen
is greater than or equal to 0.5
, we need to add a WHERE
clause:
SELECT Major FROM recent_gradsWHERE ShareWomen >= 0.5
Finally, we can limit the number of rows returned using LIMIT
:
SELECT Major FROM recent_gradsWHERE ShareWomen >= 0.5 LIMIT 5
If we run that query, here’s what we’ll see:
Major |
---|
ACTUARIAL SCIENCE |
COMPUTER SCIENCE |
ENVIRONMENTAL ENGINEERING |
NURSING |
INDUSTRIAL PRODUCTION TECHNOLOGIES |
Here’s a breakdown of the different components of our query:
While in the SELECT
part of the query, we express the specific column we want. In the WHERE
part we express the specific rows we want. The beauty of SQL is that these can be independent.
Let’s write a SQL query that returns the majors where females were a minority. We’ll only return the Major
and ShareWomen
columns (in that order) and, we won’t limit the number of rows returned:
SELECT Major, ShareWomen FROM recent_grads WHERE ShareWomen < 0.5
Major | ShareWomen |
---|---|
PETROLEUM ENGINEERING | 0.120564 |
MINING AND MINERAL ENGINEERING | 0.101852 |
METALLURGICAL ENGINEERING | 0.153037 |
NAVAL ARCHITECTURE AND MARINE ENGINEERING | 0.107313 |
CHEMICAL ENGINEERING | 0.341631 |
NUCLEAR ENGINEERING | 0.144967 |
ASTRONOMY AND ASTROPHYSICS | 0.441356 |
MECHANICAL ENGINEERING | 0.139793 |
ELECTRICAL ENGINEERING | 0.437847 |
COMPUTER ENGINEERING | 0.199413 |
AEROSPACE ENGINEERING | 0.196450 |
BIOMEDICAL ENGINEERING | 0.119559 |
MATERIALS SCIENCE | 0.310820 |
ENGINEERING MECHANICS PHYSICS AND SCIENCE | 0.183985 |
BIOLOGICAL ENGINEERING | 0.320784 |
INDUSTRIAL AND MANUFACTURING ENGINEERING | 0.343473 |
GENERAL ENGINEERING | 0.252960 |
ARCHITECTURAL ENGINEERING | 0.350442 |
COURT REPORTING | 0.236063 |
FOOD SCIENCE | 0.222695 |
ELECTRICAL ENGINEERING TECHNOLOGY | 0.325092 |
MATERIALS ENGINEERING AND MATERIALS SCIENCE | 0.292607 |
MANAGEMENT INFORMATION SYSTEMS AND STATISTICS | 0.278790 |
CIVIL ENGINEERING | 0.227118 |
CONSTRUCTION SERVICES | 0.342229 |
OPERATIONS LOGISTICS AND E-COMMERCE | 0.322222 |
MISCELLANEOUS ENGINEERING | 0.189970 |
PUBLIC POLICY | 0.251389 |
ENGINEERING TECHNOLOGIES | 0.090713 |
MISCELLANEOUS FINE ARTS | 0.410180 |
GEOLOGICAL AND GEOPHYSICAL ENGINEERING | 0.324838 |
FINANCE | 0.355469 |
ECONOMICS | 0.340825 |
BUSINESS ECONOMICS | 0.249190 |
NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL … | 0.430537 |
ACCOUNTING | 0.253583 |
MATHEMATICS | 0.244103 |
PHYSICS | 0.448099 |
MEDICAL TECHNOLOGIES TECHNICIANS | 0.434298 |
STATISTICS AND DECISION SCIENCE | 0.281936 |
ENGINEERING AND INDUSTRIAL MANAGEMENT | 0.174123 |
MEDICAL ASSISTING SERVICES | 0.178982 |
COMPUTER PROGRAMMING AND DATA PROCESSING | 0.269194 |
GENERAL BUSINESS | 0.417925 |
ARCHITECTURE | 0.321770 |
INTERNATIONAL BUSINESS | 0.282903 |
PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTR… | 0.451465 |
MOLECULAR BIOLOGY | 0.077453 |
MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION | 0.200023 |
MISCELLANEOUS ENGINEERING TECHNOLOGIES | 0.000000 |
MECHANICAL ENGINEERING RELATED TECHNOLOGIES | 0.377437 |
INDUSTRIAL AND ORGANIZATIONAL PSYCHOLOGY | 0.436302 |
PHYSICAL SCIENCES | 0.426924 |
MILITARY TECHNOLOGIES | 0.429685 |
ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLO… | 0.232444 |
MARKETING AND MARKETING RESEARCH | 0.382900 |
POLITICAL SCIENCE AND GOVERNMENT | 0.485930 |
GEOGRAPHY | 0.473190 |
COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY | 0.180883 |
COMPUTER NETWORKING AND TELECOMMUNICATIONS | 0.305005 |
GEOLOGY AND EARTH SCIENCE | 0.470197 |
PUBLIC ADMINISTRATION | 0.476461 |
COMMUNICATIONS | 0.305109 |
CRIMINAL JUSTICE AND FIRE PROTECTION | 0.125035 |
COMMERCIAL ART AND GRAPHIC DESIGN | 0.374356 |
SPECIAL NEEDS EDUCATION | 0.366177 |
TRANSPORTATION SCIENCES AND TECHNOLOGIES | 0.321296 |
NEUROSCIENCE | 0.475010 |
MULTI/INTERDISCIPLINARY STUDIES | 0.495397 |
ATMOSPHERIC SCIENCES AND METEOROLOGY | 0.124950 |
EDUCATIONAL ADMINISTRATION AND SUPERVISION | 0.448732 |
PHILOSOPHY AND RELIGIOUS STUDIES | 0.416810 |
ENGLISH LANGUAGE AND LITERATURE | 0.339671 |
SCIENCE AND COMPUTER TEACHER EDUCATION | 0.423209 |
MUSIC | 0.444582 |
COSMETOLOGY SERVICES AND CULINARY ARTS | 0.383719 |
Expressing Multiple Filter Criteria Using AND
To filter rows by specific criteria, we need to use the WHERE
statement. A simple WHERE
statement requires three things:
- The column we want the database to filter on:
ShareWomen
- A comparison operator that specifies how we want to compare a value in a column:
>
- The value we want the database to compare each value to:
0.5
Here are the comparison operators we can use:
- Less than:
<
- Less than or equal to:
<=
- Greater than:
>
- Greater than or equal to:
>=
- Equal to:
=
- Not equal to:
!=
The comparison value after the operator must be either text or a number, depending on the field. Because ShareWomen
is a numeric column, we don’t need to enclose the number 0.5
in quotes.
Finally, most database systems require that the SELECT
and FROM
statements come first, before WHERE
or any other statements. We can use the AND
operator to combine multiple filter criteria. For example, to determine which engineering majors had majority female, we’d need to specify 2 filtering criteria.
SELECT Major FROM recent_gradsWHERE Major_category = 'Engineering' AND ShareWomen > 0.5
Major |
---|
ENVIRONMENTAL ENGINEERING |
INDUSTRIAL PRODUCTION TECHNOLOGIES |
It looks like only 2 majors met this criteria. If we wanted to “zoom” back out to look at all of the columns for both of these majors to see if they shared some other common attributes, we can modify the SELECT
statement and use the symbol *
to represent all columns
SELECT * FROM recent_gradsWHERE Major_category = 'Engineering' AND ShareWomen > 0.5
index | Rank | Major_code | Major | Major_category | Total | Sample_size | Men | Women | ShareWomen | Employed | Full_time | Part_time | Full_time_year_round | Unemployed | Unemployment_rate | Median | P25th | P75th | College_jobs | Non_college_jobs | Low_wage_jobs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
30 | 31 | 2410 | ENVIRONMENTAL ENGINEERING | Engineering | 4047 | 26 | 2639 | 3339 | 0.558548 | 2983 | 2384 | 930 | 1951 | 308 | 0.093589 | 50000 | 42000 | 56000 | 2028 | 830 | 260 |
38 | 39 | 2503 | INDUSTRIAL PRODUCTION TECHNOLOGIES | Engineering | 4631 | 73 | 528 | 1588 | 0.750473 | 4428 | 3988 | 597 | 3242 | 129 | 0.028308 | 46000 | 35000 | 65000 | 1394 | 2454 | 480 |
The ability to quickly iterate on queries as you think of new questions is one of the most powerful aspects of SQL. The SQL workflow lets data professionals focus on asking and answering questions, instead of worrying about lower-level programming concepts.
Now, let’s write a SQL query that returns all majors with a majority of female students and that had a median salary greater than 50000
. Let’s also limit the columns displayed in the results like so:
SELECT Major, Major_category, Median, ShareWomen FROM recent_grads WHERE ShareWomen > 0.5 AND Median > 50000
Major | Major_category | Median | ShareWomen |
---|---|---|---|
ACTUARIAL SCIENCE | Business | 62000 | 0.535714 |
COMPUTER SCIENCE | Computers & Mathematics | 53000 | 0.578766 |
It looks like only two majors with mostly female students also had a median salary higher than $50,000.
Returning One of Several Conditions With OR
We used the AND
operator to specify that our filter needs to fit two Boolean conditions. Both of the conditions had to evaluate to True
for the record to appear in the results.
If we wanted to specify a filter that meets either of the conditions instead, we could use the OR
operator.
SELECT [column1, column2,...] FROM [table1]WHERE [condition1] OR [condition2]
Let’s write a SQL query that returns the first 20 majors that either have a Median
salary greater than or equal to 10,000
, or have less than or equal to 1,000
Unemployed
people. Let’s only include the following columns in the results and in this order:
Major
Median
Unemployed
SELECT Major, Median, Unemployed FROM recent_grads WHERE Median >= 10000 OR Unemployed <= 1000 LIMIT 20
Major | Median | Unemployed |
---|---|---|
PETROLEUM ENGINEERING | 110000 | 37 |
MINING AND MINERAL ENGINEERING | 75000 | 85 |
METALLURGICAL ENGINEERING | 73000 | 16 |
NAVAL ARCHITECTURE AND MARINE ENGINEERING | 70000 | 40 |
CHEMICAL ENGINEERING | 65000 | 1672 |
NUCLEAR ENGINEERING | 65000 | 400 |
ACTUARIAL SCIENCE | 62000 | 308 |
ASTRONOMY AND ASTROPHYSICS | 62000 | 33 |
MECHANICAL ENGINEERING | 60000 | 4650 |
ELECTRICAL ENGINEERING | 60000 | 3895 |
COMPUTER ENGINEERING | 60000 | 2275 |
AEROSPACE ENGINEERING | 60000 | 794 |
BIOMEDICAL ENGINEERING | 60000 | 1019 |
MATERIALS SCIENCE | 60000 | 78 |
ENGINEERING MECHANICS PHYSICS AND SCIENCE | 58000 | 23 |
BIOLOGICAL ENGINEERING | 57100 | 589 |
INDUSTRIAL AND MANUFACTURING ENGINEERING | 57000 | 699 |
GENERAL ENGINEERING | 56000 | 2859 |
ARCHITECTURAL ENGINEERING | 54000 | 170 |
COURT REPORTING | 54000 | 11 |
Now, let’s start getting even more advanced with our SQL queries.
Grouping Operators With Parentheses
There’s a certain class of questions that we can’t answer using only the techniques we’ve learned so far. For example, if we wanted to write a query that returned all Engineering
majors that either had mostly female graduates or an unemployment rate below 5.1%, we would need to use parentheses to express this more complex logic.
The three raw conditions we’ll need are:
Major_category = 'Engineering'ShareWomen >= 0.5Unemployment_rate < 0.051
What the SQL query looks like using parantheses:
SELECT Major, Major_category, ShareWomen, Unemployment_rate FROM recent_grads
WHERE (Major_category = 'Engineering') AND (ShareWomen > 0.5 OR Unemployment_rate < 0.051);
Notice that we enclosed the logic we wanted to be evaluated together in parentheses. This is very similar to how we group mathematical calculations together using parentheses to define which are evaluated first. The parentheses makes it explictly clear to the database that we want all of the rows where both of the expressions in the statements evaluate to True
:
(Major_category = 'Engineering' AND ShareWomen > 0.5) -> True or False(ShareWomen > 0.5 OR Unemployment_rate < 0.051) -> True or False
If we had written the where
statement without any parentheses, the database would guess what our intentions are, and actually execute the following query instead:
WHERE (Major_category = 'Engineering' AND ShareWomen > 0.5) OR (Unemployment_rate < 0.051)
Leaving the parentheses out implies that we want the calculation to happen from left to right, in the order in which the logic is written, so it wouldn’t return us the data we want.
Now let’s run our intended query and see the results! We’ll run the query we explored above, which returns all Engineering
majors that either had mostly women graduates or had an unemployment rate below 5.1%, which was the nationwide unemployment rate in August 2015.
Let’s only include the following columns in the results and in this order:
Major
Major_category
ShareWomen
Unemployment_rate
Here’s our query:
SELECT Major, Major_category, ShareWomen, Unemployment_rateFROM recent_gradsWHERE (Major_category = 'Engineering') AND (ShareWomen > 0.5 OR Unemployment_rate < 0.051)
Major | Major_category | ShareWomen | Unemployment_rate |
---|---|---|---|
PETROLEUM ENGINEERING | Engineering | 0.120564 | 0.018381 |
METALLURGICAL ENGINEERING | Engineering | 0.153037 | 0.024096 |
NAVAL ARCHITECTURE AND MARINE ENGINEERING | Engineering | 0.107313 | 0.050125 |
MATERIALS SCIENCE | Engineering | 0.310820 | 0.023043 |
ENGINEERING MECHANICS PHYSICS AND SCIENCE | Engineering | 0.183985 | 0.006334 |
INDUSTRIAL AND MANUFACTURING ENGINEERING | Engineering | 0.343473 | 0.042876 |
MATERIALS ENGINEERING AND MATERIALS SCIENCE | Engineering | 0.292607 | 0.027789 |
ENVIRONMENTAL ENGINEERING | Engineering | 0.558548 | 0.093589 |
INDUSTRIAL PRODUCTION TECHNOLOGIES | Engineering | 0.750473 | 0.028308 |
ENGINEERING AND INDUSTRIAL MANAGEMENT | Engineering | 0.174123 | 0.033652 |
That looks great, but what if we want to sort the results based on the values of a specific column to make them more readable?
Ordering Results Using ORDER BY
The results of every query we’ve written so far have been ordered by the Rank
column. Recall the query from earlier in this post that returned all of the columns and you’ll see the Rank column:
SELECT * FROM recent_grads LIMIT 5
index | Rank | Major_code | Major | Major_category | Total | Sample_size | Men | Women | ShareWomen | Employed | Full_time | Part_time | Full_time_year_round | Unemployed | Unemployment_rate | Median | P25th | P75th | College_jobs | Non_college_jobs | Low_wage_jobs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2419 | PETROLEUM ENGINEERING | Engineering | 2339 | 36 | 2057 | 282 | 0.120564 | 1976 | 1849 | 270 | 1207 | 37 | 0.018381 | 110000 | 95000 | 125000 | 1534 | 364 | 193 |
1 | 2 | 2416 | MINING AND MINERAL ENGINEERING | Engineering | 756 | 7 | 679 | 77 | 0.101852 | 640 | 556 | 170 | 388 | 85 | 0.117241 | 75000 | 55000 | 90000 | 350 | 257 | 50 |
2 | 3 | 2415 | METALLURGICAL ENGINEERING | Engineering | 856 | 3 | 725 | 131 | 0.153037 | 648 | 558 | 133 | 340 | 16 | 0.024096 | 73000 | 50000 | 105000 | 456 | 176 | 0 |
3 | 4 | 2417 | NAVAL ARCHITECTURE AND MARINE ENGINEERING | Engineering | 1258 | 16 | 1123 | 135 | 0.107313 | 758 | 1069 | 150 | 692 | 40 | 0.050125 | 70000 | 43000 | 80000 | 529 | 102 | 0 |
4 | 5 | 2405 | CHEMICAL ENGINEERING | Engineering | 32260 | 289 | 21239 | 11021 | 0.341631 | 25694 | 23170 | 5180 | 16697 | 1672 | 0.061098 | 65000 | 50000 | 75000 | 18314 | 4440 | 972 |
As the questions we want to answer get more complex, we want more control over how the results are ordered. We can specify the order using the ORDER BY clause.
For example, we may want to understand which majors that met the criteria in the WHERE
statement had the lowest unemployment rate. The following query will return the results in ascending order by the Unemployment_rate
column.
SELECT Rank, Major, Major_category, ShareWomen, Unemployment_rate
FROM recent_grads
WHERE (Major_category = 'Engineering') AND (ShareWomen > 0.5 OR Unemployment_rate < 0.051)
ORDER BY Unemployment_rate
Rank | Major | Major_category | ShareWomen | Unemployment_rate |
---|---|---|---|---|
15 | ENGINEERING MECHANICS PHYSICS AND SCIENCE | Engineering | 0.183985 | 0.006334 |
1 | PETROLEUM ENGINEERING | Engineering | 0.120564 | 0.018381 |
14 | MATERIALS SCIENCE | Engineering | 0.310820 | 0.023043 |
3 | METALLURGICAL ENGINEERING | Engineering | 0.153037 | 0.024096 |
24 | MATERIALS ENGINEERING AND MATERIALS SCIENCE | Engineering | 0.292607 | 0.027789 |
39 | INDUSTRIAL PRODUCTION TECHNOLOGIES | Engineering | 0.750473 | 0.028308 |
51 | ENGINEERING AND INDUSTRIAL MANAGEMENT | Engineering | 0.174123 | 0.033652 |
17 | INDUSTRIAL AND MANUFACTURING ENGINEERING | Engineering | 0.343473 | 0.042876 |
4 | NAVAL ARCHITECTURE AND MARINE ENGINEERING | Engineering | 0.107313 | 0.050125 |
31 | ENVIRONMENTAL ENGINEERING | Engineering | 0.558548 | 0.093589 |
If we instead want the results ordered by the same column but in descending order, we can add the DESC
keyword:
SELECT Rank, Major, Major_category, ShareWomen, Unemployment_rate
FROM recent_gradsWHERE (Major_category = 'Engineering') AND (ShareWomen > 0.5 OR Unemployment_rate < 0.051)
ORDER BY Unemployment_rate DESC
Rank | Major | Major_category | ShareWomen | Unemployment_rate |
---|---|---|---|---|
31 | ENVIRONMENTAL ENGINEERING | Engineering | 0.558548 | 0.093589 |
4 | NAVAL ARCHITECTURE AND MARINE ENGINEERING | Engineering | 0.107313 | 0.050125 |
17 | INDUSTRIAL AND MANUFACTURING ENGINEERING | Engineering | 0.343473 | 0.042876 |
51 | ENGINEERING AND INDUSTRIAL MANAGEMENT | Engineering | 0.174123 | 0.033652 |
39 | INDUSTRIAL PRODUCTION TECHNOLOGIES | Engineering | 0.750473 | 0.028308 |
24 | MATERIALS ENGINEERING AND MATERIALS SCIENCE | Engineering | 0.292607 | 0.027789 |
3 | METALLURGICAL ENGINEERING | Engineering | 0.153037 | 0.024096 |
14 | MATERIALS SCIENCE | Engineering | 0.310820 | 0.023043 |
1 | PETROLEUM ENGINEERING | Engineering | 0.120564 | 0.018381 |
15 | ENGINEERING MECHANICS PHYSICS AND SCIENCE | Engineering | 0.183985 | 0.006334 |
Let’s write a query that returns all majors where ShareWomen
is greater than 0.3
and Unemployment_rate
is less than .1
. Let’s only include the following columns in the results and in this order:
Major
,ShareWomen
,Unemployment_rate
We’ll order the results in descending order by the ShareWomen
column. This will allow us to easily compare the unemployment rates for majors with higher or lower shares of women.
SELECT Major, ShareWomen, Unemployment_rate FROM recent_grads
WHERE ShareWomen > 0.3 AND Unemployment_rate < .1O
RDER BY ShareWomen DESC
And here are our results!
Major | ShareWomen | Unemployment_rate |
---|---|---|
EARLY CHILDHOOD EDUCATION | 0.967998 | 0.040105 |
MATHEMATICS AND COMPUTER SCIENCE | 0.927807 | 0.000000 |
ELEMENTARY EDUCATION | 0.923745 | 0.046586 |
ANIMAL SCIENCES | 0.910933 | 0.050862 |
PHYSIOLOGY | 0.906677 | 0.069163 |
MISCELLANEOUS PSYCHOLOGY | 0.905590 | 0.051908 |
HUMAN SERVICES AND COMMUNITY ORGANIZATION | 0.904075 | 0.037819 |
NURSING | 0.896019 | 0.044863 |
GEOSCIENCES | 0.881294 | 0.024374 |
MASS MEDIA | 0.877228 | 0.089837 |
COGNITIVE SCIENCE AND BIOPSYCHOLOGY | 0.854523 | 0.075236 |
ART HISTORY AND CRITICISM | 0.845934 | 0.060298 |
EDUCATIONAL PSYCHOLOGY | 0.817099 | 0.065112 |
GENERAL EDUCATION | 0.812877 | 0.057360 |
SOCIAL WORK | 0.810704 | 0.068828 |
TEACHER EDUCATION: MULTIPLE LEVELS | 0.798920 | 0.036546 |
COUNSELING PSYCHOLOGY | 0.798746 | 0.053621 |
MATHEMATICS TEACHER EDUCATION | 0.792095 | 0.016203 |
PSYCHOLOGY | 0.779933 | 0.083811 |
GENERAL MEDICAL AND HEALTH SERVICES | 0.774577 | 0.082102 |
HEALTH AND MEDICAL ADMINISTRATIVE SERVICES | 0.770901 | 0.089626 |
SOIL SCIENCE | 0.764427 | 0.000000 |
AREA ETHNIC AND CIVILIZATION STUDIES | 0.758060 | 0.063429 |
APPLIED MATHEMATICS | 0.753927 | 0.090823 |
FAMILY AND CONSUMER SCIENCES | 0.752144 | 0.067128 |
INDUSTRIAL PRODUCTION TECHNOLOGIES | 0.750473 | 0.028308 |
SOCIAL PSYCHOLOGY | 0.747561 | 0.029650 |
HUMANITIES | 0.745662 | 0.068584 |
HOSPITALITY MANAGEMENT | 0.733992 | 0.061169 |
SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION | 0.733968 | 0.054083 |
THEOLOGY AND RELIGIOUS VOCATIONS | 0.728495 | 0.062628 |
FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN L… | 0.728033 | 0.075566 |
INTERDISCIPLINARY SOCIAL SCIENCES | 0.721866 | 0.092306 |
MISCELLANEOUS AGRICULTURE | 0.719974 | 0.059767 |
JOURNALISM | 0.719859 | 0.069176 |
MISCELLANEOUS EDUCATION | 0.718365 | 0.059212 |
COMPUTER AND INFORMATION SYSTEMS | 0.707719 | 0.093460 |
COMMUNICATION DISORDERS SCIENCES AND SERVICES | 0.707136 | 0.047584 |
MISCELLANEOUS HEALTH MEDICAL PROFESSIONS | 0.702020 | 0.081411 |
LIBERAL ARTS | 0.700898 | 0.078268 |
FORESTRY | 0.690365 | 0.096726 |
OCEANOGRAPHY | 0.688999 | 0.056995 |
ART AND MUSIC EDUCATION | 0.686024 | 0.038638 |
PHYSICAL FITNESS PARKS RECREATION AND LEISURE | 0.683943 | 0.051467 |
ADVERTISING AND PUBLIC RELATIONS | 0.673143 | 0.067961 |
HUMAN RESOURCES AND PERSONNEL MANAGEMENT | 0.672161 | 0.059570 |
MULTI-DISCIPLINARY OR GENERAL SCIENCE | 0.669999 | 0.055807 |
FINE ARTS | 0.667034 | 0.084186 |
COMPOSITION AND RHETORIC | 0.666119 | 0.081742 |
HISTORY | 0.651741 | 0.095667 |
ECOLOGY | 0.651660 | 0.054475 |
GENETICS | 0.643331 | 0.034118 |
TREATMENT THERAPY PROFESSIONS | 0.640000 | 0.059821 |
NUTRITION SCIENCES | 0.638147 | 0.068701 |
ZOOLOGY | 0.637293 | 0.046320 |
INTERNATIONAL RELATIONS | 0.632987 | 0.096799 |
UNITED STATES HISTORY | 0.630716 | 0.047179 |
DRAMA AND THEATER ARTS | 0.629505 | 0.077541 |
CRIMINOLOGY | 0.618223 | 0.097244 |
MICROBIOLOGY | 0.615727 | 0.066776 |
PLANT SCIENCE AND AGRONOMY | 0.606889 | 0.045455 |
BIOLOGY | 0.601858 | 0.070725 |
SECONDARY TEACHER EDUCATION | 0.601752 | 0.052229 |
AGRICULTURE PRODUCTION AND MANAGEMENT | 0.594208 | 0.050031 |
PRE-LAW AND LEGAL STUDIES | 0.591001 | 0.071965 |
AGRICULTURAL ECONOMICS | 0.589712 | 0.077250 |
STUDIO ARTS | 0.584776 | 0.089552 |
ENVIRONMENTAL SCIENCE | 0.584556 | 0.078585 |
BUSINESS MANAGEMENT AND ADMINISTRATION | 0.580948 | 0.072218 |
COMPUTER SCIENCE | 0.578766 | 0.063173 |
LANGUAGE AND DRAMA EDUCATION | 0.576360 | 0.050306 |
MISCELLANEOUS BIOLOGY | 0.566641 | 0.058545 |
NATURAL RESOURCES MANAGEMENT | 0.564639 | 0.066619 |
ENVIRONMENTAL ENGINEERING | 0.558548 | 0.093589 |
HEALTH AND MEDICAL PREPARATORY PROGRAMS | 0.556604 | 0.069780 |
MISCELLANEOUS SOCIAL SCIENCES | 0.543405 | 0.073080 |
ACTUARIAL SCIENCE | 0.535714 | 0.095652 |
SOCIOLOGY | 0.532334 | 0.084951 |
BOTANY | 0.528969 | 0.000000 |
INFORMATION SCIENCES | 0.526476 | 0.060741 |
PHARMACOLOGY | 0.524153 | 0.085532 |
GENERAL AGRICULTURE | 0.515543 | 0.019642 |
BIOCHEMICAL SCIENCES | 0.515406 | 0.080531 |
INTERCULTURAL AND INTERNATIONAL STUDIES | 0.507377 | 0.083634 |
PHYSICAL AND HEALTH EDUCATION TEACHING | 0.506721 | 0.074667 |
CHEMISTRY | 0.505141 | 0.053972 |
MULTI/INTERDISCIPLINARY STUDIES | 0.495397 | 0.070861 |
NEUROSCIENCE | 0.475010 | 0.048482 |
GEOLOGY AND EARTH SCIENCE | 0.470197 | 0.075449 |
PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTR… | 0.451465 | 0.055521 |
EDUCATIONAL ADMINISTRATION AND SUPERVISION | 0.448732 | 0.000000 |
PHYSICS | 0.448099 | 0.048224 |
MUSIC | 0.444582 | 0.075960 |
ASTRONOMY AND ASTROPHYSICS | 0.441356 | 0.021167 |
ELECTRICAL ENGINEERING | 0.437847 | 0.059174 |
MEDICAL TECHNOLOGIES TECHNICIANS | 0.434298 | 0.036983 |
NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL … | 0.430537 | 0.071540 |
PHYSICAL SCIENCES | 0.426924 | 0.035354 |
SCIENCE AND COMPUTER TEACHER EDUCATION | 0.423209 | 0.047264 |
GENERAL BUSINESS | 0.417925 | 0.072861 |
PHILOSOPHY AND RELIGIOUS STUDIES | 0.416810 | 0.096052 |
MISCELLANEOUS FINE ARTS | 0.410180 | 0.089375 |
COSMETOLOGY SERVICES AND CULINARY ARTS | 0.383719 | 0.055677 |
MARKETING AND MARKETING RESEARCH | 0.382900 | 0.061215 |
MECHANICAL ENGINEERING RELATED TECHNOLOGIES | 0.377437 | 0.056357 |
COMMERCIAL ART AND GRAPHIC DESIGN | 0.374356 | 0.096798 |
SPECIAL NEEDS EDUCATION | 0.366177 | 0.041508 |
FINANCE | 0.355469 | 0.060686 |
ARCHITECTURAL ENGINEERING | 0.350442 | 0.061931 |
INDUSTRIAL AND MANUFACTURING ENGINEERING | 0.343473 | 0.042876 |
CONSTRUCTION SERVICES | 0.342229 | 0.060023 |
CHEMICAL ENGINEERING | 0.341631 | 0.061098 |
ECONOMICS | 0.340825 | 0.099092 |
ENGLISH LANGUAGE AND LITERATURE | 0.339671 | 0.087724 |
ELECTRICAL ENGINEERING TECHNOLOGY | 0.325092 | 0.087557 |
GEOLOGICAL AND GEOPHYSICAL ENGINEERING | 0.324838 | 0.075038 |
OPERATIONS LOGISTICS AND E-COMMERCE | 0.322222 | 0.047859 |
TRANSPORTATION SCIENCES AND TECHNOLOGIES | 0.321296 | 0.072725 |
BIOLOGICAL ENGINEERING | 0.320784 | 0.087143 |
MATERIALS SCIENCE | 0.310820 | 0.023043 |
COMMUNICATIONS | 0.305109 | 0.075177 |
Master the Fundamentals of SQL
As you can see, SQL is a powerful language for accessing data, and this post is really just scratching the surface. If you’d like to learn more, we encourage you to sign up for free and start our interactive SQL Fundamentals course, which this blog post is based on. The course goes even deeper, though, and will teach you how to use SQL to do things like:
- Calculate summary statistics
- Further segment data using grouping
- Write more complex queries using subqueries
In the course, you’ll continue to work with salary data from recent college graduates as well as data from the CIA World Factbook.
Learn SQL the right way!
- Write real queries
- Work with real data
- Right in your browser!
Why passively watch video lectures when you can learn by doing?