January 19, 2023

Top 20 SQL JOINs Interview Questions and Answers (2023)

SQL joins

Data is very valuable to organizations. Actionable insights that give an organization competitive advantage and help it run more efficiently can be extracted from the organization’s data. Therefore, data must be collected and stored.

Databases are an organized way to store and query data. There are two main types of databases: relational and non-relational.

Relational databases are very popular because they’re structured and data is organized in tables. The data in the various tables may have one-to-one, one-to-many, or many-to-many relationships. The way the data is organized in a relational database is defined in its schema.

Non-relational databases store data using simple key-value pairs in non-tabular form. Although relational databases support Binary Large Objects (BLOB) for storing semi-structured and unstructed data, storing and retrieving these types of data is easier with non-relational databases.

The Structured Query Language (SQL) is the language for accessing and interacting with relational databases. If you want to add, delete, edit, or query information on a relational database, the easiest way to do it is through SQL.

Relational databases dominate the database market and they are projected to grow by more than 30 percent between 2021 and 2026. SQL is the most in-demand skill for data related jobs. As more and more organizations embrace the use of relational databases, the demand for data professionals with SQL skills will continue to grow.

More often than not, you may need to combine two or more tables in a relational database to get the data needed for performing your analysis. You can combine tables in SQL with JOIN clauses. There are several SQL JOIN clauses; understanding how each of them works can be challenging.

SQL JOIN questions are, therefore, interviewers’ favorites. They tend to show up in most SQL interviews. In this tutorial, we’ll take you step-by-step through the answers to the top 20 SQL JOIN interview questions — and equip you with the knowledge to ace your upcoming SQL interviews. We’ll be writing a lot of SQL queries. This SQL Cheat Sheet will help you get up to speed if your SQL skills are a little rusty.


SQL JOINs Interview Questions

In a SQL interview, you’ll most likely be asked questions that require you to combine tables. SQL JOIN clauses are used to combine data from two or more tables in a relational database.

SQL JOIN clauses are often used when the tables share some sort of relationship. The most common conditional expression to join tables is the equality conditional (equi-join). However, you can join tables that don’t share any relationships and use other conditional expressions besides equality.

Here are some commoly asked SQL JOIN interview questions and their answers. We recommend that you draw your own diagrams and repeatedly practice these questions to master how SQL JOINs work.

1. What are SQL JOINs?

Answer

SQL JOIN clauses are used to combine rows from two or more tables. Usually, tables are joined based on a related column between them, but you can also join tables that have no relationships. The term "SQL JOIN" is used to refer to a SQL query with the JOIN keyword. This is known as an explicit join. However, you can also join tables in SQL without using the JOIN keyword. This is known as an implicit join.

2. What are the general explicit and implicit join notation syntaxes?

Answer

EXPLICIT JOIN:

SELECT *
FROM [left table]
[JOIN CLAUSE] [right table]
ON conditional expression;

IMPLICIT JOIN:

SELECT *
FROM [left table], [right table]
WHERE conditional expression;

The conditional expression is not required for some types of SQL JOINs.

3. What are the different types of JOINs in SQL?

Answer

image-1.png

The main types of SQL JOINs are:

  • CROSS JOIN: matches every row of the left table with every row of the right table and returns their Cartesian product. It’s also known as Cartesian JOIN.
  • [INNER] JOIN: returns rows that have matching values on both the right and left tables, based on the specified conditional expression.
  • NATURAL JOIN: a type of equi-join that combines the left and right tables using common column names.
  • LEFT [OUTER] JOIN: returns all the rows from the left table with the matching rows from the right table, based on the specified conditional expression.
  • RIGHT [OUTER] JOIN: returns all the rows from the right table with the matching rows from the left table, based on the specified conditional expression.
  • FULL [OUTER] JOIN: returns all rows from the left and the right tables, regardless of whether the conditional expression is matched.
  • SELF JOIN: allows you to join a table to itself as if the tables were two different tables.

It’s worth noting that the SQL syntax placed inside square brackets is optional and can be excluded from the query. For example, the SQL query below executes an INNER JOIN:

SELECT *
FROM left_table
JOIN right_table
ON conditional expression;

4. What type of JOIN is executed without conditional expression in the explict and implict join notations shown below?

Explicit join:

SELECT *
FROM [left table]
[JOIN CLAUSE] [right table]

Implict join:

SELECT *
FROM [left table], [right table]

Answer

Let’s set up a SQLite database with tables. When you try to connect to a database that doesn’t exist, it’s created for you. First:

pip install ipython-sql

Then create the database and tables:

%%capture

sql sqlite:///joins.db
%%sql

-- create a students table:

CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL
);

INSERT INTO students VALUES (1, "Mary", "Wilson");
INSERT INTO students VALUES (2, "Tim", "Ben");
INSERT INTO students VALUES (3, "Alice", "Robinson");
INSERT INTO students VALUES (4, "Reece", "Bells");

-- create a student_contact table:

CREATE TABLE student_contact (
    student_id,
    email_address,
    FOREIGN KEY (student_id) REFERENCES students(student_id)
);

INSERT INTO student_contact VALUES (1, "[email protected]");
INSERT INTO student_contact VALUES (2, "[email protected]");
INSERT INTO student_contact VALUES (3, "[email protected]");

-- create a staff table:

CREATE TABLE staff (
    staff_id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL
);

INSERT INTO staff VALUES (1, "Ada", "Lovelace");
INSERT INTO staff VALUES (2, "Adam ", "Smith");
INSERT INTO staff VALUES (3, "Nikolo", "Tesla");

-- create a staff_contact table:

CREATE TABLE staff_contact (
    staff_id,
    email_address,
    FOREIGN KEY (staff_id) REFERENCES staff(staff_id)
);

INSERT INTO staff_contact VALUES (1, "[email protected]");
INSERT INTO staff_contact VALUES (2, "[email protected]");
INSERT INTO staff_contact VALUES (3, "[email protected]");
     * sqlite:///joins.db
    Done.
    1 rows affected.
    1 rows affected.
    1 rows affected.
    1 rows affected.
    Done.
    1 rows affected.
    1 rows affected.
    1 rows affected.
    Done.
    1 rows affected.
    1 rows affected.
    1 rows affected.
    Done.
    1 rows affected.
    1 rows affected.
    1 rows affected.

We’ll use the student_contact as the left table and staff_contact as the right table to answer this question. Let’s see them:

%sql SELECT * FROM student_contact;
     * sqlite:///joins.db
    Done.
student_id email_address
1 [email protected]
2 [email protected]
3 [email protected]
%sql SELECT * FROM staff_contact;
     * sqlite:///joins.db
    Done.
staff_id email_address
1 [email protected]
2 [email protected]
3 [email protected]

If we don’t specify the join condition, SQL does it for us. It assumes a CROSS JOIN. If the left table has n rows and y columns, and the right table has m rows and z columns, a CROSS JOIN will return y + z columns and m x n rows. Our example will return:

student_id email_address staff_id email_address
1 [email protected] 1 [email protected]
1 [email protected] 2 [email protected]
1 [email protected] 3 [email protected]
2 [email protected] 1 [email protected]
2 [email protected] 2 [email protected]
2 [email protected] 3 [email protected]
3 [email protected] 1 [email protected]
3 [email protected] 2 [email protected]
3 [email protected] 3 [email protected]

This is true for JOIN clauses that don’t have conditional expressions built into them. The NATURAL JOIN has an equality expression built into it and will not result in a CROSS JOIN because we don’t even need to specify the conditional expression when using it.

Let’s evaluate whether this is true for the the following explict joins: INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN:

%%sql

-- INNER JOIN without conditional expression

SELECT *
FROM student_contact
INNER JOIN staff_contact;
     * sqlite:///joins.db
    Done.
student_id email_address staff_id email_address_1
1 [email protected] 1 [email protected]
1 [email protected] 2 [email protected]
1 [email protected] 3 [email protected]
2 [email protected] 1 [email protected]
2 [email protected] 2 [email protected]
2 [email protected] 3 [email protected]
3 [email protected] 1 [email protected]
3 [email protected] 2 [email protected]
3 [email protected] 3 [email protected]
%%sql

-- LEFT OUTER JOIN without conditional expression

SELECT *
FROM student_contact
LEFT OUTER JOIN staff_contact;
     * sqlite:///joins.db
    Done.
student_id email_address staff_id email_address_1
1 [email protected] 1 [email protected]
1 [email protected] 2 [email protected]
1 [email protected] 3 [email protected]
2 [email protected] 1 [email protected]
2 [email protected] 2 [email protected]
2 [email protected] 3 [email protected]
3 [email protected] 1 [email protected]
3 [email protected] 2 [email protected]
3 [email protected] 3 [email protected]
%%sql

-- RIGHT OUTER JOIN without conditional expression

SELECT *
FROM student_contact
RIGHT OUTER JOIN staff_contact;
     * sqlite:///joins.db
    Done.
student_id email_address staff_id email_address_1
1 [email protected] 1 [email protected]
1 [email protected] 2 [email protected]
1 [email protected] 3 [email protected]
2 [email protected] 1 [email protected]
2 [email protected] 2 [email protected]
2 [email protected] 3 [email protected]
3 [email protected] 1 [email protected]
3 [email protected] 2 [email protected]
3 [email protected] 3 [email protected]
%%sql

-- FULL OUTER JOIN without conditional expression

SELECT *
FROM student_contact
FULL OUTER JOIN staff_contact;
     * sqlite:///joins.db
    Done.
student_id email_address staff_id email_address_1
1 [email protected] 1 [email protected]
1 [email protected] 2 [email protected]
1 [email protected] 3 [email protected]
2 [email protected] 1 [email protected]
2 [email protected] 2 [email protected]
2 [email protected] 3 [email protected]
3 [email protected] 1 [email protected]
3 [email protected] 2 [email protected]
3 [email protected] 3 [email protected]

Let’s evaluate whether this is also true for the implicit join:

%%sql

-- IMPLICIT JOIN without conditional expression

SELECT *
FROM student_contact, staff_contact;
     * sqlite:///joins.db
    Done.
student_id email_address staff_id email_address_1
1 [email protected] 1 [email protected]
1 [email protected] 2 [email protected]
1 [email protected] 3 [email protected]
2 [email protected] 1 [email protected]
2 [email protected] 2 [email protected]
2 [email protected] 3 [email protected]
3 [email protected] 1 [email protected]
3 [email protected] 2 [email protected]
3 [email protected] 3 [email protected]

Lastly, let’s evaluate whether this is true for a NATURAL JOIN:

%%sql

-- NATURAL JOIN has in-built equality conditional

SELECT *
FROM student_contact
NATURAL JOIN staff_contact;
     * sqlite:///joins.db
    Done.
student_id email_address staff_id

NATURAL JOIN does not return a CROSS JOIN because of its in-built equality conditional. It searches for the same column name(s) in the left and right tables and applies the equality (=) conditional. For this example, email_address is the similar column in both tables. However, there are no matching email addresses in the tables, so an empty table is returned.

5. How is the INNER JOIN executed?

Answer

The INNER JOIN returns the matching values in both the right and left tables. A CROSS JOIN is first executed, then the conditional expression specified is used to limit the rows returned. Let’s illustrate with the students and student_contact tables.

The cross join of these tables returns:

%%sql

-- CROSS JOIN of the students and student_contact

SELECT *
FROM students
CROSS JOIN student_contact;
     * sqlite:///joins.db
    Done.
student_id first_name last_name student_id_1 email_address
1 Mary Wilson 1 [email protected]
1 Mary Wilson 2 [email protected]
1 Mary Wilson 3 [email protected]
2 Tim Ben 1 [email protected]
2 Tim Ben 2 [email protected]
2 Tim Ben 3 [email protected]
3 Alice Robinson 1 [email protected]
3 Alice Robinson 2 [email protected]
3 Alice Robinson 3 [email protected]
4 Reece Bells 1 [email protected]
4 Reece Bells 2 [email protected]
4 Reece Bells 3 [email protected]

Let’s assume we’re using the equality conditional. Then rows matching in the student_id columns from both tables are returned:

student_id first_name last_name student_id email_address
1 Mary Wilson 1 [email protected]
2 Tim Ben 2 [email protected]
3 Alice Robinson 3 [email protected]

There is no matching value for Reece Bells on both tables. So, values from this row are not returned. This SQL query can be evaluated using an INNER JOIN as:

%%sql

-- INNER JOIN of the students and student_contact

SELECT *
FROM students
INNER JOIN student_contact
    ON students.student_id = student_contact.student_id;
     * sqlite:///joins.db
    Done.
student_id first_name last_name student_id_1 email_address
1 Mary Wilson 1 [email protected]
2 Tim Ben 2 [email protected]
3 Alice Robinson 3 [email protected]

6. How is the LEFT OUTER JOIN executed?

Answer

First LEFT OUTER JOIN returns all the rows in the left table:

student_id first_name last_name student_id email_address
1 Mary Wilson
2 Tim Ben
3 Alice Robinson
4 Reece Bells

Next, matching values from CROSS JOIN with the right tables are also returned:

student_id first_name last_name student_id email_address
1 Mary Wilson 1 [email protected]
2 Tim Ben 2 [email protected]
3 Alice Robinson 3 [email protected]
4 Reece Bells

Finally, unmatched rows from the left table are added to the result. Their values are padded with either NULL or None:

student_id first_name last_name student_id email_address
1 Mary Wilson 1 [email protected]
2 Tim Ben 2 [email protected]
3 Alice Robinson 3 [email protected]
4 Reece Bells NULL NULL

The SQL query for a LEFT OUTER JOIN with equality conditional is shown below:

%%sql

-- LEFT OUTER JOIN of the students and student_contact

SELECT *
FROM students
LEFT JOIN student_contact
    ON students.student_id = student_contact.student_id;
     * sqlite:///joins.db
    Done.
student_id first_name last_name student_id_1 email_address
1 Mary Wilson 1 [email protected]
2 Tim Ben 2 [email protected]
3 Alice Robinson 3 [email protected]
4 Reece Bells None None

7. How is the FULL OUTER JOIN executed?

Answer

The FULL OUTER JOIN returns all rows from the left and the right tables, regardless of whether the conditional expression is matched. To show how this works, we’ll create two additional tables. The courses table lists the courses presently being offered in school, and the last_enrolment table lists the courses students last enrolled in.

%%sql

-- CREATE the courses table

CREATE TABLE courses (
    course_id INTEGER PRIMARY KEY,
    course_name TEXT NOT NULL
);

INSERT INTO courses VALUES (100, "Advanced Physics");
INSERT INTO courses VALUES (200, "Computer Science");
INSERT INTO courses VALUES (300, "Economics");
INSERT INTO courses VALUES (400, "Quantum Computing");
INSERT INTO courses VALUES (500, "Cryptography");

-- CREATE the last_enrolment table

CREATE TABLE last_enrolment (
    student_id,
    course_id,
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);

INSERT INTO last_enrolment VALUES (2, 500);
INSERT INTO last_enrolment VALUES (1, 500);
INSERT INTO last_enrolment VALUES (3, 400);
INSERT INTO last_enrolment VALUES (2, 400);
INSERT INTO last_enrolment VALUES (4, 111);
     * sqlite:///joins.db
    Done.
    1 rows affected.
    1 rows affected.
    1 rows affected.
    1 rows affected.
    1 rows affected.
    Done.
    1 rows affected.
    1 rows affected.
    1 rows affected.
    1 rows affected.
    1 rows affected.

Let’s view these tables:

%sql SELECT * FROM courses;
     * sqlite:///joins.db
    Done.
course_id course_name
100 Advanced Physics
200 Computer Science
300 Economics
400 Quantum Computing
500 Cryptography
%sql SELECT * FROM last_enrolment;
     * sqlite:///joins.db
    Done.
student_id course_id
2 500
1 500
3 400
2 400
4 111

Let’s get back to how the FULL OUTER JOIN works. First, the CROSS JOIN is evaluated:

%%sql
SELECT * 
FROM last_enrolment
CROSS JOIN courses;
     * sqlite:///joins.db
    Done.
student_id course_id course_id_1 course_name
2 500 100 Advanced Physics
2 500 200 Computer Science
2 500 300 Economics
2 500 400 Quantum Computing
2 500 500 Cryptography
1 500 100 Advanced Physics
1 500 200 Computer Science
1 500 300 Economics
1 500 400 Quantum Computing
1 500 500 Cryptography
3 400 100 Advanced Physics
3 400 200 Computer Science
3 400 300 Economics
3 400 400 Quantum Computing
3 400 500 Cryptography
2 400 100 Advanced Physics
2 400 200 Computer Science
2 400 300 Economics
2 400 400 Quantum Computing
2 400 500 Cryptography
4 111 100 Advanced Physics
4 111 200 Computer Science
4 111 300 Economics
4 111 400 Quantum Computing
4 111 500 Cryptography

Next, the entire row of the left table, last_enrolment, is returned:

student_id course_id
2 500
1 500
3 400
2 400
4 111

Then the matching values in the right table, courses, are returned from the CROSS JOIN:

student_id course_id course_id course_name
2 500 500 Cryptography
1 500 500 Crypography
3 400 400 Quantum Computing
2 400 400 Quantum Computing
4 111

Next, the non-matching rows in the right table are returned:

student_id course_id course_id course_name
2 500 500 Cryptography
1 500 500 Crypography
3 400 400 Quantum Computing
2 400 400 Quantum Computing
4 111
100 Advanced Physics
200 Computer Science
300 Economics

Finally, the empty cells are padded with NULL or None:

student_id course_id course_id course_name
2 500 500 Cryptography
1 500 500 Crypography
3 400 400 Quantum Computing
2 400 400 Quantum Computing
4 111 NULL NULL
NULL NULL 100 Advanced Physics
NULL NULL 200 Computer Science
NULL NULL 300 Economics

Let’s confirm that our result is correct by running a FULL OUTER JOIN query:

%%sql

SELECT * 
FROM last_enrolment
FULL OUTER JOIN courses
    ON last_enrolment.course_id = courses.course_id;
     * sqlite:///joins.db
    Done.
student_id course_id course_id_1 course_name
2 500 500 Cryptography
1 500 500 Cryptography
3 400 400 Quantum Computing
2 400 400 Quantum Computing
4 111 None None
None None 100 Advanced Physics
None None 200 Computer Science
None None 300 Economics

8. How do you eliminate redundant columns from SQL JOINs?

Answer

The first way to eliminate redundant columns when running JOIN queries is to use the USING clause as our conditional expression:

SELECT *
FROM left_table
[JOIN CLAUSE] right_table
    USING [col_name, ..];

Let’s first run an INNER JOIN query:

%%sql

-- INNER JOIN of the students and student_contact

SELECT *
FROM students
INNER JOIN student_contact
    ON students.student_id = student_contact.student_id;
     * sqlite:///joins.db
    Done.
student_id first_name last_name student_id_1 email_address
1 Mary Wilson 1 [email protected]
2 Tim Ben 2 [email protected]
3 Alice Robinson 3 [email protected]

There are duplicate student_id columns. Let’s eliminate this redundancy with the USING clause:

%%sql

-- INNER JOIN of the students and student_contact with USING

SELECT *
FROM students
INNER JOIN student_contact
    USING (student_id);
     * sqlite:///joins.db
    Done.
student_id first_name last_name email_address
1 Mary Wilson [email protected]
2 Tim Ben [email protected]
3 Alice Robinson [email protected]

You can see that the duplicate column has been removed.

The second way to eliminate the redundant column is to use the NATURAL JOIN. The NATURAL JOIN clause is semantically similar to the INNER JOIN .. USING clause, in which the conditional expression is equality.

Let try NATURAL JOIN with the previous example:

%%sql

-- NATURAL JOIN instead of `INNER JOIN ... USING`

SELECT *
FROM students
NATURAL JOIN student_contact;
     * sqlite:///joins.db
    Done.
student_id first_name last_name email_address
1 Mary Wilson [email protected]
2 Tim Ben [email protected]
3 Alice Robinson [email protected]

A third way to eliminate duplicate columns is to list the column names you want displayed in SELECT:

%%sql

-- INNER JOIN list column names in SELECT and using ALIASES

SELECT 
    s.student_id, 
    s.first_name, 
    s.last_name, 
    sc.email_address
FROM students AS s
INNER JOIN student_contact AS sc
    ON s.student_id = sc.student_id;
     * sqlite:///joins.db
    Done.
student_id first_name last_name email_address
1 Mary Wilson [email protected]
2 Tim Ben [email protected]
3 Alice Robinson [email protected]

So far, we’ve seen how the SQL JOIN clauses are used to combine two tables. Some of the SQL JOIN interview questions that we’ll see next will require combining and filtering data from more than two tables. Writing SQL JOIN queries for these types of questions may be challenging. Take the Combining Tables in SQL Course to reinforce your knowledge of how SQL JOINS work and prepare yourself for the next questions.

9. How is a UNION clause different from a JOIN clause?

Answer

The JOIN and UNION clauses are used to combine data from two or more tables. With the JOIN clause, the columns and matching rows from both tables are returned. The number of columns in the combined table is the sum of the number of columns on both tables.

UNION combines data from tables by stacking them vertically. The number of columns in the SELECT statement and the order of their data types must be the same. The staff and students tables can be combined with UNION as follows:

%%sql 

/*
* Both tables have columns with the same data types in the same order
* We can use the * wild card instead of specifying columns in SELECT
*/

SELECT * FROM staff

UNION

SELECT * FROM students;
     * sqlite:///joins.db
    Done.
staff_id first_name last_name
1 Ada Lovelace
1 Mary Wilson
2 Adam Smith
2 Tim Ben
3 Alice Robinson
3 Nikolo Tesla
4 Reece Bells
%%sql 

/*
 * We specify the columns we want returned in SELECT
*/

SELECT 
    first_name, 
    last_name
FROM staff

UNION

SELECT 
    first_name, 
    last_name
FROM students;
     * sqlite:///joins.db
    Done.
first_name last_name
Ada Lovelace
Adam Smith
Alice Robinson
Mary Wilson
Nikolo Tesla
Reece Bells
Tim Ben

10. What do you understand by Non-EQUI JOIN?

Answer

Non-EQUI JOINs combine tables using other conditional operators besides equality. The other operators may include: less than (<), less than or equals (<=), greater than (>), greater than or equals (>=), not equals (<>), and BETWEEN. For example:

%%sql

SELECT * 
FROM last_enrolment
FULL OUTER JOIN courses
    ON last_enrolment.course_id > 300
    AND courses.course_id < 300
WHERE courses.course_id IS NOT NULL
    AND last_enrolment.course_id IS NOT NULL;
     * sqlite:///joins.db
    Done.
student_id course_id course_id_1 course_name
2 500 100 Advanced Physics
2 500 200 Computer Science
1 500 100 Advanced Physics
1 500 200 Computer Science
3 400 100 Advanced Physics
3 400 200 Computer Science
2 400 100 Advanced Physics
2 400 200 Computer Science

11. How will you join a table to itself?

Answer

You can join a table to itself if it contains related columns. Let’s assume that Ada Lovelace is the supervisor to the other staff. We’ll add a supervisor column with Ada Lovelace’s staff_id. The supervisor and staff_id columns are related.

%sql SELECT * FROM staff;
     * sqlite:///joins.db
    Done.
staff_id first_name last_name
1 Ada Lovelace
2 Adam Smith
3 Nikolo Tesla
%%sql

-- Add supervisor column

ALTER TABLE staff
ADD supervisor INTEGER;

SELECT * FROM staff;
     * sqlite:///joins.db
    Done.
    Done.
staff_id first_name last_name supervisor
1 Ada Lovelace None
2 Adam Smith None
3 Nikolo Tesla None
%%sql

-- Update supervisor column with Ada Lovelace staff_id

UPDATE staff
SET supervisor = 1
WHERE staff_id <> 1;

SELECT * FROM staff;
     * sqlite:///joins.db
    Done.
    Done.
staff_id first_name last_name supervisor
1 Ada Lovelace None
2 Adam Smith 1
3 Nikolo Tesla 1

Let’s perform self join on this table to list the staff and their supervisor:

%%sql

SELECT * 
FROM staff s1
LEFT JOIN staff s2
    ON s1.supervisor = s2.staff_id
     * sqlite:///joins.db
    Done.
staff_id first_name last_name supervisor staff_id_1 first_name_1 last_name_1 supervisor_1
1 Ada Lovelace None None None None None
2 Adam Smith 1 1 Ada Lovelace None
3 Nikolo Tesla 1 1 Ada Lovelace None

You can observe that the left table, s1, contains the information for the staff and the right table; s2 contains the information about their supervisors. Let’s make things clearer by specifying column names in SELECT:

%%sql

SELECT
    s1.staff_id AS staff_id,
    s1.first_name AS first_name,
    s1.last_name AS last_name,
    s2.first_name AS supervisor_first_name,
    s2.last_name AS supervisor_last_name
FROM staff s1
LEFT JOIN staff s2
    ON s1.supervisor = s2.staff_id
     * sqlite:///joins.db
    Done.
staff_id first_name last_name supervisor_first_name supervisor_last_name
1 Ada Lovelace None None
2 Adam Smith Ada Lovelace
3 Nikolo Tesla Ada Lovelace

Let’s drop the supervisor column we added to the staff table:

%%sql

ALTER TABLE staff
DROP COLUMN supervisor;

SELECT * FROM staff;
     * sqlite:///joins.db
    Done.
    Done.
staff_id first_name last_name
1 Ada Lovelace
2 Adam Smith
3 Nikolo Tesla

12. What do you understand about Nested JOINs?

Answer

Nested JOIN simply means combining more than two tables. Let’s use nested join to create a table containing students’ full information: id, names, contact, and courses last enrolled.

%%sql

SELECT *
FROM students
LEFT JOIN student_contact
    USING (student_id)
LEFT JOIN last_enrolment
    USING (student_id)
LEFT JOIN courses
    USING (course_id);
     * sqlite:///joins.db
    Done.
student_id first_name last_name email_address course_id course_name
1 Mary Wilson [email protected] 500 Cryptography
2 Tim Ben [email protected] 500 Cryptography
2 Tim Ben [email protected] 400 Quantum Computing
3 Alice Robinson [email protected] 400 Quantum Computing
4 Reece Bells None 111 None

At the start, the students and student_contact tables are the left and right tables respectively. When these two tables are joined, the resulting table is the left table for the next join operation with the last_enrolment table, which is the right table. The resulting table from this join operation is the left table when joining with the courses table.

13. How will you use a subquery to execute INNER JOINs for more than two tables?

Answer

Both subquery and join can be used to combine data from two or more tables. Subqueries can be used to perform inner join operations if the equality conditional operator is used with the WHERE clause. This is an example of implict join:

%%sql

SELECT *
FROM (
        -- Third join operation

        SELECT * 
        FROM

            -- Second join operation

            (SELECT * 
            FROM
                (

                -- First join operation

                SELECT * 
                FROM students, student_contact
                WHERE students.student_id = student_contact.student_id

                -- End of first join operation

                ) AS sub_query_one, last_enrolment
            WHERE sub_query_one.student_id = last_enrolment.student_id

             -- End of second join operation

            ) AS sub_query_two, courses

        WHERE sub_query_two.course_id = courses.course_id

        -- End of third join operation   
)
ORDER BY 1;
     * sqlite:///joins.db
    Done.
student_id first_name last_name student_id:1 email_address student_id:2 course_id course_id:1 course_name
1 Mary Wilson 1 [email protected] 1 500 500 Cryptography
2 Tim Ben 2 [email protected] 2 500 500 Cryptography
2 Tim Ben 2 [email protected] 2 400 400 Quantum Computing
3 Alice Robinson 3 [email protected] 3 400 400 Quantum Computing

There are duplicate columns in the result. These can be removed by listing the names of the columns in the outermost SELECT statement.

If you’re having a hard time understanding how subqueries work, our SQL Subqueries course will help you get started. If you’re new to SQL and you’re having difficulties understanding the SQL queries, we recommend that you learn the basics from our SQL Fundamentals Skill Path.

14. Write the SQL queries to execute the operations shown in the diagrams

image-2.png

Answer

Diagram 1 set notation is $A – B$, read as A difference B. We want all the elements that belong to A but not B.

Let’s assume A is students table and B is student_contact. We want to return only records that are in students but not in student_contact.

Let’s view the tables:

%sql SELECT * FROM students;
     * sqlite:///joins.db
    Done.
student_id first_name last_name
1 Mary Wilson
2 Tim Ben
3 Alice Robinson
4 Reece Bells
%sql SELECT * FROM student_contact;
     * sqlite:///joins.db
    Done.
student_id email_address
1 [email protected]
2 [email protected]
3 [email protected]

Only Reece Bells satisfies the condition in diagram 1. The query is given below:

%%sql

SELECT *
FROM students
LEFT JOIN student_contact
    USING (student_id)
WHERE student_contact.student_id IS NULL;
     * sqlite:///joins.db
    Done.
student_id first_name last_name email_address
4 Reece Bells None

LEFT JOIN returns all the rows in students table, and student_contact.student_id IS NULL sets the intersecting rows to NULL.

Diagram 2 set notation is $(A \cap B)^c$. This is the elements of both A and B that are not common to them. We’ll use the last_enrolment and courses tables.

Let’s view the tables:

%sql SELECT * from last_enrolment;
     * sqlite:///joins.db
    Done.
student_id course_id
2 500
1 500
3 400
2 400
4 111
%sql SELECT * FROM courses;
     * sqlite:///joins.db
    Done.
course_id course_name
100 Advanced Physics
200 Computer Science
300 Economics
400 Quantum Computing
500 Cryptography

The common keys are 400 and 500. These will be excluded. The returned table contains 111, 100, 200, and 300. The query is shown below:

%%sql

SELECT * 
FROM last_enrolment
FULL OUTER JOIN courses
    USING(course_id)
WHERE last_enrolment.course_id IS NULL
    OR courses.course_id is NULL;
     * sqlite:///joins.db
    Done.
student_id course_id course_name
4 111 None
None 100 Advanced Physics
None 200 Computer Science
None 300 Economics

15. Discuss a common pitfall with using JOIN

Answer

There are different types of JOINs. We may need to implement a particular type of JOIN to answer our question correctly. As you’re already aware, if you don’t specify the conditional expression, your SQL query will not fail. It will return a CROSS JOIN. If you go on to perform your analysis on the CROSS JOIN, you’ll get an incorrect result. You’ll also get incorrect results when you don’t join your tables with the appropriate conditional expressions.

You’ll also get incorrect results if you incorrectly filter your JOIN with the WHERE clause. Incorrectly filtering can result in an unintended type of JOIN. For example, a LEFT JOIN can be transformed to an INNER JOIN with an incorrect WHERE clause.

Let’s see the LEFT JOIN result:

%%sql

-- LEFT OUTER JOIN of the students and student_contact

SELECT *
FROM students
LEFT JOIN student_contact
    ON students.student_id = student_contact.student_id;
     * sqlite:///joins.db
    Done.
student_id first_name last_name student_id_1 email_address
1 Mary Wilson 1 [email protected]
2 Tim Ben 2 [email protected]
3 Alice Robinson 3 [email protected]
4 Reece Bells None None

Now let’s incorrectly filter the LEFT JOIN to transform it into an INNER JOIN:

%%sql

-- LEFT OUTER JOIN transformed to INNER JOIN by incorrect filtering

SELECT *
FROM students
LEFT JOIN student_contact
    ON students.student_id = student_contact.student_id
WHERE student_contact.student_id > 0;
     * sqlite:///joins.db
    Done.
student_id first_name last_name student_id_1 email_address
1 Mary Wilson 1 [email protected]
2 Tim Ben 2 [email protected]
3 Alice Robinson 3 [email protected]

Logical errors are introduced into our program this way. Logical errors do not return error messages, making them difficult to detect — especially when we’re working with large tables.

Close the connection to the joins.db database:

%sql -x / --close sqlite:///joins.db

16. How will you structure data for facilitating JOIN in a one-to-many relationship condition?

Answer

In a one-to-many relationship, one record in table A can be associated to more than one record in table B. So the primary key of A is a foreign key in B. The students and last_enrolment tables share a one-to-many relationship, as a student can be enrolled in more than one course.

-- create a students table:

CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL
)

--create last_enrolment table

CREATE TABLE last_enrolment (
    student_id,
    course_id,
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);

In the above code snippet, the student_id PRIMARY KEY in the students table is set as FOREIGN KEY in the last_enrolment table.

17. How will you structure data for facilitating JOIN in a many-to-many relationship condition?

Answer

In a many-to-many relationship, multiple records in table A are associated with multiple records in table B. There is a many-to-many relationship between the students and courses tables. A student can take more than one course, and there can be more than one student in a course.

The many-to-many relationship in our example is broken into two one-to-many relationships. The first one-to-many relationship is the relationship between students and last_enrolment, and the second one-to-many relationship is the relationship between courses and last_enrolment. The last_enrolment table connects the students and courses tables and is referred to as a join table.

-- create a students table:

CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL
)

-- CREATE the courses table

CREATE TABLE courses (
    course_id INTEGER PRIMARY KEY,
    course_name TEXT NOT NULL
)

--create last_enrolment table

CREATE TABLE last_enrolment (
    student_id,
    course_id,
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);

18. Write a query to return information on people that love Jazz music

Write a query to return the email and full_names of people that have bought Jazz music. You’ll be working with the chinook.db. You can find the schema and download the file from here.

Answer

First, we connect to the database and list the tables in it:

%sql sqlite:///chinook.db --connect to the database
%%sql

-- List the tables in the database

SELECT name 
FROM sqlite_master 
WHERE type = 'table';
     * sqlite:///chinook.db
    Done.
name
album
artist
customer
employee
genre
invoice
invoice_line
media_type
playlist
playlist_track
track

We’ll return the email, first_name, and last_name columns on the customer table for the subset where the genre is Jazz. We want to connect the customer table to the genre table. The schema will be helpful here. In the schema, we’ll need to join the customer, invoice, invoice_line, track, and genre tables to get the information we want.

%%sql

SELECT
    DISTINCT customer.email,
    customer.first_name || " " || customer.last_name AS full_name
FROM customer
INNER JOIN invoice
    ON customer.customer_id = invoice.customer_id
INNER JOIN invoice_line 
    ON invoice_line.invoice_id = invoice.invoice_id
INNER JOIN track
    ON track.track_id = invoice_line.track_id
INNER JOIN genre
    ON genre.genre_id = track.genre_id
WHERE genre.name = 'Jazz'
ORDER BY 1;
     * sqlite:///chinook.db
    Done.
email full_name
[email protected] Aaron Mitchell
[email protected] Astrid Gruber
[email protected] Bjørn Hansen
[email protected] Camille Bernard
[email protected] Daan Peeters
[email protected] Dan Miller
[email protected] Edward Francis
[email protected] Eduardo Martins
[email protected] Ellie Sullivan
[email protected] Enrique Muñoz
[email protected] Fernanda Ramos
[email protected] František Wichterlová
[email protected] François Tremblay
[email protected] Hannah Schneider
[email protected] Helena Holý
[email protected] Heather Leacock
[email protected] Isabelle Mercier
[email protected] Jack Smith
[email protected] João Fernandes
[email protected] Johannes Van der Berg
[email protected] John Gordon
[email protected] Julia Barnett
[email protected] Kara Nielsen
[email protected] Leonie Köhler
[email protected] Lucas Mancini
[email protected] Luís Gonçalves
[email protected] Luis Rojas
[email protected] Manoj Pareek
[email protected] Marc Dubois
[email protected] Madalena Sampaio
[email protected] Michelle Brooks
[email protected] Mark Philips
[email protected] Niklas Schröder
[email protected] Patrick Gray
[email protected] Phil Hughes
stanisław.wó[email protected] Stanisław Wójcik
[email protected] Steve Murray
[email protected] Terhi Hämäläinen
[email protected] Tim Goyer
[email protected] Victor Stevens
[email protected] Wyatt Girard

It’s possible for a customer to buy several Jazz music items. So the customer details will be returned as many times as they bought Jazz music. We’ve used the DISTINCT keyword to return only one detail for this type of customer.

19. Write a query to return the top 5 spenders on Jazz and Rock

Answer

We’ll continue to work with the chinook.db. We need to join the customer and genre tables and filter by the type of genre. We’ll be returning the customer full_name, and the amount_spent. To get the amount spent, we’ll look at the invoice_line table for the unit_price and quantity purchased.

A single customer can buy music from a genre more than once. Their information will apply multiple times. You need to GROUP BY the customer full_name and SUM the values from the product of unit_price and quantity to get the total amount spent on that genre:

%%sql

-- Top 5 Spenders on Jazz

SELECT
    customer.first_name || " " || customer.last_name AS full_name,
    ROUND(SUM(invoice_line.unit_price * invoice_line.quantity), 2) AS amount_spent
FROM customer
INNER JOIN invoice
    ON customer.customer_id = invoice.customer_id
INNER JOIN invoice_line 
    ON invoice_line.invoice_id = invoice.invoice_id
INNER JOIN track
    ON track.track_id = invoice_line.track_id
INNER JOIN genre
    ON genre.genre_id = track.genre_id
WHERE genre.name = 'Jazz'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;
     * sqlite:///chinook.db
    Done.
full_name amount_spent
Fernanda Ramos 15.84
Enrique Muñoz 15.84
Hannah Schneider 14.85
Astrid Gruber 14.85
Kara Nielsen 3.96
%%sql

-- Top 5 Spenders on Rock

SELECT
    customer.first_name || " " || customer.last_name AS full_name,
    ROUND(SUM(invoice_line.unit_price * invoice_line.quantity), 2) AS amount_spent
FROM customer
INNER JOIN invoice
    ON customer.customer_id = invoice.customer_id
INNER JOIN invoice_line 
    ON invoice_line.invoice_id = invoice.invoice_id
INNER JOIN track
    ON track.track_id = invoice_line.track_id
INNER JOIN genre
    ON genre.genre_id = track.genre_id
WHERE genre.name = 'Rock'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;
     * sqlite:///chinook.db
    Done.
full_name amount_spent
Helena Holý 75.24
François Tremblay 74.25
Luís Gonçalves 71.28
Hugh O'Reilly 71.28
João Fernandes 67.32

You can easily get the amount_spent values wrong if you use the total column in the invoice table. This is because a customer can buy songs from more than one genre on the same invoice, and the total column will contain the full amount spent, not the amount spent on a particular genre.

This is demonstrated in the query below. Ellie Sullivan bought 10 songs from different genres on 2017-04-16. The total amount spent on all the songs is 9.9, which is different from the amount spent on a particular genre.

%%sql

-- Common mistake made using invoice total to calculate amount spend on a genre

SELECT
    customer.first_name || " " || customer.last_name AS full_name,
    genre.name,
    invoice_line.unit_price,
    invoice_line.quantity,
    invoice.total
FROM customer
INNER JOIN invoice
    ON customer.customer_id = invoice.customer_id
INNER JOIN invoice_line 
    ON invoice_line.invoice_id = invoice.invoice_id
INNER JOIN track
    ON track.track_id = invoice_line.track_id
INNER JOIN genre
    ON genre.genre_id = track.genre_id
WHERE customer.first_name = 'Ellie' AND customer.last_name = 'Sullivan' AND invoice.invoice_date = '2017-04-16 00:00:00';
     * sqlite:///chinook.db
    Done.
full_name name unit_price quantity total
Ellie Sullivan Metal 0.99 1 9.9
Ellie Sullivan Rock 0.99 1 9.9
Ellie Sullivan Rock 0.99 1 9.9
Ellie Sullivan Rock 0.99 1 9.9
Ellie Sullivan Rock 0.99 1 9.9
Ellie Sullivan Rock 0.99 1 9.9
Ellie Sullivan Rock 0.99 1 9.9
Ellie Sullivan Rock 0.99 1 9.9
Ellie Sullivan Metal 0.99 1 9.9
Ellie Sullivan Alternative & Punk 0.99 1 9.9

20. Simplify the query to return the top 5 spenders on Jazz with CTE

CTE stands for Common Table Expression. It enables us to simplify complex queries by deconstructing them into simple blocks. We initiate a CTE with the WITH clause. The SQL query that answers the previous question is a bit complex. We can make the query more readable by performing most of the join operations with CTE.

%%sql

-- Start of JOIN operations with CTE

WITH cte_table AS (
    SELECT
        invoice.customer_id,
        invoice_line.unit_price,
        invoice_line.quantity
    FROM invoice
        INNER JOIN invoice_line 
            ON invoice_line.invoice_id = invoice.invoice_id
        INNER JOIN track
            ON track.track_id = invoice_line.track_id
        INNER JOIN genre
            ON genre.genre_id = track.genre_id
    WHERE genre.name = 'Jazz'
)

-- End of JOIN operations with CTE

-- Simplified query to return the top 5 spenders on Jazz

SELECT
    customer.first_name || " " || customer.last_name AS full_name,
    ROUND(SUM(cte_table.unit_price * cte_table.quantity), 2) AS amount_spent
FROM customer
INNER JOIN cte_table
    ON customer.customer_id = cte_table.customer_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;
     * sqlite:///chinook.db
    Done.
full_name amount_spent
Fernanda Ramos 15.84
Enrique Muñoz 15.84
Hannah Schneider 14.85
Astrid Gruber 14.85
Kara Nielsen 3.96

Close connection to chinook.db:

%sql -x / --close sqlite:///chinook.db

Takeaway

In this tutorial, we discussed the top 20 SQL JOINs interview questions and provided a step-by-step guide to their answers. If you aspire to become a business analyst, data analyst, or data scientist, knowledge of SQL is one of the leading requirements for your role. No two ways about it: you must learn SQL programming.

As you work with relational databases, one of the most common SQL operations you’ll perform is joining tables. This is why SQL JOIN questions are so popular with interviewers. At your SQL interviews, you’ll get several questions on SQL JOINs to test your knowledge of how they work. We’ve provided answers to some of the most common questions you may be asked.

If you haven’t learned the basics of SQL yet, this tutorial may not be helpful. Learn how to explore, query, join, and filter data from relational databases in our SQL Fundamentals Skill Path. You’ll master the SQL syntaxes and basic commands that you’ll need to start working with relational databases. On our interactive learning platform, you get to practice writing your own SQL queries.

We understand that porfolio projects are very important, and they help showcase your SQL programming and data analysis skills to potential recruiters. You get to complete real-life SQL projects on the SQL Fundamentals Skill Path. We’ve also curated 10 exiciting SQL project ideas to help you get more practice and projects in your SQL portfolio.

Aghogho Monorien

About the author

Aghogho Monorien

Aghogho is an engineer and aspiring Quant working on the applications of artificial intelligence in finance.

Learn data skills for free

Headshot Headshot

Join 1M+ learners

Try free courses