May 29, 2025

Project Tutorial: Answering Business Questions Using SQL

In this project walkthrough, we'll explore how to use SQL for data analysis from a digital music store and answer critical business questions. By working with the Chinook database—a sample database that represents a digital media store similar to iTunes—we'll demonstrate how SQL can drive data-informed decision-making in a business context.

The Chinook database contains information about artists, albums, tracks, customers, and sales data. Through strategic SQL queries, we'll help the business understand its market trends, evaluate employee performance, and identify growth opportunities. This project showcases real-world SQL applications that data analysts encounter daily.

We'll take you through writing increasingly complex SQL queries, from basic exploratory analysis to advanced queries using Common Table Expressions (CTEs) and subqueries.

What You'll Learn

By the end of this tutorial, you'll know how to:

  • Navigate complex relational database schemas with multiple tables
  • Write SQL queries using joins to connect data across multiple tables
  • Use Common Table Expressions (CTEs) to organize complex queries
  • Apply subqueries to calculate percentages and comparative metrics
  • Analyze business data to provide actionable insights
  • Connect SQL queries to Python in Jupyter

Before You Start: Pre-Instruction

To make the most of this project walkthrough, follow these preparatory steps:

  1. Review the Project
  2. Prepare Your Environment
  3. Get Comfortable with SQL Fundamentals
    • You should be familiar with basic SQL keywords: SELECT, FROM, GROUP BY, and JOIN
    • Some experience with CTEs and subqueries will be helpful, but not required
    • New to Markdown? We recommend learning the basics: Markdown Guide

Setting Up Your Environment

Before we get into our analysis, let's set up our Jupyter environment to work with SQL. We'll use some SQL magic commands that allow us to write SQL directly in Jupyter cells.

%%capture
%load_ext sql
%sql sqlite:///chinook.db

Learning Insight: The %%capture magic command suppresses any output messages from the cell, keeping our notebook clean. The %load_ext sql command loads the SQL extension, and %sql sqlite:///chinook.db connects us to our database.

Now let's verify our connection and explore what tables are available in our database:

%%sql

SELECT name 
FROM sqlite_master 
WHERE type='table';

This special SQLite query shows us all the table names in our database. The Chinook database contains 11 tables representing different aspects of a digital music store:

  • album: Album details
  • artist: Artist information
  • customer: Customer information with assigned support representatives
  • employee: Store employees, including sales support agents
  • genre: Music genres
  • invoice: Sales transactions
  • invoice_line: Individual items within each invoice
  • media_type: Format types (MP3, AAC, etc.)
  • playlist: Curated playlists
  • playlist_track: Tracks within each playlist
  • track: Song information

Understanding the Database Schema

Working with relational databases means understanding how tables connect to each other. The Chinook database uses primary and foreign keys to establish these relationships. Here's a simplified view of the key relationships between the tables we'll be working with:

chinook-schema

  • customer is linked to employee through support_rep_id
  • invoice is linked to customer through customer_id
  • invoice_line is linked to invoice through invoice_id
  • track is linked to album, invoice_line, and genre through album_id, track_id, and genre_id, respectively

Let's preview some of our key tables to understand the data we're working with:

%%sql

SELECT * 
FROM track 
LIMIT 5;
track_id name album_id media_type_id genre_id composer milliseconds bytes unit_price
1 For Those About To Rock (We Salute You) 1 1 1 Angus Young, Malcolm Young, Brian Johnson 343719 11170334 0.99
2 Balls to the Wall 2 2 1 None 342562 5510424 0.99
3 Fast As a Shark 3 2 1 F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman 230619 3990994 0.99
4 Restless and Wild 3 2 1 F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman 252051 4331779 0.99
5 Princess of the Dawn 3 2 1 Deaffy & R.A. Smith-Diesel 375418 6290521 0.99
%%sql

SELECT * 
FROM invoice_line 
LIMIT 5;
invoice_line_id invoice_id track_id unit_price quantity
1 1 1158 0.99 1
2 1 1159 0.99 1
3 1 1160 0.99 1
4 1 1161 0.99 1
5 1 1162 0.99 1

Learning Insight: When working with a new database, always preview your tables with LIMIT to understand the data structure before writing complex queries. This helps you identify column names, data types, and potential relationships without flooding your output with hundreds of rows.

Business Question 1: Which Music Genres Should We Focus on in the USA?

The Chinook store wants to understand which music genres are most popular in the United States market. This information will help them decide which new albums to add to their catalog. Let's build a query to analyze genre popularity by sales.

Building Our Analysis with a CTE

We'll use a Common Table Expression (CTE) to create a temporary result set that combines data from multiple tables:

%%sql

WITH genre_usa_tracks AS (
    SELECT
        il.invoice_line_id,
        g.name AS genre,
        t.track_id,
        i.billing_country AS country
    FROM track t
    JOIN genre g ON t.genre_id = g.genre_id
    JOIN invoice_line il ON t.track_id = il.track_id
    JOIN invoice i ON il.invoice_id = i.invoice_id
    WHERE i.billing_country = 'USA'
)
SELECT
    genre,
    COUNT(*) AS tracks_sold,
    COUNT(*) * 100.0 / (SELECT COUNT(*) FROM genre_usa_tracks) AS percentage
FROM genre_usa_tracks
GROUP BY genre
ORDER BY tracks_sold DESC;
genre tracks_sold percentage
Rock 561 53.37773549000951
Alternative & Punk 130 12.369172216936251
Metal 124 11.798287345385347
R&B/Soul 53 5.042816365366318
Blues 36 3.4253092293054235
Alternative 35 3.330161750713606
Latin 22 2.093244529019981
Pop 22 2.093244529019981
Hip Hop/Rap 20 1.9029495718363463
Jazz 14 1.3320647002854424
Easy Listening 13 1.236917221693625
Reggae 6 0.570884871550904
Electronica/Dance 5 0.47573739295908657
Classical 4 0.38058991436726924
Heavy Metal 3 0.285442435775452
Soundtrack 2 0.19029495718363462
TV Shows 1 0.09514747859181731

Learning Insight: CTEs make complex queries more readable by breaking them into logical steps. Here, we first create a filtered dataset of USA purchases, then analyze it. The 100.0 in our percentage calculation ensures we get decimal results instead of integer division.

Our results show that Rock music dominates the USA market with over 50% of sales, followed by Latin, Metal, and Alternative & Punk. This suggests the store should prioritize these genres when selecting new inventory.

Key Insights from Genre Analysis

  • Rock dominates: With 561 tracks sold (53.4%), Rock is by far the most popular genre
  • Latin music surprise: The second most popular genre is Latin (10.3%), indicating a significant market segment
  • Long tail effect: Many genres have very small percentages, suggesting niche markets

Business Question 2: Analyzing Employee Sales Performance

The company wants to evaluate its sales support agents' performance to identify top performers and areas for improvement. Let's analyze which employees generate the most revenue.

%%sql

SELECT
    e.first_name || ' ' || e.last_name AS employee_name,
    e.hire_date,
    COUNT(DISTINCT c.customer_id) AS customer_count,
    SUM(i.total) AS total_sales_dollars,
    SUM(i.total) / COUNT(DISTINCT c.customer_id) AS avg_dollars_per_customer
FROM customer c
JOIN invoice i ON c.customer_id = i.customer_id
JOIN employee e ON c.support_rep_id = e.employee_id
GROUP BY e.employee_id, e.hire_date
ORDER BY total_sales_dollars DESC;
employee_name hire_date customer_count total_sales_dollars avg_dollars_per_customer
Jane Peacock 2017-04-01 00:00:00 21 1731.5100000000039 82.45285714285733
Margaret Park 2017-05-03 00:00:00 20 1584.0000000000034 79.20000000000017
Steve Johnson 2017-10-17 00:00:00 18 1393.920000000002 77.44000000000011

Learning Insight: When using GROUP BY with aggregate functions, remember to include all non-aggregated columns in your GROUP BY clause. This is required in most SQL flavors (though SQLite is more forgiving). The || operator concatenates strings in SQLite.

Performance Analysis Results

Our analysis reveals interesting patterns:

  • Jane Peacock leads with the highest average dollars per customer, despite not having the most customers
  • Margaret Park's performance is solid, with metrics close to Jane’s, suggesting a consistent level of customer value delivery
  • Steve Johnson, the newest employee, shows promising performance with metrics similar to more experienced staff

Business Question 3: Combining SQL with Python for Visualization

While SQL excels at data retrieval and transformation, combining it with Python enables powerful visualizations. Let's demonstrate how to pass SQL query results to Python:

import pandas as pd

# Store our query as a string
query = """
SELECT
    genre,
    COUNT(*) AS tracks_sold
FROM genre_usa_tracks
GROUP BY genre
ORDER BY tracks_sold DESC
LIMIT 10;
"""

# Execute the query and store results
result = %sql \$query

# Convert to pandas DataFrame
df = result.DataFrame()

Learning Insight: The %sql inline magic (single percent sign) allows us to execute SQL and capture the results in Python. The dollar sign syntax (\$query) lets us reference Python variables within SQL magic commands.

Challenges and Considerations

During our analysis, we encountered several important SQL concepts worth highlighting:

1. Integer Division Pitfall

When calculating percentages, SQL performs integer division by default:

-- This returns 0 for all percentages
SELECT COUNT(*) / (SELECT COUNT(*) FROM table) AS percentage

-- This returns proper decimals
SELECT COUNT(*) * 100.0 / (SELECT COUNT(*) FROM table) AS percentage

2. JOIN Selection Matters

We used INNER JOIN throughout because we only wanted records that exist in all related tables. If we needed to include customers without invoices, we would use LEFT JOIN instead.

3. Subquery Performance

Our percentage calculation uses a subquery that executes for each row. For larger datasets, consider using window functions or pre-calculating totals in a CTE.

Sharing Your Work with GitHub Gists

GitHub Gists provide an excellent way to share your SQL projects without the complexity of full repositories. Here's how to share your work:

  1. Navigate to gist.github.com
  2. Create a new gist
  3. Name your file with the .ipynb extension for Jupyter notebooks or .sql for SQL scripts
  4. Paste your code and create either a public or secret gist

Gists automatically render Jupyter notebooks with all outputs preserved, making them perfect for sharing analysis results with stakeholders or including in your portfolio of projects.

Summary of Analysis

In this project, we've demonstrated how SQL can answer critical business questions for a digital music store:

  1. Genre Analysis: We identified Rock as the dominant genre in the USA market (53.4%), with Latin music as a surprising second place
  2. Employee Performance: We evaluated sales representatives, discovering that Jane Peacock leads in average revenue per customer
  3. Technical Skills: We applied CTEs, subqueries, multiple joins, and aggregate functions to solve real business problems

These insights enable data-driven decisions about inventory management, employee training, and market strategy.

Next Steps

To extend this analysis and deepen your SQL skills, consider these challenges:

  1. Time-based Analysis: How do sales trends change over time? Add date filtering to identify seasonal patterns
  2. Customer Segmentation: Which customers are the most valuable? Create customer segments based on purchase behavior
  3. Product Recommendations: Which tracks are commonly purchased together? Use self-joins to find associations
  4. International Markets: Expand the genre analysis to compare preferences across different countries

If you're new to SQL and found this project challenging, start with our SQL Fundamentals skill path to build the foundational skills needed for complex analysis. The course covers essential topics like joins, aggregations, and subqueries that we've used throughout this project.

We have some other project walkthrough tutorials you may also enjoy:

Happy querying!

Anna Strahl

About the author

Anna Strahl

A former math teacher of 8 years, Anna always had a passion for learning and exploring new things. On weekends, you'll often find her performing improv or playing chess.