September 28, 2022

10 Exciting SQL Project Ideas for Beginners (2022)

SQL Project Ideas

To demonstrate your SQL skills to your future or current employers, you need to create your own projects. This is how you will learn to apply your knowledge in real-world scenarios.

So, let’s look at 10 cool SQL projects that you can do right now. Most of them are beginner-friendly, but the last projects in the list are more complex. Once you have acquired some experience, you can move on to more dififcult (and more rewarding!) projects.

10 Cool SQL Projects You Can Start Right Now

The easiest way to get started with SQL projects is by using the built-in sqlite3 Python package, which implements a lightweight database engine: SQLite. But don’t worry — you don’t have to know Python! Just a few commands will allow us to create our databases and execute queries on them.

All these projects are tagged with one of three levels of difficulty: easy, intermediate, and hard.

1. CIA Data

Level: easy

Before you create your own database, you can analyze a simple database with the CIA factbook containing data for world countries, such as population, area, birth rate, etc. This Dataquest guided project will help you apply your knowledge in a real-world scenario and learn new facts in the process. For example, what is the country with the highest water-to-land ratio?

2. Bank Marketing

Level: easy

What defines a good marketing strategy? What are the indicators we can use to adjust our next marketing campaign? How do we do this for a financial institution? Kaggle has a Bank Marketing dataset, originally uploaded in the UCI Machine Learning Repository, and it contains a lot of data about customers collected from a phone marketing campaign for a Portuguese financial institution.

Let us see how to create a one-table database using sqlite3 and Python. To run the code, you can use Google Colab, where we can also upload the data we need.

import sqlite3
from pathlib import Path

# Create empty database
Path("bank.db").touch()

# Connect to database
conn = sqlite3.connect("bank.db")
c = conn.cursor()

# Create a table
c.execute(
    """CREATE TABLE bank (
    age int, job text, 
    marital text, education text, 
    default_e text, balance int,
    housing text, loan text,
    contact text, day int,
    month text, duration int,
    campaign int, pdays text,
    previous int, poutcome text,
    deposit text
    );"""
)
    <sqlite3.Cursor at 0x7f917402ff40>

We have now successfully created a database with a single empty table. Note that I changed the column name default to default_e because default is a reserved name, and sqlite3 was throwing an error when attempting to create a table with this column name.

Now we have to populate the table with the data we downloaded from Kaggle. We’re using pandas (a popular data analysis package available on Google Colab) to open the csv file and then insert it into the database.

# Import pandas to work with data sets
import pandas as pd

# Open csv file
bank = pd.read_csv("bank.csv")

# Add table to database
bank.to_sql("bank", conn, if_exists="append", index=False)
    11162

Let’s try to fetch one entry from the table. We can do this by using the .execute() function to run an SQL query and the fetchone() method to get just one entry. You can learn more on how to integrate SQL and Python from the lesson Querying SQLite from Python. It’s easy, and it will take you less than an hour to learn.

Briefly, we use:

  1. fetchone() to get one result.
  2. fetchmany() to get n results.
  3. fetchall() to get all results.
# Return one row from the 'bank' table
c.execute("""SELECT * FROM bank""").fetchone()
    (59,
     'admin.',
     'married',
     'secondary',
     'no',
     2343,
     'yes',
     'no',
     'unknown',
     5,
     'may',
     1042,
     1,
     '-1',
     0,
     'unknown',
     'yes')

Ok, so what questions might we have about this dataset?

  • Which professions are the most popular among customers over 45 years old?
  • For how many people with loans did the marketing campaign succeed?
  • Does the success depend on the balance, deposit, or loan?

These questions are easy to answer using basic SQL queries.

3. Supermarket Sales

Difficulty: easy

Thousands of people visit supermarkets every day to buy food and housing supplies. By analyzing this data, we can learn more about customer preferences regarding payment methods, visiting times, and membership in loyalty programs — based on location or gender. Then, we can adjust every single supermarket to increase customer satisfaction and profits.

This dataset contains data about selling points and the customers of three branches of a supermarket company collected over three months. By analyzing this dataset, we can answer these questions:

  • Which branch has the best results in the loyalty program?
  • Does the membership depend on customer rating?
  • Does gross income depend on the proportion of customers in the loyalty program? On payment method?
  • Are there any differences in indicators between men and women?
  • Which product category generates the highest income?

4. Squirrel Data

Difficulty: easy

Now let’s analyze some funny data. Did you know that New York City collects census data . . . about squirrels? Yep, they even have a website.

This dataset from 2018 contains data about 3000+ squirrel sightings, such as age, fur color, activities, interaction with humans, etc.

We can think of many questions:

  • What was the date for the highest number of sightings?
  • Who runs more, adults or juveniles?
  • Which squirrels are more willing to approach humans, and in which areas?
  • Do squirrels who approach humans eat more frequently?
  • We can also group squirrels by location, fur color, and age and see if they are connected to squirrels’ activities.

Even if you don’t find any correlation, it’s still funny to analyze, and may even make a future employer smile — and positive reactions are what we want from employers!

5. 80 Cereals

Difficulty: easy

What’s next? We can gain some insights into our diet and health by analyzing a dataset of 80 cereal types from seven manufacturers containing the data about the content in nutrient groups (like fat or proteins) and elements (like potassium). What questions can we answer?

  • What is the average content of each nutrient and element per manufacturer?
  • What is the number of calories per ounce for each product?
  • What is the average rating per manufacturer? Is it somehow connected to average nutrient content?
  • Which manufacturer possesses the best shelf location?
  • What is the nutritional value of each cereal according to protein, fat, and carbohydrate data.

6. World Happiness Report

Difficulty: intermediate

Let’s diverge from sales, finances, and funny data and analyze some data of more common interest. A nonprofit called Sustainable Development Solutions Network, created by the United Nations, publishes a yearly World Happiness Report, which contains rankings of national happiness based on respondent answers merged with quality of life metrics, such as education, crime rate, healthcare, etc. This Kaggle page contains the reports from 2015-2022. Note that the 2022 report doesn’t have all the columns.

Here are some questions we may answer:

  • What are the top 10 countries for each metric?
  • What are the most generous countries with the best social support?
  • Does the perception of generosity depend on GDP per capita?
  • What are the average happiness scores by region?
  • What are the countries with the highest healthy life expectancy and GDP per capita? Is there any connection?

This dataset will require you to use slightly more complex clauses like GROUP BY.

7. Museums

Difficulty: intermediate

At this point, we can start working with bigger and more complex tables, such as this dataset with data about museums across the USA. It has 25 columns with data ranging from the museum name to its location and revenue. With this dataset, we can wield the power of SQL and add another table to our database to relate different pieces of information as we would in a real-world database. For example, by using the data about US state population, we can join (merge) the two tables and answer the questions from the Inspiration section on Kaggle’s page. In addition, here are some more interesting questions:

  • Which state has the museums with the highest average revenue? Highest expenses?
  • How many museums belong to universities?
  • You can also categorize the states into Standard Federal Regions and answer the questions from the Inspiration section, such as which regions possess the most museums per capita.

This time you will have to use the JOIN clause to merge the tables.

8. YouTube Statistics

Difficulty: intermediate

As we learn more SQL and gain more experience, let’s keep working with more than one table. YouTube contains tons of data, and it’s always easy to ask questions about it. For example, these datasets have data about almost 2,000 YouTube videos and comments, such as the number of views and likes. We can use this to answer some basic questions:

  • What are the most commented-upon videos? Or the most liked?
  • How many total views does each category have? How many likes?
  • What are the most-liked comments?
  • What is the ratio of views/likes per video? Per each category?

But what will really expand your understanding of SQL is answering more elaborate questions:

  • What is the average sentiment score in each keyword category?
  • How many times do company names (i.e., Apple or Samsung) appear in each keyword category?

In this project you’ll have to determine, for example, how to seach for a specific keyword in each cell of a column. No clause suggestion here from me. Google it!

9. 100 Busiest Airports of the World

Difficulty: intermediate

International airports attract millions of people, from tourists to businessmen. An airport is a major infrastructure project and can breathe new life into an isolated area. Hence, analyzing data about them is extremely important to many professionals, even outside the aviation industry. For this project, we can use the dataset about the 100 busiest airports in the world and practice the following data-related skills:

  • Search for a country/city population dataset and compute the number of passengers per capita
  • Search for GDP data and relate it to the number of passengers
  • Download the data set about international tourism, pre-process it for the analysis, and relate this data to the number of passengers

You can see that the questions and procedures become slightly less specific. Don’t be discouraged; it’s part of the learning process. When you’re working on a dataset on your own, you’ll have to develop your own questions based on very little data. In your data science career, you’ll sometimes encounter pretty vague requests from your managers, and you’ll have to determine the exact questions yourself. It does mean that it is the correct way to do things, but this is what sometimes happens, and we have to deal with it.

It’s possible that you’ll have to use subqueries to investigate the relationships between different economic and demographic factors and airport occupancy.

10. World Development Indicators

Difficulty: hard

World Development Indicators is a collection of indicators gathered by the World Bank to assess the development of world areas. The data from Kaggle includes thousands of data points covering areas from agriculture and industry to trade and access to clean water. The data is available as separate csv files or as one database file (sqlite), which you can access using the sqlite3 package. The number of questions that you can answer will be enough for more than one project, and the questions you select will depend on your personal interests. Here are some examples:

  • Did access to clean water in urban and rural populations in different world regions increase or decrease over the last decades?
  • How did the area of arable land change? Did it have any impact on child employment in agriculture?
  • Do CO2 emissions increase with industry development?

The projects from these data may become your capstone project to share with future employers. Therefore, come prepared, and use your expanded skill-set to create the best projects possible!

Conclusion

SQL is an essential skill for any data practitioner, and learning by doing is one of the best ways to acquiring the necessary skills — and have a better chance of starting a career in data. In this article, we considered SQL projects that you can start as a beginner and steadily advance through to more elaborate databases. These projects cover the most basic SQL queries as well as the most advanced queries you can use to work with multiple tables.

To begin your SQL journey, Dataquest offers a course on SQL Fundamentals, which, in one month, will equip you with the essential basics of SQL, which you can immediately apply in a guided project at the end of the course to show off your skills to your future employers.

If you have any questions or doubts, feel free to ask me on LinkedIn or GitHub.

Artur Sannikov

About the author

Artur Sannikov

I am a Molecular Biology student at the University of Padua, Italy interested in bioinformatics and data analysis.

Learn data skills for free

Headshot Headshot

Join 1M+ learners

Try free courses