February 21, 2022

Tutorial: Connect, Install, and Query PostgreSQL in Python

Databases are everywhere — in your phone, on your computer, and behind your beloved applications. But what's a database worth if you can't query data from it? In this article, we'll show you examples of querying any PostgreSQL-based database from your Python code. First, you'll gain a high-level understanding of PostgreSQL and database connectors. Later in the article, we'll dive deep into actual code examples and other tips on how to use Python to query your database. Let's get started!

What Is PostgreSQL?

PostgreSQL is one of the most popular open-source relational databases. Companies of all sizes and developers around the globe use it. According to DB-Engines, PostgreSQL ranks fourth among the most-popular databases in the world, and it has an upward trend. This isn't a surprise, as you can find PostgreSQL databases behind many web and mobile applications — and even analytical software.

PostgreSQL also has a rich ecosystem with a vast number of tools and extensions that integrate well with the core database. For these reasons, PostgreSQL makes a great choice whether you need a transactional or an analytical database, or wish to build your own custom database solution.

Now that you have an idea of what PostgreSQL is, let's cover how you can use Python to connect to the database.

How to Connect to PostgreSQL from Python?

In order to connect to a PostgreSQL database instance from your Python script, you need to use a database connector library. In Python, you have several options that you can choose from. Some libraries that are written in pure Python include pg8000 and py-postgresql. But the most popular and widely-known one is Psycopg2.

For the rest of this article, you'll see examples of using Psycopg2 to connect to the database and query data.

But first, what is Psycopg2?

What Is Psycopg2?

Psycopg2 is the most widely known and used PostgreSQL connector library in Python. The Psycopg2 library implements the Python DB API 2.0 specifications and (under the hood) uses the C programming language as a wrapper around the libpq PostgreSQL library. Due to its C implementation, Psycopg2 is very fast and efficient.

You can use Psycopg2 to fetch one or more rows from the database based on a SQL query. If you want to insert some data into the database, that's also possible with this library — with multiple options for single or batch inserting.

A completely rewritten implementation of the Python PostgreSQL connector is currently under active development: Psycopg3. Psycopg3 provides the same features as Psycopg2, alongside additional capabilities like asynchronous communication, data ingestion using COPY, and more. It strives to make better use of the functionalities provided by new Python and PostgreSQL generations.

How to Install Psycopg2

To use Psycopg2, you need to install it first. The easiest way is to use pip. As with other Python projects, it's recommended to use a virtual environment for installing the library:

virtualenv env && source env/bin/activate
pip install psycopg2-binary

This snippet will install the Psycopg2 library into your Python virtual environment with all its dependencies. After this, you are able to import the psycopg2 module into your Python code and use it.

Should You Use Psycopg2 or Psycopg2-binary?

As you might have noticed, we installed the psycopg2-binary package, which is the binary version of Psycopg2. What this means is that this version of the library comes with its own version of C libraries, namely liboq and libssl. For Psycopg2 beginners and most users, this version is perfectly fine. On the other hand, if you wish Psycopg2 to use your system's C libraries, you need to build Psycopg2 from source:

pip install psycopg2

Psycopg2, being a database connector library, is essential in many serverless architecture-based data pipelines. Let's quickly cover how you can install Psycopg2 in a serverless environment.

How to Install Psycopg2 in a Serverless Environment

Psycopg2 is often used in AWS Lambda functions or in other serverless environments. Because Psycopg2 depends on a couple of PostgreSQL libraries — which aren't necessarily available in a serverless environment, or not easily — it's recommended to use a pre-compiled version of the library, for example, this one.

You can also find other pre-compiled versions of Psycopg2 by searching for "psycopg2 serverless" or "psycopg2 aws lambda" keywords.

Finally, after installing Psycopg2 in your environment, here's a quick tutorial on how to query data from your database.

How to Query PostgreSQL Using Python

In order to query your database, you need to use two Psycopg2 objects:

First, you need to import the Psycopg2 module and create a connection object:

import psycopg2

conn = psycopg2.connect(database="db_name",
                        host="db_host",
                        user="db_user",
                        password="db_pass",
                        port="db_port")

As you can see, there are quite a few arguments you need to define in order to connect to the database. Here's a quick summary of what each of these arguments means:

  • database: the name of the database that you want to connect to (you can only connect to one database with one connection object)
  • host: this is probably an IP address or a URL that points to the database server (e.g., xyz.example.com)
  • user: the name of the PostgreSQL user
  • password: the matching password for that user
  • port: the port that the PostgreSQL server uses (usually 5432 if the server is hosted locally but can be other)

If you submitted the correct database credentials, you get a live database connection object that you can use to create a cursor object.

A cursor object will help you execute any queries on the database and retrieve data. Here's how to create a cursor object:

cursor = conn.cursor()

Now let's query the database using the cursor we just created:

cursor.execute("SELECT * FROM example_table")

We use the execute() function and submit a query string as its argument. This query that we submitted will be run against the database. It's important to note that after the query executes, you still need to use one of the Psycopg2 functions to retrieve data rows:

Let's see how each of them works!

Example: fetchone()

The most basic way to fetch data from your database is to use the fetchone() function. This function will return exactly one row — the first row — after executing the SQL query.

Here's an example:

print(cursor.fetchone())
(1, 'Budapest', 'newly-built', 'after 2011', 30, 1)

In this example, fetchone() returns one row from the database in the form of a tuple where the order of your data in the tuple will be based on the order of the columns you specified in the query.

Because of this, it's important to make sure you specify the order of columns properly when you create the query string so you know which data is which in the tuple.

Example: fetchall()

What if you need more than just one row from your database? What if you need 10, 100, 1000, or more rows? You can use the fetchall() Psycopg2 function, which works the same way as fetchone() except that it returns not just one row as a result but all of them.

print(cursor.fetchall())
[(1, 'Budapest', 'newly-built', 'after 2011', 30, 1),
 (2, 'Budapest', 'newly-built', 'after 2011', 45, 2),
 (3, 'Budapest', 'newly-built', 'after 2011', 32, 2),
 (4, 'Budapest', 'newly-built', 'after 2011', 48, 2),
 (5, 'Budapest', 'newly-built', 'after 2011', 49, 2),
 (6, 'Budapest', 'newly-built', 'after 2011', 49, 2),
 (7, 'Budapest', 'newly-built', 'after 2011', 71, 3),
 (8, 'Budapest', 'newly-built', 'after 2011', 50, 2),
 (9, 'Budapest', 'newly-built', 'after 2011', 50, 2),
 (10, 'Budapest', 'newly-built', 'after 2011', 57, 3)]
[...]

Notice how we get more rows back, not just one.

Example: fetchmany()

With fetchmany(), you have another option to retrieve multiple records from the database and have more control over the exact amount of rows retrieved.

print(cursor.fetchmany(size=5))
[(1, 'Budapest', 'newly-built', 'after 2011', 30, 1),
 (2, 'Budapest', 'newly-built', 'after 2011', 45, 2),
 (3, 'Budapest', 'newly-built', 'after 2011', 32, 2),
 (4, 'Budapest', 'newly-built', 'after 2011', 48, 2),
 (5, 'Budapest', 'newly-built', 'after 2011', 49, 2)]

Here we only receive five rows because we set the size argument to 5. This function gives you more control on a code-level over how many rows to return from your database table.

Wrapping Up

After you're finished querying your database and using the connection object in your Python code, make sure to always close the connection using conn.close().

I hope this article was useful to help you get started with Python and PostgreSQL. I highly suggest reading the Psycopg2 documentation to learn more about different data retrieval methods, cursor classes, and more.

Happy coding!

Dataquest

About the author

Dataquest

Dataquest teaches through challenging exercises and projects instead of video lectures. It's the most effective way to learn the skills you need to build your data career.