Kubernetes Services, Rolling Updates, and Namespaces
In our previous lesson, you saw Kubernetes automatically replace a crashed Pod. That's powerful, but it reveals a fundamental challenge: if Pods come and go with new IP addresses each time, how do other parts of your application find them reliably?
Today we'll solve this networking puzzle and tackle a related production challenge: how do you deploy updates without breaking your users? We'll work with a realistic data pipeline scenario where a PostgreSQL database needs to stay accessible while an ETL application gets updated.
By the end of this tutorial, you'll be able to:
- Explain why Services exist and how they provide stable networking for changing Pods
- Perform zero-downtime deployments using rolling updates
- Use Namespaces to separate different environments
- Understand when your applications need these production-grade features
The Moving Target Problem
Let's extend what you built in the previous tutorial to see why we need more than just Pods and Deployments. You deployed a PostgreSQL database and connected to it directly using kubectl exec. Now imagine you want to add a Python ETL script that connects to that database automatically every hour.
Here's the challenge: your ETL script needs to connect to PostgreSQL, but it doesn't know the database Pod's IP address. Even worse, that IP address changes every time Kubernetes restarts the database Pod.
You could try to hardcode the current Pod IP into your ETL script, but this breaks the moment Kubernetes replaces the Pod. You'd be back to manually updating configuration every time something restarts, which defeats the purpose of container orchestration.
This is where Services come in. A Service acts like a stable phone number for your application. Other Pods can always reach your database using the same address, even when the actual database Pod gets replaced.
How Services Work
Think of a Service as a reliable middleman. When your ETL script wants to talk to PostgreSQL, it doesn't need to hunt down the current Pod's IP address. Instead, it just asks for "postgres" and the Service handles finding and connecting to whichever PostgreSQL Pod is currently running.
When you create a Service for your PostgreSQL Deployment:
- Kubernetes assigns a stable IP address that never changes
- DNS gets configured so other Pods can use a friendly name instead of remembering IP addresses
- The Service tracks which Pods are healthy and ready to receive traffic
- When Pods change, the Service automatically updates its routing without any manual intervention
Your ETL script can connect to postgres:5432 (a DNS name) instead of an IP address. Kubernetes handles all the complexity of routing that request to whichever PostgreSQL Pod is currently running.
Building a Realistic Pipeline
Let's set up that data pipeline and see Services in action. We'll create both the database and the ETL application, then demonstrate how they communicate reliably even when Pods restart.
Start Your Environment
First, make sure you have a Kubernetes cluster running. A cluster is your pool of computing resources - in Minikube's case, it's a single-node cluster running on your local machine.
If you followed the previous tutorial, you can reuse that environment. If not, you'll need Minikube installed - follow the installation guide if needed.
Start your cluster:
minikube start
Notice in the startup logs how Minikube mentions components like 'kubelet' and 'apiserver' - these are the cluster components working together to create your computing pool.
Set up kubectl access using an alias (this mimics how you'll work with production clusters):
alias kubectl="minikube kubectl --"
Verify your cluster is working:
kubectl get nodes
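If the cluster is healthy, you'll see a single node in the Ready state. The exact age and version will differ on your machine, but the output looks roughly like this:
NAME       STATUS   ROLES           AGE   VERSION
minikube   Ready    control-plane   2m    v1.28.3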
Deploy PostgreSQL with a Service
Let's start by cleaning up any leftover resources from the previous tutorial and creating our database with proper Service networking:
kubectl delete deployment hello-postgres --ignore-not-found=true
Now create the PostgreSQL deployment:
kubectl create deployment postgres --image=postgres:13
kubectl set env deployment/postgres POSTGRES_DB=pipeline POSTGRES_USER=etl POSTGRES_PASSWORD=mysecretpassword
The key step is creating a Service that other applications can use to reach PostgreSQL:
kubectl expose deployment postgres --port=5432 --target-port=5432 --name=postgres
This creates a ClusterIP Service. ClusterIP is the default type of Service that provides internal networking within your cluster - other Pods can reach it, but nothing outside the cluster can access it directly. The --port=5432 flag means other applications connect on port 5432, and --target-port=5432 means traffic gets forwarded to port 5432 inside the PostgreSQL Pod.
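If you're curious which Pod IP that traffic is actually forwarded to, you can inspect the Endpoints object Kubernetes maintains for every Service. This step is optional, but it makes the routing less magical:
# Show the Service's ports, selector, and current endpoints
kubectl describe service postgres
# Show just the Pod IP(s) the Service currently routes to
kubectl get endpoints postgres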
Verify Service Networking
Let's verify that the Service is working. First, check what Kubernetes created:
kubectl get services
You'll see output like:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 1h
postgres ClusterIP 10.96.123.45 <none> 5432/TCP 30s
The postgres Service has its own stable IP address (10.96.123.45 in this example). This IP never changes, even when the underlying PostgreSQL Pod restarts.
The Service is now ready for other applications to use. Any Pod in your cluster can reach PostgreSQL using the hostname postgres, regardless of which specific Pod is running the database. We'll see this in action when we create the ETL application.
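If you'd like to test the DNS name right away, one quick option is a throwaway Pod that resolves the name and exits (busybox is used here simply as an example of a small image that ships nslookup):
# Run a temporary Pod, resolve the Service name, then clean up automatically
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup postgres
The reply should show the same ClusterIP you saw in kubectl get services.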
Create the ETL Application
Now let's create an ETL application that connects to our database. We'll use a modified version of the ETL script from our Docker Compose tutorials - it's the same database connection logic, but adapted to run continuously in Kubernetes.
First, clone the tutorial repository and navigate to the ETL application:
git clone https://github.com/dataquestio/tutorials.git
cd tutorials/kubernetes-services-starter
This folder contains two important files:
- app.py: the ETL script that connects to PostgreSQL
- Dockerfile: instructions for packaging the script in a container
Build the ETL image in Minikube's Docker environment so Kubernetes can run it directly:
# Point your Docker CLI to Minikube's Docker daemon
eval $(minikube -p minikube docker-env)
# Build the image
docker build -t etl-app:v1 .
Using a version tag (v1) instead of latest makes it easier to demonstrate rolling updates later.
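You can confirm the image is now available to Minikube's Docker daemon before referencing it in a Deployment:
# List local images in the etl-app repository (you should see the v1 tag)
docker images etl-app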
Now, create the Deployment and set environment variables so the ETL app can connect to the postgres Service:
kubectl create deployment etl-app --image=etl-app:v1
kubectl set env deployment/etl-app \
DB_HOST=postgres \
DB_PORT=5432 \
DB_USER=etl \
DB_PASSWORD=mysecretpassword \
DB_NAME=pipeline
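To double-check that the variables were recorded on the Deployment, you can list them without entering a container:
# Print the environment variables defined on the etl-app Deployment
kubectl set env deployment/etl-app --list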
Scale the deployment to 2 replicas:
kubectl scale deployment etl-app --replicas=2
Check that everything is running:
kubectl get pods
You should see the PostgreSQL Pod and two ETL application Pods all in "Running" status.
Verify the Service Connection
Let's quickly verify that our ETL application can reach the database using the Service name by running the ETL script manually:
kubectl exec deployment/etl-app -- python3 app.py
You should see output showing the ETL script successfully connecting to PostgreSQL using postgres as the hostname. This demonstrates the Service providing stable networking - the ETL Pod found the database without needing to know its specific IP address.
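To drive the point home, you can delete the database Pod and repeat the check. The Deployment replaces the Pod, the Service updates its routing, and the ETL script still connects by name. (This relies on the app=postgres label that kubectl create deployment adds by default.)
# Delete the current PostgreSQL Pod; its Deployment immediately creates a replacement
kubectl delete pod -l app=postgres
# Once the replacement shows Running, re-run the connectivity check
kubectl get pods -l app=postgres
kubectl exec deployment/etl-app -- python3 app.py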
Zero-Downtime Updates with Rolling Updates
Here's where Kubernetes really shines in production environments. Let's say you need to deploy a new version of your ETL application. In traditional deployment approaches, you might need to stop all instances, update them, and restart everything. This creates downtime.
Kubernetes rolling updates solve this by gradually replacing old Pods with new ones, ensuring some instances are always running to handle requests.
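How gradual the replacement is can be tuned through the Deployment's rollout strategy. As a sketch (the exact numbers are up to you), this tells Kubernetes it may add at most one extra Pod during a rollout and must never drop below the desired replica count:
# Optional: tighten the rolling update strategy for the etl-app Deployment
kubectl patch deployment etl-app -p '{"spec":{"strategy":{"type":"RollingUpdate","rollingUpdate":{"maxSurge":1,"maxUnavailable":0}}}}'
You don't need to run this for the demo below; the defaults already keep instances running throughout the update.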
Watch a Rolling Update in Action
First, let's set up a way to monitor what's happening. Open a second terminal and run:
# Make sure you have the kubectl alias in this terminal too
alias kubectl="minikube kubectl --"
# Watch the logs from all ETL Pods
kubectl logs -f -l app=etl-app --all-containers --tail=50
Leave this running. Back in your main terminal, rebuild a new version and tell Kubernetes to use it:
# Ensure your Docker CLI is still pointed at Minikube
eval $(minikube -p minikube docker-env)
# Build v2 of the image
docker build -t etl-app:v2 .
# Trigger the rolling update to v2
kubectl set image deployment/etl-app etl-app=etl-app:v2
Watch what happens in both terminals:
- In the logs terminal: You'll see some Pods stopping and new ones starting with the updated image
- In the main terminal: Run kubectl get pods -w to watch Pods being created and terminated in real-time
The -w flag keeps the command running and shows changes as they happen. You'll see something like:
NAME READY STATUS RESTARTS AGE
etl-app-5d8c7b4f6d-abc123 1/1 Running 0 2m
etl-app-5d8c7b4f6d-def456 1/1 Running 0 2m
etl-app-7f9a8c5e2b-ghi789 1/1 Running 0 10s # New Pod
etl-app-5d8c7b4f6d-abc123 1/1 Terminating 0 2m # Old Pod stopping
Press Ctrl+C to stop watching when the update completes.
What Just Happened?
Kubernetes performed a rolling update with these steps:
- Created new Pods with the updated image tag (v2)
- Waited for new Pods to be ready and healthy
- Terminated old Pods one at a time
- Repeated until all Pods were updated
At no point were all your application instances offline. If this were a web service behind a Service, users would never notice the deployment happening.
You can check the rollout status and history:
kubectl rollout status deployment/etl-app
kubectl rollout history deployment/etl-app
The history shows your deployments over time, which is useful for tracking what changed and when.
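If a new version misbehaves, that same history is what lets you roll back. You don't need to run this now, but for reference:
# Revert the Deployment to its previous revision
kubectl rollout undo deployment/etl-app
# Or target a specific revision from the history
kubectl rollout undo deployment/etl-app --to-revision=1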
Environment Separation with Namespaces
So far, everything we've created lives in Kubernetes' "default" namespace. In real projects, you typically want to separate different environments (development, staging, production, CI/CD) or different teams' work. Namespaces provide this isolation.
Think of Namespaces as separate workspaces within the same cluster. Resources in different Namespaces can't directly see each other, which prevents accidental conflicts and makes permissions easier to manage.
This solves real problems you encounter as applications grow. Imagine you're developing a new feature for your ETL pipeline - you want to test it without risking your production data or accidentally breaking the version that's currently processing real business data. With Namespaces, you can run a complete copy of your entire pipeline (database, ETL scripts, everything) in a "staging" environment that's completely isolated from production. You can experiment freely, knowing that crashes or bad data in staging won't affect the production system that your users depend on.
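Your cluster already ships with a few Namespaces; you can list them before creating your own:
# List existing Namespaces (default, kube-system, and so on)
kubectl get namespaces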
Create a Staging Environment
Let's create a completely separate staging environment for our pipeline:
kubectl create namespace staging
Now deploy the same applications into the staging namespace by adding -n staging to your commands:
# Deploy PostgreSQL in staging
kubectl create deployment postgres --image=postgres:13 -n staging
kubectl set env deployment/postgres \
POSTGRES_DB=pipeline POSTGRES_USER=etl POSTGRES_PASSWORD=stagingpassword -n staging
kubectl expose deployment postgres --port=5432 --target-port=5432 --name=postgres -n staging
# Deploy ETL app in staging (use the image you built earlier)
kubectl create deployment etl-app --image=etl-app:v1 -n staging
kubectl set env deployment/etl-app \
DB_HOST=postgres DB_PORT=5432 DB_USER=etl DB_PASSWORD=stagingpassword DB_NAME=pipeline -n staging
kubectl scale deployment etl-app --replicas=2 -n staging
See the Separation in Action
Now you have two complete environments. Compare them:
# Production environment (default namespace)
kubectl get pods
# Staging environment
kubectl get pods -n staging
# All resources in staging
kubectl get all -n staging
# See all Pods across all namespaces at once
kubectl get pods --all-namespaces
Notice that each environment has its own set of Pods, Services, and Deployments. They're completely isolated from each other.
Cross-Namespace DNS
Within the staging namespace, applications still connect to postgres:5432 just like in production. But if you needed an application in staging to connect to a Service in production, you'd use the full DNS name: postgres.default.svc.cluster.local.
The pattern is: <service-name>.<namespace>.svc.<cluster-domain>
Here, svc is a fixed keyword that stands for "service", and cluster.local is the default cluster domain. This reveals an important concept: even though you're running Minikube locally, you're working with a real Kubernetes cluster - it just happens to be a single-node cluster running on your machine. In production, you'd have multiple nodes, but the DNS structure works exactly the same way.
This means:
- postgres reaches the postgres Service in the current namespace
- postgres.staging.svc reaches the postgres Service in the staging namespace from anywhere
- postgres.default.svc reaches the postgres Service in the default namespace from anywhere
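If you want to see cross-namespace resolution for yourself, one option is a temporary Pod in staging that resolves the production Service's full name (busybox again serves only as an example image that includes nslookup):
# From staging, resolve the postgres Service that lives in the default namespace
kubectl run dns-check -n staging --rm -it --restart=Never --image=busybox:1.36 -- nslookup postgres.default.svc.cluster.local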
Understanding Clusters and Scheduling
Before we wrap up, let's briefly discuss some concepts that are important to understand conceptually, even though you won't work with them directly in local development.
Clusters and Node Pools
As a quick refresher, a Kubernetes cluster is a set of physical or virtual machines that work together to run containerized applications. It's made up of a control plane that manages the cluster and worker nodes that run the workloads. In production Kubernetes environments (like Google GKE or Amazon EKS), these nodes are often grouped into node pools with different characteristics:
- Standard pool: General-purpose nodes for most applications
- High-memory pool: Nodes with lots of RAM for data processing jobs
- GPU pool: Nodes with graphics cards for machine learning workloads
- Spot/preemptible pool: Cheaper nodes that can be interrupted, good for fault-tolerant batch jobs
Pod Scheduling
Kubernetes automatically decides which node should run each Pod based on:
- Resource requirements: CPU and memory requests/limits
- Node capacity: Available resources on each node
- Affinity rules: Preferences about which nodes to use or avoid
- Constraints: Requirements like "only run on SSD-equipped nodes"
You rarely need to think about this in local development with Minikube (which only has one node), but it becomes important when running production workloads across multiple machines.
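Even so, you can already supply the scheduler's most important input - resource requests - from the command line. A minimal sketch with arbitrary numbers (running it triggers another rolling update, which is harmless here):
# Declare what each etl-app Pod requests (reserved) and is limited to (hard cap)
kubectl set resources deployment etl-app --requests=cpu=100m,memory=128Mi --limits=cpu=500m,memory=256Mi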
Optional: See Scheduling in Action
If you're curious, you can see a simple example of how scheduling works even in your single-node Minikube cluster:
# "Cordon" your node, marking it as unschedulable for new Pods
kubectl cordon node/minikube
# Try to create a new Pod
kubectl run test-scheduling --image=nginx
# Check if it's stuck in Pending status
kubectl get pods test-scheduling
You should see the Pod stuck in "Pending" status because there are no available nodes to schedule it on.
# "Uncordon" the node to make it schedulable again
kubectl uncordon node/minikube
# The Pod should now get scheduled and start running
kubectl get pods test-scheduling
Clean up the test Pod:
kubectl delete pod test-scheduling
This demonstrates Kubernetes' scheduling system, though you'll mostly encounter this when working with multi-node production clusters.
Cleaning Up
When you're done experimenting:
# Clean up default namespace
kubectl delete deployment postgres etl-app
kubectl delete service postgres
# Clean up staging namespace
kubectl delete namespace staging
# Or stop Minikube entirely
minikube stop
Key Takeaways
You've now experienced three fundamental production capabilities:
Services solve the moving target problem. When Pods restart and get new IP addresses, Services provide stable networking that applications can depend on. Your ETL script connects to postgres:5432 regardless of which specific Pod is running the database.
Rolling updates enable zero-downtime deployments. Instead of stopping everything to deploy updates, Kubernetes gradually replaces old Pods with new ones. This keeps your applications available during deployments.
Namespaces provide environment separation. You can run multiple copies of your entire stack (development, staging, production) in the same cluster while keeping them completely isolated.
These patterns scale from simple applications to complex microservices architectures. A web application with a database uses the same Service networking concepts, just with more components. A data pipeline with multiple processing stages uses the same rolling update strategy for each component.
Next, you'll learn about configuration management with ConfigMaps and Secrets, persistent storage for stateful applications, and resource management to ensure your applications get the CPU and memory they need.