Kubernetes Services, Rolling Updates, and Namespaces
In our previous lesson, you saw Kubernetes automatically replace a crashed Pod. That's powerful, but it reveals a fundamental challenge: if Pods come and go with new IP addresses each time, how do other parts of your application find them reliably?
Today we'll solve this networking puzzle and tackle a related production challenge: how do you deploy updates without breaking your users? We'll work with a realistic data pipeline scenario where a PostgreSQL database needs to stay accessible while an ETL application gets updated.
By the end of this tutorial, you'll be able to:
- Explain why Services exist and how they provide stable networking for changing Pods
- Perform zero-downtime deployments using rolling updates
- Use Namespaces to separate different environments
- Understand when your applications need these production-grade features
The Moving Target Problem
Let's extend what you built in the previous tutorial to see why we need more than just Pods and Deployments. You deployed a PostgreSQL database and connected to it directly using kubectl exec. Now imagine you want to add a Python ETL script that connects to that database automatically every hour.
Here's the challenge: your ETL script needs to connect to PostgreSQL, but it doesn't know the database Pod's IP address. Even worse, that IP address changes every time Kubernetes restarts the database Pod.
You could try to hardcode the current Pod IP into your ETL script, but this breaks the moment Kubernetes replaces the Pod. You'd be back to manually updating configuration every time something restarts, which defeats the purpose of container orchestration.
This is where Services come in. A Service acts like a stable phone number for your application. Other Pods can always reach your database using the same address, even when the actual database Pod gets replaced.
How Services Work
Think of a Service as a reliable middleman. When your ETL script wants to talk to PostgreSQL, it doesn't need to hunt down the current Pod's IP address. Instead, it just asks for "postgres" and the Service handles finding and connecting to whichever PostgreSQL Pod is currently running.
When you create a Service for your PostgreSQL Deployment:
- Kubernetes assigns a stable IP address that never changes
- DNS gets configured so other Pods can use a friendly name instead of remembering IP addresses
- The Service tracks which Pods are healthy and ready to receive traffic
- When Pods change, the Service automatically updates its routing without any manual intervention
Your ETL script can connect to postgres:5432 (a DNS name) instead of an IP address. Kubernetes handles all the complexity of routing that request to whichever PostgreSQL Pod is currently running.
Building a Realistic Pipeline
Let's set up that data pipeline and see Services in action. We'll create both the database and the ETL application, then demonstrate how they communicate reliably even when Pods restart.
Start Your Environment
First, make sure you have a Kubernetes cluster running. A cluster is your pool of computing resources - in Minikube's case, it's a single-node cluster running on your local machine.
If you followed the previous tutorial, you can reuse that environment. If not, you'll need Minikube installed - follow the installation guide if needed.
Start your cluster:
minikube start
Notice in the startup logs how Minikube mentions components like 'kubelet' and 'apiserver' - these are the cluster components working together to create your computing pool.
Set up kubectl access using an alias (this mimics how you'll work with production clusters):
alias kubectl="minikube kubectl --"
Verify your cluster is working:
kubectl get nodes
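If the cluster is healthy, you'll see a single node in the Ready state. The exact age and version will differ on your machine, but the output looks roughly like this:
NAME       STATUS   ROLES           AGE   VERSION
minikube   Ready    control-plane   2m    v1.28.3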
Deploy PostgreSQL with a Service
Let's start by cleaning up any leftover resources from the previous tutorial and creating our database with proper Service networking:
kubectl delete deployment hello-postgres --ignore-not-found=true
Now create the PostgreSQL deployment:
kubectl create deployment postgres --image=postgres:13
kubectl set env deployment/postgres POSTGRES_DB=pipeline POSTGRES_USER=etl POSTGRES_PASSWORD=mysecretpassword
The key step is creating a Service that other applications can use to reach PostgreSQL:
kubectl expose deployment postgres --port=5432 --target-port=5432 --name=postgres
This creates a ClusterIP Service. ClusterIP is the default type of Service that provides internal networking within your cluster - other Pods can reach it, but nothing outside the cluster can access it directly. The --port=5432 flag means other applications connect on port 5432, and --target-port=5432 means traffic gets forwarded to port 5432 inside the PostgreSQL Pod.
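If you're curious which Pod IP that traffic is actually forwarded to, you can inspect the Endpoints object Kubernetes maintains for every Service. This step is optional, but it makes the routing less magical:
# Show the Service's ports, selector, and current endpoints
kubectl describe service postgres
# Show just the Pod IP(s) the Service currently routes to
kubectl get endpoints postgres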
Verify Service Networking
Let's verify that the Service is working. First, check what Kubernetes created:
kubectl get services
You'll see output like:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 1h
postgres ClusterIP 10.96.123.45 <none> 5432/TCP 30s
The postgres Service has its own stable IP address (10.96.123.45 in this example). This IP never changes, even when the underlying PostgreSQL Pod restarts.
The Service is now ready for other applications to use. Any Pod in your cluster can reach PostgreSQL using the hostname postgres, regardless of which specific Pod is running the database. We'll see this in action when we create the ETL application.
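If you'd like to test the DNS name right away, one quick option is a throwaway Pod that resolves the name and exits (busybox is used here simply as an example of a small image that ships nslookup):
# Run a temporary Pod, resolve the Service name, then clean up automatically
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup postgres
The reply should show the same ClusterIP you saw in kubectl get services.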
Create the ETL Application
Now let's create an ETL application that connects to our database. We'll use a modified version of the ETL script from our Docker Compose tutorials - it's the same database connection logic, but adapted to run continuously in Kubernetes.
First, clone the tutorial repository and navigate to the ETL application:
git clone https://github.com/dataquestio/tutorials.git
cd tutorials/kubernetes-services-starter
This folder contains two important files:
- app.py: the ETL script that connects to PostgreSQL
- Dockerfile: instructions for packaging the script in a container
Build the ETL image in Minikube's Docker environment so Kubernetes can run it directly:
# Point your Docker CLI to Minikube's Docker daemon
eval $(minikube -p minikube docker-env)
# Build the image
docker build -t etl-app:v1 .
Using a version tag (v1) instead of latest makes it easier to demonstrate rolling updates later.
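You can confirm the image is now available to Minikube's Docker daemon before referencing it in a Deployment:
# List local images in the etl-app repository (you should see the v1 tag)
docker images etl-app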
Now, create the Deployment and set environment variables so the ETL app can connect to the postgres Service:
kubectl create deployment etl-app --image=etl-app:v1
kubectl set env deployment/etl-app \
DB_HOST=postgres \
DB_PORT=5432 \
DB_USER=etl \
DB_PASSWORD=mysecretpassword \
DB_NAME=pipeline
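To double-check that the variables were recorded on the Deployment, you can list them without entering a container:
# Print the environment variables defined on the etl-app Deployment
kubectl set env deployment/etl-app --list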
Scale the deployment to 2 replicas:
kubectl scale deployment etl-app --replicas=2
Check that everything is running:
kubectl get pods
You should see the PostgreSQL Pod and two ETL application Pods all in "Running" status.
Verify the Service Connection
Let's quickly verify that our ETL application can reach the database using the Service name by running the ETL script manually:
kubectl exec deployment/etl-app -- python3 app.py
You should see output showing the ETL script successfully connecting to PostgreSQL using postgres as the hostname. This demonstrates the Service providing stable networking - the ETL Pod found the database without needing to know its specific IP address.
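To drive the point home, you can delete the database Pod and repeat the check. The Deployment replaces the Pod, the Service updates its routing, and the ETL script still connects by name. (This relies on the app=postgres label that kubectl create deployment adds by default.)
# Delete the current PostgreSQL Pod; its Deployment immediately creates a replacement
kubectl delete pod -l app=postgres
# Once the replacement shows Running, re-run the connectivity check
kubectl get pods -l app=postgres
kubectl exec deployment/etl-app -- python3 app.py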
Zero-Downtime Updates with Rolling Updates
Here's where Kubernetes really shines in production environments. Let's say you need to deploy a new version of your ETL application. In traditional deployment approaches, you might need to stop all instances, update them, and restart everything. This creates downtime.
Kubernetes rolling updates solve this by gradually replacing old Pods with new ones, ensuring some instances are always running to handle requests.
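How gradual the replacement is can be tuned through the Deployment's rollout strategy. As a sketch (the exact numbers are up to you), this tells Kubernetes it may add at most one extra Pod during a rollout and must never drop below the desired replica count:
# Optional: tighten the rolling update strategy for the etl-app Deployment
kubectl patch deployment etl-app -p '{"spec":{"strategy":{"type":"RollingUpdate","rollingUpdate":{"maxSurge":1,"maxUnavailable":0}}}}'
You don't need to run this for the demo below; the defaults already keep instances running throughout the update.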
Watch a Rolling Update in Action
First, let's set up a way to monitor what's happening. Open a second terminal and run:
# Make sure you have the kubectl alias in this terminal too
alias kubectl="minikube kubectl --"
# Watch the logs from all ETL Pods
kubectl logs -f -l app=etl-app --all-containers --tail=50
Leave this running. Back in your main terminal, rebuild a new version and tell Kubernetes to use it:
# Ensure your Docker CLI is still pointed at Minikube
eval $(minikube -p minikube docker-env)
# Build v2 of the image
docker build -t etl-app:v2 .
# Trigger the rolling update to v2
kubectl set image deployment/etl-app etl-app=etl-app:v2
Watch what happens in both terminals:
- In the logs terminal: You'll see some Pods stopping and new ones starting with the updated image
- In the main terminal: Run kubectl get pods -w to watch Pods being created and terminated in real-time
The -w flag keeps the command running and shows changes as they happen. You'll see something like:
NAME READY STATUS RESTARTS AGE
etl-app-5d8c7b4f6d-abc123 1/1 Running 0 2m
etl-app-5d8c7b4f6d-def456 1/1 Running 0 2m
etl-app-7f9a8c5e2b-ghi789 1/1 Running 0 10s # New Pod
etl-app-5d8c7b4f6d-abc123 1/1 Terminating 0 2m # Old Pod stopping
Press Ctrl+C to stop watching when the update completes.
What Just Happened?
Kubernetes performed a rolling update with these steps:
- Created new Pods with the updated image tag (v2)
- Waited for new Pods to be ready and healthy
- Terminated old Pods one at a time
- Repeated until all Pods were updated
At no point were all your application instances offline. If this were a web service behind a Service, users would never notice the deployment happening.
You can check the rollout status and history:
kubectl rollout status deployment/etl-app
kubectl rollout history deployment/etl-app
The history shows your deployments over time, which is useful for tracking what changed and when.
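If a new version misbehaves, that same history is what lets you roll back. You don't need to run this now, but for reference:
# Revert the Deployment to its previous revision
kubectl rollout undo deployment/etl-app
# Or target a specific revision from the history
kubectl rollout undo deployment/etl-app --to-revision=1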
Environment Separation with Namespaces
So far, everything we've created lives in Kubernetes' "default" namespace. In real projects, you typically want to separate different environments (development, staging, production, CI/CD) or different teams' work. Namespaces provide this isolation.
Think of Namespaces as separate workspaces within the same cluster. Resources in different Namespaces can't directly see each other, which prevents accidental conflicts and makes permissions easier to manage.
This solves real problems you encounter as applications grow. Imagine you're developing a new feature for your ETL pipeline - you want to test it without risking your production data or accidentally breaking the version that's currently processing real business data. With Namespaces, you can run a complete copy of your entire pipeline (database, ETL scripts, everything) in a "staging" environment that's completely isolated from production. You can experiment freely, knowing that crashes or bad data in staging won't affect the production system that your users depend on.
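Your cluster already ships with a few Namespaces; you can list them before creating your own:
# List existing Namespaces (default, kube-system, and so on)
kubectl get namespaces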
Create a Staging Environment
Let's create a completely separate staging environment for our pipeline:
kubectl create namespace staging
Now deploy the same applications into the staging namespace by adding -n staging to your commands:
# Deploy PostgreSQL in staging
kubectl create deployment postgres --image=postgres:13 -n staging
kubectl set env deployment/postgres \
POSTGRES_DB=pipeline POSTGRES_USER=etl POSTGRES_PASSWORD=stagingpassword -n staging
kubectl expose deployment postgres --port=5432 --target-port=5432 --name=postgres -n staging
# Deploy ETL app in staging (use the image you built earlier)
kubectl create deployment etl-app --image=etl-app:v1 -n staging
kubectl set env deployment/etl-app \
DB_HOST=postgres DB_PORT=5432 DB_USER=etl DB_PASSWORD=stagingpassword DB_NAME=pipeline -n staging
kubectl scale deployment etl-app --replicas=2 -n staging
See the Separation in Action
Now you have two complete environments. Compare them:
# Production environment (default namespace)
kubectl get pods
# Staging environment
kubectl get pods -n staging
# All resources in staging
kubectl get all -n staging
# See all Pods across all namespaces at once
kubectl get pods --all-namespaces
Notice that each environment has its own set of Pods, Services, and Deployments. They're completely isolated from each other.
Cross-Namespace DNS
Within the staging namespace, applications still connect to postgres:5432 just like in production. But if you needed an application in staging to connect to a Service in production, you'd use the full DNS name: postgres.default.svc.cluster.local.
The pattern is: <service-name>.<namespace>.svc.<cluster-domain>
Here, svc is a fixed keyword that stands for "service", and cluster.local is the default cluster domain. This reveals an important concept: even though you're running Minikube locally, you're working with a real Kubernetes cluster - it just happens to be a single-node cluster running on your machine. In production, you'd have multiple nodes, but the DNS structure works exactly the same way.
This means:
- postgres reaches the postgres Service in the current namespace
- postgres.staging.svc reaches the postgres Service in the staging namespace from anywhere
- postgres.default.svc reaches the postgres Service in the default namespace from anywhere
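If you want to see cross-namespace resolution for yourself, one option is a temporary Pod in staging that resolves the production Service's full name (busybox again serves only as an example image that includes nslookup):
# From staging, resolve the postgres Service that lives in the default namespace
kubectl run dns-check -n staging --rm -it --restart=Never --image=busybox:1.36 -- nslookup postgres.default.svc.cluster.local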
Understanding Clusters and Scheduling
Before we wrap up, let's briefly discuss some concepts that are important to understand conceptually, even though you won't work with them directly in local development.
Clusters and Node Pools
As a quick refresher, a Kubernetes cluster is a set of physical or virtual machines that work together to run containerized applications. It's made up of a control plane that manages the cluster and worker nodes that run the workloads. In production Kubernetes environments (like Google GKE or Amazon EKS), these nodes are often grouped into node pools with different characteristics:
- Standard pool: General-purpose nodes for most applications
- High-memory pool: Nodes with lots of RAM for data processing jobs
- GPU pool: Nodes with graphics cards for machine learning workloads
- Spot/preemptible pool: Cheaper nodes that can be interrupted, good for fault-tolerant batch jobs
Pod Scheduling
Kubernetes automatically decides which node should run each Pod based on:
- Resource requirements: CPU and memory requests/limits
- Node capacity: Available resources on each node
- Affinity rules: Preferences about which nodes to use or avoid
- Constraints: Requirements like "only run on SSD-equipped nodes"
You rarely need to think about this in local development with Minikube (which only has one node), but it becomes important when running production workloads across multiple machines.
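Even so, you can already supply the scheduler's most important input - resource requests - from the command line. A minimal sketch with arbitrary numbers (running it triggers another rolling update, which is harmless here):
# Declare what each etl-app Pod requests (reserved) and is limited to (hard cap)
kubectl set resources deployment etl-app --requests=cpu=100m,memory=128Mi --limits=cpu=500m,memory=256Mi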
Optional: See Scheduling in Action
If you're curious, you can see a simple example of how scheduling works even in your single-node Minikube cluster:
# "Cordon" your node, marking it as unschedulable for new Pods
kubectl cordon node/minikube
# Try to create a new Pod
kubectl run test-scheduling --image=nginx
# Check if it's stuck in Pending status
kubectl get pods test-scheduling
You should see the Pod stuck in "Pending" status because there are no available nodes to schedule it on.
# "Uncordon" the node to make it schedulable again
kubectl uncordon node/minikube
# The Pod should now get scheduled and start running
kubectl get pods test-scheduling
Clean up the test Pod:
kubectl delete pod test-scheduling
This demonstrates Kubernetes' scheduling system, though you'll mostly encounter this when working with multi-node production clusters.
Cleaning Up
When you're done experimenting:
# Clean up default namespace
kubectl delete deployment postgres etl-app
kubectl delete service postgres
# Clean up staging namespace
kubectl delete namespace staging
# Or stop Minikube entirely
minikube stop
Key Takeaways
You've now experienced three fundamental production capabilities:
Services solve the moving target problem. When Pods restart and get new IP addresses, Services provide stable networking that applications can depend on. Your ETL script connects to postgres:5432 regardless of which specific Pod is running the database.
Rolling updates enable zero-downtime deployments. Instead of stopping everything to deploy updates, Kubernetes gradually replaces old Pods with new ones. This keeps your applications available during deployments.
Namespaces provide environment separation. You can run multiple copies of your entire stack (development, staging, production) in the same cluster while keeping them completely isolated.
These patterns scale from simple applications to complex microservices architectures. A web application with a database uses the same Service networking concepts, just with more components. A data pipeline with multiple processing stages uses the same rolling update strategy for each component.
Next, you'll learn about configuration management with ConfigMaps and Secrets, persistent storage for stateful applications, and resource management to ensure your applications get the CPU and memory they need.