Kubernetes Configuration and Production Readiness
You've deployed applications to Kubernetes and watched them self-heal. You've set up networking with Services and performed zero-downtime updates. But your applications aren't quite ready for a shared production cluster yet.
Think about what happens when multiple teams share the same Kubernetes cluster. Without proper boundaries, one team's runaway application could consume all available memory, starving everyone else's workloads. When an application crashes, how does Kubernetes know whether to restart it or leave it alone? And what about sensitive configuration like database passwords - surely we don't want those hardcoded in our container images?
Today, we'll add the production safeguards that make applications good citizens in shared clusters. We'll implement health checks that tell Kubernetes when your application is actually ready for traffic, set resource boundaries to prevent noisy neighbor problems, and externalize configuration so you can change settings without rebuilding containers.
By the end of this tutorial, you'll be able to:
- Add health checks that prevent broken applications from receiving traffic
- Set resource limits to protect your cluster from runaway applications
- Run containers as non-root users for better security
- Use ConfigMaps and Secrets to manage configuration without rebuilding images
- Understand why these patterns matter for production workloads
Why Production Readiness Matters
Let's start with a scenario that shows why default Kubernetes settings aren't enough for production.
You deploy a new version of your ETL application. The container starts successfully, so Kubernetes marks it as ready and starts sending it traffic. But there's a problem: your application needs 30 seconds to warm up its database connection pool and load reference data into memory. During those 30 seconds, any requests fail with connection errors.
Or consider this: your application has a memory leak. Over several days, it slowly consumes more and more RAM until it uses all available memory on the node, causing other applications to crash. Without resource limits, one buggy application can take down everything else running on the same machine.
These aren't theoretical problems. Every production Kubernetes cluster deals with these challenges. The good news is that Kubernetes provides built-in solutions - you just need to configure them.
Health Checks: Teaching Kubernetes About Your Application
By default, Kubernetes considers a container "healthy" if its main process is running. But a running process doesn't mean your application is actually working. Maybe it's still initializing, maybe it lost its database connection, or maybe it's stuck in an infinite loop.
Probes let you teach Kubernetes how to check if your application is actually healthy. There are three types that solve different problems:
- Readiness probes answer: "Is this Pod ready to handle requests?" If the probe fails, Kubernetes stops sending traffic to that Pod but leaves it running. This prevents users from hitting broken instances during startup or temporary issues.
- Liveness probes answer: "Is this Pod still working?" If the probe fails repeatedly, Kubernetes restarts the Pod. This recovers from situations where your application is stuck but the process hasn't crashed.
- Startup probes disable the other probes until your application finishes initializing. Most data processing applications don't need this, but it's useful for applications that take several minutes to start.
The distinction between readiness and liveness is important. Readiness failures are often temporary (like during startup or when a database is momentarily unavailable), so we don't want to restart the Pod. Liveness failures indicate something is fundamentally broken and needs a fresh start.
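The two behaviors can be summed up in a few lines of Python. This is purely an illustrative sketch (the real decisions happen inside the kubelet), and react_to_probes is a hypothetical helper name, not a Kubernetes API:

```python
def react_to_probes(ready: bool, consecutive_liveness_failures: int,
                    failure_threshold: int = 3) -> str:
    """Illustrate how Kubernetes reacts to probe results for a Pod."""
    if consecutive_liveness_failures >= failure_threshold:
        # Liveness failure: something is fundamentally broken, so restart
        return "restart container"
    if not ready:
        # Readiness failure: often temporary, so just stop sending traffic
        return "remove from Service endpoints"
    return "serve traffic"
```

Notice that a readiness failure never triggers a restart on its own; the Pod simply stops receiving traffic until the probe passes again.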
Setting Up Your Environment
Let's add these production features to the ETL pipeline from previous tutorials. If you're continuing from the last tutorial, make sure your Minikube cluster is running:
minikube start
alias kubectl="minikube kubectl --"
If you're starting fresh, you'll need the ETL application from the previous tutorial. Clone the repository:
git clone https://github.com/dataquestio/tutorials.git
cd tutorials/kubernetes-services-starter
# Point Docker to Minikube's environment
eval $(minikube -p minikube docker-env)
# Build the ETL image (same as tutorial 2)
docker build -t etl-app:v1 .
Clean up any existing deployments so we can start fresh:
kubectl delete deployment etl-app postgres --ignore-not-found=true
kubectl delete service postgres --ignore-not-found=true
Building a Production-Ready Deployment
In this tutorial, we'll build up a single deployment file that incorporates all production best practices. This mirrors how you'd work in a real job - starting with a basic deployment and evolving it as you add features.
Create a file called etl-deployment.yaml with this basic structure:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: etl-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: etl-app
  template:
    metadata:
      labels:
        app: etl-app
    spec:
      containers:
      - name: etl-app
        image: etl-app:v1
        imagePullPolicy: Never
        env:
        - name: DB_HOST
          value: postgres
        - name: DB_PORT
          value: "5432"
        - name: DB_USER
          value: etl
        - name: DB_PASSWORD
          value: mysecretpassword
        - name: DB_NAME
          value: pipeline
        - name: APP_VERSION
          value: v1
This is our starting point. Now we'll add production features one by one.
Adding Health Checks
Kubernetes probes should use lightweight commands that run quickly and reliably. For our ETL application, we need two different types of checks: one to verify our database dependency is available, and another to confirm our processing script is actively working.
First, we need to modify our Python script to include a heartbeat mechanism. This lets us detect when the ETL process gets stuck or stops working, which a simple process check wouldn't catch.
Edit the app.py file and add this heartbeat code:
def update_heartbeat():
    """Write current timestamp to heartbeat file for liveness probe"""
    import time
    with open("/tmp/etl_heartbeat", "w") as f:
        f.write(str(int(time.time())))
        f.write("\n")

# In the main loop, add the heartbeat after successful ETL completion
if __name__ == "__main__":
    while True:
        run_etl()
        update_heartbeat()  # Add this line
        log("Sleeping for 30 seconds...")
        time.sleep(30)
We'll also need to update our Dockerfile because our readiness probe will use psql, but our base Python image doesn't include PostgreSQL client tools:
FROM python:3.10-slim
WORKDIR /app
# Install PostgreSQL client tools for health checks
RUN apt-get update && apt-get install -y postgresql-client && rm -rf /var/lib/apt/lists/*
COPY app.py .
RUN pip install psycopg2-binary
CMD ["python", "-u", "app.py"]
Now rebuild with the PostgreSQL client tools included:
# Make sure you're still in Minikube's Docker environment
eval $(minikube -p minikube docker-env)
docker build -t etl-app:v1 .
Now edit your etl-deployment.yaml file and add these health checks to the container spec, right after the env section. Make sure the readinessProbe: line starts at the same column as other container properties like image: and env:. YAML indentation errors are common here, so if you get stuck, you can reference the complete working file to check your spacing.
        readinessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - |
              PGPASSWORD="$DB_PASSWORD" \
              psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d "$DB_NAME" -t -c "SELECT 1;" >/dev/null
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 3
        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - |
              # get the current time in seconds since 1970
              now=$(date +%s)
              # read the last "heartbeat" timestamp from a file
              # if the file doesn't exist, just pretend it's 0
              hb=$(cat /tmp/etl_heartbeat 2>/dev/null || echo 0)
              # subtract: how many seconds since the last heartbeat?
              # check that it's less than 600 seconds (10 minutes)
              [ $((now - hb)) -lt 600 ]
          initialDelaySeconds: 60
          periodSeconds: 30
          failureThreshold: 2
Let's understand what these probes do:
- readinessProbe: Uses psql to test the actual database connection our application needs. This approach works reliably with the security settings we'll add later and tests the same connection path our ETL script uses.
- livenessProbe: Verifies our ETL script is actively processing by checking when it last updated a heartbeat file. This catches situations where the script gets stuck in an infinite loop or stops working entirely.
The liveness probe uses generous timing (check every 30 seconds, allow up to 10 minutes between heartbeats) because ETL jobs can legitimately take time to process data, and unnecessary restarts are expensive.
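The shell one-liner in the liveness probe is compact, so here is the same staleness check written out in Python. This is just a readable mirror of the probe's logic, not code you need to add to app.py:

```python
import time

def heartbeat_is_fresh(path: str = "/tmp/etl_heartbeat",
                       max_age_seconds: int = 600) -> bool:
    """Succeed if the heartbeat file was written within max_age_seconds,
    mirroring the shell liveness probe above."""
    try:
        with open(path) as f:
            last_beat = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        last_beat = 0  # same fallback as `|| echo 0` in the probe
    return (int(time.time()) - last_beat) < max_age_seconds
```

If the ETL loop stops calling update_heartbeat(), the file's timestamp ages past 600 seconds and this check (and therefore the probe) starts failing.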
Web applications often use HTTP endpoints for probes (like /readyz for readiness and /livez for liveness, following Kubernetes component naming conventions), but data processing applications typically verify their connections to databases, message queues, or file systems directly.
The timing configuration tells Kubernetes:
- readinessProbe: Start checking after 10 seconds, check every 10 seconds with a 3-second timeout per attempt, mark unready after 3 consecutive failures (the default failureThreshold)
- livenessProbe: Start checking after 60 seconds (giving time for initialization), check every 30 seconds, restart after 2 consecutive failures
Timing Values in Practice: These numbers are example values chosen for this tutorial. In production, you should tune these values based on your actual application behavior. Consider how long your service actually takes to start up (for initialDelaySeconds), how reliable your network connections are (affecting periodSeconds and failureThreshold), and how disruptive false restarts would be to your users. A database might need 60+ seconds to initialize, while a simple API might be ready in 5 seconds. Network-dependent services in flaky environments might need higher failure thresholds to avoid unnecessary restarts.
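One useful way to reason about these numbers is the worst-case detection time: how long a failure can go unnoticed before Kubernetes reacts. A rough back-of-envelope calculation (not the kubelet's exact timing, which also depends on when in the period the failure happens):

```python
def worst_case_detection_seconds(period: int, failure_threshold: int,
                                 timeout: int = 0) -> int:
    """Rough upper bound on detection delay: we need failure_threshold
    consecutive failed checks, one per period, and each failed check
    can take up to `timeout` extra seconds before it's counted."""
    return failure_threshold * (period + timeout)

# Our liveness probe: checks every 30s, restarts after 2 failures
liveness = worst_case_detection_seconds(period=30, failure_threshold=2)    # 60s
# Our readiness probe: every 10s, 3s timeout, default threshold of 3
readiness = worst_case_detection_seconds(period=10, failure_threshold=3,
                                         timeout=3)                        # 39s
```

So with this configuration, a stuck ETL loop triggers a restart within roughly a minute of the heartbeat going stale, while a lost database connection stops traffic within well under a minute.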
Now deploy PostgreSQL and then apply your deployment:
# Deploy PostgreSQL
kubectl create deployment postgres --image=postgres:13
kubectl set env deployment/postgres POSTGRES_DB=pipeline POSTGRES_USER=etl POSTGRES_PASSWORD=mysecretpassword
kubectl expose deployment postgres --port=5432
# Deploy ETL app with probes
kubectl apply -f etl-deployment.yaml
# Check the initial status
kubectl get pods
You might initially see the ETL pods showing 0/1 in the READY column. This is expected! The readiness probe is checking if PostgreSQL is available, and it might take a moment for the database to fully start up. Watch the pods transition to 1/1 as PostgreSQL becomes ready:
kubectl get pods -w
Once both PostgreSQL and the ETL pods show 1/1 READY, press Ctrl+C and proceed to the next step.
Testing Probe Behavior
Let's see readiness probes in action. In one terminal, watch the Pod status:
kubectl get pods -w
In another terminal, break the database connection by scaling PostgreSQL to zero:
kubectl scale deployment postgres --replicas=0
Watch what happens to the ETL Pods. You'll see their READY column change from 1/1 to 0/1. The Pods are still running (STATUS remains "Running"), but Kubernetes has marked them as not ready because the readiness probe is failing.
Check the Pod details to see the probe failures:
kubectl describe pod -l app=etl-app | grep -A10 "Readiness"
You'll see events showing readiness probe failures. The output will include lines like:
Readiness probe failed: psql: error: connection to server at "postgres" (10.96.123.45), port 5432 failed: Connection refused
This shows that psql can't connect to the PostgreSQL service, which is exactly what we expect when the database isn't running.
Now restore PostgreSQL:
kubectl scale deployment postgres --replicas=1
Within about 15 seconds, the ETL Pods should return to READY status as their readiness probes start succeeding again. Press Ctrl+C to stop watching.
Understanding What Just Happened
This demonstrates the power of readiness probes:
- When PostgreSQL was available: ETL Pods were marked READY (1/1)
- When PostgreSQL went down: ETL Pods automatically became NOT READY (0/1), but kept running
- When PostgreSQL returned: ETL Pods automatically became READY again
If these ETL Pods were behind a Service (like a web API), Kubernetes would have automatically stopped routing traffic to them during the database outage, then resumed traffic when the database returned. The application didn't crash or restart unnecessarily. Instead, it just waited for its dependency to become available again.
The liveness probe continues running in the background. You can verify it's working by checking for successful probe events:
kubectl get events --field-selector reason=Unhealthy -o wide
If you don't see any recent "Unhealthy" events related to liveness probes, that means they're passing successfully. You can also verify the heartbeat mechanism by checking the Pod logs to confirm the ETL script is running its normal cycle:
kubectl logs deployment/etl-app --tail=10
You should see regular "ETL cycle complete" and "Sleeping for 30 seconds" messages, which indicates the script is actively running and would be updating its heartbeat file.
This demonstrates how probes enable intelligent application lifecycle management. Kubernetes makes smart decisions about what's broken and how to fix it.
Resource Management: Being a Good Neighbor
In a shared Kubernetes cluster, multiple applications run on the same nodes. Without resource limits, one application can monopolize CPU or memory, starving others. This is the "noisy neighbor" problem.
Kubernetes uses resource requests and limits to solve this:
- Requests tell Kubernetes how much CPU/memory your Pod needs to run properly. Kubernetes uses this for scheduling decisions.
- Limits set hard caps on how much CPU/memory your Pod can use. If a Pod exceeds its memory limit, it gets killed.
A note about ephemeral storage: You can also set requests and limits for ephemeral-storage, which controls temporary disk space inside containers. This becomes important for applications that generate lots of log files, cache data locally, or create temporary files during processing. Without ephemeral storage limits, a runaway process that fills up disk space can cause confusing Pod evictions that are hard to debug. While we won't add storage limits to our ETL example, keep this in mind for data processing jobs that work with large temporary files.
Adding Resource Controls
Now let's add resource controls to prevent our application from consuming too many cluster resources. Edit your etl-deployment.yaml file and add a resources section right after the environment variables. The resources section should align with other container properties like image and env. Make sure resources: starts at the same column as those properties (8 spaces from the left margin):
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "500m"
Apply the updated configuration:
kubectl apply -f etl-deployment.yaml
The resource specifications mean:
- requests: The Pod needs at least 128MB RAM and 0.1 CPU cores to run
- limits: The Pod cannot use more than 256MB RAM or 0.5 CPU cores
CPU is measured in "millicores" where 1000m = 1 CPU core. Memory uses standard units (Mi = mebibytes).
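If you want to sanity-check these quantities, the conversions are simple arithmetic. This is a simplified converter for the forms used in this tutorial; real Kubernetes quantities also support decimal suffixes (M, G) and exponent notation:

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity like "500m" or "1" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)  # whole cores -> millicores

def memory_to_bytes(quantity: str) -> int:
    """Convert a binary-suffixed memory quantity like "256Mi" to bytes."""
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(quantity[:-2]) * factor
    return int(quantity)  # plain bytes

cpu_to_millicores("500m")    # 500 (half a core)
memory_to_bytes("256Mi")     # 268435456 bytes
```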
Check that Kubernetes scheduled your Pods with these constraints:
kubectl describe pod -l app=etl-app | grep -A3 "Limits"
You'll see output showing your resource configuration for each Pod. Kubernetes uses these requests to decide if a node has enough free resources to run your Pod. If your cluster doesn't have enough resources available, Pods stay in the Pending state until resources free up.
Understanding Resource Impact
Resources affect two critical behaviors:
- Scheduling: When Kubernetes needs to place a Pod, it only considers nodes with enough unreserved resources to meet your requests. If you request 4GB of RAM but all nodes only have 2GB free, your Pod won't schedule.
- Runtime enforcement: If your Pod tries to use more memory than its limit, Kubernetes kills it (OOMKilled status). CPU limits work differently - instead of killing the Pod, Kubernetes throttles it to stay within the limit. Be aware that heavy CPU throttling can slow down probe responses, which might cause Kubernetes to restart the Pod if health checks start timing out.
Quality of Service (QoS): Your resource configuration determines how Kubernetes prioritizes your Pod during resource pressure. You can see this in action:
kubectl describe pod -l app=etl-app | grep "QoS Class"
You'll likely see "Burstable" because our requests are lower than our limits. This means the Pod can use extra resources when available, but might get evicted if the node runs short. For critical production workloads, you often want "Guaranteed" QoS by setting requests equal to limits, which provides more predictable performance and better protection from eviction.
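The assignment rule can be sketched in a few lines. This is a simplified single-container version; the real rule requires every container in the Pod to have both CPU and memory limits with requests equal to limits for Guaranteed:

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Simplified QoS assignment for a single-container Pod."""
    if not requests and not limits:
        return "BestEffort"   # no resources declared at all
    if requests and requests == limits:
        return "Guaranteed"   # fixed allocation, safest from eviction
    return "Burstable"        # can use spare capacity, evicted earlier

qos_class({"memory": "128Mi", "cpu": "100m"},
          {"memory": "256Mi", "cpu": "500m"})   # our Pod: "Burstable"
```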
This is why setting appropriate values matters. Too low and your application crashes or runs slowly. Too high and you waste resources that other applications could use.
Security: Running as Non-Root
By default, containers often run as root (user ID 0). This is a security risk - if someone exploits your application, they have root privileges inside the container. While container isolation provides some protection, defense in depth means we should run as non-root users whenever possible.
Configuring Non-Root Execution
Edit your etl-deployment.yaml file and add a securityContext section inside the existing Pod template spec. Find the section that looks like this:
  template:
    metadata:
      labels:
        app: etl-app
    spec:
      containers:
Add the securityContext right after the spec: line and before the containers: line:
  template:
    metadata:
      labels:
        app: etl-app
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
      # ... rest of container spec
Apply the secure configuration:
kubectl apply -f etl-deployment.yaml
The securityContext settings:
- runAsNonRoot: Prevents the container from running as root
- runAsUser: Specifies user ID 1000 (a non-privileged user)
- fsGroup: Sets the group ownership for mounted volumes
Since we changed the Pod template, Kubernetes needs to create new Pods with the security context. Check that the rollout completes:
kubectl rollout status deployment/etl-app
You should see "deployment successfully rolled out" when it's finished. Now verify the container is running as a non-root user:
kubectl exec deployment/etl-app -- id
You should see uid=1000, not uid=0(root).
Configuration Without Rebuilds
So far, we've hardcoded configuration like database passwords directly in our deployment YAML. This is problematic for several reasons:
- Changing configuration requires updating deployment files
- Sensitive values like passwords are visible in plain text
- Different environments (development, staging, production) need different values
Kubernetes provides ConfigMaps for non-sensitive configuration and Secrets for sensitive data. Both let you change configuration without rebuilding containers, but they offer different ways to deliver that configuration to your applications.
Creating ConfigMaps and Secrets
First, create a ConfigMap for non-sensitive configuration:
kubectl create configmap app-config \
--from-literal=DB_HOST=postgres \
--from-literal=DB_PORT=5432 \
--from-literal=DB_NAME=pipeline \
--from-literal=LOG_LEVEL=INFO
Now create a Secret for sensitive data:
kubectl create secret generic db-credentials \
--from-literal=DB_USER=etl \
--from-literal=DB_PASSWORD=mysecretpassword
Secrets are base64 encoded (not encrypted) by default. In production, you'd use additional tools for encryption at rest.
View what was created:
kubectl get configmap app-config -o yaml
kubectl get secret db-credentials -o yaml
Notice that the Secret values are base64 encoded. You can decode them:
echo "bXlzZWNyZXRwYXNzd29yZA==" | base64 -d
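You can reproduce the round-trip with Python's standard base64 module, which makes it obvious that no secret key is involved; encoding is a reversible transformation, not encryption:

```python
import base64

# Anyone who can read the Secret object can recover the value.
encoded = base64.b64encode(b"mysecretpassword").decode()
print(encoded)                             # bXlzZWNyZXRwYXNzd29yZA==
print(base64.b64decode(encoded).decode())  # mysecretpassword
```

This is why access to Secrets should be restricted with RBAC, and why production clusters typically layer encryption at rest on top.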
Using Environment Variables
Kubernetes gives you two main ways to use ConfigMaps and Secrets in your applications: as environment variables (which we'll use) or as mounted files inside your containers. Environment variables work well for simple key-value configuration like database connections. Volume mounts are better for complex configuration files, certificates, or when you need to rotate secrets without restarting containers. We'll stick with environment variables to keep things focused, but keep volume mounts in mind for more advanced scenarios.
Edit your etl-deployment.yaml file to use these external configurations. Replace the hardcoded env section with:
        envFrom:
        - configMapRef:
            name: app-config
        - secretRef:
            name: db-credentials
        env:
        - name: APP_VERSION
          value: v1
The key change is envFrom, which loads all key-value pairs from the ConfigMap and Secret as environment variables.
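From the application's point of view, nothing changes: the same environment lookups work whether the values were hardcoded in the manifest or injected from a ConfigMap or Secret. A sketch of how app.py might read them (the load_db_config helper name is ours, not from the tutorial's code):

```python
import os

def load_db_config() -> dict:
    """Read settings that envFrom injected as environment variables.
    The app doesn't know or care whether a value came from app-config,
    db-credentials, or a plain env entry in the manifest."""
    return {
        "host": os.environ.get("DB_HOST", "localhost"),
        "port": int(os.environ.get("DB_PORT", "5432")),
        "user": os.environ.get("DB_USER", ""),
        "password": os.environ.get("DB_PASSWORD", ""),
        "dbname": os.environ.get("DB_NAME", ""),
    }
```

This decoupling is exactly what lets you swap configuration per environment without touching application code or images.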
Apply the final configuration:
kubectl apply -f etl-deployment.yaml
Updating Configuration Without Rebuilds
Here's where ConfigMaps and Secrets shine. Let's change the log level without touching the container image:
kubectl edit configmap app-config
Change LOG_LEVEL from INFO to DEBUG and save.
ConfigMap changes don't automatically restart Pods, so trigger a rollout:
kubectl rollout restart deployment/etl-app
kubectl rollout status deployment/etl-app
Verify the new configuration is active:
kubectl exec deployment/etl-app -- env | grep LOG_LEVEL
You just changed application configuration without rebuilding the container image or modifying deployment files. This pattern becomes powerful when you have dozens of configuration values that differ between environments.
Cleaning Up
When you're done experimenting:
# Delete deployments and services
kubectl delete deployment etl-app postgres
kubectl delete service postgres
# Delete configuration
kubectl delete configmap app-config
kubectl delete secret db-credentials
# Stop Minikube
minikube stop
Production Patterns in Action
You've transformed a basic Kubernetes deployment into something ready for production. Your application now:
- Communicates its health to Kubernetes through readiness and liveness probes
- Respects resource boundaries to be a good citizen in shared clusters
- Runs securely as a non-root user
- Accepts configuration changes without rebuilding containers
These patterns follow real production practices you'll see in enterprise Kubernetes deployments. Health checks prevent cascading failures when dependencies have issues. Resource limits prevent cluster instability when applications misbehave. Non-root execution reduces security risks if vulnerabilities get exploited. External configuration enables GitOps workflows where you manage settings separately from code.
These same patterns scale from simple applications to complex microservices architectures. A small ETL pipeline uses the same production readiness features as a system handling millions of requests per day.
Every production Kubernetes deployment needs these safeguards. Without health checks, broken Pods receive traffic. Without resource limits, one application can destabilize an entire cluster. Without external configuration, simple changes require complex rebuilds.
Next Steps
Now that your applications are production-ready, you can explore advanced Kubernetes features:
- Horizontal Pod Autoscaling (HPA): Automatically scale replicas based on CPU/memory usage
- Persistent Volumes: Handle stateful applications that need durable storage
- Network Policies: Control which Pods can communicate with each other
- Pod Disruption Budgets: Ensure minimum availability during cluster maintenance
- Service Mesh: Add advanced networking features like circuit breakers and retries
The patterns you've learned here remain the same whether you're running on Minikube, Amazon EKS, Google GKE, or your own Kubernetes cluster. Start with these fundamentals, and add complexity only when your requirements demand it.
Remember that Kubernetes is a powerful tool, but not every application needs all its features. Use health checks and resource limits everywhere. Add other features based on actual requirements, not because they seem interesting. The best Kubernetes deployments are often the simplest ones that solve real problems.