Part 8: Persistent Storage
Introduction
Containers are ephemeral by design, but many applications require persistent data storage. Kubernetes provides a sophisticated storage system with Volumes, PersistentVolumes (PV), PersistentVolumeClaims (PVC), and StorageClasses to manage stateful workloads.
Understanding Kubernetes Storage
Storage Lifecycle
- Volume - Storage directly defined in pod spec (lifecycle tied to pod)
- PersistentVolume (PV) - Cluster-level storage resource (lifecycle independent of pods)
- PersistentVolumeClaim (PVC) - Request for storage by a user
- StorageClass - Template for a class of storage that enables dynamic provisioning of PVs
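A quick way to see these abstractions side by side on a running cluster (a minimal check, assuming kubectl is already configured):
# Cluster-scoped storage objects, then claims in the current namespace
kubectl get storageclass,persistentvolume
kubectl get persistentvolumeclaim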
Volumes
Volumes are directories, possibly containing data, that are accessible to the containers in a pod. A volume's lifetime is tied to the pod, so it survives container restarts but not pod deletion.
1. emptyDir
An emptyDir volume starts out empty when the pod is assigned to a node and is deleted permanently when the pod is removed; it survives container restarts within the pod.
# emptydir-pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: emptydir-pod
spec:
containers:
- name: writer
image: busybox:latest
command: ["sh", "-c", "while true; do echo $(date) >> /data/log.txt; sleep 5; done"]
volumeMounts:
- name: shared-data
mountPath: /data
- name: reader
image: busybox:latest
command: ["sh", "-c", "while true; do cat /data/log.txt; sleep 10; done"]
volumeMounts:
- name: shared-data
mountPath: /data
volumes:
- name: shared-data
emptyDir: {}
# Apply the pod
kubectl apply -f emptydir-pod.yaml
# Check writer logs
kubectl logs emptydir-pod -c writer
# Check reader logs
kubectl logs emptydir-pod -c reader
# Delete pod - data is lost
kubectl delete pod emptydir-pod
emptyDir with memory backing:
# emptydir-memory.yaml
apiVersion: v1
kind: Pod
metadata:
name: emptydir-memory
spec:
containers:
- name: app
image: nginx:alpine
volumeMounts:
- name: cache
mountPath: /cache
volumes:
- name: cache
emptyDir:
medium: Memory
sizeLimit: 128Mi
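A quick check that the volume is actually RAM-backed (a sketch; the mount appears as tmpfs inside the container, and the reported size depends on the cluster version and how sizeLimit is applied):
kubectl apply -f emptydir-memory.yaml
kubectl exec emptydir-memory -- df -h /cache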
2. hostPath
Mounts a file or directory from the host node’s filesystem.
# hostpath-pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: hostpath-pod
spec:
containers:
- name: app
image: nginx:alpine
volumeMounts:
- name: host-data
mountPath: /data
volumes:
- name: host-data
hostPath:
path: /mnt/data
type: DirectoryOrCreate
# Apply the pod
kubectl apply -f hostpath-pod.yaml
# Get node where pod is running
NODE=$(kubectl get pod hostpath-pod -o jsonpath='{.spec.nodeName}')
echo "Pod is running on: $NODE"
# Write data from pod
kubectl exec hostpath-pod -- sh -c "echo 'Hello from pod' > /data/test.txt"
# SSH to node and verify (if you have access)
# ssh $NODE
# cat /mnt/data/test.txt
Warning: hostPath is not recommended for production. Data is tied to a specific node, is not portable, and broad access to the host filesystem is a security risk.
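If you do use hostPath for experiments, pinning the pod to a node keeps it next to its data after a reschedule; a minimal sketch, assuming a node named node-1:
# Add to the pod spec alongside containers and volumes
  nodeSelector:
    kubernetes.io/hostname: node-1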
3. ConfigMap Volume
# configmap-volume.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
data:
config.json: |
{
"database": "postgres",
"port": 5432
}
settings.ini: |
[app]
debug=false
---
apiVersion: v1
kind: Pod
metadata:
name: configmap-volume-pod
spec:
containers:
- name: app
image: nginx:alpine
volumeMounts:
- name: config
mountPath: /etc/config
readOnly: true
volumes:
- name: config
configMap:
name: app-config
# Apply and test
kubectl apply -f configmap-volume.yaml
kubectl exec configmap-volume-pod -- ls /etc/config
kubectl exec configmap-volume-pod -- cat /etc/config/config.json
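You can also mount only selected keys and control the file path for each, using items; a sketch of the volumes section for the same ConfigMap:
  volumes:
  - name: config
    configMap:
      name: app-config
      items:
      - key: config.json
        path: app/config.json   # appears at /etc/config/app/config.json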
4. Secret Volume
# secret-volume.yaml
apiVersion: v1
kind: Secret
metadata:
name: db-secret
type: Opaque
stringData:
username: admin
password: secretpass123
---
apiVersion: v1
kind: Pod
metadata:
name: secret-volume-pod
spec:
containers:
- name: app
image: nginx:alpine
volumeMounts:
- name: secrets
mountPath: /etc/secrets
readOnly: true
volumes:
- name: secrets
secret:
secretName: db-secret
defaultMode: 0400
# Apply and test
kubectl apply -f secret-volume.yaml
kubectl exec secret-volume-pod -- ls -la /etc/secrets
kubectl exec secret-volume-pod -- cat /etc/secrets/username
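For comparison, the same Secret can be consumed as environment variables instead of files; a sketch of the container spec:
    env:
    - name: DB_USERNAME
      valueFrom:
        secretKeyRef:
          name: db-secret
          key: username
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: db-secret
          key: password
Mounted files are generally preferable: they are refreshed when the Secret changes, while environment variables are only read at container start.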
PersistentVolumes (PV) and PersistentVolumeClaims (PVC)
PersistentVolumes are cluster-scoped resources that exist independently of pods, so their data survives pod deletion and rescheduling.
Static Provisioning
1. Create PersistentVolume
# pv-local.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: local-pv-1
spec:
capacity:
storage: 5Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: local-storage
local:
path: /mnt/data
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- node-1 # Replace with your node name
# Create directory on node
# ssh node-1
# sudo mkdir -p /mnt/data
# sudo chmod 777 /mnt/data
# Apply PV
kubectl apply -f pv-local.yaml
# View PV
kubectl get pv
kubectl describe pv local-pv-1
2. Create PersistentVolumeClaim
# pvc-local.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: local-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
storageClassName: local-storage
# Apply PVC
kubectl apply -f pvc-local.yaml
# View PVC
kubectl get pvc
kubectl describe pvc local-pvc
# Check binding status
kubectl get pv,pvc
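To confirm the binding from both sides, jsonpath queries work well:
kubectl get pvc local-pvc -o jsonpath='{.spec.volumeName}{"\n"}'
kubectl get pv local-pv-1 -o jsonpath='{.spec.claimRef.namespace}/{.spec.claimRef.name}{"\n"}'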
3. Use PVC in Pod
# pod-with-pvc.yaml
apiVersion: v1
kind: Pod
metadata:
name: app-with-storage
spec:
containers:
- name: app
image: nginx:alpine
volumeMounts:
- name: data
mountPath: /usr/share/nginx/html
volumes:
- name: data
persistentVolumeClaim:
claimName: local-pvc
# Apply the pod
kubectl apply -f pod-with-pvc.yaml
# Write data to volume
kubectl exec app-with-storage -- sh -c "echo '<h1>Persistent Data</h1>' > /usr/share/nginx/html/index.html"
# Delete and recreate pod
kubectl delete pod app-with-storage
kubectl apply -f pod-with-pvc.yaml
# Verify data persists
kubectl exec app-with-storage -- cat /usr/share/nginx/html/index.html
Access Modes
- ReadWriteOnce (RWO) - Volume can be mounted read-write by a single node
- ReadOnlyMany (ROX) - Volume can be mounted read-only by many nodes
- ReadWriteMany (RWX) - Volume can be mounted read-write by many nodes
- ReadWriteOncePod (RWOP) - Volume can be mounted read-write by a single pod (K8s 1.22+)
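The modes a volume actually supports depend on the volume plugin or CSI driver; you can inspect what each PV advertises:
kubectl get pv -o custom-columns=NAME:.metadata.name,MODES:.spec.accessModes,CLASS:.spec.storageClassName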
Reclaim Policies
- Retain - Manual reclamation (PV kept after PVC deletion)
- Delete - PV and underlying storage deleted with PVC
- Recycle - Basic scrub (rm -rf /volume/*); deprecated in favor of dynamic provisioning
# pv-with-policies.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-retain
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain # Data kept after claim deletion
storageClassName: standard
hostPath:
path: /mnt/pv-retain
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-delete
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Delete # Data deleted with claim
storageClassName: standard
hostPath:
path: /mnt/pv-delete
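The reclaim policy of an existing PV can be changed with a patch, a useful safeguard before deleting a claim whose data you want to keep:
kubectl patch pv pv-delete -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl get pv pv-delete -o jsonpath='{.spec.persistentVolumeReclaimPolicy}{"\n"}'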
StorageClasses and Dynamic Provisioning
StorageClasses describe tiers of storage and enable dynamic provisioning: when a PVC references a StorageClass, the class's provisioner creates a matching PV on demand.
1. Create StorageClass
# storageclass-local.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-storage
provisioner: kubernetes.io/no-provisioner # Local volumes are provisioned manually
volumeBindingMode: WaitForFirstConsumer # Delay binding until a consuming pod is scheduled
# Apply StorageClass
kubectl apply -f storageclass-local.yaml
# View StorageClasses
kubectl get storageclass
kubectl describe storageclass fast-storage
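A cluster can mark one StorageClass as the default; PVCs that omit storageClassName then use it automatically. A sketch of setting the default:
kubectl patch storageclass fast-storage -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
kubectl get storageclass   # the default class is flagged as (default)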
2. Dynamic Provisioning Example (AWS EBS)
# storageclass-aws.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: aws-ebs-gp3
provisioner: ebs.csi.aws.com
parameters:
type: gp3
iops: "3000"
throughput: "125"
encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Delete
# pvc-dynamic.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: dynamic-pvc
spec:
accessModes:
- ReadWriteOnce
storageClassName: aws-ebs-gp3
resources:
requests:
storage: 20Gi
# Apply PVC - PV will be created automatically
kubectl apply -f pvc-dynamic.yaml
# Watch PV creation
kubectl get pvc dynamic-pvc -w
# View created PV
kubectl get pv
3. Volume Expansion
# Check if StorageClass allows expansion
kubectl get storageclass aws-ebs-gp3 -o jsonpath='{.allowVolumeExpansion}'
# Expand PVC
kubectl patch pvc dynamic-pvc -p '{"spec":{"resources":{"requests":{"storage":"30Gi"}}}}'
# Watch expansion
kubectl get pvc dynamic-pvc -w
# Verify new size
kubectl describe pvc dynamic-pvc
StatefulSets
StatefulSets manage stateful applications with stable network identities and persistent storage.
StatefulSet Features
- Stable, unique network identifiers
- Stable, persistent storage
- Ordered, graceful deployment and scaling
- Ordered, automated rolling updates
1. Basic StatefulSet
# statefulset-basic.yaml
apiVersion: v1
kind: Service
metadata:
name: nginx-headless
spec:
clusterIP: None
selector:
app: nginx-sts
ports:
- port: 80
name: web
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: nginx-sts
spec:
serviceName: nginx-headless
replicas: 3
selector:
matchLabels:
app: nginx-sts
template:
metadata:
labels:
app: nginx-sts
spec:
containers:
- name: nginx
image: nginx:alpine
ports:
- containerPort: 80
name: web
volumeMounts:
- name: data
mountPath: /usr/share/nginx/html
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: standard
resources:
requests:
storage: 1Gi
# Apply StatefulSet
kubectl apply -f statefulset-basic.yaml
# Watch pods being created (ordered)
kubectl get pods -l app=nginx-sts -w
# View created PVCs
kubectl get pvc
# Check pod DNS names
kubectl run dns-test --image=busybox:latest -it --rm --restart=Never -- nslookup nginx-sts-0.nginx-headless
# Write unique data to each pod
for i in 0 1 2; do
kubectl exec nginx-sts-$i -- sh -c "echo 'Pod nginx-sts-$i' > /usr/share/nginx/html/index.html"
done
# Verify unique data
for i in 0 1 2; do
echo "Data from nginx-sts-$i:"
kubectl exec nginx-sts-$i -- cat /usr/share/nginx/html/index.html
done
# Delete and verify persistence
kubectl delete pod nginx-sts-1
# Wait for pod to recreate
kubectl wait --for=condition=Ready pod/nginx-sts-1 --timeout=60s
# Verify data persists
kubectl exec nginx-sts-1 -- cat /usr/share/nginx/html/index.html
2. Scaling StatefulSet
# Scale up
kubectl scale statefulset nginx-sts --replicas=5
# Watch pods being created (ordered: 3, then 4)
kubectl get pods -l app=nginx-sts -w
# Scale down
kubectl scale statefulset nginx-sts --replicas=2
# Watch pods being deleted (reverse order: 4, then 3)
kubectl get pods -l app=nginx-sts -w
# Note: PVCs are NOT deleted automatically
kubectl get pvc
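On newer clusters (the feature is beta from Kubernetes 1.27 and stable in 1.32), a StatefulSet can opt into automatic PVC cleanup with a retention policy; a sketch of the extra spec fields:
# Added under the StatefulSet spec
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete   # remove PVCs when the StatefulSet is deleted
    whenScaled: Delete    # remove PVCs for pods removed by a scale-down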
3. Update Strategy
# statefulset-update.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: nginx-sts
spec:
serviceName: nginx-headless
replicas: 3
updateStrategy:
type: RollingUpdate
rollingUpdate:
partition: 0 # Update all pods
selector:
matchLabels:
app: nginx-sts
template:
metadata:
labels:
app: nginx-sts
spec:
containers:
- name: nginx
image: nginx:1.25-alpine # Updated version
ports:
- containerPort: 80
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: standard
resources:
requests:
storage: 1Gi
# Apply update
kubectl apply -f statefulset-update.yaml
# Watch rolling update (reverse order: 2, 1, 0)
kubectl rollout status statefulset nginx-sts
# Check image versions
kubectl get pods -l app=nginx-sts -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
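The partition field shown above also enables canary-style rollouts: only pods with an ordinal greater than or equal to the partition are updated, the rest stay on the previous revision. A sketch:
# Roll the new image out to nginx-sts-2 only
kubectl patch statefulset nginx-sts -p '{"spec":{"updateStrategy":{"rollingUpdate":{"partition":2}}}}'
# When satisfied, drop the partition back to 0 to update the remaining pods
kubectl patch statefulset nginx-sts -p '{"spec":{"updateStrategy":{"rollingUpdate":{"partition":0}}}}'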
4. Practical Example: PostgreSQL StatefulSet
# postgres-statefulset.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: postgres-config
data:
POSTGRES_DB: "myapp"
POSTGRES_USER: "myuser"
---
apiVersion: v1
kind: Secret
metadata:
name: postgres-secret
type: Opaque
stringData:
POSTGRES_PASSWORD: "mysecretpassword"
---
apiVersion: v1
kind: Service
metadata:
name: postgres-headless
spec:
clusterIP: None
selector:
app: postgres
ports:
- port: 5432
name: postgres
---
apiVersion: v1
kind: Service
metadata:
name: postgres
spec:
selector:
app: postgres
type: ClusterIP
ports:
- port: 5432
targetPort: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
spec:
serviceName: postgres-headless
replicas: 1
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
containers:
- name: postgres
image: postgres:15-alpine
ports:
- containerPort: 5432
name: postgres
envFrom:
- configMapRef:
name: postgres-config
- secretRef:
name: postgres-secret
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
subPath: postgres
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
exec:
command:
- /bin/sh
- -c
- pg_isready -U myuser -d myapp
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
exec:
command:
- /bin/sh
- -c
- pg_isready -U myuser -d myapp
initialDelaySeconds: 5
periodSeconds: 5
volumeClaimTemplates:
- metadata:
name: postgres-data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: standard
resources:
requests:
storage: 10Gi
# Apply PostgreSQL StatefulSet
kubectl apply -f postgres-statefulset.yaml
# Wait for pod to be ready
kubectl wait --for=condition=Ready pod/postgres-0 --timeout=120s
# Connect to database
kubectl exec -it postgres-0 -- psql -U myuser -d myapp
# In psql:
# CREATE TABLE users (id SERIAL PRIMARY KEY, name VARCHAR(100));
# INSERT INTO users (name) VALUES ('Alice'), ('Bob'), ('Charlie');
# SELECT * FROM users;
# \q
# Test data persistence
kubectl delete pod postgres-0
kubectl wait --for=condition=Ready pod/postgres-0 --timeout=120s
# Verify data persists
kubectl exec -it postgres-0 -- psql -U myuser -d myapp -c "SELECT * FROM users;"
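Volume snapshots (next section) cover storage-level backups; for a quick logical backup, pg_dump from inside the pod works too (a sketch, assuming the credentials defined above):
kubectl exec postgres-0 -- pg_dump -U myuser -d myapp > myapp-backup.sql
# Restore later with: kubectl exec -i postgres-0 -- psql -U myuser -d myapp < myapp-backup.sql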
Volume Snapshots
VolumeSnapshots are point-in-time copies of PVCs, taken through a CSI driver that supports snapshots. They require the snapshot CRDs and the snapshot controller to be installed in the cluster.
1. Install Snapshot CRDs and Controller
# Install VolumeSnapshot CRDs (these URLs track the master branch; pin a release tag for production use)
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml
# Install snapshot controller
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml
2. Create VolumeSnapshotClass
# volumesnapshotclass.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
name: csi-snapshot-class
driver: ebs.csi.aws.com # Use your CSI driver
deletionPolicy: Delete
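Apply and verify the class (assuming the driver above matches the CSI driver actually installed in your cluster):
kubectl apply -f volumesnapshotclass.yaml
kubectl get volumesnapshotclass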
3. Create Snapshot
# volumesnapshot.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
name: postgres-snapshot
spec:
volumeSnapshotClassName: csi-snapshot-class
source:
persistentVolumeClaimName: postgres-data-postgres-0
# Create snapshot
kubectl apply -f volumesnapshot.yaml
# Check snapshot status
kubectl get volumesnapshot
kubectl describe volumesnapshot postgres-snapshot
4. Restore from Snapshot
# pvc-from-snapshot.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-restored
spec:
accessModes:
- ReadWriteOnce
storageClassName: aws-ebs-gp3
resources:
requests:
storage: 10Gi
dataSource:
name: postgres-snapshot
kind: VolumeSnapshot
apiGroup: snapshot.storage.k8s.io
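Apply the claim and watch it bind; with a WaitForFirstConsumer StorageClass it stays Pending until a pod mounts it, and the class must belong to the same CSI driver that took the snapshot:
kubectl apply -f pvc-from-snapshot.yaml
kubectl get pvc postgres-restored -w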
Volume Cloning
CSI volume cloning creates a new PVC pre-populated with the data of an existing PVC. The source and target must be in the same namespace and use the same StorageClass, and the CSI driver must support cloning.
# pvc-clone.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-clone
spec:
accessModes:
- ReadWriteOnce
storageClassName: standard
resources:
requests:
storage: 10Gi
dataSource:
name: postgres-data-postgres-0
kind: PersistentVolumeClaim
# Create clone
kubectl apply -f pvc-clone.yaml
# Wait for clone
kubectl get pvc postgres-clone -w
# Mount the cloned PVC like any other claim (see the sketch below)
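A minimal pod that mounts the clone for inspection (a sketch; the file name and image are illustrative):
# pod-clone-check.yaml
apiVersion: v1
kind: Pod
metadata:
  name: clone-check
spec:
  containers:
  - name: shell
    image: busybox:latest
    command: ["sh", "-c", "ls -la /restored && sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /restored
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: postgres-clone
# Apply and inspect the cloned data
kubectl apply -f pod-clone-check.yaml
kubectl exec clone-check -- ls /restored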
Storage Best Practices
1. Resource Management
# Size storage requests deliberately; note that Kubernetes does not enforce resources.limits on a PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: app-storage
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: fast-storage
2. Backup Strategy
# Create backup script
cat > backup-postgres.sh << 'EOF'
#!/bin/bash
BACKUP_NAME="postgres-backup-$(date +%Y%m%d-%H%M%S)"
# Create snapshot
cat <<YAML | kubectl apply -f -
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
name: $BACKUP_NAME
spec:
volumeSnapshotClassName: csi-snapshot-class
source:
persistentVolumeClaimName: postgres-data-postgres-0
YAML
echo "Backup created: $BACKUP_NAME"
EOF
chmod +x backup-postgres.sh
./backup-postgres.sh
3. Monitoring Storage
# List PVCs and their requested capacity in all namespaces
kubectl get pvc -A
# Describe PVC for details
kubectl describe pvc <pvc-name>
# Check storage capacity
kubectl get pv -o custom-columns=NAME:.metadata.name,CAPACITY:.spec.capacity.storage,STATUS:.status.phase
# Find PVCs that are not referenced by any pod
comm -23 \
  <(kubectl get pvc -A -o json | jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name)"' | sort) \
  <(kubectl get pods -A -o json | jq -r '.items[] | .metadata.namespace as $ns | .spec.volumes[]? | select(.persistentVolumeClaim) | "\($ns)/\(.persistentVolumeClaim.claimName)"' | sort -u)
Troubleshooting Storage Issues
# PVC stuck in Pending
kubectl describe pvc <pvc-name>
# Check: No matching PV, StorageClass issues, insufficient resources
# View storage events
kubectl get events --sort-by='.lastTimestamp' | grep -i "persistentvolume"
# Check PV status
kubectl get pv
kubectl describe pv <pv-name>
# Verify StorageClass
kubectl get storageclass
kubectl describe storageclass <sc-name>
# Check CSI driver (if applicable)
kubectl get pods -n kube-system | grep csi
# Manually clean up PVs in Released state
kubectl patch pv <pv-name> -p '{"spec":{"claimRef": null}}'
# Force delete a stuck PVC (last resort: clearing finalizers can orphan the backing volume)
kubectl patch pvc <pvc-name> -p '{"metadata":{"finalizers":null}}'
kubectl delete pvc <pvc-name> --force --grace-period=0
Summary
In this part, you learned:
- Different volume types (emptyDir, hostPath, configMap, secret)
- PersistentVolumes and PersistentVolumeClaims
- StorageClasses and dynamic provisioning
- Access modes and reclaim policies
- StatefulSets for stateful applications
- Volume snapshots and cloning
- PostgreSQL deployment with persistent storage
- Storage monitoring and troubleshooting
Key Commands:
# PV/PVC operations
kubectl get pv
kubectl get pvc
kubectl describe pv <name>
kubectl describe pvc <name>
# StorageClass
kubectl get storageclass
kubectl describe storageclass <name>
# StatefulSet operations
kubectl get statefulset
kubectl scale statefulset <name> --replicas=<n>
kubectl rollout status statefulset <name>
# Volume snapshots
kubectl get volumesnapshot
kubectl describe volumesnapshot <name>
In Part 9, we’ll explore advanced topics including Ingress, Jobs, CronJobs, DaemonSets, and resource management.