Monitoring K8S Service Resource Usage: A Complete Guide
Tags: kubernetes, k8s, devops, container
Overview
Monitoring Kubernetes resource usage is required for reliability, performance, and cost control. This guide covers baseline checks, deeper analysis, and simple automation scripts for daily operations.
Basic Resource Monitoring
kubectl top
# Pod usage
kubectl top pods
# Node usage
kubectl top nodes
# Periodic refresh (kubectl top has no --watch flag, so wrap it in watch)
watch -n 5 kubectl top pods
# Namespace scope
kubectl top pods -n namespace-name
# Sort by usage
kubectl top pods --sort-by=memory
kubectl top pods --sort-by=cpu
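The per-pod figures from `kubectl top pods` can be rolled up into a single total for a namespace. The following is a minimal sketch, not a definitive implementation: the helper name `sum_pod_usage` is hypothetical, and it assumes CPU is reported in millicores or whole cores and memory in Mi or Gi, which is what `kubectl top` typically prints.

```shell
# sum_pod_usage: hypothetical helper. Reads `kubectl top pods --no-headers`
# rows (NAME CPU MEMORY) on stdin and prints the summed CPU and memory.
# Assumption: CPU looks like "12m" or "1"; memory looks like "34Mi" or "1Gi".
# Rows with any other memory unit are ignored for the memory total.
sum_pod_usage() {
  awk '
    {
      cpu = $2; mem = $3
      if (cpu ~ /m$/) { sub(/m$/, "", cpu); cpu_total += cpu }
      else            { cpu_total += cpu * 1000 }
      if (mem ~ /Gi$/)      { sub(/Gi$/, "", mem); mem_total += mem * 1024 }
      else if (mem ~ /Mi$/) { sub(/Mi$/, "", mem); mem_total += mem }
    }
    END { printf "cpu=%dm mem=%dMi\n", cpu_total, mem_total }
  '
}

# Usage against a live cluster:
#   kubectl top pods -n namespace-name --no-headers | sum_pod_usage
```

Keeping the parsing in a function that reads stdin makes it easy to try on saved output before pointing it at a cluster.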
Quotas and Limits
# Namespace quotas
kubectl get quota -n namespace-name
# LimitRanges in namespace ("limits" is the short name for limitrange)
kubectl get limitrange -n namespace-name
# Pod requests/limits
kubectl get pods -o custom-columns=NAME:.metadata.name,CPU:.spec.containers[*].resources.requests.cpu,MEM:.spec.containers[*].resources.requests.memory,CPULIMIT:.spec.containers[*].resources.limits.cpu,MEMLIMIT:.spec.containers[*].resources.limits.memory
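In the custom-columns output, a field that no container sets is printed as `<none>`, which makes it possible to list pods running without requests or limits. This is a rough sketch under that assumption; `list_missing_resources` is a hypothetical helper name, and it expects the five columns in the order used above with `--no-headers` added to the command.

```shell
# list_missing_resources: hypothetical helper. Reads rows of
# NAME CPU MEM CPULIMIT MEMLIMIT on stdin and prints every pod
# that has at least one field shown as <none>.
list_missing_resources() {
  awk '{
    missing = ""
    if ($2 == "<none>") missing = missing " cpu-request"
    if ($3 == "<none>") missing = missing " mem-request"
    if ($4 == "<none>") missing = missing " cpu-limit"
    if ($5 == "<none>") missing = missing " mem-limit"
    if (missing != "") print $1 ":" missing
  }'
}

# Usage: append --no-headers to the custom-columns command above and pipe it in:
#   kubectl get pods --no-headers -o custom-columns=... | list_missing_resources
```

For multi-container pods the columns hold comma-separated values, so this check only catches pods where no container sets the field.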
Advanced Resource Analysis
Pod-Level Details
# Show each resources block together with its requests and limits values
kubectl get pods -o yaml | grep -A 6 "resources:"
kubectl top pods --containers
Node-Level Details
kubectl describe nodes
# Capacity is the node's total; allocatable is what remains for pods after system reservations
kubectl get nodes -o custom-columns=NAME:.metadata.name,ALLOCATABLECPU:.status.allocatable.cpu,ALLOCATABLEMEM:.status.allocatable.memory,CAPACITYCPU:.status.capacity.cpu,CAPACITYMEM:.status.capacity.memory
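# One line per node, prefixed with the node name
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.capacity}{"\n"}{end}'
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable}{"\n"}{end}'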
Metrics Server
Install and Verify
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl get deployment -n kube-system metrics-server
kubectl get pods -n kube-system -l k8s-app=metrics-server
kubectl top nodes
kubectl top pods
Debug Metrics Server
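# Check the logs for scrape or TLS errors
kubectl logs -n kube-system deployment/metrics-server
# Adjust container args if needed (for example, --kubelet-insecure-tls on test clusters with self-signed kubelet certificates)
kubectl edit deployment -n kube-system metrics-server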
Prometheus and Grafana
Prometheus Config Example
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
Grafana Quick Setup
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install grafana grafana/grafana
kubectl port-forward svc/grafana 3000:80
Critical Checks
Hotspot Detection
kubectl top pods --sort-by=cpu | head -10
kubectl top pods --sort-by=memory | head -10
kubectl get pods --field-selector=status.phase=Pending
Efficiency Review
# kubectl top nodes columns: NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
# The percentages are already computed, so print columns 3 and 5 directly.
kubectl top nodes --no-headers | awk '
{
  printf "Node: %s - CPU: %s Memory: %s\n", $1, $3, $5
}'
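Since `kubectl top nodes` already reports CPU% and MEMORY% in its third and fifth columns, a cluster-wide average is a short step from the per-node view. The sketch below is illustrative only: `avg_node_utilization` is a hypothetical helper name, and it skips nodes whose metrics are reported as `<unknown>`.

```shell
# avg_node_utilization: hypothetical helper. Reads `kubectl top nodes --no-headers`
# rows (NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%) on stdin and prints the
# average CPU and memory utilization across nodes that report numeric values.
avg_node_utilization() {
  awk '
    $3 ~ /^[0-9]+%$/ && $5 ~ /^[0-9]+%$/ {
      cpu += $3 + 0   # "12%" converts to 12 in numeric context
      mem += $5 + 0
      n++
    }
    END {
      if (n == 0) { print "no node metrics"; exit }
      printf "nodes=%d avg_cpu=%.1f%% avg_mem=%.1f%%\n", n, cpu / n, mem / n
    }
  '
}

# Usage against a live cluster:
#   kubectl top nodes --no-headers | avg_node_utilization
```

An unweighted average treats small and large nodes alike, so it is a rough health signal rather than a capacity figure.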
Resource Governance Best Practices
Requests and Limits Example
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
    - name: frontend
      image: nginx
      resources:
        requests:
          memory: "64Mi"
          cpu: "250m"
        limits:
          memory: "128Mi"
          cpu: "500m"
    - name: backend
      image: redis
      resources:
        requests:
          memory: "128Mi"
          cpu: "500m"
        limits:
          memory: "256Mi"
          cpu: "1000m"
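The scheduler places a pod based on the sum of its containers' requests, so it helps to total the values in a manifest like this one. Here is a small sketch of that arithmetic for CPU; the helper names `to_millicores` and `sum_millicores` are hypothetical, and only plain CPU quantities such as `250m`, `1`, or `0.5` are handled.

```shell
# to_millicores: hypothetical helper. Converts a CPU quantity to millicores.
# "250m" -> 250, "1" -> 1000, "0.5" -> 500
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v c="$1" 'BEGIN { printf "%d\n", c * 1000 + 0.5 }' ;;
  esac
}

# sum_millicores: hypothetical helper. Adds any number of CPU quantities.
sum_millicores() {
  total=0
  for q in "$@"; do
    total=$(( total + $(to_millicores "$q") ))
  done
  echo "${total}m"
}

# CPU requests in the manifest: frontend 250m + backend 500m
sum_millicores 250m 500m     # prints 750m
# CPU limits in the manifest: frontend 500m + backend 1000m
sum_millicores 500m 1000m    # prints 1500m
```

The pod therefore needs a node with at least 750m of unreserved CPU to be scheduled, and it can burst up to 1500m in total before throttling.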
HPA Example
kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10
kubectl get hpa
kubectl describe hpa my-app
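The autoscaler's documented core calculation is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), clamped to the configured minimum and maximum. The sketch below reproduces only that arithmetic so the effect of `--cpu-percent=70 --min=2 --max=10` can be reasoned about offline. It deliberately ignores the controller's tolerance band, pod readiness handling, and stabilization windows, and `hpa_desired_replicas` is a hypothetical name.

```shell
# hpa_desired_replicas: hypothetical helper illustrating the HPA scaling formula.
# Arguments: current replicas, current average utilization (%), target
# utilization (%), min replicas, max replicas. All arguments are integers.
hpa_desired_replicas() {
  current=$1; util=$2; target=$3; min=$4; max=$5
  # Integer ceiling of current * util / target
  desired=$(( (current * util + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

# Example: 4 replicas averaging 90% CPU against a 70% target, bounds 2..10
hpa_desired_replicas 4 90 70 2 10   # prints 6
```

With 4 replicas at 90% utilization, 4 × 90 / 70 is about 5.14, which rounds up to 6 replicas.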
Troubleshooting
# Evicted pods
kubectl get pods --all-namespaces | grep Evicted
# Node pressure indicators
kubectl describe node <node-name> | grep -i pressure
# Scheduling failures
kubectl get events --field-selector=reason=FailedScheduling -A
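When evictions are widespread, a per-namespace count is often more useful than a raw list. The following sketch assumes the default column layout of `kubectl get pods --all-namespaces --no-headers` (NAMESPACE NAME READY STATUS ...); the helper name `count_evicted_by_namespace` is hypothetical.

```shell
# count_evicted_by_namespace: hypothetical helper. Reads
# `kubectl get pods --all-namespaces --no-headers` rows on stdin and prints
# "<namespace> <count>" for every namespace that has evicted pods.
count_evicted_by_namespace() {
  awk '$4 == "Evicted" { count[$1]++ }
       END { for (ns in count) print ns, count[ns] }' | sort
}

# Usage against a live cluster:
#   kubectl get pods --all-namespaces --no-headers | count_evicted_by_namespace
```

Matching on the STATUS column rather than grepping the whole line avoids false hits on pods that merely have "Evicted" in their name.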
Automation Scripts
Daily Report Script
#!/bin/bash
echo "=== K8S Resource Usage Report ==="
echo "Date: $(date)"
echo ""
echo "=== Node Resource Usage ==="
kubectl top nodes | column -t
echo ""
echo "=== Top CPU Pods ==="
kubectl top pods --sort-by=cpu | head -10
echo ""
echo "=== Top Memory Pods ==="
kubectl top pods --sort-by=memory | head -10
echo ""
echo "=== Pending Pods ==="
kubectl get pods --field-selector=status.phase=Pending
Threshold Alert Script
#!/bin/bash
# Node metrics are cluster-scoped, so check every node once instead of looping over namespaces.
CPU_THRESHOLD=80
MEMORY_THRESHOLD=85
# Columns: NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
kubectl top nodes --no-headers | while read -r name cpu_cores cpu_pct mem_bytes mem_pct; do
  # Strip the trailing percent sign
  cpu_usage=${cpu_pct%\%}
  mem_usage=${mem_pct%\%}
  # Skip nodes that report <unknown> metrics
  [[ "$cpu_usage" =~ ^[0-9]+$ && "$mem_usage" =~ ^[0-9]+$ ]] || continue
  if (( cpu_usage > CPU_THRESHOLD || mem_usage > MEMORY_THRESHOLD )); then
    echo "WARNING: $name CPU=$cpu_pct Memory=$mem_pct"
  fi
done
Closing Notes
Track trends, not just snapshots. The combination of requests/limits governance, metrics collection, and automated reports creates a reliable foundation for proactive cluster operations.