How to Monitor Applications in Kubernetes?
Junior Level
Simple Definition
Monitoring in Kubernetes is the practice of collecting, storing, and analyzing data about the state of the cluster, containers, and applications. Monitoring answers three questions: “What is happening?”, “Why did it happen?” and “How to prevent it?”
Analogy
Monitoring is like a car dashboard. The speedometer shows speed (CPU usage), the temperature gauge — overheating (memory usage), the CHECK ENGINE light — something is broken (error rate). Without a dashboard, you’re driving blind.
Example: Installing Prometheus Stack (Helm)
# Add repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install kube-prometheus-stack
helm install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring --create-namespace
Example: Spring Boot Actuator + Micrometer
<!-- pom.xml -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
# application.yml
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus
  metrics:
    export:
      prometheus:
        enabled: true
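By default, kube-prometheus-stack does not scrape arbitrary Pods; Prometheus discovers targets through ServiceMonitor CRDs. A minimal sketch, assuming the app is exposed by a Service labeled `app: my-app` in the `default` namespace and the Helm release is named `monitoring` as above (all names are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app                  # hypothetical app name
  namespace: monitoring
  labels:
    release: monitoring         # must match the Helm release so Prometheus selects it
spec:
  selector:
    matchLabels:
      app: my-app               # labels on the application's Service
  namespaceSelector:
    matchNames:
      - default                 # namespace where the Service lives
  endpoints:
    - port: http                # named port on the Service
      path: /actuator/prometheus
      interval: 15s
```

After applying this, the target should appear on the Prometheus "Targets" page within one or two scrape intervals.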
kubectl Example
# Open Grafana dashboard
kubectl port-forward svc/monitoring-grafana 3000:80 -n monitoring
# http://localhost:3000 (admin/prom-operator)
# Open Prometheus UI (the Operator creates a stable prometheus-operated service)
kubectl port-forward svc/prometheus-operated 9090:9090 -n monitoring
# List monitoring Pods
kubectl get pods -n monitoring
# prometheus-monitoring-0
# grafana-xxxxx
# alertmanager-monitoring-0
When to Use
- Always in production — without monitoring you are blind
- For tracking CPU, memory, disk, network usage
- For error detection and performance degradation
- For alerting: SMS/email/Slack when problems occur
Middle Level
How it Works
Monitoring in Kubernetes is built on five components:
- Prometheus — the de facto standard for monitoring in K8s and a time-series database. Pull model: it scrapes metrics from Pods via an HTTP endpoint (/metrics) every 15-30 seconds
- Node Exporter — a DaemonSet on each node, collects OS metrics (CPU, RAM, disk, network)
- cAdvisor — built into the kubelet, collects container metrics (CPU, memory, network per container)
- kube-state-metrics — a Deployment, generates metrics about K8s objects (Pod status, Deployment replicas, PVC usage)
- Grafana — visualization, dashboards, alerting
Metrics collection chain:
App (/actuator/prometheus) ← Prometheus scrapes every 15s
Node Exporter (DaemonSet) ← Prometheus scrapes every 15s
cAdvisor (kubelet) ← Prometheus scrapes every 15s
kube-state-metrics ← Prometheus scrapes every 15s
↓
Prometheus TSDB (stores 15 days)
↓
Grafana (dashboards)
↓
Alertmanager (notifications)
Practical Scenarios
Scenario 1: Java application monitoring
# JVM memory usage
jvm_memory_used_bytes{area="heap"}
# GC pause time
rate(jvm_gc_pause_seconds_sum[5m])
# HTTP request rate
rate(http_server_requests_seconds_count[5m])
# HTTP error rate (5xx)
rate(http_server_requests_seconds_count{status=~"5.."}[5m])
# Request latency p99
histogram_quantile(0.99, rate(http_server_requests_seconds_bucket[5m]))
Scenario 2: Alert on 5xx error growth
# PrometheusRule
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-alerts
spec:
  groups:
    - name: app.rules
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
            / sum(rate(http_server_requests_seconds_count[5m])) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "5xx error rate > 5% for 5 minutes"
Scenario 3: Logging via Loki
# Helm values for Loki
loki:
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
promtail:
  enabled: true
Promtail (DaemonSet) collects logs from each node -> sends to Loki -> Grafana visualizes.
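Once logs are in Loki, they are queried from Grafana with LogQL. A few illustrative queries (the `app` label is an assumption about how Promtail is configured):

```logql
# All logs from one application
{app="my-app"}

# Only lines containing ERROR
{app="my-app"} |= "ERROR"

# Error-line rate over 5 minutes (usable in dashboards and alerts)
rate({app="my-app"} |= "ERROR" [5m])
```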
Common Mistakes Table
| Mistake | Consequence | Solution |
|---|---|---|
| Only infrastructure metrics, no business metrics | CPU looks fine, but payments aren’t going through | Add business metrics: payments_per_second, order_processing_time |
| Prometheus stores metrics only 15 days | Can’t analyze long-term trends | Use Thanos/Cortex for long-term storage (S3) |
| Too many alerts (alert fatigue) | Team ignores alerts, misses critical ones | Reduce to 5-10 critical alerts, use alert grouping |
| No tracing (only metrics and logs) | Can see request is slow, but not where | Add OpenTelemetry + Jaeger/Tempo |
| Prometheus single point of failure | When Prometheus goes down — no metrics, no alerts | Use Prometheus HA (replica 2, Alertmanager cluster) |
| Metrics collected too often (every 5s) | High load on Prometheus storage and network | Set 15-30s for most metrics, 5s for critical ones |
Comparison: Observability Stack
| Component | Metrics | Logs | Traces |
|---|---|---|---|
| Tool | Prometheus | Loki / ELK | Jaeger / Tempo |
| What it shows | Numbers (CPU, latency, errors) | Events (log lines, stack traces) | Request path through services |
| Collection | Pull (scrape) | Push (agent -> storage) | Push (SDK -> collector) |
| Storage | TSDB (15 days) | Index + chunks (30 days) | Trace index + span store |
| Query language | PromQL | LogQL | Trace ID lookup |
| When to use | Trends, alerting, dashboards | Debug, audit, compliance | Distributed tracing, bottleneck detection |
Monitoring (Prometheus/Grafana) – metrics: CPU, RAM, latency, error rate. Logging (ELK/Loki) – logs: container stdout/stderr. Tracing (Jaeger/Zipkin) – distributed tracing: request path through services.
When NOT to Use
- Dev/local development — too heavy, use simple logs and health endpoints
- Very small clusters (1-2 Pods) — Prometheus overhead may exceed the benefit
- When no team to support it — monitoring requires maintenance: updating dashboards, configuring alerts, managing storage
- For business analytics — Prometheus is not a data warehouse. Use ClickHouse/BigQuery for business analytics
Senior Level
Deep Mechanics: Prometheus TSDB, Scraping, and Controller Reconciliation
Prometheus Architecture:
Prometheus works on a pull model — it polls targets via HTTP /metrics endpoint.
- Service Discovery: Prometheus discovers targets via the Kubernetes API:
  - `kubernetes_sd_configs` with roles `pod`, `service`, `endpoints`, `node`
  - Automatically finds Pods with the annotation `prometheus.io/scrape: "true"`
  - Updates the target list on Pod changes (via Kubernetes informers)
- Scraping: every `scrape_interval` (15s by default):
  - HTTP GET on the `/metrics` endpoint
  - Parse the text format (Prometheus exposition format)
  - Store in the TSDB (Time-Series Database)
- TSDB (Time-Series Database):
  - Data stored as blocks (2-hour chunks) on disk
  - Each block: index (series -> chunks), chunks (raw samples), meta.json
  - WAL (Write-Ahead Log) for durability on crash
  - Compaction: 2h blocks -> 4h -> 8h -> … (merges blocks to cut disk and query overhead)
  - Memory-mapped files for fast reads
- Prometheus Operator (CoreOS):
  - A Kubernetes Operator managing Prometheus via CRDs: `Prometheus`, `ServiceMonitor`, `PodMonitor`, `PrometheusRule`
  - `ServiceMonitor` — declarative scrape-target definition (instead of manual config)
  - Automatically generates the Prometheus config from CRDs
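For contrast, the annotation-based service discovery described above looks roughly like this in a hand-written prometheus.yml; the Operator generates equivalent config for you from ServiceMonitor/PodMonitor objects:

```yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod                 # discover every Pod via the Kubernetes API
    relabel_configs:
      # keep only Pods annotated prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # allow a Pod to override the metrics path via prometheus.io/path
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
```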
kube-state-metrics Internals: kube-state-metrics connects to API Server via informers and generates metrics about K8s object states:
kube_pod_status_phase{namespace="default", pod="my-app", phase="Running"} -> 1
kube_deployment_status_replicas{namespace="default", deployment="my-app"} -> 3
kube_persistentvolumeclaim_status_phase{namespace="default", pvc="data", phase="Bound"} -> 1
Alertmanager: Prometheus sends firing alerts to Alertmanager. Alertmanager:
- Groups: Groups similar alerts (by label)
- Inhibition: Suppresses dependent alerts (if node down, don’t alert on every Pod on that node)
- Routing: Sends to the right receiver (Slack, PagerDuty, email)
- Deduplication: HA Prometheus (replica 2) sends identical alerts, Alertmanager deduplicates
OpenTelemetry (Tracing): OpenTelemetry SDK instruments the application, collects spans (individual operations) and groups them into traces (full request path). Spans are sent via OTLP protocol to collector -> Jaeger/Tempo for storage and visualization.
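The span pipeline above is typically wired through an OpenTelemetry Collector. A minimal configuration sketch (the Tempo endpoint is an assumption for your cluster):

```yaml
receivers:
  otlp:
    protocols:
      grpc: {}                    # apps send spans via OTLP/gRPC (port 4317)
      http: {}
processors:
  batch: {}                       # batch spans before export
exporters:
  otlp/tempo:
    endpoint: tempo.monitoring.svc:4317   # assumed Tempo service address
    tls:
      insecure: true              # tighten this in production
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
```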
Trade-offs
| Aspect | Trade-off |
|---|---|
| Pull vs Push | Pull (Prometheus) = simpler service discovery, but doesn’t work for ephemeral targets. Push (StatsD) = works for batch jobs, but needs gateway |
| Prometheus vs VictoriaMetrics | Prometheus = standard, huge community. VictoriaMetrics = better performance, less RAM, but less mature ecosystem |
| Loki vs ELK | Loki = lighter, cheaper, better for K8s. ELK = more powerful full-text search, but heavier (Elasticsearch JVM) |
| Prometheus storage local vs remote | Local = simpler, but limited to 15-30 days. Remote (Thanos/Cortex) = long-term storage, but more complex |
| High cardinality labels | More labels = more precise queries, but exponentially more series -> more RAM/CPU |
| Scrape interval | Short (5s) = more precise, but higher load. Long (30s) = less load, but may miss spikes |
Prometheus stores data locally (usually 15-30 days). For long-term storage, use Thanos or Cortex.
Edge Cases (7+)
Edge Case 1: High Cardinality Explosion
# BAD: http_requests_total{path="/users/123", method="GET", status="200"}
# path contains ID -> unique series per user -> millions of series
Cardinality explosion: Prometheus stores each unique label combination as a separate time series. 1000 users x 10 endpoints x 5 statuses = 50,000 series. This eats RAM and slows queries. Solution: use path="/users/:id" (grouping), not specific IDs.
Edge Case 2: Prometheus OOM on large series count
Prometheus stores all active series in RAM. With 10 million series, Prometheus requires ~20-30GB RAM. If the namespace ResourceQuota limit is lower, Prometheus is OOMKilled. Solution: `sample_limit` / `label_limit` in the scrape config, `metric_relabel_configs` to drop high-cardinality labels, or VictoriaMetrics (smaller RAM footprint).
Edge Case 3: Scraping target disappears before scrape completes
Pod deleted during a scrape. Prometheus gets connection refused or a partial response, and metrics for that scrape are lost. With frequent deployments (100+ Pods/day), this creates gaps in metrics. Solution: keep `scrape_timeout` well below `scrape_interval` so failed scrapes fail fast, and give availability alerts a `for:` delay so routine rollouts don't page.
Edge Case 4: Alertmanager notification flooding
Node down -> 50 Pods on node not ready -> 50 alerts firing simultaneously. Alertmanager sends 50 Slack messages. Team gets alert fatigue. Solution: alert grouping (group_by: ['node']), inhibition rules (if node down, suppress pod alerts).
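The grouping and inhibition fix can be sketched in Alertmanager config (receiver and alert names are illustrative):

```yaml
route:
  receiver: slack-default
  group_by: ['alertname', 'node']   # one notification per node, not per Pod
  group_wait: 30s                   # wait briefly to batch related alerts
  group_interval: 5m
inhibit_rules:
  - source_matchers:
      - alertname = NodeDown        # while this fires...
    target_matchers:
      - alertname = PodNotReady     # ...suppress these
    equal: ['node']                 # only for Pods on the same node
```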
Edge Case 5: Tracing overhead
OpenTelemetry with sampling: 1.0 (100% of traces) adds 5-15% overhead to each request’s latency. At high load (10K RPS), this is significant degradation. Solution: probabilistic sampling (0.1-1%), or adaptive sampling (increase sampling rate for errors).
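Probabilistic sampling can usually be set without code changes through the standard OpenTelemetry SDK environment variables, e.g. on the app container (0.01 matches the ~1% suggested above):

```yaml
env:
  - name: OTEL_TRACES_SAMPLER
    value: parentbased_traceidratio   # respect parent decision, ratio-sample root spans
  - name: OTEL_TRACES_SAMPLER_ARG
    value: "0.01"                     # sample 1% of root traces
```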
Edge Case 6: Loki label cardinality
Loki indexes only labels, not log line content. If pod_name is used as a label, each unique label combination = separate stream. With 1000 Pods = 1000 streams. Solution: use labels with lower cardinality (app, namespace), not pod_name.
Edge Case 7: Thanos/Cortex complexity
Thanos adds a sidecar (to Prometheus), query gateway, store gateway (S3), compactor, and ruler. That is 5+ additional components. For a team of 3 DevOps engineers, this may be overhead. Solution: start with Prometheus + 30-day retention, move to Thanos only when long-term storage is needed.
Edge Case 8: kube-state-metrics API Server load
kube-state-metrics watches all K8s objects via API Server informers. With 5000 Pods, 1000 Services, 500 Deployments, informer cache takes ~500MB RAM. List/Watch operations add load to API Server. Solution: --resources flag to limit watching only needed resources.
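The `--resources` restriction is a one-line change in the kube-state-metrics Deployment (image tag and resource list are illustrative):

```yaml
containers:
  - name: kube-state-metrics
    image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.12.0  # example tag
    args:
      - --resources=pods,deployments,persistentvolumeclaims  # watch only what you chart
```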
Performance Numbers
| Metric | Value |
|---|---|
| Prometheus scrape latency | 5-50ms per target (depends on metric count) |
| Prometheus RAM per 1M series | ~2-3GB |
| Prometheus disk per 1M series/day | ~5-10GB (after compaction) |
| Max series per Prometheus instance | ~10-20 million (depends on RAM) |
| Query latency (simple) | 10-100ms |
| Query latency (complex, 7d range) | 1-10 seconds |
| Alertmanager notification latency | 1-5 seconds (from firing to notification) |
| Loki ingestion latency | 1-3 seconds (from log write to queryable) |
| OpenTelemetry overhead (0.1% sampling) | <0.1% latency increase |
| OpenTelemetry overhead (100% sampling) | 5-15% latency increase |
| kube-state-metrics RAM (5000 Pods) | ~500MB-1GB |
Security
- Prometheus endpoint must not be public — `/metrics` exposes internal application structure. Restrict via NetworkPolicy
- mTLS for scraping — if using Istio, Prometheus scrape traffic must be excluded from mTLS or use sidecar injection
- RBAC for the Prometheus SA — the Prometheus ServiceAccount requires `get`, `list`, `watch` on Pods, Services, Endpoints. Restrict to only the needed namespaces
- Alertmanager webhook authentication — Slack/PagerDuty webhooks should use authentication tokens, not plaintext URLs
- Loki log sanitization — logs may contain sensitive data (PII, credentials). Use log redaction (Promtail pipeline stages) before sending to Loki
- Thanos/S3 encryption — long-term metric storage in S3 should be encrypted (SSE-S3 or SSE-KMS)
- OpenTelemetry collector authentication — the OTLP endpoint should require authentication (API key, mTLS); otherwise anyone can send fake spans
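The NetworkPolicy restriction for the metrics endpoint can be sketched like this, assuming the app listens on port 8080 and Prometheus runs in the monitoring namespace (labels and names are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-metrics-from-monitoring
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: my-app                 # the application's Pods
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring   # auto-set namespace label
      ports:
        - port: 8080              # the metrics/HTTP port
          protocol: TCP
```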
Production War Story
Situation: SaaS platform, 1000-Pod cluster, Prometheus + Grafana + Alertmanager. Application: Java/Spring Boot microservices with OpenTelemetry tracing.
Incident:
- A developer added the metric `http_requests_total{path="/users/{id}", user_id="<actual-id>", ...}` — `user_id` carried the actual user ID, not a template
- In 2 hours, cardinality grew from 500K to 15 million series (100K unique user_id values)
- Prometheus RAM usage grew from 8GB to 25GB
- Prometheus was OOMKilled (ResourceQuota limit: 20GB)
- Alertmanager stopped receiving alerts — the team didn't learn about the problem
- After 30 minutes, the on-duty engineer noticed the Grafana dashboards were empty
- Prometheus restarted, but on startup it replayed the WAL (Write-Ahead Log) -> OOM again -> crash loop
- Monitoring was down for 4 hours until the high-cardinality metric was removed and RAM was increased
Post-mortem and fix:
- Cardinality guard — `sample_limit` per scrape target in the Prometheus config + an alert on series growth rate
- Metric naming convention — `path="/users/:id"` (template), not specific IDs. Code review for new metrics
- Prometheus HA — 2 Prometheus replicas with Alertmanager deduplication
- ResourceQuota for the Prometheus namespace — a separate namespace with guaranteed resources
- Alert on Prometheus health — an external health check (synthetic monitoring), not dependent on Prometheus itself
- Thanos for long-term storage — the sidecar uploads blocks to S3, so history survives even if Prometheus crashes
Monitoring after fix:
# Alert: Prometheus series growth rate (head series is a gauge, so use delta, not rate)
delta(prometheus_tsdb_head_series[1h]) > 100000  # >100K new series/hour
# Alert: Prometheus memory usage
process_resident_memory_bytes{job="prometheus"} / 20e9 > 0.8 # >80% of 20GB
# Alert: Prometheus down (external check)
up{job="prometheus"} == 0
# Alert: Alertmanager not receiving alerts
rate(alertmanager_notifications_total{status="success"}[5m]) == 0
Monitoring (Prometheus/Grafana)
Key metrics for monitoring the monitoring:
# Prometheus health
up{job="prometheus"}
# Series count (cardinality)
prometheus_tsdb_head_series
# Scrape duration per target
scrape_duration_seconds
# Actual interval between scrapes (should hover around scrape_interval)
rate(prometheus_target_interval_length_seconds_sum[5m])
/ rate(prometheus_target_interval_length_seconds_count[5m])
# TSDB compaction
rate(prometheus_tsdb_compactions_total[1h])
# Alertmanager alerts
alertmanager_alerts{state="firing"}
# kube-state-metrics latency
kube_state_metrics_list_duration_seconds
Key metrics for application (Golden Signals):
# 1. Latency (p50, p95, p99)
histogram_quantile(0.50, rate(http_server_requests_seconds_bucket[5m]))
histogram_quantile(0.95, rate(http_server_requests_seconds_bucket[5m]))
histogram_quantile(0.99, rate(http_server_requests_seconds_bucket[5m]))
# 2. Traffic (RPS)
sum(rate(http_server_requests_seconds_count[5m])) by (service)
# 3. Errors (5xx rate)
sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
/ sum(rate(http_server_requests_seconds_count[5m]))
# 4. Saturation (CPU, memory, disk)
sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)
container_memory_working_set_bytes / container_spec_memory_limit_bytes
Grafana Dashboard panels:
- Golden Signals Overview: Latency p50/p95/p99, Traffic RPS, Error Rate, Saturation (CPU/Memory)
- JVM Metrics: Heap/Non-Heap memory, GC pause time, thread count, class loading
- Kubernetes Overview: Pod status, Deployment replicas, PVC usage, Node resources
- Alerting Overview: Firing alerts by severity, alert rate, alertmanager notification latency
- Tracing Overview (Tempo/Jaeger): Trace count, error trace rate, slowest endpoints
- Prometheus Self-Monitoring: Series count, scrape duration, TSDB size, memory usage
Highload Best Practices
- Golden Signals: Latency, Traffic, Errors, Saturation — always monitor these 4 metrics
- Cardinality management — don’t use high-cardinality labels (user_id, request_id). Use templates: `path="/users/:id"`
- Prometheus HA — 2 Prometheus replicas + Alertmanager with deduplication
- Scrape interval: 15s for most, 5s for critical — balance between accuracy and load
- Retention: 15 days local + Thanos/Cortex for long-term — S3 storage for compliance and trend analysis
- Alert routing by severity:
- Critical -> PagerDuty (immediately)
- Warning -> Slack (during work hours)
- Info -> Email (daily digest)
- Inhibition rules — if node down, don’t alert on every Pod on that node
- OpenTelemetry sampling: 0.1-1% for production, 100% for errors
- Loki label cardinality — use `app`, `namespace`, not `pod_name`
- Monitor the monitoring — external health check for Prometheus/Alertmanager, not dependent on them
- Prometheus ResourceQuota — separate namespace with guaranteed 20-30GB RAM for 10M series
- Dashboard as code — Grafana dashboards in Git (JSON), deployed via CI/CD
- SLO/SLI tracking — define Service Level Objectives (99.9% availability, p99 < 500ms) and track error budget
- Regular alert review — monthly review of firing alerts, remove noisy alerts, add missing ones
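The SLO/SLI bullet can be made concrete with PromQL, reusing the Spring Boot metric names from earlier. A simplified single-window burn-rate sketch for a 99.9% availability objective (production setups usually use the multi-window variant from the Google SRE workbook):

```promql
# SLI: share of non-5xx requests over 30 days
sum(rate(http_server_requests_seconds_count{status!~"5.."}[30d]))
/ sum(rate(http_server_requests_seconds_count[30d]))

# Fast-burn alert: the 0.1% error budget is burning 14x too fast
(
  sum(rate(http_server_requests_seconds_count{status=~"5.."}[1h]))
  / sum(rate(http_server_requests_seconds_count[1h]))
) > 14 * 0.001
```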
Interview Cheat Sheet
Must know:
- Observability = Metrics (Prometheus) + Logs (Loki/ELK) + Traces (Jaeger/Tempo)
- Prometheus — pull model, time-series DB; scrapes `/metrics` every 15-30 seconds
- Golden Signals: Latency, Traffic, Errors, Saturation — always monitor
- kube-prometheus-stack = Prometheus + Grafana + Alertmanager + Node Exporter + kube-state-metrics
- Cardinality explosion — main Prometheus problem (high-cardinality labels = OOM)
- For Java: JVM memory, GC pause, HTTP error rate, request latency (histogram_quantile)
- Alert routing by severity: Critical -> PagerDuty, Warning -> Slack, Info -> digest
Common follow-up questions:
- “Why is cardinality explosion dangerous?” — Each unique label combination = series; millions of series -> OOM
- “Prometheus HA — why?” — Single point of failure; 2 replicas + Alertmanager deduplication
- “Pull vs Push?” — Pull (Prometheus) = simpler service discovery; Push (StatsD) = for batch jobs
- “How to monitor the monitoring?” — External health check for Prometheus/Alertmanager, not dependent on them
Red flags (DO NOT say):
- “Prometheus stores metrics forever” (locally 15-30 days; Thanos/Cortex for long-term)
- “I only monitor CPU/RAM” (need business metrics: error rate, latency, throughput)
- “100% sampling for tracing in production” (5-15% overhead; use 0.1-1%)
- “Alert fatigue is normal” (reduce to 5-10 critical; otherwise team ignores)
Related topics:
- [[Why are health checks needed]] — health endpoints for monitoring
- [[How does scaling work in Kubernetes]] — custom metrics for HPA
- [[What is Kubernetes and why is it needed]] — Control Plane monitoring