What is Kubernetes and why is it needed?
Junior Level
Simple Explanation
Kubernetes (K8s) is a container orchestrator. You describe the desired state (“I want 3 copies of the app”), and K8s automatically maintains it.
If a Pod crashes — K8s creates a new one. If load grows — K8s adds copies. This is “self-healing.” If Docker is packaging, then Kubernetes is the conveyor and logistics center.
K8s is NOT needed for: a single application on a single server, a simple static page, a weekend prototype.
Analogy
Imagine an orchestra (the name Kubernetes itself comes from the Greek “kybernetes”: helmsman, one who steers):
- Docker container = one musician
- Kubernetes = the conductor who knows who plays when, replaces sick musicians, and adds new ones as the audience grows
Main Tasks of Kubernetes
- Service Discovery — K8s gives each container an IP address and DNS name, distributes traffic.
- Scaling — automatically adds application copies when load increases.
- Self-healing — restarts crashed containers, replaces and moves them on failure.
- Rolling updates and rollbacks — updates the application without downtime, can rollback on problems.
- Secret management — stores passwords and tokens without rebuilding images.
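The desired-state idea above can be sketched as a minimal Deployment manifest. This is an illustrative example, not a reference: the app name, image, and port are assumptions.

```yaml
# Declarative desired state: "I want 3 copies of the app".
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app                # hypothetical name
spec:
  replicas: 3                  # desired number of Pod copies
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: example/web-app:1.4.2   # pinned tag, not :latest
          ports:
            - containerPort: 8080
```

After `kubectl apply -f`, the Deployment controller continuously keeps three Pods running: if one crashes, it is replaced automatically.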
Architecture in Simple Words
Control Plane (Head):
- kube-apiserver — entry point for all commands
- etcd — cluster database, stores all configurations and states.
- scheduler — decides which Node to place each Pod on.
- controller-manager — monitors cluster state
Worker Nodes (Working servers):
- kubelet — agent on each Node, reports container status to Control Plane.
- kube-proxy — handles networking and traffic
- Container Runtime — container launch environment (containerd)
Why Does K8s Matter for Business?
- Scalability — K8s monitors Pod CPU/RAM and automatically creates new copies when thresholds are exceeded (Horizontal Pod Autoscaler).
- Resource optimization — dense “packing” of containers to save money
- Cloud Agnostic — easy to migrate between providers (GCP, Azure, AWS)
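The Horizontal Pod Autoscaler mentioned above can be declared as follows. Names and thresholds are illustrative assumptions:

```yaml
# HPA: scale the Deployment between 2 and 10 replicas,
# targeting ~70% average CPU utilization across Pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app                # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```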
What to Remember
- Kubernetes is the standard for managing cloud infrastructure
- Key advantages: Self-healing, Auto-scaling, Declarative management
- Based on architecture split into Control Plane and Worker Nodes
- Self-hosted K8s requires deep expertise. Managed K8s (GKE, EKS, AKS) significantly simplifies operations.
Middle Level
Kubernetes Architecture
Control Plane
| Component | Role | What happens if it crashes |
|---|---|---|
| kube-apiserver | REST API, entry point | Can’t manage cluster, running pods continue working |
| etcd | Stores all data | Loss of cluster state. Critical component! |
| kube-scheduler | Chooses Node for Pods | New pods don’t start, existing ones work |
| kube-controller-manager | Maintains desired state | Pods don’t recover after crash |
Worker Nodes
| Component | Role | What happens if it crashes |
|---|---|---|
| kubelet | Agent on Node, manages Pods | Pods on this Node become unmanaged |
| kube-proxy | Network rules (iptables/IPVS) | Network communication between services breaks |
| containerd | Container runtime | All containers on Node stop |
Typical Mistakes
| Mistake | Consequence | How to avoid |
|---|---|---|
| No requests/limits on Pods | One Pod consumes all resources, others get OOMKilled | Always specify resources.requests and resources.limits |
| Using latest tag | Can’t rollback, non-determinism | Pin image tags |
| Single Pod replica | No HA, downtime on update | Minimum 2 replicas, PodDisruptionBudget |
| No liveness/readiness probes | K8s doesn’t know if the app is alive | Configure both probes |
| Storing state in Pods | Data lost on restart | Use StatefulSet + PersistentVolume |
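A sketch of a Deployment fragment that avoids the mistakes in the table above. The probe paths, ports, resource numbers, and names are assumptions for illustration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2                          # minimum for HA
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: example/web-app:1.4.2   # pinned tag
          resources:
            requests: { cpu: 100m, memory: 128Mi }
            limits:   { cpu: 500m, memory: 256Mi }
          livenessProbe:                 # "is the process alive?"
            httpGet: { path: /healthz, port: 8080 }
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:                # "can it accept traffic?"
            httpGet: { path: /ready, port: 8080 }
            periodSeconds: 5
---
# PodDisruptionBudget: keep at least 1 Pod during voluntary disruptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: web-app
```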
Main Kubernetes Objects
| Object | Purpose |
|---|---|
| Pod | Smallest unit — one or more containers |
| Deployment | Manages stateless Pods, rolling updates |
| Service | Stable IP/DNS for a group of Pods (L4 load balancing) |
| Ingress | HTTP routing (L7), SSL termination |
| ConfigMap | Configuration (key-value) |
| Secret | Secrets (base64 encoded) |
| PersistentVolume | Persistent data storage |
| Namespace | Logical resource separation |
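As an example of how these objects connect, a Service gives a stable address to Pods selected by label. Names and ports are illustrative:

```yaml
# Reachable in-cluster as web-app.<namespace>.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  selector:
    app: web-app        # routes to Pods carrying this label
  ports:
    - port: 80          # Service port
      targetPort: 8080  # container port
```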
When is K8s Overkill?
- Simple monoliths on 1-2 servers
- Small teams without DevOps engineers
- Better alternatives: Heroku, AWS Elastic Beanstalk, Docker Swarm
What to Remember
- Understanding architecture is critical for troubleshooting
- etcd is the most critical component, requires backups
- Always configure resources, probes, replicas
- K8s is overkill for simple monoliths
- For small teams, Managed K8s (GKE, EKS, AKS) is better
Senior Level
Deep Control Plane Architecture
kube-apiserver
- Stateless HTTP/HTTPS server, handles REST requests
- Authentication (X.509, Bearer tokens, OIDC), authorization (RBAC, ABAC, Webhook), Admission Controllers (Mutating + Validating)
- Scales horizontally behind a load balancer
- All Control Plane components communicate only through the API Server
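The RBAC authorization mentioned above is expressed as Role/RoleBinding objects. A minimal least-privilege sketch; the namespace, role name, and ServiceAccount are assumptions:

```yaml
# Role: read-only access to Pods within one namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: demo
  name: pod-reader
rules:
  - apiGroups: [""]              # "" = core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
# RoleBinding: grant the Role to a hypothetical ServiceAccount.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: demo
subjects:
  - kind: ServiceAccount
    name: ci-bot
    namespace: demo
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```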
etcd
- Distributed key-value store based on Raft consensus algorithm
- Requires odd number of nodes (3 or 5) for quorum
- Criticality: losing etcd = losing entire cluster state
- Backups: `etcdctl snapshot save` is mandatory, both on a schedule and in CI/CD.
- Performance: etcd is sensitive to disk latency. Use SSDs. etcd stores all changes, so compaction is necessary.
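Scheduled backups can be automated in-cluster with a CronJob that runs `etcdctl snapshot save`. This is a sketch under kubeadm-style assumptions: the certificate paths, node name, image tag, and backup directory are all assumptions that vary per cluster.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 */6 * * *"            # every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          nodeName: master-1         # assumed control-plane node name
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: registry.k8s.io/etcd:3.5.9-0   # assumed version
              command:
                - /bin/sh
                - -c
                - >
                  ETCDCTL_API=3 etcdctl snapshot save
                  /backup/etcd-$(date +%Y%m%d-%H%M).db
                  --endpoints=https://127.0.0.1:2379
                  --cacert=/etc/kubernetes/pki/etcd/ca.crt
                  --cert=/etc/kubernetes/pki/etcd/server.crt
                  --key=/etc/kubernetes/pki/etcd/server.key
              volumeMounts:
                - { name: backup, mountPath: /backup }
                - { name: pki, mountPath: /etc/kubernetes/pki/etcd, readOnly: true }
          volumes:
            - name: backup
              hostPath: { path: /var/backups/etcd, type: DirectoryOrCreate }
            - name: pki
              hostPath: { path: /etc/kubernetes/pki/etcd }
```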
kube-scheduler
- Multi-pass scheduling: Filter (filters unsuitable Nodes) → Score (ranks remaining)
- Considers: resource requests, node selectors, taints/tolerations, affinity/anti-affinity, pod topology spread
- Can be customized through Scheduler Framework plugins
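The scheduling constraints listed above appear in the Pod spec. An illustrative sketch; the labels, taint key, and image are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker
  labels:
    app: worker
spec:
  nodeSelector:
    disktype: ssd                  # hypothetical node label
  tolerations:                     # allow scheduling onto tainted nodes
    - key: "dedicated"
      operator: "Exists"
      effect: "NoSchedule"
  affinity:
    podAntiAffinity:               # spread replicas across nodes
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: worker
          topologyKey: kubernetes.io/hostname
  containers:
    - name: worker
      image: example/worker:1.0
```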
kube-controller-manager
- Set of controllers, each monitoring its own object:
- ReplicaSet Controller — maintains desired number of Pods
- Node Controller — monitors Node state
- Endpoint Controller — updates Endpoints for Service
- ServiceAccount Controller — creates default ServiceAccount
Control Plane HA (High Availability)
[Load Balancer]
/ | \
apiserver apiserver apiserver
| | |
etcd ------ etcd ------ etcd (Raft consensus)
- Multi-master: 3x API Server + 3/5 etcd + 2x scheduler (active-passive) + 2x controller-manager (active-passive)
- etcd quorum: with 3 nodes tolerates 1 failure, with 5 — 2 failures
- etcd latency: < 10ms between nodes, otherwise Raft consensus degrades
Trade-offs
| Aspect | Self-hosted K8s | Managed K8s (GKE/EKS/AKS) |
|---|---|---|
| Control | Full | Limited (no API server tuning) |
| Complexity | Very high | Medium |
| Cost | Lower (own infrastructure) | Higher (managed fee) |
| Updates | Manual | Automatic |
| etcd management | Yourself | Provider |
| SLA | Yours | 99.95%+ |
Edge Cases
- etcd fragmentation: with frequent updates etcd fragments; periodic defragmentation is needed (`etcdctl defrag`).
- API Server overload: with 5000+ Pods and frequent updates the API Server can become a bottleneck. Solution: horizontal scaling, tuning `--max-requests-inflight`.
- NotReady Node timeout: when a Node goes NotReady, the Control Plane waits 5 minutes (`pod-eviction-timeout`) before recreating its Pods. In critical systems this timeout needs tuning.
- Kernel Panic on Node: kubelet stops sending heartbeats. K8s can’t always distinguish a node crash from a network issue, so Pods may get stuck in Unknown status.
- Pod stuck in Terminating: a volume can’t unmount or a finalizer doesn’t complete. Solution: `kubectl patch pod <name> -p '{"metadata":{"finalizers":null}}'`.
Performance and Scaling
| Parameter | Default | Maximum (tested) |
|---|---|---|
| Pods per Node | 110 | 250+ (depends on CNI) |
| Pods in cluster | - | 150,000 |
| Nodes in cluster | - | 5,000 |
| Namespaces | - | Tens of thousands |
| etcd size | - | < 8GB (recommended) |
Limits:
- Number of Pods per Node is limited by available IPs (CNI), kubelet load, available PIDs.
- etcd stores all objects. At > 8GB database size performance degrades.
- API Server throughput: tuning `--max-requests-inflight` and `--max-mutating-requests-inflight`.
Security
Defense in Depth:
- Network Policies — microsegmentation, zero-trust networking
- RBAC — least privilege
- Pod Security Standards — restricted, baseline, privileged
- Admission Controllers — OPA/Gatekeeper for policy enforcement
- Image Policy — only signed images from trusted registry
- Secrets encryption at rest — etcd data encryption
- Audit logging — all API requests are logged
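The zero-trust posture above usually starts with a default-deny NetworkPolicy. A minimal sketch; the namespace name is an assumption:

```yaml
# Deny all inbound traffic to every Pod in the namespace;
# allow traffic back selectively with additional policies.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: prod
spec:
  podSelector: {}        # empty selector = all Pods in the namespace
  policyTypes:
    - Ingress            # no ingress rules listed -> all ingress denied
```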
Production Story
A large fintech company deployed a self-hosted K8s cluster (10 masters, 50 workers). First 6 months — stable operation. Then: etcd began degrading (fragmentation, latency > 50ms). Cause: frequent Deployment updates (every 5 minutes) + no compaction. Solution: periodic defragmentation, batch updates, monitoring etcd latency. Second incident: API Server overload from a “watch storm” — 10,000 clients reconnected simultaneously after a network blip. Solution: tuning --max-requests-inflight, horizontal API Server scaling, connection multiplexing.
Monitoring
- Control Plane: API Server latency/p99, etcd disk latency, etcd leader elections, scheduler scheduling duration, controller-manager queue depth
- Nodes: CPU/Memory pressure, disk pressure, PID pressure, kubelet runtime operations, kube-proxy sync latency
- Pods: restart count, OOMKilled, CrashLoopBackOff, container waiting time, resource usage vs requests
- Stack: Prometheus + kube-prometheus-stack (AlertManager, Grafana), cAdvisor for container metrics
- Golden Signals: latency, traffic, errors, saturation — at cluster, namespace, deployment, pod level
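With kube-prometheus-stack, an alert on etcd disk latency (one of the Control Plane signals above) can be expressed as a PrometheusRule. The threshold and names are illustrative assumptions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: etcd-latency
  namespace: monitoring
spec:
  groups:
    - name: control-plane
      rules:
        - alert: EtcdHighFsyncLatency
          # p99 WAL fsync latency over 10ms signals slow disks.
          expr: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) > 0.01
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "etcd p99 fsync latency above 10ms"
```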
Summary
- Kubernetes is the standard for managing cloud infrastructure with Control Plane and Worker Nodes split.
- etcd is the most critical component. Backups are mandatory.
- Key advantages: Self-healing, Auto-scaling, Declarative management.
- Self-hosted K8s requires a team of 3-5 SRE engineers. Managed K8s reduces operational overhead.
- Always configure: resource requests/limits, liveness/readiness probes, PodDisruptionBudget, NetworkPolicies.
- K8s is overkill for monoliths. Indispensable for microservices at scale.
- Understanding internal architecture (API Server → etcd → Scheduler → Controller → Kubelet) is critical for troubleshooting.
Interview Cheat Sheet
Must know:
- Kubernetes — container orchestrator: self-healing, auto-scaling, declarative management
- Control Plane: API Server (entry point), etcd (storage), Scheduler, Controller Manager
- Worker Nodes: kubelet (agent), kube-proxy (network), container runtime (containerd)
- Pod — smallest launch unit; Service — stable address; Deployment — manages replicas
- etcd — most critical component; losing etcd = losing cluster state
- Self-hosted K8s requires an SRE team; Managed (GKE/EKS/AKS) is significantly simpler
- K8s is overkill for monoliths; indispensable for microservices at scale
Frequent follow-up questions:
- “What happens if etcd crashes?” — Loss of entire cluster state; backups are mandatory
- “Why is kube-apiserver stateless?” — Can scale horizontally behind a load balancer
- “K8s for a single application?” — Overkill; better with Heroku, ECS, or a simple server
- “What is a reconciliation loop?” — K8s constantly compares desired state with actual and corrects discrepancies
Red flags (DO NOT say):
- “Every project needs K8s” (overkill for monoliths and small teams)
- “etcd doesn’t need backups” (losing etcd = total cluster loss)
- “Kubelet is a Control Plane component” (kubelet is a node agent; it runs on every node, including control-plane nodes in kubeadm setups)
- “K8s is secure by itself” (requires RBAC, NetworkPolicies, Pod Security)
Related topics:
- [[What is Pod in Kubernetes]] — smallest launch unit
- [[What is Service in Kubernetes]] — network abstraction
- [[How scaling works in Kubernetes]] — HPA, VPA, Cluster Autoscaler