What is Kubernetes and why is it needed?
Junior Level
Simple Explanation
Kubernetes (K8s) is a container orchestrator. You describe the desired state (“I want 3 copies of the app”), and K8s automatically maintains it.
If a Pod crashes — K8s creates a new one. If load grows — K8s adds copies. This is “self-healing.” If Docker is packaging, then Kubernetes is the conveyor and logistics center.
K8s is NOT needed for: a single application on a single server, a simple static page, a weekend prototype.
Analogy
Imagine an orchestra (the name Kubernetes itself comes from the Greek “kybernetes”: helmsman, one who steers):
- Docker container = one musician
- Kubernetes = the conductor who knows who plays when, replaces sick musicians, and adds new ones as the audience grows
Main Tasks of Kubernetes
- Service Discovery — K8s gives each container an IP address and DNS name, distributes traffic.
- Scaling — automatically adds application copies when load increases.
- Self-healing — restarts crashed containers, replaces and moves them on failure.
- Rolling updates and rollbacks — updates the application without downtime, can rollback on problems.
- Secret management — stores passwords and tokens without rebuilding images.
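The desired-state idea above can be sketched as a minimal Deployment manifest. This is an illustrative example, not a reference: the app name, image, and port are assumptions.

```yaml
# Declarative desired state: "I want 3 copies of the app".
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app                # hypothetical name
spec:
  replicas: 3                  # desired number of Pod copies
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: example/web-app:1.4.2   # pinned tag, not :latest
          ports:
            - containerPort: 8080
```

After `kubectl apply -f`, the Deployment controller continuously keeps three Pods running: if one crashes, it is replaced automatically.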
Architecture in Simple Words
Control Plane (Head):
- kube-apiserver — entry point for all commands
- etcd — cluster database, stores all configurations and states.
- scheduler — decides which Node to place each Pod on.
- controller-manager — monitors cluster state
Worker Nodes (Working servers):
- kubelet — agent on each Node, reports container status to Control Plane.
- kube-proxy — handles networking and traffic
- Container Runtime — container launch environment (containerd)
Why Does K8s Matter for Business?
- Scalability — K8s monitors Pod CPU/RAM and automatically creates new copies when thresholds are exceeded (Horizontal Pod Autoscaler).
- Resource optimization — dense “packing” of containers to save money
- Cloud Agnostic — easy to migrate between providers (GCP, Azure, AWS)
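The Horizontal Pod Autoscaler mentioned above can be declared as follows. Names and thresholds are illustrative assumptions:

```yaml
# HPA: scale the Deployment between 2 and 10 replicas,
# targeting ~70% average CPU utilization across Pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app                # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```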
What to Remember
- Kubernetes is the standard for managing cloud infrastructure
- Key advantages: Self-healing, Auto-scaling, Declarative management
- Based on architecture split into Control Plane and Worker Nodes
- Self-hosted K8s requires deep expertise. Managed K8s (GKE, EKS, AKS) significantly simplifies operations.
Middle Level
Kubernetes Architecture
Control Plane
| Component | Role | What happens if it crashes |
|---|---|---|
| kube-apiserver | REST API, entry point | Can’t manage cluster, running pods continue working |
| etcd | Stores all data | Loss of cluster state. Critical component! |
| kube-scheduler | Chooses Node for Pods | New pods don’t start, existing ones work |
| kube-controller-manager | Maintains desired state | Pods don’t recover after crash |
Worker Nodes
| Component | Role | What happens if it crashes |
|---|---|---|
| kubelet | Agent on Node, manages Pods | Pods on this Node become unmanaged |
| kube-proxy | Network rules (iptables/IPVS) | Network communication between services breaks |
| containerd | Container runtime | All containers on Node stop |
Typical Mistakes
| Mistake | Consequence | How to avoid |
|---|---|---|
| No requests/limits on Pods | One Pod consumes all resources, others get OOMKilled | Always specify resources.requests and resources.limits |
| Using latest tag | Can’t rollback, non-determinism | Pin image tags |
| Single Pod replica | No HA, downtime on update | Minimum 2 replicas, PodDisruptionBudget |
| No liveness/readiness probes | K8s doesn’t know if the app is alive | Configure both probes |
| Storing state in Pods | Data lost on restart | Use StatefulSet + PersistentVolume |
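A sketch of a Deployment fragment that avoids the mistakes in the table above. The probe paths, ports, resource numbers, and names are assumptions for illustration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2                          # minimum for HA
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: example/web-app:1.4.2   # pinned tag
          resources:
            requests: { cpu: 100m, memory: 128Mi }
            limits:   { cpu: 500m, memory: 256Mi }
          livenessProbe:                 # "is the process alive?"
            httpGet: { path: /healthz, port: 8080 }
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:                # "can it accept traffic?"
            httpGet: { path: /ready, port: 8080 }
            periodSeconds: 5
---
# PodDisruptionBudget: keep at least 1 Pod during voluntary disruptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: web-app
```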
Main Kubernetes Objects
| Object | Purpose |
|---|---|
| Pod | Smallest unit — one or more containers |
| Deployment | Manages stateless Pods, rolling updates |
| Service | Stable IP/DNS for a group of Pods (L4 load balancing) |
| Ingress | HTTP routing (L7), SSL termination |
| ConfigMap | Configuration (key-value) |
| Secret | Secrets (base64 encoded) |
| PersistentVolume | Persistent data storage |
| Namespace | Logical resource separation |
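As an example of how these objects connect, a Service gives a stable address to Pods selected by label. Names and ports are illustrative:

```yaml
# Reachable in-cluster as web-app.<namespace>.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  selector:
    app: web-app        # routes to Pods carrying this label
  ports:
    - port: 80          # Service port
      targetPort: 8080  # container port
```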
When is K8s Overkill?
- Simple monoliths on 1-2 servers
- Small teams without DevOps engineers
- Better alternatives: Heroku, AWS Elastic Beanstalk, Docker Swarm
What to Remember
- Understanding architecture is critical for troubleshooting
- etcd is the most critical component, requires backups
- Always configure resources, probes, replicas
- K8s is overkill for simple monoliths
- For small teams, Managed K8s (GKE, EKS, AKS) is better
Senior Level
Deep Control Plane Architecture
kube-apiserver
- Stateless HTTP/HTTPS server, handles REST requests
- Authentication (X.509, Bearer tokens, OIDC), authorization (RBAC, ABAC, Webhook), Admission Controllers (Mutating + Validating)
- Scales horizontally behind a load balancer
- All Control Plane components communicate only through the API Server
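The RBAC authorization mentioned above is expressed as Role/RoleBinding objects. A minimal least-privilege sketch; the namespace, role name, and ServiceAccount are assumptions:

```yaml
# Role: read-only access to Pods within one namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: demo
  name: pod-reader
rules:
  - apiGroups: [""]              # "" = core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
# RoleBinding: grant the Role to a hypothetical ServiceAccount.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: demo
subjects:
  - kind: ServiceAccount
    name: ci-bot
    namespace: demo
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```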
etcd
- Distributed key-value store based on Raft consensus algorithm
- Requires odd number of nodes (3 or 5) for quorum
- Criticality: losing etcd = losing entire cluster state
- Backups: `etcdctl snapshot save` is mandatory, both on a schedule and in CI/CD.
- Performance: etcd is sensitive to disk latency. Use SSDs. etcd stores all changes, so compaction is necessary.
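Scheduled backups can be automated in-cluster with a CronJob that runs `etcdctl snapshot save`. This is a sketch under kubeadm-style assumptions: the certificate paths, node name, image tag, and backup directory are all assumptions that vary per cluster.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 */6 * * *"            # every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          nodeName: master-1         # assumed control-plane node name
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: registry.k8s.io/etcd:3.5.9-0   # assumed version
              command:
                - /bin/sh
                - -c
                - >
                  ETCDCTL_API=3 etcdctl snapshot save
                  /backup/etcd-$(date +%Y%m%d-%H%M).db
                  --endpoints=https://127.0.0.1:2379
                  --cacert=/etc/kubernetes/pki/etcd/ca.crt
                  --cert=/etc/kubernetes/pki/etcd/server.crt
                  --key=/etc/kubernetes/pki/etcd/server.key
              volumeMounts:
                - { name: backup, mountPath: /backup }
                - { name: pki, mountPath: /etc/kubernetes/pki/etcd, readOnly: true }
          volumes:
            - name: backup
              hostPath: { path: /var/backups/etcd, type: DirectoryOrCreate }
            - name: pki
              hostPath: { path: /etc/kubernetes/pki/etcd }
```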
kube-scheduler
- Multi-pass scheduling: Filter (filters unsuitable Nodes) → Score (ranks remaining)
- Considers: resource requests, node selectors, taints/tolerations, affinity/anti-affinity, pod topology spread
- Can be customized through Scheduler Framework plugins
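The scheduling constraints listed above appear in the Pod spec. An illustrative sketch; the labels, taint key, and image are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker
  labels:
    app: worker
spec:
  nodeSelector:
    disktype: ssd                  # hypothetical node label
  tolerations:                     # allow scheduling onto tainted nodes
    - key: "dedicated"
      operator: "Exists"
      effect: "NoSchedule"
  affinity:
    podAntiAffinity:               # spread replicas across nodes
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: worker
          topologyKey: kubernetes.io/hostname
  containers:
    - name: worker
      image: example/worker:1.0
```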
kube-controller-manager
- Set of controllers, each monitoring its own object:
- ReplicaSet Controller — maintains desired number of Pods
- Node Controller — monitors Node state
- Endpoint Controller — updates Endpoints for Service
- ServiceAccount Controller — creates default ServiceAccount
Control Plane HA (High Availability)
[Load Balancer]
/ | \
apiserver apiserver apiserver
| | |
etcd ------ etcd ------ etcd (Raft consensus)
- Multi-master: 3x API Server + 3/5 etcd + 2x scheduler (active-passive) + 2x controller-manager (active-passive)
- etcd quorum: with 3 nodes tolerates 1 failure, with 5 — 2 failures
- etcd latency: < 10ms between nodes, otherwise Raft consensus degrades
Trade-offs
| Aspect | Self-hosted K8s | Managed K8s (GKE/EKS/AKS) |
|---|---|---|
| Control | Full | Limited (no API server tuning) |
| Complexity | Very high | Medium |
| Cost | Lower (own infrastructure) | Higher (managed fee) |
| Updates | Manual | Automatic |
| etcd management | Yourself | Provider |
| SLA | Yours | 99.95%+ |
Edge Cases
- etcd fragmentation: with frequent updates etcd fragments; periodic defragmentation is needed (`etcdctl defrag`).
- API Server overload: with 5000+ Pods and frequent updates the API Server can become a bottleneck. Solution: horizontal scaling, tuning `--max-requests-inflight`.
- NotReady Node timeout: when a Node goes NotReady, the Control Plane waits 5 minutes (`pod-eviction-timeout`) before recreating its Pods. In critical systems this timeout needs tuning.
- Kernel Panic on Node: kubelet stops sending heartbeats. K8s can’t always distinguish a node crash from a network issue, so Pods may get stuck in Unknown status.
- Pod stuck in Terminating: a volume can’t unmount or a finalizer doesn’t complete. Solution: `kubectl patch pod <name> -p '{"metadata":{"finalizers":null}}'`.
Performance and Scaling
| Parameter | Default | Maximum (tested) |
|---|---|---|
| Pods per Node | 110 | 250+ (depends on CNI) |
| Pods in cluster | - | 150,000 |
| Nodes in cluster | - | 5,000 |
| Namespaces | - | Tens of thousands |
| etcd size | - | < 8GB (recommended) |
Limits:
- Number of Pods per Node is limited by available IPs (CNI), kubelet load, available PIDs.
- etcd stores all objects. At > 8GB database size performance degrades.
- API Server throughput: tuning `--max-requests-inflight` and `--max-mutating-requests-inflight`.
Security
Defense in Depth:
- Network Policies — microsegmentation, zero-trust networking
- RBAC — least privilege
- Pod Security Standards — restricted, baseline, privileged
- Admission Controllers — OPA/Gatekeeper for policy enforcement
- Image Policy — only signed images from trusted registry
- Secrets encryption at rest — etcd data encryption
- Audit logging — all API requests are logged
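The zero-trust posture above usually starts with a default-deny NetworkPolicy. A minimal sketch; the namespace name is an assumption:

```yaml
# Deny all inbound traffic to every Pod in the namespace;
# allow traffic back selectively with additional policies.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: prod
spec:
  podSelector: {}        # empty selector = all Pods in the namespace
  policyTypes:
    - Ingress            # no ingress rules listed -> all ingress denied
```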
Production Story
A large fintech company deployed a self-hosted K8s cluster (10 masters, 50 workers). First 6 months — stable operation. Then: etcd began degrading (fragmentation, latency > 50ms). Cause: frequent Deployment updates (every 5 minutes) + no compaction. Solution: periodic defragmentation, batch updates, monitoring etcd latency. Second incident: API Server overload from a “watch storm” — 10,000 clients reconnected simultaneously after a network blip. Solution: tuning --max-requests-inflight, horizontal API Server scaling, connection multiplexing.
Monitoring
- Control Plane: API Server latency/p99, etcd disk latency, etcd leader elections, scheduler scheduling duration, controller-manager queue depth
- Nodes: CPU/Memory pressure, disk pressure, PID pressure, kubelet runtime operations, kube-proxy sync latency
- Pods: restart count, OOMKilled, CrashLoopBackOff, container waiting time, resource usage vs requests
- Stack: Prometheus + kube-prometheus-stack (AlertManager, Grafana), cAdvisor for container metrics
- Golden Signals: latency, traffic, errors, saturation — at cluster, namespace, deployment, pod level
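With kube-prometheus-stack, an alert on etcd disk latency (one of the Control Plane signals above) can be expressed as a PrometheusRule. The threshold and names are illustrative assumptions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: etcd-latency
  namespace: monitoring
spec:
  groups:
    - name: control-plane
      rules:
        - alert: EtcdHighFsyncLatency
          # p99 WAL fsync latency over 10ms signals slow disks.
          expr: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) > 0.01
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "etcd p99 fsync latency above 10ms"
```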
Summary
- Kubernetes is the standard for managing cloud infrastructure with Control Plane and Worker Nodes split.
- etcd is the most critical component. Backups are mandatory.
- Key advantages: Self-healing, Auto-scaling, Declarative management.
- Self-hosted K8s requires a team of 3-5 SRE engineers. Managed K8s reduces operational overhead.
- Always configure: resource requests/limits, liveness/readiness probes, PodDisruptionBudget, NetworkPolicies.
- K8s is overkill for monoliths. Indispensable for microservices at scale.
- Understanding internal architecture (API Server → etcd → Scheduler → Controller → Kubelet) is critical for troubleshooting.
Interview Cheat Sheet
Must know:
- Kubernetes — container orchestrator: self-healing, auto-scaling, declarative management
- Control Plane: API Server (entry point), etcd (storage), Scheduler, Controller Manager
- Worker Nodes: kubelet (agent), kube-proxy (network), container runtime (containerd)
- Pod — smallest launch unit; Service — stable address; Deployment — manages replicas
- etcd — most critical component; losing etcd = losing cluster state
- Self-hosted K8s requires an SRE team; Managed (GKE/EKS/AKS) is significantly simpler
- K8s is overkill for monoliths; indispensable for microservices at scale
Frequent follow-up questions:
- “What happens if etcd crashes?” — Loss of entire cluster state; backups are mandatory
- “Why is kube-apiserver stateless?” — Can scale horizontally behind a load balancer
- “K8s for a single application?” — Overkill; better with Heroku, ECS, or a simple server
- “What is a reconciliation loop?” — K8s constantly compares desired state with actual and corrects discrepancies
Red flags (DO NOT say):
- “Every project needs K8s” (overkill for monoliths and small teams)
- “etcd doesn’t need backups” (losing etcd = total cluster loss)
- “Kubelet is a Control Plane component” (kubelet is a node agent; it runs on every node, including control-plane nodes in kubeadm setups)
- “K8s is secure by itself” (requires RBAC, NetworkPolicies, Pod Security)
Related topics:
- [[What is Pod in Kubernetes]] — smallest launch unit
- [[What is Service in Kubernetes]] — network abstraction
- [[How scaling works in Kubernetes]] — HPA, VPA, Cluster Autoscaler