Question 8 · Section 14

What is Kubernetes and why is it needed?

Junior Level

Simple Explanation

Kubernetes (K8s) is a container orchestrator. You describe the desired state (“I want 3 copies of the app”), and K8s automatically maintains it.

If a Pod crashes, K8s creates a new one (self-healing). If load grows, K8s adds copies (auto-scaling). If Docker is the packaging, Kubernetes is the assembly line and logistics center.

K8s is NOT needed for: a single application on a single server, a simple static page, a weekend prototype.

Analogy

Imagine an orchestra (the name Kubernetes comes from the Greek kybernetes: helmsman, one who steers):

  • Docker container = one musician
  • Kubernetes = the conductor who knows who plays when, replaces sick musicians, and adds new ones as the audience grows

Main Tasks of Kubernetes

  1. Service Discovery — K8s gives each Pod an IP address and each Service a stable DNS name, and distributes traffic across Pods.
  2. Scaling — automatically adds application copies when load increases.
  3. Self-healing — restarts crashed containers, replaces and moves them on failure.
  4. Rolling updates and rollbacks — updates the application without downtime, can rollback on problems.
  5. Secret management — stores passwords and tokens without rebuilding images.
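
The declarative model behind tasks 2–4 can be sketched as a minimal Deployment manifest. The names, labels, and image here are placeholders, not from the original text:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                # hypothetical app name
spec:
  replicas: 3              # desired state: "I want 3 copies"
  selector:
    matchLabels:
      app: web
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1    # update without downtime
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.4.2   # pinned tag, not latest
```

You apply this with `kubectl apply -f deployment.yaml`, and the controllers continuously reconcile the cluster toward the declared state.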

Architecture in Simple Words

Control Plane (Head):

  • kube-apiserver — entry point for all commands
  • etcd — cluster database, stores all configurations and states.
  • scheduler — decides which Node to place each Pod on.
  • controller-manager — monitors cluster state

Worker Nodes (Working servers):

  • kubelet — agent on each Node, reports container status to Control Plane.
  • kube-proxy — handles networking and traffic
  • Container Runtime — container launch environment (containerd)

Why Does K8s Matter for Business?

  • Scalability — K8s monitors Pod CPU/RAM and automatically creates new copies when thresholds are exceeded (Horizontal Pod Autoscaler).
  • Resource optimization — dense “packing” of containers to save money
  • Cloud Agnostic — easy to migrate between providers (GCP, Azure, AWS)
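
The autoscaling point above can be illustrated with a minimal HorizontalPodAutoscaler (autoscaling/v2 API). The target Deployment name and thresholds are placeholder assumptions:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add Pods above 70% average CPU
```

Note that HPA needs CPU requests set on the Pods (and a metrics source such as metrics-server) to compute utilization.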

What to Remember

  • Kubernetes is the standard for managing cloud infrastructure
  • Key advantages: Self-healing, Auto-scaling, Declarative management
  • Based on architecture split into Control Plane and Worker Nodes
  • Self-hosted K8s requires deep expertise. Managed K8s (GKE, EKS, AKS) significantly simplifies operations.

Middle Level

Kubernetes Architecture

Control Plane

| Component | Role | What happens if it crashes |
|---|---|---|
| kube-apiserver | REST API, entry point | Can’t manage the cluster; running Pods continue working |
| etcd | Stores all cluster data | Loss of cluster state. Critical component! |
| kube-scheduler | Chooses a Node for each Pod | New Pods don’t start; existing ones keep working |
| kube-controller-manager | Maintains desired state | Pods don’t recover after a crash |

Worker Nodes

| Component | Role | What happens if it crashes |
|---|---|---|
| kubelet | Agent on the Node, manages Pods | Pods on this Node become unmanaged |
| kube-proxy | Network rules (iptables/IPVS) | Existing rules keep working but go stale; new Services/Endpoints stop being programmed |
| containerd | Container runtime | Running containers survive via their shims, but new containers can’t start |

Typical Mistakes

| Mistake | Consequence | How to avoid |
|---|---|---|
| No requests/limits on Pods | One Pod consumes all resources; others get evicted or OOMKilled | Always set resources.requests and resources.limits |
| Using the latest tag | No reliable rollback, non-deterministic deploys | Pin image tags |
| A single Pod replica | No HA, downtime on updates | At least 2 replicas plus a PodDisruptionBudget |
| No liveness/readiness probes | K8s doesn’t know whether the app is alive | Configure both probes |
| Storing state in Pods | Data lost on restart | Use StatefulSet + PersistentVolume |
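
Several of the fixes above land in the Pod template. A minimal sketch of a container spec with resources and both probes; the image, paths, and port are placeholder assumptions:

```yaml
# Fragment of a Deployment's Pod template
containers:
  - name: app
    image: registry.example.com/app:2.1.0   # pinned tag, not latest
    resources:
      requests:                # what the scheduler reserves
        cpu: "250m"
        memory: "256Mi"
      limits:                  # hard ceiling; exceeding memory = OOMKilled
        cpu: "500m"
        memory: "512Mi"
    livenessProbe:             # restart the container if this fails
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:            # stop routing traffic if this fails
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
```

Liveness and readiness answer different questions (should we restart vs. should we send traffic), which is why both are needed.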

Main Kubernetes Objects

| Object | Purpose |
|---|---|
| Pod | Smallest deployable unit — one or more containers |
| Deployment | Manages stateless Pods, rolling updates |
| Service | Stable IP/DNS for a group of Pods (L4 load balancing) |
| Ingress | HTTP routing (L7), SSL termination |
| ConfigMap | Configuration (key-value) |
| Secret | Secrets (base64-encoded, not encrypted by default) |
| PersistentVolume | Persistent data storage |
| Namespace | Logical resource separation |
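
How a Service binds to Pods through labels can be sketched minimally; the names and ports are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web            # routes to every Pod carrying this label
  ports:
    - port: 80          # stable cluster-internal port (web.<namespace>.svc)
      targetPort: 8080  # port the container actually listens on
```

The Service's ClusterIP and DNS name stay stable while the Pods behind it come and go.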

When is K8s Overkill?

  • Simple monoliths on 1-2 servers
  • Small teams without DevOps engineers
  • Better alternatives: Heroku, AWS Elastic Beanstalk, Docker Swarm

What to Remember

  • Understanding architecture is critical for troubleshooting
  • etcd is the most critical component, requires backups
  • Always configure resources, probes, replicas
  • K8s is overkill for simple monoliths
  • For small teams, Managed K8s (GKE, EKS, AKS) is better

Senior Level

Deep Control Plane Architecture

kube-apiserver

  • Stateless HTTP/HTTPS server, handles REST requests
  • Authentication (X.509, Bearer tokens, OIDC), authorization (RBAC, ABAC, Webhook), Admission Controllers (Mutating + Validating)
  • Scales horizontally behind a load balancer
  • All Control Plane components communicate only through the API Server

etcd

  • Distributed key-value store based on Raft consensus algorithm
  • Requires odd number of nodes (3 or 5) for quorum
  • Criticality: losing etcd = losing entire cluster state
  • Backups: etcdctl snapshot save is mandatory; automate it in CI/CD and on a schedule
  • Performance: etcd is sensitive to disk latency. Use SSDs. etcd stores all changes — compaction is necessary.
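
One way to make the scheduled backup concrete is a CronJob that runs etcdctl on a control-plane node. Everything here (image, certificate paths, backup location) is an assumption that varies per installation:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 */6 * * *"        # every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          hostNetwork: true
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
            - key: node-role.kubernetes.io/control-plane
              effect: NoSchedule
          containers:
            - name: backup
              image: registry.example.com/etcdctl:3.5   # hypothetical image containing etcdctl
              command:
                - /bin/sh
                - -c
                - >
                  ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%s).db
                  --endpoints=https://127.0.0.1:2379
                  --cacert=/etc/kubernetes/pki/etcd/ca.crt
                  --cert=/etc/kubernetes/pki/etcd/server.crt
                  --key=/etc/kubernetes/pki/etcd/server.key
              volumeMounts:
                - name: backup
                  mountPath: /backup
                - name: etcd-certs
                  mountPath: /etc/kubernetes/pki/etcd
                  readOnly: true
          volumes:
            - name: backup
              hostPath:
                path: /var/backups/etcd    # ship these off-node as well
            - name: etcd-certs
              hostPath:
                path: /etc/kubernetes/pki/etcd
```

A snapshot that is never restore-tested is not a backup; periodically verify with etcdctl snapshot restore in a staging environment.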

kube-scheduler

  • Multi-pass scheduling: Filter (filters unsuitable Nodes) → Score (ranks remaining)
  • Considers: resource requests, node selectors, taints/tolerations, affinity/anti-affinity, pod topology spread
  • Can be customized through Scheduler Framework plugins
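
The constraints the scheduler considers are declared on the Pod itself. A sketch combining a node selector, a toleration, and anti-affinity; the labels and taint keys are placeholder assumptions:

```yaml
# Fragment of a Pod spec
spec:
  nodeSelector:
    disktype: ssd               # only Nodes labeled disktype=ssd pass the Filter phase
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "payments"
      effect: "NoSchedule"      # allows scheduling onto Nodes tainted dedicated=payments
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: web
          topologyKey: kubernetes.io/hostname   # spread replicas across distinct Nodes
```

Hard (`required...`) rules can leave Pods Pending when no Node qualifies; soft (`preferred...`) variants degrade gracefully instead.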

kube-controller-manager

  • Set of controllers, each monitoring its own object:
    • ReplicaSet Controller — maintains desired number of Pods
    • Node Controller — monitors Node state
    • Endpoint Controller — updates Endpoints for Service
    • ServiceAccount Controller — creates default ServiceAccount

Control Plane HA (High Availability)

                    [Load Balancer]
                   /       |        \
          apiserver   apiserver   apiserver
               |          |          |
              etcd ------ etcd ------ etcd   (Raft consensus)
  • Multi-master: 3x API Server + 3/5 etcd + 2x scheduler (active-passive) + 2x controller-manager (active-passive)
  • etcd quorum: with 3 nodes tolerates 1 failure, with 5 — 2 failures
  • etcd latency: < 10ms between nodes, otherwise Raft consensus degrades

Trade-offs

| Aspect | Self-hosted K8s | Managed K8s (GKE/EKS/AKS) |
|---|---|---|
| Control | Full | Limited (no API server tuning) |
| Complexity | Very high | Medium |
| Cost | Lower (own infrastructure) | Higher (management fee) |
| Updates | Manual | Automatic |
| etcd management | Yourself | Provider |
| SLA | Yours | 99.95%+ |

Edge Cases

  • etcd fragmentation: with frequent updates etcd fragments. Periodic defragmentation needed (etcdctl defrag).
  • API Server overload: with 5000+ Pods and frequent updates the API Server can become a bottleneck. Solution: horizontal scaling, tuning --max-requests-inflight.
  • NotReady Node timeout: Node went NotReady. Control Plane waits 5 minutes (pod-eviction-timeout) before recreating Pods. In critical systems the timeout needs tuning.
  • Kernel Panic on Node: Kubelet stops sending heartbeats. K8s can’t always distinguish node crash from network issues. Pods may get stuck in Unknown status.
  • Pod stuck in Terminating: Volume can’t unmount, finalizer doesn’t complete. Solution: kubectl patch pod <name> -p '{"metadata":{"finalizers":null}}'.

Performance and Scaling

| Parameter | Default | Maximum (tested) |
|---|---|---|
| Pods per Node | 110 | 250+ (depends on CNI) |
| Pods per cluster | – | 150,000 |
| Nodes per cluster | – | 5,000 |
| Namespaces | – | Tens of thousands |
| etcd size | – | < 8 GB (recommended) |

Limits:

  • Number of Pods per Node is limited by available IPs (CNI), kubelet load, available PIDs.
  • etcd stores all objects. At > 8GB database size performance degrades.
  • API Server throughput: tuning --max-requests-inflight and --max-mutating-requests-inflight.

Security

Defense in Depth:

  1. Network Policies — microsegmentation, zero-trust networking
  2. RBAC — least privilege
  3. Pod Security Standards — restricted, baseline, privileged
  4. Admission Controllers — OPA/Gatekeeper for policy enforcement
  5. Image Policy — only signed images from trusted registry
  6. Secrets encryption at rest — etcd data encryption
  7. Audit logging — all API requests are logged
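
Point 1 usually starts from a default-deny baseline, which a minimal NetworkPolicy expresses; the namespace is a placeholder:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments      # hypothetical namespace
spec:
  podSelector: {}          # empty selector = every Pod in the namespace
  policyTypes:
    - Ingress
    - Egress               # no rules listed, so all traffic is denied
```

From this baseline you add narrow allow-policies per service, which is the zero-trust posture the list describes. Note this only works if the cluster's CNI plugin enforces NetworkPolicies.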

Production Story

A large fintech company deployed a self-hosted K8s cluster (10 masters, 50 workers). First 6 months — stable operation. Then: etcd began degrading (fragmentation, latency > 50ms). Cause: frequent Deployment updates (every 5 minutes) + no compaction. Solution: periodic defragmentation, batch updates, monitoring etcd latency. Second incident: API Server overload from a “watch storm” — 10,000 clients reconnected simultaneously after a network blip. Solution: tuning --max-requests-inflight, horizontal API Server scaling, connection multiplexing.

Monitoring

  • Control Plane: API Server latency/p99, etcd disk latency, etcd leader elections, scheduler scheduling duration, controller-manager queue depth
  • Nodes: CPU/Memory pressure, disk pressure, PID pressure, kubelet runtime operations, kube-proxy sync latency
  • Pods: restart count, OOMKilled, CrashLoopBackOff, container waiting time, resource usage vs requests
  • Stack: Prometheus + kube-prometheus-stack (AlertManager, Grafana), cAdvisor for container metrics
  • Golden Signals: latency, traffic, errors, saturation — at cluster, namespace, deployment, pod level

Summary

  • Kubernetes is the standard for managing cloud infrastructure, built around the split between Control Plane and Worker Nodes.
  • etcd is the most critical component. Backups are mandatory.
  • Key advantages: Self-healing, Auto-scaling, Declarative management.
  • Self-hosted K8s requires a team of 3-5 SRE engineers. Managed K8s reduces operational overhead.
  • Always configure: resource requests/limits, liveness/readiness probes, PodDisruptionBudget, NetworkPolicies.
  • K8s is overkill for monoliths. Indispensable for microservices at scale.
  • Understanding internal architecture (API Server → etcd → Scheduler → Controller → Kubelet) is critical for troubleshooting.

Interview Cheat Sheet

Must know:

  • Kubernetes — container orchestrator: self-healing, auto-scaling, declarative management
  • Control Plane: API Server (entry point), etcd (storage), Scheduler, Controller Manager
  • Worker Nodes: kubelet (agent), kube-proxy (network), container runtime (containerd)
  • Pod — smallest launch unit; Service — stable address; Deployment — manages replicas
  • etcd — most critical component; losing etcd = losing cluster state
  • Self-hosted K8s requires an SRE team; Managed (GKE/EKS/AKS) is significantly simpler
  • K8s is overkill for monoliths; indispensable for microservices at scale

Frequent follow-up questions:

  • “What happens if etcd crashes?” — Loss of entire cluster state; backups are mandatory
  • “Why is kube-apiserver stateless?” — Can scale horizontally behind a load balancer
  • “K8s for a single application?” — Overkill; better with Heroku, ECS, or a simple server
  • “What is a reconciliation loop?” — K8s constantly compares desired state with actual and corrects discrepancies

Red flags (DO NOT say):

  • “Every project needs K8s” (overkill for monoliths and small teams)
  • “etcd doesn’t need backups” (losing etcd = total cluster loss)
  • “Kubelet never runs on Control Plane nodes” (in kubeadm clusters, kubelet runs on every node and manages the control-plane static Pods; only managed clusters hide it)
  • “K8s is secure by itself” (requires RBAC, NetworkPolicies, Pod Security)

Related topics:

  • [[What is Pod in Kubernetes]] — smallest launch unit
  • [[What is Service in Kubernetes]] — network abstraction
  • [[How scaling works in Kubernetes]] — HPA, VPA, Cluster Autoscaler