Question 10 · Section 14

What is Node in Kubernetes?

Junior Level

Simple Explanation

A Node is a worker server where K8s “places” Pods; the Scheduler decides which Node each Pod goes to. In Kubernetes, the Node is the “workhorse” that carries out commands from the control center (the Control Plane).

Simple Analogy

If Kubernetes is a construction company:

  • Control Plane — managers’ office (makes decisions)
  • Node — construction site (where the real work happens)
  • Pod — crew of workers on the site

What Every Node Has

  1. kubelet — agent that monitors containers and reports to Control Plane
  2. kube-proxy — handles network routing rules
  3. Container Runtime — program for running containers (usually containerd)

Node Types

  • Worker Node — runs applications (Pods)
  • Control Plane Node — manages the cluster (usually doesn’t run applications)

What Does a Node Do?

  • Receives commands from Control Plane
  • Launches Pods
  • Monitors their health
  • Sends resource information (CPU, RAM)

What a Junior Developer Should Remember

  • Node is a server where containers run
  • Each Node has: kubelet, kube-proxy, container runtime
  • Worker Nodes run applications
  • Control Plane Nodes manage the cluster
  • If a Node goes down, K8s waits ~5 minutes (pod-eviction-timeout) to make sure the Node is truly dead and not just briefly disconnected from the network. Only then does it move the Pods.

Middle Level

Internal Node Components

kubelet

kubelet — agent on each Node that “reports” container status to Control Plane.

The most important agent on the Node:

  • Monitors Pods assigned to this Node
  • Communicates with the API server
  • Starts and stops containers
  • Performs health checks
  • Sends Node status (Ready, NotReady)

kube-proxy

Network proxy:

  • Maintains routing rules (iptables/IPVS)
  • Enables Service operation (traffic load balancing)
  • Forwards traffic to the right Pods

Container Runtime

Container launch environment:

  • Modern standard: containerd or CRI-O
  • Dockershim was removed in K8s v1.24; instead of Docker Engine, containerd or CRI-O is used.

Node Resources

Capacity:          8 CPU, 32 GiB RAM    (physical resources)
kube-reserved:     1 CPU, 2 GiB RAM     (reserved for K8s)
system-reserved:   0.5 CPU, 1 GiB RAM   (reserved for OS)
─────────────────────────────────────────
Allocatable:       6.5 CPU, 29 GiB RAM  (available for Pods)

Pods can only use Allocatable resources.
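The arithmetic above can be sketched directly (numbers are the illustrative ones from the table, not read from a live Node; a real kubelet additionally subtracts the evictionHard memory threshold from Allocatable):

```shell
# Allocatable = Capacity - kubeReserved - systemReserved
capacity_cpu=8
kube_reserved_cpu=1
system_reserved_cpu=0.5
awk -v c="$capacity_cpu" -v k="$kube_reserved_cpu" -v s="$system_reserved_cpu" \
  'BEGIN { printf "Allocatable CPU: %g\n", c - k - s }'   # Allocatable CPU: 6.5
```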

Node Conditions

Condition            What it means
──────────────────   ───────────────────────────────────
Ready                Node is healthy and can accept Pods
MemoryPressure       Low free RAM
DiskPressure         Low disk space
PIDPressure          Too many processes
NetworkUnavailable   Network problem

Eviction

When Node resources run out, kubelet starts removing Pods:

  1. BestEffort (no requests/limits) — removed first
  2. Burstable (requests < limits) — removed second
  3. Guaranteed (requests == limits) — removed last
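The ordering above can be sketched as a simple sort (a toy illustration with made-up Pod names; the real kubelet also ranks Pods by usage above requests and by Pod priority):

```shell
# Map QoS class to eviction rank: lower rank = evicted first
qos_rank() {
  case "$1" in
    BestEffort) echo 0 ;;
    Burstable)  echo 1 ;;
    Guaranteed) echo 2 ;;
  esac
}

# Hypothetical Pods as "name:qos" pairs, printed in eviction order
for pod in "db:Guaranteed" "cache:BestEffort" "web:Burstable"; do
  name=${pod%%:*}; qos=${pod##*:}
  echo "$(qos_rank "$qos") $name"
done | sort -n | awk '{ print $2 }'   # prints: cache, web, db
```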

Diagnostics

# Node information
kubectl describe node <name>

# Resource consumption
kubectl top node

# kubelet logs
journalctl -u kubelet

What a Middle Developer Should Remember

  • kubelet is the key process; its crash paralyzes the Node
  • Always set Resource Requests & Limits
  • Allocatable < Capacity (part is reserved for OS and K8s)
  • Eviction protects the Node from overload
  • kubectl describe node — main diagnostic command

When NOT to Manage Nodes Manually

DON’T manage Nodes manually in production — use managed node groups (EKS, GKE) or IaC (Terraform).


Senior Level

Node as an Infrastructure Abstraction

A Node is the bridge between physical infrastructure and Kubernetes’ abstract world. Understanding how it works is critical for capacity planning, troubleshooting, and optimization.

Node Pressure and Eviction: Deep Analysis

Thresholds

Kubelet enters pressure mode when eviction thresholds are crossed (values below are typical; they are configured via evictionHard):

memory.available < 100Mi       → MemoryPressure
nodefs.available < 15%         → DiskPressure
nodefs.inodesFree < 10%        → DiskPressure (Inodes)
pid.available < 5%             → PIDPressure

Eviction Algorithm

1. Kubelet detects pressure condition
2. Marks Node with corresponding Condition
3. Starts eviction process:
   a. Finds Pods to remove (by QoS class)
   b. Graceful termination (SIGTERM → wait → SIGKILL)
   c. Frees resources
4. If pressure persists — repeats
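Step 3b can be illustrated with plain signals (a toy sketch, not kubelet’s actual implementation, which stops containers through the container runtime):

```shell
# SIGTERM -> grace period -> SIGKILL, for a single stand-in process
sleep 300 &                                   # stand-in for a container's main process
pid=$!
kill -TERM "$pid"                             # graceful stop request
( sleep 3; kill -KILL "$pid" 2>/dev/null ) &  # watchdog ~ terminationGracePeriodSeconds
watchdog=$!
wait "$pid" 2>/dev/null || true               # returns as soon as the process exits
kill "$watchdog" 2>/dev/null                  # cancel the watchdog if stop was graceful
```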

Graceful vs Forceful Eviction

  • Graceful: kubelet sends SIGTERM, waits for terminationGracePeriodSeconds
  • Forceful: If Node is NotReady for longer than pod-eviction-timeout (5 min by default), Control Plane forcefully removes Pods

Node Allocatable: Detailed Analysis

# kubelet configuration
kubeReserved:
  cpu: "500m"
  memory: "1Gi"
  ephemeral-storage: "1Gi"
systemReserved:
  cpu: "200m"
  memory: "500Mi"
evictionHard:
  memory.available: "200Mi"
  nodefs.available: "10%"

Important: kubeReserved and systemReserved by themselves only reduce Allocatable (what the scheduler sees); they don’t actually cap what system daemons consume. Hard enforcement requires enforceNodeAllocatable with matching cgroups.
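Hard enforcement looks roughly like this in the KubeletConfiguration (the cgroup paths below are examples; they must already exist on the host):

```yaml
# KubeletConfiguration fragment: enforce reservations via cgroups
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
enforceNodeAllocatable: ["pods", "kube-reserved", "system-reserved"]
kubeReservedCgroup: /kube.slice
systemReservedCgroup: /system.slice
```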

Highload Considerations

Density

  • By default kubelet caps a Node at 110 Pods (the maxPods setting)
  • Limiting factors:
    • Number of IPs in VPC (AWS)
    • kube-proxy load (iptables rules)
    • etcd load (more Pods = more objects)
    • CPU overhead from kubelet
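The 110-Pod cap is a kubelet setting and can be raised when the limiting factors above allow it (250 below is an illustrative value, not a recommendation):

```yaml
# KubeletConfiguration fragment: raise the per-Node Pod cap
# (check IP capacity and kube-proxy/etcd load before increasing)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 250
```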

HugePages

For heavy applications (DBs, in-memory caches):

resources:
  limits:
    hugepages-2Mi: 1Gi
    memory: 4Gi

Requires OS-level configuration:

sysctl -w vm.nr_hugepages=512

Edge Cases

NotReady Node

Timeline:
T+0s:    Node stops sending heartbeats
T+40s:   Node condition = NotReady
T+5m:    Pod eviction begins
T+5m+:   Pods recreated on other Nodes

Tuning:

# kube-controller-manager flags
--node-monitor-period=5s
--node-monitor-grace-period=40s
--pod-eviction-timeout=5m   # no effect with taint-based eviction (default since v1.18);
                            # the delay comes from tolerationSeconds=300 on the
                            # not-ready/unreachable tolerations
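With taint-based eviction (the default in modern Kubernetes), a Pod can shorten the ~5-minute failover for itself by overriding the default tolerationSeconds=300 on the NotReady/unreachable taints (30 below is an example value):

```yaml
# Pod spec fragment: evict this Pod 30s after its Node goes NotReady/unreachable
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 30
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 30
```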

Kernel Panic

If the kernel crashes — kubelet can’t report it. Control Plane waits for heartbeat timeout. This is the hardest case for automation.

Node Affinity and Taints

Taints (Repulsion)

# Taint a Node
kubectl taint nodes node1 gpu=true:NoSchedule

# Only Pods with matching toleration will run
tolerations:
- key: "gpu"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"

Node Affinity (Attraction)

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype
          operator: In
          values: [ssd]

Node Problem Detector

DaemonSet for monitoring Node health:

  • Monitors kernel, disks, network
  • Reports issues to the API
  • Integrates with Prometheus

Node Autoscaling

Cluster Autoscaler:

  • Adds Nodes when Pods are Pending
  • Removes underutilized Nodes
  • Integrates with cloud provider

Karpenter (AWS):

  • Faster and more flexible
  • Chooses optimal instance type
  • Supports Spot instances

Summary for Senior

  • Node — abstraction over “hardware.” kubelet is the key process.
  • Always configure Resource Requests & Limits for correct eviction.
  • Distinguish Capacity (physical resource) and Allocatable (available for Pods).
  • Eviction algorithm prioritizes by QoS class.
  • At highload: density limits, HugePages, node affinity.
  • NotReady timeout (5 min) — tunable parameter for critical systems.
  • Taints/Tolerations + Node Affinity — tools for specialized Nodes.

Interview Cheat Sheet

Must know:

  • Node — K8s worker server; Worker Node runs Pods, Control Plane manages the cluster
  • Node components: kubelet (agent), kube-proxy (network), container runtime (containerd)
  • Allocatable < Capacity — part of resources is reserved for OS and K8s
  • Eviction removes Pods by QoS class: BestEffort → Burstable → Guaranteed
  • NotReady Node: K8s waits ~5 minutes (pod-eviction-timeout) before moving Pods
  • Taints (repulsion) + Tolerations (admission) + Node Affinity (attraction) — scheduling
  • Cluster Autoscaler / Karpenter — automatic Node addition/removal

Frequent follow-up questions:

  • “What happens if kubelet crashes?” — Running containers keep running, but the Node stops reporting and goes NotReady; its Pods become unmanaged
  • “What is Allocatable?” — Resources available for Pods (Capacity minus OS and K8s reserve)
  • “When does Pod eviction trigger?” — On MemoryPressure, DiskPressure, PIDPressure on Node
  • “How does Cluster Autoscaler differ from Karpenter?” — Karpenter is faster, smarter at choosing instances

Red flags (DO NOT say):

  • “Node = Pod” (Node is the server, Pod is the launch unit on it)
  • “kubelet is a Control Plane component” (kubelet is a node agent; it runs on every Node, including Control Plane Nodes)
  • “Capacity = resources for Pods” (Allocatable, not Capacity)
  • “Docker-shim is still used” (removed in K8s v1.24; containerd/CRI-O)

Related topics:

  • [[What is Pod in Kubernetes]] — what runs on Node
  • [[What is Kubernetes and why is it needed]] — general architecture
  • [[How scaling works in Kubernetes]] — Cluster Autoscaler