Question 1 · Section 14

What is containerization and why is it needed?

Junior Level

Simple Explanation

Containerization is a technology for packaging an application together with all its dependencies (libraries, configurations, system utilities) into a single isolated block — a container.

A container is not a file and not an archive. It is a regular Linux process that the kernel allows to see only its own files, its own network, and its own processes (through namespaces and cgroups).

You write a Java application. It works on your machine, but when you pass it to a colleague or deploy it to a server — errors occur: “I have a different Java version”, “missing library”, “different environment settings”. Containerization solves this problem: you package everything needed into a single image, and it works the same everywhere.

An image is a template/class (like a class in OOP). A container is a running instance of an image (like an object).
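The class/object relationship is easy to see from the command line: one image, many independent containers. A minimal sketch, assuming a local Docker daemon (nginx is just an example image; any image behaves the same way):

```shell
# Pull one image (the "class")
docker pull nginx:1.25

# Start two independent containers (the "objects") from the same image
docker run -d --name web1 nginx:1.25
docker run -d --name web2 nginx:1.25

# Both containers reference the same image
docker ps --format '{{.Names}} {{.Image}}'
```

Stopping or deleting `web1` does not affect `web2`, just as destroying one object does not affect other instances of the same class.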

Analogy

A container is like a shipping container. It doesn’t matter what’s inside (electronics, clothes, food) — cranes and ships handle it the same way. Similarly with software containers: it doesn’t matter what application is inside (Java, Python, Node.js) — Docker handles them uniformly.

Example

# Simple Dockerfile for a Java application
FROM openjdk:17-jdk-slim
COPY myapp.jar /app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "/app.jar"]

# Build the image, then run a container from it
docker build -t myapp .
docker run -p 8080:8080 myapp

Key Benefits

  1. “Works on my machine” is no longer a problem — the same image runs on the developer’s machine, test environment, and production.
  2. Fast startup — a container starts in seconds because it doesn’t boot its own OS; it reuses the already running host kernel, while a VM must boot a full guest OS.
  3. Lightweight — containers take megabytes, not gigabytes.
  4. Isolation — applications don’t conflict with each other over libraries and versions.

What to Remember

  • Container = application + all its dependencies in one package
  • Containers run the same on any machine with Docker
  • Containers are lightweight and fast
  • Kubernetes is used to manage many containers

When NOT to Use Containers

  1. GUI applications — containers are optimized for headless services
  2. Realtime systems with strict kernel requirements
  3. Applications requiring full isolation — VMs are better

Middle Level

How Containerization Works Under the Hood

Containerization is based on two key mechanisms of the Linux kernel:

1. Namespaces — provide isolation

Each container gets its own isolated space:

| Namespace | What it isolates |
| --- | --- |
| PID | Processes (a container sees only its own processes) |
| NET | Network (its own network interfaces and ports) |
| MNT | Mount points (its own view of the filesystem) |
| UTS | Hostname |
| IPC | Inter-process communication |
| USER | Users and groups |
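These namespaces are visible directly in /proc on any Linux machine; a quick way to inspect them (the commented `unshare` line additionally requires unprivileged user namespaces to be enabled):

```shell
# Each process holds a reference to one namespace of every type
ls -l /proc/self/ns/

# Entries look like pid:[4026531836]; two processes with the same
# inode number share that namespace
readlink /proc/self/ns/pid

# Enter fresh user + PID namespaces without Docker:
# unshare --user --map-root-user --pid --fork --mount-proc ps -o pid,comm
#    ps then sees only itself, with PID numbering restarted from 1
```

This is the whole trick: Docker does not "virtualize" anything here, it only asks the kernel to give a process a fresh set of these namespaces.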

2. Control Groups (cgroups) — provide resource limits

Cgroups allow setting hard limits: how much CPU and RAM a container can use, disk I/O limits, network limits. Without cgroups, one “greedy” container could take all system resources.
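The limits themselves live in the cgroup filesystem. A small sketch for Linux (output format assumes cgroup v2; the docker line is illustrative and requires a Docker daemon):

```shell
# Which cgroup does the current process belong to?
cat /proc/self/cgroup        # on cgroup v2 this is a single "0::/..." line

# Docker translates run flags into cgroup limits, e.g.:
# docker run --memory=512m --cpus=1.5 myapp
#   --memory=512m  -> memory.max = 536870912
#   --cpus=1.5     -> cpu.max    = "150000 100000" (quota/period)
```

Exceeding `memory.max` gets the container's process OOM-killed by the kernel, not politely throttled — which is why the `OOMKilled` status is worth alerting on.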

Container vs Virtual Machine

| Characteristic | Containers | Virtual machines |
| --- | --- | --- |
| Architecture | Shared host OS kernel | Each VM runs its own guest OS |
| Isolation | Process-level | Hardware-level |
| Startup speed | Seconds | Minutes |
| Image size | Tens to hundreds of MB | Gigabytes |
| Performance | Near-native | Hypervisor overhead |

Typical Mistakes

| Mistake | Consequence | How to avoid |
| --- | --- | --- |
| Storing data inside the container | Data lost when the container is removed | Use volumes |
| Running as root | Security risk | Add a non-root `USER` |
| Using the `latest` tag | Non-deterministic builds | Pin image versions |
| Ignoring `.dockerignore` | Slow builds, extra files in the build context | Create a `.dockerignore` |

Volume — external storage that survives container removal and recreation.
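A sketch of the volume workflow (assumes a Docker daemon; `pgdata` and the postgres image are example choices):

```shell
# Named volume: lives on the host, outside any container's writable layer
docker volume create pgdata
docker run -d --name db -e POSTGRES_PASSWORD=secret \
    -v pgdata:/var/lib/postgresql/data postgres:16

docker rm -f db        # the container is gone...

docker run -d --name db2 -e POSTGRES_PASSWORD=secret \
    -v pgdata:/var/lib/postgresql/data postgres:16   # ...the data is not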

Why Containerization Matters in Real Projects

  1. Immutable Infrastructure — we don’t change code on the server. We create a new image, test it, and deploy. Rollback is instant.
  2. Environment parity — developer, QA, and production use the same artifact.
  3. Microservices architecture — containers are ideal for running hundreds of small independent components.
  4. CI/CD pipelines — containers became the standard delivery method: build → test → image → deploy.

What to Remember

  • Containerization is based on Namespaces (isolation) and Cgroups (resources)
  • Containers are lighter and faster than VMs but have less strict isolation
  • Containerization is the foundation of microservices and CI/CD
  • Main risk: shared kernel and state management complexity

Senior Level

Deep Internal Architecture

A container is a regular Linux process restricted through kernel system calls. No emulation, no intermediate layers. The sequence when starting a container:

  1. clone() syscall with flags CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC — creates a process in new namespaces.
  2. pivot_root (or chroot) — switches the process into the container’s root filesystem.
  3. Cgroups setup — applies CPU, memory, and I/O limits via cgroup v2 (unified hierarchy).
  4. Seccomp/AppArmor — applies security profiles, restricting the available syscalls.
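The same sequence can be reproduced without Docker via unshare(1), the CLI wrapper around these syscalls. A sketch, assuming a Linux host with unprivileged user namespaces enabled:

```shell
# New user, PID, net, mount, UTS and IPC namespaces; --fork makes the
# command PID 1 inside them, --mount-proc remounts /proc to match
unshare --user --map-root-user --pid --net --uts --ipc --mount \
        --fork --mount-proc \
        sh -c 'hostname sandbox; hostname; ps -o pid,comm'
```

Inside, the shell believes it is root on a host named `sandbox` and `ps` shows a PID space starting from 1 — yet on the outside it is an ordinary unprivileged process.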

Trade-offs

| Aspect | Advantage | Disadvantage |
| --- | --- | --- |
| Shared kernel | Minimal overhead (< 1% CPU) | A kernel vulnerability affects all containers |
| Ephemeral nature | Easy to replace and scale | Cannot store state inside |
| UnionFS (Overlay2) | Layer reuse, disk/RAM savings | Disk I/O overhead with many layers |
| Network namespace | Network isolation | Complexity: NAT, DNS, service discovery |

Edge Cases

  • Container escape: via kernel vulnerabilities (Dirty COW, CVE-2019-5736 in runc), misconfigured capabilities, or mounting host /proc or /sys into the container. Mitigation: seccomp profiles, AppArmor/SELinux, read-only rootfs, drop all capabilities.
  • Zombie processes: the PID 1 process in a container gets no default signal dispositions, so SIGTERM is ignored unless the process handles it explicitly, and orphaned children are never reaped. Solution: an exec-form ENTRYPOINT plus a minimal init such as tini.
  • Clock skew: containers may have clock desynchronization with the host, which is critical for TLS and distributed consensus.
  • Inode exhaustion: containers with many small files can exhaust inodes on the host even when disk space is still available.
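For the PID 1 problem above, a common fix is to put a minimal init in front of the application. A sketch assuming an eclipse-temurin base image (tini is available as an Ubuntu package there):

```dockerfile
FROM eclipse-temurin:17-jre
RUN apt-get update && apt-get install -y --no-install-recommends tini \
    && rm -rf /var/lib/apt/lists/*
COPY app.jar /app.jar
# tini runs as PID 1: it forwards SIGTERM to java and reaps zombies
ENTRYPOINT ["tini", "--", "java", "-jar", "/app.jar"]
```

Alternatively, `docker run --init` injects Docker's bundled tini without changing the image at all.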

Performance

| Metric | Containers | VMs |
| --- | --- | --- |
| CPU overhead | < 1% | 5-15% |
| Memory overhead | A few MB per container | GBs per guest OS |
| Network | Near-native | NIC virtualization |
| Disk I/O | Small Overlay2 overhead | Storage controller virtualization |
| Density | 100+ per server | 10-30 per server |
| CPU utilization | 60-80% | 20-40% |

Production Security

# Production-ready approach
# distroless: minimal attack surface — no shell or package manager,
# so RUN cannot be used; the :nonroot variant ships a built-in
# non-root user (uid 65532)
FROM gcr.io/distroless/java17-debian12:nonroot
COPY --chown=nonroot:nonroot app.jar /app.jar
# explicit, though already the default in the :nonroot variant
USER nonroot
ENTRYPOINT ["java", "-jar", "/app.jar"]
# Read-only filesystem is configured in docker run / K8s
# Seccomp and AppArmor profiles are applied at the runtime level
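The runtime-level hardening mentioned in the comments might look like this (all flags are real `docker run` options; the profile path and `myapp:1.0` tag are example values):

```shell
docker run \
  --read-only --tmpfs /tmp \
  --cap-drop=ALL \
  --security-opt no-new-privileges \
  --security-opt seccomp=/path/to/profile.json \
  --memory=512m --cpus=1 \
  myapp:1.0
```

In Kubernetes the same intent is expressed through `securityContext` (readOnlyRootFilesystem, capabilities.drop, seccompProfile) and resource limits.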

Key principles:

  • Don’t use --privileged
  • Drop all capabilities, add only needed ones
  • Read-only root filesystem where possible
  • Image scanning in CI/CD (Trivy, Snyk)
  • Runtime protection (Falco)

Production Story

A company migrated a monolith from VMs to containers. Result: deployment time reduced from 45 minutes to 90 seconds, packing density grew 4x, infrastructure costs dropped 40%. But it required: reworking logging (stdout → ELK), setting up health checks, implementing centralized monitoring (Prometheus + Grafana), rewriting state management (external volumes), training the team on Kubernetes. Containerization is not just technology, but also process change.

Monitoring

  • Golden Signals: latency, traffic, errors, saturation
  • Tools: Prometheus (metrics), cAdvisor (container metrics), Jaeger (tracing), ELK/EFK (logs)
  • Key metrics: restart count, memory usage vs limit, CPU throttling, network I/O, container uptime
  • Alerting: OOMKilled, CrashLoopBackOff, HighRestartRate

Summary

  • Containerization is the modern software delivery standard. Foundation: Namespaces + Cgroups.
  • Main advantage: reproducibility and speed. Main risk: shared kernel and state management complexity.
  • Containers affect architecture: applications must be ephemeral, externally configurable, self-healing.
  • At scale, an orchestrator (Kubernetes) is required, adding its own complexity.
  • For multi-tenancy and strict compliance, consider MicroVMs (Firecracker, Kata Containers).

Interview Cheat Sheet

Must know:

  • Container = Linux process, restricted via namespaces + cgroups
  • Namespaces provide isolation (PID, NET, MNT), cgroups — resource limits
  • Containers share host kernel, VMs have their own — hence the speed and size difference
  • Image = template (class), container = running instance (object)
  • Containers are ephemeral — state is stored in external volumes
  • For production: non-root user, read-only FS, image scanning
  • Containerization is the foundation of microservices and CI/CD

Frequent follow-up questions:

  • “Why are containers faster than VMs?” — No Guest OS boot, shared kernel, starts in seconds
  • “What is a namespace?” — Linux kernel mechanism that isolates process resources (PID, network, FS)
  • “Can you run a Linux container on Windows?” — Yes, via WSL2 or a VM with Linux kernel
  • “What is Overlay2?” — Layered FS that reuses base layers between images

Red flags (DO NOT say):

  • “A container is a lightweight VM” (no, fundamentally different architecture)
  • “Containers are fully isolated” (shared kernel = container escape risk)
  • “Data inside containers persists” (ephemeral, need volumes)
  • “Container = a file” (it’s a Linux process with namespaces/cgroups)

Related topics:

  • [[What is the difference between container and virtual machine]] — detailed comparison
  • [[What is Dockerfile]] — how to create an image
  • [[What is Kubernetes and why is it needed]] — container orchestration