What is containerization and why is it needed?
Junior Level
Simple Explanation
Containerization is a technology for packaging an application together with all its dependencies (libraries, configurations, system utilities) into a single isolated block — a container.
A container is not a file and not an archive. It is a regular Linux process that the kernel allows to see only its own files, its own network, and its own processes (through namespaces and cgroups).
You write a Java application. It works on your machine, but when you pass it to a colleague or deploy it to a server — errors occur: “I have a different Java version”, “missing library”, “different environment settings”. Containerization solves this problem: you package everything needed into a single image, and it works the same everywhere.
An image is a template/class (like a class in OOP). A container is a running instance of an image (like an object).
Analogy
A container is like a shipping container. It doesn’t matter what’s inside (electronics, clothes, food) — cranes and ships handle it the same way. Similarly with software containers: it doesn’t matter what application is inside (Java, Python, Node.js) — Docker handles them uniformly.
Example
```dockerfile
# Simple Dockerfile for a Java application
FROM openjdk:17-jdk-slim
COPY myapp.jar /app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "/app.jar"]
```

```shell
# Build and run
docker build -t myapp .
docker run -p 8080:8080 myapp
```
Key Benefits
- “Works on my machine” is no longer a problem — the same image runs on the developer’s machine, test environment, and production.
- Fast startup — a container starts in seconds: it doesn’t boot its own OS but reuses the already running host kernel, whereas a VM has to boot a full guest OS.
- Lightweight — containers take megabytes, not gigabytes.
- Isolation — applications don’t conflict with each other over libraries and versions.
What to Remember
- Container = application + all its dependencies in one package
- Containers run the same on any machine with Docker
- Containers are lightweight and fast
- Kubernetes is used to manage many containers
When NOT to Use Containers
- GUI applications — containers are optimized for headless services
- Realtime systems with strict kernel requirements
- Applications requiring full isolation — VMs are better
Middle Level
How Containerization Works Under the Hood
Containerization is based on two key mechanisms of the Linux kernel:
1. Namespaces — provide isolation
Each container gets its own isolated space:
| Namespace | What it isolates |
|---|---|
| PID | Processes (container sees only its own processes) |
| NET | Network (its own network interfaces and ports) |
| MNT | Filesystem (its own mount points) |
| UTS | Hostname |
| IPC | Inter-process communication |
| USER | Users and groups |
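These namespaces are visible on any Linux machine: every process holds a reference to one namespace of each type. A quick sketch:

```shell
# Each symlink below is a namespace the current shell process belongs to
# (pid, net, mnt, uts, ipc, user, ...). A containerized process simply
# points at different namespace IDs than processes on the host.
ls /proc/self/ns
```

Comparing this output inside and outside a container shows different IDs for the isolated namespace types.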
2. Control Groups (cgroups) — provide resource limits
Cgroups allow setting hard limits: how much CPU and RAM a container can use, disk I/O limits, network limits. Without cgroups, one “greedy” container could take all system resources.
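A small sketch of where these limits live; the docker run flags in the comment are standard resource options:

```shell
# Show the cgroup the current process belongs to
# (on cgroup v2 this is a single line starting with "0::")
cat /proc/self/cgroup

# With Docker, hard limits are set per container, e.g.:
#   docker run --memory=512m --cpus=1.5 myapp
# The daemon translates these flags into cgroup settings for the container.
```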
Container vs Virtual Machine
| Characteristic | Containers | Virtual Machines |
|---|---|---|
| Architecture | Shared OS kernel | Each VM has its own Guest OS |
| Isolation | Process-level | Hardware-level |
| Startup speed | Seconds | Minutes |
| Image size | Tens/hundreds of MB | Gigabytes |
| Performance | Nearly native | Hypervisor overhead |
Typical Mistakes
| Mistake | Consequence | How to avoid |
|---|---|---|
| Storing data inside container | Data lost on restart | Use Volumes |
| Running as root | Security risk | Use USER nonroot |
| Using latest tag | Non-deterministic builds | Pin image versions |
| Ignoring .dockerignore | Slow build, extra files | Create a .dockerignore |
Volume — external storage that survives container restarts.
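A minimal sketch of such external storage with Docker Compose; the service, image, and path names are placeholders:

```yaml
services:
  app:
    image: myapp:1.0.0
    volumes:
      - appdata:/var/lib/myapp   # data here survives container restarts and recreation
volumes:
  appdata: {}
```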
Why Containerization Matters in Real Projects
- Immutable Infrastructure — we don’t change code on the server. We create a new image, test it, and deploy. Rollback is instant.
- Environment parity — developer, QA, and production use the same artifact.
- Microservices architecture — containers are ideal for running hundreds of small independent components.
- CI/CD pipelines — containers became the standard delivery method: build → test → image → deploy.
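As a sketch, a minimal GitHub Actions pipeline following that build → test → image → deploy flow (the registry URL, image name, and Maven wrapper are assumptions):

```yaml
name: build-and-ship
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: ./mvnw -B test
      - name: Build image
        run: docker build -t registry.example.com/myapp:${{ github.sha }} .
      - name: Push image
        run: docker push registry.example.com/myapp:${{ github.sha }}
      # The deploy step depends on the target platform (e.g. kubectl set image)
```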
What to Remember
- Containerization is based on Namespaces (isolation) and Cgroups (resources)
- Containers are lighter and faster than VMs but have less strict isolation
- Containerization is the foundation of microservices and CI/CD
- Main risk: shared kernel and state management complexity
Senior Level
Deep Internal Architecture
A container is a regular Linux process restricted through kernel system calls. No emulation, no intermediate layers. The sequence when starting a container:
1. clone() syscall with flags CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC — creates a process with new namespaces.
2. pivot_root (or chroot) — switches the container to its own root filesystem.
3. Cgroups setup — sets CPU, memory, and I/O limits via cgroup v2 (unified hierarchy).
4. Seccomp/AppArmor — applies security profiles, restricting available syscalls.
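The first step of this sequence can be reproduced by hand with util-linux’s unshare, which wraps the same syscall; a sketch that assumes a kernel allowing unprivileged user namespaces:

```shell
# Create new user + UTS namespaces and change the hostname inside them;
# the host's hostname is untouched. Falls back to a message if the kernel
# forbids unprivileged user namespaces.
unshare --user --map-root-user --uts sh -c 'hostname container-demo && hostname' \
  || echo "unshare failed (unprivileged user namespaces may be disabled)"
```

A container runtime does the same via clone(2)/unshare(2), then adds pivot_root, cgroup limits, and seccomp on top.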
Trade-offs
| Aspect | Advantage | Disadvantage |
|---|---|---|
| Shared kernel | Minimal overhead (< 1% CPU) | Kernel vulnerability affects all containers |
| Ephemeral nature | Easy to replace, scale | Cannot store state inside |
| UnionFS (Overlay2) | Layer reuse, disk/RAM savings | Disk I/O overhead with many layers |
| Network namespace | Network isolation | Complexity: NAT, DNS, service discovery |
Edge Cases
- Container escape: via kernel vulnerabilities (Dirty COW, CVE-2019-5736 in runc), misconfigured capabilities, or mounting host paths such as /proc or /sys. Mitigation: seccomp profiles, AppArmor/SELinux, read-only rootfs, dropping all capabilities.
- Zombie processes: PID 1 in a container ignores signals it has no handler for (so SIGTERM can be silently dropped) and is responsible for reaping child processes. Solution: exec-form ENTRYPOINT or an init process such as tini.
- Clock skew: containers may have clock desynchronization with the host, critical for TLS and distributed consensus.
- Inode exhaustion: containers with many small files can exhaust inodes on the host, even if disk space is available.
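The zombie/PID 1 issue above is commonly solved with a small init process; a Dockerfile sketch (the base image and package source are assumptions — with Docker, `docker run --init` achieves the same with the bundled init):

```dockerfile
FROM eclipse-temurin:17-jre
# tini runs as PID 1: it forwards signals to java and reaps zombie children
RUN apt-get update && apt-get install -y --no-install-recommends tini \
    && rm -rf /var/lib/apt/lists/*
COPY app.jar /app.jar
# exec form, so no intermediate shell swallows SIGTERM
ENTRYPOINT ["/usr/bin/tini", "--", "java", "-jar", "/app.jar"]
```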
Performance
| Metric | Containers | VM |
|---|---|---|
| CPU overhead | < 1% | 5-15% |
| Memory overhead | Few MB per container | GB per Guest OS |
| Network | Nearly native | NIC virtualization |
| Disk I/O | Small overhead from Overlay2 | Storage controller virtualization |
| Density | 100+ per server | 10-30 per server |
| CPU utilization | 60-80% | 20-40% |
Production Security
```dockerfile
# Production-ready approach
# Distroless base: minimal attack surface, no shell or package manager
# (which also means RUN instructions are not available in this stage)
FROM gcr.io/distroless/java17-debian12
# Distroless ships a built-in non-root user (uid 65532)
USER nonroot
COPY --chown=nonroot:nonroot app.jar /app.jar
ENTRYPOINT ["java", "-jar", "/app.jar"]
# Read-only filesystem is configured in docker run / K8s;
# seccomp and AppArmor profiles are applied at the runtime level
```
Key principles:
- Don’t use --privileged
- Drop all capabilities, add back only the ones needed
- Read-only root filesystem where possible
- Image scanning in CI/CD (Trivy, Snyk)
- Runtime protection (Falco)
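In Kubernetes these principles map onto the pod securityContext; a minimal sketch (image and names are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: myapp
      image: registry.example.com/myapp:1.0.0
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false   # blocks setuid-style escalation
        capabilities:
          drop: ["ALL"]
        seccompProfile:
          type: RuntimeDefault            # default syscall filter
      resources:
        limits:
          memory: "512Mi"
          cpu: "1"
```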
Production Story
A company migrated a monolith from VMs to containers. Result: deployment time reduced from 45 minutes to 90 seconds, packing density grew 4x, infrastructure costs dropped 40%. But it required: reworking logging (stdout → ELK), setting up health checks, implementing centralized monitoring (Prometheus + Grafana), rewriting state management (external volumes), training the team on Kubernetes. Containerization is not just technology, but also process change.
Monitoring
- Golden Signals: latency, traffic, errors, saturation
- Tools: Prometheus (metrics), cAdvisor (container metrics), Jaeger (tracing), ELK/EFK (logs)
- Key metrics: restart count, memory usage vs limit, CPU throttling, network I/O, container uptime
- Alerting: OOMKilled, CrashLoopBackOff, HighRestartRate
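The alerts above can be expressed as Prometheus rules; a sketch assuming kube-state-metrics is scraped (the metric names come from that exporter):

```yaml
groups:
  - name: container-health
    rules:
      - alert: HighRestartRate
        # kube-state-metrics exposes cumulative restart counts per container
        expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} is restarting frequently"
      - alert: ContainerOOMKilled
        expr: kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.container }} was OOMKilled"
```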
Summary
- Containerization is the modern software delivery standard. Foundation: Namespaces + Cgroups.
- Main advantage: reproducibility and speed. Main risk: shared kernel and state management complexity.
- Containers affect architecture: applications must be ephemeral, externally configurable, self-healing.
- At scale, an orchestrator (Kubernetes) is required, adding its own complexity.
- For multi-tenancy and strict compliance, consider MicroVMs (Firecracker, Kata Containers).
Interview Cheat Sheet
Must know:
- Container = Linux process, restricted via namespaces + cgroups
- Namespaces provide isolation (PID, NET, MNT), cgroups — resource limits
- Containers share host kernel, VMs have their own — hence the speed and size difference
- Image = template (class), container = running instance (object)
- Containers are ephemeral — state is stored in external volumes
- For production: non-root user, read-only FS, image scanning
- Containerization is the foundation of microservices and CI/CD
Frequent follow-up questions:
- “Why are containers faster than VMs?” — No Guest OS boot, shared kernel, starts in seconds
- “What is a namespace?” — Linux kernel mechanism that isolates process resources (PID, network, FS)
- “Can you run a Linux container on Windows?” — Yes, via WSL2 or a VM with Linux kernel
- “What is Overlay2?” — Layered FS that reuses base layers between images
Red flags (DO NOT say):
- “A container is a lightweight VM” (no, fundamentally different architecture)
- “Containers are fully isolated” (shared kernel = container escape risk)
- “Data inside containers persists” (ephemeral, need volumes)
- “Container = a file” (it’s a Linux process with namespaces/cgroups)
Related topics:
- [[What is the difference between container and virtual machine]] — detailed comparison
- [[What is Dockerfile]] — how to create an image
- [[What is Kubernetes and why is it needed]] — container orchestration