What is Dockerfile?
Junior Level
Simple Explanation
A Dockerfile is a text file with instructions for building a Docker image.
It is a declarative file: you describe what the image should look like, and Docker figures out how to build it. It’s like a recipe: you write the steps, and Docker assembles them into a ready-to-run image that can be launched as a container.
Analogy
A Dockerfile is a cooking recipe. FROM is the base (e.g., dough), COPY is adding ingredients, RUN is the cooking process, ENTRYPOINT is how the dish is served. From one recipe you can cook as many identical dishes (containers) as you want.
Example
# 1. Base image
FROM openjdk:17-jdk-slim
# 2. Working directory
WORKDIR /app
# 3. Copy file
COPY myapp.jar app.jar
# 4. Application port
EXPOSE 8080
# 5. Launch command
ENTRYPOINT ["java", "-jar", "app.jar"]
# Build image from Dockerfile
docker build -t myapp .
# Run container
docker run -p 8080:8080 myapp
Main Instructions
| Instruction | What it does |
|---|---|
| `FROM` | Specifies the base image (e.g., `openjdk:17`) |
| `WORKDIR` | Sets the working directory |
| `COPY` | Copies files into the image |
| `RUN` | Executes a command during build |
| `EXPOSE` | Documents the application port |
| `ENTRYPOINT` | Application launch command |
What to Remember
- Dockerfile is a recipe for creating a Docker image
- Each instruction creates a new layer in the image
- `FROM` is always the first instruction
- Use specific image versions, not `latest`
- `docker build` assembles the image, `docker run` launches the container
When You Don’t Need a Dockerfile
If you use a platform-as-a-service (Heroku, Railway, Render), you might not need a Dockerfile: the platform can build the image directly from your source code.
Middle Level
How Image Building Works
When you run docker build .:
- The Docker client sends the contents of the current directory (the Build Context) to the Docker daemon.
- The daemon executes instructions from the Dockerfile step by step.
- Each instruction creates a new layer in the image.
- Layers are cached — if an instruction hasn’t changed, Docker uses the cache.
Build Context and .dockerignore
Build Context is all the directory contents passed to Docker. To avoid passing unnecessary files (.git, target/, logs), use .dockerignore:
.git
target/
*.log
.idea/
Layers and Caching
Docker uses a layered filesystem (UnionFS/Overlay2):
- Each command (`RUN`, `COPY`, `ADD`) creates a new layer
- Layers are cached: if the instruction and files haven’t changed, Docker takes the ready layer from cache
Optimization rule: Place rarely-changing instructions at the beginning, frequently-changing ones at the end.
# BAD: every code change invalidates the dependency cache
COPY src /app/src
COPY pom.xml /app/
RUN mvn -f /app/pom.xml clean package
# GOOD: dependencies cached separately
# The trailing slash marks /app/ as a directory; without it, and if /app does not exist yet,
# pom.xml would be written to a regular file named /app.
COPY pom.xml /app/
RUN mvn -f /app/pom.xml dependency:go-offline
# Docker reuses a cached layer only if the instruction and ALL previous layers are unchanged.
# A change to pom.xml therefore invalidates every layer from the COPY above onward.
COPY src /app/src
RUN mvn -f /app/pom.xml clean package
Typical Mistakes
| Mistake | Consequence | How to avoid |
|---|---|---|
| `FROM openjdk:latest` | Non-deterministic builds | `FROM openjdk:17-jdk-slim` |
| No `.dockerignore` | Slow builds, large images | Create `.dockerignore` |
| All commands in one RUN | Can’t partially use cache | Separate logical steps |
| Running app as root | Security risk | USER appuser |
| One huge layer | No caching, slow rebuild | Separate dependencies and code |
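The `USER appuser` fix in the table only works if that account exists in the image. Below is a minimal sketch for a Debian-based image such as `openjdk:17-jdk-slim`. The names `appuser` and `appgroup` and the `myapp.jar` artifact are illustrative assumptions, not anything Docker prescribes.

```dockerfile
FROM openjdk:17-jdk-slim

# Create an unprivileged system group and user (names are illustrative)
RUN groupadd --system appgroup \
    && useradd --system --gid appgroup --no-create-home appuser

WORKDIR /app

# Copy the artifact and hand ownership to the unprivileged user
COPY --chown=appuser:appgroup myapp.jar app.jar

# Every later instruction, and the running container, uses appuser
USER appuser

ENTRYPOINT ["java", "-jar", "app.jar"]
```

Switching to `USER` only after the files are in place keeps the setup steps simple while still ensuring the application process itself never runs as root.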
Key Principles of a Good Dockerfile
- Single responsibility: one container, one process.
- Minimize size: use lightweight base images (`alpine`, `slim`, `distroless`).
  Alpine is a minimal Linux distribution (~5 MB), often used as a base for images.
- Security: don’t run the application as `root`. Use `USER`.
- Specific versions: always pin exact versions of base images.
Multi-stage Build
Allows compiling code in one temporary image and copying only the artifact into the final one:
# Stage 1: Build
FROM maven:3.8-openjdk-17 AS build
COPY pom.xml /app/
COPY src /app/src
RUN mvn -f /app/pom.xml clean package -DskipTests
# Stage 2: Runtime
FROM openjdk:17-jdk-slim
COPY --from=build /app/target/app.jar /app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "/app.jar"]
What to Remember
- Each instruction = new layer
- Caching is the key to fast builds
- Instruction order matters for performance
- Use `.dockerignore` to exclude unnecessary files
- Multi-stage build is the standard for production images
Senior Level
Dockerfile as Infrastructure as Code
A Dockerfile is not just a build script — it is the foundation of Infrastructure as Code (IaC) at the application level. It determines reproducibility, security, and delivery efficiency.
Deep Analysis of the Build Process
Build Context and Its Impact on CI/CD
docker build . → tar archive of entire directory → sent to Docker daemon
Problems: large context = slow transfer (especially with remote daemon). .dockerignore works like .gitignore but for Docker. In CI/CD the context may include artifacts from previous builds.
Best Practice:
# Exclude everything except what's needed
**
!src/
!pom.xml
!Dockerfile
Caching Mechanism: Deep Understanding
Docker computes a hash for each layer based on: the instruction itself, hashes of previous layers, file contents (for COPY and ADD).
Cache invalidation happens when: the instruction changed, copied files changed, parent layer hash changed.
“Dependency Layer” pattern for Java:
FROM maven:3.8-openjdk-17 AS build
WORKDIR /build
COPY pom.xml .
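# Dependencies rarely change, so this layer is usually served from cache
RUN mvn dependency:go-offline -B
COPY src ./src
# Source code changes often, so this layer is rebuilt on most commits
RUN mvn package -DskipTests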
This gives 90%+ cache hit rate for builds without dependency changes.
Trade-offs
| Decision | Plus | Minus |
|---|---|---|
| Alpine images | Minimal size (~5MB base) | musl libc ≠ glibc, issues with native libraries |
| Slim images | glibc compatibility, small size | Larger than alpine |
| Distroless | Minimal attack surface | No shell for debugging, need ephemeral debug containers |
| Many layers | Better caching | More metadata, slower pull |
| Few layers | Fast pull | Worse caching |
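To make the distroless row concrete, here is a hedged sketch of a runtime stage on a distroless base. The image name and tag `gcr.io/distroless/java17-debian12:nonroot` and the artifact name `app.jar` are assumptions for illustration; check the distroless project for the tags currently published.

```dockerfile
# Build stage: full Maven image with a shell and build tools
FROM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /build
COPY pom.xml .
COPY src ./src
RUN mvn package -DskipTests -B

# Runtime stage: no shell and no package manager
# (image name and tag are an assumption, verify before use)
FROM gcr.io/distroless/java17-debian12:nonroot
WORKDIR /app
# Assumes the build produces target/app.jar
COPY --from=build /build/target/app.jar app.jar
# Exec form is the only option here: there is no /bin/sh to run a shell-form command
ENTRYPOINT ["java", "-jar", "app.jar"]
```

The trade-off from the table shows up directly: `docker exec ... sh` will not work against this container, so debugging needs a separate debug container.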
Edge Cases
- Alpine + native libraries: Alpine uses musl libc. JNI libraries (Netty epoll, PostgreSQL native) may require glibc. Solution: use `debian-slim` or build for musl.
- Build timeouts: `RUN apt-get update` may hang due to network issues. Solution: use mirrors, retry logic.
- Non-deterministic builds: `apt-get update` without pinning package versions gives different results. Solution: `apt-get install -y package=1.2.3-1`.
- Cross-platform builds: Building amd64 image on ARM (Apple Silicon). Solution: `docker buildx` with QEMU emulation or remote builders.
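The retry and pinning advice above can be combined in one `RUN` step. This is a sketch under assumptions: `curl` is just an example package, and `CURL_VERSION` is a hypothetical build argument with no default, so the build needs a real version passed in via `--build-arg CURL_VERSION=...` (for example, one reported by `apt-cache policy curl` on the same base image).

```dockerfile
FROM debian:bookworm-slim

# Hypothetical build argument: the exact package version to install
ARG CURL_VERSION

# Acquire::Retries makes apt retry failed downloads instead of failing on the first network error.
# Pinning curl to an exact version keeps the result the same across rebuilds.
RUN apt-get update -o Acquire::Retries=3 \
    && apt-get install -y --no-install-recommends -o Acquire::Retries=3 \
       curl=${CURL_VERSION} \
    && rm -rf /var/lib/apt/lists/*
```

Keeping `update`, `install`, and the cleanup in a single `RUN` avoids caching a stale package index in its own layer.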
Dockerfile Security
# Production-ready Dockerfile for Spring Boot
FROM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /build
COPY pom.xml .
RUN mvn dependency:go-offline -B
COPY src ./src
RUN mvn package -DskipTests -B
FROM eclipse-temurin:17-jre-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
WORKDIR /app
COPY --from=build /build/target/*.jar app.jar
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s \
CMD wget -qO- http://localhost:8080/actuator/health || exit 1
ENTRYPOINT ["java", "-XX:+UseG1GC", "-jar", "app.jar"]
Critical rules:
- Don’t use `latest`
- Don’t run as root
- Read-only filesystem where possible
- Minimize attack surface
- Use `HEALTHCHECK`
Impact on CI/CD Pipeline
developer → git push → CI runs docker build →
cache hit? → fast (seconds) : slow (minutes) →
docker push → CD deploys
CI optimization: registry cache (pull previous image), BuildKit cache, layer sharing between images.
BuildKit (the default builder since Docker Engine 23.0): parallel execution of independent steps, secret management (`--mount=type=secret`), SSH forwarding (`--mount=type=ssh`), cache mounts (`--mount=type=cache`).
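As an illustration of the cache and secret mounts, here is a minimal sketch of a Maven build stage. The secret id `maven_settings` is a made-up name, and the paths assume the official Maven image, where the build runs as root and the local repository lives under `/root/.m2`.

```dockerfile
# syntax=docker/dockerfile:1
FROM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /build
COPY pom.xml .
COPY src ./src

# Cache mount: the local Maven repository persists between builds
# without being written into any image layer.
# Secret mount: settings.xml is available only while this RUN executes
# and is not stored in the image or shown by `docker history`.
RUN --mount=type=cache,target=/root/.m2/repository \
    --mount=type=secret,id=maven_settings,target=/root/.m2/settings.xml \
    mvn package -DskipTests -B
```

The secret would be supplied at build time with something like `docker build --secret id=maven_settings,src=$HOME/.m2/settings.xml .`; if it is omitted, the mount is simply absent and Maven falls back to its defaults.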
Performance
| Base image | Size | Pull time | Start time |
|---|---|---|---|
| `openjdk:17` | ~500 MB | ~30s | ~5s |
| `openjdk:17-slim` | ~300 MB | ~15s | ~5s |
| `eclipse-temurin:17-jre-alpine` | ~100 MB | ~5s | ~4s |
| `distroless/java17` | ~80 MB | ~4s | ~4s |
Monitoring
- `docker history <image>`: analyze layers and their sizes
- `docker build --progress=plain`: detailed build output
- `dive <image>`: interactive layer analyzer
- BuildKit `--progress=trace`: tracing each step
Production Story
A team of 50 developers faced CI builds that took 12 minutes. Analysis showed that every build downloaded all Maven dependencies from scratch. Introducing a multi-stage build with a separately cached pom.xml layer cut build time to 2 minutes (an 83% reduction). In addition, switching to a slim image reduced image size from 650 MB to 280 MB, which sped up deployment 2.3x and saved 40% of registry storage.
Summary
- Dockerfile is a recipe for creating an immutable artifact. Understanding layer caching is key to fast CI/CD.
- Always aim to minimize layers and use Multi-stage builds.
- Dockerfile must be deterministic (specific versions, not `latest`).
- Security: non-root user, minimal base image, health checks.
- At scale, Dockerfile optimization saves hundreds of CI/CD hours and gigabytes of storage.
Interview Cheat Sheet
Must know:
- Dockerfile — declarative recipe for creating an immutable Docker image
- Each instruction (RUN, COPY, ADD) creates a new read-only layer
- Layer caching is key to fast builds: instruction order is critical
- Multi-stage build — production standard: build in one image, runtime in another
- Exec form `["cmd", "arg"]` is preferred for CMD/ENTRYPOINT: the process runs as PID 1 and receives signals directly
- Security: non-root user, specific tags, minimal base image
- BuildKit: secrets (`--mount=type=secret`), SSH forwarding, cache mounts
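The exec-form point is easiest to see side by side. This is a small sketch, assuming an `app.jar` in the build context and the `eclipse-temurin:17-jre` base image; the shell-form line is left commented out for comparison.

```dockerfile
FROM eclipse-temurin:17-jre
WORKDIR /app
COPY app.jar app.jar

# Shell form (avoid): Docker runs /bin/sh -c "java -jar app.jar",
# so the shell is PID 1 and SIGTERM from `docker stop` may never reach the JVM.
# ENTRYPOINT java -jar app.jar

# Exec form: java itself is PID 1 and receives SIGTERM directly,
# which gives the application a chance to shut down gracefully.
ENTRYPOINT ["java", "-jar", "app.jar"]
```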
Frequent follow-up questions:
- “Why does instruction order matter?” — Changing an instruction invalidates all subsequent cache layers
- “What is .dockerignore?” — Excludes files from build context (like .gitignore for Docker)
- “Why can Alpine be a problem?” — musl libc ≠ glibc; JNI libraries may not work
- “How does COPY differ from ADD?” — ADD can unpack archives and download from URL, but COPY is preferred
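The COPY versus ADD answer can be shown with one local archive. In this sketch, `bundle.tar.gz` is a hypothetical file in the build context.

```dockerfile
FROM debian:bookworm-slim

# COPY transfers the archive unchanged: /opt/bundle.tar.gz stays a tarball
COPY bundle.tar.gz /opt/

# ADD auto-extracts a local tar archive: its contents are unpacked under /opt/unpacked/
ADD bundle.tar.gz /opt/unpacked/
```

Because `ADD` does something different depending on what the source is, `COPY` is the more predictable default, and `ADD` is worth using only when the extraction is actually wanted.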
Red flags (DO NOT say):
- “I use `latest` tag for convenience” (non-deterministic builds)
- “I pass secrets through ARG” (visible in `docker history`)
- “I run the app as root in the container” (security risk)
- “ADD is better than COPY” (COPY is best practice in 95% of cases)
Related topics:
- [[What are the main instructions used in Dockerfile]] — detailed instruction breakdown
- [[What is multi-stage build]] — image size optimization
- [[What is the difference between CMD and ENTRYPOINT]] — container launch