What is Dockerfile?

Junior Level

Simple Explanation

A Dockerfile is a text file with instructions for building a Docker image.

Docker reads the instructions from top to bottom and assembles them into a ready-to-run image that can be launched as a container. The file is often called declarative because it is the single, versioned description of what the image contains, but the order of the steps matters, just as in a recipe.

Analogy

A Dockerfile is a cooking recipe. FROM is the base (e.g., dough), COPY is adding ingredients, RUN is the cooking process, ENTRYPOINT is how the dish is served. From one recipe you can cook as many identical dishes (containers) as you want.

Example

# 1. Base image
FROM openjdk:17-jdk-slim

# 2. Working directory
WORKDIR /app

# 3. Copy file
COPY myapp.jar app.jar

# 4. Application port
EXPOSE 8080

# 5. Launch command
ENTRYPOINT ["java", "-jar", "app.jar"]

# Build image from Dockerfile
docker build -t myapp .

# Run container
docker run -p 8080:8080 myapp

Main Instructions

Instruction	What it does
`FROM`	Specifies the base image (e.g., `openjdk:17`)
`WORKDIR`	Sets the working directory
`COPY`	Copies files into the image
`RUN`	Executes a command during build
`EXPOSE`	Documents the application port
`ENTRYPOINT`	Application launch command

What to Remember

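A Dockerfile is a recipe for creating a Docker image
Instructions that change the filesystem (RUN, COPY, ADD) each create a new layer; the others only add metadata
FROM is the first instruction (only ARG, comments, and parser directives may come before it)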
Use specific image versions, not latest
docker build assembles the image, docker run launches the container
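
One nuance to the rules about FROM and pinned versions: an ARG declared before FROM can parameterize the base image tag. The following is a minimal sketch, assuming the eclipse-temurin image and a hypothetical JAVA_VERSION argument (neither is prescribed by this guide):

```dockerfile
# ARG is the only instruction allowed before FROM.
# JAVA_VERSION is a hypothetical name; 17 is its default value.
ARG JAVA_VERSION=17
FROM eclipse-temurin:${JAVA_VERSION}-jre

WORKDIR /app
COPY myapp.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]
```

The default can be overridden at build time with docker build --build-arg JAVA_VERSION=21 -t myapp ., which keeps the pinned version in one place.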

When You Don’t Need a Dockerfile

If you use a platform-as-a-service (Heroku, Railway, Render), you might not need a Dockerfile: the platform can build and run the application directly from your source code.

Middle Level

How Image Building Works

When you run docker build .:

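The Docker client sends the contents of the current directory (the build context) to the Docker daemon. With BuildKit, only the files the build actually needs are transferred.
The daemon executes the instructions from the Dockerfile step by step.
RUN, COPY, and ADD each create a new filesystem layer; other instructions only record metadata.
Layers are cached: if an instruction and its inputs haven’t changed, Docker reuses the cached layer.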

Build Context and .dockerignore

Build Context is all the directory contents passed to Docker. To avoid passing unnecessary files (.git, target/, logs), use .dockerignore:

.git
target/
*.log
.idea/

Layers and Caching

Docker images are built on a layered (union) filesystem, commonly the overlay2 storage driver on Linux:

Each command (RUN, COPY, ADD) creates a new layer
Layers are cached — if the instruction and files haven’t changed, Docker takes the ready layer from cache
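
To make the layer rule concrete, the sketch below annotates which instructions add a filesystem layer and which only record metadata. The image tag, paths, and file names are illustrative assumptions.

```dockerfile
# Base image: contributes its own, already existing layers
FROM eclipse-temurin:17-jre

# Metadata only: recorded in the image config, no filesystem layer
ENV APP_HOME=/app
EXPOSE 8080

# New layer: adds the file to the image filesystem
COPY myapp.jar /app/app.jar

# New layer: stores whatever the command changes on disk
RUN mkdir -p /app/logs

# Metadata only: the command to run when a container starts
ENTRYPOINT ["java", "-jar", "/app/app.jar"]
```

After a build, docker history myapp lists each step with its size; the metadata-only steps show up as 0B.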

Optimization rule: Place rarely-changing instructions at the beginning, frequently-changing ones at the end.

# BAD: every code change invalidates the dependency cache
COPY src /app/src
COPY pom.xml /app/
RUN mvn -f /app/pom.xml clean package

# GOOD: dependencies cached separately
# The trailing slash makes /app a directory; without it, pom.xml would be written to a file named /app.
COPY pom.xml /app/
RUN mvn -f /app/pom.xml dependency:go-offline
# Docker reuses a cached layer only if the instruction and ALL previous layers are unchanged.
# A changed pom.xml invalidates this layer and every layer after it.
COPY src /app/src
RUN mvn -f /app/pom.xml clean package

Typical Mistakes

Mistake	Consequence	How to avoid
`FROM openjdk:latest`	Non-deterministic builds	`FROM openjdk:17-jdk-slim`
No `.dockerignore`	Slow builds, large images	Create `.dockerignore`
All commands in one RUN	Can’t partially use cache	Separate logical steps
Running app as root	Security risk	`USER appuser`
One huge layer	No caching, slow rebuild	Separate dependencies and code
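
The `USER appuser` fix from the table can be sketched as follows for a Debian- or Ubuntu-based image. The user and group names are illustrative, and the base image is an assumption.

```dockerfile
FROM eclipse-temurin:17-jre

# Create a system group and user without a home directory or login shell
RUN groupadd --system appgroup \
 && useradd --system --gid appgroup --no-create-home --shell /usr/sbin/nologin appuser

WORKDIR /app
# Give the unprivileged user ownership of the application file
COPY --chown=appuser:appgroup myapp.jar app.jar

# Everything from here on, including the running container, uses appuser
USER appuser
ENTRYPOINT ["java", "-jar", "app.jar"]
```

Switching to the unprivileged user only after the files are copied keeps the setup steps simple while the application itself never runs as root.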

Key Principles of a Good Dockerfile

Single responsibility: one container, one process.
Minimize size: use lightweight base images (alpine, slim, distroless). Alpine is a minimal Linux distribution (~5 MB) that is often used as a base for images.
Security: don’t run the application as root. Use USER.
Specific versions: always pin exact versions of base images.

Multi-stage Build

Allows compiling code in one temporary image and copying only the artifact into the final one:

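# Stage 1: Build
FROM maven:3.8-openjdk-17 AS build
WORKDIR /app
# Copy pom.xml first so the dependency layer stays cached when only the code changes
COPY pom.xml .
RUN mvn dependency:go-offline -B
COPY src ./src
RUN mvn clean package -DskipTests -B

# Stage 2: Runtime
FROM openjdk:17-jdk-slim
# Assumes the build produces target/app.jar (for example, <finalName>app</finalName> in pom.xml)
COPY --from=build /app/target/app.jar /app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "/app.jar"]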

What to Remember

RUN, COPY, and ADD each create a new layer
Caching is the key to fast builds
Instruction order matters for performance
Use .dockerignore to exclude unnecessary files
Multi-stage build is the standard for production images

Senior Level

Dockerfile as Infrastructure as Code

A Dockerfile is not just a build script — it is the foundation of Infrastructure as Code (IaC) at the application level. It determines reproducibility, security, and delivery efficiency.

Deep Analysis of the Build Process

Build Context and Its Impact on CI/CD

docker build . → tar archive of entire directory → sent to Docker daemon

Problems: a large context means a slow transfer, especially to a remote daemon, and in CI/CD the context may also include artifacts left over from previous builds. A .dockerignore file, which works like .gitignore but for Docker, keeps such files out of the context.

Best Practice:

# Exclude everything except what's needed
**
!src/
!pom.xml
!Dockerfile

Caching Mechanism: Deep Understanding

Docker computes a hash for each layer based on: the instruction itself, hashes of previous layers, file contents (for COPY and ADD).

Cache invalidation happens when: the instruction changed, copied files changed, parent layer hash changed.
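
One consequence of these rules: for RUN, the cache key is the instruction text, not the result of the command. A standalone RUN apt-get update can therefore be served from a stale cache. A common pattern, sketched here with curl as an arbitrary example package, keeps update and install in the same instruction:

```dockerfile
FROM debian:12-slim

# RISKY: this layer is cached by its text, so the package index can be weeks old
# RUN apt-get update
# RUN apt-get install -y curl

# SAFER: update and install share one layer; editing the package list reruns both
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*
```

When a fresh package index is needed without editing the file, docker build --no-cache forces every step to rerun.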

“Dependency Layer” pattern for Java:

FROM maven:3.8-openjdk-17 AS build
WORKDIR /build
COPY pom.xml .
# Rarely changes, so this layer is usually served from cache
RUN mvn dependency:go-offline -B
COPY src ./src
# Changes often, so this step reruns on most builds
RUN mvn package -DskipTests

For builds that don’t touch pom.xml, the dependency layer is reused from cache (a hit rate above 90% is typical), so only the packaging step reruns.

Trade-offs

Decision	Plus	Minus
Alpine images	Minimal size (~5MB base)	musl libc ≠ glibc, issues with native libraries
Slim images	glibc compatibility, small size	Larger than alpine
Distroless	Minimal attack surface	No shell for debugging, need ephemeral debug containers
Many layers	Better caching	More metadata, slower pull
Few layers	Fast pull	Worse caching
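
As an illustration of the distroless row, a runtime stage might look like the sketch below. It assumes the gcr.io/distroless/java17-debian12 image, whose entrypoint is expected to be the equivalent of java -jar, and a build that produces target/app.jar. Both assumptions should be checked against the image documentation and your pom.xml.

```dockerfile
FROM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /build
COPY pom.xml .
RUN mvn dependency:go-offline -B
COPY src ./src
RUN mvn package -DskipTests -B

# Runtime stage: no shell and no package manager
FROM gcr.io/distroless/java17-debian12
WORKDIR /app
# app.jar is an assumed artifact name
COPY --from=build /build/target/app.jar app.jar
# nonroot is a user predefined by the distroless images
USER nonroot
# The image entrypoint already runs "java -jar", so CMD is just the JAR path
CMD ["app.jar"]
```

Because the final stage has no shell, RUN instructions cannot be used in it and docker exec cannot open a shell, which is the debugging trade-off the table mentions.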

Edge Cases

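Alpine + native libraries: Alpine uses musl libc, while many prebuilt JNI libraries (for example, native Netty transports or native database client libraries) are compiled against glibc and may fail to load. Solution: use a Debian-based slim image or a build of the library that targets musl.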
Build timeouts: RUN apt-get update may hang due to network issues. Solution: use mirrors, retry logic.
Non-deterministic builds: apt-get update without pinning package versions gives different results. Solution: apt-get install -y package=1.2.3-1.
Cross-platform builds: Building amd64 image on ARM (Apple Silicon). Solution: docker buildx with QEMU emulation or remote builders.
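
For the cross-platform case, one option that avoids emulating the Maven build is to pin the build stage to the builder's native platform, since a JAR is platform-independent. This is a sketch, assuming BuildKit (which provides the BUILDPLATFORM argument) and an artifact named app.jar:

```dockerfile
# The build stage always runs natively on the machine doing the build
FROM --platform=$BUILDPLATFORM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /build
COPY pom.xml .
RUN mvn dependency:go-offline -B
COPY src ./src
RUN mvn package -DskipTests -B

# The runtime stage is resolved separately for each requested target platform
FROM eclipse-temurin:17-jre
WORKDIR /app
# app.jar is an assumed artifact name
COPY --from=build /build/target/app.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]

# Example invocation (the image name is a placeholder):
#   docker buildx build --platform linux/amd64,linux/arm64 -t registry.example.com/myapp:1.0 --push .
```

The runtime stage contains no RUN instruction, so nothing in it has to execute under QEMU; only the base image differs per platform.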

Dockerfile Security

# Production-ready Dockerfile for Spring Boot
FROM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /build
COPY pom.xml .
RUN mvn dependency:go-offline -B
COPY src ./src
RUN mvn package -DskipTests -B

FROM eclipse-temurin:17-jre-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
WORKDIR /app
COPY --from=build /build/target/*.jar app.jar
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s \
  CMD wget -qO- http://localhost:8080/actuator/health || exit 1
ENTRYPOINT ["java", "-XX:+UseG1GC", "-jar", "app.jar"]

Critical rules:

Don’t use latest
Don’t run as root
Read-only filesystem where possible (for example, docker run --read-only)
Minimize attack surface
Use HEALTHCHECK

Impact on CI/CD Pipeline

developer → git push → CI runs docker build →
  cache hit? → fast (seconds) : slow (minutes) →
  docker push → CD deploys

CI optimization: registry cache (pull previous image), BuildKit cache, layer sharing between images.

BuildKit (the default builder since Docker Engine 23.0): parallel execution of independent steps, secret management (--mount=type=secret), SSH forwarding (--mount=type=ssh), cache mounts (--mount=type=cache).
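
The cache and secret mounts can be combined in a single RUN step. The sketch below is illustrative only: the secret id maven_settings is a made-up name, and the paths assume the official Maven image, where the build runs as root with its local repository under /root/.m2.

```dockerfile
# syntax=docker/dockerfile:1
FROM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /build
COPY pom.xml .
COPY src ./src

# Cache mount: the local Maven repository survives between builds on the same
# builder but is never written into an image layer.
# Secret mount: settings.xml is visible only while this command runs and does
# not appear in docker history.
RUN --mount=type=cache,target=/root/.m2/repository \
    --mount=type=secret,id=maven_settings,target=/root/.m2/settings.xml \
    mvn -B package -DskipTests

# Example invocation (maven_settings is a hypothetical secret id):
#   docker build --secret id=maven_settings,src=$HOME/.m2/settings.xml -t myapp .
```

Unlike ARG or ENV, a secret mount leaves no trace in the resulting image, which is why it is the preferred way to pass credentials to a build.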

Performance

Base image	Size (approx.)	Pull time (approx.)	Start time (approx.)
`openjdk:17`	~500 MB	~30s	~5s
`openjdk:17-slim`	~300 MB	~15s	~5s
`eclipse-temurin:17-jre-alpine`	~100 MB	~5s	~4s
`distroless/java17`	~80 MB	~4s	~4s

Monitoring

docker history <image> — analyze layers and their sizes
docker build --progress=plain — detailed build output
dive <image> — interactive layer analyzer
BuildKit --progress=trace — tracing each step

Production Story

A team of 50 developers faced CI builds that took 12 minutes. Analysis showed that all Maven dependencies were downloaded from scratch on every build. Introducing a multi-stage build with a separately cached pom.xml layer cut the time to 2 minutes (an 83% reduction). In addition, switching to a slim base image reduced the image size from 650 MB to 280 MB, which sped up deployment 2.3x and saved 40% of registry storage.

Summary

A Dockerfile is a recipe for creating an immutable artifact. Understanding layer caching is key to fast CI/CD.
Avoid unnecessary layers and use multi-stage builds.
A Dockerfile must be deterministic (specific versions, not latest).
Security: non-root user, minimal base image, health checks.
At scale, Dockerfile optimization saves hundreds of CI/CD hours and gigabytes of storage.

Interview Cheat Sheet

Must know:

Dockerfile — declarative recipe for creating an immutable Docker image
Each instruction (RUN, COPY, ADD) creates a new read-only layer
Layer caching is key to fast builds: instruction order is critical
Multi-stage build — production standard: build in one image, runtime in another
Exec form ["cmd", "arg"] is preferred for CMD/ENTRYPOINT: the process runs as PID 1 and receives signals such as SIGTERM directly
Security: non-root user, specific tags, minimal base image
BuildKit: secrets (--mount=type=secret), SSH forwarding, cache mounts
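
The exec form point from the list above is easiest to see side by side. This is a minimal sketch with an assumed base image and JAR name:

```dockerfile
FROM eclipse-temurin:17-jre
WORKDIR /app
COPY myapp.jar app.jar

# Shell form: Docker runs /bin/sh -c "java -jar app.jar". The shell is PID 1,
# so the SIGTERM sent by docker stop may never reach the JVM.
# ENTRYPOINT java -jar app.jar

# Exec form: java itself is PID 1 and receives SIGTERM directly,
# which allows a graceful shutdown.
ENTRYPOINT ["java", "-jar", "app.jar"]
```

With the shell form, docker stop typically waits out its timeout and then kills the container, which is a quick way to spot the problem in practice.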

Frequent follow-up questions:

“Why does instruction order matter?” — Changing an instruction invalidates all subsequent cache layers
“What is .dockerignore?” — Excludes files from build context (like .gitignore for Docker)
“Why can Alpine be a problem?” — musl libc ≠ glibc; JNI libraries may not work
“How does COPY differ from ADD?” — ADD can unpack archives and download from URL, but COPY is preferred

Red flags (DO NOT say):

“I use latest tag for convenience” (non-deterministic builds)
“I pass secrets through ARG” (visible in docker history)
“I run the app as root in the container” (security risk)
“ADD is better than COPY” (COPY is best practice in 95% of cases)

Related topics:

[[What are the main instructions used in Dockerfile]] — detailed instruction breakdown
[[What is multi-stage build]] — image size optimization
[[What is the difference between CMD and ENTRYPOINT]] — container launch