Question 3 · Section 14

What is a Dockerfile?

Junior Level

Simple Explanation

A Dockerfile is a text file with instructions for building a Docker image.

It describes the image as an ordered list of instructions: you state the base image, the files to add, and the commands to run, and Docker executes them from top to bottom. It’s like a recipe: you write the steps, and Docker assembles them into a ready-to-run image that can be launched as a container.

Analogy

A Dockerfile is a cooking recipe. FROM is the base (e.g., dough), COPY is adding ingredients, RUN is the cooking process, ENTRYPOINT is how the dish is served. From one recipe you can cook as many identical dishes (containers) as you want.

Example

# 1. Base image
FROM openjdk:17-jdk-slim

# 2. Working directory
WORKDIR /app

# 3. Copy file
COPY myapp.jar app.jar

# 4. Application port
EXPOSE 8080

# 5. Launch command
ENTRYPOINT ["java", "-jar", "app.jar"]

# Build the image from the Dockerfile in the current directory
docker build -t myapp .

# Run container
docker run -p 8080:8080 myapp

Main Instructions

| Instruction | What it does |
| --- | --- |
| FROM | Specifies the base image (e.g., openjdk:17) |
| WORKDIR | Sets the working directory |
| COPY | Copies files into the image |
| RUN | Executes a command during build |
| EXPOSE | Documents the application port |
| ENTRYPOINT | Application launch command |

What to Remember

  • Dockerfile is a recipe for creating a Docker image
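  • Instructions that change the filesystem (RUN, COPY, ADD) each create a new layer; the others only record metadata
  • FROM is the first instruction (only ARG, comments, and parser directives may come before it)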
  • Use specific image versions, not latest
  • docker build assembles the image, docker run launches the container
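
As a small illustration of the version-pinning and FROM-ordering points above, the sketch below pins the base tag through a build argument. The argument name JAVA_TAG is an assumption chosen for this example; the tag value is the one used in the example earlier.

```dockerfile
# ARG is the only instruction allowed before FROM.
# JAVA_TAG is a hypothetical argument name used for illustration.
ARG JAVA_TAG=17-jdk-slim

# The base image is pinned to an explicit tag rather than "latest".
FROM openjdk:${JAVA_TAG}

WORKDIR /app
COPY myapp.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]
```

Passing --build-arg JAVA_TAG=<tag> to docker build would swap the base image without editing the file.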

When You Don’t Need a Dockerfile

If you use a platform-as-a-service (Heroku, Railway, Render), you might not need a Dockerfile: the platform can build and run the application directly from your source code.


Middle Level

How Image Building Works

When you run docker build .:

  1. The Docker client sends the contents of the current directory (the build context) to the Docker daemon.
  2. The daemon executes the instructions from the Dockerfile step by step.
  3. Each instruction that changes the filesystem (RUN, COPY, ADD) creates a new layer in the image.
  4. Layers are cached: if an instruction and its inputs haven’t changed, and every earlier step was also a cache hit, Docker reuses the cached layer.

Build Context and .dockerignore

The build context is the set of files in the directory passed to docker build. To keep unnecessary files (.git, target/, logs) out of it, list them in .dockerignore:

.git
target/
*.log
.idea/

Layers and Caching

Docker uses a layered (union) filesystem; overlay2 is the usual storage driver on Linux:

  • Each RUN, COPY, and ADD instruction creates a new layer
  • Layers are cached: if the instruction and the files it uses haven’t changed, Docker reuses the cached layer
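
To make the distinction concrete, here is an annotated sketch of which instructions add filesystem layers and which only record metadata. The paths and the APP_HOME variable are illustrative, not taken from a real project.

```dockerfile
FROM openjdk:17-jdk-slim

# Filesystem layer: contains the directory created by the command.
RUN mkdir -p /app/logs

# Filesystem layer: contains the copied file. Its content is part of the cache key.
COPY myapp.jar /app/app.jar

# Metadata only: recorded in the image configuration, no files are added.
ENV APP_HOME=/app
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "/app/app.jar"]
```

Running docker history on the resulting image should list the metadata-only steps with a size of 0B.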

Optimization rule: Place rarely-changing instructions at the beginning, frequently-changing ones at the end.

# BAD: every code change invalidates the dependency cache
COPY src /app/src
COPY pom.xml /app/
RUN mvn -f /app/pom.xml clean package

# GOOD: dependencies cached separately
# The trailing slash makes /app a directory; without it, pom.xml would be written to a file named /app.
COPY pom.xml /app/
# Docker reuses a cached layer only if the instruction and ALL previous layers are unchanged.
# A change to pom.xml invalidates this layer and every layer after it.
RUN mvn -f /app/pom.xml dependency:go-offline
COPY src /app/src
RUN mvn -f /app/pom.xml clean package

Typical Mistakes

| Mistake | Consequence | How to avoid |
| --- | --- | --- |
| FROM openjdk:latest | Non-deterministic builds | FROM openjdk:17-jdk-slim |
| No .dockerignore | Slow builds, large images | Create .dockerignore |
| All commands in one RUN | Can’t partially use cache | Separate logical steps |
| Running app as root | Security risk | USER appuser |
| One huge layer | No caching, slow rebuild | Separate dependencies and code |

Key Principles of a Good Dockerfile

  1. Single responsibility: one container, one process.
  2. Minimize size: use lightweight base images (alpine, slim, distroless). Alpine is a minimal Linux distribution (~5 MB) that is often used as a base for images.
  3. Security: don’t run the application as root. Use USER.
  4. Specific versions: always pin exact versions of base images.
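
The principles above can be combined in one short file. This is a minimal sketch, assuming a Debian-based slim image (where groupadd and useradd are available); the names appuser and appgroup are illustrative.

```dockerfile
# Pinned, slim, Debian-based image (tag taken from the examples in this article).
FROM openjdk:17-jdk-slim

# Create an unprivileged system user and group.
# "appuser" and "appgroup" are illustrative names.
RUN groupadd --system appgroup \
    && useradd --system --gid appgroup --no-create-home appuser

WORKDIR /app
COPY --chown=appuser:appgroup myapp.jar app.jar

# Everything from here on, including the running container, uses appuser.
USER appuser

# One container, one process: the JVM is the only thing started.
ENTRYPOINT ["java", "-jar", "app.jar"]
```

On Alpine-based images the equivalent commands are addgroup -S and adduser -S.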

Multi-stage Build

A multi-stage build compiles the code in a temporary build image and copies only the resulting artifact into the final image:

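# Stage 1: Build
FROM maven:3.8-openjdk-17 AS build
COPY src /app/src
# The trailing slash makes the destination unambiguously a directory
COPY pom.xml /app/
RUN mvn -f /app/pom.xml clean package -DskipTests

# Stage 2: Runtime
FROM openjdk:17-jdk-slim
# Assumes the build produces target/app.jar (for example via <finalName>app</finalName> in pom.xml)
COPY --from=build /app/target/app.jar /app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "/app.jar"]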

What to Remember

  • Each RUN, COPY, or ADD instruction = new layer
  • Caching is the key to fast builds
  • Instruction order matters for performance
  • Use .dockerignore to exclude unnecessary files
  • Multi-stage build is the standard for production images

Senior Level

Dockerfile as Infrastructure as Code

A Dockerfile is not just a build script — it is the foundation of Infrastructure as Code (IaC) at the application level. It determines reproducibility, security, and delivery efficiency.

Deep Analysis of the Build Process

Build Context and Its Impact on CI/CD

docker build . → tar archive of entire directory → sent to Docker daemon

Problems:

  • A large context means slow transfer, especially with a remote daemon.
  • In CI/CD the context may include artifacts left over from previous builds.

.dockerignore works like .gitignore, but for the build context.

Best Practice:

# Exclude everything except what's needed
**
!src/
!pom.xml
!Dockerfile

Caching Mechanism: Deep Understanding

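Docker computes a cache key for each step from: the instruction text, the hashes of the previous layers, and, for COPY and ADD, a checksum of the copied files. For RUN, only the command string is compared; Docker does not check what the command would produce if it ran again.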

Cache invalidation happens when: the instruction changed, copied files changed, parent layer hash changed.
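
One consequence of these rules is that a RUN step is cached on its command text, not on what it would download today. The sketch below assumes a Debian-based image and uses curl purely as a placeholder package.

```dockerfile
FROM debian:bookworm-slim

# Risky (shown commented out): "apt-get update" is cached on its command text
# alone, so a later edit to the install line could run against a stale index.
# RUN apt-get update
# RUN apt-get install -y curl

# Safer: update and install share one RUN, so changing the package list
# reruns "apt-get update" as well.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
```

This is one of the few cases where merging commands into a single RUN helps correctness rather than hurting cache reuse.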

“Dependency Layer” pattern for Java:

FROM maven:3.8-openjdk-17 AS build
WORKDIR /build
COPY pom.xml .
# Rarely changes, so this layer is usually served from cache
RUN mvn dependency:go-offline -B
COPY src ./src
# Changes often: reruns whenever anything under src changes
RUN mvn package -DskipTests

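With this layout, any build that leaves pom.xml unchanged reuses the cached dependency layer; only the COPY src and package steps rerun. The author puts the resulting cache hit rate at 90%+ for builds without dependency changes.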

Trade-offs

| Decision | Plus | Minus |
| --- | --- | --- |
| Alpine images | Minimal size (~5 MB base) | musl libc ≠ glibc, issues with native libraries |
| Slim images | glibc compatibility, small size | Larger than alpine |
| Distroless | Minimal attack surface | No shell for debugging, need ephemeral debug containers |
| Many layers | Better caching | More metadata, slower pull |
| Few layers | Fast pull | Worse caching |

Edge Cases

  • Alpine + native libraries: Alpine uses musl libc. JNI libraries (Netty epoll, PostgreSQL native) may require glibc. Solution: use debian-slim or build for musl.
  • Build timeouts: RUN apt-get update may hang due to network issues. Solution: use mirrors, retry logic.
  • Non-deterministic builds: apt-get update without pinning package versions gives different results. Solution: apt-get install -y package=1.2.3-1.
  • Cross-platform builds: Building amd64 image on ARM (Apple Silicon). Solution: docker buildx with QEMU emulation or remote builders.
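
For the cross-platform case, a multi-stage Java build can often avoid emulation in the build stage, because the jar is platform-independent. The following is a sketch under that assumption; the image name myapp in the invocation comment is illustrative.

```dockerfile
# BUILDPLATFORM is set automatically by BuildKit to the platform of the build machine.
# Maven therefore runs natively, without QEMU emulation.
FROM --platform=$BUILDPLATFORM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /build
COPY pom.xml .
RUN mvn dependency:go-offline -B
COPY src ./src
RUN mvn package -DskipTests -B

# The runtime stage is resolved for the requested target platform.
FROM eclipse-temurin:17-jre
COPY --from=build /build/target/*.jar /app.jar
ENTRYPOINT ["java", "-jar", "/app.jar"]

# Example invocation from an ARM machine:
#   docker buildx build --platform linux/amd64 -t myapp .
```

Because the final stage contains no RUN instructions, nothing in it has to execute under emulation during the build.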

Dockerfile Security

# Production-ready Dockerfile for Spring Boot
FROM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /build
COPY pom.xml .
RUN mvn dependency:go-offline -B
COPY src ./src
RUN mvn package -DskipTests -B

FROM eclipse-temurin:17-jre-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
WORKDIR /app
COPY --from=build /build/target/*.jar app.jar
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s \
  CMD wget -qO- http://localhost:8080/actuator/health || exit 1
ENTRYPOINT ["java", "-XX:+UseG1GC", "-jar", "app.jar"]

Critical rules:

  • Don’t use latest
  • Don’t run as root
  • Read-only filesystem where possible (for example, docker run --read-only)
  • Minimize attack surface
  • Use HEALTHCHECK

Impact on CI/CD Pipeline

developer → git push → CI runs docker build →
  cache hit? → fast (seconds) : slow (minutes) →
  docker push → CD deploys

CI optimization: registry cache (pull previous image), BuildKit cache, layer sharing between images.

BuildKit (the default builder since Docker Engine 23.0): parallel execution of independent steps, plus RUN mount options for secret management (--mount=type=secret), SSH forwarding (--mount=type=ssh), and cache mounts (--mount=type=cache).
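
A minimal sketch of the cache and secret mounts in a Maven build follows. The secret id maven_settings and the settings path in the invocation comment are assumptions made for this example.

```dockerfile
# syntax=docker/dockerfile:1
FROM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /build
COPY pom.xml .
COPY src ./src

# Cache mount: /root/.m2 persists between builds but is not stored in any layer.
# Secret mount: the file is visible only during this RUN, by default at
# /run/secrets/<id>. The id "maven_settings" is a name chosen for this sketch.
RUN --mount=type=cache,target=/root/.m2 \
    --mount=type=secret,id=maven_settings \
    mvn -s /run/secrets/maven_settings package -DskipTests -B

# Example invocation:
#   docker build --secret id=maven_settings,src=$HOME/.m2/settings.xml -t myapp .
```

Unlike a value passed through ARG, the secret file does not appear in docker history or in any image layer.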

Performance

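| Base image | Size (approx.) | Pull time (approx.) | Start time (approx.) |
| --- | --- | --- | --- |
| openjdk:17 | ~500 MB | ~30 s | ~5 s |
| openjdk:17-slim | ~300 MB | ~15 s | ~5 s |
| eclipse-temurin:17-jre-alpine | ~100 MB | ~5 s | ~4 s |
| distroless/java17 | ~80 MB | ~4 s | ~4 s |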

Monitoring

  • docker history <image> — analyze layers and their sizes
  • docker build --progress=plain — detailed build output
  • dive <image> — interactive layer analyzer
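  • docker buildx build --progress=rawjson: machine-readable progress events for each step (available in recent Buildx releases; the accepted values are auto, plain, tty, and rawjson)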

Production Story

A team of 50 developers faced CI builds taking 12 minutes. Analysis showed that every build downloaded all Maven dependencies from scratch. Implementing a multi-stage build with separate pom.xml caching reduced the time to 2 minutes (an 83% improvement). Additionally, switching to a slim image reduced the image size from 650 MB to 280 MB, which sped up deployment 2.3x and saved 40% of registry storage.

Summary

  • Dockerfile is a recipe for creating an immutable artifact. Understanding layer caching is key to fast CI/CD.
  • Balance the number of layers against cache reuse (fewer layers pull faster, more layers cache better), and use multi-stage builds.
  • Dockerfile must be deterministic (specific versions, not latest).
  • Security: non-root user, minimal base image, health checks.
  • At scale, Dockerfile optimization saves hundreds of CI/CD hours and gigabytes of storage.

Interview Cheat Sheet

Must know:

  • Dockerfile — declarative recipe for creating an immutable Docker image
  • Each instruction (RUN, COPY, ADD) creates a new read-only layer
  • Layer caching is key to fast builds: instruction order is critical
  • Multi-stage build — production standard: build in one image, runtime in another
  • Exec form ["cmd", "arg"] is preferred for CMD/ENTRYPOINT: the process runs as PID 1 and receives signals directly, while shell form wraps it in /bin/sh -c
  • Security: non-root user, specific tags, minimal base image
  • BuildKit: secrets (--mount=type=secret), SSH forwarding, cache mounts
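
The exec-form point is easiest to see side by side. This is a sketch using the jar name from the earlier examples.

```dockerfile
FROM openjdk:17-jdk-slim
COPY myapp.jar /app.jar

# Shell form (shown commented out): Docker runs /bin/sh -c "java -jar /app.jar",
# so the shell is PID 1 and the JVM may never see SIGTERM from "docker stop".
# ENTRYPOINT java -jar /app.jar

# Exec form: the JVM itself is PID 1 and receives signals directly.
ENTRYPOINT ["java", "-jar", "/app.jar"]
```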

Frequent follow-up questions:

  • “Why does instruction order matter?” — Changing an instruction invalidates all subsequent cache layers
  • “What is .dockerignore?” — Excludes files from build context (like .gitignore for Docker)
  • “Why can Alpine be a problem?” — musl libc ≠ glibc; JNI libraries may not work
  • “How does COPY differ from ADD?” — ADD can unpack archives and download from URL, but COPY is preferred
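
A short sketch of the COPY versus ADD difference for local archives. The files app.jar and vendor.tar.gz are hypothetical and assumed to exist in the build context.

```dockerfile
FROM debian:bookworm-slim

# COPY: the file is copied as-is.
COPY app.jar /opt/app/app.jar

# ADD with a local tar archive: the archive is unpacked into the destination directory.
ADD vendor.tar.gz /opt/vendor/

# COPY with the same archive: the .tar.gz file itself lands in the directory.
COPY vendor.tar.gz /opt/archives/
```

The implicit extraction is why COPY is usually preferred: it does exactly one thing, so the result is easier to predict.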

Red flags (DO NOT say):

  • “I use latest tag for convenience” (non-deterministic builds)
  • “I pass secrets through ARG” (visible in docker history)
  • “I run the app as root in the container” (security risk)
  • “ADD is better than COPY” (COPY is best practice in 95% of cases)

Related topics:

  • [[What are the main instructions used in Dockerfile]] — detailed instruction breakdown
  • [[What is multi-stage build]] — image size optimization
  • [[What is the difference between CMD and ENTRYPOINT]] — container launch