Question 6 · Section 14

What is multi-stage build?

Junior Level

Simple Explanation

A multi-stage build is a way to produce a small Docker image by using several stages in a single Dockerfile.

A Dockerfile can contain several FROM instructions, and each one starts a new stage. Only the final stage becomes the image. Typically the first stage builds the application and the second stage runs it: the finished artifact is copied into the final image, and the build tools are left behind.

Analogy

Imagine you’re baking a cake. You need a mixer, a pan, an oven, and ingredients. But when you hand the cake to the client, you don’t hand over the mixer and the pan; the client gets only the finished cake. A multi-stage build works the same way: you need Maven and a JDK to build, but only a JRE and the jar file to run.

The Problem Without Multi-stage

FROM maven:3.8-openjdk-17
COPY src ./src
COPY pom.xml .
RUN mvn package -DskipTests
# Image weighs ~800 MB (Maven + JDK + source code + dependencies)
CMD ["java", "-jar", "target/app.jar"]

How Multi-stage Works

# Stage 1: Build
FROM maven:3.8-openjdk-17-slim AS builder
WORKDIR /build
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests

# Stage 2: Runtime
FROM eclipse-temurin:17-jre
WORKDIR /app
COPY --from=builder /build/target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]

Result: image weighs ~300 MB instead of ~800 MB!

Step-by-Step Explanation

  1. FROM ... AS builder — first stage named “builder”
  2. COPY --from=builder — copies a file from the first stage to the second
  3. The second stage is the final image that goes to production

What to Remember

  • Multi-stage build = multiple FROM in one Dockerfile
  • Each FROM is a separate stage
  • COPY --from=name copies files from another stage
  • The final image contains only the last stage; everything in earlier stages (compilers, build dependencies, source code) is discarded unless you copy it explicitly
  • This reduces image size and shrinks the attack surface

Middle Level

Why Did Multi-stage Build Appear?

Before this technology there were two paths:

  1. Everything in one image — the final image contained Maven, JDK, source code. Bloated size, large attack surface.
  2. Builder Pattern — complex scripts for transferring artifacts between images. Inconvenient, requires external scripts.

Multi-stage build solved both problems, allowing everything to be described in one Dockerfile.

Detailed Breakdown

# STAGE 1: Build (named 'builder')
FROM maven:3.8-openjdk-17-slim AS builder
WORKDIR /build
COPY pom.xml .
RUN mvn dependency:go-offline     # Cache dependencies
COPY src ./src
RUN mvn package -DskipTests

# STAGE 2: Final image (Runtime)
FROM eclipse-temurin:17-jre
WORKDIR /app
COPY --from=builder /build/target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]

Typical Mistakes

| Mistake | Consequence | How to avoid |
|---|---|---|
| Wrong path in COPY --from | File not found during build | Use absolute paths |
| Copying code before dependencies | Cache doesn’t work, slow build | pom.xml first, then src |
| Forgetting to use AS for stages | Can’t reference by name | Always name your stages |
| Using latest on any stage | Non-determinism | Pin versions of all base images |
| Copying entire /build | Extra files in final image | Copy only the artifact |

Advantages

  1. Minimal size — final image contains only JRE and .jar.
  2. Security — no build tools, smaller attack surface.
  3. CI/CD convenience — entire process in one file.
  4. Caching — Docker caches each stage independently.

Advanced Techniques

Stopping at a specific stage:

docker build --target builder -t my-app-test .
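
One common use of --target is a dedicated test stage that is built only on request. The sketch below assumes the Maven project layout used elsewhere in this article; the stage name "test" and the image tags are illustrative, not part of the original example.

```dockerfile
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /build
COPY pom.xml .
RUN mvn dependency:go-offline -B
COPY src ./src
RUN mvn package -DskipTests -B

# Hypothetical extra stage: reuses the builder stage and runs the tests.
# Built only when requested:
#   docker build --target test -t my-app-test .
FROM builder AS test
RUN mvn test -B

# Final stage: a plain "docker build ." produces this image.
# With BuildKit, the test stage is skipped because nothing here depends on it.
FROM eclipse-temurin:17-jre
WORKDIR /app
COPY --from=builder /build/target/*.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]
```

In CI this allows a failing test run to stop the pipeline before the runtime image is built and pushed.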

Using external images:

COPY --from=nginx:1.25 /etc/nginx/nginx.conf /my/path

Multiple intermediate stages:

FROM node:18 AS frontend-build
# ... frontend build

FROM maven:3.9-eclipse-temurin-17 AS backend-build
# ... backend build

FROM eclipse-temurin:17-jre
COPY --from=backend-build /app.jar .
COPY --from=frontend-build /dist ./static/

Cache Optimization

FROM maven:3.8-openjdk-17-slim AS builder
WORKDIR /build
COPY pom.xml .
RUN mvn dependency:go-offline -B  # Dependency cache
COPY src ./src
RUN mvn package -DskipTests -B

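When only the source code changes, Docker reuses the cached dependency layer and reruns just the final mvn package step. The dependency layer is rebuilt only when pom.xml changes.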
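
Layer caching has one weakness: any change to pom.xml invalidates the whole dependency layer, so every dependency is downloaded again. A BuildKit cache mount can soften this by keeping the local Maven repository between builds. This is a minimal sketch, assuming BuildKit is enabled and that the build runs as root, so the Maven repository lives in /root/.m2 (the default in the official maven image).

```dockerfile
# syntax=docker/dockerfile:1
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /build
COPY pom.xml .
# The cache mount persists /root/.m2 across builds without storing it in a layer
RUN --mount=type=cache,target=/root/.m2 mvn dependency:go-offline -B
COPY src ./src
# The same mount is needed here, because the downloaded dependencies
# live in the cache mount and not in the previous layer
RUN --mount=type=cache,target=/root/.m2 mvn package -DskipTests -B
```

The cache mount is local to the builder that ran the build, so a fresh CI runner still starts with an empty cache unless the cache is exported and imported explicitly.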

Approach Comparison

| Approach | Size | Security | Complexity |
|---|---|---|---|
| Single-stage | ~800 MB | Low | Low |
| Multi-stage (slim) | ~300 MB | Medium | Medium |
| Multi-stage (alpine) | ~120 MB | High | Medium |
| Multi-stage (distroless) | ~80 MB | Very high | High |

What to Remember

  • Multi-stage build is an industry standard
  • Separate “build tools” and “runtime environment”
  • Use slim or alpine on the final stage
  • You can copy files from external images
  • --target allows building a specific stage

When NOT to Use Multi-stage Build

For Python or Node.js projects without a compilation step, a multi-stage build is often unnecessary: a single stage with COPY and RUN is enough. Multi-stage pays off when there is a build step (Java, Go, C++, or a frontend bundle) whose tools aren’t needed at runtime.


Senior Level

Multi-stage Build as an Architectural Pattern

Multi-stage build implements the principle of least privilege at the image level: the runtime image contains only what is necessary for execution, nothing extra.

Security Analysis

Without multi-stage:

Image: maven:3.8-openjdk-17 (~800 MB)
Contains: Maven, Java compiler, source code, all dependencies
Risk: attacker could recompile code, use compiler for exploits

With multi-stage:

Image: eclipse-temurin:17-jre (~300 MB)
Contains: JRE + jar file
Risk: minimal — no tools for code modification

Trade-offs

| Approach | Size | Security | Complexity | Debug |
|---|---|---|---|---|
| Single-stage | ~800 MB | Low | Low | Easy |
| Multi-stage (slim) | ~300 MB | Medium | Medium | Normal |
| Multi-stage (jlink) | ~100 MB | High | High | Hard |
| Multi-stage (distroless) | ~80 MB | Very high | High | Very hard |
| Multi-stage (native) | ~50 MB | Maximum | Very high | Requires debugger |

jlink is a JDK tool that assembles a custom Java runtime containing only the modules the application needs.

# Stage 1: assemble a custom runtime with jlink
FROM eclipse-temurin:17-jdk AS jre-build
# The module list is illustrative; derive the real one for your jar with jdeps
RUN jlink \
    --add-modules java.base,java.sql,java.xml \
    --strip-debug \
    --no-man-pages \
    --no-header-files \
    --compress=2 \
    --output /jre

# Stage 2: build the application
FROM maven:3.9-eclipse-temurin-17 AS builder
# ... build ...

# Stage 3: runtime with the custom JRE only
FROM debian:bookworm-slim
COPY --from=jre-build /jre /jre
COPY --from=builder /build/target/app.jar /app.jar
ENV JAVA_HOME=/jre
ENV PATH="$JAVA_HOME/bin:$PATH"
ENTRYPOINT ["java", "-jar", "/app.jar"]

Custom JRE weighs 40-60 MB instead of 300+ MB full JDK.

Distroless Images

FROM maven:3.9-eclipse-temurin-17 AS build
# ... build ...

FROM gcr.io/distroless/java17-debian12
COPY --from=build /build/target/app.jar /app.jar
ENTRYPOINT ["java", "-jar", "/app.jar"]

Distroless images contain only the runtime and have no shell/package manager. An attacker can’t run sh inside the container.
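
The missing shell also makes troubleshooting harder. One possible approach, sketched here under assumptions, is to switch the base image tag with a build argument: the distroless project publishes nonroot and debug-nonroot variants, and the debug variants include a busybox shell. The argument name DISTROLESS_TAG and the image tags in the comments are hypothetical.

```dockerfile
# Hypothetical build argument: "nonroot" for production,
# "debug-nonroot" for a troubleshooting image with a busybox shell.
# It is declared before the first FROM so that FROM lines can use it.
ARG DISTROLESS_TAG=nonroot

FROM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /build
COPY pom.xml .
COPY src ./src
RUN mvn package -DskipTests -B

FROM gcr.io/distroless/java17-debian12:${DISTROLESS_TAG}
COPY --from=build /build/target/app.jar /app.jar
ENTRYPOINT ["java", "-jar", "/app.jar"]

# Production image:
#   docker build -t my-app .
# Debug image with a shell:
#   docker build --build-arg DISTROLESS_TAG=debug-nonroot -t my-app:debug .
#   docker run --rm -it --entrypoint=sh my-app:debug
```

The debug image should stay out of production, since it reintroduces the shell that distroless was chosen to remove.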

Edge Cases

  • File not found on second stage: use absolute paths. In COPY --from=builder /build/target/*.jar app.jar the glob works only if it matches exactly one jar. If it matches several, the build fails, because with multiple sources the destination must be a directory ending in /.
  • Cache doesn’t work: code copied before dependencies. Solution: first pom.xml, then RUN mvn dependency:go-offline, then src.
  • Build temporary files: if RUN creates temporary files, they remain in the builder-stage layer. This is fine, because they won’t end up in the final image.
  • Secrets at build stage: if the build stage needs access to a private Maven repo, use BuildKit --mount=type=ssh or --mount=type=secret. Don’t pass tokens via ARG: build arguments are visible in docker history.
  • Cross-compilation: building an ARM image on an amd64 host. Use docker buildx with QEMU emulation or remote builders.
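
The secrets case above can be sketched as follows. This is an illustration under assumptions: the private repository credentials are taken to live in a Maven settings.xml on the host, and the secret id "maven_settings" is a made-up name.

```dockerfile
# syntax=docker/dockerfile:1
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /build
COPY pom.xml .
COPY src ./src
# The secret is mounted as a file for this RUN step only.
# It is not written to any image layer and does not appear in docker history.
# "maven_settings" is a hypothetical secret id.
RUN --mount=type=secret,id=maven_settings,target=/root/.m2/settings.xml \
    mvn package -DskipTests -B

# Build command (the host path to settings.xml is an assumption):
#   docker build --secret id=maven_settings,src=$HOME/.m2/settings.xml -t my-app .
```

Even in a multi-stage build, a token passed through ARG or ENV would remain in the builder stage's metadata and build cache, which is why the mount is preferable.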

Native Image (GraalVM) + Multi-stage

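# Stage 1: build the jar (the native-image base image has no Maven)
FROM maven:3.9-eclipse-temurin-17 AS jar-build
WORKDIR /build
COPY pom.xml .
COPY src ./src
RUN mvn package -DskipTests

# Stage 2: compile the jar into a native binary
FROM ghcr.io/graalvm/native-image:ol8-java17 AS builder
WORKDIR /build
COPY --from=jar-build /build/target/app.jar target/app.jar
RUN native-image -jar target/app.jar -o app

# Stage 3: runtime with the binary only
FROM debian:bookworm-slim
COPY --from=builder /build/app /app
ENTRYPOINT ["/app"]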

Native Image compiles Java into a native binary. Final image: ~50-100 MB, startup under 100 ms, low memory use. Trade-offs: the Native Image build takes minutes, and code that relies on reflection or dynamic proxies needs extra configuration, so not all Spring features work out of the box (Spring Native or Spring AOT support is required).

Performance

| Strategy | Image size | Startup time | RAM footprint |
|---|---|---|---|
| Full JDK | ~500 MB | ~5-8s | ~400-600 MB |
| JRE slim | ~300 MB | ~5-8s | ~300-500 MB |
| jlink custom JRE | ~100 MB | ~4-6s | ~200-400 MB |
| Distroless | ~80 MB | ~4-6s | ~200-400 MB |
| GraalVM Native | ~50 MB | < 0.1s | ~50-150 MB |

Troubleshooting

Problem: file not found.

COPY --from=builder /build/target/app.jar app.jar
# ERROR: file not found

Solution: check absolute paths. Use RUN ls -la /build/target/ on the builder stage for debugging. Or docker build --target builder to inspect the intermediate image.
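
The debugging steps above might look like this in practice. It is a sketch, assuming the same Maven layout as the earlier examples; the tag my-app-debug is illustrative.

```dockerfile
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /build
COPY pom.xml .
COPY src ./src
RUN mvn package -DskipTests -B
# Temporary debugging step: list what the build actually produced
RUN ls -la /build/target/

# With BuildKit, RUN output is collapsed by default, and a cached step prints nothing.
# Show the listing with:
#   docker build --progress=plain --no-cache --target builder -t my-app-debug .
# Or open a shell in the intermediate image and look around:
#   docker run --rm -it --entrypoint bash my-app-debug
```

Once the real artifact name is known, the debugging RUN line should be removed so that it does not add a layer to every build.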

Problem: cache doesn’t work.

# BAD
COPY src ./src
COPY pom.xml .
RUN mvn package

Solution: swap the order. First COPY pom.xml, then RUN mvn dependency:go-offline, then COPY src.

Production Story

A microservices team (40+ services) used single-stage images of 700-900 MB. Deploying one service took 3-5 minutes, most of it spent pulling the image, and the registry held 35 GB in total. Switching to multi-stage builds with distroless images reduced the average image size to 90 MB, deploy time to 30 seconds, and registry usage to 4 GB. Savings: about 88% of storage and up to 90% of deploy time. Additional bonus: the distroless images passed the security audit without issues, since they have no shell, no package manager, and a minimal attack surface.

Monitoring

  • docker images — track image sizes
  • dive <image> — analyze content of each layer
  • Registry size metrics — monitor image storage
  • Build cache hit rate — caching efficiency in CI/CD
  • Image scan results (Trivy, Snyk) — number of vulnerabilities per image

Summary

  • Multi-stage build is the standard for creating production images.
  • The best way to balance development convenience with image compactness.
  • Always separate “build tools” and “runtime environment.”
  • For maximum optimization: jlink, distroless, GraalVM Native Image.
  • Caching dependencies separately from code — key to fast CI/CD builds.
  • Security: the smaller the image, the smaller the attack surface.
  • Use --target to test intermediate stages.

Interview Cheat Sheet

Must know:

  • Multi-stage build = multiple FROM in one Dockerfile; only the last one ends up in the final image
  • Separate “build tools” (Maven, JDK) and “runtime environment” (JRE, jar)
  • Size reduction: single-stage ~800MB → multi-stage slim ~300MB → distroless ~80MB
  • Security: no build tools in production image, smaller attack surface
  • Caching dependencies separately from code — key to fast CI/CD builds
  • For maximum optimization: jlink (custom JRE), distroless, GraalVM Native Image
  • --target allows building and testing an intermediate stage

Frequent follow-up questions:

  • “Why does the image shrink?” — Compilers, source code, build dependencies don’t end up in the final image
  • “What is distroless?” — Image without shell/package manager; attacker can’t run sh inside the container
  • “When is multi-stage NOT needed?” — Python/Node.js without compilation step; COPY + RUN is enough
  • “What does jlink give?” — Custom JRE with only needed modules (40-60MB instead of 300MB)

Red flags (DO NOT say):

  • “I use one image for build and runtime” (bloated size, security risk)
  • “Multi-stage slows down builds” (caching makes it faster)
  • “I copy the entire builder directory” (only the artifact is needed)
  • “Distroless images can’t be debugged” (ephemeral debug containers solve this)

Related topics:

  • [[What is Dockerfile]] — Dockerfile basics
  • [[What are the main instructions used in Dockerfile]] — COPY --from instruction
  • [[What is containerization and why is it needed]] — image security