Question 4 · Section 14

What are the main instructions used in Dockerfile?

A Dockerfile consists of instructions, each of which creates a new layer in the image. Here are the main ones:

Language versions: English Russian Ukrainian

Junior Level

Main Instructions

A Dockerfile consists of instructions, each of which creates a new layer in the image. Here are the main ones:

Environment Definition Instructions

Instruction What it does Example
FROM Sets the base image FROM openjdk:17-slim
WORKDIR Sets the working directory WORKDIR /app
ENV Sets environment variables ENV APP_PORT=8080

File Operation Instructions

Instruction What it does
COPY Copies files from host into the image
ADD Like COPY, but can unpack archives

Command Execution Instructions

Instruction What it does
RUN Executes a command during image build
CMD Default command when container starts
ENTRYPOINT Main container launch command

Configuration Instructions

Instruction What it does
EXPOSE Documents the application port
USER Specifies the user for running

Simple Example

FROM openjdk:17-jdk-slim       # Base image
WORKDIR /app                    # Working directory
COPY myapp.jar app.jar          # Copy jar file
EXPOSE 8080                     # Port (documentation)
ENTRYPOINT ["java", "-jar", "app.jar"]  # Launch command

COPY vs ADD

  • COPY — preferred option (95% of cases).
  • ADD — if you need auto-extraction of tar archives or downloading from URL (but for URLs, RUN curl is better).

Exec form ["cmd", "arg"] — command runs directly. Shell form cmd arg — through shell /bin/sh -c. ENV — variable is available both during build and in the running container. ARG — only during build.

What to Remember

  • FROM is always the first instruction
  • RUN — executes during build, CMD/ENTRYPOINT — at launch
  • COPY — preferred option (95% of cases). ADD — if you need auto-extraction of tar archives or URL download.
  • EXPOSE doesn’t open the port, only documents it
  • Use WORKDIR instead of RUN cd ...

Middle Level

Instruction Classification

1. Environment Definition Instructions

FROM — sets the base image. Any Dockerfile starts with this. Best Practice: specify a concrete version (openjdk:17-slim), not latest.

ENV — sets environment variables. Available both during build and in the running container.

ENV JAVA_OPTS="-Xmx512m"
ENV APP_ENV=production

ARG — defines variables available only during the build process.

ARG APP_VERSION=1.0.0
RUN echo "Building version $APP_VERSION"

WORKDIR — sets the working directory. All subsequent commands execute relative to it.

WORKDIR /app    # better than RUN cd /app

2. File Operation Instructions

COPY — copies files from host into the image.

COPY pom.xml /app/
COPY src/ /app/src/

ADD — extended version of COPY. Can unpack archives (.tar.gz) and download files from URLs.

ADD app.tar.gz /app/  # automatically unpacks

3. Command Execution Instructions

RUN — executes a command during build and records the result in a new layer.

# Combine commands to reduce layers
RUN apt-get update && \
    apt-get install -y git curl && \
    rm -rf /var/lib/apt/lists/*
// Deletion in the same RUN is critical: if you delete in a separate RUN,
// files remain in the lower layer and will be in the image.

CMD — sets the default command on container start. Easy to override.

CMD ["--server.port=8080"]

ENTRYPOINT — defines the main launch command. Harder to override.

ENTRYPOINT ["java", "-jar", "/app.jar"]

4. Access Configuration Instructions

EXPOSE — documents the port. Does not actually publish it (you need -p at launch).

USER — specifies the user for running.

RUN useradd -r appuser
USER appuser

VOLUME — creates a mount point for persistent data.

VOLUME ["/data"]

Typical Mistakes

Mistake Consequence How to avoid
RUN apt-get update in a separate layer Cache goes stale, packages not found Combine update && install in one RUN
Using shell form CMD java -jar Signals don’t reach the application Use exec form ["java", "-jar"]
Passing secrets via ARG Passwords visible in docker history Use BuildKit secrets (--mount=type=secret)
Deleting files in a separate RUN Files remain in lower layers Delete in the same RUN where you created them
Multiple CMD/ENTRYPOINT Only the last one counts One CMD, one ENTRYPOINT per file

CMD vs ENTRYPOINT Comparison

Instruction Can override? Main purpose
ENTRYPOINT With difficulty (--entrypoint) Fixed command
CMD Very easily Default parameters

Best Practices

  1. Combine RUN commands via && to reduce layers
  2. Clean package cache in the same RUN layer
  3. Use WORKDIR instead of RUN cd chains
  4. Always use Multi-stage build to separate build and runtime

What to Remember

  • Each RUN, COPY, ADD creates a new layer
  • Instruction order affects caching
  • Use Exec form ["cmd", "arg"] instead of Shell form
  • Clean cache in the same layer as installation
  • ENTRYPOINT + CMD together — best practice

Senior Level

Instruction Architecture and Image Impact

Understanding instruction nuances is critical for creating secure, compact, and fast-to-build images.

Deep Analysis: Layered Model

Each RUN, COPY, ADD instruction creates a new layer. Layers are read-only filesystems combined through UnionFS (Overlay2).

Critical consequence: deleting a file in a new layer doesn’t remove it from the image — only a “whiteout” entry is created. File size in image = sum of all layers where it appears.

# BAD: file remains in lower layer
RUN apt-get update && apt-get install -y package
RUN rm -rf /var/lib/apt/lists/*

# GOOD: one layer, cache cleaned immediately
RUN apt-get update && \
    apt-get install -y package && \
    rm -rf /var/lib/apt/lists/*

Trade-offs

Decision Plus Minus
Shell form Convenience (pipes, variables) PID 1 problem, signals don’t reach
Exec form Proper signal handling No shell functionality
ARG for config Simple Visible in docker history, not runtime
ENV for config Available at runtime Visible in docker inspect
ADD for URL No RUN curl/wget needed Unpredictable cache, no retry
RUN curl/wget Control, retry, checksum Additional layer

ARG vs ENV: Subtleties

Characteristic ARG ENV
Available during build Yes Yes
Available in container No Yes
Visible in docker inspect No Yes
Visible in docker history Yes (value!) Yes

Security warning: ARG values are visible in docker history. Don’t pass secrets through ARG!

# BAD: password visible in docker history
ARG DB_PASSWORD=secret123

# GOOD: BuildKit secrets
RUN --mount=type=secret,id=db_pass cat /run/secrets/db_pass

Shell form vs Exec form: Critical Nuance

Exec form (recommended):

ENTRYPOINT ["java", "-jar", "/app.jar"]

Runs directly as a process with PID 1. Correctly handles signals (SIGTERM, SIGKILL). Critical for graceful shutdown in Kubernetes.

Shell form:

ENTRYPOINT java -jar /app.jar

Runs as a subprocess of /bin/sh -c. OS signals arrive at the sh shell, not the application. The application may be “killed” hard without completing transactions.

PID 1 problem: in Linux, process with PID 1 has special behavior — it doesn’t receive SIGTERM by default. Solution: exec form, tini, or docker run --init.

Edge Cases

  • ONBUILD in multi-stage: ONBUILD instructions execute when the image is used as a base. In multi-stage this can lead to unexpected side effects.
  • Glob patterns in COPY: COPY target/*.jar — if no files exist, build fails. If multiple files, all are copied to the specified directory.
  • Symbolic links: COPY follows symlinks on the host. This may include unexpected files.
  • Timestamps: COPY preserves file mtime. This affects build determinism. BuildKit --metadata-file helps track.
  • ENV and escaping: ENV FOO=bar\ baz — space in value. ENV FOO="bar baz" — quotes are included in the value.

HEALTHCHECK: Production Obligation

HEALTHCHECK --interval=30s --timeout=3s --retries=3 --start-period=60s \
  CMD curl -f http://localhost:8080/actuator/health || exit 1

Without HEALTHCHECK the orchestrator doesn’t know if the application is alive. The container may be Running, but the application inside — dead (deadlock, out of memory).

BuildKit: Advanced Features

# BuildKit syntax
# syntax=docker/dockerfile:1

# Secrets
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm install

# SSH forwarding (for private git repos)
RUN --mount=type=ssh git clone git@github.com:org/private-repo.git

# Cache mount (for package managers)
RUN --mount=type=cache,target=/root/.m2 mvn package

# TMPFS mount
RUN --mount=type=tmpfs,target=/tmp make

ONBUILD: Instruction for Parent Images

# In base image
ONBUILD COPY . /app
ONBUILD RUN mvn package

Used for creating parent images that automate steps for descendants. Dangerous in multi-stage: ONBUILD doesn’t execute for subsequent stages.

LABEL: Metadata for CI/CD

LABEL maintainer="team@example.com"
LABEL version="1.0.0"
LABEL org.opencontainers.image.source="https://github.com/org/repo"
LABEL org.opencontainers.image.revision="abc123"

Useful for tracking images in registry, automation, compliance.

Performance

Optimization Impact
Separate pom.xml and src Cache hit rate > 90%
Combine RUN commands 20-40% fewer layers
Alpine instead of full 60-80% smaller size
Multi-stage build 50-70% smaller final image
BuildKit cache mount 50% faster repeated builds

Summary

  • Each RUN, COPY, ADD command creates a new layer — delete temporary files in the same layer.
  • Use Exec form for CMD and ENTRYPOINT — critical for signal handling.
  • ARG is visible in docker history — don’t pass secrets.
  • HEALTHCHECK is a required element of production images.
  • BuildKit (--mount=type=secret/cache/ssh) — modern build standard.
  • Layer optimization = registry space savings + faster deployment.

Interview Cheat Sheet

Must know:

  • RUN executes during build, CMD/ENTRYPOINT — at container launch
  • Each RUN/COPY/ADD creates a new layer; deleting in a separate RUN doesn’t remove the file from the image
  • Exec form ["cmd", "arg"] is critical for signal handling (graceful shutdown)
  • ARG is visible in docker history — don’t pass secrets; use BuildKit secrets
  • ENV is available at runtime, ARG — only during build
  • HEALTHCHECK is required for production images
  • BuildKit: --mount=type=secret/cache/ssh — modern build standard

Frequent follow-up questions:

  • “Why is shell form bad?” — Runs through /bin/sh -c, OS signals don’t reach the application (PID 1 problem)
  • “Why combine RUN commands with &&?” — Each instruction = layer; combining reduces layer count
  • “What does ONBUILD do?” — Instructions execute when the image is used as a base (for parent images)
  • “Does EXPOSE open a port?” — No, only documents; actual mapping via -p at launch

Red flags (DO NOT say):

  • “EXPOSE makes the port accessible from outside” (only documents, need -p)
  • “I pass passwords through ARG” (visible in docker history)
  • “I delete files in a separate RUN layer” (files remain in lower layer)
  • “I use shell form for ENTRYPOINT” (PID 1 problem, signals are lost)

Related topics:

  • [[What is Dockerfile]] — Dockerfile basics
  • [[What is the difference between CMD and ENTRYPOINT]] — details about launching
  • [[What is multi-stage build]] — image optimization