Question 26 · Section 17

What tools are used for microservices orchestration?

Kubernetes is the de facto standard thanks to: openness (CNCF), support by all cloud providers (GKE, EKS, AKS), huge community, standard API.

🟢 Junior Level

Orchestration is managing the lifecycle of microservices: starting, stopping, scaling, updating.

Main tools

Tool              What it does
Kubernetes (K8s)  Container orchestration (de facto standard)
Docker Compose    Local orchestration for development
Docker Swarm      Simple container orchestration
Apache Mesos      Cluster orchestration
HashiCorp Nomad   Simple Kubernetes alternative

Docker Compose (for development)

version: '3.8'
services:
  api-gateway:
    build: ./gateway
    ports:
      - "8080:8080"
    depends_on:
      - user-service
      - order-service

  user-service:
    build: ./user-service
    environment:
      - DB_HOST=postgres
      - KAFKA_BROKERS=kafka:9092

  order-service:
    build: ./order-service
    environment:
      - DB_HOST=postgres
      - KAFKA_BROKERS=kafka:9092

  postgres:
    image: postgres:15
    # Image versions (cp-kafka:7.5.0, postgres:15) are examples at time of writing.
    # In production, use the latest stable versions.
    environment:
      POSTGRES_PASSWORD: secret

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    ports:
      - "9092:9092"

Kubernetes: basic Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
        - name: user-service
          image: registry.example.com/user-service:1.2.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
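
A sketch of deploying and inspecting this manifest with kubectl (the file name user-service-deployment.yaml is illustrative):

```shell
kubectl apply -f user-service-deployment.yaml

# Watch the three replicas come up
kubectl get pods -l app=user-service -w

# Check rollout state and recent events
kubectl rollout status deployment/user-service
kubectl describe deployment user-service
```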

When NOT to use Kubernetes

  • Small team (1-3 devs) β€” operational complexity is unjustified
  • Simple applications β€” Docker Compose is sufficient
  • Strict budget constraints β€” K8s requires at least 3 nodes

🟡 Middle Level

Kubernetes: main resources

# Service: network access to pods
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
    - port: 80
      targetPort: 8080
  type: ClusterIP  # internal access

---
# Ingress: external access via API Gateway
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-gateway
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /users
            pathType: Prefix
            backend:
              service:
                name: user-service
                port:
                  number: 80
          - path: /orders
            pathType: Prefix
            backend:
              service:
                name: order-service
                port:
                  number: 80

Helm: package manager for K8s

# Chart.yaml
apiVersion: v2
name: microservices-stack
version: 1.0.0
dependencies:
  - name: user-service
    version: 1.0.0
  - name: order-service
    version: 1.0.0
  - name: kafka
    version: 22.0.0
    repository: https://charts.bitnami.com/bitnami
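
A typical workflow with such a chart might look like this; the release name and values file are illustrative:

```shell
# Download the dependencies declared in Chart.yaml
helm dependency update

# Install the release, or upgrade it if it already exists
helm upgrade --install microservices-stack . \
  --namespace production --create-namespace \
  -f values-production.yaml

# Roll back to the previous revision if the upgrade misbehaves
helm rollback microservices-stack
```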

Rolling Update: update without downtime

# Update version
kubectl set image deployment/user-service \
  user-service=registry.example.com/user-service:1.3.0

# Rollback on problems
kubectl rollout undo deployment/user-service

# Check status
kubectl rollout status deployment/user-service
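
How the rollout proceeds is controlled by the Deployment's update strategy; a common zero-downtime configuration looks like this (the values shown are a typical choice, not mandatory):

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # at most one extra pod above the desired count
      maxUnavailable: 0   # never drop below the desired count (zero downtime)
```

Combined with a readiness probe, this replaces pods one at a time and only routes traffic to pods that report ready.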

Service Mesh: Istio

# VirtualService: traffic management
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: user-service
spec:
  hosts:
    - user-service
  http:
    - route:
        - destination:
            host: user-service
            subset: v1
          weight: 90
        - destination:
            host: user-service
            subset: v2
          weight: 10  # Canary deployment: 10% of traffic to v2

---
# DestinationRule: subsets
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: user-service
spec:
  host: user-service
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2

Tool comparison

Tool            Complexity  Scale           Use Case
Docker Compose  Low         1 host          Local development
Docker Swarm    Medium      Multiple hosts  Small clusters
Kubernetes      High        Any             Production (de facto standard)
Nomad           Medium      Any             Simple K8s alternative
OpenShift       High        Enterprise      Kubernetes + additional tooling

🔴 Senior Level

Orchestration architecture

                    ┌───────────────────────────┐
                    │   API Gateway / Ingress   │
                    │  (Nginx, Traefik, Istio)  │
                    └─────────────┬─────────────┘
                                  │
             ┌────────────────────┼───────────────────┐
             │                    │                   │
     ┌───────▼────────┐  ┌────────▼───────┐  ┌───────▼────────┐
     │  User Service  │  │ Order Service  │  │ Notify Service │
     │  (3 replicas)  │  │  (5 replicas)  │  │  (2 replicas)  │
     └───────┬────────┘  └────────┬───────┘  └───────┬────────┘
             │                    │                  │
     ┌───────▼────────┐  ┌────────▼───────┐  ┌───────▼────────┐
     │   PostgreSQL   │  │   PostgreSQL   │  │     Kafka      │
     │   (Primary +   │  │   (Primary +   │  │    Cluster     │
     │    Replica)    │  │    Replica)    │  │                │
     └────────────────┘  └────────────────┘  └────────────────┘

HPA: Horizontal Pod Autoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: user-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      # stabilizationWindowSeconds: cooldown period after a scaling event.
      # scaleUp 60s: fast scale-up (reacts to traffic spikes).
      # scaleDown 300s: slow scale-down (avoids flapping).
      policies:
        - type: Pods
          value: 4
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # Don't scale down aggressively
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120

PodDisruptionBudget: guaranteed availability during voluntary disruptions

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: user-service-pdb
spec:
  minAvailable: 2  # Minimum 2 pods always running
  selector:
    matchLabels:
      app: user-service
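
The budget is enforced during voluntary disruptions such as node maintenance: the eviction API waits rather than violate minAvailable (the node name below is illustrative):

```shell
# Drains pods from a node while respecting PodDisruptionBudgets;
# the drain blocks if eviction would leave fewer than 2 user-service pods
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
```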

GitOps: ArgoCD / Flux

# ArgoCD Application
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: user-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/company/k8s-manifests
    targetRevision: HEAD
    path: k8s/user-service
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

Kustomize: environment-specific configuration management

# kustomization.yaml (base)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
  - ingress.yaml

# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: user-service
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 5
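
The overlay is rendered and applied with kubectl's built-in Kustomize support (paths follow the layout above):

```shell
# Preview the final manifests for production
kubectl kustomize overlays/production

# Apply the overlay directly
kubectl apply -k overlays/production
```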

Service Mesh Patterns

# Fault Injection: resilience testing
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: user-service-fault
spec:
  hosts:
    - user-service
  http:
    - fault:
        delay:
          percentage:
            value: 10
          fixedDelay: 5s
        abort:
          percentage:
            value: 5
          httpStatus: 503
      route:
        - destination:
            host: user-service

# Rate Limiting
apiVersion: networking.istio.io/v1beta1
kind: EnvoyFilter
metadata:
  name: rate-limit
spec:
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.local_ratelimit
          # typed_config with the actual token-bucket limits is omitted for brevity

Production Checklist

✅ Helm Charts for packaging
✅ GitOps (ArgoCD/Flux) for deployment
✅ HPA for autoscaling
✅ PDB to keep a minimum of pods running during voluntary disruptions
✅ Pod Anti-Affinity to spread replicas across nodes
✅ Resource limits to prevent the noisy-neighbor problem
✅ Liveness/Readiness/Startup probes
✅ Service Mesh (Istio/Linkerd) for observability
✅ Canary/Blue-Green deployments
✅ Network Policies for isolation
✅ Secrets management (Vault, Sealed Secrets)
✅ Pod Security Standards
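
One checklist item the manifests above don't show is Pod Anti-Affinity; a minimal sketch for the user-service pod template (a soft preferred rule, so scheduling still succeeds on a small cluster):

```yaml
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: user-service
                topologyKey: kubernetes.io/hostname  # spread replicas across nodes
```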

🎯 Interview Cheat Sheet

Must know:

  • Kubernetes β€” de facto standard for production orchestration (CNCF, all cloud providers)
  • Docker Compose β€” local development, not production
  • Main K8s resources: Deployment, Service, Ingress, HPA, PDB
  • HPA β€” automatic scaling by CPU/memory/custom metrics
  • Helm β€” package manager for K8s, dependency management
  • Rolling Update β€” update without downtime, rollback with one command
  • Service Mesh (Istio) β€” canary deployment, fault injection, rate limiting
  • GitOps (ArgoCD/Flux) β€” automatic deployment from git repository
  • Do NOT use K8s for small teams (1-3 devs), simple applications

Frequent follow-up questions:

  • HPA stabilizationWindowSeconds? Cooldown period after scaling. scaleUp: 60s (fast), scaleDown: 300s (slow, avoids flapping).
  • Why PodDisruptionBudget? Guarantees minimum running pods during maintenance β€” minAvailable: 2.
  • Istio fault injection? Intentionally adds delay/abort for resilience testing β€” chaos engineering.
  • GitOps advantages? Audit trail (git history), rollback = git revert, self-healing (ArgoCD syncs).

Red flags (NOT to say):

  • β€œDocker Compose for production” β€” no, only for development
  • β€œK8s is needed for every project” β€” no, operational complexity is unjustified for small teams
  • β€œHPA at 95% CPU β€” efficient” β€” no, won’t have time to scale
  • β€œIstio = Kubernetes replacement” β€” no, Istio runs on top of K8s (service mesh)

Related topics:

  • [[12. How to implement horizontal scaling of microservices]]
  • [[7. What is Service Discovery and why is it needed]]
  • [[9. What is API Gateway and what problems does it solve]]
  • [[21. How to monitor a distributed microservices system]]
  • [[17. How to ensure fault tolerance of microservices]]