Question 12 · Section 17

How to implement horizontal scaling of microservices

Junior Level

Horizontal scaling means adding more instances of a service to handle load.

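Vertical scaling (adding CPU/RAM to one machine) is capped by the largest server available and usually requires a restart, which means downtime. Horizontal scaling has no such fixed ceiling in principle, and instances can be added without stopping the service.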

One instance:
Client -> Service

Horizontal scaling:
Client -> Load Balancer -> Service #1
                       -> Service #2
                       -> Service #3
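
The diagram above can be sketched in a few lines. This is a minimal illustration of round-robin distribution, one common load-balancing policy; the class name `RoundRobinBalancer` and the instance names are hypothetical, and a real load balancer would also track instance health.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out instances in a fixed rotation (hypothetical helper)."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        # cycle() repeats the list forever: 1, 2, 3, 1, 2, 3, ...
        self._rotation = cycle(list(instances))

    def pick(self):
        """Return the instance that should receive the next request."""
        return next(self._rotation)


balancer = RoundRobinBalancer(["service-1", "service-2", "service-3"])
picks = [balancer.pick() for _ in range(4)]
print(picks)  # ['service-1', 'service-2', 'service-3', 'service-1']
```

Adding a fourth instance to the list raises capacity without touching the client, which is the point of placing the balancer in front.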

Methods:

  1. Kubernetes: automatic, via HPA (Horizontal Pod Autoscaler), which adds Pods as load grows
  2. Docker Compose: docker-compose up --scale service=3
  3. Cloud: auto-scaling groups

Middle Level

When NOT to use horizontal scaling

  • Stateful services (WebSocket connections, in-memory caches)
  • Licensed software with per-instance pricing
  • Services with expensive initialization (minutes to start)

Kubernetes HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: user-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
# 70% leaves headroom for load spikes. At 90%, new instances may not start in time
# to absorb a spike. At 50%, you pay for instances that sit mostly idle.
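
To see how the 70% target turns into a replica count, here is a rough sketch of the scaling rule described in the Kubernetes documentation: desired = ceil(current replicas × current metric / target metric), clamped to minReplicas and maxReplicas. The function name `desired_replicas` is hypothetical, and the 0.1 tolerance is assumed to be the cluster default; the real controller also handles unready Pods and stabilization windows, which this sketch ignores.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=2, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation (simplified sketch)."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        wanted = current_replicas
    else:
        wanted = math.ceil(current_replicas * ratio)
    # Never go below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(4, 90, 70))   # 6  (scale up)
print(desired_replicas(4, 72, 70))   # 4  (within tolerance, no change)
print(desired_replicas(4, 20, 70))   # 2  (scale down, held at minReplicas)
print(desired_replicas(8, 140, 70))  # 10 (capped at maxReplicas)
```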

Statelessness

For horizontal scaling, services must be stateless:
(Stateless: the service keeps no per-client state in its own memory between requests, so any instance can serve any request.)
✅ No local state
✅ Session in Redis
✅ Data in DB
✅ Configuration from outside

Common mistakes

  1. Stateful services:
    Session in memory -> when scaling, requests go to a different instance -> no session
    Solution: external session storage
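
The mistake and its fix can be shown side by side. In this sketch, `ExternalSessionStore` is a hypothetical stand-in for Redis (a plain dict, so the example runs without a server), and the instance classes are illustrative names, not a real framework API.

```python
class ExternalSessionStore:
    """Stand-in for an external store such as Redis (assumption: a dict)."""

    def __init__(self):
        self._sessions = {}

    def save(self, session_id, user):
        self._sessions[session_id] = user

    def load(self, session_id):
        return self._sessions.get(session_id)


class InMemoryInstance:
    """Keeps sessions in its own memory: other instances cannot see them."""

    def __init__(self):
        self._sessions = {}

    def login(self, session_id, user):
        self._sessions[session_id] = user

    def current_user(self, session_id):
        return self._sessions.get(session_id)


class StatelessInstance:
    """Keeps nothing locally: every read and write goes to the shared store."""

    def __init__(self, store):
        self._store = store

    def login(self, session_id, user):
        self._store.save(session_id, user)

    def current_user(self, session_id):
        return self._store.load(session_id)


# Login lands on one instance, the next request on another.
a, b = InMemoryInstance(), InMemoryInstance()
a.login("sid-1", "alice")
lost = b.current_user("sid-1")
print(lost)  # None: the session lives only in instance a

store = ExternalSessionStore()
c, d = StatelessInstance(store), StatelessInstance(store)
c.login("sid-1", "alice")
found = d.current_user("sid-1")
print(found)  # alice: both instances read the same external store
```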
    

Senior Level

Custom metrics

metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: 1000

Production Experience

Blue-Green Deployment:

(Blue-Green Deployment — deployment strategy without downtime: two environments, traffic switch.)
v1 (Blue) -> production traffic
v2 (Green) -> deployed, being tested
Switch traffic to v2 -> roll back if problems
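
The switch-and-roll-back flow above can be sketched as follows. `BlueGreenRouter`, `deploy`, and the `smoke_test` callback are hypothetical names for illustration; in practice the switch is usually a load balancer, DNS, or Service selector change rather than application code.

```python
class BlueGreenRouter:
    """Routes every request to whichever environment is currently active."""

    def __init__(self, blue, green):
        self._envs = {"blue": blue, "green": green}
        self.active = "blue"

    def handle(self, request):
        return self._envs[self.active](request)

    def switch(self):
        """Flip traffic to the other environment and return its name."""
        self.active = "green" if self.active == "blue" else "blue"
        return self.active


def deploy(router, smoke_test):
    """Switch traffic, then roll back if the smoke test fails."""
    previous = router.active
    router.switch()
    if not smoke_test(router):
        router.active = previous  # roll back to the known-good environment
        return False
    return True


def v1(request):
    return f"v1 handled {request}"


def v2(request):
    return f"v2 handled {request}"


router = BlueGreenRouter(blue=v1, green=v2)
before = router.handle("GET /users")
ok = deploy(router, smoke_test=lambda r: r.handle("GET /health").startswith("v2"))
after = router.handle("GET /users")
print(before, "|", after)  # v1 handled GET /users | v2 handled GET /users

# A failing smoke test leaves traffic where it was.
rolled_back = not deploy(router, smoke_test=lambda r: False)
print(rolled_back, router.active)  # True green
```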

Best Practices

✅ Stateless services
✅ Health checks
✅ Graceful shutdown
✅ Resource limits
✅ Monitoring + alerting

❌ Stateful without external storage
❌ Without resource limits
❌ Without graceful shutdown
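
Health checks and graceful shutdown work together when instances are removed during scale-down. The sketch below is one possible shape, with the hypothetical class `GracefulService`: once shutdown begins, the readiness check fails so the load balancer stops sending traffic, new requests are refused, and the process waits for in-flight requests to finish.

```python
import threading


class GracefulService:
    """Tracks in-flight requests so shutdown can wait for them (sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0
        self._draining = False
        self._idle = threading.Event()
        self._idle.set()  # no requests yet, so the service starts idle

    def is_ready(self):
        """Readiness check: False once draining has started."""
        with self._lock:
            return not self._draining

    def start_request(self):
        """Return True if the request is accepted, False while draining."""
        with self._lock:
            if self._draining:
                return False
            self._in_flight += 1
            self._idle.clear()
            return True

    def finish_request(self):
        with self._lock:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.set()

    def shutdown(self, timeout=None):
        """Stop accepting work, then wait for in-flight requests to finish."""
        with self._lock:
            self._draining = True
        return self._idle.wait(timeout)  # False if the timeout expired


service = GracefulService()
accepted = service.start_request()
# Simulate the request completing shortly after shutdown begins.
threading.Timer(0.05, service.finish_request).start()
drained = service.shutdown(timeout=2.0)
rejected = not service.start_request()
print(accepted, drained, rejected, service.is_ready())  # True True True False
```

In a real service, `shutdown` would typically be called from a SIGTERM handler (for example, one registered with `signal.signal`), since Kubernetes sends SIGTERM and waits for the termination grace period before killing the Pod.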

Interview Cheat Sheet

Must know:

  • Horizontal scaling = more instances behind a load balancer
  • Vertical scaling = more CPU/RAM, hits the limit of a single server
  • Services MUST be stateless for horizontal scaling
  • Kubernetes HPA — automatic scaling based on CPU/metrics (70% CPU target)
  • Session in Redis, data in DB, configuration from outside
  • Blue-Green deployment — deployment without downtime
  • NOT suitable for stateful services (WebSocket, in-memory cache)

Common follow-up questions:

  • Why 70% CPU target? It leaves headroom for load spikes: at 90%, a new instance may not start in time to absorb the spike.
  • How to make a service stateless? Session in Redis, data in DB, configuration from outside, no local state.
  • What is graceful shutdown? Finishing in-flight requests before stopping and deregistering from the service registry.
  • Custom metrics for HPA? http_requests_per_second, queue length, business metrics.

Red flags (DO NOT say):

  • “Stateful services are easy to scale” — no, external state management is needed
  • “Vertical scaling is always simpler” — simpler at first, but it hits the limit of a single server
  • “HPA at 90% CPU — efficient” — no, won’t scale fast enough during a spike
  • “Session in memory is fine” — no, requests will go to a different instance

Related topics:

  • [[10. What is sharding]]
  • [[11. What is the difference between sharding and partitioning]]
  • [[26. What tools are used for microservice orchestration]]
  • [[7. What is Service Discovery and why is it needed]]
  • [[13. What is Database per Service pattern]]