Question 22 · Section 17

What is distributed tracing

🟢 Junior Level

Distributed tracing tracks a single request across every microservice it passes through.

Request: GET /orders/123

Trace ID: abc-123
  Span 1: API Gateway (10ms)
    Span 2: Order Service (50ms)
      Span 3: User Service (20ms)
      Span 4: Payment Service (30ms)

Why: Understand where a problem (latency, error) occurs in a chain of services.

The trace ID is propagated in HTTP headers (W3C Trace Context: traceparent) or in message headers (for example, Kafka record headers). Each service keeps the incoming trace ID and creates its own span ID.
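To make the header concrete, here is a minimal sketch of reading and producing a traceparent value. The W3C Trace Context format for version 00 is 00-<trace-id>-<parent-id>-<trace-flags>, where the trace ID is 32 lowercase hex characters and the parent (span) ID is 16. The TraceParent class and its method names are hypothetical and written for illustration only; in a real service the tracing library does this for you.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative model of a W3C traceparent header value (version 00). Hypothetical helper. */
final class TraceParent {
    final String traceId;   // 32 lowercase hex chars, shared by every span in the trace
    final String spanId;    // 16 lowercase hex chars, unique per span
    final boolean sampled;  // lowest bit of the trace-flags field

    TraceParent(String traceId, String spanId, boolean sampled) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.sampled = sampled;
    }

    /** Parses "00-<trace-id>-<parent-id>-<flags>"; throws on malformed input. */
    static TraceParent parse(String header) {
        String[] parts = header.trim().split("-");
        if (parts.length != 4 || !parts[0].equals("00")
                || !isValidId(parts[1], 32)
                || !isValidId(parts[2], 16)
                || !parts[3].matches("[0-9a-f]{2}")) {
            throw new IllegalArgumentException("Malformed traceparent: " + header);
        }
        boolean sampled = (Integer.parseInt(parts[3], 16) & 0x01) == 1;
        return new TraceParent(parts[1], parts[2], sampled);
    }

    /** Context for an outgoing call: same trace ID, fresh span ID. */
    TraceParent newChild() {
        return new TraceParent(traceId, randomHex(16), sampled);
    }

    /** Serializes back to the header value sent to the next service. */
    String toHeader() {
        return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
    }

    // IDs must be lowercase hex of the exact length and must not be all zeros.
    private static boolean isValidId(String value, int length) {
        return value.matches("[0-9a-f]{" + length + "}") && !value.matches("0+");
    }

    private static String randomHex(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(Character.forDigit(ThreadLocalRandom.current().nextInt(16), 16));
        }
        String hex = sb.toString();
        return hex.matches("0+") ? randomHex(length) : hex; // retry the invalid all-zero ID
    }

    public static void main(String[] args) {
        TraceParent incoming =
            parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
        // The outgoing header keeps the trace ID and carries a new span ID.
        System.out.println(incoming.newChild().toHeader());
    }
}
```

The readable IDs used elsewhere in these notes (abc-123, span-1) are shorthand; IDs on the wire are hex strings like the one in main.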


🟡 Middle Level

Trace, Span, TraceId

Trace: the entire request path (traceId: abc-123)
Span: a single operation in the chain (spanId: span-1, span-2...)
TraceId: identifier shared by every span in the trace
ParentSpanId: ID of the span that triggered this one (empty for the root span)
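The parent links are what let a backend rebuild the call tree from a flat list of spans. A minimal sketch of that reconstruction, assuming a hypothetical SpanRecord type and reusing the service names and durations from the Junior-level example (real backends such as Jaeger or Zipkin store far more per span):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One unit of work in a trace; parentSpanId is null for the root span. Hypothetical type. */
record SpanRecord(String traceId, String spanId, String parentSpanId,
                  String name, long durationMs) {}

final class TraceTree {
    /** Renders spans as an indented tree by following parentSpanId links. */
    static String render(List<SpanRecord> spans) {
        Map<String, List<SpanRecord>> childrenByParent = new LinkedHashMap<>();
        SpanRecord root = null;
        for (SpanRecord span : spans) {
            if (span.parentSpanId() == null) {
                root = span;
            } else {
                childrenByParent
                    .computeIfAbsent(span.parentSpanId(), k -> new ArrayList<>())
                    .add(span);
            }
        }
        StringBuilder out = new StringBuilder();
        if (root != null) {
            append(root, 0, childrenByParent, out);
        }
        return out.toString();
    }

    private static void append(SpanRecord span, int depth,
                               Map<String, List<SpanRecord>> childrenByParent,
                               StringBuilder out) {
        out.append("  ".repeat(depth))
           .append(span.name()).append(" (").append(span.durationMs()).append("ms)\n");
        for (SpanRecord child : childrenByParent.getOrDefault(span.spanId(), List.of())) {
            append(child, depth + 1, childrenByParent, out);
        }
    }

    /** Spans arrive in arbitrary order, as they would from independent services. */
    static List<SpanRecord> sampleSpans() {
        return List.of(
            new SpanRecord("abc-123", "span-2", "span-1", "Order Service", 50),
            new SpanRecord("abc-123", "span-3", "span-2", "User Service", 20),
            new SpanRecord("abc-123", "span-1", null, "API Gateway", 10),
            new SpanRecord("abc-123", "span-4", "span-2", "Payment Service", 30));
    }

    public static void main(String[] args) {
        System.out.print(render(sampleSpans()));
        // API Gateway (10ms)
        //   Order Service (50ms)
        //     User Service (20ms)
        //     Payment Service (30ms)
    }
}
```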

Jaeger / Zipkin

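// Micrometer Tracing + Zipkin (Spring Boot 3.x)
// ⚠️ Spring Cloud Sleuth does not support Spring Boot 3.0+; Micrometer Tracing replaces it.
// Pair Micrometer Tracing with a backend (Zipkin, Jaeger).
// It automatically adds the traceId to logs and outgoing headers.

import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
public class OrderController {
    private final RestTemplate restTemplate;

    // Build the RestTemplate from the auto-configured builder: an instance
    // created with `new RestTemplate()` is not instrumented for tracing.
    public OrderController(RestTemplateBuilder builder) {
        this.restTemplate = builder.build();
    }

    @GetMapping("/orders/{id}")
    public Order getOrder(@PathVariable Long id) {
        // The trace context is added automatically to the outgoing HTTP header `traceparent` (W3C Trace Context).
        // Order is the application's own response type, defined elsewhere.
        return restTemplate.getForObject(
            "http://order-service/orders/{id}", Order.class, id);
    }
}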

Common mistakes

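  1. Losing traceId:
    Async call (@Async, thread pool, CompletableFuture) → the trace context lives in a thread-local, so work handed to another thread does not see the traceId
    Solution: propagate the trace context explicitly, for example by wrapping the task or executor so the context is captured on submit and restored on the worker thread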
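The sketch below reproduces the problem and the fix with plain JDK classes. TraceContext is a hypothetical stand-in for a tracer's thread-local context, not a library API; tracing libraries ship their own equivalents of wrap (OpenTelemetry, for instance, offers Context.taskWrapping for executors), and those should be preferred over hand-rolled code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical holder for the current trace ID; real tracers keep richer context. */
final class TraceContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void set(String traceId) { CURRENT.set(traceId); }
    static String get() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }

    /** Captures the caller's trace ID now and restores it inside the worker thread. */
    static <T> Callable<T> wrap(Callable<T> task) {
        String captured = CURRENT.get();           // runs on the submitting thread
        return () -> {
            String previous = CURRENT.get();       // runs on the worker thread
            CURRENT.set(captured);
            try {
                return task.call();
            } finally {
                // Restore the worker's previous state so pooled threads do not leak context.
                if (previous == null) CURRENT.remove(); else CURRENT.set(previous);
            }
        };
    }
}

final class AsyncPropagationDemo {
    /** Returns the trace IDs seen by an unwrapped task and by a wrapped task. */
    static String[] run() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        TraceContext.set("abc-123");
        try {
            Callable<String> readTraceId = TraceContext::get;
            Future<String> lost = pool.submit(readTraceId);
            Future<String> kept = pool.submit(TraceContext.wrap(readTraceId));
            return new String[] { lost.get(), kept.get() };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            TraceContext.clear();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println("unwrapped task saw: " + seen[0]); // null
        System.out.println("wrapped task saw:   " + seen[1]); // abc-123
    }
}
```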
    

When distributed tracing is NOT needed

  • 2-3 services: logs with a correlation ID are sufficient
  • Low traffic: recording 100% of traces is affordable, so no sampling setup is needed

🔴 Senior Level

OpenTelemetry

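// OpenTelemetry: the vendor-neutral standard for tracing
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

// orderId and processOrder(...) are provided by the surrounding application code.
Tracer tracer = GlobalOpenTelemetry.getTracer("my-service");

Span span = tracer.spanBuilder("processOrder")
    .setSpanKind(SpanKind.SERVER)
    .startSpan();

// makeCurrent() binds the span to this thread so spans started inside become its children.
try (Scope scope = span.makeCurrent()) {
    span.setAttribute("order.id", orderId);
    processOrder(orderId);
    span.setStatus(StatusCode.OK);
} catch (Exception e) {
    // Mark the span as failed and attach the exception as a span event.
    span.setStatus(StatusCode.ERROR);
    span.recordException(e);
    throw e;
} finally {
    span.end(); // always end the span, otherwise it is never exported
}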

Production Experience

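Log correlation:

Every log line carries the traceId and spanId, so lines from different services can be joined into one request timeline: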
[traceId=abc-123] [spanId=span-1] OrderService - Processing order 123
[traceId=abc-123] [spanId=span-2] UserService - Fetching user 456
[traceId=abc-123] [spanId=span-3] PaymentService - Charging payment
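With the traceId in every line, pulling together one request's logs is a simple filter. A minimal sketch, assuming the bracketed [traceId=...] prefix shown above; the TraceLogFilter class and the extra def-456 line are made up for the example, and in production this filtering is normally a query in the log aggregation tool rather than application code.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper that selects the log lines belonging to one trace. */
final class TraceLogFilter {
    // Matches the "[traceId=...]" prefix used in the sample log lines.
    private static final Pattern TRACE_ID = Pattern.compile("\\[traceId=([^\\]]+)\\]");

    // Three lines from one request plus one unrelated line (invented for contrast).
    static final List<String> SAMPLE = List.of(
        "[traceId=abc-123] [spanId=span-1] OrderService - Processing order 123",
        "[traceId=def-456] [spanId=span-9] OrderService - Processing order 999",
        "[traceId=abc-123] [spanId=span-2] UserService - Fetching user 456",
        "[traceId=abc-123] [spanId=span-3] PaymentService - Charging payment");

    /** Returns the lines whose traceId equals the requested one, in original order. */
    static List<String> linesForTrace(List<String> lines, String traceId) {
        return lines.stream()
            .filter(line -> {
                Matcher m = TRACE_ID.matcher(line);
                return m.find() && m.group(1).equals(traceId);
            })
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints the three abc-123 lines across OrderService, UserService and PaymentService.
        linesForTrace(SAMPLE, "abc-123").forEach(System.out::println);
    }
}
```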

Best Practices

✅ Include the traceId in all logs
✅ Use sampling for high-traffic services
✅ Create a span for every external call
✅ Add tags (span attributes) for business data

❌ Running production without tracing
❌ 100% sampling on high-traffic services
❌ Logs without a traceId
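One common way to sample is to derive the keep-or-drop decision from the trace ID itself, so every service reaches the same verdict for the same trace without coordination. The sketch below is a simplified illustration of that idea; RatioSampler is a hypothetical class, and it assumes hex trace IDs of at least 16 characters (as in W3C Trace Context). Tracing SDKs provide their own ratio-based samplers, which should be used in practice.

```java
/** Simplified ratio sampler keyed on the trace ID (hypothetical helper, not a library API). */
final class RatioSampler {
    private final double ratio;
    private final long threshold;

    RatioSampler(double ratio) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be between 0.0 and 1.0");
        }
        this.ratio = ratio;
        // Map the ratio onto the range of non-negative long values.
        this.threshold = (long) (ratio * Long.MAX_VALUE);
    }

    /** The same trace ID always yields the same decision. */
    boolean shouldSample(String traceIdHex) {
        if (ratio >= 1.0) {
            return true; // keep everything, including the largest possible value
        }
        // Use the last 16 hex characters (the lower 64 bits) as a pseudo-random number.
        String low = traceIdHex.substring(traceIdHex.length() - 16);
        long value = Long.parseUnsignedLong(low, 16) & Long.MAX_VALUE; // clear the sign bit
        return value < threshold;
    }

    public static void main(String[] args) {
        String traceId = "4bf92f3577b34da6a3ce929d0e0e4736";
        System.out.println(new RatioSampler(0.10).shouldSample(traceId)); // false
        System.out.println(new RatioSampler(0.50).shouldSample(traceId)); // true
    }
}
```

For this trace ID the lower 64 bits land at roughly 28% of the value range, so a 10% sampler drops the trace and a 50% sampler keeps it.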

🎯 Interview Cheat Sheet

Must know:

  • Distributed tracing — tracking a single request through all microservices
  • Trace = entire request path (traceId), Span = single call in the chain (spanId)
  • TraceId propagates via HTTP headers (W3C Trace Context: traceparent) or message headers
  • Jaeger / Zipkin — backend for storing and visualizing traces
  • OpenTelemetry — the tracing standard, replaced OpenTracing/OpenCensus
  • Sampling for high-traffic (not 100%!) — saves storage
  • Log correlation: traceId in every log — links logs to traces
  • NOT needed for 2-3 services — logs with correlation ID are sufficient

Frequent follow-up questions:

  • Trace vs Span? Trace = entire request path through all services, Span = a single operation within the trace (one service can produce several spans).
  • Why sampling? Recording 100% of traces at high traffic can produce terabytes of data; a 1-10% sample is usually enough for analysis.
  • Spring Cloud Sleuth status? Not supported on Spring Boot 3.0+; Micrometer Tracing replaces it.
  • What is trace context propagation? Passing traceId/spanId via HTTP headers or message headers between services.

Red flags (NOT to say):

  • “100% sampling for production” — no, terabytes of data at high traffic
  • “Tracing = logging” — no, tracing = request structure, logs = details
  • “TraceId is not needed in logs” — without it, you cannot link logs to traces
  • “Distributed tracing is always needed” — no, for 2-3 services, correlation ID is enough

Related topics:

  • [[21. How to monitor a distributed microservices system]]
  • [[15. How to organize communication between microservices]]
  • [[1. What is Saga pattern and when to use it]]
  • [[3. How to implement distributed transactions in microservices]]
  • [[7. What is Service Discovery and why is it needed]]