Question 18 · Section 7

How to properly log exceptions?

Use a logger instead of printStackTrace():

Language versions: English Russian Ukrainian

Junior Level

Basic rule

Use a logger instead of printStackTrace():

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OrderService {
    // Logger - static final, one per class
    private static final Logger log = LoggerFactory.getLogger(OrderService.class);

    public void processOrder(Long orderId) {
        try {
            // business logic
        } catch (Exception e) {
            // Exception - last argument, has context
            log.error("Failed to process order id={}", orderId, e);
        }
    }
}

Log levels

  • log.error() - critical errors requiring attention
  • log.warn() - warnings, system is working but there’s risk
  • log.info() - important events (start, completion)
  • log.debug() - detailed debugging

With parameters

log.error("User id={} not found", userId, exception);
// exception will NOT go into {}, it goes to stack trace

How SLF4J works: SLF4J replaces {} with argument values left to right. But there’s a special case: if the last argument is a Throwable, SLF4J does not substitute it into {}, but extracts the full stack trace and adds it to the message.

// {} is replaced with userId, exception - last argument -> stack will be in log
log.error("User id={} not found", userId, exception);

// {} is replaced with e.getMessage(), exception NOT last -> stack will NOT get to log!
log.error("Error: {}", exception, userId);

// No {}, but exception last -> stack still appears
log.error("Critical failure", exception);

Common mistake: log.error("Error: {}", e) - here e is substituted into {} as a string (e.toString()), but stack trace is NOT output. Always pass the exception as the last argument, separate from {}.

What not to do

// BAD - no context, impossible to understand where the error is
log.error(e.getMessage());

// BAD - e as parameter {}, not as exception (stack won't get to log)
log.error("Error: {}", e);

// BAD - logging and printStackTrace duplicate output
log.error("Error", e);
e.printStackTrace();

// GOOD - exception as last argument, has context
log.error("Failed to process user id={}", userId, e);

When NOT to log exceptions

  1. Expected business scenarios - UserNotFoundException on login: this is not a system error, but a normal scenario. Use log.warn() or log.info() instead of log.error(). If every failed login writes ERROR - your alerts will be devalued (alert fatigue)
  2. Retry mechanisms - don’t log every retry attempt at ERROR level, use log.debug(). Otherwise 5 retry attempts = 5 ERROR logs, although the system is still fighting
  3. Batch processing - with 1000 elements and 500 errors, use log sampling (no more than 10 identical stacks per minute)
  4. Health check endpoint - health check errors log at DEBUG, don’t clutter ERROR logs
  5. Circuit breaker open - when CB is open, log once on opening, not every rejected request
  6. Duplicate logging - if exception will be logged at the top level (GlobalExceptionHandler), don’t log it again on each layer. One exception - one ERROR log

Caveat: PII (Personally Identifiable Information) in logs

Never log sensitive data with exceptions. This includes:

  • Passwords, tokens, CVV codes
  • Email, phone, passport data (GDPR!)
  • IP addresses (in some jurisdictions)
// BAD - password in log
log.error("Auth failed for user={} with password={}", username, password);

// GOOD - only context without sensitive data
log.error("Auth failed for user={}", username);

Even if the logger supports masking, it’s better not to pass such data to the logger at all. Configure the log framework for redaction (scrubbing) at the appender level, not in each call separately.


Middle Level

SLF4J magic of the last argument

SLF4J looks for the last argument of the method. If it’s a Throwable, it’s handled specially - stack trace is extracted, even if there are no {} in the string:

log.error("Critical failure", exception); // Stack will be in log

MDC (Mapped Diagnostic Context)

In distributed systems, stack trace is useless without request context:

MDC.put("traceId", traceId);
MDC.put("userId", userId);
try {
    orderService.process(orderId);
    // Log will have traceId and userId
} catch (Exception e) {
    log.error("Failed to process order id={}", orderId, e);
} finally {
    MDC.clear(); // Required - otherwise traceId leaks to next request
}

JSON Layout

In ELK/Splunk, logs should be JSON:

{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "ERROR",
  "message": "Failed to process order",
  "exception_class": "java.sql.SQLException",
  "stack_trace": "...",
  "traceId": "abc-123"
}

Advantages:

  • Indexing by exception_class
  • Search by stack_trace
  • No problems with multi-line logs

Cause Analysis

Make sure the logger outputs all levels of cause. The root of evil may be at the 5th nesting level.


Senior Level

Log Sampling

Under enormous load and mass errors, you can “drown” in logs (Log Flooding). Advanced loggers configure limits:

<!-- Logback - no more than 10 identical stacks per minute -->
<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <filter class="ch.qos.logback.classic.filter.DuplicateMessageFilter">
        <AllowedRepetitions>10</AllowedRepetitions>
    </filter>
</appender>

Async Appenders

Always use async logging - separates disk/network writing from business thread:

<appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
    <appender-ref ref="FILE" />
    <includeCallerData>false</includeCallerData> <!-- for speed -->
</appender>

Risk: loss of last logs on JVM crash.

Log Scrubbing (PII)

Never log sensitive data (passwords, CVV, tokens). Modern loggers support masking:

// Logger automatically masks
log.error("Auth failed for user with password={}", password);
// In log: password=****

ShortenedThrowableConverter

Don’t print 200 lines of Hibernate stack. Usually first 20 frames are enough:

<conversionRule conversionWord="ex"
    converterClass="com.example.ShortenedThrowableConverter" />

Generic Catch-all

Logging catch (Exception e) at the very top level is mandatory, but should be accompanied by alerting (Prometheus/Grafana):

meterRegistry.counter("errors.unhandled").increment();
alertService.send("Unhandled exception: " + e.getMessage());

Diagnostics

  • Logback Configuration - configure <shortenedThrowableConverter>
  • Cause Analysis - logger should output all cause levels
  • Structured Logging - JSON format for ELK/Splunk
  • Metric-driven - each exception through Micrometer to Prometheus

Interview Cheat Sheet

Must know:

  • Exception - always the last argument in log.error("msg {}", param, e); SLF4J extracts stack trace from the last argument
  • DON’T use log.error("Error: {}", e) - here e is substituted into {} as a string, stack is NOT output
  • Use MDC (traceId, userId) for context in distributed systems; always MDC.clear() in finally
  • For highload: async appender, log sampling (no more than 10 identical stacks/min), shortenedThrowableConverter
  • Log expected business scenarios at warn/info, not error - otherwise alerts lose value (alert fatigue)
  • Never log PII: passwords, tokens, CVV, email (GDPR!), IP addresses

Frequent follow-up questions:

  • How does SLF4J handle exception? - If last argument is Throwable, SLF4J doesn’t substitute it into {}, but extracts full stack trace
  • Why not duplicate logging? - If GlobalExceptionHandler already logs, don’t log on each layer - one exception = one ERROR log
  • What is Log Sampling? - Limiting identical messages (e.g., 10/min) to not “drown” in logs on mass errors
  • When NOT to log exception? - Expected business scenarios (UserNotFoundException on login), retry attempts, health check, circuit breaker open

Red flags (NOT to say):

  • “I log log.error(e.getMessage()) without context” - Impossible to understand where error is and what caused it
  • “I log password with exception” - Security violation (GDPR, PCI DSS)
  • “I log and printStackTrace simultaneously” - Duplication, clutters logs
  • “Every retry at ERROR level” - 5 attempts = 5 ERROR logs, alerts become useless

Related topics:

  • [[16. What does the printStackTrace() method do]]
  • [[15. What is a stack trace]]
  • [[13. Can you create custom exceptions]]
  • [[19. Why you should not swallow exceptions (catch empty)]]
  • [[18. What is exception wrapping (wrapping)]]