Question 3 · Section 8

What does the filter() operation do?



Junior Level

filter(Predicate) is an intermediate operation that keeps only the elements that satisfy a condition.

Why it is better than an if inside a loop: filter calls can be chained and reordered, and the same pipeline can be parallelized without changing its logic. An if inside a loop hard-wires the code into the imperative style.

Takes a Predicate<T> — a functional interface with the method boolean test(T t).
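Because Predicate<T> is an ordinary functional interface, the condition can be stored in a variable and reused. The sketch below applies one Predicate both in an imperative if loop and in filter. The names PredicateDemo, IS_EVEN, evensWithLoop and evensWithFilter are illustrative, not part of any API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class PredicateDemo {
    // A Predicate is just an object with a boolean test(T t) method
    static final Predicate<Integer> IS_EVEN = n -> n % 2 == 0;

    // Imperative style: the condition is hard-wired into the loop body
    static List<Integer> evensWithLoop(List<Integer> input) {
        List<Integer> result = new ArrayList<>();
        for (Integer n : input) {
            if (IS_EVEN.test(n)) {
                result.add(n);
            }
        }
        return result;
    }

    // Declarative style: the same Predicate is handed to filter
    static List<Integer> evensWithFilter(List<Integer> input) {
        return input.stream()
                .filter(IS_EVEN)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6);
        System.out.println(evensWithLoop(numbers));   // [2, 4, 6]
        System.out.println(evensWithFilter(numbers)); // [2, 4, 6]
    }
}
```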

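import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

// Keep only even numbers
List<Integer> even = numbers.stream()
    .filter(n -> n % 2 == 0)
    .collect(Collectors.toList());
// Result: [2, 4, 6, 8, 10]

// Filtering null elements (List.of rejects nulls, so Arrays.asList is used here)
List<String> list = Arrays.asList("a", null, "b");
list.stream()
    .filter(Objects::nonNull)
    .forEach(System.out::println);
// Prints "a", then "b"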

Important: filter alone does not trigger processing — a terminal operation is needed.
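The laziness can be observed by counting predicate calls. In this minimal sketch the AtomicInteger counter exists only to make the calls visible; mutating outside state from a predicate is acceptable here purely because the demo is single-threaded. FilterLaziness and countCalls are illustrative names.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FilterLaziness {
    // Returns {calls before terminal op, calls after terminal op, result size}
    static int[] countCalls() {
        AtomicInteger calls = new AtomicInteger();
        // Building the pipeline only registers the filter stage; nothing runs yet
        Stream<Integer> pipeline = List.of(1, 2, 3, 4, 5).stream()
                .filter(n -> {
                    calls.incrementAndGet(); // demo-only side effect
                    return n > 2;
                });
        int before = calls.get();
        // The terminal operation pulls every element through the predicate
        List<Integer> result = pipeline.collect(Collectors.toList());
        int after = calls.get();
        return new int[] {before, after, result.size()};
    }

    public static void main(String[] args) {
        int[] r = countCalls();
        System.out.println("before terminal op: " + r[0]); // 0
        System.out.println("after terminal op: " + r[1]);  // 5
        System.out.println("elements kept: " + r[2]);      // 3
    }
}
```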

Middle Level

Internal implementation

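Internally (in OpenJDK), filter wraps the next stage's Sink in an anonymous subclass of Sink.ChainedReference.

Sink.ChainedReference is an internal abstract class that holds a reference to the next Sink in the chain (downstream). The subclass created by filter overrides accept so that an element is passed on only if the Predicate returns true: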

public void accept(T t) {
    if (predicate.test(t)) {
        downstream.accept(t); // pass it further
    }
    // otherwise — the element is "swallowed"
}
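To make the push model concrete, here is a self-contained imitation of the idea. MiniSink, FilterSink, CollectSink and run are hypothetical names: the real Sink and Sink.ChainedReference are package-private in java.util.stream and have a richer protocol (begin, end, cancellationRequested) that this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class MiniSinkDemo {
    // Hypothetical simplified sink: receives elements one at a time
    interface MiniSink<T> {
        void accept(T t);
    }

    // Filter stage: forwards an element downstream only if the predicate holds
    static final class FilterSink<T> implements MiniSink<T> {
        private final Predicate<? super T> predicate;
        private final MiniSink<? super T> downstream;

        FilterSink(Predicate<? super T> predicate, MiniSink<? super T> downstream) {
            this.predicate = predicate;
            this.downstream = downstream;
        }

        @Override
        public void accept(T t) {
            if (predicate.test(t)) {
                downstream.accept(t); // pass it further
            }
            // otherwise the element is silently dropped
        }
    }

    // Terminal stage: collects whatever reaches it
    static final class CollectSink<T> implements MiniSink<T> {
        final List<T> items = new ArrayList<>();

        @Override
        public void accept(T t) {
            items.add(t);
        }
    }

    static List<Integer> run(List<Integer> source, Predicate<Integer> predicate) {
        CollectSink<Integer> terminal = new CollectSink<>();
        MiniSink<Integer> head = new FilterSink<>(predicate, terminal);
        for (Integer element : source) {
            head.accept(element); // the source pushes each element into the chain
        }
        return terminal.items;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1, 2, 3, 4, 5, 6), n -> n % 2 == 0)); // [2, 4, 6]
    }
}
```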

Stateless nature

filter does not depend on other elements: each element is tested independently, and the predicate keeps no state between calls. This makes filter well suited to parallelStream().
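A quick sketch of why statelessness matters: since the predicate looks only at its argument, the parallel pipeline keeps exactly the same elements as the sequential one, and collect preserves encounter order for an ordered source such as a List. ParallelFilter, SOURCE and evens are illustrative names.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class ParallelFilter {
    // An ordered source: the numbers 1..10_000 in a List
    static final List<Integer> SOURCE =
            IntStream.rangeClosed(1, 10_000).boxed().collect(Collectors.toList());

    static List<Integer> evens(boolean parallel) {
        Stream<Integer> stream = parallel ? SOURCE.parallelStream() : SOURCE.stream();
        return stream
                .filter(n -> n % 2 == 0)         // stateless: the result depends only on n
                .collect(Collectors.toList());   // keeps encounter order for an ordered source
    }

    public static void main(String[] args) {
        List<Integer> sequential = evens(false);
        List<Integer> parallel = evens(true);
        System.out.println(sequential.equals(parallel)); // true
        System.out.println(parallel.size());             // 5000
    }
}
```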

Early Filtering

The sooner you cut off unnecessary data, the less work subsequent expensive operations will have:

// GOOD — filter at the beginning
stream.filter(Objects::nonNull)
      .filter(s -> s.startsWith("A"))
      .map(expensiveMapping)

// BAD — expensiveMapping will be called for all elements
stream.map(expensiveMapping)
      .filter(Objects::nonNull)
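The difference can be measured by counting how often the mapping runs. In this sketch expensiveMapping is a hypothetical stand-in (it upper-cases the string, returns null for null input so the map-first variant does not throw, and counts its invocations). Both pipelines produce the same list, but the map-first one maps every element.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;

class EarlyFiltering {
    static final List<String> DATA =
            Arrays.asList("Apple", null, "Banana", "Avocado", null, "Cherry");

    // Hypothetical "expensive" mapping: counts its calls and tolerates null input
    static Function<String, String> expensiveMapping(AtomicInteger calls) {
        return s -> {
            calls.incrementAndGet();
            return s == null ? null : s.toUpperCase();
        };
    }

    static int callsWhenFilteringFirst() {
        AtomicInteger calls = new AtomicInteger();
        List<String> result = DATA.stream()
                .filter(Objects::nonNull)
                .filter(s -> s.startsWith("A"))
                .map(expensiveMapping(calls))   // runs only for the 2 survivors
                .collect(Collectors.toList());
        System.out.println(result); // [APPLE, AVOCADO]
        return calls.get();
    }

    static int callsWhenMappingFirst() {
        AtomicInteger calls = new AtomicInteger();
        List<String> result = DATA.stream()
                .map(expensiveMapping(calls))   // runs for all 6 elements
                .filter(Objects::nonNull)
                .filter(s -> s.startsWith("A"))
                .collect(Collectors.toList());
        System.out.println(result); // [APPLE, AVOCADO]
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("filter first: " + callsWhenFilteringFirst()); // 2
        System.out.println("map first: " + callsWhenMappingFirst());      // 6
    }
}
```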

Chain of filters vs complex predicate

// Option A — more readable
stream.filter(Objects::nonNull)
      .filter(s -> s.startsWith("A"))

// Option B — slightly faster
stream.filter(s -> s != null && s.startsWith("A"))

In most cases readability takes priority (Option A): the extra pipeline stage costs one more method call per element, which is rarely measurable.
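A middle ground, sketched below, is to give each condition a name and compose them with Predicate.and (Java 8) and Predicate.not (Java 11): readability stays high while and() evaluates the second condition only when the first one is true. ComposedPredicates, NOT_NULL and STARTS_WITH_A are illustrative names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class ComposedPredicates {
    // Named building blocks (illustrative names)
    static final Predicate<String> NOT_NULL = Objects::nonNull;
    static final Predicate<String> STARTS_WITH_A = s -> s.startsWith("A");

    static List<String> select(List<String> input) {
        return input.stream()
                // and() short-circuits: STARTS_WITH_A is never called with null
                .filter(NOT_NULL.and(STARTS_WITH_A))
                .collect(Collectors.toList());
    }

    static List<String> rejectBlank(List<String> input) {
        return input.stream()
                .filter(NOT_NULL)
                .filter(Predicate.not(String::isBlank)) // Predicate.not and isBlank: Java 11+
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Apple", null, "Banana", "Avocado", " ");
        System.out.println(select(words));      // [Apple, Avocado]
        System.out.println(rejectBlank(words)); // [Apple, Banana, Avocado]
    }
}
```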

Senior Level

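Predicate complexity and performance

The predicate is called for every element that reaches the filter, so heavy logic inside it (regex matching, a call to an external cache) is multiplied by the size of the stream. Precompute what you can (for example, compile a regex Pattern once instead of calling String.matches for every element) and cache the results of expensive, deterministic checks.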
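Two common mitigations, sketched under stated assumptions: a Pattern compiled once and turned into a predicate with asMatchPredicate (Java 11+), since String.matches compiles the regex on every call; and memoizing a slow, deterministic check in a ConcurrentHashMap (safe even if the stream is later made parallel). The ORD-nnnn format, slowCheck and SLOW_CALLS are hypothetical stand-ins for real work.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ExpensivePredicates {
    // Compiled once; asMatchPredicate requires the whole string to match
    private static final Pattern ORDER_ID = Pattern.compile("ORD-\\d{4}");

    static List<String> validIds(List<String> ids) {
        return ids.stream()
                .filter(ORDER_ID.asMatchPredicate())
                .collect(Collectors.toList());
    }

    // Counts how often the hypothetical slow check really runs
    static final AtomicInteger SLOW_CALLS = new AtomicInteger();

    // Hypothetical slow, deterministic check (stand-in for a remote or CPU-heavy lookup)
    static boolean slowCheck(String key) {
        SLOW_CALLS.incrementAndGet();
        return key.length() % 2 == 0;
    }

    // Memoized wrapper: each distinct key reaches slowCheck only once
    static List<String> withCache(List<String> keys) {
        Map<String, Boolean> cache = new ConcurrentHashMap<>();
        return keys.stream()
                .filter(k -> cache.computeIfAbsent(k, ExpensivePredicates::slowCheck))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(validIds(List.of("ORD-1234", "ord-1234", "ORD-12345"))); // [ORD-1234]
        System.out.println(withCache(List.of("ab", "ab", "abc", "ab")));           // [ab, ab, ab]
        System.out.println(SLOW_CALLS.get());                                       // 2
    }
}
```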

Short-circuiting interaction

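Because a stream processes elements one at a time, filter cooperates with short-circuiting operations (findFirst, anyMatch, limit). As soon as findFirst receives the first element that passes the filter (or limit(n) has received n elements), traversal stops and the remaining elements are never tested. This holds for sequential streams; in a parallel stream, other threads may already have tested more elements.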
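A sketch that counts predicate calls on a sequential stream of ten elements; the counters are demo-only side effects, and ShortCircuit with its method names is illustrative. In both cases the predicate runs four times, not ten.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class ShortCircuit {
    static final List<Integer> NUMBERS = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

    // findFirst stops the traversal once one element has passed the filter
    static int callsWithFindFirst() {
        AtomicInteger calls = new AtomicInteger();
        Optional<Integer> first = NUMBERS.stream()
                .filter(n -> { calls.incrementAndGet(); return n > 3; })
                .findFirst();
        System.out.println(first.get()); // 4
        return calls.get();
    }

    // limit(2) stops the traversal once two elements have passed the filter
    static int callsWithLimit() {
        AtomicInteger calls = new AtomicInteger();
        List<Integer> firstTwoEvens = NUMBERS.stream()
                .filter(n -> { calls.incrementAndGet(); return n % 2 == 0; })
                .limit(2)
                .collect(Collectors.toList());
        System.out.println(firstTwoEvens); // [2, 4]
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("findFirst: " + callsWithFindFirst() + " predicate calls"); // 4
        System.out.println("limit(2): " + callsWithLimit() + " predicate calls");      // 4
    }
}
```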

Null-checks as de facto standard

filter(Objects::nonNull) is the standard pattern for data cleanup before processing, preventing NPE in subsequent map calls.

Edge Cases

  • Side effects in the predicate: never modify external variables inside filter. The Stream API expects predicates to be non-interfering and stateless; mutating shared state breaks the functional style and causes data races in parallel streams.
  • Optional.filter: Optional also has filter(Predicate) and works similarly: if a value is present and matches, the same Optional is returned; otherwise the result is an empty Optional.
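Both edge cases in one small sketch: instead of incrementing an external counter from inside the predicate, let the stream compute the value with count(), which stays correct on a parallel stream; and use Optional.ofNullable(...).filter(...) to turn a non-matching (or missing) value into Optional.empty. FilterEdgeCases, countLongWords and nonBlank are illustrative names.

```java
import java.util.List;
import java.util.Optional;

class FilterEdgeCases {
    // BAD (do not do this): int[] counter = {0};
    //   words.parallelStream().filter(w -> { counter[0]++; return true; }) ... // data race
    // GOOD: the predicate only reads its argument; the stream produces the count
    static long countLongWords(List<String> words) {
        return words.parallelStream()
                .filter(w -> w.length() > 4)
                .count();
    }

    // Optional.filter keeps the value only if it satisfies the predicate
    static Optional<String> nonBlank(String raw) {
        return Optional.ofNullable(raw)
                .filter(s -> !s.isBlank()); // String.isBlank: Java 11+
    }

    public static void main(String[] args) {
        System.out.println(countLongWords(List.of("stream", "map", "filter", "peek", "collect"))); // 3
        System.out.println(nonBlank("abc")); // Optional[abc]
        System.out.println(nonBlank("   ")); // Optional.empty
        System.out.println(nonBlank(null));  // Optional.empty
    }
}
```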

Diagnostics

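Filtering metrics: if a filter discards the vast majority (say 99.9%) of data loaded from a database, move the condition into the SQL query (the WHERE clause) so those rows are never fetched and turned into objects in the first place.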

Logging with peek():

stream.peek(e -> log.debug("Before: " + e))
      .filter(predicate)
      .peek(e -> log.debug("After: " + e))
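A runnable variant of the snippet above, assuming nothing about the logging library: an ArrayList named log stands in for a real logger. On a sequential stream it shows that elements flow through the pipeline one at a time, so "Before" and "After" entries interleave. The peek Javadoc notes the method exists mainly to support debugging. PeekLogging and trace are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class PeekLogging {
    static List<String> trace() {
        List<String> log = new ArrayList<>(); // stand-in for a real logger (sequential use only)
        List<Integer> result = List.of(1, 2, 3, 4).stream()
                .peek(e -> log.add("Before: " + e))
                .filter(n -> n % 2 == 0)
                .peek(e -> log.add("After: " + e))
                .collect(Collectors.toList());
        log.add("Result: " + result);
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
        // Before: 1, Before: 2, After: 2, Before: 3, Before: 4, After: 4, Result: [2, 4]
    }
}
```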

Interview Cheat Sheet

Must know:

  • filter(Predicate) — intermediate operation, keeps elements that satisfy the condition
  • Stateless — each element is processed independently, ideal for parallelStream
  • Early filtering: the earlier the filter, the less work for subsequent expensive operations
  • Works with short-circuit operations (findFirst, anyMatch, limit): the pipeline stops as soon as the short-circuit operation has what it needs
  • filter(Objects::nonNull) — standard pattern for NPE protection
  • Filter alone does not trigger processing — a terminal operation is needed

Common follow-up questions:

  • filter vs if inside a loop? — filter is declarative, easy to combine and parallelize
  • Chain of filters or one complex predicate? — Chain is more readable, single predicate is slightly faster
  • What if the predicate is expensive? — It is called for every element, so precompute (e.g., compile regexes once) and cache results
  • Can you modify external variables in a predicate? — No, this violates the functional paradigm

Red flags (DO NOT say):

  • “filter processes all elements immediately” — no, it is lazy until a terminal operation
  • “Side effects in a predicate are fine” — no, they lead to bugs in parallel streams
  • “filter sorts elements” — no, it only filters, order is preserved
  • “If a filter discards 99% — that is ok in Java code” — better to move it to the SQL level

Related topics:

  • [[2. What is the difference between intermediate and terminal operations]]
  • [[4. What does map() operation do]]
  • [[9. What are parallel streams]]
  • [[5. What does collect() operation do]]