Question 30 · Section 4

What operations does the Collection interface support?

If the project is on Java 8 — use coll.toArray(new String[0]). The JVM optimizes this just as efficiently.

Language versions: English Russian Ukrainian

🟢 Junior Level

Collection — the base interface for List, Set, Queue.

⚠️ DO NOT confuse: Collection (no ‘s’) — an interface (List, Set, Queue). Collections (with ‘s’) — a utility class with static methods (sort, synchronizedList, unmodifiableList). These are different things!

Main methods:

Collection<String> coll = new ArrayList<>();

// Adding
coll.add("A");
coll.addAll(List.of("B", "C"));

// Checking
coll.contains("A");       // true
coll.containsAll(...);    // true
coll.isEmpty();           // false
coll.size();              // 3

// Removing
coll.remove("A");
coll.removeAll(...);
coll.clear();

// Conversion
Object[] arr = coll.toArray();
String[] strArr = coll.toArray(String[]::new);  // Java 11+

🟡 Middle Level

Method groups

Modification:

add(e)           // → true if changed
addAll(c)        // → true if changed
remove(e)        // → true if removed
removeAll(c)     // Remove all from c
retainAll(c)     // Keep only from c (intersection)
clear()          // Clear

Java 8+ functional:

// removeIf — removal by condition
list.removeIf(e -> e.startsWith("A"));  // O(n) for ArrayList!

// stream / parallelStream
coll.stream()
    .filter(e -> e.length() > 3)
    .map(String::toUpperCase)
    .collect(Collectors.toList());

contains complexity

// ArrayList: O(n)
list.contains("A");  // Iteration!

// HashSet: O(1)
set.contains("A");   // Hash!

// containsAll for two lists: O(n × m)!
list1.containsAll(list2);  // For each element of list2, iterate list1!

toArray

// Java 11+: most modern approach
String[] arr = list.toArray(String[]::new);

// Old approach (also works)
String[] arr = list.toArray(new String[0]);
// → JVM optimizes, no need to create an array of the right size!

If the project is on Java 8 — use coll.toArray(new String[0]). The JVM optimizes this just as efficiently.


🔴 Senior Level

When NOT to use parallelStream

  1. Collection < 10K elements — fork overhead > benefit
  2. Operations are fast (arithmetic, getters)
  3. Order matters — parallelStream does not guarantee order
  4. Side effects exist — race conditions

removeIf optimization

// ArrayList.removeIf:
// 1. Finds the first element to remove
// 2. Before it — does NOT touch
// 3. After — ONE System.arraycopy
// → O(n), minimal writes!

// VS iterator.remove() in a loop:
// → System.arraycopy EVERY time
// → O(n²)!

Bulk operations complexity

// list1.removeAll(list2)
// If list2 = ArrayList: O(n × m)
// If list2 = HashSet: O(n + m)

// → Always convert to Set for bulk operations!
Set<String> set = new HashSet<>(list2);
list1.removeAll(set);  // O(n + m)

For medium and large collections — conversion to Set is beneficial. For small ones (< 10 elements), the overhead of creating a HashSet may outweigh the benefit.

Spliterator characteristics

Spliterator<String> spliterator = coll.spliterator();

// Optimization flags:
// IMMUTABLE  → collection does not change
// CONCURRENT → can be modified in parallel
// DISTINCT   → all elements are unique
// SORTED     → sorted
// SIZED      → exact size is known

// Stream API uses these flags!
set.stream()
   .distinct()  // → Does nothing! Already DISTINCT
   .count();    // → O(1) only when the Stream knows the exact size and there are no operations that change the count. distinct() still requires a traversal.

GC Pressure

// toArray() creates a COPY of all data!
Object[] copy = collection.toArray();  // O(n) memory

// For large collections → GC pressure
// → Use stream instead of copying
collection.stream().forEach(this::process);

toArray creates a full copy. Avoid if: (1) Collection > 100K elements (GC pressure). (2) You are only reading (use stream/forEach). Acceptable if: (1) Need an array for a legacy API. (2) Collection is small.

Production Experience

Real scenario: containsAll O(n²)

// 10,000 elements in both lists
list1.containsAll(list2);  // 10,000 × 10,000 = 100 million operations!

// Solution:
Set<String> set = new HashSet<>(list2);
list1.containsAll(set);  // 10,000 × O(1) = 10,000 operations
// → 10,000x speedup!

Best Practices

  1. removeIf → O(n) for ArrayList
  2. toArray(T[]::new) → Java 11+ style
  3. containsAll with Set → O(n+m) instead of O(n×m)
  4. Spliterator flags → Stream optimizations
  5. Avoid toArray for large collections (GC)
  6. retainAll = set intersection
  7. addAll = union

Summary for Senior

  • removeIf = O(n), optimized internally
  • Bulk ops → convert to Set
  • toArray → copy, GC pressure
  • Spliterator → flags for Stream optimizations
  • contains → O(n) for List, O(1) for Set
  • Java 11+ → toArray(T[]::new)

🎯 Interview Cheat Sheet

Must know:

  • Collection (interface) != Collections (utility class) — do not confuse in an interview
  • Basic groups: modification (add, remove, retainAll), checking (contains, isEmpty), conversion (stream, toArray)
  • removeIf() — O(n) for ArrayList (single System.arraycopy), better than iterator.remove() O(n^2)
  • containsAll/removeAll for two ArrayLists = O(n × m); converting one to Set = O(n + m)
  • toArray(T[]::new) (Java 11+) — modern style; JVM optimizes new T[0] just as efficiently
  • toArray() creates a full copy — GC pressure for large collections (>100K elements)
  • Spliterator flags (DISTINCT, SORTED, SIZED) — internal Stream API optimizations

Frequent follow-up questions:

  • Why does containsAll of two 10K-element lists = 100 million operations? — For each element of list2, list1 is iterated: 10K × 10K. Solution: new HashSet(list2).
  • When is parallelStream NOT worth using? — Collection < 10K elements, fast operations, order matters, side effects exist.
  • Why convert to Set before removeAll? — HashSet.contains = O(1); instead of O(n × m) we get O(n + m).
  • What do Spliterator flags do? — Hints for the Stream API to optimize: DISTINCT — distinct() is a noop, SIZED — count() = O(1).

Red flags (DO NOT say):

  • “Collection and Collections are the same thing” — no, Collection = interface, Collections = utility class
  • contains for ArrayList = O(1)” — no, O(n), iterates all elements
  • toArray() does not create a copy” — no, it’s a full copy of data into a new array
  • parallelStream is always faster than regular stream” — no, fork overhead for small collections outweighs the benefit

Related topics:

  • [[28. How to properly remove elements during iteration]]
  • [[23. What is Collections.unmodifiableList()]]
  • [[26. What are fail-fast and fail-safe iterators]]