Question 11 · Section 4

What is the difference between HashSet, LinkedHashSet and TreeSet?

4. Immutable elements — mandatory! 5. initialCapacity for HashSet — avoid resize 6. Java 21 → SequencedSet API 7. Primitives → fastutil/Trove (less memory)

Language versions: English Russian Ukrainian

🟢 Junior Level

HashSet, LinkedHashSet and TreeSet — three implementations of Set (a collection of unique elements).

Main difference:

  HashSet LinkedHashSet TreeSet
Order None Insertion order Sorting
Speed Fast O(1) hash table Medium O(1) + linked list overhead Slower O(log n) red-black tree
null ✅ Allowed ✅ Allowed ❌ Not allowed

Example:

// HashSet: no order
Set<String> hs = new HashSet<>();
hs.add("C"); hs.add("A"); hs.add("B");
// Order depends on hashCode() — may differ across JVMs!
// → [A, C, B] (order not guaranteed)

// LinkedHashSet: insertion order
Set<String> lhs = new LinkedHashSet<>();
lhs.add("C"); lhs.add("A"); lhs.add("B");
// → [C, A, B] (as inserted)

// TreeSet: sorted
Set<String> ts = new TreeSet<>();
ts.add("C"); ts.add("A"); ts.add("B");
// → [A, B, C] (sorted!)

🟡 Middle Level

HashSet

// Internally: HashMap
// Complexity: O(1) for add/remove/contains
Set<String> set = new HashSet<>();

// ⚠️ Order depends on hashCode()
// Order may change on resize!

// Iteration: O(capacity + size)
// → Empty HashSet(1_000_000) → slow iteration!

LinkedHashSet

// Internally: LinkedHashMap
// Complexity: O(1) for add/remove/contains
// + doubly linked list for order

Set<String> set = new LinkedHashSet<>();
// Preserves INSERTION order

// Iteration: O(size) — faster than HashSet!
// → Does not depend on capacity
// HashSet iterates over ALL buckets including empty ones (O(capacity + size)). LinkedHashSet only iterates over linked list elements (O(size)). The difference is noticeable with low fill factor.

// Memory: +2 references per element (next, before)

TreeSet

// Internally: TreeMap (red-black tree)
// Complexity: O(log n) for all operations
Set<String> set = new TreeSet<>();

// Navigation:
set.add("A"); set.add("C"); set.add("E");
set.floor("D");   // → "C" (nearest smaller)
set.ceiling("D"); // → "E" (nearest larger)
set.subSet("A", "D"); // → [A, C] (range)

When to use which

Scenario What to choose
Uniqueness, speed HashSet
Uniqueness + insertion order LinkedHashSet
Sorting + ranges TreeSet
Access-order cache LinkedHashMap (accessOrder=true), NOT LinkedHashSet

Trap: mutable elements

// ❌ Dangerous!
Set<Person> set = new HashSet<>();
Person p = new Person("Ivan");
set.add(p);
p.setName("Petr");  // hashCode() changed!
set.contains(p);    // → FALSE! Object is "lost"

// ✅ Use immutable objects
record Person(String name) {}  // Immutable!

When NOT to use

  • HashSet: if iteration order is needed — use LinkedHashSet
  • LinkedHashSet: if memory is limited (+50% linked list overhead) — use HashSet
  • TreeSet: if sorting is not needed (50x slower than HashMap) — use HashSet

🔴 Senior Level

Memory Footprint

HashSet (64-bit JVM, Compressed OOPs):
  Node: 32 bytes overhead per element

LinkedHashSet:
  Node: 48 bytes (additional next/before)

TreeSet:
  Entry: 40+ bytes (left, right, parent, color)

→ For 1 million Longs:
  HashSet: ~36 MB
  LinkedHashSet: ~52 MB
  TreeSet: ~44 MB

Java 21: SequencedSet

// LinkedHashSet and TreeSet → SequencedSet
LinkedHashSet<String> set = new LinkedHashSet<>();
set.addFirst("A");    // New method!
set.addLast("Z");     // New method!
set.getFirst();       // New method!
set.reversed();       // Reverse order (view)

// HashSet is NOT SequencedSet (no order)

HashSet Iterator O(capacity + size)

// ❌ Slow: capacity 1 million, size 10
HashSet<String> set = new HashSet<>(1_000_000);
for (int i = 0; i < 10; i++) set.add("x");

for (String s : set) { ... }
// → Iterates over 1 million buckets!

// ✅ LinkedHashSet: O(size) = 10 iterations

LRU Cache with LinkedHashSet

// LinkedHashMap with accessOrder = true
Map<K, V> cache = new LinkedHashMap<>(16, 0.75f, true) {
    protected boolean removeEldestEntry(Map.Entry eldest) {
        return size() > MAX_SIZE;  // Evict the oldest
    }
};

// get() moves element to the end → LRU eviction

compareTo vs equals

// TreeSet uses compareTo, NOT equals!
class Person implements Comparable<Person> {
    String name;
    public int compareTo(Person o) {
        return name.compareTo(o.name);
    }
    public boolean equals(Object o) {
        return false;  // Different logic!
    }
}

// If compareTo = 0 → element will NOT be added
// Even if equals = false!

Production Experience

Real scenario: HashSet iteration killed latency

  • HashSet(10 million capacity, 100 elements)
  • Iteration: 50 ms (traversing 10 million buckets!)
  • Solution: LinkedHashSet
  • Result: 0.01 ms (100 elements)

Best Practices

  1. HashSet — default choice for uniqueness
  2. LinkedHashSet — when order matters
  3. TreeSet — for sorting and ranges
  4. Immutable elements — mandatory!
  5. initialCapacity for HashSet — avoid resize
  6. Java 21 → SequencedSet API
  7. Primitives → fastutil/Trove (less memory)

Summary for Senior

  • HashSet = O(1), no order, iteration O(capacity+size)
  • LinkedHashSet = O(1), insertion order, iteration O(size)
  • TreeSet = O(log n), sorting, navigation, ranges
  • Memory: HashSet < TreeSet < LinkedHashSet
  • SequencedSet (Java 21) → getFirst/reversed
  • Immutable elements — critical!
  • compareTo ≠ equals → data loss in TreeSet

🎯 Interview Cheat Sheet

Must know:

  • HashSet = O(1), no order, internally HashMap, iteration O(capacity + size)
  • LinkedHashSet = O(1), insertion order, internally LinkedHashMap, iteration O(size)
  • TreeSet = O(log n), sorting, internally TreeMap (red-black tree), navigation (floor/ceiling/subSet)
  • LinkedHashSet stores order via doubly linked list (+2 references per element)
  • HashSet iteration traverses ALL buckets including empty ones — LinkedHashSet only traverses elements
  • TreeSet uses compareTo for uniqueness, not equals
  • Mutable elements are dangerous: changing hashCode() after addition = element lost
  • Java 21: SequencedSet API (getFirst, addLast, reversed) for LinkedHashSet and TreeSet

Frequent follow-up questions:

  • Why is HashSet iteration slower with large capacity? — Because it traverses all buckets (O(capacity + size)), including empty ones. LinkedHashSet only traverses the linked list (O(size)).
  • When will TreeSet reject an element? — When compareTo returns 0, even if equals returns false.
  • What happens if you mutate an element in HashSet after adding it? — hashCode() changes, the element ends up in the “wrong” bucket — contains returns false.
  • Which Set to choose for an LRU cache? — LinkedHashMap with accessOrder=true and overridden removeEldestEntry.

Red flags (DO NOT say):

  • ❌ “TreeSet uses equals to check uniqueness” — it uses compareTo!
  • ❌ “HashSet preserves insertion order” — no, order is not guaranteed
  • ❌ “You can safely mutate mutable elements in a Set” — this breaks the hashCode/equals contract
  • ❌ “LinkedHashSet is slower than HashSet for all operations” — only for memory, complexity is the same O(1)

Related topics:

  • [[12. How does HashSet work internally]]
  • [[13. What is TreeSet and how does it work]]
  • [[15. What is the difference between HashMap, LinkedHashMap and TreeMap]]
  • [[14. What is Map and what implementations exist]]