What are persistent data structures?
Think of Git: when you make a commit, a new version of the code is created, but all old commits remain accessible.
Junior Level
Persistent data structures are immutable structures that preserve the previous version and return a new one upon “modification.”
Simple analogy
Think of Git: when you make a commit, a new version of the code is created, but all old commits remain accessible.
Example
io.vavr.collection.List<String> v1 = io.vavr.collection.List.of("A", "B");
io.vavr.collection.List<String> v2 = v1.append("C");
// v1 = List("A", "B"), v2 = List("A", "B", "C") — v1 is unchanged!
Both versions exist simultaneously and don’t interfere with each other.
Middle Level
How it works: Structural Sharing
Instead of full copying, persistent structures use structural sharing:
When adding an element to a persistent tree:
- A new node is created for the added element
- Only the nodes on the path from root to the new element are copied
- All other branches remain shared between the old and new version
Result: a new version in $O(\log n)$ instead of $O(n)$ with full copying.
In the standard JDK
The JDK has no persistent collections with structural sharing.
List.of() and similar create immutable, but not persistent collections:
List.of(),Collections.unmodifiableList()— either copy everything or prohibit changes
Libraries for Java
- Vavr (formerly Javaslang) —
List,HashMap,Setbased on HAMT - PCollections — lightweight persistent collections library
When to use
- Undo/Redo — change history is stored “for free”
- Event Sourcing / CQRS — state snapshots
- Multi-threading — safe snapshots without locks
Senior Level
HAMT (Hash Array Mapped Trie)
Most persistent Maps use HAMT — a tree where keys are distributed by hash code bits:
- Lookup: $O(\log_{32} n)$ — effectively $O(1)$
- Insertion: $O(\log_{32} n)$ — only the path is copied
- Structural sharing — shared branches between versions
Comparison with regular collections
| Operation | ArrayList | Persistent Vector (Vavr) |
|---|---|---|
| get(i) | O(1) | O(log₃₂ n) |
| add | O(1)* | O(log₃₂ n) |
| update | O(1) | O(log₃₂ n) |
| Copy | O(n) | O(log₃₂ n) |
*amortized
Application in distributed systems
- Event Sourcing — each event is immutable, state is recomputed
- Snapshot isolation — saving “DB state at 12:00”
- Reactive programming — streams of immutable states
Summary for Senior
- Persistence = Immutability + Structural Sharing
- Solves the $O(n)$ copying problem with immutable changes
- HAMT — key data structure: $O(\log_{32} n)$ for all operations
- Critical for Event Sourcing, CQRS, functional programming
Interview Cheat Sheet
Must know:
- Persistent structures = immutability + structural sharing — on modification only the path is copied
- Analogy — Git: new commit = new version, old ones are accessible
- Structural Sharing: on element addition, nodes on the path are copied, other branches are shared — $O(\log n)$
- HAMT (Hash Array Mapped Trie) — tree with hash-bit distribution: $O(\log_{32} n)$
- JDK has no persistent collections;
List.of()— immutable, but not persistent - Libraries: Vavr, PCollections; applications: Undo/Redo, Event Sourcing, Snapshot isolation
Frequent follow-up questions:
- Does persistent mean persisted to disk? — No, “persistent” = preserves previous versions in memory
- Is List.of() a persistent collection? — No, immutable but not persistent; no structural sharing
- How is it better than regular copying? — $O(\log n)$ instead of $O(n)$ — only the path to the changed element is copied
- Where to apply? — Undo/Redo, Event Sourcing, CQRS, multi-threading with safe snapshots
Red flags (do NOT say):
- “Persistent = serializable” — persistent = preserves versions, not about persistence
- “List.of() is a persistent structure” — it’s immutable, but without structural sharing
- “Vavr is part of JDK” — it’s a third-party library
- “Persistent structures are slower than ArrayList” — for get O(1) vs O(log₃₂ n), but for copying O(log n) vs O(n)
Related topics:
- [[14. What is the difference between shallow copy and deep copy]]
- [[23. How does immutability affect performance]]
- [[25. Are there disadvantages to immutable objects]]
- [[12. How to protect a collection from modification]]