Question 24 · Section 13

What are persistent data structures?

Think of Git: when you make a commit, a new version of the code is created, but all old commits remain accessible.

Language versions: English Russian Ukrainian

Junior Level

Persistent data structures are immutable structures that preserve the previous version and return a new one upon “modification.”

Simple analogy

Think of Git: when you make a commit, a new version of the code is created, but all old commits remain accessible.

Example

io.vavr.collection.List<String> v1 = io.vavr.collection.List.of("A", "B");
io.vavr.collection.List<String> v2 = v1.append("C");
// v1 = List("A", "B"), v2 = List("A", "B", "C") — v1 is unchanged!

Both versions exist simultaneously and don’t interfere with each other.


Middle Level

How it works: Structural Sharing

Instead of full copying, persistent structures use structural sharing:

When adding an element to a persistent tree:

  1. A new node is created for the added element
  2. Only the nodes on the path from root to the new element are copied
  3. All other branches remain shared between the old and new version

Result: a new version in $O(\log n)$ instead of $O(n)$ with full copying.

In the standard JDK

The JDK has no persistent collections with structural sharing. List.of() and similar create immutable, but not persistent collections:

  • List.of(), Collections.unmodifiableList() — either copy everything or prohibit changes

Libraries for Java

  • Vavr (formerly Javaslang) — List, HashMap, Set based on HAMT
  • PCollections — lightweight persistent collections library

When to use

  • Undo/Redo — change history is stored “for free”
  • Event Sourcing / CQRS — state snapshots
  • Multi-threading — safe snapshots without locks

Senior Level

HAMT (Hash Array Mapped Trie)

Most persistent Maps use HAMT — a tree where keys are distributed by hash code bits:

  • Lookup: $O(\log_{32} n)$ — effectively $O(1)$
  • Insertion: $O(\log_{32} n)$ — only the path is copied
  • Structural sharing — shared branches between versions

Comparison with regular collections

Operation ArrayList Persistent Vector (Vavr)
get(i) O(1) O(log₃₂ n)
add O(1)* O(log₃₂ n)
update O(1) O(log₃₂ n)
Copy O(n) O(log₃₂ n)

*amortized

Application in distributed systems

  • Event Sourcing — each event is immutable, state is recomputed
  • Snapshot isolation — saving “DB state at 12:00”
  • Reactive programming — streams of immutable states

Summary for Senior

  • Persistence = Immutability + Structural Sharing
  • Solves the $O(n)$ copying problem with immutable changes
  • HAMT — key data structure: $O(\log_{32} n)$ for all operations
  • Critical for Event Sourcing, CQRS, functional programming

Interview Cheat Sheet

Must know:

  • Persistent structures = immutability + structural sharing — on modification only the path is copied
  • Analogy — Git: new commit = new version, old ones are accessible
  • Structural Sharing: on element addition, nodes on the path are copied, other branches are shared — $O(\log n)$
  • HAMT (Hash Array Mapped Trie) — tree with hash-bit distribution: $O(\log_{32} n)$
  • JDK has no persistent collections; List.of() — immutable, but not persistent
  • Libraries: Vavr, PCollections; applications: Undo/Redo, Event Sourcing, Snapshot isolation

Frequent follow-up questions:

  • Does persistent mean persisted to disk? — No, “persistent” = preserves previous versions in memory
  • Is List.of() a persistent collection? — No, immutable but not persistent; no structural sharing
  • How is it better than regular copying? — $O(\log n)$ instead of $O(n)$ — only the path to the changed element is copied
  • Where to apply? — Undo/Redo, Event Sourcing, CQRS, multi-threading with safe snapshots

Red flags (do NOT say):

  • “Persistent = serializable” — persistent = preserves versions, not about persistence
  • “List.of() is a persistent structure” — it’s immutable, but without structural sharing
  • “Vavr is part of JDK” — it’s a third-party library
  • “Persistent structures are slower than ArrayList” — for get O(1) vs O(log₃₂ n), but for copying O(log n) vs O(n)

Related topics:

  • [[14. What is the difference between shallow copy and deep copy]]
  • [[23. How does immutability affect performance]]
  • [[25. Are there disadvantages to immutable objects]]
  • [[12. How to protect a collection from modification]]