Question 12 · Section 16

What is Dirty Checking in Hibernate

Overview

Dirty checking is a mechanism for automatically tracking entity changes in the persistence context and saving them to the database. This is one of Hibernate’s key features that eliminates the need for manual UPDATE queries.


Junior Level

What is Dirty Checking

Dirty checking - Hibernate tracks changes to managed entities in the persistence context and generates the UPDATE statement itself when the context is flushed (normally at commit).

@Transactional
public void updateUser(Long id, String name) {
    // 1. Load from DB - entity becomes Persistent
    User user = entityManager.find(User.class, id);

    // 2. Change field - Hibernate "notices" the change
    user.setName(name);

    // 3. NO merge() or save() call needed!
    // On commit - Hibernate automatically does UPDATE
}

How It Works - Simply

1. Load: Hibernate loads entity and saves its "snapshot"
2. Change: you modify a field in the object
3. Flush: Hibernate compares snapshot with current state
4. If different -> generates UPDATE SQL
5. If same -> does nothing
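
The five steps above can be sketched in plain Java. This is a toy model under stated assumptions, not Hibernate's implementation: `ToyUser`, `ToyPersistenceContext` and the generated SQL string are hypothetical names invented for illustration, and only a single field is tracked.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy model only: NOT Hibernate's code. All class names here are hypothetical.
class ToyUser {
    final long id;
    String name;

    ToyUser(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class ToyPersistenceContext {
    // entity instance -> copy of its field values taken at load time
    private final Map<ToyUser, Object[]> snapshots = new IdentityHashMap<>();
    final List<String> executedSql = new ArrayList<>();

    // Step 1: "load" an entity and remember its snapshot
    ToyUser find(long id, String nameInDb) {
        ToyUser user = new ToyUser(id, nameInDb);
        snapshots.put(user, new Object[] { user.name });
        return user;
    }

    // Steps 3-5: compare snapshot with current state, emit UPDATE only if they differ
    void flush() {
        for (Map.Entry<ToyUser, Object[]> entry : snapshots.entrySet()) {
            ToyUser user = entry.getKey();
            Object[] snapshot = entry.getValue();
            if (!Objects.equals(snapshot[0], user.name)) {
                executedSql.add("UPDATE users SET name = '" + user.name + "' WHERE id = " + user.id);
                snapshot[0] = user.name; // refresh the snapshot after the write
            }
        }
    }
}

class ToyDirtyCheckDemo {
    public static void main(String[] args) {
        ToyPersistenceContext ctx = new ToyPersistenceContext();
        ToyUser user = ctx.find(1L, "Alice");
        ctx.flush();        // nothing changed: no SQL
        user.name = "Bob";  // Step 2: plain field change, no SQL yet
        ctx.flush();        // snapshot differs: one UPDATE
        ctx.flush();        // snapshot was refreshed: no second UPDATE
        System.out.println(ctx.executedSql);
    }
}
```

The point of the sketch is that the setter itself does nothing special; the UPDATE comes purely from the comparison done at flush time.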

Example

@Transactional
public void updateEmail(Long userId, String newEmail) {
    User user = entityManager.find(User.class, userId);  // snapshot
    user.setEmail(newEmail);                              // change
    // On method exit -> commit -> flush -> UPDATE
}

// Equivalent SQL:
// UPDATE users SET email = ? WHERE id = ?

Middle Level

Detailed Dirty Checking Mechanism

Step 1: Load entity
- entityManager.find(User.class, 1L)
- Hibernate creates EntityEntry with snapshot
- snapshot = copy of all fields at load time

Step 2: Change
- user.setName("New Name")
- Hibernate does NOTHING immediately

Step 3: Flush (on commit or explicit)
- Hibernate traverses all entities in persistence context
- For each: compares snapshot with current state
- If changes found -> schedule UPDATE
- Executes all UPDATEs to DB
- Updates snapshot

When Flush Happens

// 1. On transaction commit (automatic)
@Transactional
public void update(Long id) {
    User user = entityManager.find(User.class, id);  // managed entity
    user.setName("New");
}  // commit -> flush -> UPDATE

// 2. On entityManager.flush() (explicit)
entityManager.flush();  // UPDATE executed

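// 3. Before executing a query (so the query sees in-memory changes)
// With the default AUTO flush mode, Hibernate flushes before a JPQL/HQL query
// when pending changes affect the tables that query reads.
user.setName("New");
List<User> users = entityManager.createQuery("FROM User", User.class)
    .getResultList();  // flush happens before this!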
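
To see why a flush before a query matters, here is a toy model in plain Java. It is a sketch under assumptions, not Hibernate code: `ToySession` is a hypothetical class, the "database" is just a map, and the two enum constants only mirror the names of JPA's `FlushModeType.AUTO` and `FlushModeType.COMMIT`. The toy shows what the SQL side of a query would see; in real Hibernate the returned entities are still the managed instances.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: flush() copies pending in-memory changes into a map that plays the database.
class ToySession {
    enum FlushMode { AUTO, COMMIT }

    private final Map<Long, String> database = new HashMap<>(); // id -> name, rows the SQL can see
    private final Map<Long, String> pending = new HashMap<>();  // in-memory changes not yet written
    private final FlushMode flushMode;

    ToySession(FlushMode flushMode) {
        this.flushMode = flushMode;
        database.put(1L, "Old");
    }

    // Changing an entity touches memory only
    void setName(long id, String name) {
        pending.put(id, name);
    }

    void flush() {
        database.putAll(pending);
        pending.clear();
    }

    // A query reads database rows, not in-memory objects
    String queryName(long id) {
        if (flushMode == FlushMode.AUTO) {
            flush(); // AUTO: write pending changes first so the query sees them
        }
        return database.get(id);
    }
}

class ToyFlushModeDemo {
    public static void main(String[] args) {
        ToySession auto = new ToySession(ToySession.FlushMode.AUTO);
        auto.setName(1L, "New");
        System.out.println(auto.queryName(1L));   // the query sees the pending change

        ToySession commit = new ToySession(ToySession.FlushMode.COMMIT);
        commit.setName(1L, "New");
        System.out.println(commit.queryName(1L)); // the query still sees the old row
    }
}
```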

Optimization - Read-Only Entities

// If the entity won't be modified, load it as read-only (Spring Data JPA)
@QueryHints(@QueryHint(name = "org.hibernate.readOnly", value = "true"))
@Query("SELECT u FROM User u WHERE u.id = :id")
User findReadOnly(@Param("id") Long id);

// Or via hint
User user = entityManager.createQuery("FROM User u WHERE u.id = :id", User.class)
    .setParameter("id", id)
    .setHint("org.hibernate.readOnly", true)
    .getSingleResult();

// Advantages:
// 1. No snapshot created (memory savings)
// 2. Dirty checking not executed
// 3. Less flush overhead

Common Mistakes

// "No-op" change
user.setName(user.getName());  // value not changed
// Hibernate finds no difference at flush and skips the UPDATE. The comparison itself still runs for every field of every managed entity on every flush: for example, 10k entities with 20 fields each cost about 200k comparisons per flush.

// Dirty checking for large contexts
@Transactional
public void processAll() {
    List<User> users = userRepository.findAll();  // 10k users
    for (User user : users) {
        // dirty checking for 10k entities - slow!
        processUser(user);
    }
}

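// Solution: load and process page by page, clearing the context between pages
// (assumes userRepository extends JpaRepository, which provides findAll(Pageable))
@Transactional
public void processAll() {
    int page = 0;
    Page<User> users;
    do {
        users = userRepository.findAll(PageRequest.of(page++, 100, Sort.by("id")));
        users.forEach(this::processUser);
        entityManager.flush();   // write pending UPDATEs
        entityManager.clear();   // detach processed entities, drop their snapshots
    } while (users.hasNext());
}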
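
One caveat with periodic clear(): it detaches every entity in the context, including entities that were loaded earlier but have not been processed yet, so later changes to them are no longer tracked. The toy model below illustrates this in plain Java; `ToyItem`, `ToyContext` and both strategies are hypothetical names for this sketch, not Hibernate API.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a context that tracks one field per item and counts UPDATEs.
class ToyItem {
    String status = "new";
}

class ToyContext {
    private final Map<ToyItem, String> snapshots = new IdentityHashMap<>();
    int updates = 0;

    void manage(ToyItem item) { snapshots.put(item, item.status); }

    void flush() {
        for (Map.Entry<ToyItem, String> entry : snapshots.entrySet()) {
            if (!entry.getValue().equals(entry.getKey().status)) {
                updates++;                             // stands in for one UPDATE
                entry.setValue(entry.getKey().status); // refresh the snapshot
            }
        }
    }

    void clear() { snapshots.clear(); }                // everything becomes detached
}

class ToyChunkDemo {
    // Load everything up front, then flush and clear every chunkSize items.
    static int loadAllThenClear(int total, int chunkSize) {
        ToyContext ctx = new ToyContext();
        List<ToyItem> items = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            ToyItem item = new ToyItem();
            ctx.manage(item);
            items.add(item);
        }
        for (int i = 0; i < total; i++) {
            items.get(i).status = "processed";
            if ((i + 1) % chunkSize == 0) {
                ctx.flush();
                ctx.clear(); // also detaches the items that were not processed yet
            }
        }
        ctx.flush();
        return ctx.updates;
    }

    // Load one chunk at a time, so every item is still managed when it is modified.
    static int loadPerChunk(int total, int chunkSize) {
        ToyContext ctx = new ToyContext();
        for (int start = 0; start < total; start += chunkSize) {
            List<ToyItem> chunk = new ArrayList<>();
            for (int i = start; i < Math.min(start + chunkSize, total); i++) {
                ToyItem item = new ToyItem();
                ctx.manage(item);
                chunk.add(item);
            }
            for (ToyItem item : chunk) {
                item.status = "processed";
            }
            ctx.flush();
            ctx.clear();
        }
        return ctx.updates;
    }

    public static void main(String[] args) {
        System.out.println(loadAllThenClear(1000, 100)); // only the first chunk is written
        System.out.println(loadPerChunk(1000, 100));     // every item is written
    }
}
```

In this toy, loading all items first and clearing every 100 writes only the first 100 changes, while loading per chunk writes all 1000. That is why chunked processing usually loads each chunk inside the loop.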

Senior Level

Internal Implementation

PersistenceContext:

StatefulPersistenceContext {
    entityEntries: Map<Object, EntityEntry>
}

EntityEntry {
    loadedState: Object[]    // snapshot at load
    status: Status           // MANAGED, READ_ONLY, DELETED, ...
    id: Serializable
    version: Object
}
// The current state is not stored in the entry; it is read from the entity instance at flush time.

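Flush algorithm:
1. For each entity in entityEntries:
   if entry.status == MANAGED:
     currentState = read property values from the entity instance
     dirtyProperties = compare currentState with entry.loadedState, property by property
     if dirtyProperties is not empty:
       scheduleUpdate(entity)
2. Execute all scheduled UPDATEs
3. Update loadedState = currentState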
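
The per-entity comparison in step 1 can be sketched as a property-by-property loop that collects the indexes of the changed properties. This is a simplified illustration, not Hibernate's code: `ToyDirtyFinder` and its `findDirty` method are hypothetical, and plain `Objects.equals` stands in for whatever type-specific comparison the real implementation uses.

```java
import java.util.Arrays;
import java.util.Objects;

// Toy sketch only: finds which property slots differ between snapshot and current state.
class ToyDirtyFinder {
    // Returns the indexes of properties whose current value differs from the snapshot;
    // an empty array means the entity is clean and needs no UPDATE.
    static int[] findDirty(Object[] loadedState, Object[] currentState) {
        int[] dirty = new int[currentState.length];
        int count = 0;
        for (int i = 0; i < currentState.length; i++) {
            if (!Objects.equals(loadedState[i], currentState[i])) {
                dirty[count++] = i;
            }
        }
        return Arrays.copyOf(dirty, count);
    }

    public static void main(String[] args) {
        Object[] loaded  = { "Alice", "a@example.com", 30 };
        Object[] current = { "Alice", "b@example.com", 30 };
        System.out.println(Arrays.toString(findDirty(loaded, current))); // prints [1]
    }
}
```

Knowing which properties changed, rather than only whether the arrays differ, is what allows the flush to decide per entity whether an UPDATE is needed at all.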

Performance Characteristics

Dirty checking overhead:
- O(N) where N = number of entities in persistence context
- For each entity: array comparison of all fields
- For 10k+ entities - can be noticeably slow
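
A rough cost estimate makes the O(N) claim concrete. The sketch below assumes one comparison per field per managed entity per flush; the field count (20) and flush count (100) are made-up illustrative numbers, and `DirtyCheckCost` is a hypothetical helper.

```java
// Back-of-the-envelope estimate only; real cost depends on field types and flush frequency.
class DirtyCheckCost {
    static long comparisons(long entities, long fieldsPerEntity, long flushes) {
        return entities * fieldsPerEntity * flushes;
    }

    public static void main(String[] args) {
        // 10k managed entities, 20 fields each, a single flush at commit
        System.out.println(comparisons(10_000, 20, 1));   // 200000
        // the same context flushed 100 times, e.g. before 100 queries in a loop
        System.out.println(comparisons(10_000, 20, 100)); // 20000000
    }
}
```

The second figure shows why repeated automatic flushes over a large context hurt more than the size of the context alone.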

Optimizations:
- read-only hint -> entity is skipped during flush (no comparison at all)
- clear() periodically -> reduces N
- StatelessSession -> no dirty checking

@SelectBeforeUpdate

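@Entity
@SelectBeforeUpdate(true)
public class User {
    // Applies when a detached entity is reattached with Session.update():
    // there is no snapshot, so Hibernate would otherwise always issue an UPDATE.
    // With this annotation it first does a SELECT, compares the loaded row
    // with the entity, and skips the UPDATE if nothing changed.
}

Trade-off: one extra SELECT to avoid an unnecessary UPDATE. Beneficial when detached entities are often reattached without real changes and UPDATE is expensive (triggers, auditing). Managed entities do not need it, because their snapshot is already in the persistence context. Note that Session.update() is deprecated in Hibernate 6 in favor of merge(), which always loads the current state first.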

Batch Updates

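// Set hibernate.jdbc.batch_size (e.g. 50) so the UPDATEs are sent as JDBC batches
@Transactional
public void batchUpdate(List<User> users) {
    for (int i = 0; i < users.size(); i++) {
        // merge() loads the current row (one SELECT per detached entity)
        // and returns the managed copy
        User managed = entityManager.merge(users.get(i));
        managed.setStatus("processed");

        if ((i + 1) % 50 == 0) {      // after every 50 entities
            entityManager.flush();    // execute UPDATE
            entityManager.clear();    // clear context
        }
    }
    // the remaining entities are flushed on commit
}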

Best Practices

Do:

  • Rely on dirty checking for simple updates
  • Use the read-only hint for read-only queries
  • Call entityManager.clear() during large operations
  • Know when flush happens
  • Flush and clear periodically in batch operations

Avoid:

  • Running dirty checking over large persistence contexts
  • Loading entities without the read-only hint when they will not be updated
  • Ignoring the performance impact
  • Writing manual UPDATEs where dirty checking is sufficient

Interview Cheat Sheet

Must know:

  • Dirty checking - Hibernate automatically tracks changes to managed entities
  • On load, a snapshot is saved; on flush, compared with current state
  • No explicit merge() or save() needed for updating managed entities
  • O(N) overhead where N = number of entities in persistence context
  • Flush happens on: commit, entityManager.flush(), before a query that reads affected tables (AUTO mode)
  • Read-only hint saves memory - no snapshot created, dirty checking skipped

Frequent follow-up questions:

  • When does UPDATE happen? On flush: if the snapshot differs from the current state, UPDATE SQL is generated
  • Why is dirty checking slow for 10k+ entities? Every flush compares every field of every managed entity, so the cost grows as O(N × fields)
  • What is @SelectBeforeUpdate? A SELECT before UPDATE when a detached entity is reattached, to check whether data changed - trade-off: extra SELECT vs unnecessary UPDATE
  • How to optimize? Read-only hint, clear() periodically, StatelessSession for bulk operations

Red flags (DO NOT say):

  • “I always call merge() after a change” - not needed for managed entities; for detached ones it costs an extra SELECT
  • “I keep 100k entities in the context without clear()” - O(N) overhead on every flush, will be slow
  • “I load report data without the read-only hint” - unnecessary snapshots and dirty checking overhead
  • “I don’t understand when flush happens” - critical for transactions

Related topics:

  • [[7. Describe the Entity Lifecycle in Hibernate]]
  • [[13. How Does the Flush Mechanism Work in Hibernate]]
  • [[9. What is the First-Level Cache in Hibernate]]
  • [[14. What is the Difference Between persist() and merge()]]