Question 6 · Section 16

What Does the @BatchSize Annotation Do

@BatchSize is one of the most useful Hibernate annotations for combating the N+1 problem. It allows loading related entities in batches, significantly reducing the number of SQL...

Language versions: English Russian Ukrainian

Overview

@BatchSize is one of the most useful Hibernate annotations for combating the N+1 problem. It allows loading related entities in batches, significantly reducing the number of SQL queries.


Junior Level

What is @BatchSize

The @BatchSize annotation makes Hibernate load related entities in batches instead of executing a separate query for each entity.

// Without @BatchSize - N queries
for (Order order : orders) {
    order.getItems().size();
    // SELECT * FROM order_items WHERE order_id = ?  (for each Order)
}
// 100 orders -> 100 queries

// With @BatchSize(size = 10) - N/10 queries
@Entity
public class Order {
    @OneToMany(mappedBy = "order")
    @BatchSize(size = 10)
    private List<OrderItem> items;
}

// Hibernate: SELECT * FROM order_items WHERE order_id IN (?,?,?,?,?,?,?,?,?,?)
// 100 orders -> 10 queries (10 in each)

IN clause - part of an SQL WHERE id IN (…) query. Databases limit the number of parameters. Prepared statement - a pre-compiled SQL query.

How It Works

1. Hibernate collects IDs of entities for which the association needs to be loaded
2. Instead of N separate queries - forms one query with IN (id1, id2, ...)
3. Batch size = @BatchSize value
4. Number of queries = N / batchSize

Where It Can Be Used

// On a collection
@OneToMany(mappedBy = "order")
@BatchSize(size = 25)
private List<OrderItem> items;

// On a class (for loading multiple entities by ID)
@Entity
@BatchSize(size = 50)
public class User { }

// When loading
List<User> users = session.createQuery("FROM User", User.class)
    .setHint(AvailableHints.HINT_SPEC_FETCH_SIZE, 50)
    .getResultList();
// This is a different mechanism: fetch size controls how many rows the JDBC driver
// fetches at a time from the cursor. @BatchSize controls how many IDs
// are substituted into the IN-clause. Don't confuse them!

When @BatchSize is Useless

  1. A single entity is loaded - batch not needed
  2. JOIN FETCH is used - data is already loaded
  3. Collection is small (1-3 elements) - batch overhead > benefit

Middle Level

Choosing Optimal Size

// Too small size
@BatchSize(size = 2)  // hardly helps, 50 queries instead of 100

// Optimal range
@BatchSize(size = 10)  // minimum
@BatchSize(size = 25)  // usually good
@BatchSize(size = 50)  // maximum (more may cause IN clause issues)

Global Configuration

# application.yml - global batch size for all
spring:
  jpa:
    properties:
      hibernate:
        default_batch_fetch_size: 50
// Or via Java Config
@Configuration
public class HibernateConfig {
    @Bean
    public LocalContainerEntityManagerFactoryBean entityManagerFactory() {
        // ...
        properties.put("hibernate.default_batch_fetch_size", 50);
        return factory;
    }
}

Common Mistakes

// Too small size
@BatchSize(size = 1)  // equivalent to no @BatchSize

// Too large size (DB issues)
@BatchSize(size = 1000)  // may exceed IN clause parameter limit
// Oracle: max 1000 parameters
// PostgreSQL: depends on configuration

// Using for @ManyToOne without understanding
@ManyToOne
@BatchSize(size = 25)
private User user;
// Works, but only makes sense when many entities reference different Users

Comparison with @Fetch

// @BatchSize - loads in batches on access
@BatchSize(size = 25)
private List<OrderItem> items;

// @Fetch(SUBSELECT) - one subquery for all
@Fetch(FetchMode.SUBSELECT)
private List<OrderItem> items;

// Combination - best approach
@BatchSize(size = 25)
@Fetch(FetchMode.SUBSELECT)
private List<OrderItem> items;

Senior Level

Internal Implementation

In Hibernate 6+, batch fetching works more efficiently thanks to an improved IN-clause formation algorithm.

How @BatchSize works:

1. On first access to a collection proxy:
   - Hibernate checks BatchSize
   - Collects available parent entity IDs

2. Forms the query:
   - WHERE parent_id IN (id1, id2, ..., idN)
   - N = min(@BatchSize, available IDs, default_batch_fetch_size)

3. Loads all related entities in one query
4. Caches them in the Persistence Context
5. On access to other collections - takes from cache

Important:
- Real batch size = min(@BatchSize, hibernate.default_batch_fetch_size)
- Hibernate dynamically adjusts IN clause size

Database Limitations

Oracle:       max 1000 parameters in IN clause
PostgreSQL:   depends on max_stack_depth and prepared statement
MySQL:        depends on max_allowed_packet
SQL Server:   up to 2100 parameters

Optimization in Production

// Configuration via Hibernate 6
spring.jpa.properties.hibernate.default_batch_fetch_size=50

// For specific entities - override
@Entity
@BatchSize(size = 100)  // larger for frequently used
public class User { }

// Smaller for large collections
@OneToMany
@BatchSize(size = 10)
private List<LogEntry> logs;

Combining Strategies

@Entity
public class Order {

    // JOIN FETCH for mandatory data
    @Query("SELECT DISTINCT o FROM Order o JOIN FETCH o.user WHERE o.id = :id")
    Order findByIdWithUser(@Param("id") Long id);

    // @BatchSize for additional data
    @OneToMany(mappedBy = "order")
    @BatchSize(size = 25)
    private List<OrderItem> items;

    // EntityGraph for dynamic loading
    @EntityGraph(attributePaths = {"items", "payments"})
    List<Order> findByStatus(String status);
}

Monitoring and Debugging

# Enable SQL logging for verification
spring.jpa.show-sql: true
spring.jpa.properties.hibernate.format_sql: true

# For query counting
logging.level.org.hibernate.SQL: DEBUG
logging.level.org.hibernate.type.descriptor.sql.BasicBinder: TRACE

Best Practices

@BatchSize(10-50) for collections
Global default_batch_fetch_size = 25-50
Combine with JOIN FETCH
Account for DB limitations (IN clause limits)
Monitor actual query counts

Too small size (< 5)
Too large size (> 100 requires verification: make sure your DB supports such an IN clause and the query doesn't exceed max_allowed_packet. For reference tables with 5000 records, batch size 100 may be optimal.)
Without query monitoring
Using instead of JOIN FETCH when data is always needed

Interview Cheat Sheet

Must know:

  • @BatchSize loads related entities in batches: N queries -> N/batchSize
  • Real size = min(@BatchSize, hibernate.default_batch_fetch_size)
  • Optimal range: 10-50, too small - doesn’t help, too large - IN clause problems
  • Can be used on collections, classes, and globally via default_batch_fetch_size
  • DBs have limits: Oracle - 1000 parameters, SQL Server - 2100
  • Combines with JOIN FETCH and SUBSELECT for optimal results

Frequent follow-up questions:

  • Where can @BatchSize be used? On collections (@OneToMany), on classes, globally via configuration
  • What if parents were loaded one by one? @BatchSize still works - collects IDs on access
  • @BatchSize vs @Fetch(SUBSELECT)? SUBSELECT - one query for all parents, BatchSize - splits into groups
  • When is @BatchSize useless? Single entity, JOIN FETCH used, or very small collection

Red flags (DO NOT say):

  • “I set @BatchSize(size = 1000)” - will exceed DB parameter limits
  • “@BatchSize replaces JOIN FETCH” - they complement, not replace each other
  • “Global size 1000 for everything” - IN clause and performance problems
  • “I don’t monitor SQL after @BatchSize” - can’t see the effect without monitoring

Related topics:

  • [[1. What is the N+1 Problem and How to Solve It]]
  • [[5. What Fetch Strategies Exist in Hibernate]]
  • [[28. How to Use JOIN FETCH to Solve the N+1 Problem]]
  • [[10. What is the Second-Level Cache and When to Use It]]