Question 20 · Section 12

How to Find Out How Much Memory a String Occupies

The size of a string in memory depends on the Java version and the string's content.

🟢 Junior Level

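A string costs more than its characters: the JVM stores a String object plus a separate backing array, and each has its own header. The figures below assume a 64-bit HotSpot JVM with compressed references (the default for heaps under 32GB).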

Simple calculation for Java 9+:

Total size = String object (24 bytes) + byte[] array (16-byte header + character data, rounded up to a multiple of 8)

Example:

String s = "Hello"; // 5 Latin-1 characters
// String object: ~24 bytes
// byte[5] array:  24 bytes (16 header + 5 data = 21, rounded up to 24)
// Total: ~48 bytes
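
The arithmetic above can be sketched as a small estimator. This is a minimal sketch, assuming a 64-bit HotSpot JVM with compressed references and Compact Strings enabled; the class name StringSizeEstimator and its constants are illustrative and not part of any library, and JOL remains the authoritative measurement.

```java
class StringSizeEstimator {
    // Assumed layout: 64-bit HotSpot, compressed references, Java 9+.
    static final long STRING_OBJECT_BYTES = 24;
    static final long ARRAY_HEADER_BYTES = 16;

    // Objects are aligned to 8 bytes, so sizes are rounded up.
    static long alignTo8(long bytes) {
        return (bytes + 7) & ~7L;
    }

    // Compact Strings use 1 byte/char only if every char fits in Latin-1.
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    static long estimate(String s) {
        long bytesPerChar = fitsLatin1(s) ? 1 : 2;
        long arrayBytes = alignTo8(ARRAY_HEADER_BYTES + bytesPerChar * s.length());
        return STRING_OBJECT_BYTES + arrayBytes;
    }

    public static void main(String[] args) {
        System.out.println(estimate("Hello")); // 24 + 24 = 48
        System.out.println(estimate(""));      // 24 + 16 = 40
    }
}
```

The estimate ignores JVM flags such as -XX:-CompactStrings or a non-default object alignment, so treat it as a sanity check rather than a measurement.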

How to measure precisely: Use the JOL library (Java Object Layout):

import org.openjdk.jol.info.GraphLayout;

String s = "Hello";
System.out.println(GraphLayout.parseInstance(s).totalSize());
// Prints exact size in bytes, including the object itself and all related data

Simple analogy: String is like a box (String object) containing another box (byte array). To find the total size, you need to add both boxes together.


🟡 Middle Level

String object structure (Java 9+, 64-bit JVM with CompressedOops)

String Object (24 bytes):
├── Mark Word (object header):     8 bytes
├── Class Pointer (compressed):    4 bytes
├── byte[] value (reference):      4 bytes
├── byte coder:                    1 byte
├── int hash:                      4 bytes
├── Padding:                       3 bytes (to 8-byte alignment)
└── TOTAL:                        24 bytes (rounded to multiple of 8)

byte[] Array:
├── Array Header (Mark + Class):  12 bytes
├── Array length:                  4 bytes
├── Data:                          N bytes (1 byte/char for Latin-1, 2 for UTF-16)
├── Padding:                       to 8-byte boundary
└── TOTAL:                        16 + N (rounded to 8) bytes

Calculation examples

| String | Java 8 (char[]) | Java 9+ (Latin-1) | Java 9+ (UTF-16) |
|---|---|---|---|
| "" | 40 bytes | 40 bytes | N/A |
| "Hello" | 56 bytes | 48 bytes | N/A |
| "Привет" | 56 bytes | N/A | 56 bytes |
| 100 chars (Latin-1) | 240 bytes | 144 bytes | N/A |
| 100 chars (mixed) | 240 bytes | N/A | 240 bytes |
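
The per-version figures follow from the layouts described earlier. The sketch below recomputes them, assuming a 24-byte String object and a 16-byte array header in both Java 8 and Java 9+ (64-bit HotSpot, compressed references); LayoutComparison and its method names are hypothetical helpers for illustration.

```java
class LayoutComparison {
    // Assumed sizes for a 64-bit HotSpot JVM with compressed references.
    static final long STRING_OBJECT_BYTES = 24;
    static final long ARRAY_HEADER_BYTES = 16;

    static long alignTo8(long bytes) {
        return (bytes + 7) & ~7L;
    }

    // Java 8: char[] backing array, always 2 bytes per char.
    static long java8(int chars) {
        return STRING_OBJECT_BYTES + alignTo8(ARRAY_HEADER_BYTES + 2L * chars);
    }

    // Java 9+: byte[] with the Latin-1 coder, 1 byte per char.
    static long java9Latin1(int chars) {
        return STRING_OBJECT_BYTES + alignTo8(ARRAY_HEADER_BYTES + chars);
    }

    // Java 9+: byte[] with the UTF-16 coder, 2 bytes per char.
    static long java9Utf16(int chars) {
        return STRING_OBJECT_BYTES + alignTo8(ARRAY_HEADER_BYTES + 2L * chars);
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 5, 6, 100}) {
            System.out.printf("%3d chars: Java 8 = %d, Latin-1 = %d, UTF-16 = %d%n",
                    n, java8(n), java9Latin1(n), java9Utf16(n));
        }
    }
}
```

Under these assumptions a UTF-16 string in Java 9+ costs the same as in Java 8; the saving comes only from strings that fit in Latin-1.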

How to measure in practice

JOL (Java Object Layout):

<dependency>
    <groupId>org.openjdk.jol</groupId>
    <artifactId>jol-core</artifactId>
    <version>0.17</version>
</dependency>

import org.openjdk.jol.info.GraphLayout;

String s = "Hello";
// Full size (String + array)
long total = GraphLayout.parseInstance(s).totalSize();

// Detailed layout
System.out.println(GraphLayout.parseInstance(s).toPrintable());

Table of typical mistakes

| Mistake | Consequences | Solution |
|---|---|---|
| Using sizeof like in C++ | Java has no sizeof | Use JOL, or Instrumentation.getObjectSize() for the shallow size |
| Counting only characters, forgetting headers | Underestimating by ~40 bytes per string | Always account for overhead: String object + array header |
| Not accounting for CompressedOops | Wrong calculations for Heap > 32GB | Without CompressedOops each pointer = 8 bytes instead of 4 |

Comparison: Java 8 vs Java 9+

| Aspect | Java 8 | Java 9+ (Latin-1) | Java 9+ (UTF-16) |
|---|---|---|---|
| Internal array | char[] (2 bytes/char) | byte[] (1 byte/char) | byte[] (2 bytes/char) |
| coder field | No | 1 byte | 1 byte |
| "Hello" size | 56 bytes | 48 bytes | N/A |
| "Привет" size | 56 bytes | N/A | 56 bytes |

When you don’t need exact String size measurement

  • Short-lived strings — Young GC collects them for free
  • Small number of strings — overhead is unnoticeable against Heap
  • Prototypes and PoC — optimize only when problem is proven

🔴 Senior Level

Internal Implementation — exact calculation

64-bit JVM with UseCompressedOops (default for Heap < 32GB):

// String object (Java 9+)
// Header:          12 bytes (8 mark word + 4 compressed klass pointer)
// value ref:        4 bytes
// coder:            1 byte
// hash:             4 bytes
// hashIsZero:       1 byte (JDK 13+)
// Padding:          to 24 bytes (multiple of 8)
// = 24 bytes total

// byte[] array
// Header:          12 bytes (8 mark word + 4 compressed klass pointer)
// length:           4 bytes
// data:             N bytes
// Padding:          to 8-byte boundary
// = 16 + N (rounded up to 8)

Without CompressedOops (-XX:-UseCompressedOops, Heap > 32GB):

  • Each reference = 8 bytes instead of 4
  • String object: 32 bytes (vs 24 with compressed)
  • byte[] array: up to 24 + N bytes (vs 16 + N), rounded up to 8
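
The effect of losing compressed references can be sketched numerically. The sketch assumes a 32-byte String object and a 24-byte array header when both references and class pointers are uncompressed; newer JDKs may keep class pointers compressed, which would make the difference smaller. OopsOverhead is a hypothetical helper, not a JDK or JOL API.

```java
class OopsOverhead {
    static long alignTo8(long bytes) {
        return (bytes + 7) & ~7L;
    }

    // dataBytes: bytes of character data in the backing array.
    // compressed: whether references and class pointers are compressed (assumed to go together).
    static long estimate(int dataBytes, boolean compressed) {
        long stringObject = compressed ? 24 : 32; // assumed String object size
        long arrayHeader = compressed ? 16 : 24;  // assumed byte[] header size
        return stringObject + alignTo8(arrayHeader + dataBytes);
    }

    // Extra decimal megabytes for `count` strings when compression is off.
    static long extraMegabytes(int dataBytes, long count) {
        long perString = estimate(dataBytes, false) - estimate(dataBytes, true);
        return perString * count / 1_000_000L;
    }

    public static void main(String[] args) {
        System.out.println(estimate(5, true));                 // 48
        System.out.println(estimate(5, false));                // 64
        System.out.println(extraMegabytes(5, 10_000_000L));    // 160
    }
}
```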

Edge Cases (minimum 3)

1. String Pool — one object, many references:

String s1 = "Hello";
String s2 = "Hello";
String s3 = "Hello";
// All three references point to ONE object in String Pool
// Total memory: one 48-byte string (not 3 × 48 = 144), plus the three references
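
Reference identity makes the sharing visible without any tooling. A minimal sketch (the class name PoolIdentity is illustrative): equal literals resolve to the same pooled object, new String(...) allocates a separate String object, and intern() returns the pooled one.

```java
class PoolIdentity {
    // Returns {literals share an object, explicit copy shares it, interned copy shares it}.
    static boolean[] identities() {
        String s1 = "Hello";
        String s2 = "Hello";
        String copy = new String("Hello"); // always a new String object on the heap
        return new boolean[] {
            s1 == s2,           // same pooled object
            s1 == copy,         // different object, equal content
            s1 == copy.intern() // intern() returns the pooled instance
        };
    }

    public static void main(String[] args) {
        boolean[] r = identities();
        System.out.println(r[0]); // true
        System.out.println(r[1]); // false
        System.out.println(r[2]); // true
    }
}
```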

2. Substring (Java 7u6+) — copies array:

String huge = "A".repeat(1_000_000); // ~1MB
String sub = huge.substring(0, 5);    // "AAAAA"
// sub: separate byte[5], not a reference to part of huge
// Before Java 7u6: sub shared huge's array, so a 5-char substring kept the whole ~1MB array alive
// Java 7u6+: copies, so it is safe, and sub = ~48 bytes

3. Interned strings — additional overhead:

String s = new String("Hello").intern();
// String + byte[]: ~48 bytes
// + entry in StringTable: ~24-40 bytes (depends on JVM version, native hashtable entry)
// Total: ~72-88 bytes per unique interned string

4. CompressedOops disabled at Heap > 32GB:

# At -Xmx64g: CompressedOops is disabled
# String object: 32 bytes instead of 24; byte[] header: up to 24 bytes instead of 16
# With 10M strings: up to +160MB overhead!

5. Substring from UTF-16 string — re-compressed to Latin-1 when possible:

String mixed = "Hello Мир";    // UTF-16 (due to Cyrillic)
String sub = mixed.substring(0, 5); // "Hello": Latin-1 again
// substring() copies the range and tries to compress it, so sub does not inherit the UTF-16 coder.
// sub takes ~48 bytes (Latin-1: 24 String + 24 byte[5]),
// not ~56 bytes (UTF-16: 24 String + 32 byte[10]).

Performance — real measurements

// JOL benchmark (64-bit HotSpot, CompressedOops enabled)
String empty = "";
String latin5 = "Hello";
String cyrillic6 = "Привет";
String latin100 = "A".repeat(100);

GraphLayout.parseInstance(empty).totalSize();     // 40 bytes
GraphLayout.parseInstance(latin5).totalSize();    // 48 bytes
GraphLayout.parseInstance(cyrillic6).totalSize(); // 56 bytes
GraphLayout.parseInstance(latin100).totalSize();  // 144 bytes

| Scenario | Size per string | 1M strings | 10M strings |
|---|---|---|---|
| Empty string | 40 bytes | 40MB | 400MB |
| Latin-1, 5 chars | 48 bytes | 48MB | 480MB |
| UTF-16, 5 chars | 56 bytes | 56MB | 560MB |
| Latin-1, 100 chars | 144 bytes | 144MB | 1.44GB |
| UTF-16, 100 chars | 240 bytes | 240MB | 2.4GB |
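
Scaling a per-string size to a whole cache is plain multiplication, but it is easy to get wrong by skipping the alignment step. The sketch below projects totals in decimal megabytes, assuming distinct (non-pooled) strings on a 64-bit HotSpot JVM with compressed references; FootprintProjection is a hypothetical helper.

```java
class FootprintProjection {
    static long alignTo8(long bytes) {
        return (bytes + 7) & ~7L;
    }

    // Assumed layout: 24-byte String object + 16-byte array header + data, 8-byte aligned.
    static long perString(int chars, boolean latin1) {
        long dataBytes = latin1 ? chars : 2L * chars;
        return 24 + alignTo8(16 + dataBytes);
    }

    // Total footprint in decimal megabytes (1 MB = 1,000,000 bytes).
    static long totalMegabytes(long perStringBytes, long count) {
        return perStringBytes * count / 1_000_000L;
    }

    public static void main(String[] args) {
        long oneMillion = 1_000_000L;
        System.out.println(totalMegabytes(perString(5, true), oneMillion));         // 48
        System.out.println(totalMegabytes(perString(100, true), 10 * oneMillion));  // 1440
        System.out.println(totalMegabytes(perString(100, false), 10 * oneMillion)); // 2400
    }
}
```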

Memory and GC implications

Heap savings:

  • Compact Strings (Java 9+): ~40–50% savings for ASCII/Latin-1 strings
  • With 70% Latin-1 strings in app: overall Heap reduction 20–30%
  • Less Heap → less frequent Full GC → lower latency

GC cycles:

  • Young GC: scans Eden/Survivor — fewer string allocations → faster scan
  • Old Gen: fewer objects → less work for marking/compaction
  • G1 GC: less live string data per region → cheaper evacuation

Thread Safety

String is immutable and thread-safe. Its size doesn’t change after creation: coder is final, and value is final and annotated @Stable. There are no race conditions when reading the size from multiple threads.

Production War Story

Scenario: Cache of 1M strings in memory (user profiles, JSON API service):

  • Java 8: 1M × ~50 bytes (avg) = ~50MB
  • Java 9+ Compact: 1M × ~35 bytes (avg, 70% Latin-1) = ~35MB
  • Savings: 15MB → less GC pressure, Full GC 25% less frequent

Scenario 2: Highload service with -Xmx2g:

  • Without CompressedOops (explicitly disabled with -XX:-UseCompressedOops): fixed overhead = ~56 bytes/string (32 String + 24 array header)
  • With CompressedOops (the default at this heap size): fixed overhead = ~40 bytes/string (24 String + 16 array header)
  • With 10M strings: 160MB savings on headers alone
  • This is the difference between stable operation and OOM at peak load

Scenario 3: Log aggregator — storing 10M log lines in memory:

  • Each string: ~200 bytes (avg, mixed Latin-1/UTF-16)
  • Total: ~2GB for strings alone
  • Enabling -XX:+UseStringDeduplication: saves 400MB (duplicate log levels, host names)

Monitoring

# Check CompressedOops
java -XX:+PrintFlagsFinal -version 2>&1 | grep UseCompressedOops
# bool UseCompressedOops = true  {lp64_product}

# Heap histogram — how many strings in memory
jmap -histo:live <pid> | head -30
# num  #instances  #bytes  class name
# 1:   1234567     49382680  java.lang.String

# JOL CLI: print the String field layout for the current JVM settings
java -jar jol-cli.jar internals java.lang.String

# MAT (Memory Analyzer Tool)
# Heap dump → Dominator Tree → java.lang.String → Shallow/Retained Heap

# JFR — allocations
java -XX:StartFlightRecording=settings=profile,filename=recording.jfr ...
# In JFR: Memory → Object Allocation, filter by java.lang.String

// Runtime measurement via Instrumentation
// (requires -javaagent or Attach API)
// Note: getObjectSize() returns the shallow size (the String object only, without its byte[])
long size = instrumentation.getObjectSize(stringInstance);

// JOL — full footprint
System.out.println(GraphLayout.parseInstance(s).toFootprint());
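
The CompressedOops flag can also be read from inside the running process. This is a minimal sketch assuming a HotSpot-based JVM, where the com.sun.management.HotSpotDiagnosticMXBean interface is available; the class name CompressedOopsCheck and the "unknown" fallback are illustrative choices.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class CompressedOopsCheck {
    // Returns "true" or "false" on HotSpot, or "unknown" if the flag cannot be read
    // (for example on a non-HotSpot or 32-bit JVM).
    static String useCompressedOops() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption("UseCompressedOops").getValue();
        } catch (RuntimeException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseCompressedOops = " + useCompressedOops());
    }
}
```

Logging this value at startup makes it easier to spot a deployment whose heap setting silently crossed the 32GB threshold.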

Best Practices for Highload

  • Use JOL for exact measurement, don’t count manually
  • CompressedOops is enabled by default — don’t disable without good reason
  • Compact Strings (Java 9+) give free 40–50% savings for Latin-1
  • String Pool: duplicate strings = one object (savings with high deduplication)
  • For ultra-low-latency: avoid String, use byte[] or ByteBuf (Netty)
  • At Heap > 32GB: CompressedOops is disabled → +30–50% overhead on objects → plan capacity
  • For string-heavy apps: consider -XX:+UseStringDeduplication (G1 GC)
  • Profile before optimizing: sometimes String is only 5% of Heap, and optimization won’t help

🎯 Interview Cheat Sheet

Must know:

  • String size = object (~24 bytes) + byte[] array (~16 + N bytes), where N depends on coder
  • Latin-1: 1 byte/char, UTF-16: 2 bytes/char (Java 9+)
  • "Hello" ≈ 48 bytes (24 String + 24 byte[5]), "Привет" ≈ 56 bytes (UTF-16: 24 String + 32 byte[12])
  • JOL (Java Object Layout) — library for exact measurement: GraphLayout.parseInstance(s).totalSize()
  • CompressedOops (default for Heap < 32GB) reduces pointers from 8 to 4 bytes
  • Without CompressedOops (Heap > 32GB): String object ≈ 32 bytes instead of 24

Frequent follow-up questions:

  • How to find exact String size? — JOL: GraphLayout.parseInstance(s).totalSize(). Or Instrumentation.getObjectSize().
  • Why does "" (empty string) take 40 bytes? — 24 bytes (String object) + 16 bytes (empty byte[] array: header only).
  • Does String Pool affect size? — Yes: duplicate literals = one object. 100 references to "Hello" = 48 bytes total, not 4800.
  • What happens at Heap > 32GB? — CompressedOops is disabled, each pointer = 8 bytes. With 10M strings: +160MB overhead.

Red flags (DON’T say):

  • ❌ “Java has sizeof like C++” — no, use JOL or Instrumentation
  • ❌ “String size = only character length” — this forgets the object headers (~40 bytes overhead)
  • ❌ “CompressedOops is always enabled” — disabled at Heap > 32GB
  • ❌ “substring() still shares array” — since Java 7u6 it copies, so that memory leak is fixed

Related topics:

  • [[19. What are Compact Strings in Java 9+]]
  • [[22. What is String Deduplication in G1 GC]]
  • [[1. How String Pool Works]]
  • [[13. What substring() Does and How It Worked Before Java 7]]