How to Find Out How Much Memory a String Occupies

🟢 Junior Level

The size of a string in memory depends on the Java version and the string’s content.

Simple calculation for Java 9+ (64-bit JVM with CompressedOops, the default):

Total size = String object (~24 bytes) + byte[] array (~16 bytes header) + characters

Example:

String s = "Hello"; // 5 Latin-1 characters
// String object: ~24 bytes
// byte[5] array:  24 bytes (16 header + 5 data = 21, rounded up to 24)
// Total: ~48 bytes
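The formula above can be turned into a small estimator. This is a minimal sketch, assuming a 64-bit HotSpot JVM with CompressedOops and Compact Strings enabled; the class and method names (`StringSizeEstimate`, `estimate`) are hypothetical, and the result is an estimate rather than a measurement.

```java
/** Rough footprint estimate for a Java 9+ String (assumes 64-bit JVM, CompressedOops). */
final class StringSizeEstimate {
    static final int STRING_OBJECT_BYTES = 24; // header + value ref + hash + coder, padded
    static final int ARRAY_HEADER_BYTES = 16;  // mark word + class pointer + length

    /** Rounds a size up to the next multiple of 8 (default object alignment). */
    static long alignTo8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    /** True if every char fits in one byte, so the string can be stored as Latin-1. */
    static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    /** String object + byte[] array (header + 1 or 2 bytes per char, padded to 8). */
    static long estimate(String s) {
        int bytesPerChar = isLatin1(s) ? 1 : 2;
        long arrayBytes = alignTo8(ARRAY_HEADER_BYTES + (long) s.length() * bytesPerChar);
        return STRING_OBJECT_BYTES + arrayBytes;
    }

    public static void main(String[] args) {
        System.out.println(estimate("Hello")); // 48
        System.out.println(estimate(""));      // 40
    }
}
```

The estimate ignores anything JVM-specific (different alignment, disabled Compact Strings), so treat it as a sanity check for the JOL numbers, not a replacement.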

How to measure precisely: Use the JOL library (Java Object Layout):

import org.openjdk.jol.info.GraphLayout;

String s = "Hello";
System.out.println(GraphLayout.parseInstance(s).totalSize());
// Prints exact size in bytes, including the object itself and all related data

Simple analogy: String is like a box (String object) containing another box (byte array). To find the total size, you need to add both boxes together.

🟡 Middle Level

String object structure (Java 9+, 64-bit JVM with CompressedOops)

String Object (24 bytes):
├── Mark Word:                     8 bytes
├── Class Pointer (compressed):    4 bytes
├── byte[] value (reference):      4 bytes
├── int hash:                      4 bytes
├── byte coder:                    1 byte
├── Padding:                       3 bytes (JDK 13+ uses 1 of them for boolean hashIsZero)
└── TOTAL:                        24 bytes (rounded to multiple of 8)

byte[] Array:
├── Array Header (Mark + Class):  12 bytes
├── Array length:                  4 bytes
├── Data:                          N bytes (1 byte/char for Latin-1, 2 for UTF-16)
├── Padding:                       to 8-byte boundary
└── TOTAL:                        16 + N (rounded to 8) bytes

Calculation examples

String	Java 8 (`char[]`)	Java 9+ (Latin-1)	Java 9+ (UTF-16)
`""`	40 bytes	40 bytes	N/A
`"Hello"`	56 bytes	48 bytes	N/A
`"Привет"`	56 bytes	N/A	56 bytes
100 chars (Latin-1)	240 bytes	144 bytes	N/A
100 chars (mixed)	240 bytes	N/A	240 bytes
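The per-row arithmetic behind a table like this can be sketched in a few lines. The sketch assumes a 64-bit JVM with CompressedOops (24-byte String object, 16-byte array header, 8-byte alignment); the names `StringFootprintComparison`, `java8Bytes`, `java9Bytes` and `latin1SavingsPercent` are illustrative only.

```java
/** Compares the estimated String footprint on Java 8 (char[]) and Java 9+ (byte[]). */
final class StringFootprintComparison {
    static long alignTo8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    /** Java 8: 24-byte String object + char[] (16-byte header, 2 bytes per char). */
    static long java8Bytes(int chars) {
        return 24 + alignTo8(16 + 2L * chars);
    }

    /** Java 9+: 24-byte String object + byte[] (16-byte header, 1 or 2 bytes per char). */
    static long java9Bytes(int chars, boolean latin1) {
        return 24 + alignTo8(16 + (long) chars * (latin1 ? 1 : 2));
    }

    /** Percentage saved by Compact Strings for a Latin-1 string of the given length. */
    static double latin1SavingsPercent(int chars) {
        long before = java8Bytes(chars);
        long after = java9Bytes(chars, true);
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        int[] lengths = {0, 5, 6, 100};
        for (int n : lengths) {
            System.out.printf("%3d chars: Java 8 = %d, Java 9+ Latin-1 = %d, Java 9+ UTF-16 = %d%n",
                    n, java8Bytes(n), java9Bytes(n, true), java9Bytes(n, false));
        }
        System.out.printf("Latin-1 savings at 100 chars: %.1f%%%n", latin1SavingsPercent(100));
    }
}
```

Because every object size is rounded up to a multiple of 8, any hand-computed total that is not divisible by 8 is a sign of an arithmetic slip.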

How to measure in practice

JOL (Java Object Layout):

<dependency>
    <groupId>org.openjdk.jol</groupId>
    <artifactId>jol-core</artifactId>
    <version>0.17</version>
</dependency>

// Full size (String + array)
long total = GraphLayout.parseInstance(s).totalSize();

// Detailed layout
System.out.println(GraphLayout.parseInstance(s).toPrintable());

Table of typical mistakes

Mistake	Consequences	Solution
Using `sizeof` like in C++	Java has no `sizeof`	Use JOL, or `Instrumentation.getObjectSize()` for the shallow size of a single object
Counting only characters, forgetting headers	Underestimating by ~40 bytes or more	Always account for overhead: String object (24 bytes) + array header (16 bytes) + padding
Not accounting for CompressedOops	Wrong calculations for Heap > 32GB	Without CompressedOops each pointer = 8 bytes instead of 4

Comparison: Java 8 vs Java 9+

Aspect	Java 8	Java 9+ (Latin-1)	Java 9+ (UTF-16)
Internal array	`char[]` (2 bytes/char)	`byte[]` (1 byte/char)	`byte[]` (2 bytes/char)
coder field	No	1 byte	1 byte
`"Hello"` size	56 bytes	48 bytes	N/A
`"Привет"` size	56 bytes	N/A	56 bytes

When you don’t need exact String size measurement

Short-lived strings — Young GC collects them for free
Small number of strings — overhead is unnoticeable against Heap
Prototypes and PoC — optimize only when problem is proven

🔴 Senior Level

Internal Implementation — exact calculation

64-bit JVM with UseCompressedOops (default for Heap < 32GB):

// String object (Java 9+)
// Header:          12 bytes (8 mark word + 4 compressed klass pointer)
// value ref:        4 bytes
// hash:             4 bytes
// coder:            1 byte
// hashIsZero:       1 byte (JDK 13+ only)
// Padding:          2-3 bytes, up to 24 (multiple of 8)
// = 24 bytes total

// byte[] array
// Header:          12 bytes (8 mark word + 4 compressed klass pointer)
// length:           4 bytes
// data:             N bytes
// Padding:          to 8-byte boundary
// = 16 + N (rounded up to 8)

Without CompressedOops (-XX:-UseCompressedOops, Heap > 32GB):

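Each reference = 8 bytes instead of 4; with an uncompressed class pointer the object header grows from 12 to 16 bytes
String object: ~32 bytes (16 header + 8 value ref + 4 hash + 1 coder + 1 hashIsZero, padded; vs 24 with compressed)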
byte[] array: ~24 + N bytes
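The effect of reference and header width can be checked with a small layout calculation. This is a sketch under stated assumptions: fields are `value` (reference), `hash` (int), `coder` (byte) and `hashIsZero` (boolean, JDK 13+); the header is 12 bytes with compressed pointers and 16 bytes without; and the array header is padded to 8 bytes before the data. Newer JDKs may keep the class pointer compressed or start array data earlier, which makes the difference smaller. `CompressedOopsLayout` and its methods are hypothetical names.

```java
/** Estimates String and array-header sizes with and without compressed pointers. */
final class CompressedOopsLayout {
    static long alignTo8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    /** header + value reference + int hash (4) + byte coder (1) + boolean hashIsZero (1). */
    static long stringObjectBytes(int headerBytes, int referenceBytes) {
        return alignTo8(headerBytes + referenceBytes + 4 + 1 + 1);
    }

    /** Object header + int length; assumed here to be padded to 8 bytes before the data. */
    static long arrayHeaderBytes(int headerBytes) {
        return alignTo8(headerBytes + 4);
    }

    /** Extra fixed overhead per string when compressed pointers are off. */
    static long extraBytesPerString() {
        long compressed = stringObjectBytes(12, 4) + arrayHeaderBytes(12);
        long uncompressed = stringObjectBytes(16, 8) + arrayHeaderBytes(16);
        return uncompressed - compressed;
    }

    static long extraBytes(long stringCount) {
        return stringCount * extraBytesPerString();
    }

    public static void main(String[] args) {
        System.out.println("String, compressed:   " + stringObjectBytes(12, 4));  // 24
        System.out.println("String, uncompressed: " + stringObjectBytes(16, 8));  // 32
        System.out.println("Extra for 10M strings: " + extraBytes(10_000_000L) + " bytes");
    }
}
```

Under these assumptions the extra cost is 16 bytes per string (8 in the String object, 8 in the array header), which is where a figure of roughly 160MB per 10M strings comes from.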

Edge Cases

1. String Pool — one object, many references:

String s1 = "Hello";
String s2 = "Hello";
String s3 = "Hello";
// All three references point to ONE object in String Pool
// Total memory: 48 bytes (not 3 × 48 = 144), plus 4 bytes per reference
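That the literals really are one object can be confirmed with reference comparison (`==`), which checks identity rather than content. A minimal sketch; `StringPoolIdentity` and `sameObject` are illustrative names.

```java
/** Shows that equal string literals share one pooled object. */
final class StringPoolIdentity {
    /** Identity check: true only if both references point to the same object. */
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String s1 = "Hello";
        String s2 = "Hello";
        String copy = new String("Hello"); // explicitly allocates a second String object

        System.out.println(sameObject(s1, s2));            // true: both literals are pooled
        System.out.println(sameObject(s1, copy));          // false: separate heap object
        System.out.println(sameObject(s1, copy.intern())); // true: intern() returns the pooled one
    }
}
```

Identity comparison is used here only to demonstrate sharing; application code should still compare string content with `equals()`.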

2. Substring (Java 7u6+) copies the array:

String huge = "A".repeat(1_000_000); // ~1MB (String.repeat requires Java 11+)
String sub = huge.substring(0, 5);    // "AAAAA"
// sub has its own byte[5]; it does not reference part of huge's array
// Before Java 7u6: sub shared huge's char[], so keeping sub alive kept the whole array alive (memory leak)
// Java 7u6+: copies, which is safe; sub = ~48 bytes

3. Interned strings — additional overhead:

String s = new String("Hello").intern();
// String object + byte[]: ~48 bytes
// + entry in StringTable: ~24-40 bytes (depends on JVM version, native hashtable entry)
// Total: ~72-88 bytes per unique interned string

4. CompressedOops disabled at Heap > 32GB:

# At -Xmx64g: CompressedOops is disabled by default
# String object: 32 bytes instead of 24; byte[] header: ~24 bytes instead of 16
# With 10M strings: 10M × 16 bytes = +160MB overhead!

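5. One non-Latin-1 character switches the whole string to UTF-16:

String mixed = "Hello Мир";    // UTF-16 (due to Cyrillic): 2 bytes for every char, including "Hello "
String sub = mixed.substring(0, 5); // "Hello"
// mixed takes ~64 bytes (UTF-16: 24 String + 40 byte[18]) instead of ~56 bytes
// (Latin-1 would be 24 String + 32 byte[9]). Difference: 8 bytes.
// sub does not inherit UTF-16: with Compact Strings enabled, substring() re-compresses
// the result to Latin-1 when every char fits, so sub is ~48 bytes (24 String + 24 byte[5])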

Performance — real measurements

// Expected JOL results (64-bit JVM, CompressedOops and Compact Strings enabled)
String empty = "";
String latin5 = "Hello";
String cyrillic6 = "Привет";
String latin100 = "A".repeat(100);

GraphLayout.parseInstance(empty).totalSize();     // 40 bytes
GraphLayout.parseInstance(latin5).totalSize();    // 48 bytes
GraphLayout.parseInstance(cyrillic6).totalSize(); // 56 bytes
GraphLayout.parseInstance(latin100).totalSize();  // 144 bytes

Scenario	Size per string	1M strings	10M strings
Empty string	40 bytes	40MB	400MB
Latin-1, 5 chars	48 bytes	48MB	480MB
UTF-16, 5 chars	56 bytes	56MB	560MB
Latin-1, 100 chars	144 bytes	144MB	1.44GB
UTF-16, 100 chars	240 bytes	240MB	2.4GB

Memory and GC implications

Heap savings:

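Compact Strings (Java 9+): up to ~40–50% savings for long ASCII/Latin-1 strings (100 chars: 240 → 144 bytes); short strings save less because the ~40 bytes of fixed overhead dominate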
With 70% Latin-1 strings in app: overall Heap reduction 20–30%
Less Heap → less frequent Full GC → lower latency

GC cycles:

Young GC: scans Eden/Survivor — fewer string allocations → faster scan
Old Gen: fewer objects → less work for marking/compaction
G1 GC: less live string data per region → cheaper evacuation (copying) during collections

Thread Safety

String is immutable and therefore thread-safe; its size doesn’t change after creation. Both coder and value are final (value is additionally annotated @Stable), so there are no race conditions when the size is read from multiple threads.

Production War Story

Scenario: Cache of 1M strings in memory (user profiles, JSON API service):

Java 8: 1M × ~50 bytes (avg) = ~50MB
Java 9+ Compact: 1M × ~35 bytes (avg, 70% Latin-1) = ~35MB
Savings: 15MB → less GC pressure, Full GC 25% less frequent

Scenario 2: Highload service with -Xmx2g:

Without CompressedOops: fixed overhead = ~56 bytes/string (32 String object + 24 array header)
With CompressedOops: fixed overhead = ~40 bytes/string (24 String object + 16 array header)
With 10M strings: ~160MB savings on headers alone
This is the difference between stable operation and OOM at peak load

Scenario 3: Log aggregator — storing 10M log lines in memory:

Each string: ~200 bytes (avg, mixed Latin-1/UTF-16)
Total: ~2GB for strings alone
Enabling -XX:+UseStringDeduplication: saves 400MB (duplicate log levels, host names)

Monitoring

# Check CompressedOops
java -XX:+PrintFlagsFinal -version 2>&1 | grep UseCompressedOops
# bool UseCompressedOops = true  {lp64_product}

# Heap histogram — how many strings in memory
jmap -histo:live <pid> | head -30
# num  #instances  #bytes  class name
# 1:   1234567     29629608  java.lang.String
# (#bytes is the shallow size: 24 bytes per String; the byte[] arrays appear as a separate row)

# JOL CLI: print the field layout of java.lang.String on this JVM
java -jar jol-cli.jar internals java.lang.String

# MAT (Memory Analyzer Tool)
# Heap dump → Dominator Tree → java.lang.String → Shallow/Retained Heap

# JFR — allocations
java -XX:StartFlightRecording=settings=profile,filename=recording.jfr ...
# In JFR: Memory → Object Allocation — filter by java.lang.String

// Runtime measurement via Instrumentation
// (requires -javaagent or Attach API)
// Note: returns the shallow size only (24 bytes for a String), not the referenced byte[]
long size = instrumentation.getObjectSize(stringInstance);

// JOL — full footprint
System.out.println(GraphLayout.parseInstance(s).toFootprint());
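The flags that drive these sizes can also be read from inside the running application. The sketch below uses `HotSpotDiagnosticMXBean.getVMOption`, which is available on HotSpot-based JVMs only; the wrapper class `CompressedOopsCheck` and its `vmOption` method are hypothetical names, and an empty result means the flag is not exposed on this JVM.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

/** Reads HotSpot VM flags at runtime (HotSpot-based JVMs only). */
final class CompressedOopsCheck {
    /** Returns the current value of a VM flag, or empty if this JVM does not expose it. */
    static Optional<String> vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return Optional.empty();
            }
            return Optional.of(bean.getVMOption(name).getValue());
        } catch (IllegalArgumentException | UnsupportedOperationException e) {
            // Unknown flag (for example on a 32-bit JVM) or MXBean not supported
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println("UseCompressedOops = " + vmOption("UseCompressedOops").orElse("n/a"));
        System.out.println("CompactStrings    = " + vmOption("CompactStrings").orElse("n/a"));
    }
}
```

Logging these two values at startup makes it easier to explain later why the same strings occupy different amounts of memory on different hosts.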

Best Practices for Highload

Use JOL for exact measurement, don’t count manually
CompressedOops is enabled by default — don’t disable without good reason
Compact Strings (Java 9+) give up to 40–50% savings for Latin-1 strings with no code changes
String Pool: duplicate strings = one object (savings with high deduplication)
For ultra-low-latency: avoid String, use byte[] or ByteBuf (Netty)
At Heap > 32GB: CompressedOops is disabled → +30–50% overhead on objects → plan capacity
For string-heavy apps: consider -XX:+UseStringDeduplication (G1 GC)
Profile before optimizing: sometimes String is only 5% of Heap, and optimization won’t help

🎯 Interview Cheat Sheet

Must know:

String size = object (~24 bytes) + byte[] array (~16 + N bytes), where N depends on coder
Latin-1: 1 byte/char, UTF-16: 2 bytes/char (Java 9+)
"Hello" ≈ 48 bytes (24 String + 24 byte[5]), "Привет" ≈ 56 bytes (UTF-16: 24 String + 32 byte[12])
JOL (Java Object Layout) — library for exact measurement: GraphLayout.parseInstance(s).totalSize()
CompressedOops (default for Heap < 32GB) reduces pointers from 8 to 4 bytes
Without CompressedOops (Heap > 32GB): String object ≈ 32 bytes instead of 24

Frequent follow-up questions:

How to find exact String size? JOL: GraphLayout.parseInstance(s).totalSize(). Instrumentation.getObjectSize() returns only the shallow size of the String object (24 bytes), without the byte[].
How much does "" (empty string) take? 40 bytes: 24 bytes (String object) + 16 bytes (empty byte[] array, header only).
Does String Pool affect size? Yes: duplicate literals = one object. 100 references to "Hello" = 48 bytes total, not 4800.
What happens at Heap > 32GB? CompressedOops is disabled, each reference = 8 bytes. With 10M strings: about +160MB overhead (16 extra bytes per string across the String object and the array header).

Red flags (DON’T say):

❌ “Java has sizeof like C++” — no, use JOL or Instrumentation
❌ “String size = only character length”: this ignores the object headers (~40 bytes of overhead)
❌ “CompressedOops is always enabled” — disabled at Heap > 32GB
❌ “substring() still shares array” — since Java 7u6 copies, memory leak is fixed

Related topics:

[[19. What are Compact Strings in Java 9+]]
[[22. What is String Deduplication in G1 GC]]
[[1. How String Pool Works]]
[[13. What substring() Does and How It Worked Before Java 7]]