Thread Dumps and Heap Dumps: The JVM X-Ray

When the JVM won't talk, dumps speak for it.

Thread dumps and heap dumps for JVM diagnosis

The JVM is a remarkably efficient black box. It manages memory, threads, garbage collection, and JIT compilation without you having to think about it, until something goes wrong. When something does go wrong in production, the JVM doesn't volunteer information; you have to ask. There are two ways to ask: thread dumps and heap dumps.

I've diagnosed deadlocks at 2 AM, memory leaks that took weeks to surface, and saturated thread pools that took down entire platforms. In every case, the answer was in a dump. The problem is that many engineers have never read one, and when they need to, they don't know where to start.

Thread Dumps: what every thread is doing

A thread dump is a point-in-time snapshot of every thread in the JVM. For each thread it shows the name, the state, and the full stack trace. It's the fundamental tool for diagnosing concurrency problems, resource contention, and deadlocks.
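
To see exactly what a dump contains, it helps to build one yourself. The sketch below assumes a HotSpot-based JDK 11 or later and uses the standard java.lang.management.ThreadMXBean to print each thread's name, state, the lock it is waiting on, and its full stack, roughly in jstack's layout. The class name ThreadDumpSnapshot is illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ThreadDumpSnapshot {

    /** Builds a jstack-like text dump of every live thread in this JVM. */
    static String capture() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Ask for locked monitors and ownable synchronizers when the VM supports it,
        // which is roughly what `jstack -l` adds.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append('"')
               .append(" id=").append(info.getThreadId())
               .append(' ').append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" on ").append(info.getLockName()); // what it is waiting for
            }
            if (info.getLockOwnerName() != null) {
                out.append(" owned by \"").append(info.getLockOwnerName()).append('"');
            }
            out.append('\n');
            // Full stack: ThreadInfo.toString() would cut it off after 8 frames.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(capture());
    }
}
```

Calling ThreadDumpSnapshot.capture() from an admin endpoint or a scheduled task gives you the same information without shell access to the host, though the formatting is simpler than jstack's.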

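You'll encounter four thread states, and each tells a different story:

RUNNABLE: executing, or ready to execute. Threads sitting in native I/O, such as a socket read, also report RUNNABLE.

BLOCKED: waiting to enter a synchronized block or method whose monitor another thread holds.

WAITING: waiting indefinitely for another thread, via Object.wait(), Thread.join(), or LockSupport.park(), which is what java.util.concurrent locks and queues use underneath.

TIMED_WAITING: the same, but with a timeout, as in Thread.sleep(), Object.wait(timeout), or LockSupport.parkNanos().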
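
A quick way to see these states side by side is to provoke them. This illustrative ThreadStatesDemo (a hypothetical name; it assumes JDK 11 or later) parks one thread on a latch, puts one to sleep, and blocks one on a monitor the caller holds, then records what Thread.getState() reports for each. That is the same state a thread dump would print next to the thread's name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

final class ThreadStatesDemo {

    /** Puts one thread into each of WAITING, TIMED_WAITING and BLOCKED and records the states. */
    static Map<String, Thread.State> observe() {
        Object monitor = new Object();
        CountDownLatch neverReleased = new CountDownLatch(1);

        // WAITING: parked with no timeout (CountDownLatch.await uses LockSupport.park).
        Thread waiting = daemon("demo-waiting", () -> {
            try { neverReleased.await(); } catch (InterruptedException e) { /* exit */ }
        });
        // TIMED_WAITING: sleeping with a timeout.
        Thread timed = daemon("demo-timed-waiting", () -> {
            try { Thread.sleep(60_000); } catch (InterruptedException e) { /* exit */ }
        });
        // BLOCKED: trying to enter a monitor that the calling thread holds.
        Thread blocked = daemon("demo-blocked", () -> {
            synchronized (monitor) { /* runs only after the caller releases the lock */ }
        });

        Map<String, Thread.State> seen = new LinkedHashMap<>();
        // RUNNABLE: the thread executing this code.
        seen.put(Thread.currentThread().getName(), Thread.currentThread().getState());
        synchronized (monitor) {
            waiting.start();
            timed.start();
            blocked.start();
            seen.put(waiting.getName(), awaitState(waiting, Thread.State.WAITING));
            seen.put(timed.getName(), awaitState(timed, Thread.State.TIMED_WAITING));
            seen.put(blocked.getName(), awaitState(blocked, Thread.State.BLOCKED));
        }
        // The monitor is released at this point; wake the other two and wait for all to end.
        waiting.interrupt();
        timed.interrupt();
        for (Thread t : List.of(waiting, timed, blocked)) {
            joinQuietly(t);
        }
        return seen;
    }

    /** Spins for up to 5 seconds until the thread reaches the expected state. */
    private static Thread.State awaitState(Thread t, Thread.State expected) {
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (t.getState() != expected && System.nanoTime() - deadline < 0) {
            Thread.onSpinWait();
        }
        return t.getState();
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static Thread daemon(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setDaemon(true);
        return t;
    }

    public static void main(String[] args) {
        observe().forEach((name, state) -> System.out.println(name + " -> " + state));
    }
}
```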

Capturing thread dumps

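There are three main ways to capture one, all safe in production. None of them requires a restart, and the JVM pauses only briefly, at a safepoint, while the stacks are collected:

jstack -l <pid> (-l adds lock details)

jcmd <pid> Thread.print

kill -3 <pid>, which sends SIGQUIT on Linux and other Unix-like systems. The dump is written to the JVM's standard output (often the application server's console log), not to your terminal.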
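
jcmd's Thread.print is also reachable from inside the process. On HotSpot JVMs the diagnostic commands are exposed through the com.sun.management:type=DiagnosticCommand MBean, where Thread.print becomes the threadPrint operation. The sketch below assumes a HotSpot-based JDK; other VMs may not register that MBean. InProcessThreadPrint is an illustrative name.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class InProcessThreadPrint {

    /** Runs the equivalent of `jcmd <pid> Thread.print -l` against the current JVM. */
    static String threadPrint() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName dcmd = new ObjectName("com.sun.management:type=DiagnosticCommand");
            // The operation takes a single String[] holding the command's arguments.
            Object result = server.invoke(dcmd, "threadPrint",
                    new Object[] {new String[] {"-l"}},
                    new String[] {String[].class.getName()});
            return (String) result;
        } catch (JMException e) {
            throw new IllegalStateException("DiagnosticCommand MBean not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(threadPrint());
    }
}
```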

A single thread dump shows one instant. To diagnose intermittent issues, capture three or four at 5-10 second intervals. Threads that appear in the same state and the same line of code across all captures are your suspects.
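
The comparison across captures is easy to automate. This sketch (StuckThreadFinder is a hypothetical name; JDK 11+ assumed) samples all threads through ThreadMXBean, uses state plus top stack frame as each thread's signature, and keeps only the threads whose signature never changes. Comparing only the top frame is a simplification: a stack that differs lower down still counts as unchanged.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class StuckThreadFinder {

    /** "Same state, same line": thread state plus the top stack frame (class, method, file:line). */
    static String signature(ThreadInfo info) {
        StackTraceElement[] stack = info.getStackTrace();
        String top = stack.length > 0 ? stack[0].toString() : "<no Java frames>";
        return info.getThreadState() + " at " + top;
    }

    /** Takes `samples` snapshots `intervalMillis` apart; returns threads identical in all of them. */
    static Set<String> find(int samples, long intervalMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long self = Thread.currentThread().getId(); // skip the sampling thread itself
        Map<Long, String> firstSignature = new HashMap<>();
        Map<Long, String> names = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            if (i > 0) {
                sleep(intervalMillis);
            }
            Map<Long, String> current = new HashMap<>();
            // maxDepth = 1: only the top frame is needed for the comparison.
            for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds(), 1)) {
                if (info == null || info.getThreadId() == self) {
                    continue; // null means the thread ended between the two calls
                }
                current.put(info.getThreadId(), signature(info));
                names.putIfAbsent(info.getThreadId(), info.getThreadName());
            }
            if (i == 0) {
                firstSignature.putAll(current);
            } else {
                // Drop any thread that moved, changed state, or disappeared.
                firstSignature.entrySet().removeIf(e -> !e.getValue().equals(current.get(e.getKey())));
            }
        }
        Set<String> stuck = new TreeSet<>();
        for (Long id : firstSignature.keySet()) {
            stuck.add(names.get(id));
        }
        return stuck;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sampling", e);
        }
    }
}
```

With four captures five seconds apart that would be StuckThreadFinder.find(4, 5_000). Expect idle pool workers parked on their queues to show up too; they are unchanged but healthy, so treat the result as a shortlist rather than a verdict.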

Diagnostic patterns in thread dumps

After reading hundreds of thread dumps, the patterns repeat. These are the most common:

For automated analysis, tools like IBM Thread and Monitor Dump Analyzer for Java (TMDA) and fastthread.io parse the dump, group threads by state, detect deadlocks, and visualize contention. fastthread.io is particularly convenient because it runs in the browser: upload the file and get a report immediately.
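
The deadlock check these tools perform is also available in-process. ThreadMXBean.findDeadlockedThreads() looks for cycles of threads waiting on monitors or java.util.concurrent ownable synchronizers and returns null when there are none. A minimal sketch, with DeadlockCheck as a hypothetical name:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

final class DeadlockCheck {

    /** One line per deadlocked thread; an empty list means no deadlock was found. */
    static List<String> describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Finds cycles over both synchronized monitors and ownable synchronizers
        // such as ReentrantLock; returns null if there are none.
        long[] ids = mx.findDeadlockedThreads();
        List<String> lines = new ArrayList<>();
        if (ids == null) {
            return lines;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) { // null if the thread has since terminated
                lines.add("\"" + info.getThreadName() + "\" waits for " + info.getLockName()
                        + " held by \"" + info.getLockOwnerName() + "\"");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        describeDeadlocks().forEach(System.out::println);
    }
}
```

Running this periodically and logging non-empty results is a cheap alarm; the full dump is still what shows you how the threads got there.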

Heap Dumps: what's in memory

If the thread dump shows what the JVM is doing, the heap dump shows what it's holding. It's a complete capture of the heap: every object, its type, its size, and its references to other objects. It's the definitive tool for diagnosing memory leaks and garbage collection pressure.

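A heap dump can run to several gigabytes. Its size roughly tracks how much of the heap is in use, so the configured maximum heap sets the upper bound. Capturing it causes a stop-the-world pause that grows with the heap and can last several seconds or more on a multi-gigabyte heap. In production, plan for that pause and for the disk space the file needs.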

Capturing heap dumps
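
Besides -XX:+HeapDumpOnOutOfMemoryError and tools such as jcmd <pid> GC.heap_dump <file>, a heap dump can be requested from inside the JVM through com.sun.management.HotSpotDiagnosticMXBean. The sketch assumes a HotSpot-based JDK 11 or later; HeapDumper is an illustrative name. Passing live = true writes only reachable objects, which on HotSpot means a full GC runs first; either way the application is paused while the file is written.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

final class HeapDumper {

    /**
     * Writes an HPROF heap dump of this JVM to `target`, which must not exist yet
     * and should end in ".hprof" (recent JDKs reject other suffixes).
     * live = true dumps only reachable objects; live = false includes garbage too.
     */
    static Path dump(Path target, boolean live) {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            diag.dumpHeap(target.toString(), live); // the application is paused meanwhile
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException("heap dump failed: " + target, e);
        }
    }

    public static void main(String[] args) {
        Path out = Path.of(args.length > 0 ? args[0] : "heap-" + ProcessHandle.current().pid() + ".hprof");
        System.out.println("Wrote " + dump(out, true));
    }
}
```

The resulting .hprof file is what you open in Eclipse MAT. Write it to a disk with room for a file roughly the size of the used heap.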

Analysis with Eclipse MAT

Eclipse Memory Analyzer Tool (MAT) is the standard tool for heap dump analysis. It is free, robust, and capable of handling multi-gigabyte dumps. Three analyses are fundamental:

Common memory leak patterns
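
One pattern worth showing in code, and one that shares the ThreadLocal link in the case study that follows, is a value set in a ThreadLocal on a pooled thread and never cleared. Pool threads live as long as the pool, so whatever the value references stays reachable. The sketch below (ThreadLocalHygiene and its context value are hypothetical) shows the usual remedy: set, run, and remove in a finally block.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalHygiene {

    // Hypothetical per-task context; in a real system this might be a whole
    // business-process context and everything it references.
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    /** Runs work with a per-task value bound to the current thread, and always unbinds it. */
    static void runWithContext(String contextId, Runnable work) {
        CONTEXT.set(contextId);
        try {
            work.run();
        } finally {
            // Without remove(), the pooled thread's ThreadLocalMap keeps the value
            // strongly reachable until that thread happens to overwrite it.
            CONTEXT.remove();
        }
    }

    static String currentContext() {
        return CONTEXT.get();
    }

    /** Demo: {value seen inside the task, value left behind on the same pooled thread}. */
    static String[] demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            String[] seenInside = new String[1];
            pool.submit(() -> runWithContext("order-42", () -> seenInside[0] = currentContext())).get();
            Callable<String> readLeftover = ThreadLocalHygiene::currentContext;
            String leftover = pool.submit(readLeftover).get(); // same single worker thread
            return new String[] {seenInside[0], leftover};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```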

Real case: memory leak in Sterling OMS

A production IBM Sterling environment ran business processes handling purchase orders. The heap grew steadily: 4 GB after startup, 6 GB after 48 hours, OutOfMemoryError after a week. The team restarted the JVM every 5 days as a "fix".

We configured -XX:+HeapDumpOnOutOfMemoryError and waited. When the OOM hit, we analyzed the dump with Eclipse MAT. The Dominator Tree revealed a single java.util.ArrayList instance retaining 2.3 GB of heap. That list lived inside a cache of processed documents that never evicted anything.
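
That flag can also be switched on without a restart. On HotSpot, HeapDumpOnOutOfMemoryError and HeapDumpPath are manageable flags, so HotSpotDiagnosticMXBean.setVMOption can change them in a running JVM (jcmd <pid> VM.set_flag does the same from outside). A minimal sketch, assuming a HotSpot-based JDK; OomDumpSettings is a hypothetical name, and the dump location is whatever path you pass in.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class OomDumpSettings {

    private static HotSpotDiagnosticMXBean diag() {
        return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    }

    /** Runtime equivalent of -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<dumpPath>. */
    static void enable(String dumpPath) {
        // Both are "manageable" flags on HotSpot; setVMOption throws
        // IllegalArgumentException for flags that are not writable.
        diag().setVMOption("HeapDumpOnOutOfMemoryError", "true");
        diag().setVMOption("HeapDumpPath", dumpPath);
    }

    /** Current value of a VM flag, as a string. */
    static String current(String flag) {
        return diag().getVMOption(flag).getValue();
    }

    public static void main(String[] args) {
        enable(args.length > 0 ? args[0] : "/tmp"); // "/tmp" is only an example default
        System.out.println("HeapDumpOnOutOfMemoryError=" + current("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath=" + current("HeapDumpPath"));
    }
}
```

Setting the flags at startup remains the safer default, since they are then in place before the first OOM.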

MAT's Leak Suspects pointed directly to the chain: GC Root → ThreadLocal → BusinessProcessContext → DocumentCache → ArrayList with 1.2 million entries. Every processed document was added for "reuse" but never removed. The fix was a size limit and a 30-minute TTL. The heap stabilized at 3.5 GB.
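
A size limit plus a TTL can be sketched as a small LRU map. This hypothetical BoundedTtlCache is not the code from that system. It assumes access-order LinkedHashMap eviction for the size bound and an injectable nanosecond clock, so expiry can be tested without waiting 30 minutes.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical size-bounded, time-limited cache: LRU eviction plus TTL checked on read. */
final class BoundedTtlCache<K, V> {

    // Named Stamped rather than Entry to avoid clashing with the Map.Entry
    // member type inherited by the anonymous LinkedHashMap subclass below.
    private static final class Stamped<T> {
        final T value;
        final long expiresAtNanos;

        Stamped(T value, long expiresAtNanos) {
            this.value = value;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final int maxEntries;
    private final long ttlNanos;
    private final LongSupplier clockNanos;
    private final LinkedHashMap<K, Stamped<V>> map;

    BoundedTtlCache(int maxEntries, Duration ttl, LongSupplier clockNanos) {
        this.maxEntries = maxEntries;
        this.ttlNanos = ttl.toNanos();
        this.clockNanos = clockNanos;
        // accessOrder = true: the eldest entry is the least recently used one.
        this.map = new LinkedHashMap<K, Stamped<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Stamped<V>> eldest) {
                return size() > BoundedTtlCache.this.maxEntries; // size limit
            }
        };
    }

    synchronized void put(K key, V value) {
        map.put(key, new Stamped<>(value, clockNanos.getAsLong() + ttlNanos));
    }

    /** Returns the cached value, or null if absent or older than the TTL. */
    synchronized V get(K key) {
        Stamped<V> s = map.get(key);
        if (s == null) {
            return null;
        }
        if (clockNanos.getAsLong() - s.expiresAtNanos >= 0) { // overflow-safe comparison
            map.remove(key);
            return null;
        }
        return s.value;
    }

    /** Drops every expired entry; meant to be called periodically. */
    synchronized void purgeExpired() {
        long now = clockNanos.getAsLong();
        map.values().removeIf(s -> now - s.expiresAtNanos >= 0);
    }

    synchronized int size() {
        return map.size();
    }
}
```

In production the clock would be System::nanoTime, for example new BoundedTtlCache<>(10_000, Duration.ofMinutes(30), System::nanoTime), where 10,000 is an arbitrary illustrative limit. The size bound keeps the heap bounded even if nothing ever calls purgeExpired().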

Don't guess what's happening in the JVM. Measure it. A thread dump takes 2 seconds to capture and can save you hours of speculation. A heap dump is the difference between "I think there's a leak" and "I know exactly where the leak is".

Jorel del Portal


Systems engineer specialized in enterprise software architecture and high availability platforms.