Garbage Collection in Java

Unlike C++, Java allocates and deallocates memory automatically, so applications do not have to manage it themselves. How garbage collection works, however, depends on the JVM implementation. In this article I will talk about the HotSpot JVM, since it is the one most commonly used in production.

In the HotSpot JVM the heap is split into generations. The split is based on the weak generational hypothesis: empirical analysis of applications has shown that most objects are short-lived and that there are few old-to-young references. The heap is therefore divided into a young generation and an old generation. This segregation keeps most of the dead objects in one place, where they can be collected very quickly. The young generation is in turn divided into three spaces: eden, survivor 0 and survivor 1.
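To see this layout on a running JVM, the standard java.lang.management API exposes one memory pool per heap space. The sketch below is only an illustration: the class name HeapPools is made up for this article, and the pool names it prints depend on the JVM version and on the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the JDK.
final class HeapPools {

    /** Returns one line per heap memory pool: its name and used/committed bytes. */
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            lines.add(String.format("%-28s used=%d committed=%d",
                    pool.getName(), usage.getUsed(), usage.getCommitted()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Pool names vary: expect something like an eden pool,
        // a survivor pool and an old generation pool.
        describeHeapPools().forEach(System.out::println);
    }
}
```

Running it with different collector flags is a quick way to check that eden, survivor and old generation pools really exist as separate spaces.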

[Figure: JVM heap layout]

The perm space (permanent generation) is the method area, which holds information about classes and methods, method code and so on. One of the biggest improvements in Java 8 is that the perm space was removed from the heap; class metadata now lives in native memory, in an area called Metaspace. In earlier versions, an application that loaded a lot of classes forced you to tune and optimize this part of the heap. Another interesting improvement is string deduplication (available with the G1 collector since Java 8u20 and enabled with -XX:+UseStringDeduplication): it looks for strings with identical contents and makes them share a single underlying character array instead of each keeping its own copy.

Performance Triangle

Performance can be measured in terms of latency, throughput and footprint. Latency is the amount of time that an application or component takes to complete an action. Throughput is the number of operations that an application or component can perform in a given period of time. Footprint is the amount of memory that an application uses while it is running. Unfortunately, performance optimization always involves a trade-off between these three measures.

[Figure: performance triangle of latency, throughput and footprint]

When we gain on one of them, we usually have to give something up on another.
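Two of these measures can be roughly estimated from inside the application with the standard management beans. The following is a minimal sketch, not a benchmark: the class GcMetrics and its method names are invented for this example, the GC time reported by some collectors includes concurrent work as well as pauses, and latency is not measured here at all because it has to be timed around the application's own operations.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, not part of the JDK.
final class GcMetrics {

    /** Approximate time spent in GC so far, in milliseconds, over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Fraction of the uptime that was not spent in GC: a rough throughput indicator. */
    static double throughput(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 1.0;
        }
        double ratio = 1.0 - (double) gcMillis / uptimeMillis;
        return Math.max(0.0, Math.min(1.0, ratio));
    }

    /** Heap currently in use, in bytes: a rough footprint indicator. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        System.out.printf("GC time:    %d ms of %d ms uptime%n", gc, uptime);
        System.out.printf("Throughput: %.3f%n", throughput(gc, uptime));
        System.out.printf("Used heap:  %d bytes%n", usedHeapBytes());
    }
}
```

Comparing these numbers before and after a tuning change shows which corner of the triangle the change moved you toward.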

Garbage Collector Strategies

Virtual machines keep getting better, and memory usage differs from application to application, so you can choose the strategy that suits you best. For instance, a pause during which all application threads are stopped until a garbage collection operation completes is called a “Stop the World” event. Frequent or prolonged “Stop the World” events can significantly affect a UI application, where the user has to wait too long for a response.

There are four strategies for removing unused objects from the heap.

1) The Serial GC

The simplest garbage collector. It performs minor and major garbage collections serially, using one thread. It is designed mainly for environments with little memory and few CPU cores: it uses only one CPU and freezes all application threads while it runs. Therefore, it is not suitable for server environments.

To enable the Serial Collector use:
-XX:+UseSerialGC
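
To check that a collector flag actually took effect, you can ask the running JVM which collectors it has registered. This is a small sketch using the standard GarbageCollectorMXBean; the class name ActiveCollectors is made up for the example, and the reported names are JVM-specific (with the Serial GC, HotSpot typically reports "Copy" and "MarkSweepCompact").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the JDK.
final class ActiveCollectors {

    /** Names of the collectors registered in this JVM (young and old generation). */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Run with, for example: java -XX:+UseSerialGC ActiveCollectors
        System.out.println(collectorNames());
    }
}
```

The same check works for the other collectors described in this article; only the printed names change.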

2) The Parallel GC

The parallel garbage collector is similar to the serial collector, but it uses multiple threads to perform young generation garbage collection, which can significantly reduce garbage collection overhead. It is intended for applications with medium-sized to large data sets that run on multiprocessor or multithreaded hardware.

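To enable the Parallel Collector in the young generation use (on older JVMs the old generation is still collected in one thread; since Java 7u4 this flag also enables parallel collection of the old generation):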

-XX:+UseParallelGC

To enable the Parallel Collector in both the young and old generations use:

-XX:+UseParallelOldGC

3) The Concurrent Mark Sweep (CMS) Collector

Garbage collection in the young generation occurs very often and takes much less time than collection in the old generation. Therefore, when you use the CMS collector, the young generation is still collected with a parallel GC strategy.

The old generation is collected in four steps:

  1. Initial mark. The collector makes a short “Stop the World” pause and finds the objects referenced directly from the roots (registers, thread stacks etc.).
  2. Concurrent mark. The garbage collector marks all objects reachable from the roots as live. This step runs concurrently with the application threads, without a “Stop the World” event.
  3. Remark. CMS makes a “Stop the World” pause again and finds the live objects that were created, or whose references changed, during the concurrent mark step. This step may be performed by more than one thread.
  4. Concurrent sweep. The last step removes all unmarked objects, again without a “Stop the World” event.

Thus, CMS divides garbage collection into several phases, some of which run concurrently with the application. This helps to avoid long “Stop the World” pauses. On the other hand, this strategy does more work in total, which can affect the overall performance of the application.
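
The mark and sweep idea behind these steps can be shown with a toy model. The sketch below is deliberately simplified and is not how CMS is implemented: it is single-threaded, has no concurrent or remark phase, and the ToyMarkSweep and Node classes are invented for this example. It only shows that marking starts from the roots and that sweeping removes whatever was not marked, including unreachable cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy, single-threaded model of mark and sweep; all names are hypothetical.
final class ToyMarkSweep {

    /** A stand-in for a heap object: it only knows which objects it references. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    /** Mark: collect every node reachable from the roots. */
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (marked.add(node)) {        // true only the first time we see the node
                pending.addAll(node.refs); // follow its outgoing references
            }
        }
        return marked;
    }

    /** Sweep: remove every unmarked node from the heap and return what was removed. */
    static List<Node> sweep(List<Node> heap, Set<Node> marked) {
        List<Node> garbage = new ArrayList<>();
        for (Node node : heap) {
            if (!marked.contains(node)) {
                garbage.add(node);
            }
        }
        heap.removeAll(garbage);
        return garbage;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);               // a -> b, reachable from the root
        c.refs.add(d);
        d.refs.add(c);               // c <-> d, an unreachable cycle
        List<Node> heap = new ArrayList<>(Arrays.asList(a, b, c, d));

        Set<Node> live = mark(Arrays.asList(a));
        for (Node dead : sweep(heap, live)) {
            System.out.println("Collected: " + dead.name); // prints c, then d
        }
    }
}
```

In CMS the expensive part, following references from the roots, is what runs concurrently with the application; the remark pause exists because the object graph can change while that happens.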

To enable the CMS Collector use:
-XX:+UseConcMarkSweepGC
To set the number of threads use:
-XX:ParallelCMSThreads=<n>

4) The G1 Garbage Collector

The Garbage-First (G1) garbage collector is targeted at multiprocessor machines with large memories. It splits the heap into a set of equally sized regions. The region size ranges from 1 MB to 32 MB; by default the JVM picks it based on the heap size, and you can set it explicitly with -XX:G1HeapRegionSize.

[Figure: G1 heap regions]

If an object is larger than a region, the GC allocates two or more contiguous regions for it. Such a set of regions is called a humongous region.
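
How many regions such an object occupies is simple ceiling division. The sketch below only illustrates that arithmetic; the class RegionMath is made up for this example, and real G1 has additional rules (for instance, it already treats objects of about half a region or more as humongous).

```java
// Hypothetical helper class illustrating the region arithmetic only.
final class RegionMath {

    static final long MB = 1024L * 1024L;

    /** Number of whole regions needed to hold an object of the given size. */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        if (objectBytes < 0 || regionBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (objectBytes + regionBytes - 1) / regionBytes; // ceiling division
    }

    public static void main(String[] args) {
        // A 3.5 MB array with 1 MB regions spans 4 regions.
        System.out.println(regionsNeeded(3 * MB + MB / 2, MB)); // prints 4
        // The same array with 32 MB regions fits in a single region.
        System.out.println(regionsNeeded(3 * MB + MB / 2, 32 * MB)); // prints 1
    }
}
```

The last region of a humongous object is usually only partly used, which is one reason why many large allocations can waste heap space under G1.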

Once G1 is triggered, it selects the young generation regions and, using heuristics, some old generation regions. The GC tracks various metrics about each region, for instance how long it will take to collect that particular region. As a result, the regions that are mostly empty of live objects are collected first (hence “garbage first”), so a large amount of memory is released quickly.
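
The “garbage first” idea can be sketched as a greedy selection. This is a toy model and not G1's actual heuristics: the Region fields, the pauseBudgetMillis parameter and the class name GarbageFirstSketch are all assumptions made for this example. It sorts candidate regions by the amount of garbage they hold and picks them until an assumed pause budget is used up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of "collect the regions with the most garbage first"; names are hypothetical.
final class GarbageFirstSketch {

    static final class Region {
        final int id;
        final long garbageBytes; // memory that collecting this region would release
        final long costMillis;   // estimated time needed to collect this region

        Region(int id, long garbageBytes, long costMillis) {
            this.id = id;
            this.garbageBytes = garbageBytes;
            this.costMillis = costMillis;
        }
    }

    /** Greedily picks the regions with the most garbage that fit into the pause budget. */
    static List<Region> chooseRegions(List<Region> candidates, long pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingLong((Region r) -> r.garbageBytes).reversed());

        List<Region> chosen = new ArrayList<>();
        long spent = 0;
        for (Region region : sorted) {
            if (spent + region.costMillis <= pauseBudgetMillis) {
                chosen.add(region);
                spent += region.costMillis;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region(1, 900, 5));
        regions.add(new Region(2, 100, 5));
        regions.add(new Region(3, 800, 5));
        for (Region region : chooseRegions(regions, 10)) {
            System.out.println("Collect region " + region.id); // regions 1 and 3
        }
    }
}
```

With a budget of 10 ms only two of the three regions fit, and the one holding the least garbage is left for a later cycle.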

Also, the G1 GC reduces heap fragmentation: it incrementally copies live objects, in parallel, from one or more sets of regions into one or more new regions to achieve compaction.

To enable the G1 Collector use:

-XX:+UseG1GC

Heap Common Settings

HotSpot JVM provides many switches that can be used to configure heap space. Here are the most common of them.

Switch                     Description
-Xms                       Sets the initial heap size when the JVM starts.
-Xmx                       Sets the maximum heap size.
-Xmn                       Sets the size of the young generation.
-XX:PermSize               Sets the initial size of the permanent generation (before Java 8).
-XX:MaxPermSize            Sets the maximum size of the permanent generation (before Java 8).
-XX:MaxNewSize             Sets the maximum size of the young generation.
-XX:NewSize                Sets the initial size of the young generation.
-XX:SurvivorRatio=<value>  Sets the ratio between eden and one survivor space: eden is <value> times bigger than a survivor space.
-XX:MaxTenuringThreshold   Defines how many minor GC cycles an object can stay in the survivor spaces before it is tenured to the old generation.
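
The effect of -XX:SurvivorRatio is easiest to see with a worked example. Since the young generation consists of eden plus two survivor spaces, and eden is <value> times one survivor space, one survivor space is young / (<value> + 2). The sketch below only does this arithmetic; the class YoungGenLayout is made up for the example, and a real JVM rounds and aligns the sizes, so actual numbers may differ slightly.

```java
// Hypothetical helper class illustrating the SurvivorRatio arithmetic only.
final class YoungGenLayout {

    /** Size of one survivor space: young = eden + 2 * survivor, eden = ratio * survivor. */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        if (youngBytes < 0 || survivorRatio < 1) {
            throw new IllegalArgumentException("invalid young size or ratio");
        }
        return youngBytes / (survivorRatio + 2);
    }

    /** Size of eden: whatever is left after the two survivor spaces. */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long young = 100 * mb; // as if started with -Xmn100m
        int ratio = 8;         // as if started with -XX:SurvivorRatio=8

        System.out.println("survivor MB: " + survivorBytes(young, ratio) / mb); // 10
        System.out.println("eden MB:     " + edenBytes(young, ratio) / mb);     // 80

        // The limit set with -Xmx is visible at runtime as well.
        System.out.println("max heap bytes: " + Runtime.getRuntime().maxMemory());
    }
}
```

So with -Xmn100m and -XX:SurvivorRatio=8 you get roughly 80 MB of eden and two survivor spaces of about 10 MB each.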
