Understanding Garbage Collection in Java
This article explains, in the simplest language possible, how garbage collection works, so that beginners can follow along. If you are already a pro at the subject, you may turn away right now. Happy learning, folks.
Why do we need Garbage Collection?
Before Java, the most popular languages were C and C++, in which programmers had to allocate memory themselves, for example with malloc in C or new in C++.
They also had to free that memory explicitly once they were done with it, and forgetting to do so (or doing it wrong) led to memory leaks and other issues. So Java came up with an automatic garbage collection mechanism.
Before we get into that, let's look at the basic lifecycle of an object.
The above figure shows basically how objects live in memory. Now, Java runs the GC on two concepts:
Conceptually, an object is in one of two states: live or dead. Live objects are those that are still referred to by other objects. The chain of references starts at the roots of the application (the GC roots) and goes from object to object, so if you trace the references to any live object backwards, you eventually reach a root. Dead objects are those that have served their purpose and are no longer referred to by any live object, yet they still occupy space on the heap. Hence we need a memory management process.
All these objects are allocated in the “heap” of the JVM's memory. Static members, class definitions, metadata, etc. are stored in the “method area”, i.e. PermGen (up to Java 7) or Metaspace (Java 8 onwards). Garbage collection is carried out by a daemon thread called the Garbage Collector, and one cannot force GC to happen: a call such as System.gc() is only a request that the JVM is free to ignore. When the heap is full and a new allocation cannot be satisfied even after GC, we end up with a java.lang.OutOfMemoryError (Java heap space), which causes a lot of trouble. This means that even with GC we can still have memory leaks in Java: objects that are no longer needed but are still referenced can never be collected.
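To make that last point concrete, here is a minimal sketch of how a leak can still happen under GC: objects that are no longer needed stay reachable through a long-lived collection. The class and method names (LeakDemo, handleRequest, retainedBuffers) are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: a static list keeps everything added to it reachable,
// so the GC has to treat those objects as live even if they are never used again.
class LeakDemo {
    // Static fields stay reachable for as long as the class is loaded.
    private static final List<byte[]> CACHE = new ArrayList<>();

    // Simulates handling a request and forgetting to release its buffer.
    static void handleRequest() {
        byte[] buffer = new byte[1024]; // 1 KB per request
        CACHE.add(buffer);              // reference is kept forever: this is the leak
    }

    // Number of buffers that are still reachable through the cache.
    static int retainedBuffers() {
        return CACHE.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            handleRequest();
        }
        // Every buffer is still referenced, so no GC run can reclaim any of them.
        System.out.println("Buffers still reachable: " + retainedBuffers());
    }
}
```

If the loop ran long enough with a small heap, this pattern would eventually end in an OutOfMemoryError, because the collector only reclaims objects that are unreachable.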
Garbage collection has three major steps, as shown in the diagram below.
Here’s what each of the above does:
Mark: Starting from the roots of the application, it traverses the object graph and marks the objects that are reachable as live.
Sweep/Delete: As the name suggests, this step sweeps away (deletes) the unreachable objects.
Compact: It compacts the memory by moving the surviving objects together, making the allocated space contiguous rather than fragmented.
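All three steps run from time to time to clean up memory and achieve garbage collection. Together they are usually referred to as mark-sweep-compact. This is not the same thing as CMS, which stands for Concurrent Mark Sweep and is one specific collector, covered later in this article.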
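The three steps can be sketched on a toy heap. This is only a model for intuition: a real JVM works on raw memory rather than on Java collections, and the names used here (ToyMarkSweepCompact, Node, render) are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: the "heap" is an array of slots, null means a free slot.
class ToyMarkSweepCompact {

    // A toy object: a name plus outgoing references to other toy objects.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Mark: walk the object graph from the roots and collect everything reachable.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> live = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (live.add(current)) {          // first visit: mark it as live
                pending.addAll(current.refs); // then follow its references
            }
        }
        return live;
    }

    // Sweep: free every heap slot whose object was not marked.
    static void sweep(Node[] heap, Set<Node> live) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !live.contains(heap[i])) {
                heap[i] = null;
            }
        }
    }

    // Compact: slide the survivors to the front so the free space is contiguous.
    static void compact(Node[] heap) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null) {
                Node survivor = heap[i];
                heap[i] = null;
                heap[next++] = survivor;
            }
        }
    }

    // Renders the heap as text, with "-" standing for a free slot.
    static String render(Node[] heap) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < heap.length; i++) {
            if (i > 0) sb.append(", ");
            sb.append(heap[i] == null ? "-" : heap[i].name);
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(c);              // a -> c, reachable from the root
        b.refs.add(d);              // b and d only refer to each other,
        d.refs.add(b);              // so neither is reachable from a root
        Node[] heap = {a, b, c, d};

        Set<Node> live = mark(Arrays.asList(a)); // a is the only root
        sweep(heap, live);
        System.out.println(render(heap));        // prints [a, -, c, -]
        compact(heap);
        System.out.println(render(heap));        // prints [a, c, -, -]
    }
}
```

Note that b and d refer to each other but are still swept, because what counts is reachability from a root, not whether anything at all points to the object.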
The above diagram depicts how memory is divided and how its parts are referred to at different stages of the object lifecycle. We will go through how it works afterwards.
1. Eden space is where all new objects are created.
2. Let’s say Eden space is full and you need to create a new object. At that point a minor (small) GC kicks in, clears the unreachable objects from Eden, and moves the reachable ones to the “from” survivor space, indicating that they have survived one generation of GC.
3. Objects that stay alive for a long time go through a number of generations of GC. As long as they are still reachable they end up in Old Gen, where they stay until they become unreachable and are cleared by a major GC.
So at this point we can infer that a minor GC runs only in the Young Generation, while a major GC runs across the entire heap.
There is a phenomenon called STOP THE WORLD, which means that every thread except the ones needed for GC is stopped from executing and resumes only once the GC task is completed. Stop-the-world pauses occur no matter which GC you choose; it is basically the JVM stopping the application from running in order to execute a GC.
Coming back to the question: how does it work? The following few points are very important for understanding how GC works.
1. Let’s say Eden space is full of objects, so a minor GC runs as soon as a new object is needed. Objects that are unreachable are marked and cleaned up, while those that are reachable are moved to Survivor 1 and marked as having survived one GC. That frees up Eden for new objects.
MINOR GC Execution
Now a minor GC occurs again: objects in Eden and Survivor 1 are marked and swept, and all surviving objects are moved to the Survivor 2 space with their GC age incremented. Eden is again free for new objects.
The reason for having both S1 and S2 is that copying the survivors from one space to the other lays them out in a contiguous manner, removing the need for a separate compact step. The next time Eden is full, the survivors all move back to S1, and this goes on until an object's age reaches the configured tenuring threshold (in HotSpot, the -XX:MaxTenuringThreshold option). At that minor GC, the objects that have reached the threshold age are moved to Old Gen, where they stay for a while.
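The back-and-forth between the two survivor spaces can be sketched as a small simulation. This is a simplified model, not how HotSpot is implemented: the class ToyYoungGeneration, its fields, and the exact promotion rule (promote once the age reaches the threshold) are assumptions made for illustration, and the caller simply tells the toy collector which objects are still reachable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Toy model of Eden, two survivor spaces and Old Gen (hypothetical names).
class ToyYoungGeneration {

    static final class Obj {
        final String name;
        int age = 0; // number of minor GCs this object has survived
        Obj(String name) { this.name = name; }
    }

    final int tenuringThreshold;
    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>(); // survivor space that currently holds objects
    List<Obj> to = new ArrayList<>();   // empty survivor space, target of the next copy
    final List<Obj> oldGen = new ArrayList<>();

    ToyYoungGeneration(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    // New objects always start their life in Eden.
    Obj allocate(String name) {
        Obj created = new Obj(name);
        eden.add(created);
        return created;
    }

    // Minor GC: copy reachable objects out of Eden and "from", then swap the survivors.
    void minorGc(Set<Obj> reachable) {
        copySurvivors(eden, reachable);
        copySurvivors(from, reachable);
        eden.clear();  // whatever was not copied is garbage
        from.clear();
        List<Obj> previousFrom = from; // swap roles: "to" becomes the new "from"
        from = to;
        to = previousFrom;
    }

    private void copySurvivors(List<Obj> space, Set<Obj> reachable) {
        for (Obj candidate : space) {
            if (!reachable.contains(candidate)) {
                continue; // dead object: simply not copied
            }
            candidate.age++;
            if (candidate.age >= tenuringThreshold) {
                oldGen.add(candidate); // old enough: promoted to Old Gen
            } else {
                to.add(candidate);     // copied compactly into the empty survivor space
            }
        }
    }

    static String names(List<Obj> space) {
        return space.stream().map(o -> o.name).collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        ToyYoungGeneration young = new ToyYoungGeneration(2);
        Obj a = young.allocate("a");
        young.allocate("b"); // never referenced again
        young.minorGc(new HashSet<>(Arrays.asList(a)));
        // prints from=[a] old=[]
        System.out.println("from=" + names(young.from) + " old=" + names(young.oldGen));

        Obj c = young.allocate("c");
        young.minorGc(new HashSet<>(Arrays.asList(a, c)));
        // prints from=[c] old=[a]
        System.out.println("from=" + names(young.from) + " old=" + names(young.oldGen));
    }
}
```

Because survivors are always copied into an empty space, they end up packed next to each other, which is why no separate compact step is needed in the young generation.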
The GC thread monitors Old Gen, and as it approaches its full size it triggers mark, sweep and compact. This is a huge operation: it halts the running application while the major GC runs through the entire heap.
This brings up another question: how do we check whether GC is performing well? We measure GC performance in two terms: responsiveness and throughput.
Responsiveness refers to how quickly an application can respond to a request for a piece of information, e.g. how fast a UI opens, how fast a desktop application responds to an event, or how fast a DB query executes. For applications that focus on responsiveness, large pause times are not acceptable. The focus is on responding in short periods of time.
Throughput, on the other hand, focuses on maximizing the amount of work done by an application in a specific period of time. Examples of how throughput might be measured include:
1. The number of transactions completed in a given time.
2. The number of jobs that a batch program can complete in an hour.
3. The number of database queries that can be completed in an hour.
High pause times are acceptable for applications that focus on throughput. Since high-throughput applications focus on benchmarks over longer periods of time, quick response time is not a consideration.
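As a rough way to look at these numbers from inside a running program, the standard java.lang.management API exposes per-collector counters. The sketch below is an approximation under stated assumptions: the class name GcStats and the "time not spent in GC" formula are mine, and the reported collection time is accumulated elapsed time as defined by each collector, which is not necessarily pure pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper that reads the JVM's own GC counters.
class GcStats {

    // Fraction of elapsed time NOT spent in GC: a rough throughput figure.
    static double throughput(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 1.0; // nothing measured yet
        }
        return 1.0 - (double) gcMillis / uptimeMillis;
    }

    // Sums the accumulated collection time over all collectors in this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("Approximate throughput: %.2f%%%n",
                100 * throughput(totalGcMillis(), uptime));
    }
}
```

For responsiveness, the individual pause lengths matter more than the total, so GC logs or a profiler are a better source than these summed counters.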
The types of garbage collectors available are:
Serial Collector is the basic garbage collector. It runs in a single thread and can be used for basic applications.
Concurrent Collector uses a thread that performs GC alongside the application as it runs; it does not wait for the old generation to be full. STOP THE WORLD pauses occur only during the mark/re-mark phases.
Parallel Collector uses multiple CPUs to perform GC, with multiple threads doing mark/sweep etc. It does not start until the heap is full or nearly full, and a “Stop The World” pause occurs whenever it runs.
The question then arises: which GC should you use when? This is a question to answer while you are in the design phase.
CMS (Concurrent Mark Sweep) is to be used when you have more memory and a high number of CPUs, and the application demands short pauses that do not affect its performance. It is known as a low-latency collector, which made it a favoured choice for such applications. Note that CMS was deprecated in JDK 9 and removed in JDK 14, so on current JVMs this role is filled by other low-pause collectors such as G1.
The Parallel Collector, on the other hand, is to be used when there is less memory or a lower number of available CPUs, and the application demands high throughput and can withstand long pauses.
G1 GC, or Garbage First GC, straddles the boundary between the young and tenured generations, as it divides the heap into many regions, and during a GC it can collect a subset of those regions, which may then act as young generation in the next cycle. It has more predictable (tunable) GC pauses, and it achieves concurrency and parallelism together. It also utilizes the heap more efficiently.
One final point that always creates confusion: the finalize() method gets called only when the GC is about to clear an object, and there is no certainty about when the GC occurs, so any resources the object holds until then will keep occupying memory. If you resurrect the object inside finalize() (for example, by storing a reference to it somewhere reachable), the object is not garbage collected this time. But finalize() is called once and only once per object, so the next time the object becomes unreachable, finalize() will not be called and the object will simply be cleared.
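The resurrection scenario can be sketched as follows. Treat it as an illustration only: finalize() has been deprecated since Java 9, GC timing is not guaranteed, and on some JVM configurations finalizers may not run at all, so the printed values can differ between runs. The class name Resurrected and its fields are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: finalize() is deprecated and should not be used in new code.
class Resurrected {
    // A static field is reachable, so storing "this" here brings the object back to life.
    static volatile Resurrected saved;
    static final AtomicInteger FINALIZE_CALLS = new AtomicInteger();

    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() {
        FINALIZE_CALLS.incrementAndGet();
        saved = this; // the object is reachable again and will not be collected this time
    }

    public static void main(String[] args) throws InterruptedException {
        new Resurrected();  // unreachable immediately after creation
        System.gc();        // only a request; the JVM may ignore it
        Thread.sleep(200);  // give the finalizer thread a chance to run
        System.out.println("Resurrected after first GC? " + (saved != null));

        saved = null;       // drop the object a second time
        System.gc();
        Thread.sleep(200);
        // finalize() is not invoked again for the same object,
        // so this count never exceeds 1 for the single object created above.
        System.out.println("finalize() calls: " + FINALIZE_CALLS.get());
    }
}
```

For releasing resources deterministically, try-with-resources with AutoCloseable is the usual alternative, since it does not depend on when (or whether) the GC runs.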
Hope this helped you, folks. I’ll be doing a series of simple explainers like this one; suggestions are most welcome.