Sunday, May 29, 2016

Java Virtual Machine Internals - Master it in 30 mins

1) What is the JVM (Java Virtual Machine)?

A Java Virtual Machine (JVM) is the runtime environment required to execute a Java application. The JVM is defined by an abstract specification, and every Java application runs inside a runtime instance of some concrete implementation of that specification.


Benefits of the JVM :


-> Java code is compiled into an intermediate form called bytecode. The JVM is then responsible for executing this bytecode. The JVM acts as the intermediary layer and handles the OS specific details, which means that as developers we rarely need to worry about them.



-> The other benefit of the JVM is that any language that can compile down to byte code can run on it, not just java.  Languages like Groovy, Scala and Clojure are all JVM based languages.

-> The separation from the real hardware also means the code is sandboxed, limiting the amount of damage it can do to a host computer.  Security is a great benefit of the JVM.


A JVM has the following sub components as shown in the below image: 


1)Class Loader Sub System

2)Run Time Data Area
3)Native Methods Interface
4)Native Method Library
5)Execution Engine.





















Let's discuss each in detail - 

1). Class Loader Sub System :


It does three major tasks :
-> Loading : reads the .class files (classes and interfaces) into memory.
-> Verification : checks that the bytecode instructions are well formed and safe to run.
-> Preparation : allocates memory for the class (static) variables and sets them to the standard default values (0, false, null). The values written in the source code are assigned later, when the class is initialized.
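
The default values given to static fields can actually be observed. Here is a minimal sketch (the class and field names are made up for illustration): the initializer of 'first' runs before 'second' has been assigned, so it sees the default value rather than 42.

```java
// Hypothetical demo class; all names are illustrative only.
class PreparationDemo {
    // Initializers run in textual order. When peek() runs, 'second' has not
    // been assigned yet, so it still holds the default value for int.
    static int first = peek();
    static int second = 42;

    static int peek() {
        return second;
    }

    public static void main(String[] args) {
        System.out.println("first  = " + first);   // prints "first  = 0"
        System.out.println("second = " + second);  // prints "second = 42"
    }
}
```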

Read about the execution order here.

There are several kinds of class loaders. The bootstrap class loader loads the JDK's internal classes, such as the java.* packages. The extensions class loader loads the jar files from the JDK extensions directory, usually the lib/ext directory of the JRE. The system class loader loads classes from the system classpath. (From Java 9 onward the extension mechanism is gone and a platform class loader takes the place of the extensions class loader.) On top of these, an application can define its own class loaders to customize class loading, for example to load classes from an untrusted source. User-defined class loaders are not an intrinsic part of the JVM: they are written in Java, compiled to class files, loaded into the JVM and instantiated like any other object.
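
To see the class loader chain on your own machine, something like the following sketch can help. The class name ClassLoaderDemo is made up; the loader names printed depend on the Java version, so no exact output is shown.

```java
// Hypothetical demo class; walks the class loader chain of the running program.
class ClassLoaderDemo {
    // The bootstrap class loader is represented as null in the Java API.
    static String describe(ClassLoader loader) {
        return loader == null ? "bootstrap (represented as null)" : loader.toString();
    }

    public static void main(String[] args) {
        // Core classes such as String come from the bootstrap class loader.
        System.out.println("String loaded by: " + describe(String.class.getClassLoader()));

        // Application classes come from the system class loader; follow its parents upward.
        ClassLoader loader = ClassLoaderDemo.class.getClassLoader();
        while (loader != null) {
            System.out.println("  " + describe(loader));
            loader = loader.getParent();
        }
        System.out.println("  " + describe(null));
    }
}
```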


A magic number is a unique identifier that lets tools quickly tell class files apart from other files. The first four bytes of every Java class file hold the magic value 0xCAFEBABE.
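
The magic value is easy to check yourself. The sketch below (MagicNumberDemo and readMagic are made-up names) reads the first four bytes of the class file of java.lang.String. It assumes a top-level class whose class file can be found as a resource.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; reads the first four bytes of a class file.
class MagicNumberDemo {
    // Assumes 'type' is a top-level class whose .class file is available as a resource.
    static int readMagic(Class<?> type) {
        String resource = type.getSimpleName() + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            // readInt() reads four bytes in big-endian order, which is how class files are laid out.
            return new DataInputStream(in).readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.printf("0x%08X%n", readMagic(String.class)); // prints 0xCAFEBABE
    }
}
```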



2). Run time data areas -

-> Method area - It stores information about the loaded types, run time constants, static fields with their values, and the code for methods and constructors. The JVM specification describes it as logically part of the heap, but implementations such as HotSpot keep it apart from the ordinary object heap. Reflection operations work on the data in this area. All threads share the same method area, so access to its data structures must be thread-safe. If two threads are attempting to find a class named Lava, for example, and Lava has not yet been loaded, only one thread should be allowed to load it while the other one waits.

What is stored in the method area -

  • Type Information :
    1. The fully qualified name of the type(class / interface).
    2. The fully qualified name of the type's direct super class (unless the type is an interface or class java.lang.Object, neither of which have a super class).
    3. Whether or not the type is a class or an interface.
    4. The type's modifiers (public, abstract, final).
    5. An ordered list of the fully qualified names of any direct super interfaces.
  • The constant pool for the type :
    • A constant pool is an ordered set of the constants used by the type, such as integer and floating point literals, string literals, and symbolic references to types, fields and methods.
  • Field Information:
    • The information about each field (name, type, access modifiers) and the order in which the fields are declared.
  • Method information:
    • For each declared method, its information (name, return type, number and types (in order) of the method's parameters, access modifiers) and the order in which the methods are declared.
    • In addition, the method's bytecode and its exception table are also stored in the method area.
  • Static class variables :
    • Class variables are shared among all instances of a class and can be accessed even in the absence of any instance. Before a Java virtual machine uses a class, it must allocate memory from the method area for each non-final class variable declared in the class.
    • Final class variables that are compile-time constants have a copy in the constant pool of every type that uses them.
  • A reference to the class's ClassLoader -
    • For each type it loads, a Java virtual machine must keep track of whether or not the type was loaded via the bootstrap class loader or a user-defined class loader. For those types loaded via a user-defined class loader, the virtual machine must store a reference to the user-defined class loader that loaded the type. This information is stored as part of the type's data in the method area.
    • The virtual machine uses this information during dynamic linking. When one type refers to another type, the virtual machine requests the referenced type from the same class loader that loaded the referencing type.
  • A reference to the class's Class instance -
    • An instance of class java.lang.Class is created by the Java virtual machine for every type it loads. The virtual machine must in some way associate a reference to the Class instance for a type with the type's data in the method area.
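
Much of the type information listed above can be read back through reflection. A small sketch, with TypeInfoDemo and summarize as made-up names, could look like this:

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;

// Hypothetical demo class; prints the kind of type data the JVM keeps for a loaded type.
class TypeInfoDemo {
    static String summarize(Class<?> type) {
        StringBuilder sb = new StringBuilder();
        sb.append("name: ").append(type.getName()).append('\n');
        sb.append("kind: ").append(type.isInterface() ? "interface" : "class").append('\n');
        sb.append("modifiers: ").append(Modifier.toString(type.getModifiers())).append('\n');
        // Interfaces and java.lang.Object have no superclass.
        Class<?> superclass = type.getSuperclass();
        sb.append("superclass: ").append(superclass == null ? "none" : superclass.getName()).append('\n');
        for (Class<?> iface : type.getInterfaces()) {
            sb.append("superinterface: ").append(iface.getName()).append('\n');
        }
        sb.append("declared fields: ").append(type.getDeclaredFields().length).append('\n');
        sb.append("declared methods: ").append(type.getDeclaredMethods().length).append('\n');
        // null here means the bootstrap class loader.
        sb.append("class loader: ").append(type.getClassLoader()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summarize(ArrayList.class));
    }
}
```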
-> Heap - Objects are created and stored here. In a generational collector such as HotSpot's, the heap is divided into the following parts.
  1. New Generation : In most applications the majority of objects are short lived. Examining every object in the application on each GC would be slow, so newly created objects are kept apart in the New Generation, where the short lived ones can be collected quickly. A GC in the New Generation is termed a minor GC. The New Generation is further split into :
    1. Eden Space: All new objects are placed here. When it becomes full, a minor GC occurs and the objects that are still referenced are moved to a survivor space.
    2. Survivor Space: Each minor GC that an object survives increments that object's age. When an object has survived a sufficient number of minor GCs (the HotSpot default threshold is 15), it is promoted to the Old Generation. HotSpot uses two survivor spaces and copies the surviving objects from one to the other on each minor GC.
  2. Old Generation : Objects that survive enough minor GCs in the New Generation are promoted to the Old Generation. It is usually much larger than the New Generation. A GC of the Old Generation is termed a major GC, and a collection of the whole heap a full GC. With the older collectors these are stop-the-world events.
  3. PermGen - The permanent generation stores metadata about the loaded classes (the method area described above). It is not part of the object heap proper, and from Java 8 onward HotSpot replaced it with Metaspace, which lives in native memory.

One of the benefits of using generations is that it reduces the impact of fragmentation. When an object is garbage collected, it leaves a gap in the memory where it was. We can either compact the remaining objects (a stop-the-world operation, described below) or leave them in place and slot new objects into the gaps. However, if we do not compact, a new object may not fit into any of the gaps even though enough memory is free in total.
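
The generations show up as memory pools that a running JVM reports about itself. The sketch below lists them (MemoryPoolDemo and describePools are made-up names). The pool names depend on the JVM and the collector in use, so treat whatever it prints as specific to your setup.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; lists the memory pools of the running JVM.
class MemoryPoolDemo {
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String kind = pool.getType() == MemoryType.HEAP ? "heap" : "non-heap";
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            String used = usage == null ? "n/a" : (usage.getUsed() / 1024) + " KB used";
            lines.add(pool.getName() + " [" + kind + "]: " + used);
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```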

Let's talk about some garbage collection now :
  • Java has the concept of a garbage collector. When objects are no longer reachable, the JVM automatically identifies them and reclaims their memory for us.
    • Positives : The developer can worry much less about memory management and concentrate on actual problem solving. The GC has a lot of smart algorithms for memory management which work automatically in the background.
    • Negatives : GC affects application performance, slowing the application down or stopping it altogether. The latter is the so called "stop the world" pause: while such a collection runs, all application threads are paused. For most applications long pauses are not acceptable.
  • Garbage Collection Algorithms :
    • Serial - Designed for single CPU systems; the entire application is stopped whilst GC occurs. It uses mark-sweep-compact: it goes through the objects and marks the ones that are still reachable, clears out the unmarked ones, and then moves the surviving objects into contiguous space (so there is no fragmentation).
    • Parallel - Similar to Serial, except that it uses multiple threads to perform the GC, so it should be faster on multi-core machines.
    • Concurrent Mark and Sweep (CMS) - This minimizes pauses by doing most of the GC related work concurrently with the application, reducing the total time the application has to stop for garbage collection. CMS is a non compacting algorithm, which can lead to fragmentation problems.
    • G1GC (garbage first garbage collector) - A concurrent parallel collector that is viewed as the long term replacement for CMS and does not suffer from the same fragmentation problems as CMS.
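
To find out which collectors your JVM is actually running, you can ask it. A rough sketch follows (CollectorDemo and describeCollectors are made-up names). The collector names and counts vary with the JVM version and the flags it was started with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; reports the active collectors and how often they have run.
class CollectorDemo {
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are -1 when the collector does not report them.
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```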
-> Stack - Each Java stack is private to one thread. Every thread has its own program counter (PC) and its own Java stack. The stack consists of stack frames. When a thread invokes a method, the JVM pushes a new frame onto that thread's Java stack. The arguments, local variables, intermediate computations and the return value (if any) of the invoked method are kept in its frame. The frame on the top of the stack is called the active stack frame, which is the current place of execution. When the method completes, the virtual machine pops and discards the frame for that method.
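
Frames being pushed and popped can be made visible by counting them. In this sketch (all names are made up) the helper that calls one level deeper sees one more frame, and every recursive call of sumTo gets a frame with its own copy of n.

```java
// Hypothetical demo class; counts frames on the current thread's stack.
class StackDepthDemo {
    // Number of frames currently on the calling thread's stack, including this method's frame.
    static int currentDepth() {
        return new Throwable().getStackTrace().length;
    }

    // Pushes one extra frame before measuring.
    static int oneLevelDeeper() {
        return currentDepth();
    }

    // Each recursive call pushes a new frame holding its own 'n'; frames pop as calls return.
    static int sumTo(int n) {
        return n == 0 ? 0 : n + sumTo(n - 1);
    }

    public static void main(String[] args) {
        System.out.println("depth here:        " + currentDepth());
        System.out.println("depth one deeper:  " + oneLevelDeeper());
        System.out.println("sumTo(100) = " + sumTo(100)); // prints "sumTo(100) = 5050"
    }
}
```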

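-> PC Registers - Each thread has its own pc (program counter) register. It holds the address of the JVM instruction that the thread is currently executing. If the thread is executing a native method, the value is undefined.
-> Native Method Stacks - These are the stacks used to execute native methods, i.e. methods written in a language other than Java, such as C or C++.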

3). Native Method Interfaces - 

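This is the interface (JNI, the Java Native Interface) that connects the JVM with the native method libraries so that native methods can be executed.

4). Native Method Library - It holds the native libraries (typically written in C or C++) that native methods are loaded from.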


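5). Execution Engine - It contains the interpreter and the JIT (just-in-time) compiler. The interpreter executes bytecode instruction by instruction, while the JIT compiler translates frequently executed ("hot") code into native machine code. The JVM profiles the running program to decide which parts of the code to keep interpreting and which parts to JIT compile.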
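
If you are curious which JIT compiler your JVM uses, the management API can tell you. This is only a sketch (JitInfoDemo and describeJit are made-up names), and a JVM without a compilation system reports nothing here.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical demo class; reports the JIT compiler of the running JVM, if any.
class JitInfoDemo {
    static String describeJit() {
        // Returns null when the JVM has no compilation system.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            return "no JIT compiler reported by this JVM";
        }
        String text = "JIT compiler: " + jit.getName();
        if (jit.isCompilationTimeMonitoringSupported()) {
            text += ", total compilation time so far: " + jit.getTotalCompilationTime() + " ms";
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(describeJit());
    }
}
```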


------------------------------------------------------------------


Tuning the JVM and Garbage Collection : 

In the boolean options below, a plus sign after -XX: switches the feature on and a minus sign switches it off.

-XX:+UseConcMarkSweepGC: Use the CMS collector for the old gen. (CMS was deprecated in Java 9 and removed in Java 14.)

-XX:+UseParallelGC: Use Parallel GC for New Gen

-XX:+UseParallelOldGC: Use Parallel GC for Old and New Gen.

-XX:+HeapDumpOnOutOfMemoryError: Create a heap dump when the application runs out of memory. Very useful for diagnostics.

-XX:+PrintGCDetails: Log out details of Garbage Collection.

-Xms512m: Sets the initial heap size to 512m

-Xmx1024m: Sets the maximum heap size to 1024m

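-XX:NewSize and -XX:MaxNewSize: Specifically set the initial and max size of the New Generation

-XX:NewRatio=3: Set the ratio of the Old Generation size to the New Generation size. With a value of 3 the Old Generation is three times as large as the New Generation, so the New Generation gets a quarter of the heap.

-XX:SurvivorRatio=10: Set the size of Eden space relative to the size of a survivor space. With a value of 10, Eden is ten times as large as one survivor space.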
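
A quick way to confirm that the heap options took effect is to print what the running JVM reports. A minimal sketch, with HeapSettingsDemo and toMb as made-up names:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical demo class; shows the heap limits and options the JVM was started with.
class HeapSettingsDemo {
    static long toMb(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects -Xmx (or the JVM's default if it was not given).
        System.out.println("max heap         : " + toMb(rt.maxMemory()) + " MB");
        System.out.println("currently claimed: " + toMb(rt.totalMemory()) + " MB");
        System.out.println("free in claimed  : " + toMb(rt.freeMemory()) + " MB");

        // The options passed to the JVM itself, e.g. -Xms512m or -XX:+UseParallelGC.
        List<String> jvmOptions = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM options      : " + jvmOptions);
    }
}
```

Running it once with -Xmx1024m and once without should show the difference in the first line. The reported maximum can be slightly lower than the value passed, depending on the collector.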

For more info on JVM analysis and debugging click here .
-------------------------------------------------------------------
Some other concepts to understand - 

-> The garbage collector never reclaims an object that is still reachable through a strong reference.
-> An object that is only softly reachable is reclaimed only when memory is low.
-> An object that is only weakly reachable is reclaimed on the next garbage collection cycle.
-> A phantom reference is enqueued after the object has been finalized but before its memory is reclaimed. It can be useful when you want to be notified that an object is about to be collected.

-> Other GC techniques include reference counting, tracing, compacting and copying.
-> The JVM's own classes and method objects live in the permanent generation. Classes loaded by the bootstrap class loader are never unloaded; other classes can be unloaded only once their class loader becomes unreachable.
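
The difference between strong and weak references can be tried out with a few lines. This is a sketch with made-up names; note that System.gc() is only a request, so whether the weak reference is cleared in main is not guaranteed.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class; contrasts strongly and weakly reachable objects.
class ReferenceDemo {
    // While a strong reference exists, the weak reference is never cleared.
    static boolean survivesWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc(); // only a hint to the JVM
        return weak.get() == strong; // 'strong' is still in use here, so the object is reachable
    }

    public static void main(String[] args) {
        System.out.println("survives while strongly held: " + survivesWhileStronglyHeld());

        Object target = new Object();
        WeakReference<Object> weak = new WeakReference<>(target);
        target = null; // drop the only strong reference
        System.gc();   // the collector may now clear the weak reference
        System.out.println(weak.get() == null ? "weak reference cleared" : "object still present");
    }
}
```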

---- Thread Synchronizations ----
  • The JVM associates a lock with every object and every class to coordinate access from multiple threads. A lock is like a token or privilege that only one thread can "possess" at any one time. When a thread wants to lock a particular object or class, it asks the JVM. The JVM may grant the lock very soon, later, or never. When the thread no longer needs the lock, it returns it to the JVM.
  • While a thread holds a lock, no other thread can acquire that lock, and so cannot enter code guarded by it, until the owning thread releases it. The JVM uses locks in conjunction with monitors. A monitor is basically a guardian that watches over a sequence of code, making sure only one thread at a time executes it.
  • A single thread is allowed to lock the same object multiple times. The JVM maintains a count of the number of times the object has been locked. An unlocked object has a count of zero. When a thread acquires the lock for the first time, the count is incremented to one. Each time the same thread acquires the lock on the same object again, the count is incremented. Each time the thread releases the lock, the count is decremented. When the count reaches zero, the lock is released and made available to other threads.
  • For a synchronized instance method, the JVM acquires the lock associated with the object upon which the method is being invoked. For a synchronized class (static) method, it acquires the lock associated with the class to which the method belongs. After a synchronized method completes, whether by returning or by throwing an exception, the lock is released.
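
The locking rules above can be sketched in a few lines. MonitorDemo and its methods are made-up names. The counter shows mutual exclusion between two threads, and outer() calling inner() shows that one thread may take the same lock twice.

```java
// Hypothetical demo class; shows mutual exclusion and reentrant locking with synchronized.
class MonitorDemo {
    private int count;

    // Both methods lock on 'this', so only one thread at a time can run either of them.
    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    // Reentrancy: the thread already holds the lock on 'this' when it enters inner().
    synchronized boolean outer() {
        return inner();
    }

    synchronized boolean inner() {
        return Thread.holdsLock(this);
    }

    // Two threads increment the same counter; synchronization prevents lost updates.
    static int runTwoThreads(int incrementsPerThread) {
        MonitorDemo demo = new MonitorDemo();
        Runnable work = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                demo.increment();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return demo.get();
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads(10_000));     // prints 20000
        System.out.println(new MonitorDemo().outer()); // prints true
    }
}
```

Without the synchronized keyword on increment(), the two threads could overwrite each other's updates and the total would often come out lower.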

I hope that after reading this blog your JVM internals concepts are clear. If anything in this article is missing or wrong, I would like to hear about it in the comments.

Happy Learning.
