Introduction to the Java Virtual Machine

When we talk about the Java Virtual Machine (JVM), we can mean three different things: a specification, an implementation, or an instance. Let's look at each in more detail. The specification describes an abstract machine and deliberately leaves out implementation details so that implementations stay interoperable. Based on the specification, an implementation can be created for almost any platform, so the same Java bytecode can be executed on Windows or Linux. Implementations differ not only by platform; they can also differ in performance, vendor, or additional functionality. To run a Java application we need an instance of a JVM implementation. A new instance is created every time an application starts and is destroyed when the application completes. So if we run three applications, three instances are created.
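
To make the three meanings concrete, a running JVM can be asked which specification it implements, which implementation it is, and which process (instance) it lives in. This is only a small sketch: the class name JvmIdentity is made up for illustration, the property keys are the standard system properties, and ProcessHandle assumes Java 9 or later.

```java
class JvmIdentity {
    // Name and version of the JVM specification this runtime implements.
    static String specification() {
        return System.getProperty("java.vm.specification.name")
                + " " + System.getProperty("java.vm.specification.version");
    }

    // Name and vendor of the concrete JVM implementation.
    static String implementation() {
        return System.getProperty("java.vm.name")
                + " (" + System.getProperty("java.vm.vendor") + ")";
    }

    // Process id of this particular JVM instance (requires Java 9+).
    static long instancePid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        System.out.println("Specification:  " + specification());
        System.out.println("Implementation: " + implementation());
        System.out.println("Instance (pid): " + instancePid());
    }
}
```

Starting the program several times at once should print a different process id for each run, one per JVM instance, while the specification and implementation lines stay the same.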

According to the specification, the Java Virtual Machine is described in terms of subsystems, run-time data areas, and so on:
[Figure: JVM internals]

Classloader

The JVM uses a classloader to load Java classes into the running JVM. Classes are loaded on demand: the class file of the program being executed is loaded first, and other classes and interfaces are loaded as they are referenced by the bytecode being executed.

In the diagram above the classloader is shown as a single block, but it may consist of three different classloaders: the bootstrap classloader, the extensions classloader, and the system classloader. The directories listed below reflect the layout of Java 8 and earlier.

  1. The bootstrap classloader loads the Java libraries located in the <JAVA_HOME>/jre/lib directory.
  2. The extensions classloader loads classes from the extensions directory, <JAVA_HOME>/jre/lib/ext.
  3. The system classloader loads classes found on the CLASSPATH.

Besides these three classloaders, you can create your own user-defined classloaders in an application. The figure below shows the classloader hierarchy.

[Figure: classloader hierarchy]

All classloaders except the bootstrap classloader (which is usually written in native code) have a parent and extend the abstract class java.lang.ClassLoader. In this hierarchy a classloader delegates loading to its parent before attempting to load a class itself.
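
The parent chain can be inspected from inside a program. The sketch below (the class name LoaderChain is invented for this post) walks from a class's loader up through getParent(). It assumes the common convention that the bootstrap classloader is represented as null, which the API permits but does not require of every implementation.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    // Collects the loader of the given class and all of its ancestors.
    static List<String> chainOf(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        // Reaching null means we arrived at the bootstrap classloader.
        chain.add("bootstrap");
        return chain;
    }

    public static void main(String[] args) {
        System.out.println("LoaderChain: " + chainOf(LoaderChain.class));
        System.out.println("String:      " + chainOf(String.class));
    }
}
```

An application class should report at least one loader before the bootstrap entry, while a core class such as String is expected to report the bootstrap entry alone. The concrete loader class names depend on the Java version.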

Here is an example of a user-defined classloader, borrowed from another source:

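import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

class NetworkClassLoader extends ClassLoader {
    String host;
    int port;

    // Called by loadClass() only after the parent loaders failed to find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] b = loadClassData(name);
        return defineClass(name, b, 0, b.length);
    }

    private byte[] loadClassData(String name) throws ClassNotFoundException {
        // Load the class data from the connection. Fetching it over plain HTTP
        // from /<package path>/<Class>.class is an assumption; requires Java 9+.
        String path = "/" + name.replace('.', '/') + ".class";
        try (InputStream in = new URL("http", host, port, path).openStream()) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}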

User-defined classloaders are useful when you need to change the way bytecode is loaded. For instance, the bytecode can be loaded over HTTP, it can be stored encrypted and decrypted on load, or it can be modified as it is loaded.
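
As a rough illustration of the encrypted case, the sketch below keeps obfuscated class bytes in a map and decodes them just before defining the class. Everything here is hypothetical: the name XorClassLoader, the in-memory map, and the single-byte XOR "encryption", which is far too weak for real use and only stands in for a proper cipher.

```java
import java.util.Map;

class XorClassLoader extends ClassLoader {
    private final Map<String, byte[]> encrypted; // class name -> obfuscated class-file bytes
    private final byte key;

    XorClassLoader(Map<String, byte[]> encrypted, byte key, ClassLoader parent) {
        super(parent);
        this.encrypted = encrypted;
        this.key = key;
    }

    // XOR is its own inverse, so the same method encodes and decodes.
    static byte[] xor(byte[] data, byte key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key);
        }
        return out;
    }

    // Reached only when the parent loaders could not find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] stored = encrypted.get(name);
        if (stored == null) {
            throw new ClassNotFoundException(name);
        }
        byte[] plain = xor(stored, key);
        return defineClass(name, plain, 0, plain.length);
    }
}
```

Only findClass is overridden, so the usual parent-first delegation in loadClass stays intact and the custom decoding applies only to classes that no parent can supply.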

Other key points about classloaders:

  1. A class is loaded only once per classloader. If the class has already been loaded, the classloader returns it from its cache.
  2. Each classloader has its own cache, or namespace.
  3. If a class is not found, a java.lang.ClassNotFoundException is thrown.
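
The first and third points can be checked with a few lines of code. This is a minimal sketch: the class name LoadOnce and the deliberately missing class name no.such.pkg.Missing are made up for the example.

```java
class LoadOnce {
    // Asking the same loader for the same class twice yields the same Class object.
    static boolean sameClassTwice() throws ClassNotFoundException {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        Class<?> first = loader.loadClass("java.util.ArrayList");
        Class<?> second = loader.loadClass("java.util.ArrayList");
        return first == second;
    }

    // Asking for a class that does not exist ends in ClassNotFoundException.
    static boolean missingClassThrows() {
        try {
            Class.forName("no.such.pkg.Missing");
            return false;
        } catch (ClassNotFoundException e) {
            return true;
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        System.out.println("Same Class object twice: " + sameClassTwice());
        System.out.println("Missing class throws:    " + missingClassThrows());
    }
}
```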

Method Area

When a class is loaded by a classloader, the JVM stores information about the class in the method (class) area. The method area holds per-class structures such as the run-time constant pool, field and method data, and the code for methods and constructors.

How per-class structures are stored in the method area depends on the implementation. An implementation usually balances fast access against the size of the method area.

The method area is shared among all threads of the application. Per-class structures can be removed only through the classloader: an implementation may unload a class once its defining classloader becomes unreachable.

Heap

The heap is shared among all threads of the application and contains class instances and arrays. How the heap stores objects depends on the implementation. For instance, in the Oracle HotSpot JVM the heap is divided into generations, and there are different types of garbage collection.
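
How a particular JVM organizes its heap can be observed through the standard management API. The sketch below lists the memory pools that the running JVM reports as heap. The class name HeapPools is invented here, and the pool names you see depend entirely on the implementation and the selected garbage collector, so none are hard-coded.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class HeapPools {
    // Upper bound on the heap size this JVM will attempt to use, in bytes.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Names of the memory pools the JVM classifies as heap.
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Max heap (bytes): " + maxHeapBytes());
        System.out.println("Heap pools:       " + heapPoolNames());
    }
}
```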

I’m going to create a separate post about HotSpot heap. It will be ready soon.

Stack

When a thread invokes a Java method, the JVM creates a frame and pushes it onto the thread's stack. Once the method completes, the JVM removes the frame from the stack. Each frame includes local variables, an operand stack, and frame data.
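
The push-per-call behaviour can be observed indirectly through stack traces. In this sketch (FrameDepth is a made-up name) each extra recursive call should add one more entry to the trace. Note that the API allows a JVM to omit frames from a stack trace in some circumstances, so treat the numbers as an observation rather than a guarantee.

```java
class FrameDepth {
    // Number of frames the JVM reports for the current thread right now.
    static int currentDepth() {
        return Thread.currentThread().getStackTrace().length;
    }

    // Each recursive call pushes one more frame before the depth is measured.
    static int depthAfter(int extraCalls) {
        return extraCalls == 0 ? currentDepth() : depthAfter(extraCalls - 1);
    }

    public static void main(String[] args) {
        int base = depthAfter(0);
        int deeper = depthAfter(5);
        System.out.println("Frames added by 5 nested calls: " + (deeper - base));
    }
}
```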

Local variables are stored in an array of words, so an instruction that uses a local variable accesses it by index. Values of type int, float, reference, and returnAddress occupy one entry in the array; long and double occupy two consecutive entries. Values of type byte, short, and char are converted to int before being stored. An instruction that refers to a two-entry value uses the index of the first entry. In addition, for non-static methods local variable 0 holds a reference to the object on which the method was invoked (the this reference).
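
The slot layout for method parameters follows directly from these rules. The annotated sketch below uses an invented class, LocalSlots, with the slot numbers worked out by hand in the comments. Compiling it and running javap -c on the result should show the matching load instructions, for example lload_0 and lload_2 in the static method.

```java
class LocalSlots {
    // Instance method, so local variable 0 holds `this`.
    // The parameters follow in declaration order:
    //   slot 1     -> a (int, one entry)
    //   slots 2-3  -> b (long, two entries, addressed by index 2)
    //   slot 4     -> c (short, stored as an int)
    //   slots 5-6  -> d (double, two entries, addressed by index 5)
    double sum(int a, long b, short c, double d) {
        return a + b + c + d;
    }

    // Static method: there is no `this`, so x occupies slots 0-1 and y slots 2-3.
    static long add(long x, long y) {
        return x + y;
    }

    public static void main(String[] args) {
        System.out.println(new LocalSlots().sum(1, 2L, (short) 3, 4.0));
        System.out.println(add(2L, 3L));
    }
}
```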

The operand stack holds the arguments and results of virtual machine instructions. For example, the iadd instruction adds two integers and expects both of them to be on top of the stack.

iload_0     // Pushes the int value from local variable 0 onto the stack.
iload_1     // Pushes the int value from local variable 1 onto the stack.
iadd        // Pops these two values, adds them, and pushes the result
            // back onto the stack.
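
The same three instructions can be mimicked in plain Java to see how the values move. This is only a simulation under simple assumptions: an int array stands in for the local variables, an ArrayDeque for the operand stack, and the class name OperandStackDemo is invented. A real JVM does not execute bytecode this way internally.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class OperandStackDemo {
    static int add(int first, int second) {
        int[] locals = {first, second};               // local variables 0 and 1
        Deque<Integer> operands = new ArrayDeque<>(); // the frame's operand stack

        operands.push(locals[0]);                     // iload_0
        operands.push(locals[1]);                     // iload_1

        int right = operands.pop();                   // iadd: pop both operands,
        int left = operands.pop();
        operands.push(left + right);                  //       then push the sum

        return operands.pop();                        // the value ireturn would hand back
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```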

Frame data includes a reference to the constant pool, the data needed to handle normal or abrupt method completion, and the exception table.

Program Counter Register

Whenever a thread is started, the JVM creates a program counter (PC) register for it. The PC register refers to the instruction currently being executed by the thread. If the method currently being executed is native, the value of the PC register is undefined.

Native Method Call

The native method stack supports native methods, which are typically written in languages other than Java, and it also contains frames. It is usually allocated per thread, and overflowing it may result in a StackOverflowError.

Execution Engine

Bytecode is the machine code of the JVM, and the execution engine is responsible for executing it. As we already know, when the JVM loads a class file it extracts a stream of bytecodes for every method in the class into the method area.

The stream of bytecodes is a sequence of instructions, where each instruction consists of an opcode followed by zero or more operands. Below is an example of Java code compiled into bytecode:

Java code
public class ByteCode {

    public static final String GREETING = "Hello world!";

    public static void main(String[] args) {
        ByteCode bt = new ByteCode();
        bt.sayHello();
    }

    public void sayHello(){
        try {
            System.out.println(GREETING);
        } catch (ClassCastException e) {
            System.out.println("Error description 1");
        } catch (Exception e) {
            System.out.println("Error description 2");
        }
    }

    public static void sayBye(){
        String farewell = "Bye!";
        System.out.println(farewell);
    }

}
Bytecode
Compiled from "ByteCode.java"
public class ByteCode {
  public static final java.lang.String GREETING;

  public ByteCode();
    Code:
       0: aload_0       
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return        

  public static void main(java.lang.String[]);
    Code:
       0: new           #2                  // class ByteCode
       3: dup           
       4: invokespecial #3                  // Method "<init>":()V
       7: astore_1      
       8: aload_1       
       9: invokevirtual #4                  // Method sayHello:()V
      12: return        

  public void sayHello();
    Code:
       0: getstatic     #5                  // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc           #6                  // String Hello world!
       5: invokevirtual #7                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: goto          32
      11: astore_1      
      12: getstatic     #5                  // Field java/lang/System.out:Ljava/io/PrintStream;
      15: ldc           #9                  // String Error description 1
      17: invokevirtual #7                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      20: goto          32
      23: astore_1      
      24: getstatic     #5                  // Field java/lang/System.out:Ljava/io/PrintStream;
      27: ldc           #11                 // String Error description 2
      29: invokevirtual #7                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      32: return        
    Exception table:
       from    to  target type
           0     8    11   Class java/lang/ClassCastException
           0     8    23   Class java/lang/Exception

  public static void sayBye();
    Code:
       0: ldc           #12                 // String Bye!
       2: astore_0      
       3: getstatic     #5                  // Field java/lang/System.out:Ljava/io/PrintStream;
       6: aload_0       
       7: invokevirtual #7                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      10: return        
}
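
To show what "an opcode followed by zero or more operands" means at the byte level, here is a small decoder for the main method from the listing above. It is a hand-made sketch: MiniDisassembler is an invented name, the byte array was written out by hand from the listing using the opcode values defined in the JVM specification, and only the seven opcodes that occur in main are supported.

```java
import java.util.ArrayList;
import java.util.List;

class MiniDisassembler {
    // The code of main(), one instruction per row.
    static final int[] MAIN_CODE = {
        0xbb, 0x00, 0x02,   // new #2
        0x59,               // dup
        0xb7, 0x00, 0x03,   // invokespecial #3
        0x4c,               // astore_1
        0x2b,               // aload_1
        0xb6, 0x00, 0x04,   // invokevirtual #4
        0xb1                // return
    };

    static List<String> decode(int[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            String name;
            boolean hasIndex = false; // true when a 2-byte constant pool index follows
            switch (code[pc]) {
                case 0xbb: name = "new"; hasIndex = true; break;
                case 0xb7: name = "invokespecial"; hasIndex = true; break;
                case 0xb6: name = "invokevirtual"; hasIndex = true; break;
                case 0x59: name = "dup"; break;
                case 0x4c: name = "astore_1"; break;
                case 0x2b: name = "aload_1"; break;
                case 0xb1: name = "return"; break;
                default:
                    throw new IllegalArgumentException(
                            "unsupported opcode 0x" + Integer.toHexString(code[pc]));
            }
            if (hasIndex) {
                // The operand is a big-endian unsigned 16-bit index.
                int index = (code[pc + 1] << 8) | code[pc + 2];
                lines.add(pc + ": " + name + " #" + index);
                pc += 3;
            } else {
                lines.add(pc + ": " + name);
                pc += 1;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        decode(MAIN_CODE).forEach(System.out::println);
    }
}
```

The offsets it prints (0, 3, 4, 7, 8, 9, 12) line up with the offsets in the listing, because the three instructions that carry a constant pool index are three bytes long and the rest are one byte.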

How exactly the execution engine executes bytecode again depends on the JVM implementation. It can use an interpreter, a just-in-time compiler, or adaptive optimization.
