Java performance

In software development, the programming language Java was historically considered slower than the fastest 3rd generation typed languages such as C and C++. The main reason being a different language design, where after compiling, Java programs run on a Java virtual machine rather than directly on the computer's processor as native code, as do C and C++ programs. Performance was a matter of concern because much business software has been written in Java after the language quickly became popular in the late 1990s and early 2000s.
Since the late 1990s, the execution speed of Java programs improved significantly via introduction of just-in-time compilation , the addition of language features supporting better code analysis, and optimizations in the JVM. Hardware execution of Java bytecode, such as that offered by ARM's Jazelle, was also explored to offer significant performance improvements.
The performance of a Java bytecode compiled Java program depends on how optimally its given tasks are managed by the host Java virtual machine, and how well the JVM exploits the features of the computer hardware and operating system in doing so. Thus, any Java performance test or comparison has to always report the version, vendor, OS and hardware architecture of the used JVM. In a similar manner, the performance of the equivalent natively compiled program will depend on the quality of its generated machine code, so the test or comparison also has to report the name, version and vendor of the used compiler, and its activated compiler optimization directives.

Virtual machine optimization methods

Many optimizations have improved the performance of the JVM over time. However, although Java was often the first Virtual machine to implement them successfully, they have often been used in other similar platforms as well.

[|Just-in-time compiling]

Early JVMs always interpreted Java bytecodes. This had a large performance penalty of between a factor 10 and 20 for Java versus C in average applications. To combat this, a just-in-time compiler was introduced into Java 1.1. Due to the high cost of compiling, an added system called HotSpot was introduced in Java 1.2 and was made the default in Java 1.3. Using this framework, the Java virtual machine continually analyses program performance for hot spots which are executed frequently or repeatedly. These are then targeted for optimizing, leading to high performance execution with a minimum of overhead for less performance-critical code.
Some benchmarks show a 10-fold speed gain by this means. However, due to time constraints, the compiler cannot fully optimize the program, and thus the resulting program is slower than native code alternatives.

Adaptive optimizing

Adaptive optimizing is a method in computer science that performs dynamic recompilation of parts of a program based on the current execution profile. With a simple implementation, an adaptive optimizer may simply make a trade-off between just-in-time compiling and interpreting instructions. At another level, adaptive optimizing may exploit local data conditions to optimize away branches and use inline expansion.
A Java virtual machine like HotSpot can also deoptimize code formerly JITed. This allows performing aggressive optimizations, while still being able to later deoptimize the code and fall back to a safe path.

Garbage collection

The 1.0 and 1.1 Java virtual machines used a mark-sweep collector, which could fragment the heap after a garbage collection.
Starting with Java 1.2, the JVMs changed to a generational collector, which has a much better defragmentation behaviour.
Modern JVMs use a variety of methods that have further improved garbage collection performance.

Other optimizing methods

Compressed Oops

Compressed Oops allow Java 5.0+ to address up to 32 GB of heap with 32-bit references. Java does not support access to individual bytes, only objects which are 8-byte aligned by default. Because of this, the lowest 3 bits of a heap reference will always be 0. By lowering the resolution of 32-bit references to 8 byte blocks, the addressable space can be increased to 32 GB. This significantly reduces memory use compared to using 64-bit references as Java uses references much more than some languages like C++. Java 8 supports larger alignments such as 16-byte alignment to support up to 64 GB with 32-bit references.

Split bytecode verification

Before executing a class, the Sun JVM verifies its Java bytecodes. This verification is performed lazily: classes' bytecodes are only loaded and verified when the specific class is loaded and prepared for use, and not at the beginning of the program. However, as the Java class libraries are also regular Java classes, they must also be loaded when they are used, which means that the start-up time of a Java program is often longer than for C++ programs, for example.
A method named split-time verification, first introduced in the Java Platform, Micro Edition, is used in the JVM since Java version 6. It splits the verification of Java bytecode in two phases:

Design-time – when compiling a class from source to bytecode
Runtime – when loading a class.

In practice this method works by capturing knowledge that the Java compiler has of class flow and annotating the compiled method bytecodes with a synopsis of the class flow information. This does not make runtime verification appreciably less complex, but does allow some shortcuts.

Escape analysis and lock coarsening

Java is able to manage multithreading at the language level. Multithreading is a method allowing programs to perform multiple processes concurrently, thus producing faster programs on computer systems with multiple processors or cores. Also, a multithreaded application can remain responsive to input, even while performing long running tasks.
However, programs that use multithreading need to take extra care of objects shared between threads, locking access to shared methods or blocks when they are used by one of the threads. Locking a block or an object is a time-consuming operation due to the nature of the underlying operating system-level operation involved.
As the Java library does not know which methods will be used by more than one thread, the standard library always locks blocks when needed in a multithreaded environment.
Before Java 6, the virtual machine always locked objects and blocks when asked to by the program, even if there was no risk of an object being modified by two different threads at once. For example, in this case, a local was locked before each of the add operations to ensure that it would not be modified by other threads, but because it is strictly local to the method this is needless:

public String getNames

Starting with Java 6, code blocks and objects are locked only when needed, so in the above case, the virtual machine would not lock the Vector object at all.
Since version 6u23, Java includes support for [|escape analysis].

Register allocation improvements

Before Java 6, allocation of registers was very primitive in the client virtual machine, which was a problem in CPU designs which had fewer processor registers available, as in x86s. If there are no more registers available for an operation, the compiler must copy from register to memory, which takes time. However, the server virtual machine used a color-graph allocator and did not have this problem.
An optimization of register allocation was introduced in Sun's JDK 6; it was then possible to use the same registers across blocks, reducing accesses to the memory. This led to a reported performance gain of about 60% in some benchmarks.

[|Class data sharing]

Class data sharing is a mechanism which reduces the startup time for Java applications, and also reduces memory footprint. When the JRE is installed, the installer loads a set of classes from the system JAR file into a private internal representation, and dumps that representation to a file, called a "shared archive". During subsequent JVM invocations, this shared archive is memory-mapped in, saving the cost of loading those classes and allowing much of the JVM's metadata for these classes to be shared among multiple JVM processes.
The corresponding improvement in start-up time is more obvious for small programs.

History of performance improvements

Apart from the improvements listed here, each release of Java introduced many performance improvements in the JVM and Java application programming interface.
JDK 1.1.6: First just-in-time compilation
J2SE 1.2: Use of a generational collector.
J2SE 1.3: Just-in-time compiling by HotSpot.
J2SE 1.4: See , for a Sun overview of performance improvements between 1.3 and 1.4 versions.
Java SE 5.0: Class data sharing
Java SE 6:

Split bytecode verification
Escape analysis and lock coarsening
Register allocation improvements

Other improvements:

Java OpenGL Java 2D pipeline speed improvements
Java 2D performance also improved significantly in Java 6

See also 'Sun overview of performance improvements between Java 5 and Java 6'.

Java SE 6 Update 10

Java Quick Starter reduces application start-up time by preloading part of JRE data at OS startup on disk cache.
Parts of the platform needed to execute an application accessed from the web when JRE is not installed are now downloaded first. The full JRE is 12 MB, a typical Swing application only needs to download 4 MB to start. The remaining parts are then downloaded in the background.
Graphics performance on Windows improved by extensively using Direct3D by default, and use shaders on graphics processing unit to accelerate complex Java 2D operations.
Java 7

Several performance improvements have been released for Java 7:
Future performance improvements are planned for an update of Java 6 or Java 7:

Provide JVM support for dynamic programming languages, following the prototyping work currently done on the Da Vinci Machine,
Enhance the existing concurrency library by managing parallel computing on multi-core processors,
Allow the JVM to use both the client and server JIT compilers in the same session with a method called tiered compiling:
*The client would be used at startup,
*The server would be used for long-term running of the application.
Replace the existing concurrent low-pause garbage collector by a new collector called Garbage First to ensure consistent pauses over time.
Comparison to other languages

Objectively comparing the performance of a Java program and an equivalent one written in another language such as C++ needs a carefully and thoughtfully constructed benchmark which compares programs completing identical tasks. The target platform of Java's bytecode compiler is the Java platform, and the bytecode is either interpreted or compiled into machine code by the JVM. Other compilers almost always target a specific hardware and software platform, producing machine code that will stay virtually unchanged during execution. Very different and hard-to-compare scenarios arise from these two different approaches: static vs. dynamic compilations and recompilations, the availability of precise information about the runtime environment and others.
Java is often compiled just-in-time at runtime by the Java virtual machine, but may also be compiled ahead-of-time, as is C++. When compiled just-in-time, the micro-benchmarks of The Computer Language Benchmarks Game indicate the following about its performance:

slower than compiled languages such as C or C++,
similar to other just-in-time compiled languages such as C#,
much faster than languages without an effective native-code compiler, such as Perl, Ruby, PHP and Python.
Program speed

Benchmarks often measure performance for small numerically intensive programs. In some rare real-life programs, Java out-performs C. One example is the benchmark of Jake2. The Java 5.0 version performs better in some hardware configurations than its C counterpart. While it is not specified how the data was measured, it notes how the same Java source code can have a huge speed boost just by updating the VM, something impossible to achieve with a 100% static approach.
For other programs, the C++ counterpart can, and usually does, run significantly faster than the Java equivalent. A benchmark performed by Google in 2011 showed a factor 10 between C++ and Java. At the other extreme, an academic benchmark performed in 2012 with a 3D modelling algorithm showed the Java 6 JVM being from 1.09 to 1.91 times slower than C++ under Windows.
Some optimizations that are possible in Java and similar languages may not be possible in certain circumstances in C++:

C-style pointer use can hinder optimizing in languages that support pointers,
The use of escape analysis methods is limited in C++, for example, because a C++ compiler does not always know if an object will be modified in a given block of code due to pointers,
Java can access derived instance methods faster than C++ can access derived virtual methods due to C++'s extra virtual-table look-up. However, non-virtual methods in C++ do not suffer from v-table performance bottlenecks, and thus exhibit performance similar to Java.

The JVM is also able to perform processor specific optimizations or inline expansion. And, the ability to deoptimize code already compiled or inlined sometimes allows it to perform more aggressive optimizations than those performed by statically typed languages when external library functions are involved.
Results for microbenchmarks between Java and C++ highly depend on which operations are compared. For example, when comparing with Java 5.0:

32 and 64 bit arithmetic operations, File I/O and Exception handling, have a similar performance to comparable C++ programs
Arrays operations performance are better in C.
Trigonometric functions performance is much better in C.

----
;Notes

Multi-core performance

The scalability and performance of Java applications on multi-core systems is limited by the object allocation rate. This effect is sometimes called an "allocation wall". However, in practice, modern garbage collector algorithms use multiple cores to perform garbage collection, which to some degree alleviates this problem. Some garbage collectors are reported to sustain allocation rates of over a gigabyte per second, and there exist Java-based systems that have no problems scaling to several hundreds of CPU cores and heaps sized several hundreds of GB.
Automatic memory management in Java allows for efficient use of lockless and immutable data structures that are extremely hard or sometimes impossible to implement without some kind of a garbage collection. Java offers a number of such high-level structures in its standard library in the java.util.concurrent package, while many languages historically used for high performance systems like C or C++ are still lacking them.

Startup time

Java startup time is often much slower than many languages, including C, C++, Perl or Python, because many classes must be loaded before being used.
When compared against similar popular runtimes, for small programs running on a Windows machine, the startup time appears to be similar to Mono's and a little slower than.NET's.
It seems that much of the startup time is due to input-output bound operations rather than JVM initialization or class loading. Some tests showed that although the new [|split bytecode verification] method improved class loading by roughly 40%, it only realized about 5% startup
improvement for large programs.
Albeit a small improvement, it is more visible in small programs that perform a simple operation and then exit, because the Java platform data loading can represent many times the load of the actual program's operation.
Starting with Java SE 6 Update 10, the Sun JRE comes with a Quick Starter that preloads class data at OS startup to get data from the disk cache rather than from the disk.
Excelsior JET approaches the problem from the other side. Its Startup Optimizer reduces the amount of data that must be read from the disk on application startup, and makes the reads more sequential.
In November 2004, Nailgun, a "client, protocol, and server for running Java programs from the command line without incurring the JVM startup overhead" was publicly released. introducing for the first time an option for scripts to use a JVM as a daemon, for running one or more Java applications with no JVM startup overhead. The Nailgun daemon is insecure: "all programs are run with the same permissions as the server". Where multi-user security is needed, Nailgun is inappropriate without special precautions. Scripts where per-application JVM startup dominates resource use, see one to two order of magnitude runtime performance improvements.

Memory use

Java memory use is much higher than C++'s memory use because:

There is an overhead of 8 bytes for each object and 12 bytes for each array in Java. If the size of an object is not a multiple of 8 bytes, it is rounded up to next multiple of 8. This means an object holding one byte field occupies 16 bytes and needs a 4-byte reference. C++ also allocates a pointer for every object which class directly or indirectly declares virtual functions.
Lack of address arithmetic makes creating memory-efficient containers, such as tightly spaced structures and XOR linked lists, currently impossible.
Contrary to malloc and new, the average performance overhead of garbage collection asymptotically nears zero as the heap size increases.
Parts of the Java Class Library must load before program execution. This leads to a significant memory overhead for small applications.
Both the Java binary and native recompilations will typically be in memory.
The virtual machine uses substantial memory.
In Java, a composite object is created using references to allocated instances of B and C. In C++ the memory and performance cost of these types of references can be avoided when the instance of B and/or C exists within A.

In most cases a C++ application will consume less memory than an equivalent Java application due to the large overhead of Java's virtual machine, class loading and automatic memory resizing. For programs in which memory is a critical factor for choosing between languages and runtime environments, a cost/benefit analysis is needed.

Trigonometric functions

Performance of trigonometric functions is bad compared to C, because Java has strict specifications for the results of mathematical operations, which may not correspond to the underlying hardware implementation. On the x87 floating point subset, Java since 1.4 does argument reduction for sin and cos in software, causing a big performance hit for values outside the range.

Java Native Interface

The Java Native Interface invokes a high overhead, making it costly to cross the boundary between code running on the JVM and native code. Java Native Access provides Java programs easy access to native shared libraries via Java code only, with no JNI or native code. This functionality is comparable to Windows' Platform/Invoke and Python's ctypes. Access is dynamic at runtime without code generation. But it has a cost, and JNA is usually slower than JNI.

User interface

has been perceived as slower than native widget toolkits, because it delegates the rendering of widgets to the pure Java 2D API. However, benchmarks comparing the performance of Swing versus the Standard Widget Toolkit, which delegates the rendering to the native GUI libraries of the operating system, show no clear winner, and the results greatly depend on the context and the environments. Additionally, the newer JavaFX framework, intended to replace Swing, addresses many of Swing's inherent issues.

Use for high performance computing

Some people believe that Java performance for high performance computing is similar to Fortran on compute-intensive benchmarks, but that JVMs still have scalability issues for performing intensive communication on a grid computing network.
However, high performance computing applications written in Java have won benchmark competitions. In 2008, and 2009, an Apache Hadoop based cluster was able to sort a terabyte and petabyte of integers the fastest. The hardware setup of the competing systems was not fixed, however.

In programming contests

Programs in Java start slower than those in other compiled languages. Thus, some online judge systems, notably those hosted by Chinese universities, use longer time limits for Java programs to be fair to contestants using Java.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...