Java 11 versus Java 8 Performance

Today I found a nice example of the enormous performance improvements between Java versions.

Prefix

In the last months I rewrote my regression tests for better support of layouts which are a complex feature of the AutoCAD formats.

Because of the complexity small changes can influence the rendering a lot, and changes which makes the layouts in one file look better can easily make a lot of other files look worse. So I rewrote my regression testing from scratch.

Basically the main test runs over all my sample files and renders them into bitmap images which are compared with images from the last run. This is done on a machine with a slightly outdated Intel Core i7-3770K processor in 6 parallel threads (the processor has 4 cores and supports hyperthreading) with 32GByte of physical memory. The Java process itself is restricted to 24GByte to allow using the computer even if the test runs. This includes makinguse of the GUI to have a look at possible problems in an early stage of the test. If problems are severe I can stop the test, fix these problems, and restart.

Performance

One of the goals of the rewrite was to improve performance, so I was a little disappointed when I recognized that my tests was taking a lot more time than expected. Indeed the machine was unusable in between and the regression ran into out of memory (OOM) problems. This is basically not that surprising as some of my test files are really large, and larger means more memory and longer time, so the 6 worker threads tend to all handle these large files in parallel.

I tried a lot of things to improve performance, but in the end I recognized that I used Java 11 to run my old regression test, but Java 8 to run the new one as it is also the version for which the code is implemented (due to various reasons including Oracle’s awful decision how Java versions are supported).

So I switched my regression test to use Java 11 and indeed it ran much faster. Not only that, there were even no more OOMs and the machine still stayed usable.

Luckily I track various performance numbers so I can share what was happening.

Results

Raw Performance Differences

The following numbers are collected by using System.nanoTime() which is only well enough because of the large numbers of times adding up.

Item Java 8 Java 11 Improvement J11 vs J8
Overall Duration for Reading (seconds) 9994.910 7129.167 28.7%
Overall Duration for Conversion (seconds) 26025.677 13051.580 49.9%
Overall Duration for Drawing (seconds) 15480.715 4813.025 68.9%
Sum of the Above Durations (seconds) 51501.302 24993.772 51.5%
Wall Clock Time (seconds) 10131.400 4763.419 53.0%

The last two rows show the combined times spent in all interesting parts of the regression (second to last), and the time in which I’m mostly interested, i.e. how long is the complete regression running.

Basically each file is read into an internal representation of the file format (Reading), converted into a representation optimized for drawing (Conversion), and then drawn to images (Drawing). Some times like creating or writing the images are not included, and there is also a GUI acting in the background which shows the regression running in real time. Therefore the parallelity is very good, better than 5 in both cases although some durations are not included in the sum which would make it even better.

Garbage Collector Differences

But obviously the Java 8 version is much slower. And as said it has some OOM issues. Luckily I also track the garbage collector (find the code used for this below):

Item Java 8 Java 11
Max Memory Used 22,906,667,008 25,769,803,776
# GCs 2 2
GC 1: Name PS MarkSweep G1 Young Generation
GC 1: MemoryPools PS Eden Space, PS Survivor Space, PS Old Gen G1 Eden Space, G1 Survivor Space, G1 Old Gen
GC 1: Collection Count 312 2963
GC 1: Collection Duration 5268.033s 349.181s
GC 2: Name PS Scavenge G1 Old Generation
GC 2: MemoryPools PS Eden Space, PS Survivor Space G1 Eden Space, G1 Survivor Space, G1 Old Gen
GC 2: Collection Count 1517 16
GC 2: Collection Duration 608.569s 131.668s
Complete Collection Duration 5876.602s 480.849s

So here it becomes obvious where most improvements arise. Assuming that garbage collection is often done exclusively (i.e. all threads are stopped for it) more than 50% of my regression’s wall clock run time using Java 8 were spent garbage collecting instead of just 10% in Java 11.

Of course the regression test is awfully heavy on memory usage, so indeed GC improvements should help a lot. But still this is great news, I just have to wait 1h19m for my regression instead of 2h49m, and even the OOM errors do not happen any more.

Appendix

GC Tracking Code

Here’s the code used to output information about garbage collectors (requires Java 8+):

      // helpers
      final String STAT_FORMAT = "%40s: %s\n"
      final PrintStream s = System.out;

      // here starts the real code
      final java.util.List<GarbageCollectorMXBean> beans =
              ManagementFactory.getGarbageCollectorMXBeans();
      s.printf(STAT_FORMAT, "Max Memory", Runtime.getRuntime().maxMemory());
      beans.sort(Comparator.comparingLong(GarbageCollectorMXBean::getCollectionTime));
      s.printf(STAT_FORMAT, "# GCs", beans.size());
      int gcNum = 1;
      long duration = 0L;
      for (GarbageCollectorMXBean gc : beans) {
        final String gcId = "GC "+ gcNum++ + ": ";
        s.printf(STAT_FORMAT, gcId+"Name", gc.getName());
        s.printf(STAT_FORMAT, gcId+"MemoryPools", Arrays.toString(gc.getMemoryPoolNames()));
        s.printf(STAT_FORMAT, gcId+"Collection Count", gc.getCollectionCount());
        s.printf(STAT_FORMAT, gcId+"Collection Duration", gc.getCollectionTime());
        duration += gc.getCollectionTime();
      }
      s.printf(STAT_FORMAT, "Complete GC Duration", duration);