An In-depth Look at Gradle's Approach to Faster Compilation

August 25, 2023


Introduction

One of many performance optimizations that make Gradle Build Tool fast and scalable is compilation avoidance. Gradle avoids recompiling as much as possible by determining if the result of compilation would be identical, even if upstream dependencies have changed.

The situation can be illustrated like this: if classes MyApp and NumberUtils are in different projects and MyApp’s project depends on NumberUtils’s project at compile time, then any internal change to NumberUtils does not require MyApp’s project to be recompiled. Both before and after the change, MyApp compiles to identical bytecode, so Gradle can continue to use the MyApp.class file it has already built.

A GitHub Gist showing MyApp.java which uses NumberUtils.sum, and a NumberUtils.java.diff which shows a change from an imperative for-loop summation to a Stream-based reduce. On the right, it says "Does MyApp need to be recompiled? No!"
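The gist itself is not reproduced here, so the following is a minimal sketch of the scenario it describes. The names MyApp, NumberUtils, and sum come from the example above; the `int[]` parameter and the two nested classes standing in for the "before" and "after" versions of NumberUtils are assumptions made for illustration.

```java
import java.util.Arrays;

public class MyApp {

    // NumberUtils before the change: imperative for-loop summation.
    static class NumberUtilsBefore {
        public static int sum(int[] numbers) {
            int total = 0;
            for (int number : numbers) {
                total += number;
            }
            return total;
        }
    }

    // NumberUtils after the change: Stream-based reduce.
    // Only the method body differs; the public signature is untouched.
    static class NumberUtilsAfter {
        public static int sum(int[] numbers) {
            return Arrays.stream(numbers).reduce(0, Integer::sum);
        }
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3, 4};
        // The call site in MyApp looks the same against either version.
        System.out.println(NumberUtilsBefore.sum(numbers));
        System.out.println(NumberUtilsAfter.sum(numbers));
    }
}
```

Both calls print 10. Because the caller only sees the signature of sum, swapping one body for the other gives the compiler nothing new to act on when it compiles MyApp.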

Gradle is smart enough to avoid recompiling a class if and only if two conditions are true: 1) its own code hasn’t changed and 2) any changes to classes it compiles against are ABI-compatible. As discussed previously in the Compilation Avoidance blog post, this is a complementary feature to incremental compilation and works without the need to generate “header JARs” or “ABI JARs” known from some other build systems.

In this post, we compare these two approaches to compilation avoidance (with and without the generation of header JARs) and measure how skipping header JAR generation allows Gradle to achieve superior build performance.

Definitions

Before we dive into performance comparisons, let’s clarify a few important concepts.

What is an ABI?

ABI stands for “Application Binary Interface”.

In Java, this means the parts of the library that are exposed to consumers, such as most public classes, public methods, and public fields. A library’s ABI does not include the bodies of methods, private classes, private methods, or private fields.

If a library changes its ABI, this can cause compilation failures or runtime errors for downstream consumers. If a library’s ABI has not changed between two versions, we call those versions “ABI-compatible” or “binary compatible”.

A full definition of binary compatibility can be found in the Java Language Specification, Chapter 13.

What is a JAR file?

A JAR file is a compressed archive that contains .class and resource files for a library using the same format as a ZIP file and a .jar extension.

JAR files are used by javac to compile Java source code against a library’s ABI.

What is a header JAR?

A header JAR (also known as an “ABI JAR”) is created by taking the original set of classes or sources and removing the parts that are not part of the ABI. This means that the file contents of the header JAR itself represent the ABI of the library.

What is compilation avoidance?

Compilation avoidance is an optimization done by build systems to avoid the expensive work of compiling if the output wouldn’t change. A compilation can be avoided if the ABI of the input classes remains the same.

This is safe because even if a library’s internals are changing, there is still no need to recompile unchanged downstream consumers unless its ABI changes. Internal changes to the implementation of a library may result in different behavior at runtime for the consumer, but cannot change the result of the consumer’s compilation.
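To make the idea concrete, here is a simplified sketch of an ABI comparison. It is not how Gradle is implemented: it uses reflection on loaded classes rather than class file analysis, it looks only at public and protected methods, and all class and method names (AbiSketch, abiOf, canSkipRecompilation, SumV1 to SumV3) are hypothetical.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class AbiSketch {

    static class SumV1 {
        public static int sum(int[] numbers) {
            int total = 0;
            for (int number : numbers) total += number;
            return total;
        }
    }

    // Internal change only: a new body plus a private helper.
    static class SumV2 {
        public static int sum(int[] numbers) {
            return Arrays.stream(numbers).reduce(0, SumV2::add);
        }
        private static int add(int a, int b) { return a + b; }
    }

    // ABI change: the return type is now long.
    static class SumV3 {
        public static long sum(int[] numbers) {
            return Arrays.stream(numbers).asLongStream().sum();
        }
    }

    // Collects the signatures of public and protected methods, ignoring bodies
    // and private members. A simplification of a real ABI (assumption).
    static Set<String> abiOf(Class<?> type) {
        Set<String> abi = new TreeSet<>();
        for (Method method : type.getDeclaredMethods()) {
            int modifiers = method.getModifiers() & Modifier.methodModifiers();
            boolean visible = Modifier.isPublic(modifiers) || Modifier.isProtected(modifiers);
            if (!visible || method.isSynthetic()) continue;
            String parameters = Arrays.stream(method.getParameterTypes())
                    .map(Class::getTypeName)
                    .collect(Collectors.joining(", "));
            abi.add(Modifier.toString(modifiers) + " " + method.getReturnType().getTypeName()
                    + " " + method.getName() + "(" + parameters + ")");
        }
        return abi;
    }

    // Downstream compilation can be skipped when the two ABIs are equal.
    static boolean canSkipRecompilation(Class<?> previous, Class<?> current) {
        return abiOf(previous).equals(abiOf(current));
    }

    public static void main(String[] args) {
        System.out.println(abiOf(SumV1.class));
        System.out.println(canSkipRecompilation(SumV1.class, SumV2.class)); // internal change
        System.out.println(canSkipRecompilation(SumV1.class, SumV3.class)); // ABI change
    }
}
```

In this sketch the comparison between SumV1 and SumV2 reports that recompilation can be skipped, while the comparison between SumV1 and SumV3 does not, because the return type is part of the signature.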

Comparing the approaches

Some build systems, such as Bazel, generate header JARs in order to implement compilation avoidance. On the other hand, Gradle does compilation avoidance without generating header JARs. This section compares the two approaches.

How do build systems that use header JARs do compilation avoidance?

Build systems typically rely on comparing the exact file contents between the previous and current build to determine if work should be done. If the file changes in any way, the old output is invalidated and must be built again. By using header JARs as the inputs to compile downstream projects (instead of the full JAR file containing implementation code like method bodies), ABI-compatible changes in dependencies will produce the same file contents for the header JAR. This allows the build system to use the file content to determine if a compilation should be avoided.
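The content-based check described above can be sketched as follows. This is a toy model, not any build system's real implementation: a "JAR" is represented as a plain string, and CompiledMethod, fullJar, headerJar, and contentHash are hypothetical names introduced for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class HeaderJarSketch {

    // Hypothetical stand-in for a compiled method: a signature plus a body.
    static final class CompiledMethod {
        final String signature;
        final String body;

        CompiledMethod(String signature, String body) {
            this.signature = signature;
            this.body = body;
        }
    }

    // Toy "full JAR" content: signatures and method bodies.
    static String fullJar(CompiledMethod... methods) {
        StringBuilder content = new StringBuilder();
        for (CompiledMethod method : methods) {
            content.append(method.signature).append(" {").append(method.body).append("}\n");
        }
        return content.toString();
    }

    // Toy "header JAR" content: the same signatures with bodies removed.
    static String headerJar(CompiledMethod... methods) {
        StringBuilder content = new StringBuilder();
        for (CompiledMethod method : methods) {
            content.append(method.signature).append(";\n");
        }
        return content.toString();
    }

    // The build system only compares hashes of file contents.
    static String contentHash(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CompiledMethod before = new CompiledMethod("public static int sum(int[])", "for-loop");
        CompiledMethod after = new CompiledMethod("public static int sum(int[])", "stream-reduce");

        // The full JAR changes, so using it as an input would trigger recompilation.
        System.out.println(contentHash(fullJar(before)).equals(contentHash(fullJar(after))));
        // The header JAR is byte-for-byte identical, so downstream work is skipped.
        System.out.println(contentHash(headerJar(before)).equals(contentHash(headerJar(after))));
    }
}
```

The first comparison prints false and the second prints true, which is the property a header JAR is designed to provide: an input file whose contents only change when the ABI does.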

A downside of this approach is that there is no finer-grained level of avoidance – even ABI changes in internal library classes that are never used by a downstream project can cause recompilation because the contents of the header JAR will still change.

Additionally, as header JARs are typically not distributed, these build systems must generate them upon downloading a library JAR for the first time, and usually store both copies of the JAR.

For local dependencies, the header JAR can be generated in parallel with real compilation, which can allow downstream consumers to start compiling faster. In practice, we noticed that header JAR generation was much slower than compilation, resulting in a net penalty.

How does Gradle do compilation avoidance without header JARs?

Instead of only relying on file contents to detect changes, Gradle analyzes JARs and directories that are used on a compile classpath. This analysis is similar to the process of creating a header JAR, but instead of emitting a new JAR file, which would require lots of disk I/O and take extra space, Gradle directly checks the ABI against the previous ABI. In this way, Gradle still gets the benefits of a header JAR without the extra costs.

In addition, because Gradle has the full class analysis available, it also knows exactly which classes were changed, and can avoid recompiling any files that do not depend on them, avoiding even more compilation than header JARs.
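The finer-grained decision can be illustrated with a small sketch. It assumes a per-class ABI hash and a map from each source file to the classes it uses are already available; how Gradle actually stores and computes these is not shown in this post, and every name below (FineGrainedSketch, changedClasses, sourcesToRecompile, StringUtils, Report.java, the hash values) is hypothetical. Only direct usages are modeled.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

public class FineGrainedSketch {

    // Classes whose ABI hash differs between two builds (including added or removed classes).
    static Set<String> changedClasses(Map<String, String> previousAbi, Map<String, String> currentAbi) {
        Set<String> allClasses = new TreeSet<>(previousAbi.keySet());
        allClasses.addAll(currentAbi.keySet());
        Set<String> changed = new TreeSet<>();
        for (String className : allClasses) {
            if (!Objects.equals(previousAbi.get(className), currentAbi.get(className))) {
                changed.add(className);
            }
        }
        return changed;
    }

    // Source files that use at least one changed class.
    static Set<String> sourcesToRecompile(Map<String, Set<String>> usedClassesBySource, Set<String> changed) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : usedClassesBySource.entrySet()) {
            for (String usedClass : entry.getValue()) {
                if (changed.contains(usedClass)) {
                    result.add(entry.getKey());
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The library contains two classes; only StringUtils changed its ABI.
        Map<String, String> previous = Map.of("NumberUtils", "abi-1", "StringUtils", "abi-1");
        Map<String, String> current = Map.of("NumberUtils", "abi-1", "StringUtils", "abi-2");
        Map<String, Set<String>> usages = Map.of(
                "MyApp.java", Set.of("NumberUtils"),
                "Report.java", Set.of("StringUtils"));

        Set<String> changed = changedClasses(previous, current);
        System.out.println(changed);                             // [StringUtils]
        System.out.println(sourcesToRecompile(usages, changed)); // [Report.java]
    }
}
```

A single header JAR for this library would have changed, invalidating both consumers. With per-class information, MyApp.java is left alone because it never touches StringUtils.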

Gradle also does not handle local dependencies any differently from remote dependencies, aside from using the results of compilation directly in class directories instead of packaging them into a JAR.

Process Details

Imagine you had a project that builds a Java application that has a Java library dependency¹. A typical workflow is to edit a source file, then re-run the application. If the source file you edit is in the library, then the build system will need to perform the following steps:

A graph showing the order of work done by Gradle and Bazel. It shows two execution columns, connected from left to right by dotted arrows. Dashes surround the application-related compilation and JAR steps.

The dotted dependencies only exist for Gradle, as Bazel analyzes the source code for a local project dependency instead of using the output of the library compilation, and also compiles against the header JAR. This means that the two sides of this graph can be done fully in parallel for Bazel, only coming together when they are needed to run the application.

The dashed nodes are skipped if the new ABI is the same as the previous ABI.

Even though Bazel can do more work in parallel, it ends up being much slower because of header JAR generation: writing new class files, compressing them into a new JAR, and writing that JAR to disk takes a large amount of time.

Performance comparison

Using the Gradle Profiler, we can capture build timing operation profiles from both Gradle and Bazel. See the experimental project used here for instructions on how to reproduce these measurements.

We looked at two scenarios. One involves an ABI change and the other a non-ABI (internal) change. These scenarios are designed to highlight performance differences relevant to the usage of header JARs to avoid unnecessary re-compilation, not as a comprehensive comparison of Gradle and Bazel.

These measurements were performed using the latest version of Bazel available at the time of writing (6.2.1) and recent versions of Gradle (8.0+) on an Apple M2 Max MacBook Pro with 64 GB of memory running macOS 13.4.1 and the Temurin 17.0.7 JDK. To see where Gradle spends its build time, configure the Gradle Profiler to produce low-level Chrome traces and inspect the results.

Project structure

Our experiment used a synthetic 1000-project build with 10 source files per project and inter-project dependencies. This project structure is more typical of Bazel builds than of Gradle builds. There are various tradeoffs to either build tool’s typical way of structuring large builds, such as the number of inter-project dependencies the build system must consider with a greater number of projects.

Unlike Gradle, Bazel does not perform incremental compilation at the project level. In this experiment, only a fraction (25%) of the projects directly depend on the changing class. However, the projects that do include such a dependency use Production0 either directly or transitively in every class, to minimize Gradle’s incremental compilation benefits.

The class Production0 defined in project0 is used directly by a quarter of the projects in this build (every project with a number divisible by 4).

For example, the class Production44 in project4 contains the code:

private Production0 property0 = new Production0();

public Production0 getProperty0() {
    return property0;
}

public void setProperty0(Production0 value) {
    property0 = value;
}

private String property1 = property0.getProperty1();

The big picture of inter-project dependencies looks like this:

A graph of the project layout. project3 depends on project0-2, project7 depends on project4-6, project4 depends on project0, and project8 depends on project0. An arrow indicates that the class Production0 will be changed.

This pattern of project and class dependencies repeats for higher numbered projects. The red projects (which have numbers divisible by 4) contain classes such as Production44 that must be recompiled whenever there is an ABI change to Production0. Various other projects, marked in yellow in the diagram, only transitively depend on project0 and should not need to be recompiled, as they do not use the changed Production0 type in any way. Most projects, marked in white, have no dependency relationship on project0.

In contrast, when there is a non-ABI change, only Production0 in project0 should need to be recompiled. Other projects with a direct dependency such as project4 are not affected as the ABI of project0 remains the same, and so there is no need for their recompilation.
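As a rough back-of-the-envelope check of the two scenarios, the sketch below counts how many projects ideally need recompilation. It assumes the 1000 projects are numbered project0 through project999, which the text implies but does not state, and the class and method names are hypothetical.

```java
public class RecompileCountSketch {

    // Project count from the experiment; numbering 0..999 is an assumption.
    static final int PROJECT_COUNT = 1000;

    // Projects with a number divisible by 4 use Production0 directly.
    // project0 is excluded here because it defines Production0 itself.
    static boolean usesProduction0Directly(int projectNumber) {
        return projectNumber != 0 && projectNumber % 4 == 0;
    }

    // Ideal number of projects to recompile after a change to Production0.
    static int projectsToRecompile(boolean abiChanged) {
        int count = 1; // project0 is always recompiled because its source changed
        if (abiChanged) {
            for (int project = 1; project < PROJECT_COUNT; project++) {
                if (usesProduction0Directly(project)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(projectsToRecompile(true));  // 250
        System.out.println(projectsToRecompile(false)); // 1
    }
}
```

Under these assumptions, an ABI change ideally touches 250 of the 1000 projects (project0 plus 249 direct users), while a non-ABI change touches only project0.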

Results

When comparing Bazel and Gradle, one thing is immediately noticeable: both show dramatic improvements in build time when handling a non-ABI change compared to an ABI change. When dealing with an ABI change, however, Bazel is much slower than Gradle. For a non-ABI change, Bazel is actually slightly faster than Gradle 8.0. However, the latest versions of Gradle are also faster than Bazel for a non-ABI change.

In the ABI-change scenario, Bazel is slower mainly because it needs to generate new header JARs for ABI changes. Since many projects directly depend on the project containing the ABI modification, when building with Bazel these projects also need their header JARs recompiled. This is because Bazel doesn’t have a way to know the ABI change didn’t affect them.

Gradle can extract the new ABI to verify there is no need for these recompilations without generating and packaging a header JAR, avoiding work that can be quite expensive in terms of overall build time. Gradle’s incremental compilation also improves performance during the ABI change scenario, though this is less impactful in this project.

The measurements also demonstrate significant improvements in the latest Gradle releases, which reflects the Gradle team’s commitment to improving performance and scalability. The specific optimizations that affect the measured scenarios are explained below.

Performance improvements in Gradle 8.1

Due to the large number of subprojects involved, much of the Gradle build time is spent configuring the build and recalculating the Directed Acyclic Graph of task dependencies rather than task execution.

In Gradle 8.1, the Configuration Cache became stable. With configuration caching enabled, subsequent invocations of the same tasks skip the work of the configuration phase and get right to task execution, speeding up Gradle builds in both scenarios.

The configuration cache was enabled in the 8.1 and 8.3 tests. Using this feature is a great way to improve overall build times.

Further performance improvements in Gradle 8.3

As we analyzed Gradle execution traces to understand the results of this experiment, we noticed that Gradle was spending disproportionately more time starting fresh compiler daemons on each build than compiling.

Starting in the 8.3 release, Gradle reuses Java compiler daemons by keeping them around between builds in order to avoid repaying their startup costs with each compilation. This can speed up some builds significantly and requires no additional configuration. With this change, Gradle 8.3 is even faster than Gradle 8.1 in both scenarios above.

The persistent compiler daemons optimization is enabled by default for macOS and Linux in Gradle 8.3. Windows support is coming in the 8.4 release.

Conclusion

Compilation avoidance is an important build performance optimization that speeds up builds by avoiding recompilation of downstream consumers when a library’s ABI has not changed.

In this blog post, we discussed two approaches to compilation avoidance: an approach relying on header JARs used in some build systems and the approach used by Gradle that calculates ABIs and short-circuits header JAR generation.

Gradle’s ability to analyze the classpath directly instead of building a header JAR for each change means it does not spend extra time creating and compiling partial class files and writing another JAR to disk. This advantage scales with the number of subprojects in a build.

For a large build with many interconnected projects that we measured, Gradle’s classpath analysis and incremental compilation already result in a much faster build for the ABI change scenario. For the non-ABI scenario, Gradle is faster with more recent optimizations like configuration cache (stable in Gradle 8.1) and persistent compiler daemons (enabled by default in Gradle 8.3).

Be sure to upgrade soon to take advantage of these improvements and more, and continue Building Happiness!

Thanks to Tom Tresansky, who co-authored this blog post, and Stepan Goncharov, who provided key insights about Bazel.

Notes

¹ Assuming a non-JPMS build, which may affect certain details.
