Incremental Compilation, the Java Library Plugin, and other performance features in Gradle 3.4

Introduction #

We are very proud to announce that the newly released Gradle 3.4 has significantly improved support for building Java applications, for all kinds of users. This post explains in detail what we fixed, improved and added. We will focus in particular on:

  • Extremely fast incremental builds
  • The end of the dreaded compile classpath leakage

The improvements we made can dramatically improve your build times; here’s what we measured.

The benchmarks are public, and you can try them out yourself: they are synthetic projects representing real-world issues reported by our users. In particular, what matters in a continuous development process is being incremental (making a small change should never result in a long build):

For those who work on a single project with lots of sources:

  • changing a single file in a big monolithic project, then recompiling
  • changing a single file in a medium-sized monolithic project, then recompiling

For multi-project builds:

  • making a change in an ABI-compatible way (changing the body of a method, for example, but no method signatures) in a subproject, then recompiling
  • making a change in an ABI-incompatible way (changing a public method signature, for example) in a subproject, then recompiling

For all those scenarios, Gradle 3.4 is much faster. Let’s see how we did this.

Compile avoidance for all #

One of the greatest changes in Gradle 3.4 regarding Java support just comes for free: upgrade to Gradle 3.4 and benefit from compile avoidance. Compile avoidance is different from incremental compilation, which we will cover later. So what does it mean? It’s actually very simple. Imagine that your project app depends on project core, which itself depends on project utils:

In app:

public class Main {
    public static void main(String... args) {
        WordCount wc = new WordCount();
        wc.collect(new File(args[0]));
        System.out.println("Word count: " + wc.wordCount());
    }
}

In core:

public class WordCount {  // WordCount lives in project `core`
   // ...
   void collect(File source) {
       IOUtils.eachLine(source, WordCount::collectLine);
   }
}

In utils:

public class IOUtils { // IOUtils lives in project `utils`
    void eachLine(File file, Consumer<String> action) {
        try {
            try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
                // ...
            }
        } catch (IOException e) {
            // ...
        }
    }
}
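
For reference, here is a minimal sketch of how these three projects could be wired together (project names taken from above; the exact layout is an assumption, your build may differ):

// settings.gradle
include 'app', 'core', 'utils'

// app/build.gradle
dependencies {
    compile project(':core')
}

// core/build.gradle
dependencies {
    compile project(':utils')
}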

Then, change the implementation of IOUtils. For example, change the body of eachLine to read the file with an explicit charset:

public class IOUtils { // IOUtils lives in project `utils`
    void eachLine(File file, Consumer<String> action) {
        try {
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file), "utf-8"))) {
                // ...
            }
        } catch (IOException e) {
            // ...
        }
    }
}

Now rebuild app. What happens? Until now, utils had to be recompiled, but then it also triggered the recompilation of core and eventually app, because of the dependency chain. It sounds reasonable at first glance, but is it really?

What changed in IOUtils is purely an internal detail. The implementation of eachLine changed, but its public API didn’t. Any class file previously compiled against IOUtils is still valid. Gradle is now smart enough to realize that, which means that if you make such a change, Gradle will only recompile utils, and nothing else! And while this example may sound simple, it’s actually a very common pattern: typically, a core project is shared by many subprojects, and each subproject depends on different subprojects. A change to core used to trigger a recompilation of all projects. With Gradle 3.4 this is no longer the case: Gradle recognizes ABI (Application Binary Interface) breaking changes, and triggers recompilation of dependents only in that case.
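
To make the distinction concrete, here is a hedged sketch of the two kinds of changes, using the eachLine method from above (the Charset overload is hypothetical):

// ABI-compatible: only the body of eachLine changes; the signature is intact,
// so dependents such as WordCount are NOT recompiled
void eachLine(File file, Consumer<String> action) {
    // new implementation, same signature
}

// ABI-incompatible: an extra parameter changes the method signature,
// so dependents ARE recompiled
void eachLine(File file, Consumer<String> action, Charset charset) {
    // ...
}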

This is what we call compile avoidance. But even when compilation cannot be avoided, Gradle 3.4 makes things much faster with the help of incremental compilation.

Improved incremental compilation #

For years, Gradle has supported an experimental incremental compiler for Java. In Gradle 3.4, not only is this compiler stable, we have also significantly improved its robustness and performance! Use it now: we’re going to make it the default soon. To enable Java incremental compilation, all you need to do is set it on the compile options:

tasks.withType(JavaCompile) {
   options.incremental = true // one flag, and things will get MUCH faster
}

If we add the following class in project core:

public class NGrams {  // NGrams lives in project `core`
   // ...
   void collect(String source, int ngramLength) {
       collectInternal(StringUtils.sanitize(source), ngramLength);
   }
   // ...
}

and this class in project utils:

public class StringUtils {
   static String sanitize(String dirtyString) { ... }
}

Imagine that we change the class StringUtils and recompile our project. You can easily see that we only need to recompile StringUtils and NGrams, but not WordCount: NGrams is a dependent class of StringUtils, while WordCount doesn’t use StringUtils, so why would it need to be recompiled? This is what the incremental compiler does: it analyzes the dependencies between classes, and only recompiles a class when it has changed, or when one of the classes it depends on has changed.

Those of you who have tried the incremental Java compiler before may have noticed that it wasn’t very smart when a changed class contained a constant. For example, this class contains a constant:

public class SomeClass {
    public static final int MAGIC_NUMBER = 123;
}

If this class was changed, Gradle gave up and recompiled not just all the classes of that project, but also all the classes in projects that depend on it. If you wonder why, you have to understand that the Java compiler inlines constants like this one. So when we analyze the result of compilation and the bytecode of a class contains the literal 123, we have no idea where that literal was defined. It could be in the class itself, or in a constant of any dependency found anywhere on its classpath. In Gradle 3.4, we made that behavior much smarter, and we only recompile the classes which could potentially be affected by the change. In other words, if the class is changed but the constant is not, we don’t need to recompile. Similarly, if the constant is changed but the bytecode of a dependent doesn’t contain a literal of the old value, we don’t need to recompile it: we only recompile the classes that have candidate literals. This also means that not all constants are born equal: a constant value of 0 is much more likely to trigger a full recompilation when changed than a constant value of 188847774.
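
To see why, consider a hypothetical dependent class and what the compiler emits for it:

public class Dependent {               // hypothetical class using the constant
    int answer() {
        return SomeClass.MAGIC_NUMBER + 1;
    }
}

// After compilation, the bytecode of answer() is equivalent to:
//     int answer() { return 124; }
// The reference to SomeClass.MAGIC_NUMBER is gone; only the folded literal
// remains, so the bytecode alone cannot tell where the value came from.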

Our incremental compiler is also now backed by in-memory caches that live in the Gradle daemon across builds, which makes it significantly faster than it used to be: extracting the ABI of a Java class is an expensive operation that used to be cached, but only on disk.

If you combine all those incremental compilation improvements with the compile avoidance described earlier in this post, Gradle is now really fast at recompiling Java code. Even better, this also works for external dependencies. Imagine that you upgrade from foo-1.0.0 to foo-1.0.1: if the only difference between the two versions is, for example, a bugfix, and the API hasn’t changed, compile avoidance kicks in and this change to an external dependency will not trigger a recompile of your code. If the new version has a modified public API, Gradle’s incremental compiler will analyze the dependencies of your classes on individual classes of the external dependency, and only recompile where necessary.

About annotation processors #

Annotation processors are a very powerful mechanism that allows code to be generated just by annotating sources. Typical use cases include dependency injection (Dagger) or reducing boilerplate (Lombok, AutoValue, Butterknife, …). However, using annotation processors can have a very negative impact on the performance of your builds.

What does an annotation processor do? #

Basically, an annotation processor is a Java compiler plugin: it is triggered whenever the Java compiler recognizes an annotation handled by a processor. From the build tool’s point of view, it’s a black box: we don’t know what it’s going to do, and in particular what files it’s going to generate, or where.

Therefore, whenever the annotation processor implementation changes, Gradle needs to recompile everything. That is not so bad by itself, as it probably doesn’t happen very often. But for reasons explained below, things are much worse: Gradle has to disable compile avoidance altogether when annotation processors are not declared explicitly. To understand why, consider how annotation processors are typically used today: they are added to the compile classpath.
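
For example, it is common to see the processor declared like this; a sketch of the problematic pattern:

dependencies {
    // the processor and all of its transitive dependencies end up
    // on the compile classpath
    compile 'com.google.dagger:dagger-compiler:2.8'
}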

While Gradle can detect which jar contains annotation processors, what it cannot detect is which other jars on the compile classpath the annotation processor implementation uses: processors have dependencies too. That means potentially any change in the compile classpath may affect the behavior of the annotation processor in a way Gradle cannot understand. Therefore, any change in the compile classpath triggers a full recompile, and we are back to square one.

But there is a solution to this.

Explicitly declaring the annotation processor classpath #

Should an annotation processor, which is a compiler plugin with its own external dependencies, influence your compile classpath? No: the dependencies of an annotation processor should never leak into your compile classpath. That’s why javac has a specific -processorpath option, distinct from -classpath. Here is how you can declare this with Gradle:

configurations {
    apt
}
dependencies {
    // The dagger compiler and its transitive dependencies will only be found on the annotation processing classpath
    apt 'com.google.dagger:dagger-compiler:2.8'

    // And we still need the Dagger annotations on the compile classpath itself
    compileOnly 'com.google.dagger:dagger:2.8'
}

compileJava {
    options.annotationProcessorPath = configurations.apt
}

Here, we’re creating a configuration, apt, that will contain all the annotation processors we use, and therefore also their specific transitive dependencies. Then we set the annotationProcessorPath to this configuration. This has three effects:

  • it disables automatic annotation processor detection on the compile classpath, making the task start faster (faster up-to-date checks)
  • it makes use of the processorpath option of the Java compiler, properly separating compile dependencies from the annotation processing path (see the javac sketch below)
  • it enables compile avoidance: by explicitly saying that you use annotation processors, we can now treat everything found on the compile classpath as binary interfaces only
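
Conceptually, this maps to a javac invocation along these lines (paths are purely illustrative, and the processor path would also include the processor’s transitive dependencies):

javac -classpath dagger-2.8.jar \
      -processorpath dagger-compiler-2.8.jar \
      -d build/classes/main \
      src/main/java/com/acme/*.java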

In particular, you will notice how Dagger cleanly separates its compiler from its annotations: we declare dagger-compiler as an annotation processing dependency, and dagger (the annotations themselves) as a compile dependency. However, some annotation processors, like Lombok, do not separate these concerns and ship their annotations and implementation in a single jar. Compile avoidance still works in this scenario: you just need to put the same jar on both the apt and compileOnly configurations.
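
For Lombok, that could look like the following sketch (the version number is illustrative):

dependencies {
    // Lombok ships its annotations and its processor implementation in a
    // single jar, so the same artifact goes on both paths
    apt         'org.projectlombok:lombok:1.16.16'
    compileOnly 'org.projectlombok:lombok:1.16.16'
}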

Incremental compile with annotation processors #

As said above, with annotation processors, Gradle does not know which files they are going to generate, nor where, nor based on what conditions. Therefore Gradle disables the Java incremental compiler if annotation processors are in use, even if you declare them explicitly as we have just done. It is however possible to limit the impact of this to the set of classes that really use annotation processors. In short, you can declare a separate source set, with its own compile task, that uses the annotation processor, and leave the other compile tasks without any annotation processing: any change to a class that doesn’t use annotation processors will then benefit from incremental compilation, whereas any change to the sources that use annotations will trigger a full recompilation, but of that source set only. Here’s an example of how to do it:

configurations {
    apt
    aptCompile
}
dependencies {
    apt 'com.google.dagger:dagger-compiler:2.8'
    aptCompile 'com.google.dagger:dagger:2.8'
}

sourceSets {
    processed {
        // the only sources that go through annotation processing
        compileClasspath += configurations.aptCompile
    }
    main {
        // main sources can see the processed classes, but are compiled
        // without any annotation processing
        compileClasspath += sourceSets.processed.output
    }
}

compileProcessedJava {
    options.annotationProcessorPath = configurations.apt
}
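
For illustration, a hypothetical Dagger component living in the processed source set could look like this:

// src/processed/java/com/acme/AppComponent.java
package com.acme;

import dagger.Component;

@Component
public interface AppComponent {
    // Dagger generates a DaggerAppComponent implementation for this interface
    // at compile time; `main` sources can use it because their compile
    // classpath includes the output of the `processed` source set
}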

In practice this may not be an easy split to perform, depending on how much the main sources depend on classes in the processed source set. We are, however, exploring options to enable incremental compilation when annotation processors are present, which means that this shouldn’t be an issue in the future.

Java libraries #

We at Gradle have long been explaining why the Maven dependency model is broken, but it’s often hard to realize without a concrete example, because users get used to the defect and deal with it as if it were natural. In particular, the pom.xml file is used both for building a component and for its publication metadata. Gradle has always worked differently: build scripts are the “recipe” for building a component, while publications, which can target Maven, Ivy, or whatever other repositories you need to support, contain metadata about how to consume the project. We thus clearly separate what you need to build a component from what you need to consume it. Separating these two roles is extremely important, and it allows Gradle 3.4 to add a fundamental improvement to Java dependency management. This new feature brings multiple benefits. One is better performance, as it complements the other performance features described above, but there are more.

We’ve all been doing it wrong #

When building a Java project, there are two things being considered:

  • what do I need to compile the project itself?
  • what do I need at runtime to execute the project?

This naturally drives us to declare dependencies in two distinct scopes:

  • compile: the dependencies I need to compile the project
  • runtime: the dependencies I need to run the project

Maven and Gradle have both been using this model for years. But since the beginning, we knew it was wrong: this view is overly simplistic, because it doesn’t consider the consumers of your project. In particular, there are (at least) two kinds of projects in the Java world:

  • applications, which are standalone, executable, and don’t expose any API
  • libraries, which are used by other libraries, or other applications, as bricks to build software, and therefore expose an API

The problem with the simplistic approach of two configurations (Gradle) or scopes (Maven) is that it doesn’t distinguish between what is required by your API and what is required by your implementation. In other words, you are leaking the compile dependencies of your component to downstream consumers.

Imagine that we are building an IoT application home-automation which depends on a heat-sensor library that has commons-math3.jar and guava.jar on its compile classpath. Then the compile classpath of home-automation will include commons-math3.jar and guava.jar. There are several consequences to this:

  • home-automation may start using classes from commons-math3.jar or guava.jar without really realizing they are transitive dependencies of heat-sensor (transitive dependency leakage).
  • the compile classpath of home-automation is bigger:
    • this increases the time spent on dependency resolution, up-to-date checking, classpath analysis and javac.
    • the new Gradle compile avoidance will be less efficient, because changes in the classpath are more likely to happen and compile avoidance will not kick in. Especially when you are using annotation processors, where Gradle’s incremental compilation is disabled, this comes at a high cost.
  • you are increasing the chances of dependency hell (different versions of the same dependency on the classpath)

But the worst issue is that if the usage of guava.jar is a purely internal detail of heat-sensor, and home-automation starts using it because it was found on the classpath, then it becomes very hard to evolve heat-sensor without breaking its consumers. The leakage of dependencies is a dreaded issue that leads to slowly evolving software and feature freezes, for the sake of backwards compatibility.

We know we’ve been doing this wrong; it’s time to fix it and introduce the new Java Library plugin!

Introducing the Java Library plugin #

Starting from Gradle 3.4, if you build a Java library, that is to say a component aimed at being consumed by other components (a component that is a dependency of another), then you should use the new Java Library plugin. Instead of writing:

apply plugin: 'java'

use:

apply plugin: 'java-library'

They both share a common infrastructure, but the java-library plugin exposes the concept of an API. Let’s migrate our heat-sensor library, which has two dependencies:

dependencies {
   compile 'org.apache.commons:commons-math3:3.6.1'
   compile 'com.google.guava:guava:21.0'
}

When you study the code in heat-sensor, you understand that commons-math3 is exposed in the public API, while guava is purely internal:

import java.util.List;

import com.google.common.collect.Lists;
import org.apache.commons.math3.stat.descriptive.SummaryStatistics;

public class HeatSensor {
    public SummaryStatistics getMeasures(int lastHours) {
        List<Measurement> measures = Lists.newArrayList(); // Google Guava is used internally, but doesn't leak into the public API
        SummaryStatistics stats = new SummaryStatistics(); // commons-math3 is part of the public API
        // ...
        return stats;
    }
}

It means that if, tomorrow, heat-sensor wants to switch from Guava to another collections library, it can do so without any impact on its consumers. But in practice, this is only possible if we cleanly separate those dependencies into two buckets:

dependencies {
   api 'org.apache.commons:commons-math3:3.6.1'
   implementation 'com.google.guava:guava:21.0'
}

The api bucket is used to declare dependencies that should transitively be visible to downstream consumers when they are compiled. The implementation bucket is used to declare dependencies which should not leak into the compile classpath of consumers (because they are purely internal details).

Now, when a consumer of heat-sensor is compiled, it will only find commons-math3.jar on its compile classpath, not guava.jar. So if home-automation accidentally tries to use a class from Google Guava, it will fail at compile time, and the consumer has to decide whether it really wants to introduce Guava as a direct dependency. On the other hand, if it tries to use a class from Apache Commons Math, which is an api dependency, it will succeed, because api dependencies are absolutely required at compile time.
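
To illustrate, here is how the consumer could declare its dependency on the library; a sketch, assuming both projects live in the same multi-project build:

// home-automation/build.gradle
dependencies {
    // puts commons-math3 (an api dependency of heat-sensor) on the compile
    // classpath, but NOT guava (an implementation detail of heat-sensor)
    implementation project(':heat-sensor')
}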

Better POMs than Maven #

So when does implementation matter? At runtime, and only at runtime! This is why the pom.xml file that Gradle now generates when you publish to a Maven repository is cleaner than what Maven itself can offer! Let’s look at what we generate for heat-sensor, using the maven-publish plugin:

<?xml version="1.0" encoding="UTF-8"?>
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.acme</groupId>
  <artifactId>heat-sensor</artifactId>
  <version>1.0.0-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-math3</artifactId>
      <version>3.6.1</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>21.0</version>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
</project>

What you see is the pom.xml file that is published, and therefore used by consumers. And what does it say?

  • to compile against heat-sensor, you need commons-math3 on compile classpath
  • to run against heat-sensor, you need guava on runtime classpath

This is very different from having a single pom.xml file for both compiling the component and consuming it: to compile heat-sensor itself, you would need guava in the compile scope. In short, Gradle generates better POM files than Maven, because it distinguishes between the producer and the consumer.

More use cases, more configurations #

You might be aware of the compileOnly configuration introduced in Gradle 2.12, which can be used to declare dependencies that are only required when compiling a component, but not at runtime (a typical use case is libraries which are embedded into a fat jar or shadowed). The java-library plugin provides a smooth migration path from the java plugin: if you are building an application, you can continue to use the java plugin; if you are building a library, use the java-library plugin. In both cases:

  • instead of the compile configuration, use implementation
  • instead of the runtime configuration, use the runtimeOnly configuration to declare dependencies which should only be visible at runtime
  • to resolve the runtime of a component, use runtimeClasspath instead of runtime (see the migration sketch below)
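
A typical migration is mostly mechanical; here is a sketch (the JDBC driver coordinates are hypothetical, standing in for any runtime-only dependency):

// Before, with the java plugin:
dependencies {
    compile 'com.google.guava:guava:21.0'
    runtime 'org.postgresql:postgresql:42.0.0' // hypothetical runtime-only dependency
}

// After, with the java-library plugin:
dependencies {
    api            'org.apache.commons:commons-math3:3.6.1' // part of the public API
    implementation 'com.google.guava:guava:21.0'            // internal detail
    runtimeOnly    'org.postgresql:postgresql:42.0.0'       // only needed at runtime
}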

Impact on performance #

To show you what the impact on performance can be, we added a benchmark which compares two scenarios:

  • making an ABI-compatible change in a library, then recompile
  • making an ABI-incompatible change in a library, then recompile

Only Gradle 3.4 supports the concept of a library, and therefore uses the Java Library plugin. To make the comparison even clearer, this benchmark does not use the incremental compiler (which would make things even faster; updates would be almost a no-op).

As the benchmark results show, in addition to better modelling, there’s a strong impact on performance!

Conclusion #

Gradle 3.4 brings dramatic improvements to the Java ecosystem. Better incremental compilation and compile avoidance will significantly improve your productivity, while the clean separation of API and implementation dependencies will avoid accidental leakage of dependencies and help you better model your software. And there is more goodness to come: in particular, the separation of API and implementation is key to the success of Java 9 and Project Jigsaw. We’re going to add a way to declare which packages belong to your API, bringing this even closer to what Jigsaw will offer, but supported on older JDKs too.

In addition, Gradle 4.0 will ship with a build cache: a mechanism which allows reusing, and sharing, the results of task execution on a local machine or over the network, and which will strongly benefit from the improvements described in this post. Typical use cases include switching branches, or simply checking out a project which has already been built by a colleague or on CI. Put differently, if you, or someone else, has already built something you need, you get it from the cache instead of building it locally. To do this, the build cache needs to generate a cache key which, for a Java compile task, is typically sensitive to the compile classpath. The improvements that ship in 3.4 will make cache hits more likely, because we can ignore what is irrelevant to consumers (only the ABI matters).
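
Based on the incubating feature as currently planned (and therefore subject to change), enabling the cache is expected to be a one-liner:

# on the command line, for a single build
gradle --build-cache assemble

# or persistently, in gradle.properties
org.gradle.caching=true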

We encourage you to upgrade now, take a look at the documentation of the new Java Library plugin and discover all it can do for you!
