Introducing flaky test mitigation tools

This post introduces a new Gradle plugin and build scan improvements aimed at mitigating your flaky tests.

Flaky tests disrupt software development cycles by blocking CI pipelines and causing unnecessary failure investigations. Unhealthy teams live by re-running builds, sometimes several times, to get changes through. Martin Fowler has pointed words about non-deterministic tests that are worth a read.

To eliminate this parasite from your organization you have to identify, prioritize, and fix your flaky tests.

Mitigating flaky tests

There are a number of clever heuristics that help identify flaky tests. You could run static analysis to prove that a code change could not, even in theory, have caused a given test failure. You could count the number of “flips” from failed to passed and set a threshold above which a test is considered flaky.
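As a rough illustration of the flip-counting heuristic, here is a minimal Groovy sketch. The function names, the boolean history format (true means passed, oldest run first), and the threshold are all assumptions made for this example; they do not come from any Gradle tool.

```groovy
// Hypothetical helper: counts failed-to-passed transitions between
// consecutive runs of a single test. `history` is oldest run first.
int countFailedToPassedFlips(List<Boolean> history) {
    int flips = 0
    for (int i = 1; i < history.size(); i++) {
        if (!history[i - 1] && history[i]) {
            flips++
        }
    }
    return flips
}

// Hypothetical rule: flag the test once its flip count reaches the threshold.
boolean looksFlaky(List<Boolean> history, int flipThreshold) {
    return countFailedToPassedFlips(history) >= flipThreshold
}

def history = [true, false, true, true, false, true]
println countFailedToPassedFlips(history) // prints 2
println looksFlaky(history, 2)            // prints true
```

Note that a genuine bug fix also produces a failed-to-passed flip, so the threshold has to be tuned per project, which is part of what makes this kind of heuristic hard to trust.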

These heuristics usually work, but they are very hard to get right. A big problem arises when your flaky test detection methodology is itself flaky: people stop trusting the system and resort to rerun-and-suffer mode until they get the result they want.

Re-running tests in the same execution environment is a way of identifying a flaky test beyond reasonable doubt. It’s no wonder so many teams and libraries incorporate this simple strategy. It’s even been built directly into Maven Surefire and Failsafe.

This is why we’ve developed the Test Retry Gradle Plugin that retries failed tests for the purposes of mitigating test flakiness.

New Test Retry Gradle Plugin

You can use this Gradle configuration to retry tests and optionally fail the build on flakiness:

// Groovy DSL (build.gradle)
plugins {
    id 'org.gradle.test-retry' version '1.0.0'
}

test {
    retry {
        failOnPassedAfterRetry = true
        maxFailures = 42
        maxRetries = 1
    }
}

// Kotlin DSL (build.gradle.kts)
plugins {
    id("org.gradle.test-retry") version "1.0.0"
}

tasks.test {
    retry {
        failOnPassedAfterRetry.set(true)
        maxFailures.set(42)
        maxRetries.set(1)
    }
}

There are 4 especially neat aspects to this plugin:

  1. No test source changes are required. This allows proactive detection of new flaky tests!
  2. You can control whether your build fails or passes when flakiness is encountered using failOnPassedAfterRetry. This means that you can adopt this plugin to detect flaky tests without silencing them.
  3. You can stop retrying within a test run once a set number of tests have failed using maxFailures. If your build encounters many failures, it’s likely there is a major problem causing many tests to fail and retrying is a waste of resources.
  4. Tests are retried at method-level or finer where possible — no rerunning whole classes of tests!
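One way these options might be combined is to retry only on CI while still failing the build when flakiness is detected. The sketch below assumes the plugin is already applied, that your CI server sets an environment variable named CI, and that the numeric values are illustrative placeholders rather than recommendations.

```groovy
// build.gradle (Groovy DSL): a sketch, not an official recommendation.
// Assumption: the CI server sets an environment variable named "CI".
def isCiServer = System.getenv().containsKey('CI')

test {
    retry {
        // Retry only on CI; local runs fail fast without retries.
        maxRetries = isCiServer ? 2 : 0
        // Stop retrying once this many tests have failed in one run.
        maxFailures = 20
        // Keep the build red when a test passes only after a retry.
        failOnPassedAfterRetry = true
    }
}
```

With failOnPassedAfterRetry enabled, a flaky test still fails the build, so retries serve to detect flakiness rather than hide it.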

Supported environments of Gradle Test Retry Plugin

The Test Retry Plugin supports Gradle 5.0 and later out-of-the-box with the following test frameworks:

  • JUnit4
  • JUnit Platform (JUnit 5)
  • Spock
  • TestNG

In some cases, tests that already passed must be re-executed by the plugin to ensure correctness. For example, when a flaky test is linked to upstream or downstream tests through @Test(dependsOn = {}), those related tests are re-run as well, because the tests share state through that dependency.

For more information about supported frameworks and retry mechanics, please see the Test Retry Gradle Plugin docs.

How flaky tests are reported

We chose to report all discrete executions of flaky tests to maximize compatibility with existing test reports and IDEs. Therefore a flaky test report might look like this:

[Image: JUnit report with multiple test executions]

To clarify reporting, we are working to make flaky a first-class test outcome in more of the tools you use daily. The docs have more details about how flaky tests are reported in logs, test reports, and popular IDEs.

The good news is that build scans already report flaky tests clearly and accurately!

Flaky tests support for build scans

Gradle and Maven build scans now report a test as flaky when it is executed multiple times in the same build with both a PASSED and a FAILED result, in any order. This lets build scans detect flaky tests from any in-build retry mechanism, such as Maven’s rerun-failing-tests option (for example, -Dsurefire.rerunFailingTestsCount=2) or your own custom retry mechanism.
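The rule described above can be sketched in a few lines of Groovy. This is only an illustration of the stated rule, not how build scans are implemented; the classify function and the string outcome values are assumptions for the example, and the input is assumed to hold at least one execution.

```groovy
// Hypothetical classifier: takes every outcome recorded for one test
// within a single build and returns an overall result.
// Assumption: `outcomes` is non-empty and holds only 'PASSED' or 'FAILED'.
String classify(List<String> outcomes) {
    boolean passed = outcomes.contains('PASSED')
    boolean failed = outcomes.contains('FAILED')
    if (passed && failed) {
        // Mixed results in the same build, in any order, mean flaky.
        return 'FLAKY'
    }
    return failed ? 'FAILED' : 'PASSED'
}

println classify(['FAILED', 'PASSED'])  // prints FLAKY
println classify(['PASSED', 'FAILED'])  // prints FLAKY
println classify(['FAILED', 'FAILED'])  // prints FAILED
```

Because order does not matter, the same rule covers a test that fails and then passes on retry as well as one that passes first and fails on a later execution.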

[Image: Build scan with flaky tests]

Identifying and silencing flaky tests only treats symptoms. There are likely some real concurrency or performance issues that cause suffering for your customers — you must analyze flaky test executions to fix them. You can read more in my blog post about analyzing flaky tests in Maven and Gradle builds.

Conclusion

We hope the new Test Retry Gradle Plugin and new flaky test analysis features help you eradicate flaky tests.

Let us know what you think here or on Twitter.