Use Test Repetitions Mode in Xcode to Diagnose and Fix Flaky Tests


If you've written unit tests in the past, there is a fairly good chance that you have come across flaky tests. Flaky tests are the ones whose success or failure is unreliable. They might pass today, then fail for some time before going back to passing. They add overhead, waste time in debugging, and hurt productivity when the time spent dealing with them outweighs the benefits they offer.

There are many reasons why tests might be flaky:

  1. Mock dates created without a fixed time zone, so the test fails when the time zone or daylight saving setting changes
  2. Incompatibility with the underlying platform or device on which the test is running
  3. Tests touching and modifying global states
  4. Inconsistent network connectivity
  5. Threading issues
  6. Tests causing side-effects for other tests
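The first cause in the list is easy to reproduce. Below is a minimal sketch, assuming a hypothetical dayOfMonth helper and a mock timestamp picked for illustration. The same instant falls on a different calendar day depending on the time zone, so a test that relies on the machine's current time zone can pass on one machine and fail on another. Passing the time zone in explicitly keeps the result stable.

```swift
import Foundation

// Hypothetical helper: returns the day of the month for a date in an explicit time zone.
func dayOfMonth(for date: Date, in timeZone: TimeZone) -> Int {
    var calendar = Calendar(identifier: .gregorian)
    calendar.timeZone = timeZone
    return calendar.component(.day, from: date)
}

// Mock date: 2021-06-01 23:30:00 UTC.
let mockDate = Date(timeIntervalSince1970: 1_622_590_200)

let utc = TimeZone(secondsFromGMT: 0)!
let tokyo = TimeZone(secondsFromGMT: 9 * 3600)!

let dayInUTC = dayOfMonth(for: mockDate, in: utc)      // 1
let dayInTokyo = dayOfMonth(for: mockDate, in: tokyo)  // 2, already the next morning there
```

A test that asserts on dayInUTC will give the same answer everywhere, while one that uses the machine's current time zone depends on where it runs.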

Xcode offers a few tools to at least make these failures more visible during the development phase:

  1. Execute in Parallel - Runs all tests in parallel. If tests depend on each other or multiple tests access the same datastore in a thread-unsafe manner, this mode allows us to catch them
  2. Randomize execution order - If the test failure or success depends on the fixed execution order, randomizing execution order helps us to catch these edge-cases faster
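To see why randomizing the execution order helps, here is a minimal sketch of two order-dependent tests. It uses plain Swift functions that return Bool instead of XCTest cases so that it can run as a standalone script. The loggedInUser global and both test functions are hypothetical names made up for illustration.

```swift
// Hypothetical global state shared by both tests.
var loggedInUser: String? = nil

// Test A logs in and leaves the global modified when it finishes.
func testLogin() -> Bool {
    loggedInUser = "alice"
    return loggedInUser == "alice"
}

// Test B silently assumes that nobody is logged in.
func testStartsLoggedOut() -> Bool {
    return loggedInUser == nil
}

// Runs the tests in the given order from a clean start and collects each result.
func run(_ tests: [() -> Bool]) -> [Bool] {
    loggedInUser = nil
    return tests.map { $0() }
}

let usualOrder = run([testStartsLoggedOut, testLogin])     // [true, true]
let shuffledOrder = run([testLogin, testStartsLoggedOut])  // [true, false]
```

Both tests pass in the usual order, so the hidden dependency only shows up once the order changes. Resetting the shared state in each test's setup would remove it.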

Please note that the Test Repetition feature requires Xcode 13 or later. If you are using an older Xcode version, please upgrade before trying this tutorial out.

Xcode 13 Provides Test Repetition Modes

Even when we know a test is flaky, it's difficult to repeat the failure, especially when it only occurs on a build machine and fails to reproduce when run locally. One solution is to manually run the test repeatedly and see if you can replicate the failure.

But this is time-consuming. If there are many tests, you end up spending a significant amount of time trying to repeatedly run and see if those runs can replicate the failures seen earlier.

Xcode 13 fixes this problem by introducing test repetition modes. There are three types of test repetition modes currently supported in Xcode.

  1. Fixed Iterations

In this mode, the test is run a fixed number of times irrespective of whether the test fails or passes during those iterations. It allows you to investigate what percentage of time the test fails for a given number of fixed runs and allows you to record the test reliability.

Number of iterations - 7

Pass -> Fail -> Fail -> Pass -> Fail -> Pass -> Pass
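The reliability figure for a run like this is simple arithmetic. Here is a small sketch that computes it from the seven outcomes listed above (the variable names are mine, and true stands for a pass):

```swift
// Outcomes of the seven iterations, in order (true = pass).
let outcomes = [true, false, false, true, false, true, true]

let failures = outcomes.filter { !$0 }.count
let failureRate = Double(failures) / Double(outcomes.count)
let failurePercent = Int((failureRate * 100).rounded())

print("\(failures) of \(outcomes.count) runs failed (\(failurePercent)%)")
// prints "3 of 7 runs failed (43%)"
```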

  2. Until Failure

In this mode, the test is run repeatedly until it fails or the maximum number of repetitions is reached. Once it fails, you can go back, run the test with the same configuration it last ran with, and analyze the cause of the failure by inspecting the parameters used by the test.

Pass -> Pass -> Pass -> Pass -> Pass -> Fail

  3. Retry on Failure

It's the opposite of Until Failure mode: the test keeps running until it passes or the maximum number of repetitions is reached. I have seen this behavior many times, where a test fails the first time on CI but re-running it results in a pass. You can use this mode to analyze why some tests that initially fail on CI eventually pass.

Fail -> Fail -> Fail -> Fail -> Fail -> Fail -> Fail -> Pass
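The stopping rules of the three modes can be summarized in a few lines. This is only a simplified model to make the difference concrete, not how Xcode implements it. RepetitionMode and simulate are hypothetical names, and the outcomes are scripted rather than produced by real tests.

```swift
// Simplified model of the three repetition modes (not Xcode's implementation).
enum RepetitionMode {
    case fixedIterations
    case untilFailure
    case retryOnFailure
}

// Replays scripted outcomes (true = pass) and returns the runs that would execute
// before the mode's stopping rule applies, capped at maxRepetitions.
func simulate(mode: RepetitionMode, outcomes: [Bool], maxRepetitions: Int) -> [Bool] {
    var executed: [Bool] = []
    for passed in outcomes.prefix(maxRepetitions) {
        executed.append(passed)
        switch mode {
        case .fixedIterations:
            continue                          // never stops early
        case .untilFailure:
            if !passed { return executed }    // stops at the first failure
        case .retryOnFailure:
            if passed { return executed }     // stops at the first pass
        }
    }
    return executed
}

let script = [true, true, false, true, false]
let fixed = simulate(mode: .fixedIterations, outcomes: script, maxRepetitions: 5)
let untilFailure = simulate(mode: .untilFailure, outcomes: script, maxRepetitions: 5)
let retry = simulate(mode: .retryOnFailure, outcomes: [false, false, true, true], maxRepetitions: 5)
// fixed runs all 5, untilFailure stops after 3 runs, retry stops after 3 runs
```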

A Flaky Test

To see how these new modes help in catching and analyzing test failures, we will start with a sample flaky test.

Code:


// Deliberately flaky: ignores which value is larger and returns one of the two at random.
func findMax(val1: Int, val2: Int) -> Int {
    let randomNumber = Int.random(in: 0...1)
    return randomNumber == 0 ? val1 : val2
}

Test:


func testFindMax() {
    let maxNumber = findMax(val1: 100, val2: 200)
    XCTAssertEqual(maxNumber, 200)
}

It's a simple test for a fictitious findMax function, which is responsible for returning the larger of the pair of numbers passed to it. To make it flaky, I have introduced a random number generator: depending on whether the generated value is 0 or 1, the function returns either the first or the second value.
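Since the generator picks 0 or 1 with equal probability, a single run of this test passes about half the time. A rough sketch of how quickly repetition exposes it, assuming the runs are independent (chanceOfAllPassing is a hypothetical helper name):

```swift
import Foundation

// Chance that a test with the given per-run pass rate passes every one of `runs` repetitions,
// assuming the runs are independent of each other.
func chanceOfAllPassing(passRate: Double, runs: Int) -> Double {
    return pow(passRate, Double(runs))
}

let singleRun = chanceOfAllPassing(passRate: 0.5, runs: 1)  // 0.5
let tenRuns = chanceOfAllPassing(passRate: 0.5, runs: 10)   // 0.0009765625
```

In other words, ten repetitions would all pass by luck less than 0.1% of the time, which is why repeating a test makes this kind of flakiness hard to miss.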

Creating a Test Plan

In order to take advantage of new modes, we will enclose our tests in a test plan. Here's how you can create a test plan for existing tests.

Click on a target and choose Edit Scheme.

In the next dialogue, choose Test in the left-hand side menu and click the + button in the bottom-middle to create a new test plan.

You can add an existing test plan if you already have one. Since I don't have an existing test plan, I will choose the "Create empty Test Plan" option.

Give an appropriate name to your test plan in the next window and click Save.

Your empty Test plan is ready for the next steps.

Adding Tests to Test Plan

In order to add existing tests to the test plan, select the test plan in the left-side pane and click the + button at the bottom to add tests to it. It will show a dialogue listing the available test targets.

Now select and add the first unit test target to this test plan and you're all set for the next steps.

Configuring Test Plan to Run Tests Repeatedly

Next, we will configure the test plan to run tests repeatedly. Select the Test plan, and go to configurations. Here, you will see an option to configure test repetition mode.

In this case, I am repeating it for a maximum number of set iterations.

After running the test suite, I go to the Report Navigator, click on the generated report, and analyze the result. If I run it again with the same configuration, it produces different results, which is a clear indication of a flaky test.

Debugging a Flaky Test

Now that we know which test is flaky, we can concentrate on just that test without running the whole test suite.

Go to your test file, right-click on the diamond in the left-hand gutter, and choose the Run <your_function_name> Repeatedly option.

You can configure additional options on the next screen.

Since we're debugging the cause of failure, we will select Pause on Failure so that the debugger pauses as soon as the test fails and we can analyze what caused the failure.

After running for some time, the test failed as expected and we can see the debugger has paused at the test failure location.

Now that we know the test failed and maxNumber is 100, which is not what we expected, we can debug further into the findMax function to see what's going on.


func findMax(val1: Int, val2: Int) -> Int {
    let randomNumber = Int.random(in: 0...1)
    return randomNumber == 0 ? val1 : val2
}

Ahhh! Someone wrote a clowny findMax function that returns a value based on a random number instead of comparing the inputs. We will fix it by comparing the two values directly.


func findMax(val1: Int, val2: Int) -> Int {
    return val1 > val2 ? val1 : val2
}
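Before re-running the test, we can also sanity-check the fix against more than one pair of inputs. The sketch below compares findMax with the standard library's max function. The argument pairs are arbitrary values chosen for illustration, and a plain Boolean check stands in for XCTest assertions so the snippet runs on its own.

```swift
func findMax(val1: Int, val2: Int) -> Int {
    return val1 > val2 ? val1 : val2
}

// Pairs covering: larger value second, larger value first, equal values, negative values.
let cases: [(Int, Int)] = [(100, 200), (200, 100), (7, 7), (-5, -9)]

// True only if findMax agrees with the standard library's max for every pair.
let agreesWithStandardLibrary = cases.allSatisfy { pair in
    findMax(val1: pair.0, val2: pair.1) == max(pair.0, pair.1)
}
```

The original test only covered the case where the second argument is larger, so checking the reversed and equal cases gives a little more confidence in the fix.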

Now that we fixed it, let's run it again repeatedly and see the result.

We ran the test 10 times and, as the Test Report shows, all 10 runs passed, which confirms that we fixed the flaky test and it's passing consistently now.

Passing Test Repetition Modes From CLI (Command-line interface)

UI is not the only way to configure test repetition modes. If you're going to use repetition modes on CI, you need to pass them from the command-line interface.

While running xcodebuild in test mode, pass the following parameters to customize test repetition modes:

xcodebuild test -project <project_name>.xcodeproj -scheme <scheme_name> -destination 'platform=iOS Simulator,name=iPhone 12,OS=15.0' -test-iterations 100 -run-tests-until-failure

In all, there are three possible test repetition flags that can be used to configure the test run:


-test-iterations <number> 
-retry-tests-on-failure 
-run-tests-until-failure

Summary

Flaky tests are one of the major causes of headaches for software developers. In the iOS ecosystem, where code might be running on the main thread or on countless background threads, the chances of flaky tests increase dramatically.

With the introduction of these new APIs, developers can easily automate running failing tests any number of times until one of the conditions is met. This added automation will result in a more stable test system and a significant amount of time saved for each developer. One of the reasons why people disable or skip flaky tests is that they never get the chance, or the time, to diagnose them or to reach the point where they can reliably reproduce the test's failure. In my own experience, it's easier said than done.

The job of fixing flaky tests eventually falls on the team that is responsible for it, but with the introduction of Apple's new APIs, it's easier to diagnose and analyze the failure reason. This is an important step before developers can dig into fixing the test failure.

I am extremely enthusiastic about these new APIs and will be using them in my existing project, where many tests are disabled due to flakiness. What are your thoughts about them? How is your experience with flaky tests? Are there existing tools that you use to analyze and fix flaky tests? Are there any examples of flaky tests that you would be excited to share with the community? I would love to hear your thoughts on these questions and get feedback on this article. You can reach out to me anytime on Twitter @jayeshkawli.

Thanks for reading, and see you in the next blog post.

References:

Diagnose unreliable code with test repetitions - WWDC21 - Videos - Apple Developer