Monday, June 20, 2011

Non-Determinism and Testing

At the keynote address of the 2011 Agile Development Practices West conference, Martin Fowler gave a talk on non-deterministic tests. These are tests that pass when you run them but fail when you run them again without changing any code. They are also referred to as flaky tests or intermittent failures, because they seem to pass or fail at random.

Martin asserts that non-deterministic tests are useless. The whole point of a regression suite is to be a bug detection mechanism. Its value is that you get immediate feedback when you make a mistake. The time between making a mistake and realizing that you did is very short. And because it is short, you can quickly figure out what you did wrong and fix it. With non-deterministic tests you get unreliable information.

Martin argues that non-deterministic tests are worse than useless. They are dangerous, because they act like an infection that spreads through the entire suite. One failing test marks the entire suite as red. Usually we dig in to see which test failed and fix it, but once we start seeing red regularly, we assume the failure is due to the non-deterministic test and ignore it. At that point real failures go unnoticed too, and the whole test suite becomes useless.

Martin proposes setting up a quarantine area for flaky tests. If we pull the flaky tests out of the suite, the suite remains reliable. The number of tests in quarantine needs to be limited, and quarantined tests need to be fixed as soon as possible. Martin recommends setting limits, such as no more than 3 tests in quarantine at a time, or a rule that a quarantined test must be fixed within a week.

Martin next describes some causes for non-deterministic tests.

1. Lack of isolation: Tests depend on the order in which they are run. These failures are very hard to debug: when a test fails because of where it ran in the suite, you usually re-run it in isolation, and then it passes. The real cause of the problem is in another test, which is green and therefore very hard to locate. There are two possible solutions:
  • Track dependencies: Track the order in which the tests need to run. This is very hard to manage and maintain.
  • Isolation: Martin prefers this solution. You can apply a clean-up strategy. That is, make sure there is a tear down that destroys anything you have created. An even better approach is to use a blank slate. Always start with a clean data set. The disadvantage here is that setting up the blank slate can be time consuming.
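The blank-slate approach can be sketched with Python's standard unittest module. This is only an illustration, not code from the talk: the AccountTests class, the accounts table, and the choice of an in-memory SQLite database are all assumptions made for the example. Each test gets a fresh database in setUp and discards it in tearDown, so the two tests pass in either order.

```python
import sqlite3
import unittest


class AccountTests(unittest.TestCase):
    """Hypothetical tests; the schema below is invented for illustration."""

    def setUp(self):
        # Blank slate: every test starts from a brand-new, empty database.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)"
        )

    def tearDown(self):
        # Clean-up: destroy what this test created.
        self.db.close()

    def count_accounts(self):
        return self.db.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]

    def test_insert_one_account(self):
        self.db.execute("INSERT INTO accounts VALUES ('alice', 10)")
        self.assertEqual(self.count_accounts(), 1)

    def test_table_starts_empty(self):
        # Would fail if the row inserted by the other test leaked through.
        self.assertEqual(self.count_accounts(), 0)
```

The cost mentioned above shows up in setUp: the more data a blank slate needs, the longer every test takes to start.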
2. Asynchrony: Testing asynchrony is difficult. A common approach is to use what Martin calls a bare sleep where we introduce a sleep statement.


If we sleep too long, the test runs very slowly. If we shorten the sleep, the test starts failing when we move to a different machine or when the machine is overloaded. Finding the correct sleep duration is therefore very hard. Martin warns us against this approach and recommends another solution that uses polling:

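someFunction()                    // start the asynchronous call
startTime = now()                 // now() stands for the current time
while (!answer.received) {
  if (now() - startTime > waitLimit) {
    throw new TestTimeoutException()
  }
  sleep(pollingInterval)
}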

Here the sleep sits inside the polling loop, and the loop is bounded by a timeout. You can keep the polling interval very low and adjust the timeout separately. That leaves two values to tune, and both can be global constants.
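A minimal runnable sketch of the polling idea in Python follows. The names wait_until, PollTimeoutError, wait_limit and polling_interval are hypothetical and chosen for this example; they do not come from the talk or from any particular test framework.

```python
import threading
import time


class PollTimeoutError(Exception):
    """Raised when the condition does not become true within the wait limit."""


def wait_until(condition, wait_limit=2.0, polling_interval=0.01):
    """Poll condition() until it returns True or wait_limit seconds elapse."""
    deadline = time.monotonic() + wait_limit
    while not condition():
        if time.monotonic() > deadline:
            raise PollTimeoutError(f"no answer within {wait_limit} seconds")
        time.sleep(polling_interval)


# Usage: a timer thread stands in for the asynchronous work under test.
answer = {}
threading.Timer(0.05, lambda: answer.update(value=42)).start()
wait_until(lambda: "value" in answer)
assert answer["value"] == 42
```

The test returns as soon as the answer arrives, so a generous wait_limit only costs time when something is actually broken.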

Martin also discusses another approach that uses callbacks:

callback = function() { assert(answer) }
asserts.add(callback, timeout)

The callback method itself contains the verification. This solution depends on the testing framework that you are using.
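Because the details depend on the framework, the following is only one possible shape, written with Python's standard threading module. PendingAssertions, add, verify_all and start_async_work are hypothetical names invented for this sketch. The registry wraps the callback so that a failed assertion on the worker thread is captured and re-raised on the test thread, and a callback that is never invoked becomes a timeout.

```python
import threading


class PendingAssertions:
    """Hypothetical registry of callbacks that verify asynchronous results."""

    def __init__(self):
        self._pending = []

    def add(self, callback, timeout):
        """Wrap callback; the wrapper records failures and signals completion."""
        done = threading.Event()
        errors = []

        def wrapped(answer):
            try:
                callback(answer)
            except Exception as exc:  # includes AssertionError
                errors.append(exc)
            finally:
                done.set()

        self._pending.append((done, errors, timeout))
        return wrapped

    def verify_all(self):
        """Fail if any callback was not invoked in time or raised an error."""
        for done, errors, timeout in self._pending:
            if not done.wait(timeout):
                raise TimeoutError("callback was never invoked")
            if errors:
                raise errors[0]


def start_async_work(on_answer):
    # Stand-in for the code under test: delivers 42 from another thread.
    threading.Timer(0.05, on_answer, args=(42,)).start()


def check_answer(answer):
    assert answer == 42


asserts = PendingAssertions()
start_async_work(asserts.add(check_answer, timeout=2.0))
asserts.verify_all()
```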

3. Interaction with remote services: There are many things that can go wrong when dealing with a remote service that have nothing to do with your code (network down or slow, limited availability, unstable test data).

Martin recommends using a test double. This way, you control the data the service has access to, you control when it changes, and you get fast, reliable connections. However, some object that this does not test the real service, because we cannot keep the double consistent with the remote service. To solve this problem, Martin recommends adding an integration contract test that does not run as part of the main build. This test would typically run nightly against the actual service and ensures that the signature of the double is in sync with the actual service.
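As a rough sketch of both ideas in Python: RemotePriceService, FakePriceService, total and check_contract are hypothetical names made up for this example, and the "real" client here is just a stub that always fails. The contract check only compares public method signatures; a fuller contract test would also call the real service and compare its responses with the double's.

```python
import inspect


class RemotePriceService:
    """Stand-in for the real client; it would talk to the network."""

    def get_price(self, sku: str) -> float:
        raise ConnectionError("network unavailable in this sketch")


class FakePriceService:
    """Test double: same interface, data fully controlled by the test."""

    def __init__(self, prices):
        self._prices = dict(prices)

    def get_price(self, sku: str) -> float:
        return self._prices[sku]


def total(service, skus):
    """Code under test: depends only on the get_price interface."""
    return sum(service.get_price(sku) for sku in skus)


def public_signatures(cls):
    """Map each public method name of cls to its call signature."""
    return {
        name: inspect.signature(member)
        for name, member in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_")
    }


def check_contract(real_cls, double_cls):
    """Return names of real methods the double lacks or declares differently."""
    real = public_signatures(real_cls)
    double = public_signatures(double_cls)
    return [name for name, sig in real.items() if double.get(name) != sig]


# Main build: fast and deterministic, no network involved.
assert total(FakePriceService({"a": 1.5, "b": 2.0}), ["a", "b"]) == 3.5

# Separate (for example nightly) check: an empty list means no drift.
assert check_contract(RemotePriceService, FakePriceService) == []
```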

Martin concludes by warning us that flaky tests are a common problem that can become very dangerous if not addressed immediately.

This presentation is available on YouTube at

Martin also discusses non-deterministic tests on his blog