Randomness in Tests

By Eric — 4 minute read

Wouldn't it be cool if a few uses of the random module in your unit tests could discover bugs in your code? Meh.

The problem is that random tests are non-deterministic, meaning they lack the highly desirable property that if they pass in my local development environment, they'll also pass on the build server.

But... you might find bugs! Practically for free!

Martin Fowler went so far as to say that non-deterministic tests are:

  1. "useless" and
  2. "a virulent infection that can completely ruin your entire test suite"

Accidentally non-deterministic

Let's say you have a few tests that just fail every now and then. Maybe they're integration tests that are impacted by the load on some server. It would be great to better isolate the code under test so that doesn't happen, but that just hasn't hit the top of the priority list.

Another example is tests that rely on unspecified behavior that happens to work a particular way most of the time. I often use SQLAlchemy's first() method on a query result in tests, but I sometimes forget that "first" means "any random row" if there isn't some kind of ORDER BY in the query. I'll write the test and it passes multiple times, so I think it's good. Then one out of 50 runs fails because the database server, perfectly acceptably, returns rows in a different order.
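The trap is easy to reproduce with the standard library's sqlite3 module (a hypothetical two-row users table; SQLAlchemy's first() has the same unspecified-order behavior underneath):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(2, "b@example.com"), (1, "a@example.com")])

# Fragile: without ORDER BY, which row comes back "first" is whatever
# the engine happens to do today -- a test asserting on it may flake.
fragile = conn.execute("SELECT email FROM users").fetchone()
assert fragile is not None

# Deterministic: ORDER BY pins down which row is "first".
stable = conn.execute("SELECT email FROM users ORDER BY id").fetchone()
assert stable == ("a@example.com",)
```

The fix is one clause, which is why it's so easy to forget while the test keeps passing.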

Tests like these train the team to just mash the Rebuild button on the build server until the tests pass. What if there was a real bug in there? Curiosity is overcome by the need to get a build, so it would probably go unnoticed.

Deliberately non-deterministic

While accidentally non-deterministic tests are unfortunate, deliberate randomness is annoying. I saw a test recently that randomly picked one of a half-dozen enumeration values on each run. It failed once. Huh, that's weird. Run again, it passed. Run it three more times and I can't get it to fail. OK, whatever.

There was an actual bug there -- one of the enum values had the wrong numerical value associated with it, and the test caught it, but so rarely that it just seemed like some kind of fluke. It probably would have been better to run the test with all of the enum values.
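With a half-dozen values, iterating over all of them costs almost nothing and catches the bad mapping every single run. A minimal sketch, with a hypothetical Color enum and to_wire() standing in for the code under test:

```python
import enum

class Color(enum.Enum):  # hypothetical enum from the story above
    RED = 1
    GREEN = 2
    BLUE = 3

def to_wire(color):  # hypothetical code under test
    return color.value

# The expected values, written out independently so a wrong member
# value can't silently agree with itself.
EXPECTED = {Color.RED: 1, Color.GREEN: 2, Color.BLUE: 3}

# Instead of random.choice(list(Color)), exercise every member:
for color in Color:
    assert to_wire(color) == EXPECTED[color], color
```

If the enum had a wrong numerical value, this fails on every run, not one run in fifty.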

Wait a minute...

So no randomness at all in tests?

Well, sometimes it is really convenient to use randomness. For example, if you have database models with UUIDs as primary keys, it gets awkward if you aren't allowed to generate random IDs for test objects.
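The reason random UUID keys are harmless here is that no assertion ever depends on a particular value; the test only compares IDs to each other. A tiny sketch with hypothetical dict-based model objects:

```python
import uuid

# Hypothetical test objects with random UUID primary keys.
user_id = uuid.uuid4()
user = {"id": user_id}
order = {"id": uuid.uuid4(), "user_id": user_id}

# Assertions compare IDs; they never hard-code a specific UUID,
# so the randomness can't make the test flake.
assert order["user_id"] == user["id"]
```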

Or maybe you're using something like Factory Boy. That library implements the "Object Mother" pattern, wherein a factory creates objects with as many default values as possible, which is convenient for testing.

For example, let's say you want to create a few User objects, and a user has a bunch of required fields, like first name, last name, email, phone number, address, birth date, etc. With Factory Boy, I can just say:

user1 = UserFactory.create()
user2 = UserFactory.create()
user3 = UserFactory.create()

Each created user gets default values for each field, though I could pass in an explicit value for any of them, should that be relevant to the test. They're also unique if needed, like if it is invalid for two users to have the same email. This is really nice because it doesn't obscure the logic of the test with tons of irrelevant setup -- if I don't care what a specific field value is, it just gets filled in.

Even so, you could meet these requirements without randomness. If you don't care what a user's email address is, just that it is unique, a counter (addr1@example.com, addr2@example.com, etc.) works just as well as "realistic" data randomly generated by something like Faker.
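A deterministic factory along those lines can be sketched in a few lines of plain Python (make_user is a hypothetical stand-in for UserFactory.create(); itertools.count supplies the unique suffix):

```python
import itertools

_counter = itertools.count(1)

def make_user(**overrides):
    """Every required field gets a default; a counter keeps emails
    unique with no randomness at all."""
    n = next(_counter)
    user = {
        "first_name": "Test",
        "last_name": f"User{n}",
        "email": f"addr{n}@example.com",
        "phone": "555-0100",
    }
    user.update(overrides)  # explicit values win when the test cares
    return user

user1 = make_user()
user2 = make_user(first_name="Alice")
assert user1["email"] != user2["email"]  # unique, yet fully reproducible
```

Factory Boy's Sequence feature supports exactly this style, so choosing it over random Faker data is a one-line decision per field.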

So here are some guidelines for randomness:

  1. Random values are for convenience and test clarity rather than the hope that they'll expose a bug.
  2. You have to be able to reproduce the random values later on the off chance that a random value causes a test to fail. This could be as simple as logging all the random values used or using a fixed seed.
  3. If a random value causes a test failure, add an explicit test with that value, and make it pass.

What about fuzz testing?

OK, fuzz testing is inextricably tied to randomness, and can be a valuable quality and security tool. But I think it is important to separate functional and fuzz testing. The functional tests that make up the commit phase of your build pipeline should be fast and deterministic. A later fuzz phase can take longer and, until a very high level of maturity is achieved, be run with the expectation that some human analysis will be required. Even for fuzz testing, guidelines 2 and 3 above still apply: record what you generated, and turn any failing input into an explicit test.
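Even a toy fuzz phase can honor those guidelines. A hedged sketch, fuzzing a JSON round-trip with a fixed, logged seed so any failing case is replayable (the property and trial count are arbitrary choices for illustration):

```python
import json
import random

def fuzz_roundtrip(trials=100, seed=0):
    """Generate random payloads and check a round-trip property.
    The seed is fixed and included in the failure message, so a
    failing trial can be replayed and promoted to an explicit test."""
    rng = random.Random(seed)
    for i in range(trials):
        payload = {f"k{j}": rng.randint(-10**6, 10**6)
                   for j in range(rng.randint(0, 5))}
        assert json.loads(json.dumps(payload)) == payload, (seed, i)

fuzz_roundtrip()
```

This belongs in the slower fuzz phase, not the commit phase, but a failure here still ends with a deterministic regression test in the main suite.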