A better way to reason about software testing terms

I was recently in a discussion between developers about improving the test coverage of a major software project. They needed guidance about what kind of tests to write, and how to write them.

The discussion quickly became confusing: the topic was too large and too general, and there wasn’t even a well-defined, shared vocabulary for testing concepts! That turns out to be a wider problem: the software development community as a whole simply doesn’t have well-defined testing terms.

In this post, I’d like to provide some guidance w.r.t. this matter. I’ll discuss:

An overview of the most common testing terms.
A new way of reasoning about testing concepts: reiterating what actually matters, and categorizing tests based on “size” and “approach”.
How the existing testing terminology fits in this new model.

Towards a well-defined testing terminology

Most literature on software testing focuses on terms like “unit test”, “component test”, “integration test”, etc. I think this focus is a mistake because these terms are not well-defined. Their definitions depend on context and on subjective judgment.

For example: what’s a unit? How is it different from components? Can there be multiple levels of units and components? (hint: yes) What if you test two components together but the interaction with the second one is trivial, does that still count as an integration test?

The vagueness distracts us from the things that matter:

The goals of the tests. What do we want to test, and why?
The technical quality of the tests. Are the tests stable (absence of random failures)? Are the tests fast?

So I take a different approach, one that focuses on the tests’ behavior, runtime characteristics, and whether the goal is to build the thing right or to build the right thing. These concepts are much more objective and thus are easier to reason about.

Test sizes & the test pyramid

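I put tests in one of three broad categories: small, medium and large. These are organized in a “test pyramid”: many small tests at the wide base, fewer medium tests in the middle, and only a few large tests at the narrow top.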

How do you measure a test’s size? By looking at its behavior:

Feature	Small	Medium	Large
Network access	✗	localhost only	✔︎
Database access	✗	✔︎	✔︎
File system access	✗	✔︎	✔︎
External systems access	✗	discouraged	✔︎
Concurrency	✗	✔︎	✔︎
Sleep statements	✗	✔︎	✔︎
Time limit	60s	300s	> 900s
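To make the table concrete, here is a minimal Python sketch that derives a test’s size from its runtime behavior. The `Behavior` class and `classify_size` function are hypothetical names invented for this illustration, not part of any testing framework. The sketch also takes a strict reading of the “discouraged” cell and treats any external system access as large.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    """What a test does at runtime (hypothetical model of the size table)."""
    network: str = "none"            # "none", "localhost" or "remote"
    database: bool = False
    file_system: bool = False
    external_systems: bool = False
    concurrency: bool = False
    sleeps: bool = False


def classify_size(b: Behavior) -> str:
    """Return "small", "medium" or "large" for the given behavior."""
    # Leaving the local machine is only allowed for large tests.
    # Assumption: external systems are merely "discouraged" for medium
    # tests, but this sketch counts them as large to keep the rule simple.
    if b.network == "remote" or b.external_systems:
        return "large"
    # Any of these features is off-limits for small tests.
    if (b.network == "localhost" or b.database or b.file_system
            or b.concurrency or b.sleeps):
        return "medium"
    return "small"


# Time limits in seconds for small and medium tests, taken from the table.
TIME_LIMIT_SECONDS = {"small": 60, "medium": 300}

print(classify_size(Behavior()))                     # pure computation
print(classify_size(Behavior(database=True)))        # touches a database
print(classify_size(Behavior(network="remote")))     # talks to another host
```

The point of the sketch is that size follows from observable behavior, so two people looking at the same test should arrive at the same answer.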

The test pyramid is a well-known concept from testing literature, described in, for example, ThoughtWorks’ Practical Test Pyramid. But whereas they organize the pyramid based on fuzzy concepts such as “unit tests”, “service tests” and “UI tests”, I organize the pyramid based on size.

The idea of organizing based on size came from Google. See Simon Stewart, Mike Bland and James A. Whittaker.

Why a test pyramid?

By combining the test pyramid concept with the idea of organizing tests based on size, we gain a powerful insight: the reason why the pyramid recommends having predominantly small tests.

Sure, the pyramid states that small tests are usually faster and more stable, but that in itself is a vague statement. The size definition table explains the reason behind that. Things like network access, concurrency, etc. are inherently brittle because they introduce many more failure modes. They also make tests slower because most of those things introduce additional overhead compared to simple local computing.

The brittleness of larger tests is the biggest problem. They increase the number of false positives: test suite failures that don’t indicate an actual problem in your software, but rather a problem in the platform or an external component. When you are focused on delivering, the last thing you need is more distractions.
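A bit of arithmetic shows how quickly false positives add up. The sketch below assumes that every test fails spuriously with the same small probability and that those failures are independent of each other. Both are simplifying assumptions, and the 0.1% flake rate is an illustrative number, not a measurement.

```python
def spurious_failure_probability(flake_rate: float, num_tests: int) -> float:
    """Chance that at least one of num_tests independent tests flakes.

    A suite run is only green when every test passes, so the probability
    of a clean run is (1 - flake_rate) ** num_tests.
    """
    return 1.0 - (1.0 - flake_rate) ** num_tests


# Illustrative numbers only: each test flakes once per 1000 runs.
for n in (10, 100, 500):
    p = spurious_failure_probability(0.001, n)
    print(f"{n:4d} brittle tests -> {p:.1%} of suite runs fail for no reason")
```

Under these assumptions, a suite with 500 such tests fails spuriously in nearly 40% of its runs, even though each individual test looks almost perfectly reliable.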

Test approaches

Tests can also be categorized as follows. This categorization is orthogonal to the size dimension.

TDD approach (test-driven development): to build the thing right.
BDD approach (behavior-driven development): to build the right thing.

The TDD approach is the one usually taken by developers. You write code and tests as part of your development process, and you take the practice of writing tests and writing testable code seriously.

But just doing TDD doesn’t necessarily ensure that you’re building the thing that stakeholders asked for. That’s where BDD comes in:

BDD focuses on domain knowledge and stakeholder collaboration.
BDD focuses on testing whether software behaves according to specification. Often, the test names correspond to items in the requirements document.

TDD is generally related to technical tests such as “test whether the bubblesort algorithm works”, while BDD is generally related to stakeholder/requirements-oriented tests such as “if a bank account withdrawal happens in a foreign country, then the transaction should be flagged for fraud investigation”.
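The two examples above could look roughly like this in Python’s built-in `unittest` module. This is only a sketch: `Withdrawal` and `process_withdrawal` are hypothetical stand-ins for a real banking domain, and the “flag every foreign withdrawal” rule is simply the requirement quoted above, taken literally.

```python
import unittest
from dataclasses import dataclass


def bubble_sort(items):
    """Plain bubble sort that returns a new, sorted list."""
    result = list(items)
    for end in range(len(result) - 1, 0, -1):
        for i in range(end):
            if result[i] > result[i + 1]:
                result[i], result[i + 1] = result[i + 1], result[i]
    return result


@dataclass
class Withdrawal:
    """Hypothetical domain object, invented for this example."""
    amount: int
    country: str
    flagged_for_fraud: bool = False


def process_withdrawal(withdrawal, home_country):
    """Hypothetical rule: flag withdrawals made outside the home country."""
    if withdrawal.country != home_country:
        withdrawal.flagged_for_fraud = True
    return withdrawal


class BubbleSortTest(unittest.TestCase):
    """TDD-style: technical, named after the code under test."""

    def test_sorts_unordered_list(self):
        self.assertEqual(bubble_sort([3, 1, 2]), [1, 2, 3])

    def test_empty_list_stays_empty(self):
        self.assertEqual(bubble_sort([]), [])


class ForeignWithdrawalRequirementTest(unittest.TestCase):
    """BDD-style: named after a requirement, phrased as given/when/then."""

    def test_withdrawal_in_foreign_country_is_flagged_for_fraud_investigation(self):
        # Given an account whose home country is NL,
        # when a withdrawal happens in another country,
        withdrawal = process_withdrawal(Withdrawal(100, "US"), home_country="NL")
        # then the transaction is flagged for fraud investigation.
        self.assertTrue(withdrawal.flagged_for_fraud)

    def test_withdrawal_in_home_country_is_not_flagged(self):
        withdrawal = process_withdrawal(Withdrawal(100, "NL"), home_country="NL")
        self.assertFalse(withdrawal.flagged_for_fraud)
```

Saved as a file, this can be run with `python -m unittest <filename>`. Both test classes are small tests here, which illustrates that the approach says nothing about the size.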

The idea of categorizing tests this way came from Devon Burriss’ blog post on the anatomy of test suites. He also has a post on how to write good acceptance tests.

Relationship to existing software testing terms

In order to deepen our understanding, let’s review the existing software testing terms and see how they relate to the “test size” and “approach” concepts I described above.

Term	Usual size	Approach	Description
Unit test	Small, maybe medium	TDD	Tests a small, fundamental part of a program: a class, a function, a logically related set of functions, etc.
Component test, module test	Any	Any	Probably the most useless term. A "component" or "module" could mean anything within the context of one program. Could be synonymous with unit test, service test or program test.
Service test, program test	Any	Any	Tests a program's functionality and behavior as a whole. This could be a small test (e.g. spawning the program with some arguments and checking its output). It could be a large test (e.g. an end-to-end, broad-stack UI test).
Broad-stack test, integration test, end-to-end test, system test	Large	Any	Tests whether the broader system as a whole, e.g. multiple microservices, works well together. You don't necessarily run everything in such a test: maybe you only test one microservice and one immediate dependency service, while mocking everything else. The term "end-to-end" leans more towards testing everything.
Contract test, interface test	Small or medium	TDD	An alternative to broad-stack testing. Instead of testing the program together with external systems, mock external systems away, and test whether the program's communication patterns with external systems are as expected.
UI test	Any	Any	When implemented by clicking around in the UI (e.g. with Selenium), it's usually a large test that touches the whole stack. But it's totally possible (and preferred) to test the UI in isolation, i.e. as a small test (unit test).
Acceptance test, functional test	Any	BDD	Tests a feature, i.e. a part of the requirements document.
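As an illustration of the contract test row, here is a minimal sketch using `unittest.mock` from Python’s standard library. The `FraudReporter` class, the `/fraud-reports` path and the JSON payload are all hypothetical. The only thing the test checks is the communication pattern with the (mocked) external system.

```python
import unittest
from unittest.mock import Mock


class FraudReporter:
    """Hypothetical client code that talks to an external fraud service."""

    def __init__(self, http_client):
        # Any object with a requests-like post() method will do.
        self.http_client = http_client

    def report(self, transaction_id):
        response = self.http_client.post(
            "/fraud-reports", json={"transaction_id": transaction_id}
        )
        return response.status_code == 201


class FraudReporterContractTest(unittest.TestCase):
    def test_posts_transaction_id_to_fraud_reports_endpoint(self):
        # The external system is mocked away: no network, so this stays small.
        http_client = Mock()
        http_client.post.return_value = Mock(status_code=201)

        accepted = FraudReporter(http_client).report("tx-42")

        self.assertTrue(accepted)
        # The contract: exactly one POST, to this path, with this payload.
        http_client.post.assert_called_once_with(
            "/fraud-reports", json={"transaction_id": "tx-42"}
        )
```

A test like this cannot tell you whether the real service honors the same contract. That gap is what broad-stack tests, or a matching contract test on the provider’s side, would have to cover.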

Summary

Existing software testing terms like “unit test”, “component test”, “integration test”, etc. are too ill-defined to allow proper discussions. Thus, this post proposes a new way of reasoning about testing terms. We categorize tests along two dimensions:

The size dimension measures the technical quality of the tests: their potential to be stable and fast.
The approach dimension represents (a part of) the goal of the test: whether to build the thing right, or to build the right thing.

The existing software testing terms have been explained in the context of the above categorization system in order to make more sense of them.

The test pyramid tells us that it’s a good idea to write predominantly small tests.

Conclusion

I hope this post has helped you make more sense of software testing terms and focus on the things that matter: the goals and the technical quality of the tests.

Although I’ve put forth some definitions here to help you, at the end of the day it’s more important that your team has a shared understanding of what the terms mean and how to reason about them. Please feel free to modify my proposal to your liking.

In a future blog post, I’d like to give more recommendations w.r.t. how to write a good test suite. Please stay tuned.