Paul M. Jones | Why I Prefer Test-Later

Why I Prefer Test-Later

I remain unconvinced of the benefits of test-first and test-driven development (TDD) because I think the underlying principles of TDD are lacking, not because of the way TDD adherents talk about those principles. I believe I understand the test-first adherents very well, and I disagree with them.

I agree with Travis that many TDD and test-first devotees suffer from what Travis calls “expertise syndrome”. I think almost anyone who has a great deal of sophisticated and nuanced knowledge on a particular topic must actively look out for and modify the kind of behavior Travis describes.

But neither the “expertise syndrome” nor the sometimes religious zeal of many TDDers are the cause of my ambivalent feelings about test-first. Instead, I simply fail to see that TDD is really so useful as its adherents anecdotally proclaim. This is not because their communication skills are lacking in some way, but because I find the idea of TDD itself to be lacking.

I have various reasons for thinking test-first is not the fantastic solution its devotees claim. Chief among my reasons is this: a programmer not already familiar with the domain of a non-trivial problem generally doesn’t know enough to write meaningful tests in advance. The act of writing the code as a solution will train the programmer in the domain. He can write more useful tests afterwards to make sure future changes to the code do not break existing behaviors.

The Map Is Not The Territory

I think test-first is not likely to help most programmers understand the problem any better — test-first is only going to tell the programmer if his code works the way he thinks it should, not if it solves the problem he is addressing.

This is because the the map is not the territory. That is, a “paper abstract” map of the problem is not the same as the “physical concrete” territory of the problem. The map may provide useful hints, but until you have explored the territory, your understanding of reality is severely handicapped regardless of how smart, intelligent, or clever you think you are.

Test-first assumes the map of the problem is sufficient. In reality, unexplored assumptions and unexpected interactions abound. Test-later depends on you having explored the territory of the problem, becoming familiar with the reality of the situation by working through it yourself and gaining personal experience with it. Your solutions will be all the better for it, and your tests will then reflect the real territory, not the abstract map.

Thus, I believe test-later is more accurate and more useful, as well as a more effective use of your limited time when checking solutions in code, because it is based on practical experiment, not hypotheticals.

Testing Is Useful

Testing is useful and necessary. Unit, system, and integration testing are a great tool to help find, remove, and prevent new errors in working code. I do not think that test-first is any better than test-later, especially for programmers who have good self-management and self-discipline.

David Sklar has said in regards to testing that “tools are secondary, discipline is primary” (see slide 40). I heartily agree. In this sense, I think for programmers who have good discipline, test-first might well be a good tool. But test-later is just as good a tool, and perhaps a better one.

Similarly, with programmers who have mediocre or poor self-discipline, test-first and TDD may help them produce better code … but the quality of that code will still be lower than that generated by good programmers who take the time to explore the problem domain.

It appears that at least one bit of research supports my otherwise unproven assertions in this regard. Or, rather, the debunking of one bit of research.

The control group (non-TDD or “Test Last”) had higher quality in every dimension"âthey had higher floor, ceiling, mean, and median quality.

The control group produced higher quality with consistently fewer tests.

Quality was better correlated to number of tests for the TDD group (an interesting point of differentiation that I'm not sure the authors caught).

The control group's productivity was highly predictable as a function of number of tests and had a stronger correlation than the TDD group.

So TDD's relationship to quality is problematic at best.

…

Having a big clump in that upper left quadrant is troubling enough but then having the “Test Last” group almost double your “Test First” group in the over 90% quality range is something that should be noticed and highlighted.

While correlation doesn't equal causation, the lack of correlation pretty much requires a lack of causation.

Read the whole article for yourself, as well as the original report, and draw your own conclusions.

Are you stuck with a legacy PHP application? You should buy my book because it gives you a step-by-step guide to improving you codebase, all while keeping it running the whole time.