To Test or Not to Test? That’s a Good Question.

Turns out the eternal verities of software development are neither eternal nor verities. I’m speaking in this case of the role of tests.

Once upon a time tests were seen as someone else’s job (speaking from a programmer’s perspective). Along came XP and said no, tests are everybody’s job, continuously. Then a cult of dogmatism sprang up around testing–if you can conceivably write a test you must.

By insisting that I always write tests I learned that I can test pretty much anything given enough time. I learned that tests can be incredibly valuable technically, psychologically, socially, and economically. However, until recently there was an underlying assumption to my strategy that I wasn’t really clear about.

Software development is often a long game. My favorite software business of all time is the MVS PL/1 compiler. I heard a rumor that at one point it was earning IBM $300M annually with a total staff of 3 developers. To get to such a business you have to be patient, to invest in extending the lifetime of your software for decades if necessary.

It’s that “often” that hid my assumption about testing. Just as golf has a long game and short game requiring related but not identical skills, so software has a long game and a short game. With JUnit Max I am living the short game of software. It’s teaching me the meaning of “related but not identical skills” when applied to software development.

Two Projects

JUnit is a long game–lots of users, stable revenue ($0, alas), bounded scope. We know what JUnit is. We know what attracts and retains users. We just need to stay a bit ahead of slowly evolving needs.

Working on JUnit, the whole bag of XP practices makes sense. We always test-drive development. We refactor whenever we can, sometimes trying 3-4 approaches before hitting one we are willing to live with.

Success in JUnit is defined by keeping the support cost asymptotically close to zero. We have a huge, active user base and no budget for support. The means to success is clear–slow evolution, comprehensive testing, and infrequent releases.

When I started JUnit Max it slowly dawned on me that the rules had changed. The killer question was (is), “What features will attract paying customers?” By definition this is an unanswered question. If JUnit (or any other free-as-in-beer package) implements a feature, no one will pay for it in Max.

Success in JUnit Max is defined by bootstrap revenue: more paying users, more revenue per user, and/or a higher viral coefficient. Since, by definition, the means to achieve success are unknown, what maximizes the chance of success is trying lots of experiments and incorporating feedback from actual use and adoption.

To Test. Or Not.

One form of feedback I put in place is that all internal errors in Max are reported to a central server. Unlike long game projects, runtime errors in short game projects are not necessarily A Bad Thing (that’s a topic for another post). Errors I don’t know about, however, are definitely A Bad Thing.
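
To make that concrete, here is a minimal sketch of the idea. The ErrorReporter class and everything in it are hypothetical stand-ins, not Max’s actual code. The one invariant worth copying: reporting is best-effort, so a failed report must never become a new error.

    import java.io.IOException;
    import java.io.OutputStream;
    import java.io.PrintWriter;
    import java.io.StringWriter;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Hypothetical sketch of central error reporting; not JUnit Max's real code.
    public class ErrorReporter {
        private final URL endpoint;

        public ErrorReporter(URL endpoint) {
            this.endpoint = endpoint;
        }

        // Serialize the stack trace and POST it to the central server.
        public void report(Throwable error) {
            try {
                StringWriter trace = new StringWriter();
                error.printStackTrace(new PrintWriter(trace, true));
                byte[] body = trace.toString().getBytes("UTF-8");

                HttpURLConnection connection = (HttpURLConnection) endpoint.openConnection();
                connection.setRequestMethod("POST");
                connection.setDoOutput(true);
                OutputStream out = connection.getOutputStream();
                try {
                    out.write(body);
                } finally {
                    out.close();
                }
                connection.getResponseCode(); // send the request; ignore the answer
            } catch (IOException reportingFailed) {
                // Best-effort: a failed report must never become a new error.
            }
        }
    }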

Looking through the error log I saw two errors I knew how to fix. I didn’t have any experiments that would fit into the available time, so I set out to fix them both.

The first defect was clear–projects that were closed caused an exception. Writing the test was easy–clone an existing test but close the project before running Max. Sure enough, red bar. A two-line fix later, green bar.
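
The shape of that test, reconstructed as a hypothetical sketch (Project, Max, and statusOf below are invented stand-ins, not the real Eclipse or Max classes):

    import static org.junit.Assert.assertNotNull;

    import org.junit.Test;

    // Hypothetical reconstruction of the "closed project" regression test.
    public class ClosedProjectTest {

        // Invented stand-in for an Eclipse project.
        static class Project {
            private boolean open = true;
            void close() { open = false; }
            boolean isOpen() { return open; }
        }

        // Invented stand-in for Max; before the fix it threw on closed projects.
        static class Max {
            String statusOf(Project project) {
                if (!project.isOpen()) {
                    return "skipped"; // the flavor of the two-line fix
                }
                return "tested";
            }
        }

        @Test
        public void closedProjectDoesNotThrow() {
            Project project = new Project();
            project.close(); // the only change from the cloned test: close it first
            assertNotNull(new Max().statusOf(project));
        }
    }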

The second defect posed a dilemma. I could see how to fix the problem, but I estimated it would take me several hours to learn what was necessary to write an automated test. My solution: fix it and ship it. No test.

I stand behind both decisions. In both cases I maximized the number of validated experiments I could perform. The test for the first defect prevented regressions, added to my confidence, and supported future development. Not writing the test for the second defect gave me time to try a new feature.

No Easy Answer

When I started Max I didn’t have any automated tests for the first month. I did all of my testing manually. After I got the first few subscribers I went back and wrote tests for the existing functionality. Again, I think this sequence maximized the number of validated experiments I could perform per unit time. With little or no code, having no tests let me start faster (the first test I wrote took me almost a week). Once the first bit of code proved valuable (in the sense that a few of my friends would pay for it), tests let me experiment quickly and confidently with that code.

Whether or not to write automated tests requires balancing a range of factors. Even in Max I write a fair number of tests. If I can think of a cheap way to write a test, I develop every feature acceptance-test-first. Especially if I am not sure how to implement the feature, writing a test gives me good ideas. When working on Max, the question of whether or not to write a test boils down to whether a test helps me validate more experiments per unit time. If it does, I write it. If not, damn the torpedoes. I am trying to maximize the chance that I’ll achieve wheels-up revenue for Max. The reasoning around design investment is similarly complicated, but again that’s the topic for a future post.

Some day Max will be a long game project, with a clear scope and sustainable revenue. Maintaining flexibility while simultaneously reducing costs will take over as goals. Days invested in one test will pay off. Until then, I need to remember to play the short game.

36 Comments

Ron Jeffries, May 14th, 2009 at 5:24 pm

Lovely comparison, the long game and short game. I trust you, and about three other people, to make good short game decisions. My long experience suggests that there is a sort of knee in the curve of impact for short-game-focused decisions. Make too many and suddenly reliability and the ability to progress drop substantially.

I hope that as you make this transition from short to long, you’ll keep an eye on how you decide, and how you know, and keep us up on what you learn.

Thanks,

R

Chris Mountford, May 14th, 2009 at 5:50 pm

This is it. What timeframe are you optimising for? Unfortunately, many organisations only evaluate performance over a short timeframe whereas they succeed or fail based on optimising their long term prospects.

Related to this, I recently wondered whether quality always has a business case. What you seem to say should not be controversial: it’s a cost-benefit tradeoff for an investment intended to pay off within a specific timeframe. But some test-driven people are dogmatic, as you put it, and so I think for them this idea will be controversial.

On a slightly lighter note I’ve just realised unit testing exemplifies quantum behaviour.

Adam Williams, May 14th, 2009 at 6:47 pm

I have found that over ten years of programming, writing tests from the beginning thanks to good rearing, I am getting better at identifying short games. It’s easier to switch games when you’re the only programmer. Like Ron alludes to, a challenge is working with other people who either cannot, do not, or do not yet know how to make that choice. I think testing everything for a good long time is a great way to learn that skill.

Jake Boxer, May 14th, 2009 at 7:25 pm

Great read. The web app I’m currently working on is definitely a “short game” project at the moment, and this article removed the guilt I’ve felt from not providing extremely solid test coverage at this stage. Once I launch (beta), I’ll be more focused on tests. Thanks for the confidence boost.

Johannes Link, May 14th, 2009 at 11:19 pm

I have seen a couple of developers who were able to make reasonable short-term / long-term decisions. I have yet to see a single team, though, let alone an organization.

Michael O'Brien, May 14th, 2009 at 11:50 pm

A great article and the right decision, I think. It’s too easy to get caught up in beauty and consistency when you’re writing code, and forget what you’re writing code *for*. I write tests because it makes writing code easier and gives me confidence the code does what I think it does. If writing a test isn’t going to help me achieve that, I say skip it.

Dave Hoover, May 15th, 2009 at 6:26 am

I think Adam and Johannes make a great point. Short game decisions are simpler on solo projects. (I suppose just about everything is simpler on solo projects.) I have to make these sorts of short/long decisions every week in my development of madmimi.com. Since I usually am the only person developing the core software, I have this freedom. But if I were on a team, the game changes.

Curtis Cooley, May 15th, 2009 at 8:56 am

Elegant as ever. My only concern is that people will say, “See, Kent Beck says I don’t need to test this!” rather than realizing that Kent Beck, through rigor and discipline, has honed his skills to the point where he can make that decision.

Ron got it right, you and about three other people can make those decisions. For the rest of us, I lean towards writing the test while learning the intricacies of the short game.

You have, however, inspired me to get my own pet project out the door and quit futzing with testing every little piece. Thank you.

Renzo, May 15th, 2009 at 10:36 am

I also don’t test in specific cases. When I don’t, it’s because I’m missing the skills to test effectively, that is, in an easy and consolidated way. Take UI view tests in Rails. I know how to spec them out very quickly: mocks, render directives, and so on. Almost a no-brainer. But there are other cases where it takes me a long time to figure out how to mock certain APIs, what methods I should call, and how to organize the spec suite. So my rule became: if it’s taking too long or it’s too complicated, fork a new task to spike and learn about it. Now ship it! It’s brutal, I know, but usually that’s the correct thing to do. Of course the meaning of “too long” is strictly project-dependent. Then I feel ashamed that I was missing a “tool” in my tool set, so that next time I won’t be taken by surprise. I know I need practice.

Anthony Broad-Crawford, May 15th, 2009 at 11:36 am

I still think we should all be asking the question “is this appropriate” for any tool in our software development toolkit. Blindly using all the tools in all situations without ever stopping to ask “is this appropriate” scares me more than “See, Kent Beck says I don’t need to test this!”

Stephen, May 15th, 2009 at 2:16 pm

Funny enough, this post has inspired me to go back and write a few “guilty conscience” tests. :)

Kent Beck, May 15th, 2009 at 7:47 pm

Stephen,

Good to hear. It sounds like you understood my point. Sometimes tests help you validate more concrete ideas with paying customers. Write those tests. Sometimes tests hinder you from validating more concrete ideas with paying customers. Don’t write those tests. And a test may change from the latter to the former over time. When it does, write it.

I am quite certain I am not somehow spectacularly wise in seeing tests in this way. Saying, “Oh, we’re not smart enough to decide so we’ll just write the tests all the time,” is a path to irrelevance. Entrepreneurs who know they need to validate concrete ideas with paying customers will ignore tests altogether, which is a pity. With the right tests they could be validating more ideas.

Elad Sofer, May 16th, 2009 at 3:42 am

Overall, an interesting article. I just hope it is not going to be misunderstood and provide an excuse for lots of programmers who dislike the notion of unit testing.

OlofB, May 16th, 2009 at 7:18 am

(this message was sent to a mailing list first; hope you can live with the 3rd person language)

I think one relevant idea Kent brings up is feedback flow. If we focus on getting that flow-per-unit-time high, we are heading in the right direction.

For example, he mentions short-term untested-features-adding as a maximizer of feedback-flow at the beginning of the JUnit Max project, since the first test was so darn hard to write (it took him over a week). He got a higher feedback-flow by just hacking it together and releasing; his “red tests” were the first few users and their feedback.

I’d like to examine that idea more. For example, it does not mention the feedback delay, only the feedback volume/flow. To me, delay is at least as important as feedback volume.

Antony Marcano, May 17th, 2009 at 10:53 pm

You are at once the product owner (making many decisions on the end-user’s behalf), an end-user, and the developer in this case… So that, combined with your experience, may be part of the reason you are able to make these choices so quickly and instinctively.

I also like the comments about “flow” from OlofB… nice way to abstract the message.

I wish more decisions about what not to test were made with eyes wide open, accepting that you will need to come back and write a test when sustainable flow makes it compelling again… Sadly, I see developers deciding not to write tests because they are under pressure to deliver (and to do so without “wasting time on testing”) by people who don’t understand the consequences and aren’t thinking in terms of anything but the short game… and so legacy code grows and grows…

And then the same people placing this pressure wonder why development grinds to a halt and suddenly takes so long… Because they aren’t thinking about sustainable flow… they are always in the short-game… even though they’ve already been playing for a long, long time.

Or, perhaps getting features out faster sets corporate product-owners’ expectations and it’s hard to change that when we get into the long-game decisions… I try to manage those expectations from the outset but this is fast forgotten when they are using those new features and chomping at the bit for more.

I think this post and the comments will perhaps help many better articulate these choices to customers, making it not so hard for developers to help their customers also play the long-game.

Nice post Kent… generating some interesting and insightful comments.

Kent Beck, May 18th, 2009 at 6:31 am

I agree that the decisions to test or not to test should be made with eyes open. What is becoming clear to me is that a product has a profile like a flight. Taxiing is like exploring ideas (this took ~15 years in the case of JUnit Max). Then you reach the runway and need to get up enough speed (revenue) to get the wheels up. This requires lots of experimentation. Then you have the climb, when the problems are mostly about scaling. Then you have level flight, where cost reduction becomes the priority. Eventually the product starts to descend and you need to start the process over again. Each phase requires some different engineering practices, but one principle unites them: use engineering skill, talent and time to the greatest possible business benefit.

Kent Beck, May 18th, 2009 at 6:32 am

I don’t think I understand the distinction you are making. Are you saying that I need to consider the entire feedback loop, from experiment to data to conclusion, instead of just the time to create the experiment?

Kent Beck, May 18th, 2009 at 6:34 am

I hope so too. I also hope people in startups don’t ignore XP because all those tests seem like they would slow down the feedback cycle. I find that some of the tests I write actually shorten the feedback cycle. I was trying to figure out the difference.

Guilherme Chapiewski, May 18th, 2009 at 6:57 am

There’s another point of view: the point of view of design. I would agree with you if the only benefit of testing were to guarantee that the code works, but the biggest benefit I see is the improvement in code design when I use Test-Driven Development: promoting decoupling, enabling dependency injection, etc. And, as you said, sometimes writing tests makes you think better about what you are trying to do and gives you good ideas.

Besides that, sometimes you think it’s a short game but it’s not. I had an experience with a project of mine where, early on, I repeatedly decided not to write many tests, with the excuse (to myself) that it was just a proof of concept. When the thing worked and people started to use it, they found a few bugs that I couldn’t fix quickly. Besides that, I couldn’t refactor to put tests on it because the code was untestable. It was rotten. In the end I had to throw a lot of things away and write them again from scratch…

I think it wouldn’t have hurt to write tests from the beginning (maybe just 20% more time), but not doing it turned out to be really expensive and time-consuming…

I’m not trying to defeat your arguments, because I have sometimes done what you did and I understand your point of view. But considering these things, don’t you think it’s good to write tests even when it doesn’t look worth it? What’s your opinion?

Kent Beck, May 18th, 2009 at 7:25 am

I agree that confusing the practices and the principles leads to problems. And that tests lead to better designs. That’s why I have ~30 functional tests and ~25 unit tests (odd balance because Eclipse apps are so hard to test). I do almost all of my new feature work acceptance-test-first. It helps reduce the cycle time.

Do you fly a plane the same on the runway, during climb out, and in level flight? The skills and controls are similar but different. Software development is the same. Managing the transitions from one phase to another is a challenge, especially if people cling to their practices.

[...] that it is sometimes possible, even acceptable, not to follow the rules he himself wrote. On his personal blog, Kent wrote (though to many it reads more like a confession) that he had written part of the code [...]

Philip Schwarz, May 25th, 2009 at 2:34 am

When blogging about someone allegedly being a heretic, accuracy is, I would have thought, essential.

Blog post Kent Beck’s Heresy (in Spanish) says that Kent blogged that he wrote part of JUnit without tests.

I replied to the post as follows:

He did not. JUnit was and is developed test-first because JUnit is a long game. It is JUnit Max that is sometimes not developed test-first, because it is a short game.

In Kent’s own words:

“JUnit is a long game–lots of users, stable revenue ($0, alas), bounded scope”

“Working on JUnit, the whole bag of XP practices makes sense. We always test-drive development. ”

“With JUnit Max I am living the short game of software.”

“When I started Max I didn’t have any automated tests for the first month. I did all of my testing manually. ”

“Whether or not to write automated tests requires balancing a range of factors. Even in Max I write a fair number of tests.”

“When working on Max, the question of whether or not to write a test boils down to whether a test helps me validate more experiments per unit time. If it does, I write it.”

Erik Petersen, May 27th, 2009 at 9:14 pm

Great post. The issue that fascinates me is this: as mechanical code-based test generation becomes commonplace, will time-poor coders lose their enthusiasm for test-first development and all the inherent improvements to the design it brings? At the same time, people like Bob Martin are using tools like the CRAP metric to keep design as clean as possible, highlighting where extra tests or refactorings are needed. Interesting times…

Full post with example links
http://www.testingreflections.com/node/view/8108

Dan Mullineux, June 1st, 2009 at 1:43 pm

This is blatant pragmatism and no cause for embarrassment. It is not backtracking so much as a refining of the definition of ‘development’.

We all know there is some code that is in fact analysis. Even as part of a well ‘analysed’, specified, and test scripted story, developers will often (and should) have to try a couple of things; designs, algorithms, patterns, modified patterns etc. Sometimes, or in fact often, whole designs should be fleshed out in this way, especially designs to solve novel problems. Even problems with common patterns can benefit from a bit of playing, skeleton code, demo, discussion etc.; the infamous tracer bullet.

Little of that benefits from testing… Unless, of course, you are trying to decide how testable your design or code choices are; in that case you obviously need to spike some tests.

In the case of the untestable defect (‘untestable’ in that an engineering decision was made that it is too expensive to test now, hence the short game analogy), how do you effectively assess the long game impact? Not easily, but experience shows it ultimately costs more to over-emphasise the short game. However, it is also clear to me that getting the balance wrong the other way also increases costs.

Normally a “difficult to test” fix is either difficult to unit-test or difficult to functional-test, rarely both. Generally defects are just someone finding a missing test case, which should be coded; but if it is truly too expensive to test, then perhaps something more fundamental is wrong with the code.

Dan.

Pete Gosling, June 18th, 2009 at 12:00 pm

“JUnit is a long game–lots of users, stable revenue ($0, alas), bounded scope. We know what JUnit is. ”

I hope people won’t interpret that as a set of criteria for when to use TDD, and that projects that don’t satisfy (all) those criteria don’t need tests.

Many of the managers and agile skeptics I have worked with would focus on that statement and think: “We don’t yet have stable revenue, and our scope is expanding (which we thought agile was supposed to handle), so we’re in the short game until we get into maintenance in a few months’ time. Until then, we can go ahead and crank out features instead of testing.”

This early JUnit Max work sounds more like a spike, in this case a spike used to generate user stories. If it’s a spike, or a screen mockup, or even a paper sketch, shouldn’t it be separate from production code? In that case, some of the spikes would be rejected as unprofitable. Others would be chosen to become user stories in the production system, and tests would be written before copying code from that spike to the production project. I know spikes are normally throw-away, but I guess code from a spike could be copied across to the real project, provided tests are in place first?

Wouldn’t it be easier to keep a clear distinction between spikes (etc) and production code, than between different types of production project?

Maybe the real issue here is how to get a valid set of initial user stories, for a product development project, as much as ‘to test or not to test?’

Pete

[...] at the pace we did and getting launched when we did with the product we did, it seems to me that going more quickly was the right choice to make at the time. Despite the fact that the net amount of programmer time was eventually greater, we exchanged that [...]

[...] topic that stimulated this whole line of thinking for me was the question of automated testing. I blogged about how some tests didn’t meet business needs in the takeoff phase. Much sound-biting [...]

[...] earliest stage is proving the idea. He later asserted that when you’re doing stuff like this, you can drop TDD, temporarily. I thought about this and it clicked. If you’re working on a prototype, where [...]

[...] Kent Beck, author of “Extreme Programming Explained” and “Test Driven Development: By Example”, suggests that a software project, like golf, can be a long game or a short game. JUnit is an example of a long project: lots of users, stable revenue (at $0, sad for anyone involved), where the main goal is to provide functionality ahead of users’ needs. [...]

Johnny 99 « Grumpy Old Programmer, July 3rd, 2009 at 3:21 am

[...] not that code without tests is necessarily bad. I mean, heck, even Kent Beck sometimes flies without a safety net. If it’s write-once-and-throw-away code, then there’s an argument for just getting it [...]

Coding: Quick feedback at Mark Needham, July 20th, 2009 at 3:11 am

[...] Beck recently wrote about the trade off between the amount of time it takes to write an automated test and the feedback it gives y… giving an example when developing JUnit Max of a time when his feedback cycle was quicker by not [...]

Jim, July 29th, 2009 at 11:59 pm

Not testing (at the very least the very meaty parts of) a small hobby app is like playing Russian Roulette with your 4 year old daughter, your cat, and Hitler. The odds are not in your favor.

Kent Beck, July 30th, 2009 at 5:43 am

Vivid metaphor, but I don’t think we disagree. Testing that accelerates feedback is great on the runway. Expensive testing that trades slower feedback for long-term sustainability is wasteful on the runway.

[...] (This post is the second in a trilogy that started with To Test Or Not To Test?) [...]

[...] my earlier discussions of testing and defect fixing, I’ll complete the trilogy by discussing the role of design early in [...]

yurik, February 2nd, 2010 at 11:25 pm

In my honest opinion, unit tests are like sandboxes. You can play a tiny game in a sandbox, then use the same code in your short game, then in your long game, in your terms. But there is no reason to treat the whole universe as a set of sandboxes, even if it is possible. Looks obvious. But from the perspective of a team leader it is really hard to say whether another programmer’s unit tests were necessary or not. Does he really need that many, or is he just spending his time on nothing? Can we trust the code, or do we need more tests?

As for code we have some methods of review, but tests depend on the programmer. The programmer keeps his eyes open, or does not. He may take tiny steps or big steps. OK, I see: something is wrong with this programmer’s tests, they take too much time and I see no benefit from this one or that one; this one is good; this one seems useless again. So I need to do something more than “Hey, look! This is bad.” – “Yep” – “But that is good!” – “Yep”. Pair programming looks like a good solution, but we have only 3 programmers and more than 10 projects (not very valuable, but long).