Not in fact any relation to the famous large Greek meal of the same name.

Saturday 9 April 2011

The Failure Mode Of Agile Development

Here’s the “traditional” waterfall model of software development:

+------------+------------+------------+
| Design     | Implement  | Test       |
+------------+------------+------------+
^ Start                                ^ Deadline

And here’s its almost inevitable failure mode: design takes too long, or implementation takes too long, but the deadline doesn’t move, so test is cut short and the product ships full of bugs:

+------------>>>+------------>>>+------+.....+
| Design        | Implement     | Test |
+------------>>>+------------>>>+------+.....+
^ Start                                ^ Deadline

Software development organisations have been wearily familiar with this outcome, whether in their own code or other people’s, since at least Brooks (1975). Both the cause and the effect of schedule crunch are widely known and well understood: often the effect is so well known that reviewers and end-users can recognise it specifically.
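The arithmetic of that failure mode is simple enough to write down. Here is a minimal sketch in Python, assuming durations in whole days and a deadline equal to the sum of the planned phases; the function name `fixed_deadline_schedule` is purely illustrative. Earlier phases run as long as they actually take, and the final phase gets whatever time is left, never more than was planned for it and never less than nothing.

```python
from typing import List, Sequence


def fixed_deadline_schedule(planned: Sequence[int], overruns: Sequence[int]) -> List[int]:
    """Actual phase lengths (in days) when the deadline cannot move.

    planned:  planned length of each phase, in order, e.g. design, implement, test.
    overruns: extra days taken by every phase except the last.
    The deadline is assumed to be the sum of the planned lengths.
    """
    if len(overruns) != len(planned) - 1:
        raise ValueError("need one overrun per phase except the last")
    deadline = sum(planned)
    *early, last = planned
    # Earlier phases finish however long they take.
    actual = [p + o for p, o in zip(early, overruns)]
    # The last phase gets the time left before the deadline, clamped
    # to the range [0, its planned length].
    left = deadline - sum(actual)
    actual.append(max(0, min(last, left)))
    return actual


# Design and implementation overran by 8 and 10 days; test is squeezed.
print(fixed_deadline_schedule([30, 30, 30], [8, 10]))  # [38, 40, 12]
```

Under this toy model the same function describes the agile diagram further down, with red, green and refactor in place of design, implement and test; only the name of the phase that gets squeezed changes.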

Here, by contrast, is the fashionable new agile model:

+------------+------------+------------+
| Red        | Green      | Refactor   |
+------------+------------+------------+
^ Start                                ^ Deadline

In this case, “red” means you write the unit tests first, but of course they fail (red light), because you haven’t written the code yet. Then you write the code, so they pass (“green” light). But so far what you’ve done is “the simplest thing that could possibly work” – in other words, you’ve been deliberately narrowly focussed, overtly short-termist, to get your tests to pass, and a refactoring stage is needed to reintegrate your new functionality with the bigger picture.
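To make the three phases concrete, here is a toy sketch in Python using the standard `unittest` module. The scenario is hypothetical: `format_price` stands in for existing code elsewhere in the system, and `format_discounted` is the new feature. The green version passes by copying the formatting logic; the refactored version passes the same test by reusing the existing function, which is a very small-scale version of reintegrating with the bigger picture.

```python
import unittest


def format_price(pence: int) -> str:
    """Existing code elsewhere in the system: render a price held in pence."""
    return f"£{pence // 100}.{pence % 100:02d}"


# Red: the test is written first. Until the functions below exist,
# running it fails with a NameError.
class DiscountTest(unittest.TestCase):
    def test_discounted_price(self):
        # Both versions are kept here only to show that each one passes.
        for impl in (format_discounted_green, format_discounted):
            with self.subTest(impl=impl.__name__):
                self.assertEqual(impl(1000, 25), "£7.50")
                self.assertEqual(impl(999, 10), "£8.99")


# Green: the simplest thing that could possibly work. It passes,
# but it duplicates the formatting logic from format_price.
def format_discounted_green(pence: int, percent: int) -> str:
    discounted = pence * (100 - percent) // 100
    return f"£{discounted // 100}.{discounted % 100:02d}"


# Refactor: same behaviour, but the new feature now reuses the existing
# formatter, so any later change to price formatting happens in one place.
def format_discounted(pence: int, percent: int) -> str:
    return format_price(pence * (100 - percent) // 100)
```

Saved as, say, `test_discount.py`, it runs with `python -m unittest test_discount`. Both versions pass the same test; only the refactored one leaves the codebase any simpler.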

Agile, of course, involves many of these cycles between project start and project deadline, not just one. (Indeed, some say that each cycle should be as small as a small tomato. I find that the going rate is two small tomatoes to one cup of tea.) So the agile diagram isn’t drawn to quite the same scale as the waterfall one: still, though, developers acquire the same sense of schedule crunch, they skip on to the next task too soon, and the corresponding failure mode occurs:

+------------>>>+------------>>>+------+.....+
| Red           | Green         | Refac|
+------------>>>+------------>>>+------+.....+
^ Start                                ^ Deadline

The cause is the same, but what’s the effect? The code was complete, and – if not perfect – then at least unit-tested, at the end of the green phase. So the product as actually shipped works, which is more than can be said for its waterfallen equivalent.

All that’s missing, in fact, is some of the refactoring effort. Unfortunately, that’s the only place in agile development where any large-scale design work gets done: the design debt that an agile shop takes out by not doing Big Design Up-Front is paid off only in these refactoring instalments. This means that the real effect of schedule crunch on agile projects is that the system ends up under-designed and directionless at every level except the lowest.
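How that debt accumulates can be sketched as a toy model. Nothing here is measured: the units (call them hours of refactoring still needed) and the function `design_debt_history` are hypothetical. Each cycle’s green phase takes out some design debt, each refactor phase repays as much as its time allows, and whatever is not repaid carries over to the next cycle.

```python
from typing import List


def design_debt_history(cycles: int, added_per_cycle: int, repaid_per_cycle: int) -> List[int]:
    """Outstanding design debt after each cycle of a toy agile project.

    Units are hypothetical (say, hours of refactoring still needed).
    """
    debt, history = 0, []
    for _ in range(cycles):
        debt += added_per_cycle                 # green: the simplest thing that works
        debt = max(0, debt - repaid_per_cycle)  # refactor: repay what time allows
        history.append(debt)
    return history


print(design_debt_history(5, 10, 10))  # [0, 0, 0, 0, 0]  refactoring keeps up
print(design_debt_history(5, 10, 8))   # [2, 4, 6, 8, 10] schedule crunch
```

A shortfall of a fifth per cycle looks harmless in any single sprint, which is the point: the debt grows steadily while every individual release still works.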

And unlike the paper-bag obviousness of the waterfall model’s failure mode, the agile model’s failure mode is subtle and pernicious. Product 1.0 ships and works – because agile development “releases” every sprint, and thus is perfectly fitted for triaging features against time. But the system is a ball of mud. Feature development on Product 1.5 and Product 2.0 takes longer than expected – which agile development also helps to hide, given its stubborn reluctance to countenance long-term planning – because developers eventually spend all their time battling, not intrinsic problems of the target domain, but incidental problems caused by previous instances of themselves.

Only the most obsessed agiliste would claim that agile development doesn’t have a failure mode. But because agile development is new, its failure mode is unfamiliar to us; and because that failure mode is less visibly catastrophic than Brooks’s, it’s easier to overlook. It is, however, real; its very subtlety means we must take particular care to look out for it, and get right on top of fixing it once we see it start to happen.

Conversely, the fact that there are problems agile development can’t solve isn’t a fatal blow. Inevitably such problems are the most visible ones – because all the problems which agile development does solve easily come and go without anyone really noticing. And the failure mode of agile development – the system’s complexity spiralling out of control – can be fixed without doing too much damage to the theory.

And how to fix it? Schedule in some serious refactoring, one subsystem at a time. In his paper “Checklist for Planning Software System Production”, R. W. Bemer, writing in August 1966 (August 1966!), says:

Is periodic recoding recommended when a routine has lost a clean structural form?

Nearly 45 years later, that’s still effectively the best available advice. Whether you call it refactoring or periodic recoding, it of course takes advantage of all the unit tests that the ball of mud already contains. This time round, it also takes advantage of knowledge about how all the parts of the subsystem operate. That knowledge is unlikely to be acquired in a single two-week sprint, so unless you can put someone on the task who already knows the subsystem inside out (and most of the time, once a subsystem is in this state, there won’t even be such a person), you’ll find yourselves breaking some of the rules of agile development by formally or otherwise block-booking someone’s time for a longer period. (Agile development aims at having any team member able to take on any task in any sprint – but for that to be okay, there mustn’t be any software problems complex enough to require more than two weeks’ thought. Some software problems just are that big, and context-switching can break the developers’ train of thought.)

This is the answer to the sometimes-asked question, “If, in agile development, everyone does design [as often as once per small tomato], what’s the rôle of the architect?” Agile development is, in a way, Christopher Alexander’s observation that most things can be made piecemeal. But simplicity cannot be made piecemeal. The contribution of the software architect is simplicity.

CC0 To the extent possible under law, the author of this work has waived all copyright and related or neighboring rights to this work.