Not in fact any relation to the famous large Greek meal of the same name.

Saturday 9 April 2011

The Failure Mode Of Agile Development

Here’s the “traditional” waterfall model of software development:

+------------+------------+------------+
| Design     | Implement  | Test       |
+------------+------------+------------+
^ Start                                ^ Deadline

And here’s its almost inevitable failure mode: design takes too long, or implementation takes too long, but the deadline doesn’t move, so test is cut short and the product ships full of bugs:

+------------>>>+------------>>>+------+.....+
| Design        | Implement     | Test |
+------------>>>+------------>>>+------+.....+
^ Start                                ^ Deadline

Software development organisations have been wearily familiar with this outcome, whether in their own code or other people’s, since at least Brooks (1975). Both the cause and the effect of schedule crunch are widely known and well understood: often the effect is so well known that it can be specifically detected by reviewers and end-users.

Here, by contrast, is the fashionable new agile model:

+------------+------------+------------+
| Red        | Green      | Refactor   |
+------------+------------+------------+
^ Start                                ^ Deadline

In this case, “red” means you write the unit-tests first, but of course they fail (red light), because you haven’t written the code yet. Then you write the code, so they pass (“green” light). But so far what you’ve done is “the simplest thing that could possibly work” – in other words, you’ve been deliberately closely-focussed, overtly short-termist, to get your tests to pass, and a refactoring stage is needed to reintegrate your new functionality with the bigger picture.

Agile, of course, involves many of these cycles between project start and project deadline, not just one. (Indeed, some say that each cycle should be as small as a small tomato. I find that the going rate is two small tomatoes to one cup of tea.) So the agile diagram isn’t drawn to quite the same scale as the waterfall one: still, though, developers acquire the same sense of schedule crunch, they skip on to the next task too soon, and the corresponding failure mode occurs:

+------------>>>+------------>>>+------+.....+
| Red           | Green         | Refac|
+------------>>>+------------>>>+------+.....+
^ Start                                ^ Deadline

The cause is the same, but what’s the effect? The code was complete, and – if not perfect – then at least unit-tested, at the end of the green phase. So the product as actually shipped works, which is more than can be said for its waterfallen equivalent.

All that’s missing, in fact, is some of the refactoring effort. Unfortunately, that’s the only place in agile development where any large-scale design work gets done: the design debt that an agile shop takes out by not doing Big Design Up-Front is paid off only in these refactoring instalments. This means that the effect of schedule crunch on agile projects is that the system ends up under-designed and directionless at every level other than the lowest.

And unlike the paper-bag obviousness of the waterfall model’s failure mode, the agile model’s failure mode is subtle and pernicious. Product 1.0 ships and works – because agile development “releases” every sprint, and thus is perfectly fitted for triaging features against time. But the system is a ball of mud. Feature development on Product 1.5 and Product 2.0 takes longer than expected – which agile development also helps to hide, given its stubborn reluctance to countenance long-term planning – because developers eventually spend all their time battling, not intrinsic problems of the target domain, but incidental problems caused by previous instances of themselves.

Only the most obsessed agiliste would claim that agile development doesn’t have a failure mode. But because agile development is new, its failure mode is unfamiliar to us; and because that failure mode is less visibly catastrophic than Brooks’s, it’s easier to overlook. It is, however, real; its very subtlety requires us to take particular care to look out for it, and to get right on top of fixing it once we see it start to happen.

Conversely, the fact that there are problems that agile development can’t solve isn’t a fatal blow. Inevitably such problems are the most visible ones – because the problems which agile development does solve easily simply come and go without anyone really noticing. And the failure mode of agile development – the system’s complexity spiralling out of control – can be fixed without doing too much damage to the theory.

And how to fix it? Schedule in some serious refactoring, one subsystem at a time. In his paper Checklist for Planning Software System Production, RW Bemer, writing in August 1966 (August 1966!), says:

Is periodic recoding recommended when a routine has lost a clean structural form?

Nearly 45 years later, that’s still effectively the best available advice. Whether you call it refactoring or periodic recoding, it of course takes advantage of all the unit tests that the ball of mud already contains. This time round, it also takes advantage of knowledge about how all the parts of the subsystem operate. That knowledge is unlikely to be acquired in a single two-week sprint, so unless you can put someone on the task who already knows the subsystem inside-out (and most of the time you’re in this state, there won’t even be such a person), you’ll find yourselves breaking some of the rules of agile development by formally or otherwise block-booking someone’s time for a larger period. (Agile development aims at having any team member able to take on any task in any sprint – but for that to be okay, there mustn’t be any software problems complex enough to require more than two weeks’ thought. Some software problems just are that big, and context-switching can break the developers’ train of thought.)

This is the answer to the sometimes-asked question, “If, in agile development, everyone does design [as often as once per small tomato], what’s the rôle of the architect?”. Agile development is, in a way, Christopher Alexander’s observation that most things can be made piecemeal. But simplicity cannot be made piecemeal. The contribution of the software architect is simplicity.

Saturday 22 January 2011

Is “Factory Method” An Anti-Pattern?

Let’s take another look at this version of the “two implementations, one interface” code from that one about portability:

// Event.h v8
class Event {
public:
  virtual ~Event() {}
  virtual void Wake() = 0;
  // ...
};

std::auto_ptr<Event> CreateEvent();

// Event.cpp
std::auto_ptr<Event> CreateEvent()
{
  // ... return whichever derived class of Event is appropriate ...
}

What I didn’t say at the time is that this is sort-of the Factory Method pattern, though a strict following of that pattern would instead have us put the CreateEvent function inside the class as a static member, Event::Create(). And the pattern also includes designs where CreateEvent is a factory method on a different class from Event, but it’s specifically “self-factory methods” such as Event::Create that I’m concerned with here.

(As an aside: the patterns literature comes in for a lot of criticism for being simplistic. Which it is: the GoF book could be a quarter the size if it engaged in less spoon-feeding and pedagoguery. (And by pedagoguery I mean pedagoguery.) But in a sense the simplicity and obviousness of the patterns they’re describing is the whole point: thanks to the patternistas (patternauts?), a lot of simple and obvious things that often come up in programmers’ natural discourse about code, and that didn’t previously have names, now do. Reading a patterns book might not make your code much better unless you’re a n00b, but that’s not what it’s for. It’s for making your discourse about code better. In any n>1 team, being able to discourse well about code is a vital skill.)

But what I also didn’t say at the time is that, whether CreateEvent is a member or not, so long as it’s in event.h this code appears to have cyclic dependencies: that is, to be a violation of the principle of “levelisability” set out in the Lakos book.

What’s going on, as you can see on the left, is that, although the source file dependencies themselves don’t exhibit any cycles, viewing, as Lakos does, each header and its corresponding .cpp file as forming a component — the three grey rectangles — produces a component dependency graph with cycles: win32/event ↔ event ↔ posix/event.

One way around that would be to move CreateEvent out into its own component, a freestanding event factory, as seen on the right. With this change, the design is fairly clearly levelisable at both the file level and the component level. This refactoring is an example of what Lakos (5.2) calls escalation: knowledge of the multifarious nature of events has been kicked upstairs to a higher-level component that, being higher-level, is allowed to know everything. (The file event.cpp now gets a question-mark because, as the implementation file for what may now be a completely abstract class, it might not exist at all; or it might exist and contain the unit tests.)
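A minimal single-file sketch of that escalated layout might look something like the following. The comments mark where each component's files would begin. The names PosixEvent, Win32Event, event_factory and WakeNewEvent, and the IsAwake method, are hypothetical, invented purely for illustration; the concrete classes are stubs rather than real platform code; and std::unique_ptr stands in for the std::auto_ptr of the earlier listing.

```cpp
#include <cassert>
#include <memory>

// --- event.h: the interface component, now purely abstract ---
class Event {
public:
  virtual ~Event() {}
  virtual void Wake() = 0;
  virtual bool IsAwake() const = 0;  // hypothetical, only so the sketch is observable
};

// --- posix/event.h (stub): depends on event.h and nothing else ---
class PosixEvent : public Event {
public:
  PosixEvent() : awake_(false) {}
  void Wake() override { awake_ = true; }
  bool IsAwake() const override { return awake_; }
private:
  bool awake_;
};

// --- win32/event.h (stub): likewise depends only on event.h ---
class Win32Event : public Event {
public:
  Win32Event() : awake_(false) {}
  void Wake() override { awake_ = true; }
  bool IsAwake() const override { return awake_; }
private:
  bool awake_;
};

// --- event_factory.h: the escalated component ---
std::unique_ptr<Event> CreateEvent();

// --- event_factory.cpp: the only place that knows about every kind of event ---
std::unique_ptr<Event> CreateEvent()
{
#ifdef _WIN32
  return std::unique_ptr<Event>(new Win32Event);
#else
  return std::unique_ptr<Event>(new PosixEvent);
#endif
}

// --- client code: needs only event.h and event_factory.h ---
bool WakeNewEvent()
{
  std::unique_ptr<Event> event = CreateEvent();
  assert(event.get() != nullptr);
  event->Wake();
  return event->IsAwake();
}
```

In this arrangement every arrow points downwards: the factory depends on the concrete events, the concrete events depend on the interface, and the interface depends on nothing.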

But is it worth it? We’ve arguably complicated the code — requiring users of events to know also about event factories — for what’s essentially the synthetic goal of levelisability: synthetic in the sense that it won’t be appearing in any user-story. Any subsequent programmer working on the code would be very tempted to fold the factory back into event.h under the banner of the Factory Method pattern.

Moreover, in this case the warning is basically a false positive: if Event is truly an abstract class (give or take its factory method), then the apparent coupling between, say, posix/event and event is not in fact present: posix/event can be unit-tested without linking against either event.o or win32/event.o. (Not that, in this particular example, posix/event and win32/event would both exist for the same target; but factory methods obviously also get used for cases where both potential concrete products exist in the same build.) Though conversely, if Event had any non-abstract methods (any with implementations in event.cpp), then it’d be a true positive not a false positive, as all the different event types would be undesirably link-time coupled together.

One reason that the refactoring is worth it is the same sort of reason that fixing compiler warnings (that is, altering the code so they don’t trigger) is worth it, even in instances when the warning doesn’t point out a bug. If you let warnings proliferate, real ones will get lost in the noise; ideally you aim for the zero-warnings state, so that the introduction of any new warning is an easily-visible alert telling you that there’s a new potential bug to check for. Steve Maguire is talking here about unit tests, but the same applies to compiler warnings: “[W]hen they corner a bug, they grab it by the antennae, drag it to the broadcast studio, and interrupt your regularly-scheduled program”.

Exactly like compiler warnings, cyclic-dependency warnings — which are really design warnings — are sometimes false positives, but likewise it’s worth aiming for the “zero design warnings” state, because it makes new design warnings stand out so. I ran a cycle-checker script (in the spirit of the ones in Lakos) over my own Chorale project, and the result was that it effectively shone a spotlight on all the parts of the code where the design was a bit icky. Every cycle it produced — one was dvb::Service ↔ dvb::Recording, another was all the parts of db::steam depending on each other in a big loop — was a place where I’d at some time thought, “Hmm, this isn’t quite right, but it works for now and I’ll come back and do it properly”. And of course without anything to remind me, I never had gone back and done it properly.
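For a flavour of what such a checker does, here is a minimal sketch. It is not the script that was run over Chorale, nor one of Lakos’s tools: it assumes the component dependency graph has already been extracted (from #include lines, say) into a map, and simply runs a depth-first search looking for a dependency that leads back onto the current path. The names DependencyGraph, FindCycle, Visit, CheckEventExample and event_factory are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Component name -> names of the components it depends on.
typedef std::map<std::string, std::vector<std::string> > DependencyGraph;

enum Colour { WHITE, GREY, BLACK };  // unvisited, on the current path, finished

// Depth-first search from `node`. Returns true, filling in `cycle`, as soon
// as a dependency leads back to a component already on the current path.
static bool Visit(const DependencyGraph& graph, const std::string& node,
                  std::map<std::string, Colour>& colour,
                  std::vector<std::string>& path,
                  std::vector<std::string>& cycle)
{
  colour[node] = GREY;
  path.push_back(node);
  DependencyGraph::const_iterator found = graph.find(node);
  if (found != graph.end()) {
    for (const std::string& dep : found->second) {
      const Colour c = colour[dep];  // entries not yet seen default to WHITE (0)
      if (c == GREY) {
        cycle.assign(std::find(path.begin(), path.end(), dep), path.end());
        return true;
      }
      if (c == WHITE && Visit(graph, dep, colour, path, cycle)) {
        return true;
      }
    }
  }
  path.pop_back();
  colour[node] = BLACK;
  return false;
}

// Returns the components of one dependency cycle, or an empty vector if the
// graph has no cycles (that is, if it is levelisable).
std::vector<std::string> FindCycle(const DependencyGraph& graph)
{
  std::map<std::string, Colour> colour;
  std::vector<std::string> path, cycle;
  for (const auto& entry : graph) {
    if (colour[entry.first] == WHITE
        && Visit(graph, entry.first, colour, path, cycle)) {
      return cycle;
    }
  }
  return std::vector<std::string>();
}

// Usage, on the two component graphs from the event example.
void CheckEventExample()
{
  DependencyGraph before;  // factory still lives in the event component
  before["event"] = {"posix/event", "win32/event"};
  before["posix/event"] = {"event"};
  before["win32/event"] = {"event"};
  assert(FindCycle(before).size() == 2);  // event and posix/event

  DependencyGraph after;  // factory escalated into its own component
  after["event_factory"] = {"event", "posix/event", "win32/event"};
  after["posix/event"] = {"event"};
  after["win32/event"] = {"event"};
  assert(FindCycle(after).empty());
}
```

This reports only the first cycle it meets, which suits the “zero design warnings” approach: fix that one, run it again, and repeat until it comes back empty.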

So it turns out that you can’t have both factory methods and levelisability. You have to pick one or the other. And levelisability is better.

CC0 To the extent possible under law, the author of this work has waived all copyright and related or neighboring rights to this work.