Whitebait, Kleftiko, Ekmek Special: November 2009

Here’s some rubbish code I wrote:

class Connection: public util::Stream
{
    ...

public:
    class Observer
    {
    public:
 virtual ~Observer() {}

 virtual unsigned OnHttpHeader(const std::string& key,
          const std::string& value) = 0;
 virtual unsigned OnHttpData() = 0;
 virtual void OnHttpDone(unsigned int error_code) = 0;
    };

private:
    explicit Connection(Observer*);
    friend class Client;

public:
    ~Connection();

    unsigned int OnActivity();

    // Being a Stream
    unsigned Read(void *buffer, size_t len, size_t *pread);
    unsigned Write(const void *, size_t, size_t*) { return EINVAL; }
};

typedef boost::intrusive_ptr<Connection> ConnectionPtr;

class Client
{
public:
    Client();

    /** Passing a NULL verb means POST (if !body.empty()) or GET (otherwise).
     */
    ConnectionPtr Connect(util::PollerInterface *poller,
     Connection::Observer *obs,
     const std::string& url,
     const std::string& extra_headers = std::string(),
     const std::string& body = std::string(),
     const char *verb = NULL);
};

It’s meant to be an HTTP client implementation, where the Connect() call returns a smart pointer to a connection object that manages its own lifetime. The Connect() call sets it all off, and attaches it to the PollerInterface object, which calls back into the Connection object (the OnActivity method) whenever the socket becomes writable (or readable, depending on where we are in the http transaction).

I mean, just look at it: it’s obviously completely bogus. [Short pause whilst you just look at it.]

Actually, no. It wasn’t at all obvious to me just from looking at it, that it was completely bogus. The only point, in fact, at which it became obvious that it was completely bogus, was when it started causing bugs.

And the bugs it caused were quite awkward ones: crashes (and Valgrind violations), obviously timing-related, to do somehow with Connection pointers still being used after the object had gone away — which should really have been disallowed by the smart-pointer class.

It turned out that the problem was if the transaction completed too quickly. At the time the code was written, the PollerInterface object didn’t own (smart pointers to) the pollable items hung off it. So, in order to stop the Connection object disappearing, due to the final reference going away, during a call to OnActivity, OnActivity itself creates and holds an additional reference to the Connection object during its execution. But if the transaction got started quickly enough, the first call to OnActivity would happen before the constructor returned — in other words, before the pointer had been assigned to the smart-pointer that’s the result of Connect(). So the “additional” reference held inside OnActivity would be the only reference — and when it went away at the end of the function, there’d be no outstanding references and the object would be deleted.

The effect was as if the constructor had included “delete this” — resulting in the new call in “p = new Connection” returning a pointer that was already dead and dangling before even being assigned to p.

Completely bogus. And worse, a completely bogus design; given the constraints, there was nothing that could possibly be done in the methods of the classes Client and Connection that would correctly implement that interface. The interface itself needed to change. Fortunately, I decided it didn’t need to change much:

class Connection
{
    ...
public:

    /** Start the HTTP transaction.
     *
     * Immediate errors (failure to parse host, failure of connect()
     * call) are returned here; errors happening any later come back
     * through Observer::OnHttpDone. In fact, OnHttpDone may be called
     * before Init() returns, i.e. you need to be ready for OnHttpDone
     * calls before you call Init(). If Init() returns a failure,
     * OnHttpDone has not been called, and is guaranteed not to be
     * called afterwards. Otherwise, it's guaranteed it WILL be
     * called.
     */
    unsigned int Init();
    ...
};

So you get a completely inert ConnectionPtr back from Client::Connect, and only then — once you’ve safely squirreled-away the smart-pointer — do you light the blue touch-paper by calling Connection::Init().

But although this version at least makes it possible to use the functionality correctly, it doesn’t really make it easy — and that should be the real goal of any act of API design. There’s that uneasy division of labour between Connect() and Init() — methods on two different classes — and there’s a whole paragraph of complex object-lifetime issues to read and understand (or, as they’re otherwise known, bugs waiting to happen) for users of Init().

This, and particularly the lifetime of the Connection::Observer object, which usually ends up having to call “delete this” in its OnHttpDone() method, left me with one of those code itches that tells me that I (and, the largely theoretical, other users of this API) am writing more complex and icky client code than I should be.

Neatening this up required making the object-lifetime issues more sane, which in turn involved greater use of smart pointers. (Not quite Wheeler’s Law, because there was already a use of indirection, and the change involved only a strengthening from reference to ownership.) In the next release of Chorale, the PollerInterface has been replaced by a Scheduler, which keeps smart pointers to its pollable items, allowing the HTTP client API to be simplified to this:

class Connection: public util::Stream
{
public:
    virtual ~Connection() {}

    /** Called with data from the returned HTTP body (if transaction
     * succeeds).
     */
    virtual unsigned Write(const void *buffer, size_t len, size_t *pwrote);

    /** Called with each incoming HTTP header (if connection succeeds).
     */
    virtual void OnHeader(const std::string& /*key*/, 
     const std::string& /*value*/) {}

    /** Called with the overall result of the connection/transaction attempt.
     */
    virtual void OnDone(unsigned int error_code) = 0;
};

typedef util::CountedPointer<Connection> ConnectionPtr;

class Client
{
    class Task;

public:
    Client();

    /** Passing a NULL verb means POST (if body != NULL) or GET (otherwise).
     */
    unsigned int Connect(util::Scheduler *poller,
    ConnectionPtr target,
    const std::string& url,
    const std::string& extra_headers = std::string(),
    const std::string& body = std::string(),
    const char *verb = NULL);
};

So the Scheduler owns the (unseen to the library user) connection Task objects, and the connection Task objects own the Connection stream target objects. Connect() can return immediate errors from the socket ::connect call or from URL parsing, while deferring any other errors to come back through a later OnDone callback — all without there being any ambiguity of the lifetime of the streams or their observers.

(There would be a problem if the Connection object also had a smart pointer to the Task object, as then neither would ever be deleted and both would become memory leaks. But, because the data is pushed from the Task to the Connection, the Connection never needs to see the Task object — and indeed can’t, because Tasks live and die entirely inside library code, and users of the library can’t even obtain pointers to them.)

Simplicity does not precede complexity, but follows it

—attributed to Alan Perlis

So here’s the thing. The third design is clearly better than the second. Well, it’s clearly better than the first, too, but mainly because the first doesn’t work, which is a boring and trivial way for one design to be better than another. The interesting thing is that it’s better than the second.

And better it certainly is — making this change halved the size of the code that uses the library, as well as making it more intentional and less fragile and stylised.

So why, having got to the second design, was I not satisfied? Why did I carry on thinking about it, waiting for the inspiration of the third design to strike? And why, having come up with the third design, was there a feeling of happiness that wasn’t present when writing the the second one, even when it passed all the unit tests the first one failed?

The only answer I can come up with is to theorise the existence of an almost aesthetic sense of code quality — which is worrying in a couple of ways. Firstly, because what is instinctive is rarely communicable, and what is not communicable is soon lost: a software engineer’s fate is that of the wheelwright in the old story of Duke Huan.

But worse than that: if code quality is in fact an aesthetic, and thus extrarational, experience, then it raises the prospect that others, even other good software engineers, could have a different sense of aesthetics, ultimately resulting in a point where you and they are pulling the same code in opposite directions. (I heard recently of a software organisation, believers in the currently-fashionable “agile”, “refactor-mercilessly” style of development, in which two otherwise talented engineers spent all their time rewriting each others’ code rather than pushing things forward — as their aesthetic senses, and frankly their assumptions about who was “in charge” in the deliberately un-micro-managed environment, butted heads.)

No aesthete could get away with “correcting” the second design above into the first: the failing unit tests would prevent that. But are there those who would correct the third design into the second, in the opposite direction to me? If so, why? And, even more importantly, if not, why not?

Whitebait, Kleftiko, Ekmek Special

Monday 16 November 2009

This Code Is Completely Bogus

Blog Archive

Labels

About Me

Still me, but elsewhere