Whitebait, Kleftiko, Ekmek Special

Not in fact any relation to the famous large Greek meal of the same name.

Tuesday, 4 December 2018

A rebasing vendor-branch workflow for Git

An earlier post laid out a neat way of managing third-party code, in a subdirectory of a larger Git repository, as a vendor branch. Here at Electric Imp we followed that scheme for the subsequent five years, and bitter experience over that time has now convinced us that it doesn’t work well in all circumstances, and a better plan is needed.

The issue is that essentially all local changes to the repository accrete over time into one massive merge commit. If we’re talking about a sizeable amount of code, and if we’ve made lots of changes to it over time, and if upstream have also made lots of changes to it over time, then re-merging that single merge commit soon becomes horrendous. Basically, the original bow-shaped vendor branch workflow does not scale.

So what to do instead? We need to deconstruct that overwhelming single merge commit; we need, ideally, to move from a merging workflow to a rebasing workflow.

The broad outline of the situation is as follows: some time ago we merged an upstream version of some vendor code, but we’ve been hacking on it since. Meanwhile, upstream has produced a newer version whose updates we’d like to take – while keeping our changes, where relevant.

To achieve this, we’re first going to create a new branch, re-import a stock copy of the current upstream version, and then re-apply our changes, one by one, so that the tip of that branch exactly matches the tip of master.

So far, this sounds like a lot of work to achieve literally nothing! But what we now have is a branch containing just our changes, in a sequence of individual commits. In other words, it’s exactly what’s needed in order to rebase those commits on top of a newer upstream release. So, starting from our re-import, we create a second new branch, then import a stock copy of the new upstream (so that there’s a single commit containing all upstream’s changes between the two), and then rebase our re-apply branch on top of that.

The upstream source in the example below is the same WICED SDK as used in the previous blog post; note that although the whole WICED project was sold by Broadcom to new owners Cypress in the meantime, many references to “Broadcom” still remain in our code, notably in the directory names. This ticks all three boxes for needing the heavyweight integration process: it’s large – 7,000 source files – we’ve made lots of changes, and upstream have also made lots of changes.

Here’s exactly what to do. First we need to know which commits, since the last time we took an upstream version, changed the vendor directory – here, thirdparty/broadcom. Fortunately, git log is able to tell us exactly that. We need to go back in history (perhaps using gitk’s “find commits touching paths” option) and find the commit where we originally took our current upstream version. In the repo I’m using, that’s ac64f159.

The following command logs all commits that have changed the thirdparty/broadcom directory since then:

git log --format=%H --reverse --ancestry-path ac64f159..master -- thirdparty/broadcom

For me, that lists 198 commits! Because of the --reverse, they’re listed in forwards chronological order: in other words, in the order in which we need to re-apply them on our re-merge branch. Let’s put that list of commits in a file, commits.txt:

git log --format=%H --reverse --ancestry-path ac64f159..master -- thirdparty/broadcom > commits.txt

Now we start a new branch:

git checkout -b pdh-wiced-remerge

...and re-import the current upstream version, here WICED-3.5.2:

rm -rf thirdparty/broadcom
cd thirdparty
7zr x WICED-SDK-3.5.2.7z
cd ..
mv thirdparty/WICED-SDK-3.5.2 thirdparty/broadcom
(clean up the source as necessary, fix CR/LFs etc.)
git add -A thirdparty/broadcom
git commit -m"Import stock WICED 3.5.2"
git tag pdh-wiced-3.5.2-stock

Notice that we tag this commit, as we’ll be referring to it again later.

Now we’d like to replay our long list of local commits on top of that release – to get back, as it were, to where we started. The thing to note here is that we can do this in a completely automated way – there should be no chance of merge conflicts, as we’re re-applying the same commits on top of the same upstream drop. It’s so very automated that we can do each one using a script, which I called apply-just-broadcom:

#! /bin/bash
git checkout $1 -- thirdparty/broadcom
git add -u thirdparty/broadcom
git commit -C $1

This says: first, check out (or, really, apply) the commit named in the script’s first argument – but only where it affects the thirdparty/broadcom directory. Any other parts of the commit aren’t applied. That automatically adds any new files, but it doesn’t delete anything deleted by the commit we’re applying – so we pick up the deletions using git add -u. Finally we want to commit those changes to our new branch, but using the commit message they originally had – for which git commit’s -C option is exactly what we want.

Armed with that script, we can then apply, one-by-one, all the commits we identified before:

git checkout pdh-wiced-remerge
for i in `cat commits.txt` ; do scripts/apply-just-broadcom $i ; done

This will churn through the commits, one by one, re-applying them to the pdh-wiced-remerge branch. Because they’re all getting applied in the same context in which they were originally committed, they should all go in one-after-the-other with no conflicts or warnings. (If they don’t, perhaps your re-import of the old upstream didn’t quite match the original import, so fix that and start again.)

And now we should be, quite circuitously, back where we started, with the tip of the pdh-wiced-remerge branch matching master exactly:

git diff pdh-wiced-remerge master

...which should show no differences at all. In gitk, you’d see master with the pdh-wiced-remerge branch coming off it: the branch contains the (re-)import commit, then all the work that’s been done since, with the branch tip about 150 commits further up.

Optionally but usefully, you can now tidy up the branch to make it easier to apply to the new release. For instance, if the branch contains patches that were later reverted, you can use interactive rebase to remove both the patch and the revert, for the same result but avoiding any chance of the patch later causing conflicts. Doing this should still leave no diff between your branch and master.
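
For instance (standard git, nothing exotic):

git rebase -i pdh-wiced-3.5.2-stock pdh-wiced-remerge

...brings up the todo-list of the whole branch for editing without moving it; deleting both the “pick” line for a patch and the “pick” line for its revert drops the pair.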

Even more optionally, but still usefully, I needed at this stage to rewrite a bunch of the commit messages. Some of the commit summaries contain numbers in square brackets: instructions to our git server to link the commits with the related user-story (or bug) in Pivotal Tracker. (Not all of them have one, because some of the original commits were themselves on bow-shaped branches, and only the merge commit carried the Pivotal link.) I wanted to remove those links, so that pushing my branch didn’t spam every Pivotal story associated with a thirdparty/broadcom commit, some of which were years old by this stage. But there are tons of them, so I wanted to rewrite the messages programmatically. It’s definitely out-of-scope for this blog post, but I ended up using the powerful and not-to-be-trifled-with git filter-branch command, in conjunction with a tiny sed script:

git filter-branch --msg-filter "sed -e 's/\[\#\([0-9]*\)\]/(\1)/g'" master..pdh-wiced-remerge

This monstrosity rewrites every commit message between master and pdh-wiced-remerge, replacing square-bracket Pivotal links with curved-bracket “links” that don’t trigger the automatic Pivotal integration.

Anyway, that aside, we’re now in a position to merge the new WICED release, upstream version 3.6.3. So let’s tag where we are now, so we can refer to it later:

git checkout pdh-wiced-remerge
git tag pdh-wiced-3.5.2-merged

We want to branch our WICED-3.6.3 branch from the stock 3.5.2 re-import, so that there’s a diff that just shows upstream’s changes between 3.5.2 and 3.6.3. So that’s:

git checkout pdh-wiced-3.5.2-stock
git checkout -b pdh-wiced-3.6.3

And now we can do the same dance as before, removing the old thirdparty/broadcom and replacing it with the new one:

rm -rf thirdparty/broadcom
cd thirdparty
7zr x WICED-SDK-3.6.3.7z
cd ..
mv thirdparty/WICED-SDK-3.6.3 thirdparty/broadcom
(clean up the source as necessary, fix CR/LFs etc.)
git add -A thirdparty/broadcom
git commit -m"Import stock WICED 3.6.3"
git tag pdh-wiced-3.6.3-stock

Again, because this is a complete replacement, there’s no chance of merge conflicts.

Now just the rebasing operation itself remains. Because we tagged all the important points, we can use those tag names in the rebase command:

git rebase --onto pdh-wiced-3.6.3-stock pdh-wiced-3.5.2-stock pdh-wiced-remerge

This will be a lengthy rebasing operation, and merge conflicts are likely to occur. These can be fixed “in the usual way” – this isn’t a process usable by people who aren’t already experienced in fixing rebase conflicts, so I won’t say much more about the fixing here. But note that the rebase operation always tells you which commit it got stuck on – so you can go and look at the original version of that commit to check what its intention was, and also at the pdh-wiced-3.6.3-stock commit, containing exactly upstream’s changes between 3.5.2 and 3.6.3, in the hope of gleaning some insight into what upstream’s intention was.
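
For instance (REBASE_HEAD is set by reasonably recent versions of git while a rebase is stopped; the path here stands in for whichever file is in conflict):

git show REBASE_HEAD
git diff pdh-wiced-3.5.2-stock pdh-wiced-3.6.3-stock -- thirdparty/broadcom/the/conflicting/file

The first shows our commit as originally made; the second shows upstream’s side of the story.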

If you’ve previously been using our original vendor-branch scheme, the first of those commits will be the worst trouble to integrate, as it will be the single big-bang merge commit from the previous integration. But at least you can console yourself now that future integrations will be no harder than this – as opposed to the previous scheme where they just kept getting harder.

Once you’ve completed the rebase, notice that the rebase operation has “moved” the branch pdh-wiced-remerge so that it now sits on top of the 3.6.3 import; it’s no longer where the pdh-wiced-3.5.2-merged tag points. (Rebasing always moves the branch that you’re rebasing; it never moves any tags.)

Now you get to build and test the pdh-wiced-remerge branch; it’s likely that it currently doesn’t even compile, and here is where you make any major changes needed to deal with the new drop. (Minor, textual changes may have already been fixed during the rebase process.) Add any new commits as necessary, on the pdh-wiced-remerge branch, until everything builds, runs, and passes all its tests. This may or may not be particularly arduous, depending on the nature of the changes made upstream. (But either way it’s still less arduous than the same thing would have been with the merging workflow.)

Now all that remains is to merge the pdh-wiced-remerge branch back to master as a normal bow-shaped branch:

git checkout master
git merge --no-ff pdh-wiced-remerge

And you should end up with a bow-shaped branch containing, in order: the re-imported 3.5.2; the newly-imported 3.6.3; the rebased versions of all your previous fixes; and any new fixes occasioned by 3.6.3 itself.

Notice that the replayed fixes leading up to the tag pdh-wiced-3.5.2-merged don’t appear in the branch as finally merged, but that the stock WICED 3.5.2 commit does. This is probably what you want: it’s important to have a commit that denotes only the changes made upstream between releases, but it’s not very important to record the replayed fixes – after all, those commits were generated by a script to start with, so the information in them can’t be that significant.

So now 3.6.3 is merged. And when the next upstream release comes along, you get to repeat the whole process.

But what if several upstream releases have happened since you last took one? There are two options: either you do the above process once, taking the latest release – or, you do it repeatedly, once for each intermediate release. The latter is the best option if widespread changes have occurred, as it lets you fix each conflict separately, rather than all in one go. And in fact if you know you’ve got several upstream releases to merge, you don’t have to bother merging back to master each time (especially if you too have 198 commits on your remerge branch): you can keep branching and rebasing, and then merge the final result to master in one go (sketched in command form below). The branch, as finally merged, would contain, in order:

  • stock 3.5.2
  • stock 3.6.3
  • stock 3.7.0
  • ... further upstream releases ...
  • fixes from 3.5.2 and earlier
  • fixes from 3.6.3
  • fixes from 3.7.0
  • ... further sets of fixes ...

All of which means that the tip of that branch would be the latest upstream version, with all relevant fixes applied – which is exactly what needs merging to master.
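
In command form, each further release is just another import-and-rebase on top of the last – sketched here for 3.7.0 following on from the 3.6.3 example above:

git checkout pdh-wiced-3.6.3-stock
git checkout -b pdh-wiced-3.7.0
rm -rf thirdparty/broadcom
(unpack stock WICED 3.7.0 as thirdparty/broadcom, clean up as before)
git add -A thirdparty/broadcom
git commit -m"Import stock WICED 3.7.0"
git tag pdh-wiced-3.7.0-stock
git rebase --onto pdh-wiced-3.7.0-stock pdh-wiced-3.6.3-stock pdh-wiced-remerge

...and only once the last release is in do you merge --no-ff back to master.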

Friday, 14 September 2018

Enable and disable methods are an anti-pattern

Here's a C++ API for an abstracted I2C master peripheral, such as might be found on a microcontroller – an API which I invented:

class I2C
{
public:
    virtual ~I2C() {}

    virtual void setClockSpeed(ClockSpeed speed) = 0;
    virtual void enable() = 0;
    virtual void disable() = 0;
    virtual ssize_t read(uint8_t target, const uint8_t *subAddress,
                         size_t subAddressBytes, uint8_t *buffer,
                         size_t bufferBytes) = 0;
    virtual ssize_t write(uint8_t target, const uint8_t *message,
                          size_t messageLength) = 0;
};

The details of how an I2C bus actually works aren't so relevant to the current issue, so I've removed the comments. What's interesting here is what the calling code needs to look like: the caller has to set the desired clock-speed, enable the peripheral, read and/or write using it, and then disable it again.
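
Typical calling code, then, looks something like this (a sketch: the clock-speed constant and device addresses are invented for the illustration, and error-handling is elided):

void readSensor(I2C *i2c)
{
    uint8_t reg = 0x0F;                      // which register to read (made-up)
    uint8_t value;

    i2c->setClockSpeed(CLOCK_400KHZ);        // hypothetical ClockSpeed value
    i2c->enable();
    i2c->read(0x3C, &reg, 1, &value, 1);     // made-up target address
    i2c->disable();                          // easy to forget, or to call twice
}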

This code has been happily chugging away inside Electric Imp devices of various types for about seven years now. Which is why it comes as such a surprise to me that I now think it's terrible.

The realisation came after spending some time learning the Rust language, whereby I've been trained into caring a lot about object lifetimes – because the compiler actually rejects code containing use-after-free errors and whole swathes of further bugs of that sort. I've seen bugs, diagnosed via telemetry reports from crashing units in production, which would have been caught as compile-time errors in Rust. So Rust is great, once you've got your head round it; but in fact it's so great that I now find myself writing better C++ because of the experience of writing some Rust.

From an object-lifetime point-of-view, the API listed above is error-prone and so undesirable. It's too easy for client code to get things wrong: calling read() before you've called enable(), calling read() after you've called disable(), or calling setClockSpeed() – er, wait, when should you call setClockSpeed()? Is it only allowed before enable? Or is it only allowed afterwards? Or can it be in either position? What guarantees on that point are offered by implementations of this interface?

An act of API design should be an act of humility; you should, to a certain extent, be aiming to make other people's lives easier at the expense of making your own life harder. I failed to do that here. Offering a read() method that just returns ERR_NOT_ENABLED if you haven't called enable() yet is just leading callers down the garden path. That ordering should be expressed in the API, so that, in the Rust style, it's a compile-time error to get it wrong (or alternatively, in another situation familiar to Rust programmers, so that the error simply isn't expressible in the first place).

And how can that be arranged? By not giving the caller access to the read() method until they've called the enable() method. What if we remove the read() and write() methods from class I2C altogether, and put them in a "handle" class which callers can't get hold of except by enabling the peripheral? This ends up looking like the API below:

class EnabledI2C
{
public:
    virtual ~EnabledI2C() {}

    virtual ssize_t read(uint8_t target, const uint8_t *subAddress,
                         size_t subAddressBytes, uint8_t *buffer,
                         size_t bufferBytes) = 0;
    virtual ssize_t write(uint8_t target, const uint8_t *message,
                          size_t messageLength) = 0;
};

class I2C
{
public:
    virtual ~I2C() {}

    virtual std::unique_ptr<EnabledI2C> open(ClockSpeed cs) = 0;
};

The disable() method has disappeared altogether; when the calling code has finished with its I2C controller, it can just reset or destroy the unique_ptr, and the destructor of EnabledI2C (or rather, the destructors of its subclasses) can do the work of disable(). Likewise, the work of enable() can be done in the constructor, bringing some RAII to the situation and making the implementations, as well as the calling code, less error-prone.

Without a method called "disable", "enable" looked like the wrong name, so it became "open". And this refactoring has also cleared up the ambiguity around setClockSpeed(); in fact, one implementation of this I2C interface allowed it to be called at any time, even in-between reads, but in another implementation it only took effect at enable() time. In practice, the calling code all expected it to work in the latter way, so we can codify that directly into the new API by incorporating it into the "enable" ("open") call itself, so that it's impossible to call it too early or too late.
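
The calling code correspondingly shrinks (same invented addresses and clock-speed constant as the earlier sketch):

void readSensor(I2C *i2c)
{
    uint8_t reg = 0x0F;
    uint8_t value;

    std::unique_ptr<EnabledI2C> bus = i2c->open(CLOCK_400KHZ);
    bus->read(0x3C, &reg, 1, &value, 1);
}   // bus's destructor does the work of disable()

There's now no way to read without having opened, and no way to forget to disable.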

And again but in C

C++'s support for RAII definitely adds both neatness and robustness to the code here. But part of the principle – that where API calls must be ordered or nested in order to work properly, that ordering or nesting should be enforced by the code rather than meekly whined-about by its documentation – applies even in a C version of the same API:

typedef struct I2C I2C;               /* opaque */
typedef struct EnabledI2C EnabledI2C; /* opaque */

EnabledI2C *I2C_OpenEnabled(I2C *i2c, unsigned clockSpeed);

ssize_t EnabledI2C_Read(EnabledI2C *ei2c, uint8_t target,
                        const uint8_t *subAddress,
                        size_t subAddressBytes, uint8_t *buffer,
                        size_t bufferBytes);
ssize_t EnabledI2C_Write(EnabledI2C *ei2c, uint8_t target,
                         const uint8_t *message, size_t messageLength);
void EnabledI2C_Close(EnabledI2C **pei2c);

Without RAII, the "close" corresponding to the "open" must be explicit, but at least it's still not possible for the client to call things in the wrong order.
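
A sketch of the calling code (again with made-up addresses):

void readSensor(I2C *i2c)
{
    uint8_t reg = 0x0F, value;

    EnabledI2C *bus = I2C_OpenEnabled(i2c, 400000);
    EnabledI2C_Read(bus, 0x3C, &reg, 1, &value, 1);
    EnabledI2C_Close(&bus);   /* also nulls-out bus, guarding against use-after-close */
}

(Taking EnabledI2C** rather than EnabledI2C* is what lets the close call null-out the caller's handle.)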

And again but in Squirrel

This part is a bit of a cheat; the Electric Imp Squirrel API isn't actually implemented in Squirrel; rather, it's a set of bindings from Squirrel to the C++ code in the improm. But even so the same pattern can apply; here's (the native-Squirrel equivalent of) what it currently looks like:

class I2C {
    function configure(clockSpeed) { ... }
    function read(deviceAddress, registerAddress, numberOfBytes) { ... }
    function write(deviceAddress, registerPlusData) { ... }
    function disable() { ... }
};

This API is used by our customers left, right, and centre, and we can't ever change it. (We could extend it, perhaps.) But if we had our time over again, I'd have done it like this:

class I2C {
    function open(clockSpeed) { ... return EnabledI2C(...); }
};

class EnabledI2C {
    function read(deviceAddress, registerAddress, numberOfBytes) { ... }
    function write(deviceAddress, registerPlusData) { ... }
};

As in C++, the disable() call has disappeared, because it's all done in the destructor of the EnabledI2C object, which is called when the Squirrel reference-counting system determines that the object is no longer being used. The calling code might look like this:

local e = hardware.i2cAB.open(100000);
e.read(...);
e.write(...);
e = null;

Alternatively, it could just rely on e going out of scope at the end of the function, instead of explicitly setting it to null. But the cheating here has intensified: native Squirrel does not include destructors, and so doesn't allow for RAII. However, Squirrel objects that are bindings to other languages – such as the C++ bindings used by the Electric Imp API – can have destructors using Squirrel's sq_setreleasehook mechanism, and so can be used with RAII.
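
As a sketch of the shape of such a hook (sq_setreleasehook and SQRELEASEHOOK are the real Squirrel C API; everything else here – the name, the ownership arrangement – is elided or invented):

// Called by Squirrel's reference-counting when the object becomes unreachable
SQInteger ei2c_release_hook(SQUserPointer p, SQInteger size)
{
    delete static_cast<EnabledI2C*>(p);   // does the work of disable()
    return 1;
}

// ...and, when creating the Squirrel-side object wrapping an EnabledI2C*:
// sq_setreleasehook(vm, -1, ei2c_release_hook);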

Wednesday, 13 April 2016

How to create an SSH-only shell account in Ubuntu

Because I keep forgetting.

$ sudo adduser --gecos 'Fred Smith' --disabled-password fred
$ cd /home/fred
$ sudo mkdir .ssh
$ sudo cp ~/fred.public.key .ssh/authorized_keys
$ sudo chmod 0700 .ssh
$ sudo chown -R fred:fred .ssh

If Fred's public key starts "BEGIN SSH2 PUBLIC KEY" then turn it into the one-line form first.
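
For example, with ssh-keygen (its -i option imports an RFC 4716 key and prints the OpenSSH one-line form):

$ ssh-keygen -i -f fred.public.key > fred.openssh.key

...and then copy fred.openssh.key into place as above.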

Tuesday, 25 August 2015

C++: Deduced templates are smarter than explicit templates

Suppose you’re writing a C++ class that encapsulates the idea of a callback, a routine that’s invoked when an asynchronous operation completes. (For instance, such a thing would be useful when implementing promises.) Effectively what you want is a lightweight version of std::function, perhaps a bit like this:

template <typename RET, typename ...ARGS>
class Callback
{
    typedef RET (*pfn)(void*, ARGS...);

    void *m_ptr;
    pfn m_pfn;

public:
    Callback() :  m_ptr(NULL), m_pfn(NULL) {}
    Callback(void *ptr, pfn ppfn) : m_ptr(ptr), m_pfn(ppfn) {}

    RET operator()(ARGS... args) const { return (*m_pfn)(m_ptr, args...); }
};

And here’s what a corresponding callback-using interface might look like:

typedef Callback<unsigned int, unsigned int, unsigned int, const std::string&> BDC;

class CDA
{
public:
    unsigned browse(uint32_t starting_index, BDC callback)
    {
        return callback(2, starting_index+1, "hello");
    }
};

All that, of course, uses C++11 variadic templates, in order to encompass callbacks with zero, one or several arguments. But in fact the technique described here works equally well with good old C++98 (where you’d need separate callback classes for each different number of arguments).

Now the client – the code supplying the callback – is itself probably a C++ class, in which case we don’t want to pass a bare C function pointer into Callback’s constructor. What we want is the ability to pass a method of our object as a Callback – or, in other words, to bind a method and a class instance together so that they function as a Callback.

So how do you write this bind() function? What you’d like to write is something like this (where the final template parameter is not a type, but the pointer-to-member itself, thus ensuring that a different Callback is instantiated for each different method called):

template <typename T, typename RET, typename ...ARGS, RET (T::*FN)(ARGS...)>
Callback<RET, ARGS...> bind(T *t)
{
    //...
}

That would probably work, but would look a mess at the call site, because you’d have to explicitly state all the template parameters:

auto c = bind<unsigned, unsigned, unsigned, const std::string&, &Foo::onBrowseDone>(this);

Really you’d like all those other types to be worked out from the pointer-to-member, whose own type does after all incorporate all the information:

auto c = bind<&Foo::onBrowseDone>(this);

Sadly you can’t do that, because you can’t parameterise bind() on just the pointer-to-member – because you can’t even name the necessary pointer-to-member type without using the names of other types, which aren’t defined yet:

template <RET (T::*FN)(ARGS...)>  // doesn't compile
Callback<RET,ARGS...> bind(T *t)
{
    //...
}

Such are promises: all lies and jest.

But wait: it turns out that there are template invocations in C++ (both ’98 and ’11) that are more powerful than these explicit invocations. These are the deduced templates, and they happen in two contexts: when a call is made to an overloaded function, and the parameter types are used to pick which overload to use; and when a partially-specialised class or function is used, and a particular specialisation must be chosen. The first of those doesn’t help us much, but we can work with the second.

If we declare a class Binder, parameterised on any type, but then specialise it for the generic type of pointers-to-member, then the C++ compiler is required to use what is essentially a full functional-programmer’s unification algorithm to match the caller’s type against the list of specialisations. This unification is much like what you see in ML’s pattern-matching, or Erlang’s “=” operator. Truly devious people use it for things such as SFINAE and template meta-programming, but here we can get it to deduce all the other needed types from the pointer-to-member’s type alone:

template <typename T>
class Binder;

template <typename RET, typename T, typename ...ARGS>
class Binder<RET (T::*)(ARGS...)>
{
    template <RET (T::*FN)(ARGS...)>
    static RET caller(void *ptr, ARGS... args)
    {
        return (((T*)ptr)->*FN)(args...);
    }

public:
    template <RET (T::*FN)(ARGS...)>
    static Callback<RET,ARGS...> bind(T *t)
    {
        return Callback<RET,ARGS...>((void*)t, &caller<FN>);
    }
};

#define BIND(x,y) Binder<typeof(x)>::bind<x>(y)

It’s not very obvious what’s up with the template parameters there, so to make it explicit: first we declare class Binder as a template class with one template parameter, which is a type. Then we specialise class Binder for those types which match the pattern RET (T::*)(ARGS...) for arbitrary RET, T, and ARGS – which is precisely the types of pointers-to-member. (And that’s it: there’s only one specialisation of Binder.)

Inside class Binder is static method bind(), which is a template method: it’s parameterised again. This time the template parameter is not a type: it’s a non-type template parameter, a value, an actual pointer-to-member. So both bind() and caller() are instantiated once for each different member called.

That difference between the two template parameters, of Binder and bind(), isn’t very obvious from the syntax – but compare it to this slightly contrived but analogous example, where again the outer template argument is a type, and the inner one is a value of that type:

template <typename T>
struct IntFuncs
{
    template <T N>
    static T IntFunc()
    {
        return N;
    }
};

unsigned answer = IntFuncs<unsigned>::IntFunc<42u>();

The only awkwardness when using class Binder – the only thing separating it from the idealised call to bind() – is the need to specify the pointer-to-member twice: once as a type, once as a value. But that repetition can be parcelled-up in a macro, as seen in BIND() above. And now the client code looks very straightforward:

class Foo
{
    unsigned onBrowseDone(unsigned int, unsigned int, const std::string&);

public:
    void fnord(CDA *cda)
    {
        cda->browse(3, BIND(&Foo::onBrowseDone, this));
    }
};

Once again the Law of Conservation of Ick applies: we’ve removed ickiness from both the API (the callback-caller) and the client (the callback-supplier), while somewhat concentrating it in the definitions of classes Callback and particularly Binder. But as Binder only needs writing once (or indeed even less often than that, if you just nick it from this blog post, Creative Commons Zero and all that) and it’s generic enough to be used by many APIs and clients, that seems like a good trade-off.

Friday, 19 December 2014

Optimising Squirrel for speed

TL;DR: avoiding slow Squirrel constructs in favour of faster ones, makes this program run four times faster.

Electric Imp’s little embedded WiFi gadgets are programmed in a scripting language called Squirrel. Like Javascript or Python, Squirrel is compiled to bytecode and executed in an interpreter. This brings lots of advantages – notably, that the interpreter acts as a sandbox and allows crashes and other bugs to be debugged remotely – but sometimes, in embedded systems, the performance consequences of that design can be an issue.

So it’s helpful to know how to write Squirrel that runs as fast as possible on the Electric Imp platform. Electric Imp’s own website contains some generic guidance, but it’s instructive to see it applied to a real example: in this case the impoclock, a ring of sixty WS2812B “Neopixels” implementing an analogue clock.

You can follow the story as a Github gist: starting with the baseline, which consists mostly of the stock NeoPixels class from the Electric Imp examples repository. (And perfectly good example code it is too: the optimisation methods in this post will make the code faster, but also less readable and so less suitable to learn from or adapt.) You can also sneak a peek at the finished version as a Github gist.

0. Baseline

As always, there’s no point trying to optimise something you can’t measure: optimisations applied blindly could as likely be unhelpful as helpful. So the first patch does nothing except add code to time one iteration of the main displaytime routine, and also to log the available memory:

@@ -132,6 +132,8 @@ pixel <- NeoPixels(spi, NUM_PIXELS)
 
 server.log(imp.getsoftwareversion() + " / " + imp.getmacaddress())
 
+server.disconnect();
+
 background <- [0,0,0]
 
 function proximity(a, b, limit) {
@@ -177,4 +179,9 @@ function displaytime(){
   imp.wakeup(0.25, displaytime)
 }
 
+local getus = hardware.micros.bindenv(hardware);
+local us = getus();
 displaytime()
+us = getus() - us;
+local freemem = imp.getmemoryfree();
+server.log(us+" "+freemem);

(The version history is also visible as Gist history, each section of this post as a separate commit.) The code disconnects from the server before taking the measurements: this helps with getting consistent results, as no WiFi interrupts can change the timings, nor any network buffers change the memory usage. The baseline measurements are:

Time/μs   Code size   Free memory
 49,373       6.58%       60,712b

That’s about 50,000 microseconds, or 50 milliseconds, or a twentieth of a second to update sixty Neopixels: actually plenty for an analogue clock with a one-second resolution, but other Neopixel applications might be more demanding.

1. Global variables

So let’s get amongst it. The main thing about optimising Squirrel, is that string lookups are expensive. In Squirrel, table members are accessed by string lookup – and that includes members of the global table. However, local variables are accessed more rapidly by (integer) stack offset – and that includes locals of the implicit top-level function. That suggests that it would make a difference just to turn some of the global variables into locals:

@@ -128,15 +128,15 @@ const NUM_PIXELS = 60
 
 spi <- hardware.spi257
 spi.configure(MSB_FIRST, 7500)
-pixel <- NeoPixels(spi, NUM_PIXELS)
+local pixel = NeoPixels(spi, NUM_PIXELS)
 
 server.log(imp.getsoftwareversion() + " / " + imp.getmacaddress())
 
 server.disconnect();
 
-background <- [0,0,0]
+local background = [0,0,0];
 
-function proximity(a, b, limit) {
+local function proximity(a, b, limit) {
 //  return a == b
   local diff = math.abs(a-b)
   if (diff > 60-limit) {

And indeed this does make a measurable difference: it’s already almost ten per cent faster.

                             Time/μs   Code size   Free memory
Baseline                      49,373       6.58%       60,712b
Global variables              44,666       6.57%       60,752b

2. Optimising writePixel

String lookups definitely seem like a profitable area of investigation. So let’s look at all the string lookups done inside the writePixel method: there are lots, both to look up frame and bits in the surrounding class, and to look up the seek and writeblob members of the blob object.

All of these can be replaced by references to local variables. Squirrel allows functions to refer to local variables in enclosing functions: this can be used to gather up a set of values plus a function, in such a way that the function always gets called with those values available. In this patch, the variables frameSeeker, frameWriter, and refBits are local variables of the NeoPixels constructor: this means that under normal circumstances, they would evaporate at the end of the constructor call. But here, because they’re referenced inside the function assigned to this.pixelWriter, they are “captured” as “outers”, their lifetime continues, and they remain available to all future calls to the pixelWriter function. The whole collection – the function plus its captured variables – is called a closure, and the function is described as having closed over those variables; similar concepts (differing sometimes in details) exist in other dynamic languages, such as Javascript.

This is probably the least idiomatic and most confusing optimisation in the whole post, but it’s an important one and is worth taking the time to understand. Access to local variables, even “outers”, is really fast – and the new pixelWriter, unlike the old writePixel, refers only to local variables.

@@ -51,6 +51,8 @@ class NeoPixels {
  frameSize       = null; // number of pixels per frame
  frame           = null; // a blob to hold the current frame
 
+ pixelWriter     = null;
+
  // _spi - A configured spi (MSB_FIRST, 7.5MHz)
  // _frameSize - Number of Pixels per frame
     
@@ -66,6 +68,18 @@ class NeoPixels {
         
   clearFrame();
   writeFrame();
+
+  local frameSeeker = frame.seek.bindenv(frame);
+  local frameWriter = frame.writeblob.bindenv(frame);
+  local refBits = bits;
+  this.pixelWriter = function(p, color) {
+      frameSeeker(p*BYTESPERPIXEL);
+
+      // red and green are swapped for some reason, so swizzle them back
+      frameWriter(refBits[color[1]]);
+      frameWriter(refBits[color[0]]);
+      frameWriter(refBits[color[2]]);
+  }
  }
     
  // fill the array of representative 1-wire waveforms. 
@@ -99,13 +113,7 @@ class NeoPixels {
  // color is an array of the form [r, g, b]
     
  function writePixel(p, color) {
-  frame.seek(p*BYTESPERPIXEL);
-        
-  // red and green are swapped for some reason, so swizzle them back 
-        
-  frame.writeblob(bits[color[1]]);
-  frame.writeblob(bits[color[0]]);
-  frame.writeblob(bits[color[2]]);    
+     pixelWriter(p, color);
  }
     
  // Clears the frame buffer

The results justify all that effort: the code is now more than 25% faster than when we started. (It uses a bit more memory and code space, but for many purposes that’s a fair exchange.)

                             Time/μs   Code size   Free memory
Baseline                      49,373       6.58%       60,712b
Global variables              44,666       6.57%       60,752b
Optimising writePixel         36,400       7.06%       60,048b

3. Optimising writeFrame

The same trick applies to the writeFrame method. In fact, the original method can be replaced completely: notice that client code which calls pixel.writeFrame() doesn’t need to change. The syntax for calling class members which are functions is the same as for calling methods.

@@ -52,6 +52,7 @@ class NeoPixels {
  frame           = null; // a blob to hold the current frame
 
  pixelWriter     = null;
+ writeFrame      = null;
 
  // _spi - A configured spi (MSB_FIRST, 7.5MHz)
  // _frameSize - Number of Pixels per frame
@@ -61,6 +62,15 @@ class NeoPixels {
   this.frameSize = _frameSize;
   this.frame = blob(frameSize*BYTESPERPIXEL + 1);
   this.frame[frameSize*BYTESPERPIXEL] = 0;
+
+  local spiWriter = _spi.write.bindenv(_spi);
+  local refFrame = frame;
+
+                // writes the frame buffer to the pixel strip
+                // ie - this function changes the pixel strip
+  this.writeFrame = function() {
+      spiWriter(refFrame);
+  }
         
   // prepare the bits array and the clearblob blob
         
@@ -123,13 +133,6 @@ class NeoPixels {
   frame.seek(0);
   for (local p = 0; p < frameSize; p++) frame.writeblob(clearblob);
  }
-    
- // writes the frame buffer to the pixel strip
- // ie - this function changes the pixel strip
-    
- function writeFrame() {
-  spi.write(frame);
- }
 }
 
 const NUM_PIXELS = 60

As the example code only calls writeFrame once per iteration – as opposed to the sixty times it calls writePixel – the results are a lot less dramatic. Still a slight improvement, though. The remaining methods of class NeoPixels aren’t called even once per frame – just at startup – so there’s little point optimising them.

                             Time/μs   Code size   Free memory
Baseline                      49,373       6.58%       60,712b
Global variables              44,666       6.57%       60,752b
Optimising writePixel         36,400       7.06%       60,048b
Optimising writeFrame         36,396       7.17%       59,908b

4. Optimising Neopixel calls

The faster pixelWriter function sped up writePixel without changing the API of class NeoPixels. But there are further speed gains to be had by changing the client code to use it directly. The same is true, to a lesser extent, of writeFrame.

No bindenv is needed for pixelWriter, because it wasn’t a member-function to start with – just a plain free-standing function.

Also in this patch, string lookups for the class-member BYTESPERPIXEL are replaced with the use of a Squirrel const. Consts, like locals, are fast. (This part of the patch really belonged in Stage 2, but I was confused by the SHOUTY name into thinking that BYTESPERPIXEL was already a constant.)

The patch follows the convention that constants belonging to a particular class – which can’t, in Squirrel, be declared as consts inside the class – are given names starting with the class’s name. The same could also be done for the other unchanging values, ONE and NONE – but they’re only used at startup, so the update speed wouldn’t be improved.

@@ -1,3 +1,6 @@
+
+const NEOPIXELS_BYTESPERPIXEL = 24;
+
 class NeoPixels {
     
  // Driver for the World Semi WS2812 - aka Adafruit NeoPixel
@@ -33,7 +36,6 @@ class NeoPixels {
     
  ZERO            = 0xC0;
  ONE             = 0xF8;
- BYTESPERPIXEL   = 24;
     
  // when instantiated, the neopixel class will fill this array with blobs to 
  // represent the waveforms to send the numbers 0 to 255. This allows the blobs to be
@@ -60,8 +62,8 @@ class NeoPixels {
  constructor(_spi, _frameSize) {
   this.spi = _spi;
   this.frameSize = _frameSize;
-  this.frame = blob(frameSize*BYTESPERPIXEL + 1);
-  this.frame[frameSize*BYTESPERPIXEL] = 0;
+  this.frame = blob(frameSize*NEOPIXELS_BYTESPERPIXEL + 1);
+  this.frame[frameSize*NEOPIXELS_BYTESPERPIXEL] = 0;
 
   local spiWriter = _spi.write.bindenv(_spi);
   local refFrame = frame;
@@ -83,7 +85,7 @@ class NeoPixels {
   local frameWriter = frame.writeblob.bindenv(frame);
   local refBits = bits;
   this.pixelWriter = function(p, color) {
-      frameSeeker(p*BYTESPERPIXEL);
+      frameSeeker(p*NEOPIXELS_BYTESPERPIXEL);
 
       // red and green are swapped for some reason, so swizzle them back 
       frameWriter(refBits[color[1]]);
@@ -100,7 +102,7 @@ class NeoPixels {
         
   bits = array(256);
   for (local i = 0; i < 256; i++) {
-   local valblob = blob(BYTESPERPIXEL / 3);
+   local valblob = blob(NEOPIXELS_BYTESPERPIXEL / 3);
    valblob.writen((i & 0x80) ? ONE:ZERO,'b');
    valblob.writen((i & 0x40) ? ONE:ZERO,'b');
    valblob.writen((i & 0x20) ? ONE:ZERO,'b');
@@ -113,7 +115,7 @@ class NeoPixels {
   }
         
   // now fill the clearblob
-  for(local j = 0; j < BYTESPERPIXEL; j++) {
+  for(local j = 0; j < NEOPIXELS_BYTESPERPIXEL; j++) {
    clearblob.writen(ZERO, 'b');
   }
  }
@@ -140,6 +142,8 @@ const NUM_PIXELS = 60
 spi <- hardware.spi257
 spi.configure(MSB_FIRST, 7500)
 local pixel = NeoPixels(spi, NUM_PIXELS)
+local pixelWriter = pixel.pixelWriter;
+local frameWriter = pixel.writeFrame;
 
 server.log(imp.getsoftwareversion() + " / " + imp.getmacaddress())
 
@@ -184,9 +188,9 @@ function displaytime(){
       colour[1] = 0
       colour[2] = 100
     }
-    pixel.writePixel(i,colour)
+    pixelWriter(i,colour)
   }  
-  pixel.writeFrame()
+  frameWriter();
   imp.wakeup(0.25, displaytime)
 }
 

Even this small change makes another 6% difference.

                             Time/μs   Code size   Free memory
Baseline                      49,373       6.58%       60,712b
Global variables              44,666       6.57%       60,752b
Optimising writePixel         36,400       7.06%       60,048b
Optimising writeFrame         36,396       7.17%       59,908b
Optimising Neopixel calls     34,043       7.15%       60,060b

5. Removing array references

In Squirrel, arrays such as colour are reference-counted, heap-allocated objects. That sounds a bit expensive: perhaps it’d be faster to pass the three elements (red, green and blue) around separately.

(This was also the motivation for making pixelWriter separate from writePixel: this way, the existing API of class NeoPixels isn’t changed.)

@@ -84,13 +84,13 @@ class NeoPixels {
   local frameSeeker = frame.seek.bindenv(frame);
   local frameWriter = frame.writeblob.bindenv(frame);
   local refBits = bits;
-  this.pixelWriter = function(p, color) {
+  this.pixelWriter = function(p, r,g,b) {
       frameSeeker(p*NEOPIXELS_BYTESPERPIXEL);
 
       // red and green are swapped for some reason, so swizzle them back 
-      frameWriter(refBits[color[1]]);
-      frameWriter(refBits[color[0]]);
-      frameWriter(refBits[color[2]]);
+      frameWriter(refBits[g]);
+      frameWriter(refBits[r]);
+      frameWriter(refBits[b]);
   }
  }
     
@@ -125,7 +125,7 @@ class NeoPixels {
  // color is an array of the form [r, g, b]
     
  function writePixel(p, color) {
-     pixelWriter(p, color);
+     pixelWriter(p, color[0], color[1], color[2]);
  }
     
  // Clears the frame buffer
@@ -170,25 +170,27 @@ function displaytime(){
     hourled = hourled - 60
   }
   for(local i = 0; i<NUM_PIXELS; i++){
-    local colour = [background[0], background[1], background[2]]
+    local red = background[0];
+    local green = background[1];
+    local blue = background[2];
     local hourprox = proximity(i, hourled, 3.0)
     if (hourprox != 0) {
-      colour[0] = (100 * hourprox * hourprox).tointeger()
-      colour[1] = (100 * hourprox * hourprox).tointeger()
-      colour[2] = 0
+      red = (100 * hourprox * hourprox).tointeger()
+      green= (100 * hourprox * hourprox).tointeger()
+      blue = 0
     }
     local minprox = proximity(i, now.min, 2.0)
     if (minprox != 0) {
-      colour[0] = 0
-      colour[1] = (100 * minprox * minprox).tointeger()
-      colour[2] = (100 * minprox * minprox).tointeger()
+      red = 0
+      green = (100 * minprox * minprox).tointeger()
+      blue = (100 * minprox * minprox).tointeger()
     }
     if (proximity(i, now.sec, 0)) {
-      colour[0] = 100
-      colour[1] = 0
-      colour[2] = 100
+      red = 100
+      green = 0
+      blue = 100
     }
-    pixelWriter(i,colour)
+    pixelWriter(i,red,green,blue);
   }  
   frameWriter();
   imp.wakeup(0.25, displaytime)

And yes, it turns out heap objects are slower than the alternative.

                             Time/μs   Code size   Free memory
Baseline                      49,373       6.58%       60,712b
Global variables              44,666       6.57%       60,752b
Optimising writePixel         36,400       7.06%       60,048b
Optimising writeFrame         36,396       7.17%       59,908b
Optimising Neopixel calls     34,043       7.15%       60,060b
Removing array references     31,876       7.13%       60,044b

6. Global functions

Back on the warpath against string lookups. And there are several still to squash: for instance, the call to math.abs looks up the string “math” in the global table to find the maths library, then looks up the string “abs” in that table (well, actually the maths library is a special table-like object) to find the function to call.

This patch also hoists out the lookup of “date” in the global table, “imp” in the global table, and “wakeup” in the imp table – though, as those were only called once per update, the impact from those is less.

The rule of thumb is: any time you see a dot between two names, such as in “math.abs”, then that’s a string lookup. And any time you see a reference to something from the global table, such as the imp object or date function, then that’s a string lookup too. Accesses to local variables – whether those of the current function or “outers” from any function enclosing it – are not string lookups.

@@ -151,9 +151,12 @@ server.disconnect();
 
 local background = [0,0,0];
 
+local mabs = math.abs.bindenv(math);
+local getdate = date;
+local impwaker = imp.wakeup.bindenv(imp);
+
 local function proximity(a, b, limit) {
-//  return a == b
-  local diff = math.abs(a-b)
+  local diff = mabs(a-b);
   if (diff > 60-limit) {
     diff = diff - (60 - limit) + 1
   }
@@ -164,7 +167,7 @@ local function proximity(a, b, limit) {
 }
 
 function displaytime(){
-  local now = date()
+  local now = getdate()
   local hourled = (now.hour + 0) * 5 + (now.min / 12.0).tointeger()
   if (hourled > 59) {
     hourled = hourled - 60
@@ -193,7 +196,7 @@ function displaytime(){
     pixelWriter(i,red,green,blue);
   }  
   frameWriter();
-  imp.wakeup(0.25, displaytime)
+  impwaker(0.25, displaytime)
 }
 
 local getus = hardware.micros.bindenv(hardware);

Because math.abs gets called several times per pixel, the speed-up is quite significant: the time for the entire update is halved. The small increases in code size and memory usage are easily forgiven.

                             Time/μs   Code size   Free memory
Baseline                      49,373       6.58%       60,712b
Global variables              44,666       6.57%       60,752b
Optimising writePixel         36,400       7.06%       60,048b
Optimising writeFrame         36,396       7.17%       59,908b
Optimising Neopixel calls     34,043       7.15%       60,060b
Removing array references     31,876       7.13%       60,044b
Global functions              15,563       7.31%       59,752b

7. Hoist table lookups

The few remaining string lookups in displaytime are mostly references to members of the table returned by Squirrel’s date() function. As they’re used over and over again inside the loop, it should be faster to do each lookup just once, before entering the loop: in the compiler literature, this is called loop hoisting.

@@ -166,9 +166,13 @@ local function proximity(a, b, limit) {
   return 0
 }
 
-function displaytime(){
+local displaytime;
+displaytime = function(){
   local now = getdate()
-  local hourled = (now.hour + 0) * 5 + (now.min / 12.0).tointeger()
+  local hour = now.hour;
+  local min  = now.min;
+  local sec  = now.sec;
+  local hourled = (hour + 0) * 5 + (min / 12.0).tointeger()
   if (hourled > 59) {
     hourled = hourled - 60
   }
@@ -182,13 +186,13 @@ function displaytime(){
       green= (100 * hourprox * hourprox).tointeger()
       blue = 0
     }
-    local minprox = proximity(i, now.min, 2.0)
+    local minprox = proximity(i, min, 2.0)
     if (minprox != 0) {
       red = 0
       green = (100 * minprox * minprox).tointeger()
       blue = (100 * minprox * minprox).tointeger()
     }
-    if (proximity(i, now.sec, 0)) {
+    if (proximity(i, sec, 0)) {
       red = 100
       green = 0
       blue = 100

And it’s shaved a tiny bit more off the time.

                             Time/μs   Code size   Free memory
Baseline                      49,373       6.58%       60,712b
Global variables              44,666       6.57%       60,752b
Optimising writePixel         36,400       7.06%       60,048b
Optimising writeFrame         36,396       7.17%       59,908b
Optimising Neopixel calls     34,043       7.15%       60,060b
Removing array references     31,876       7.13%       60,044b
Global functions              15,563       7.31%       59,752b
Hoist table lookups           14,657       7.38%       60,344b

8. Other optimisations

Right, the gloves are off now. Anything goes that will make this code faster, and readability be gosh-darned.

First of all, we can get rid of all the string lookups in the date structure by just using Squirrel’s time() call instead, which is just a few integer divisions away from giving us the right answer. (Or, in my case, the wrong answer – the code below incorporates a bugfix which in the Gist only appears in a later commit.)

Sort-of similarly, it turns out that open-coding the equivalent of the math.abs function is fractionally faster than actually calling it.

Then we can hoist the lookups in the background array out of the loop. These are integer lookups – not the expensive string sort – but every little helps. (And if you’re thinking, “Why not just get rid of background altogether?”, then the answer is that the actual impoclock has an agent.on() method for setting the background colour, which isn’t shown here because it’s not relevant to the optimisations.)

One final micro-optimisation is to observe that a pixel showing the second hand can’t also be showing the minute hand, and one showing the minute hand can’t also be showing the hour hand. So the ifs in the loop can be if..elses, and we can save about a gnat’s crotchet of time by skipping calculations which we know we aren’t going to use.

@@ -151,12 +151,14 @@ server.disconnect();
 
 local background = [0,0,0];
 
-local mabs = math.abs.bindenv(math);
-local getdate = date;
+local gettime = time;
 local impwaker = imp.wakeup.bindenv(imp);
 
 local function proximity(a, b, limit) {
-  local diff = mabs(a-b);
+  local diff = a-b;
+  if (diff < 0) {
+      diff = -diff;
+  }
   if (diff > 60-limit) {
     diff = diff - (60 - limit) + 1
   }
@@ -168,34 +170,32 @@ local function proximity(a, b, limit) {
 
 local displaytime;
 displaytime = function(){
-  local now = getdate()
-  local hour = now.hour;
-  local min  = now.min;
-  local sec  = now.sec;
+  local now = gettime()
+  local hour = (now/3600)%12;
+  local min  = (now/60)%60;
+  local sec  = now%60;
   local hourled = (hour + 0) * 5 + (min / 12.0).tointeger()
   if (hourled > 59) {
     hourled = hourled - 60
   }
+  local bgred = background[0];
+  local bggreen = background[1];
+  local bgblue = background[2];
   for(local i = 0; i<NUM_PIXELS; i++){
-    local red = background[0];
-    local green = background[1];
-    local blue = background[2];
-    local hourprox = proximity(i, hourled, 3.0)
-    if (hourprox != 0) {
-      red = (100 * hourprox * hourprox).tointeger()
-      green= (100 * hourprox * hourprox).tointeger()
-      blue = 0
-    }
-    local minprox = proximity(i, min, 2.0)
-    if (minprox != 0) {
-      red = 0
-      green = (100 * minprox * minprox).tointeger()
-      blue = (100 * minprox * minprox).tointeger()
-    }
+    local red = bgred, green = bggreen, blue = bgblue;
+    local hourprox, minprox;
     if (proximity(i, sec, 0)) {
-      red = 100
-      green = 0
-      blue = 100
+        red = 100
+        green = 0
+        blue = 100
+    } else if ((hourprox = proximity(i, hourled, 3.0)) != 0) {
+        red = (100 * hourprox * hourprox)
+        green= (100 * hourprox * hourprox)
+        blue = 0
+    } else if ((minprox = proximity(i, min, 2.0)) != 0) {
+        red = 0
+        green = (100 * minprox * minprox)
+        blue = (100 * minprox * minprox)
     }
     pixelWriter(i,red,green,blue);
   }  

And these final few things have between them eked out another 15% or so.

                             Time/μs   Code size   Free memory
Baseline                      49,373       6.58%       60,712b
Global variables              44,666       6.57%       60,752b
Optimising writePixel         36,400       7.06%       60,048b
Optimising writeFrame         36,396       7.17%       59,908b
Optimising Neopixel calls     34,043       7.15%       60,060b
Removing array references     31,876       7.13%       60,044b
Global functions              15,563       7.31%       59,752b
Hoist table lookups           14,657       7.38%       60,344b
Other optimisations           12,196       7.44%       60,392b
Total change                  -75.3%        +13%         +0.5%

9. So the final scores there

The impoclock update routine which, written straightforwardly, took nearly 50ms, got optimised down to a little over 12ms: four times as fast. Or, put differently, a peak update rate of 20 frames per second became 80 frames per second.

Most, though perhaps not quite all, of these sorts of optimisations can be applied quite widely to different Squirrel programs. And if you’re writing code that might get widely reused, you never know what timing requirements other users of your code will have, so it behoves you to optimise as much as is sensible. (Conversely, it’s not as sensible to optimise every microsecond out of a routine that nobody’s going to be calling all that often.)

The optimisations have come at some cost in readability, as well as in code size and memory usage. There’s not much that can be done about the readability – but optimising Squirrel code for size, as opposed to speed, will be the subject of another post.

Monday, 15 December 2014

Help, my X clipboard is broken

Whenever I drag-select anything in xterm, the selection immediately gets cleared. Similarly in XEmacs, even when using shift and arrow-keys to select, the selection vanishes straightaway.

Bashing all the modifier keys doesn't help. Switching to console (Ctrl-Alt-F2) and back doesn't help. Unplugging and re-plugging the keyboard doesn't help.

The culprit? gitk. For some reason, when you use the "Line diff"/"Markup words" drop-down, it sometimes mucks up the X clipboard so badly that you have to quit and restart gitk to restore sanity.

I actually figured this out once before, but then had forgotten by the time it happened to me again, so this time I thought I'd write it down.

Wednesday, 22 October 2014

Software Practices At Electric Imp

This post is an elaboration of a discussion held at Electric Imp’s headquarters in October 2013, where we ran through a collection of software engineering best-practices that various team members have found useful (whether actually at EI or at previous jobs) for producing robust pieces of software. None of these are rocket science, and none are all that innovative, but having them all written down in one place helps everyone know what to expect – especially new team-members such as contractors.

Coding standards

We have coding-style guides for the main languages we use: Erlang, Javascript and C++. It’s a specific goal that we shouldn’t be able to tell, just by looking at the style of a section of code, which of us wrote it. We should probably have style guides for our other languages too: Squirrel, Objective-C, Java/Dalvik.

Each should resemble the accepted standards (or at least an accepted standard) for the language in question – as opposed to resembling each other. Our Javascript should look like idiomatic Javascript, not like idiomatic C++.

Of course, sometimes this isn’t possible. Both in Javascript and in C++, our code relies on and interacts with a large number of third-party components – which in many cases are themselves written in different styles.

Code reviews

All production code gets reviewed. Where convenient (developers in the same office), it gets reviewed before git push, but sometimes that’s not convenient (remote developers), so review-after-push is the best we can do. But where the change is large enough to have been done on a feature branch, we can get the best of both worlds by pushing that branch to the central server and doing the review before merge to master.

One obvious benefit of code reviews is that a second pair of eyes checks over the code for things the original developer didn’t spot. (If done thoughtfully, it can be a great learning tool for more junior members of the team.)

But that isn’t actually the main benefit. The best thing about code reviews is that they spread out knowledge about the codebase among all the team members, helping to avoid the situation where individual developers have their own “fiefdoms”, or code that only they know about. It’s the idea of the “Smallest Bus Queue Accident”, the slightly ghoulish concept of the number of team-members who would need to simultaneously be hit by a bus for the project to be imperilled. If everyone’s an irreplaceable specialist, SBQA=1, and even just one accident (or poaching, or mid-life crisis) will leave you in trouble. If everyone is familiar with everything (at least enough to maintain it), SBQA=n, the team size, and long-term risk to the project is much lower.

The film “Jurassic Park” is, at some level, a commentary on just how badly things can go wrong in a system with an inadequate (absent) culture of code review.

Unit tests

All commits that change production code should also contain a unit-test for the change. All the unit-tests are run in every build of the codebase (top-level scons invocation). The primary and most obvious reason for this is that we’ve all written code that flat-out didn’t do what we intended, and having our assumptions checked automatically before git push keeps those misunderstandings out of the master sources. But it turns out that there are also other benefits of cultivating a habit of unit-testing.

For one thing, a healthy collection of unit-tests can act as documentation for the code under test, showing its expected behaviour and suggesting which parts of its behaviour are intended and essential, and which are accidental and possibly unwanted. Unlike many other forms of documentation (including, uh, blog posts like this one), documentation-by-unit-test is guaranteed to be up-to-date, as the build breaks any time it doesn’t match the implementation.

Another benefit of unit-testing is that it acts as a forcing-function for modularity: if it’s hard to see how to test your code one part at a time, because you can’t get in between the parts, then that’s a warning sign that the parts are more tightly-coupled than is healthy for maintainability.

There is, though, a trade-off there, a judgement call to be made. In C++ there’s a cost associated with virtual functions (which are needed for mocking and for test doubles in general), and there’s also a human cost associated with the added complexity of introducing test doubles for every single class (even if they’re systematically named and located, that’s still doubling the number of classes in any library). But in general the design of the system should offer enough modularity to provide an adequate sprinkling of seams for introducing test doubles, which means that extracting an interface solely so that it can be mocked-out is frowned upon. In Javascript, the performance cost part doesn’t apply (all functions are, in C++ terms, “virtual”, and so all dependencies can be mocked), but it may still be worth choosing carefully how large the “unit” in each unit-test is, with the overarching goal of retaining readability of the codebase as a whole. That is to say, the “unit” might most easily be more than one class, or more than one file.

Judgement is also needed when writing the tests themselves. It’s all too easy to write tests that are either too precise and thus too fragile against correct changes to the code, or too vague and thus capable of missing incorrect changes. One of our developers was unlucky enough to encounter a bogus test in the very first change he made to our C++ codebase: adding some debug logging of the imp’s current list of servers. A unit-test started failing because getServer() was being called “too many” times. But pretty clearly, it wasn’t part of the external contract of the class how many times it called getServer() – the test was too precise. The mantra should be: test the functionality of the unit, not the implementation. If I want to know whether the implementation is the same as it was yesterday, I’ll use diff – I want the tests to tell me whether it’s fulfilling its external contract the same way it was yesterday, following any changes to its implementation. Practice makes perfect, here: the more tests you write, the better you get at judging these things.
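
Here’s a hedged reconstruction of that lesson (hand-rolled test double, invented names – this isn’t the actual code from the anecdote). Compare the two assertions at the end:

// Sketch only: invented names, not the real code from the anecdote.
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface standing in for the one in the story.
class ServerList {
public:
    virtual ~ServerList() {}
    virtual std::string getServer(size_t i) const = 0;
    virtual size_t count() const = 0;
};

// Code under test: formats the list for a debug log.
std::string describeServers(const ServerList& servers)
{
    std::string result;
    for (size_t i = 0; i < servers.count(); ++i) {
        if (!result.empty())
            result += ", ";
        result += servers.getServer(i);
    }
    return result;
}

// Test double that also counts calls, so both styles of test can be shown.
class StubServerList : public ServerList {
public:
    explicit StubServerList(std::vector<std::string> servers)
        : m_servers(std::move(servers)), m_calls(0) {}
    std::string getServer(size_t i) const override { ++m_calls; return m_servers[i]; }
    size_t count() const override { return m_servers.size(); }
    size_t calls() const { return m_calls; }
private:
    std::vector<std::string> m_servers;
    mutable size_t m_calls;
};

int main()
{
    StubServerList servers({"a.example", "b.example"});
    std::string description = describeServers(servers);

    // Testing the contract: however the implementation works internally,
    // the description must list the servers in order.
    assert(description == "a.example, b.example");

    // Too precise (don't do this): it pins down how many times the
    // implementation happened to call getServer(), which is no part of
    // the contract - an extra debug line would "break" it.
    // assert(servers.calls() == 2);

    return 0;
}

The commented-out assertion is exactly the kind that bit our developer: the call count is an implementation detail, and the debug-logging change legitimately altered it.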

There’s a school of thought that says that tests are so important that you should write them first, before the implementation: test-driven development. In our experience that works well for some tasks – self-contained, purely algorithmic ones – and less well for others. For instance, when writing driver code, the largest source of bugs is an incorrect mental model of how the hardware works: but as the same developer writes the tests then the code, it’s too easy for the tests and the code to be consistent with each other but both wrong. Where it does work, though, it works brilliantly: the imp’s memory allocator was written test-first – and, as it’s a well-defined and well-understood problem, writing the tests was relatively easy.

Test-driven development is as much about improving the quality of the tests as it is of the code. If you know (or think you know) that some particular functionality will be needed in the implementation, but the tests you’ve written so far haven’t exercised it yet, it does make you stop and think: how do I write a test for that – how do I get the system into that state? Either it’s forcing you to come up with a test you didn’t realise you needed – or, even better, it’s telling you that your expected elaboration wasn’t needed at all.

All told, I don’t think it’s a huge exaggeration to say, “If you don’t have tests, you don’t really have the code” – because without the tests, you can’t do anything with it: the path to being able to refactor it, is nearly as long as the path to writing it again from scratch.

System tests

Unit tests are good at ensuring that individual modules fulfil their contracts. But the task of writing quality software is, sadly, still not done at that point: assembling these contracts together to form an overall system is still a human process, fertile with opportunities for human error. Unit testing reduces the risk of inter-module bugs, but cannot eliminate it altogether.

So at some point you’re going to have to test larger parts of the system. Michael Feathers’ rules of thumb as to what is a unit test and what isn’t, are widely quoted: the key distinction being, a unit test should be basically instantaneous and invisible, at least when successful. System tests are a bit more heavyweight, and might be more involved to set up: for instance, target hardware might be needed.

In theory, system testing proceeds via structural induction: prove that component A works, then prove that A+B works, then A+B+C, and so on until you’ve tested the whole software system. In practice, it usually suffices to test major components, and then the system as a whole: for instance, the imp hardware, then the imp server, then end-to-end tests of the whole enchilada.

Just as unit tests can be viewed as documentation for the code, system tests can be viewed as high-level specifications of the entire system’s behaviour: “when an imp contacts the server, it’s sent the appropriate Squirrel for the device it’s plugged into”.

Static analysis

The big idea with static analysis (and, for that matter, dynamic analysis), is this: for any property that your code must have in order to be correct, it’s more reliable to have an automated check that your code has that property, than it is to rely on every programmer to manually keep it in mind the whole time. For instance, Javascript: The Good Parts – a book conspicuously much thinner than “Javascript: The Definitive Guide” – is really about avoiding language features that cause misleading or buggy code. Most of the traps the author describes, can be detected by the JSLint or JSHint static-analysis tools; running one of those tools before committing, can thus keep questionable or hard-to-maintain constructs out of the codebase.

In the wild-west days of C, similar coding traps were caught by a program called lint – though when C++ came along, it designed out most of the need for lint, and incorporated much of the rest into the compiler. Some of that checking is available only as optional warnings or errors, so we always run GCC with lots of warnings enabled:

g++ -Wall -Wextra -Werror -Wundef -Wno-unused-parameter -Woverloaded-virtual -Wlogical-op -fstrict-enums -Wno-long-long -Wpointer-arith -Wnon-virtual-dtor -Wno-sign-conversion -Wunused-but-set-variable

(On projects that also use other compilers – MSVC for example – it’s a good idea to turn on all the warnings in all of them, as every compiler detects different problems. But here at EI, all our targets are GCC-based.) We’re happy to keep looking, but we’ve yet to find a lint-like tool for C++ that finds real bugs.

Other languages have their own static-analysis tools: Erlang has Dialyzer, and Xcode has the Clang static analyser for Objective-C. More are always welcome, though: for instance, code written against node.js makes widespread use of callbacks as part of asynchronous APIs. So it’s a correctness constraint that any routine that takes a parameter called cb, must call it exactly once (not twice, not zero times) on every path through the function. That should, in theory, be amenable to automated checking (at least for the easy cases).

Dynamic analysis

Dynamic analysis tools also tend to be language-specific. In C++, the use of Valgrind is well-known; Helgrind, bundled with Valgrind, can be awkward to use but can also be invaluable; and mutrace and perf are helpful at that stage of project maturity when the question becomes, “yes, yes, it works, but why isn’t it faster?” (That’s usually a pretty late stage of maturity, on the timeless basis that it’s easier to optimise correct code, than it is to correct optimised code.)
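
For anyone who hasn’t met it, this is the sort of bug Valgrind’s memcheck catches immediately – a deliberately broken sketch, not real imp code:

// Deliberately buggy sketch: build with g++, run under valgrind ./a.out
#include <cstdio>

int main()
{
    int* freed = new int(42);
    delete freed;
    std::printf("%d\n", *freed);  // memcheck reports an invalid read of freed memory

    int* leaked = new int[16];    // never deleted: shows up in the leak summary at exit
    (void)leaked;
    return 0;
}

Neither bug necessarily crashes a normal run, which is precisely why an automated checker beats eyeballs here.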

The one act of dynamic analysis that all types of code can enjoy, though, is code-coverage analysis. Code-coverage analysis is the ugly cousin of test-driven development, but is good enough for a rainy weekend in Norfolk. Developing while keeping an eye on the coverage statistics is almost, but not quite, as reliable as test-driven development as a method of ensuring that your code has good-quality tests.

Continuous integration

We use the very splendid Jenkins continuous-integration server (autobuilder). Its main responsibility is running, after each commit, all the tests and analyses that would be too time-consuming for developers to run before each commit. Or too awkward – Jenkins runs all the system tests that require special hardware, and also runs builds on all supported host platforms.

The quality bar for pushes to master, is clean runs on all of these test suites; any failure is a stop-the-line emergency. If a build or test is failing, the very next push must be the fix: until then, other developers can’t safely push, because they can’t know whether their own work passes that test or not. This is usually known as “do not commit on red” – although with Git, it’s actually the “push”, not “commit”, operation that’s the relevant one.

Using Pivotal

We use Pivotal Labs’ Pivotal Tracker to keep track of work items. As Electric Imp is spread out over several time-zones, it’s important to have an online tool that we can use to look at the status of projects or tasks without having to bother actual human beings. We looked at a few online Agile tools, and Pivotal’s simplicity and at-a-glance visibility made it the clear winner.

For those who haven’t seen it, it arranges tasks into (basically) three columns: “current”, “backlog”, and “icebox”. “Current” is for the current sprint, “backlog” is for upcoming sprints, and “icebox” for everything else. Tasks can be moved around and re-ordered by drag-and-drop. A task is either a user-story, a “chore” (which we use for refactoring or technical-debt tasks that aren’t end-user-visible), or a bug.

We use this system to work on tasks in priority order, as prioritised by the product owner:

  • Anyone can add new stories, chores, or bugs to a “new tasks here” section in the icebox.
  • The product owner has a think about the new task (if it’s a story or a bug), and prioritises it either into the backlog, or into a “nice to haves” section in the icebox.
  • If it’s a chore, the developers prioritise it between themselves (typically, just before a story which would be aided by the cleanup, or which would need re-doing if done before the cleanup).
  • As developers finish up previous tasks, they pick the topmost one in the backlog each time. (We don’t really deal in “sprints” like canonical Agile – it’s more task-by-task, like kanban.)

Using Git

Unlike, say, Subversion, Git makes having a neat project history achievable. So we try to achieve it.

Bow-shaped feature branches let us view individual features either entire or as composed of a series of patches. That’s good for cherry-picking, it’s good for (ahem) reverting, and it’s good for rereading old commits to understand the origins of a piece of code (or of a bug). In general we name single-developer feature branches with the developer’s initials plus a very short indicator of the feature, such as pdh-discovery or rs-schema. (The developer’s initials serve as a kind of “watch out, I might rebase this branch at any moment”.)

The server and the client have different needs from their respective release processes, mainly because of “in-the-drawer” syndrome: previous client releases remain important in perpetuity, because a user could stick one in a drawer, forget about it, remember a year later, and then try and connect it to modern-day servers. The same doesn’t apply the other way around: previous server releases are, to a large extent, yesterday’s newspapers.

Ensure it all happens

We’re the engineering department; we’ve all been hired to do engineering. It’s pretty likely that we’re the last line of defence for things being well-engineered and done right: very few customers or project-managers have ever been heard to say, “Yes, yes, but can’t you take a bit longer and do it properly?” – indeed, many have been heard to say the exact opposite.

So for instance, adding automated tests is always an integral part of the development process – whether it’s done strictly before coding the functionality, as the test-driven school would say, or in a more intermingled way. It’s not an optional extra. Developers incorporate the building of tests into the original development estimates, and project owners don’t (shouldn’t) accept a story as “done” unless the testing is in-place too.

Being the engineering department, we don’t necessarily get to decide the balance between engineering goals and commercial goals (in fact, we probably don’t want to do that). But it does behove us to make sure that the people who do make those decisions, know what the engineering consequences will be.
