This post is an elaboration of a discussion held at Electric Imp’s
headquarters in October 2013, where we ran through a collection of
software engineering best-practices that various team members have
found useful (whether actually at EI or at previous jobs) for
producing robust pieces of software. None of these are rocket science,
and none are all that innovative, but having them all written down in
one place helps everyone know what to expect – especially new
team-members such as contractors.
Coding standards
We have coding-style guides for the main languages we use: Erlang,
Javascript and C++. It’s a specific goal that we shouldn’t
be able to tell, just by looking at the style of a section of code,
which of us wrote it. We should probably have style guides for our
other languages too: Squirrel, Objective-C, Java/Dalvik.
Each should resemble the accepted standards (or at least
an accepted standard) for the language in question – as
opposed to resembling each other. Our Javascript should look
like idiomatic Javascript, not like idiomatic C++.
Of course, sometimes this isn’t possible. Both in Javascript and in
C++, our code relies on and interacts with a large number of
third-party components – which in many cases are themselves written
in different styles.
Code reviews
All production code gets reviewed. Where convenient (developers in
the same office), it gets reviewed before git push, but
sometimes that’s not convenient (remote developers), so
review-after-push is the best we can do. But where the change is large
enough to have been done on a feature branch, we can get the best of
both worlds by pushing that branch to the central server and doing the
review before merge to master.
One obvious benefit of code reviews, is that a second pair of eyes
checks over the code for things the original developer didn’t
spot. (If done thoughtfully, it can be a great learning tool for more
junior members of the team.)
But that isn’t actually the main benefit. The best thing
about code reviews, is that they spread out knowledge about the
codebase among all the team members, helping to avoid the situation
where individual developers have their own “fiefdoms”, or code that
only they know about. It’s the idea of the “Smallest Bus Queue
Accident”, the slightly ghoulish concept of the number of team-members who
would need to simultaneously be hit by a bus for the project to be imperilled. If
everyone’s an irreplaceable specialist,
SBQA=1, and even just one accident (or poaching, or mid-life
crisis) will leave you in trouble. If everyone is familiar with
everything (at least enough to maintain it), SBQA=n,
the team size, and long-term risk to the project is much lower.
The film “Jurassic Park” is, at some level, a
commentary on just how badly things can go wrong in a system with an
inadequate (absent) culture of code review.
Unit tests
All commits that change production code, should also contain a
unit-test for the change. All the unit-tests are run in every build of
the codebase (top-level scons invocation). The primary and
most obvious reason for this, is that we’ve all written code that
flat-out didn’t do what we intended, and having our assumptions
checked automatically before git push keeps those
misunderstandings out of the master sources. But it turns out that
there are also other benefits of cultivating a habit of
unit-testing.
For one thing, a healthy collection of unit-tests can act as
documentation for the code under test, showing its expected behaviour
and suggesting which parts of its behaviour are intended and
essential, and which are accidental and possibly unwanted. Unlike many
other forms of documentation (including, uh, blog posts like this one),
documentation-by-unit-test is guaranteed to be up-to-date, as the
build breaks any time it doesn’t match the implementation.
Another benefit of unit-testing is that it acts as a
forcing-function for modularity: if it’s hard to see how to test your
code one part at a time, because you can’t get in between the parts,
then that’s a warning sign that the parts are more tightly-coupled than
is healthy for maintainability.
There is, though, a trade-off there, a judgement call to be
made. In C++ there’s a cost associated with virtual functions (which
are needed for mocking and for test doubles in general), and there’s
also a human cost associated with the added complexity of introducing
test doubles for every single class (even if they’re systematically
named and located, that’s still doubling the number of classes in any
library). But in general the design of the system should
offer enough modularity to provide an adequate sprinkling of seams
for introducing test doubles, which means that
extracting an interface solely so that it can be mocked-out, is
frowned upon. In Javascript, the performance cost part doesn’t apply
(all functions are, in C++ terms, “virtual”, and so
all dependencies can be mocked), but it may still be worth
choosing carefully how large the “unit” in each unit-test is, with the
overarching goal of retaining readability of the codebase as a
whole. That is to say, the “unit” might most easily be more than one
class, or more than one file.
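To make the “seam” idea concrete, here’s a minimal sketch (the class
names are invented for illustration, not taken from our codebase): the
unit under test depends on a small abstract interface, so the test can
substitute a deterministic fake without touching the production
implementation.

#include <cassert>

// The seam: a tiny interface the unit under test depends on.
class Clock {
public:
    virtual ~Clock() {}
    virtual long nowSeconds() const = 0;  // virtual so a test double can override it
};

// Unit under test: knows nothing about where the time comes from.
class SessionTimer {
public:
    explicit SessionTimer(const Clock& clock)
        : m_clock(clock), m_start(clock.nowSeconds()) {}
    bool expired(long timeoutSeconds) const {
        return m_clock.nowSeconds() - m_start >= timeoutSeconds;
    }
private:
    const Clock& m_clock;
    long m_start;
};

// Test double: a clock the test can wind forward deterministically.
class FakeClock : public Clock {
public:
    FakeClock() : m_now(0) {}
    long nowSeconds() const { return m_now; }
    void advance(long seconds) { m_now += seconds; }
private:
    long m_now;
};

int main() {
    FakeClock clock;
    SessionTimer timer(clock);
    assert(!timer.expired(60));   // no time has passed yet
    clock.advance(61);
    assert(timer.expired(60));    // 61 seconds later, the 60-second timeout has expired
    return 0;
}

The cost is exactly the one described above: one extra interface and
one extra class, which is why we prefer to pay it only where the design
already wants a seam.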
Judgement is also needed when writing the tests themselves. It’s
all too easy to write tests that are either too precise and thus too
fragile against correct changes to the code, or too vague and thus
capable of missing incorrect changes. One of our developers was
unlucky enough to encounter a bogus test in the very first change he
made to our C++ codebase: adding some debug logging of the imp’s
current list of servers. A unit-test started failing because
getServer() was being called “too many” times. But pretty
clearly, it wasn’t part of the external contract of the class how many
times it called getServer() – the test was too precise. The
mantra should be: test the functionality of the unit, not the
implementation. If I want to know whether the implementation
is the same as it was yesterday, I’ll use diff – I want the
tests to tell me whether it’s fulfilling its external contract the
same way it was yesterday, following any changes to its
implementation. Practice makes perfect, here: the more tests you write,
the better you get at judging these things.
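As a sketch of the distinction (the function and test below are
invented for illustration, not the actual code involved), assert on the
observable output of the unit, not on how many times it consulted its
collaborators:

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Unit under test: formats a server list for a log line.
std::string describeServers(const std::vector<std::string>& servers) {
    std::string result;
    for (std::size_t i = 0; i < servers.size(); ++i) {
        if (i) result += ", ";
        result += servers[i];
    }
    return result;
}

int main() {
    std::vector<std::string> servers;
    servers.push_back("a.example.com");
    servers.push_back("b.example.com");

    // Good: tests the external contract (what the caller actually sees).
    assert(describeServers(servers) == "a.example.com, b.example.com");

    // Too precise (don't do this): a test that instead counted how many times
    // the implementation consulted its data source would start failing the
    // moment someone legitimately added a debug-logging pass.
    return 0;
}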
There’s a school of thought that says that tests are so
important that you should write them first, before the implementation:
test-driven development. In our experience that works well for some
tasks – self-contained, purely algorithmic ones – and less
well for others. For instance, when writing driver code, the largest
source of bugs is an incorrect mental model of how the hardware works:
but as the same developer writes the tests then the code, it’s
too easy for the tests and the code to be consistent with each other
but both wrong. Where it does work, though, it works
brilliantly: the imp’s memory allocator was written test-first
– and, as it’s a well-defined and well-understood problem,
writing the tests was relatively easy.
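To give a flavour of what test-first looks like for a problem like
that, here’s a simplified, invented sketch (not the imp’s actual
allocator or its real tests): the tests pin down the externally visible
contract before the real implementation exists, with a throwaway
placeholder standing in until it is written.

#include <cassert>
#include <cstdlib>
#include <cstring>

// The contract the tests are written against (hypothetical names).
void* pool_alloc(std::size_t bytes);
void  pool_free(void* p);

// Throwaway placeholder so the tests build; in test-first style, the real
// allocator is written afterwards to make (and keep) these tests passing.
void* pool_alloc(std::size_t bytes) { return std::malloc(bytes); }
void  pool_free(void* p)            { std::free(p); }

int main() {
    // Allocations succeed and are usable.
    char* a = static_cast<char*>(pool_alloc(16));
    char* b = static_cast<char*>(pool_alloc(16));
    assert(a != 0 && b != 0);

    // Distinct allocations don't overlap.
    std::memset(a, 0xAA, 16);
    std::memset(b, 0x55, 16);
    assert(static_cast<unsigned char>(a[0]) == 0xAA);
    assert(static_cast<unsigned char>(b[0]) == 0x55);

    // Memory can still be allocated after a free.
    pool_free(a);
    char* c = static_cast<char*>(pool_alloc(16));
    assert(c != 0);

    pool_free(b);
    pool_free(c);
    return 0;
}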
Test-driven development is as much about improving the quality of
the tests as it is of the code. If you know (or think you
know) that some particular functionality will be needed in the
implementation, but the tests you’ve written so far
haven’t exercised it yet, it does make you stop and think:
how do I write a test for that – how do I get the
system into that state? Either it’s forcing you to come up with
a test you didn’t realise you needed – or, even better,
it’s telling you that your expected elaboration wasn’t
needed at all.
All told, I don’t think it’s a huge exaggeration to say, “If you
don’t have tests, you don’t really have the code” – because without
the tests, you can’t do anything with it: the path to being
able to refactor it, is nearly as long as the path to writing it again
from scratch.
System tests
Unit tests are good at ensuring that individual modules fulfill
their contracts. But the task of writing quality software is, sadly,
still not done at that point: assembling these contracts together to
form an overall system is still a human process, fertile with
opportunities for human error. Unit testing reduces the risk of
inter-module bugs, but cannot eliminate it altogether.
So at some point you’re going to have to test larger parts of the
system. Michael
Feathers’ rules of thumb as to what is a unit test and what isn’t,
are widely quoted: the key distinction being, a unit test should be
basically instantaneous and invisible, at least when
successful. System tests are a bit more heavyweight, and might be more
involved to set up: for instance, target hardware might be needed.
In theory, system testing proceeds via structural induction: prove
that component A works, then prove that A+B works,
then A+B+C, and so on until you’ve tested the whole software
system. In practice, it usually suffices to test major components, and
then the system as a whole: for instance, the imp hardware, then the
imp server, then end-to-end tests of the whole enchilada.
Just as unit tests can be viewed as documentation for the code,
system tests can be viewed as high-level specifications of the entire
system’s behaviour: “when an imp contacts the server, it’s sent the
appropriate Squirrel for the device it’s plugged into”.
Static analysis
The big idea with static analysis (and, for that matter, dynamic
analysis), is this: for any property that your code must have in order
to be correct, it’s more reliable to have an automated
check that your code has that property, than it is to rely on every
programmer to manually keep it in mind the whole time. For
instance, Javascript:
The Good Parts – a book conspicuously much thinner than
“Javascript: The Definitive Guide” – is really about
avoiding language features that cause misleading or buggy code. Most
of the traps the author describes, can be detected by the JSLint or JSHint static-analysis tools;
running one of those tools before committing, can thus keep
questionable or hard-to-maintain constructs out of the codebase.
In the wild-west days of C, similar coding traps were caught by a
program called lint – though when C++ came along, it designed out
most of the need for lint, and incorporated much of the rest into the
compiler. Some of it is in the compiler only as optional warnings or
errors – so we always run GCC in a mode with lots of warnings and
errors enabled:

g++ -Wall -Wextra -Werror -Wundef -Wno-unused-parameter
-Woverloaded-virtual -Wlogical-op -fstrict-enums -Wno-long-long
-Wpointer-arith -Wnon-virtual-dtor -Wno-sign-conversion
-Wunused-but-set-variable

(On projects that also use other
compilers – MSVC for example – it’s a good idea to turn on all the
warnings in all of them, as every compiler detects different
problems. But here at EI, all our targets are GCC-based.) We’re happy
to keep looking, but we’ve yet to find a lint-like tool for C++ that
finds real bugs.
Other languages have their own static-analysis tools: Erlang has Dialyzer, and
Xcode has the Clang static analyzer for Objective-C. Though more are always welcome:
for instance, code written against node.js makes widespread use of
callbacks as part of asynchronous APIs. So it’s a correctness
constraint that any routine that takes a parameter called cb,
must call it exactly once (not twice, not no times) on every
path through the function. That should, in theory, be amenable to
automated testing (at least for most easy cases).
Dynamic analysis
Dynamic analysis tools also tend to be language-specific. In C++,
the use of Valgrind is well-known;
Helgrind, bundled with Valgrind, can be awkward to use but can also be
invaluable; and mutrace and
perf
are helpful at that stage of project maturity when the question
becomes, “yes, yes, it works, but why isn’t it
faster?” (That’s usually a pretty late stage of maturity, on
the timeless basis that it’s easier to optimise correct code, than it
is to correct optimised code.)
The one act of dynamic analysis that all types of code can enjoy,
though, is code-coverage analysis. Code-coverage analysis is
the ugly cousin of test-driven development, but is good enough for a
rainy weekend in Norfolk. Developing while keeping an eye on
the coverage statistics is almost, but not quite, as reliable as
test-driven development as a method of ensuring that your code has
good-quality tests.
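For a GCC-based codebase like ours, the minimal recipe (a generic
sketch, not our actual build setup; the filenames are hypothetical) is
to build with coverage instrumentation, run the tests, and then ask
gcov for the annotated source:

g++ --coverage -O0 -g widget.cpp widget_test.cpp -o widget_test
./widget_test
gcov widget.cpp

The resulting widget.cpp.gcov marks every line with its execution
count, which is the thing to keep an eye on while developing.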
Continuous integration
We use the very splendid Jenkins continuous-integration
server (autobuilder). Its main responsibility is running, after each
commit, all the tests and analyses that would be too time-consuming
for developers to run before each commit. Or too awkward – Jenkins
runs all the system tests that require special hardware, and also
runs builds on all supported host platforms.
The quality bar for pushes to master, is clean runs on all of these
test suites; any failures are stop-the-line emergencies. If a build or
test is failing, the very next push must be the fix, or other
developers can’t continue pushing (because they can’t know whether
their own work passes that test or not). This is usually known as “do
not commit on red” – although with Git, it’s actually the “push”
operation, not the “commit”, that’s the relevant one. Either way, a
failing build means nobody else can make progress, which is why fixing
it is treated as an emergency.
Using Pivotal
We use Pivotal Labs’ Pivotal Tracker to keep
track of work items. As Electric Imp is spread out over several
time-zones, it’s important to have an online tool that we can use to
look at the status of projects or tasks without having to bother
actual human beings. We looked at a few online Agile tools, and the
simplicity and easy visibility of Pivotal made it clearly the best
choice.
For those who haven’t seen it, it arranges tasks into (basically)
three columns: “current”, “backlog”, and “icebox”. “Current” is for
the current sprint, “backlog” is for upcoming sprints, and “icebox”
for everything else. Tasks can be moved around and re-ordered by
drag-and-drop. A task is either a user-story, a “chore” (which we use
for refactoring or technical-debt tasks that aren’t end-user-visible),
or a bug.
We use this system to work on tasks in priority order, as prioritised by
the product
owner:
- Anyone can add new stories, chores, or bugs to a “new tasks here”
section in the icebox.
- The product owner has a think about the new task (if it’s a story
or a bug), and prioritises it either into the backlog, or into a “nice
to haves” section in the icebox.
- If it’s a chore, the developers prioritise it between themselves
(typically, just before a story which would be aided by the cleanup,
or which would need re-doing if done before the cleanup).
- As developers finish up previous tasks, they pick the topmost one
in the backlog each time. (We don’t really deal in “sprints” like
canonical Agile – it’s more task-by-task, like kanban.)
Using Git
Unlike, say, Subversion, Git makes having a neat project history
achievable. So we try to achieve it.
Bow-shaped
feature branches let us view individual features either as a whole or
as a series of patches. That’s good for cherry-picking,
it’s good for (ahem) reverting, and it’s good for rereading old
commits to understand the origins of a piece of code (or of a bug). In
general we name single-developer feature branches with the developer’s
initials plus a very short indicator of the feature, such as
pdh-discovery or rs-schema. (The developer’s
initials serve as a kind of “watch out, I might rebase this branch at
any moment”.)
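In practice that shape typically comes from rebasing the feature branch
onto the tip of master and then merging it back with an explicit merge
commit – roughly this, using one of the branch names mentioned above:

git checkout -b pdh-discovery master
# ...a series of small, reviewable commits...
git rebase master
git checkout master
git merge --no-ff pdh-discovery

The --no-ff merge keeps the individual patches visible on the branch
while also recording a single commit that represents the feature as a
whole, which is what makes the cherry-picking, reverting and rereading
mentioned above straightforward.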
The server and the client have different
needs from their respective release processes, mainly because of
“in-the-drawer” syndrome: previous client releases remain important in
perpetuity, because a user could stick one in a drawer, forget about
it, remember a year later, and then try and connect it to modern-day
servers. The same doesn’t apply the other way around: previous server
releases are, to a large extent, yesterday’s newspapers.
Ensure it all happens
We’re the engineering department; we’ve all been hired to do
engineering. It’s pretty likely that we’re the last line of
defence for things being well-engineered and done right: very few
customers or project-managers have ever been heard to say, “Yes, yes,
but can’t you take a bit longer and do it properly?” – indeed, many
have been heard to say the exact opposite.
So for instance, adding automated tests is always an integral part
of the development process – whether it’s done strictly
before coding the functionality, as the test-driven school would say,
or in a more intermingled way. It’s not an optional
extra. Developers incorporate the building of tests into the original
development estimates, and product owners don’t
(shouldn’t) accept a story as “done” unless the
testing is in-place too.
Being the engineering department, we don’t necessarily get to
decide the balance between engineering goals and commercial goals (in
fact, we probably don’t want to do that). But it does
behove us to make sure that the people who do make those decisions,
know what the engineering consequences will be.