This post is an elaboration of a discussion held at Electric Imp’s
headquarters in October 2013, where we ran through a collection of
software engineering best-practices that various team members have
found useful (whether actually at EI or at previous jobs) for
producing robust pieces of software. None of these are rocket science,
and none are all that innovative, but having them all written down in
one place helps everyone know what to expect – especially new
team-members such as contractors.
Coding standards
We have coding-style guides for the main languages we use: Erlang,
Javascript and C++. It’s a specific goal that we shouldn’t
be able to tell, just by looking at the style of a section of code,
which of us wrote it. We should probably have style guides for our
other languages too: Squirrel, Objective-C, Java/Dalvik.
Each should resemble the accepted standards (or at least
an accepted standard) for the language in question – as
opposed to resembling each other. Our Javascript should look
like idiomatic Javascript, not like idiomatic C++.
Of course, sometimes this isn’t possible. Both in Javascript and in
C++, our code relies on and interacts with a large number of
third-party components – which in many cases are themselves written
in different styles.
Code reviews
All production code gets reviewed. Where convenient (developers in
the same office), it gets reviewed before git push, but
sometimes that’s not convenient (remote developers), so
review-after-push is the best we can do. But where the change is large
enough to have been done on a feature branch, we can get the best of
both worlds by pushing that branch to the central server and doing the
review before merge to master.
One obvious benefit of code reviews, is that a second pair of eyes
checks over the code for things the original developer didn’t
spot. (If done thoughtfully, it can be a great learning tool for more
junior members of the team.)
But that isn’t actually the main benefit. The best thing
about code reviews, is that they spread out knowledge about the
codebase among all the team members, helping to avoid the situation
where individual developers have their own “fiefdoms”, or code that
only they know about. It’s the idea of the “Smallest Bus Queue
Accident”, the slightly ghoulish concept of the number of team-members who
would need to simultaneously be hit by a bus for the project to be imperilled. If
everyone’s an irreplaceable specialist,
SBQA=1, and even just one accident (or poaching, or mid-life
crisis) will leave you in trouble. If everyone is familiar with
everything (at least enough to maintain it), SBQA=n,
the team size, and long-term risk to the project is much lower.
The film “Jurassic Park” is, at some level, a
commentary on just how badly things can go wrong in a system with an
inadequate (absent) culture of code review.
Unit tests
All commits that change production code, should also contain a
unit-test for the change. All the unit-tests are run in every build of
the codebase (top-level scons invocation). The primary and
most obvious reason for this, is that we’ve all written code that
flat-out didn’t do what we intended, and having our assumptions
checked automatically before git push keeps those
misunderstandings out of the master sources. But it turns out that
there are also other benefits of cultivating a habit of
unit-testing.
For one thing, a healthy collection of unit-tests can act as
documentation for the code under test, showing its expected behaviour
and suggesting which parts of its behaviour are intended and
essential, and which are accidental and possibly unwanted. Unlike many
other forms of documentation (including, uh, blog posts like this one),
documentation-by-unit-test is guaranteed to be up-to-date, as the
build breaks any time it doesn’t match the implementation.
Another benefit of unit-testing is that it acts as a
forcing-function for modularity: if it’s hard to see how to test your
code one part at a time, because you can’t get in between the parts,
then that’s a warning sign that the parts are more tightly-coupled than
is healthy for maintainability.
There is, though, a trade-off there, a judgement call to be
made. In C++ there’s a cost associated with virtual functions (which
are needed for mocking and for test doubles in general), and there’s
also a human cost associated with the added complexity of introducing
test doubles for every single class (even if they’re systematically
named and located, that’s still doubling the number of classes in any
library). But in general the design of the system should
offer enough modularity to provide an adequate sprinkling of seams
for introducing test doubles, which means that
extracting an interface solely so that it can be mocked-out, is
frowned upon. In Javascript, the performance cost part doesn’t apply
(all functions are, in C++ terms, “virtual”, and so
all dependencies can be mocked), but it may still be worth
choosing carefully how large the “unit” in each unit-test is, with the
overarching goal of retaining readability of the codebase as a
whole. That is to say, the “unit” might most easily be more than one
class, or more than one file.
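To make the “seam” idea concrete, here’s a minimal sketch (the class
names are invented for illustration, not taken from our codebase): the
unit under test depends on a small abstract interface, so the test can
substitute a deterministic fake without touching the production
implementation.

#include <cassert>

// The seam: a tiny interface the unit under test depends on.
class Clock {
public:
    virtual ~Clock() {}
    virtual long nowSeconds() const = 0;  // virtual so a test double can override it
};

// Unit under test: knows nothing about where the time comes from.
class SessionTimer {
public:
    explicit SessionTimer(const Clock& clock)
        : m_clock(clock), m_start(clock.nowSeconds()) {}
    bool expired(long timeoutSeconds) const {
        return m_clock.nowSeconds() - m_start >= timeoutSeconds;
    }
private:
    const Clock& m_clock;
    long m_start;
};

// Test double: a clock the test can wind forward deterministically.
class FakeClock : public Clock {
public:
    FakeClock() : m_now(0) {}
    long nowSeconds() const { return m_now; }
    void advance(long seconds) { m_now += seconds; }
private:
    long m_now;
};

int main() {
    FakeClock clock;
    SessionTimer timer(clock);
    assert(!timer.expired(60));   // no time has passed yet
    clock.advance(61);
    assert(timer.expired(60));    // 61 seconds later, the 60-second timeout has expired
    return 0;
}

The cost is exactly the one described above: one extra interface and
one extra class, which is why we prefer to pay it only where the design
already wants a seam.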
Judgement is also needed when writing the tests themselves. It’s
all too easy to write tests that are either too precise and thus too
fragile against correct changes to the code, or too vague and thus
capable of missing incorrect changes. One of our developers was
unlucky enough to encounter a bogus test in the very first change he
made to our C++ codebase: adding some debug logging of the imp’s
current list of servers. A unit-test started failing because
getServer() was being called “too many” times. But pretty
clearly, it wasn’t part of the external contract of the class how many
times it called getServer() – the test was too precise. The
mantra should be: test the functionality of the unit, not the
implementation. If I want to know whether the implementation
is the same as it was yesterday, I’ll use diff – I want the
tests to tell me whether it’s fulfilling its external contract the
same way it was yesterday, following any changes to its
implementation. Practice makes perfect, here: the more tests you write,
the better you get at judging these things.
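As a sketch of the distinction (the function and test below are
invented for illustration, not the actual code involved), assert on the
observable output of the unit, not on how many times it consulted its
collaborators:

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Unit under test: formats a server list for a log line.
std::string describeServers(const std::vector<std::string>& servers) {
    std::string result;
    for (std::size_t i = 0; i < servers.size(); ++i) {
        if (i) result += ", ";
        result += servers[i];
    }
    return result;
}

int main() {
    std::vector<std::string> servers;
    servers.push_back("a.example.com");
    servers.push_back("b.example.com");

    // Good: tests the external contract (what the caller actually sees).
    assert(describeServers(servers) == "a.example.com, b.example.com");

    // Too precise (don't do this): a test that instead counted how many times
    // the implementation consulted its data source would start failing the
    // moment someone legitimately added a debug-logging pass.
    return 0;
}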
There’s a school of thought that says that tests are so
important that you should write them first, before the implementation:
test-driven development. In our experience that works well for some
tasks – self-contained, purely algorithmic ones – and less
well for others. For instance, when writing driver code, the largest
source of bugs is an incorrect mental model of how the hardware works:
but as the same developer writes the tests then the code, it’s
too easy for the tests and the code to be consistent with each other
but both wrong. Where it does work, though, it works
brilliantly: the imp’s memory allocator was written test-first
– and, as it’s a well-defined and well-understood problem,
writing the tests was relatively easy.
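To give a flavour of what test-first looks like for a problem like
that, here’s a simplified, invented sketch (not the imp’s actual
allocator or its real tests): the tests pin down the externally visible
contract before the real implementation exists, with a throwaway
placeholder standing in until it is written.

#include <cassert>
#include <cstdlib>
#include <cstring>

// The contract the tests are written against (hypothetical names).
void* pool_alloc(std::size_t bytes);
void  pool_free(void* p);

// Throwaway placeholder so the tests build; in test-first style, the real
// allocator is written afterwards to make (and keep) these tests passing.
void* pool_alloc(std::size_t bytes) { return std::malloc(bytes); }
void  pool_free(void* p)            { std::free(p); }

int main() {
    // Allocations succeed and are usable.
    char* a = static_cast<char*>(pool_alloc(16));
    char* b = static_cast<char*>(pool_alloc(16));
    assert(a != 0 && b != 0);

    // Distinct allocations don't overlap.
    std::memset(a, 0xAA, 16);
    std::memset(b, 0x55, 16);
    assert(static_cast<unsigned char>(a[0]) == 0xAA);
    assert(static_cast<unsigned char>(b[0]) == 0x55);

    // Memory can still be allocated after a free.
    pool_free(a);
    char* c = static_cast<char*>(pool_alloc(16));
    assert(c != 0);

    pool_free(b);
    pool_free(c);
    return 0;
}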
Test-driven development is as much about improving the quality of
the tests as it is of the code. If you know (or think you
know) that some particular functionality will be needed in the
implementation, but the tests you’ve written so far
haven’t exercised it yet, it does make you stop and think:
how do I write a test for that – how do I get the
system into that state? Either it’s forcing you to come up with
a test you didn’t realise you needed – or, even better,
it’s telling you that your expected elaboration wasn’t
needed at all.
All told, I don’t think it’s a huge exaggeration to say, “If you
don’t have tests, you don’t really have the code” – because without
the tests, you can’t do anything with it: the path to being
able to refactor it, is nearly as long as the path to writing it again
from scratch.
System tests
Unit tests are good at ensuring that individual modules fulfill
their contracts. But the task of writing quality software is, sadly,
still not done at that point: assembling these contracts together to
form an overall system is still a human process, fertile with
opportunities for human error. Unit testing reduces the risk of
inter-module bugs, but cannot eliminate it altogether.
So at some point you’re going to have to test larger parts of the
system. Michael
Feathers’ rules of thumb as to what is a unit test and what isn’t,
are widely quoted: the key distinction being, a unit test should be
basically instantaneous and invisible, at least when
successful. System tests are a bit more heavyweight, and might be more
involved to set up: for instance, target hardware might be needed.
In theory, system testing proceeds via structural induction: prove
that component A works, then prove that A+B works,
then A+B+C, and so on until you’ve tested the whole software
system. In practice, it usually suffices to test major components, and
then the system as a whole: for instance, the imp hardware, then the
imp server, then end-to-end tests of the whole enchilada.
Just as unit tests can be viewed as documentation for the code,
system tests can be viewed as high-level specifications of the entire
system’s behaviour: “when an imp contacts the server, it’s sent the
appropriate Squirrel for the device it’s plugged into”.
Static analysis
The big idea with static analysis (and, for that matter, dynamic
analysis), is this: for any property that your code must have in order
to be correct, it’s more reliable to have an automated
check that your code has that property, than it is to rely on every
programmer to manually keep it in mind the whole time. For
instance, Javascript:
The Good Parts – a book conspicuously much thinner than
“Javascript: The Definitive Guide” – is really about
avoiding language features that cause misleading or buggy code. Most
of the traps the author describes, can be detected by the JSLint or JSHint static-analysis tools;
running one of those tools before committing, can thus keep
questionable or hard-to-maintain constructs out of the codebase.
In the wild-west days of C, similar coding traps were caught by a
program called lint – though when C++ came along, it designed out
most of the need for lint, and incorporated much of the rest into the
compiler. Some of it is in the compiler only as optional warnings or
errors – so we always run GCC in a mode with lots of warnings and
errors enabled:

g++ -Wall -Wextra -Werror -Wundef -Wno-unused-parameter
-Woverloaded-virtual -Wlogical-op -fstrict-enums -Wno-long-long
-Wpointer-arith -Wnon-virtual-dtor -Wno-sign-conversion
-Wunused-but-set-variable

(On projects that also use other
compilers – MSVC for example – it’s a good idea to turn on all the
warnings in all of them, as every compiler detects different
problems. But here at EI, all our targets are GCC-based.) We’re happy
to keep looking, but we’ve yet to find a lint-like tool for C++ that
finds real bugs.
Other languages have their own static-analysis tools: Erlang has Dialyzer, and
Xcode has the Clang static analyzer for Objective-C. Though more are always welcome:
for instance, code written against node.js makes widespread use of
callbacks as part of asynchronous APIs. So it’s a correctness
constraint that any routine that takes a parameter called cb,
must call it exactly once (not twice, not no times) on every
path through the function. That should, in theory, be amenable to
automated testing (at least for most easy cases).
Dynamic analysis
Dynamic analysis tools also tend to be language-specific. In C++,
the use of Valgrind is well-known;
Helgrind, bundled with Valgrind, can be awkward to use but can also be
invaluable; and mutrace and
perf
are helpful at that stage of project maturity when the question
becomes, “yes, yes, it works, but why isn’t it
faster?” (That’s usually a pretty late stage of maturity, on
the timeless basis that it’s easier to optimise correct code, than it
is to correct optimised code.)
The one act of dynamic analysis that all types of code can enjoy,
though, is code-coverage analysis. Code-coverage analysis is
the ugly cousin of test-driven development, but is good enough for a
rainy weekend in Norfolk. Developing while keeping an eye on
the coverage statistics is almost, but not quite, as reliable as
test-driven development as a method of ensuring that your code has
good-quality tests.
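For a GCC-based codebase like ours, the minimal recipe (a generic
sketch, not our actual build setup; the filenames are hypothetical) is
to build with coverage instrumentation, run the tests, and then ask
gcov for the annotated source:

g++ --coverage -O0 -g widget.cpp widget_test.cpp -o widget_test
./widget_test
gcov widget.cpp

The resulting widget.cpp.gcov marks every line with its execution
count, which is the thing to keep an eye on while developing.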
Continuous integration
We use the very splendid Jenkins continuous-integration
server (autobuilder). Its main responsibility is running, after each
commit, all the tests and analyses that would be too time-consuming
for developers to run before each commit. Or too awkward – Jenkins
runs all the system tests that require special hardware, and also
runs builds on all supported host platforms.
The quality bar for pushes to master, is clean runs on all of these
test suites; any failures are stop-the-line emergencies. If a build or
test is failing, the very next push must be the fix, or other
developers can’t continue pushing (because they can’t know whether
their own work passes that test or not). This is usually known as “do
not commit on red” – although with Git, it’s actually the “push”
operation, not the “commit”, that’s the relevant one. Either way, a
failing build means nobody else can make progress, which is why fixing
it is treated as an emergency.
Using Pivotal
We use Pivotal Labs’ Pivotal Tracker to keep
track of work items. As Electric Imp is spread out over several
time-zones, it’s important to have an online tool that we can use to
look at the status of projects or tasks without having to bother
actual human beings. We looked at a few online Agile tools, and the
simplicity and easy visibility of Pivotal made it clearly the best
choice.
For those who haven’t seen it, it arranges tasks into (basically)
three columns: “current”, “backlog”, and “icebox”. “Current” is for
the current sprint, “backlog” is for upcoming sprints, and “icebox”
for everything else. Tasks can be moved around and re-ordered by
drag-and-drop. A task is either a user-story, a “chore” (which we use
for refactoring or technical-debt tasks that aren’t end-user-visible),
or a bug.
We use this system to work on tasks in priority order, as prioritised by
the product
owner:
- Anyone can add new stories, chores, or bugs to a “new tasks here”
section in the icebox.
- The product owner has a think about the new task (if it’s a story
or a bug), and prioritises it either into the backlog, or into a “nice
to haves” section in the icebox.
- If it’s a chore, the developers prioritise it between themselves
(typically, just before a story which would be aided by the cleanup,
or which would need re-doing if done before the cleanup).
- As developers finish up previous tasks, they pick the topmost one
in the backlog each time. (We don’t really deal in “sprints” like
canonical Agile – it’s more task-by-task, like kanban.)
Using Git
Unlike, say, Subversion, Git makes having a neat project history
achievable. So we try to achieve it.
Bow-shaped
feature branches let us view individual features either as a whole or
as a series of patches. That’s good for cherry-picking,
it’s good for (ahem) reverting, and it’s good for rereading old
commits to understand the origins of a piece of code (or of a bug). In
general we name single-developer feature branches with the developer’s
initials plus a very short indicator of the feature, such as
pdh-discovery or rs-schema. (The developer’s
initials serve as a kind of “watch out, I might rebase this branch at
any moment”.)
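In practice that shape typically comes from rebasing the feature branch
onto the tip of master and then merging it back with an explicit merge
commit – roughly this, using one of the branch names mentioned above:

git checkout -b pdh-discovery master
# ...a series of small, reviewable commits...
git rebase master
git checkout master
git merge --no-ff pdh-discovery

The --no-ff merge keeps the individual patches visible on the branch
while also recording a single commit that represents the feature as a
whole, which is what makes the cherry-picking, reverting and rereading
mentioned above straightforward.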
The server and the client have different
needs from their respective release processes, mainly because of
“in-the-drawer” syndrome: previous client releases remain important in
perpetuity, because a user could stick one in a drawer, forget about
it, remember a year later, and then try and connect it to modern-day
servers. The same doesn’t apply the other way around: previous server
releases are, to a large extent, yesterday’s newspapers.
Ensure it all happens
We’re the engineering department; we’ve all been hired to do
engineering. It’s pretty likely that we’re the last line of
defence for things being well-engineered and done right: very few
customers or project-managers have ever been heard to say, “Yes, yes,
but can’t you take a bit longer and do it properly?” – indeed, many
have been heard to say the exact opposite.
So for instance, adding automated tests is always an integral part
of the development process – whether it’s done strictly
before coding the functionality, as the test-driven school would say,
or in a more intermingled way. It’s not an optional
extra. Developers incorporate the building of tests into the original
development estimates, and product owners don’t
(shouldn’t) accept a story as “done” unless the
testing is in-place too.
Being the engineering department, we don’t necessarily get to
decide the balance between engineering goals and commercial goals (in
fact, we probably don’t want to do that). But it does
behove us to make sure that the people who do make those decisions,
know what the engineering consequences will be.