Not in fact any relation to the famous large Greek meal of the same name.

Saturday, 12 April 2014

Is everything in this release pushed upstream?

Engineering is repeatability. If what you’re doing isn’t repeatable, it isn’t engineering, and may in fact just be performance art instead. (Not that there’s anything wrong with performance art per se, but if you bought a tin marked “Engineering”, and instead it contained performance art, you’d take it back to the shop.)

Part of being able to repeat something is knowing what it is that you’re trying to repeat. Or, now that we’re moving past the “fanciful opening paragraph” phase of this blog post and into specifics: part of making a software release is knowing exactly what source code it was built from, so that if there’s ever a problem with the release, the exactly-corresponding source code can be analysed to discover the cause. (Sometimes tools can make this hard; sometimes they also have an option that makes it easier again.)

In the days of CVS and Subversion, when there was a single repository for a project, repeatability was typically achieved by using the tagging mechanism of those version-control systems: anyone could later check out that same tag and obtain code identical to the release. (In the days before CVS and Subversion, we probably just wrote a code snapshot to a CD or a floppy and kept it somewhere safe. We were cowboys once and young.)

But now we live in the days of Git. It’s possible to use Git in a purely peer-to-peer fashion, every developer being their own island – but that could easily cause unrepeatability of releases, releases as performance art. To make releases as acts of engineering, you’re going to have to nominate one repository as the golden or master one, the one where the tags and SHAs of releases live.

The Linux kernel project, for which Git was originally designed, does this by having a release manager (Linus Torvalds himself) whose personal (but centrally-hosted and shared) repository is the golden one, and who makes releases from code he’s pulled from other developers’ repositories as they complete features.

There’s another way of doing it, though: one which is more familiar to developers who have used CVS or Subversion-era tools. And that is to have a central golden repository for the project, which everyone has write access to, and to use git push as if it were svn commit. But it may be inconvenient to build the releases on the machine hosting the Git repository, so release managers will still do that on their own machines in their own local Git repositories.

So, especially if the release manager is also a developer, the engineering question becomes: is this local commit, from which I’m building the release, also in the golden repository? Is my SHA upstream? (Or am I just pleased to see you?) This is an important question because, if you use bow-shaped branches – or any other rebase workflow – then the SHA of your commit may well be different by the time it has gone upstream.

What’s really needed is a short and pithy Git command that looks a bit like “git is-releasable HEAD” – but there isn’t one, so in order to answer the question “Is it OK to build a release from this commit?”, you need to build that facility from several Git commands.

First off, clearly it’s not OK to build the release if you have uncommitted changes:

git status # refresh the index, ready for diff-index
git diff-index --quiet HEAD || echo Nope

Now you need to check whether your SHA exists upstream:

test "`git branch -r --contains HEAD`" != "" || echo Nope

That works by listing all the branches in the golden repository which contain your current commit. If there’s at least one such branch, it’s OK to build the release – otherwise, no branch in the golden repository contains your commit, so it isn’t currently repeatable, so it isn’t OK.

Some of this functionality is offered by git describe --dirty, but apart from being less fun than it sounds, it doesn’t answer the question about the golden repository. (Perhaps because, in its original Linux home, releases are always made from the golden repository.)
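
Putting those pieces together, a sketch of the missing command might look like the following – my own wrapper, not a real Git subcommand, though since Git will run any executable named git-something that it finds on the PATH, installing it as git-is-releasable really would make “git is-releasable” work:

#!/bin/sh
# git-is-releasable: is it OK to build a release from HEAD?
git status > /dev/null                 # refresh the index, ready for diff-index
if ! git diff-index --quiet HEAD; then
    echo "Nope: uncommitted changes" >&2; exit 1
fi
if [ -z "`git branch -r --contains HEAD`" ]; then
    echo "Nope: HEAD is not in any upstream branch" >&2; exit 1
fi
echo "OK to release"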

At Electric Imp, the version information that we incorporate into every build of the software includes not just the Git SHA but also the “OK to release” information and a corresponding Git tag name (if any). The “OK to release” state is marked by a “+” in the version number when it’s not OK, i.e. when this must be an internal build only. (You can’t always force people to do the Right Thing, but you can at least make sure that they’re told about it when they’re doing the Wrong Thing.) Here’s the relevant part of our top-level SConstruct file:

import datetime
import subprocess

gitRevision = subprocess.Popen(["git", "rev-parse", "--short", "HEAD"],
                               stdout=subprocess.PIPE
                               ).communicate()[0].strip()
subprocess.check_output(["git", "status"]) # Refresh index ready for diff-index
if subprocess.call(["git", "diff-index", "--quiet", "HEAD"]):
    # Local diffs, give it a plus
    gitRevision += "+"
    print "Local diffs -- NOT FOR PRODUCTION USE"
else:
    gitRemote = subprocess.Popen(["git", "branch", "-r", "--contains", "HEAD"],
                                 stdout=subprocess.PIPE
                                 ).communicate()[0].strip()
    if gitRemote == "":
        # Doesn't exist in upstream branches (i.e., not pushed): give it a plus
        gitRevision += "+"
        print "Ahead of upstream -- NOT FOR PRODUCTION USE"

gitTag = ''

if "+" not in gitRevision:
    gitTagPipe = subprocess.Popen(["git", "describe", "--exact-match"],
                                  stdout=subprocess.PIPE)
    rawtag = gitTagPipe.communicate()
    if gitTagPipe.returncode:
        print "Not tagged -- NOT FOR EXTERNAL RELEASE"
    else:
        gitTag = " - " + rawtag[0].strip()

ei_version = gitRevision + gitTag + " - " + datetime.datetime.now().strftime("%c")

So a build from a known and stable SHA (i.e., one which exists in the golden repository) gets a version looking like this:

92a5ff6 - Fri Feb 7 18:25:04 2014

whereas a local build with a bunch of stuff I haven’t pushed upstream yet gets a version looking like this:

c47c6da+ - Sat Apr 12 20:51:25 2014

with the tell-tale “+” in the SHA. And a version that’s tagged in Git for external release has the tag added to the version string too:

af0f28a - release-27.10 - Fri Dec 13 11:08:38 2013

and that is, in fact, the current format of the string returned on the imp itself by imp.getsoftwareversion() – though we make deliberately very weak guarantees about the format of the string (it’s human-readable, it’s different for different releases) in case it ever needs to change in the future.

Wednesday, 25 September 2013

A Git workflow at EI

Electric Imp have used Git from the very beginning of the company, and in that time we’ve evolved what I at least reckon is a useful way of using it: a useful workflow.

It’s ended up similar to, but not quite the same as, Vincent Driessen’s “Gitflow” model, and this blog post purposely uses similar diagrams, terminology, and colour-coding to that one, to make comparisons easier (though hopefully it also stands alone, for those who haven’t read it).

The big picture

There’s a single central Git repository, origin, from which all releases are made and in which all tags reside. Because Git is “decentralised”, each developer has one or more local repositories too.

This diagram, like Vincent Driessen’s original, is drawn with oldest at the top, newest at the bottom, which is the opposite of the convention used by gitk.

Quick summary of differences from “Gitflow”

  • The yellow (main, integration) branch is, for historical reasons, called master;
  • The blue (deployment) branch is called production;
  • Bug-fixes are cherry-picked out from yellow to release branches wherever possible, rather than being merged from release branches back to yellow;
  • Pink feature branches (those done by single individuals, at least) are done as bow-shaped merges;
  • Because of the bow-shaped merges, yellow is never merged out to feature branches: if a feature branch needs some new stuff that’s landed on yellow, it gets rebased on top of yellow;
  • Because we do two kinds of releases from the same codebase – server deployments which are lightweight and rapid, and client firmware upgrades which are more heavyweight and intrusive – there are two kinds of green release branch which are treated slightly differently. (But the server deployment one works much like the “Gitflow” equivalent.)
These are mostly fairly minor differences. (But notice how there are very few non-rebased merges.)

The two long-lived branches and their relationship

The integration branch (“master”, yellow) and the deployment branch (“production”, blue) are the only branches that continue to get new commits indefinitely.

All new work happens on master: work that consists of only one or two commits goes straight in (as in the one-commit story below), and work that’s more involved than that lands via the merging of a feature branch, of which more below.

A one-commit story:

$ git checkout master
$ git pull
hack ... hack ... hack
test ... test ... test
$ git commit
$ git pull --rebase
test ... test ... test
$ git push

The Jenkins continuous-integration server runs whenever new commits are made to master: it builds the whole codebase for all relevant platforms, runs all the unit tests, runs all the integration tests, and finally runs some system-tests on a test farm of real hardware. The quality bar for pushes to master is clean runs on all of these test suites; any failures are stop-the-line emergencies. If a build or test is failing, the very next push must be the fix, or other developers can’t continue pushing (because they can’t know whether their own work passes that test or not). This is usually known as “do not commit on red” – although with Git, it’s actually the “push”, not the “commit”, operation that’s the relevant one.

The production branch, meanwhile, only ever receives code that has already passed that quality bar on master. This achieves the goal that “There is a known production branch, so you don’t have to think. If you checkout the equivalent of production, it’s either exactly what’s currently in production or it’s what’s about to be in production.”

Also, “The production branch is known-good. It is never a mistake to push the production branch to production servers, ever.” This eases communication with the Operations team. New work is never done directly onto production: it arrives there due to either merges or cherry-picks from master (possibly via an intermediate release or hot-fix branch).

Feature branches

Feature branches are usually short-lived, and indeed usually exist as named branches only in developers’ local repositories. (With Git, if you merge a branch locally into master and then push the result, the branching structure is pushed to origin and becomes part of permanent history, but the branch name isn’t pushed, and doesn’t appear in the origin repository except perhaps in the commit comment of the merge.)

Feature branches are usually named with the developer’s initials and a brief hint to the branch’s purpose: for instance, pdh-regexp was my branch for implementing a regular-expressions feature.

Starting a feature branch:

$ git checkout master
$ git pull
$ git checkout -b pdh-modbus
hack ... hack ... hack
test ... test ... test
There are two exceptions to the above description, both (hopefully) pretty rare: the first is that, if a feature branch is getting so big or so long-lived that it could do with living on the origin server too, purely as a backup strategy, then its developer can push it to origin. Prefixing the name with the initials, though, makes clear that it’s a private branch, in the sense that it is likely to get forcibly rebased by that developer, so caveat emptor.

The second exception is when several developers are working on the same feature. This is also probably relatively rare (Kanban and Agile encourage single-developer, or single-pair, working), but it doesn’t fit the same model, because a branch that gets commits from two different sources can’t be rebased without messing up the other developers. So in that situation, you’d keep the feature branch on origin; the co-operating developers pull it using git pull --rebase and push it using git push. Once the feature is reviewed, QA’d, and delivered, the collaborative feature branch can be merged to master. This is the only situation in which a non-rebased branch gets merged to master. (“Gitflow” also suggests the use of developer-to-developer, not developer-to-origin, Git pulls and pushes for managing this case, but that sounds to me like a recipe for confusion, plus it’s hard to do with a rebase workflow.)

Once a feature on a branch is complete (and reviewed, and tested), the feature branch can be merged back to master. This is done by rebasing the feature branch on top of master, then doing a no-fast-forward (--no-ff) merge; the thinking behind that style of merge, and full information and walk-throughs of how to perform one, can be found at Bow-shaped branches: a Git workflow.
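
In command form – a sketch, with pdh-modbus standing in for whatever the feature branch is called – the delivery goes:

$ git checkout master
$ git pull --rebase
$ git checkout pdh-modbus
$ git rebase master
test ... test ... test
$ git checkout master
$ git merge --no-ff pdh-modbus
$ git push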

Because, in order to do a bow-shaped merge, every feature branch eventually gets rebased on top of master, there shouldn’t be any merges from master out to a feature branch. If the feature branch needs some functionality that only landed on master after the feature branch started, it should be rebased on top of master instead. Indeed, it’s good practice to rebase all your feature branches on top of master fairly regularly, as it eases and subdivides the final rebasing process that happens before the delivery merge.

Notice that with the bow-shaped merge construction, although there can be several current unmerged feature branches at any time – mostly in developers’ local repositories – the merging process serialises them completely (by always rebasing before pushing), so that Git permanent history never contains overlapping or nested ones. This makes it easier to find problems using git bisect.
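
For reference, a bisect session over such a serialised history is pleasantly mechanical – the following is only a sketch, with release-26 standing in for any known-good commit or tag:

$ git bisect start
$ git bisect bad HEAD
$ git bisect good release-26
build ... test ... then git bisect good or git bisect bad, as appropriate
...and repeat until Git names the first bad commit, then:
$ git bisect reset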

Two different release patterns

Electric Imp has a single repository from which all parts of the system are built: this eases system testing, and the addition of system-wide features, but it does mean that two different types or cadences of “release” happen from the same codebase.

Server releases are deployed to our cloud service. As is best-practice in the server software culture, this is (close to) continuous deployment. Releases are made really quite often, sometimes several times per day – so often, in fact, that it’s pointless even to tag or number them (we’d be in the hundreds). This is achievable because it’s relatively easy for automated testing to cover the entire gamut of server functionality, because upgrades themselves and reverts or hot-fixes are so straightforward as to be virtually push-button, and because (assuming the revert script works as-tested) the impact of a “bad” release is relatively minor. The pace of server releases demands a lightweight release process.

None of those considerations apply to client firmware releases: covering the gamut of firmware functionality can require custom hardware, upgrades get downloaded over the Internet and programmed into flash memory (which is a bit disruptive and can be time-consuming) – and, in theory at least, a “bad” release could be quite awkward to recover from (requiring careful actions by individual end-users). So firmware releases are performed with considerably more caution: the QA, beta-test, and qualification process for a newly-made release branch typically takes a number of weeks. This is (by our standards at least) a heavyweight release process.

Another important difference is that the end-user can at any time get bored of the device, put it away in a drawer for an arbitrary length of time, then rekindle their interest, retrieve the device, and try to use it. This means that the current server release must work with all previous client releases (at least enough for them to upgrade themselves), a criterion fortunately not present in the reverse direction. This concern makes it worth our while keeping the total number of client releases down (and getting cross when “beta” or “test” releases go out without being tagged).

The heavyweight release process

The heavyweight release process, which we use for firmware releases, is based mainly on an abundance of caution.

Once the required collection of new functionality has landed on master, a new release branch is made. This is named after the first release that’s expected to be made from the branch: every release is numbered, with (for instance) releases 25, 25.1, and 25.2 all coming from the release-25-dev branch.
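
Making the branch itself is nothing special – a sketch, using the naming from above:

$ git checkout master
$ git pull
$ git checkout -b release-25-dev
$ git push -u origin release-25-dev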

Once the branch is made, it is subjected to the unblinking eye of QA – even a culture of good unit-tests, integration tests, and system tests does not rule out the need for exploratory testing before release.

For major new functionality there may even be a closed beta process, where end-users hand-picked for both eagerness and cluefulness get given tagged beta releases from the branch to supplement our internal testing.

Once a release branch is made, the only subsequent changes are bug fixes. If and when issues are found on a release branch, we adopt GCC’s rule that fixes must (wherever possible) be made on master first and then cherry-picked out to the release branch. This is what ensures that the fix will also end up in subsequent releases: unlike in “Gitflow”, the release branch is not merged back to master.

And if (horrors!) an issue should crop up in a release after it’s tagged and rolled out, it again gets fixed on master first and cherry-picked out to the release branch. A point release then gets tagged and rolled out: release-27.1, say.
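
As commands, the fix-then-cherry-pick dance looks something like this (the SHA and version numbers are illustrative):

$ git checkout master
hack ... test ... commit ... push
$ git checkout release-27-dev
$ git cherry-pick 1234abc
$ git tag release-27.1
$ git push origin release-27-dev release-27.1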

Only if master has moved on so much in the meantime that the fix for master doesn’t apply on the branch would fixing take place directly on the release branch.

The lightweight release process

The lightweight release process, which we use for server releases, is based on responding with alacrity to new requirements or to current events – for instance, unexpected load on the servers might require new logging or instrumentation to be added basically immediately.

Releases are made so often that they don’t even get names (and nobody would remember or use them if they did). So to indicate the current state of the production servers, a deployment branch is used. (This is the same as the “blue branch” of “Gitflow”, except that we call it production rather than master.) It’s also the case that, because when we upgrade the server everyone gets it straightaway, previous versions are dead and gone: they don’t hang around in the way that previous firmware releases do. To a much larger extent than with firmware, at any given time only the most recent release matters at all.

As for updating production: if major replumbing or massive new functionality has landed in the server code, it might sometimes be useful to use the heavyweight process – except, with the success event being merging out to production rather than tagging and releasing. More often, though, the necessary alacrity is achieved by a reduced process: picking a suitable version of master, testing it (perhaps by deploying it to staging servers), applying fixes directly to master where necessary, and then simply merging out to production and pushing.
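
In commands, the reduced process is roughly this (a sketch, assuming the tested version is the current tip of master):

$ git checkout production
$ git pull
$ git merge master
$ git push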

Hot-fixes, small patches to the production code done for emergency situations, can be written on master, cherry-picked locally into production, passed by code-reviewers and/or QA, and then pushed to origin/production. (In “Gitflow”, hot-fixes are landed via a short-lived hot-fix branch. That would be useful where a hot-fix itself consists of a series of commits, not just one – but that seems like it would rarely actually happen.)
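
A hot-fix, then, goes something like this (again, the SHA is illustrative):

$ git checkout master
hack ... test ... commit ... push
$ git checkout production
$ git cherry-pick 5678def
review ... QA ...
$ git push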

Scaling out to enormous development organisations

All of the above assumes that the development organisation is small enough to operate as a single team. Above a certain size, this starts to become awkward: even the rare bad commits on master start to happen too often, and the (lock-free but not wait-free) bow-shaped merge process starts to become a bottle-neck.

In this situation, all you can do is introduce more process (and hope that the increase in developer numbers offsets the decrease in per-developer productivity – an outcome far from guaranteed). What you end up doing is dividing into teams and running the heavyweight release process – but, instead of releasing directly, releasing to an internal “meta-integration” branch where the “best available” versions of each team’s work are combined, to then face further automated and manual testing before actual release.

Really enormous organisations would end up with meta-meta-integration branches, or worse. Releases become great tides that ripple through the organisation, to be taken at the flood or omitted as necessary: the magic phrase to Google for to read more about Agile-in-the-large seems to be “release train”...

Saturday, 24 August 2013

Bow-shaped branches: using vendor branches in Git

LATER EDIT: the method described here turned out not to scale very well for large vendored libraries with lots of changes. If your Broadcom (or similar) merges are killing you, you should probably be using a rebasing vendor flow instead.

Following on from the one about bow-shaped branches, one of the remaining ways that Git history can get untidy, despite following those precepts, is when you’ve got a long-running “vendor branch” for some third-party code. The usual pattern is to have a branch containing successive deliveries of the third-party code; when a new delivery needs to be integrated, it’s checked in unaltered on that branch, and then the branch is merged back to master. This means that the branch effectively has a diff that corresponds only to the vendor’s changes in the new release (relative to the previous release), so that Git’s merge infrastructure helps you integrate those changes with any changes you’ve made yourself to the third-party code.

Which is all to the good, but presents a problem for those striving for tidy Git history. Especially in the case of widespread changes or ugly merges, integrating the new delivery could end up being as much work as a new feature in its own right – which means it really ought to take place on a feature branch. But if you try and do that feature branch as a bow-shaped branch, starting off with a commit that’s the merge from the vendor branch, then the bow-shaped workflow (and indeed the simple rebase workflow) won’t work out of the box: every time you attempt to rebase the feature branch, Git tries to rebase the entire vendor branch, which is hopelessly not what you want.

The obscure Git feature which in this case turns out to be exactly what’s wanted, as if designed for it, is git rerere. Git can be instructed to remember merge-conflict resolutions, and replay them automatically if the precise same conflict is encountered again. This feature isn’t enabled by default, no doubt because it’d confuse people about how their conflicts somehow disappeared, but it’s just what’s needed for rebasing a feature branch that includes a merge from a vendor branch.
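
Incidentally, rerere’s memory can be inspected while a conflicted merge is in progress, which is handy for checking that it’s doing what you expect:

git rerere status
git rerere diff

The former lists the paths whose conflicts rerere is currently tracking; the latter shows the current state of their resolution as a diff.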

Let’s start by having a look at the desired result. On the right is a screenshot from gitk, which shows the newest commits at the top. It shows that the long-lived vendor branch “broadcom” has been merged following a new delivery of the third-party code, and that the work of integrating the new release itself happened on a bow-shaped branch. No superfluous merge commits are visible.

Here’s how to make history that looks like that. First of all I need to turn on the rerere feature:

git config --global rerere.enabled true
Then import the new delivery. This part is much like the process of using vendor branches in any other source-control system:
  1. checkout the vendor branch
  2. remove the old code entirely
  3. untar/unzip the new delivery
  4. rename if necessary (usually, at least, removing version numbers from directory names – the delivery depicted here untarred into a directory called WICED-SDK-2.3.1, but we wanted it called thirdparty/broadcom)
  5. but don’t touch the source in any other way
  6. add the changes, additions, and removals to Git (git add -A)
  7. commit everything to the vendor branch with a commit message along the lines of “import Broadcom Wiced 2.3.1”
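
In commands, those seven steps might look like this – a sketch using the paths from the example; the tarball’s location is illustrative:

git checkout broadcom
git rm -r thirdparty/broadcom
tar xzf ~/Downloads/WICED-SDK-2.3.1.tar.gz
mv WICED-SDK-2.3.1 thirdparty/broadcom
git add -A
git commit -m "import Broadcom Wiced 2.3.1"
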
But then it’s time to merge those changes back to master. The steps are as follows:
  1. create a merge branch, here called “pdh-broadcom” (this is the branch that will eventually look bow-shaped)
    git checkout master
    git checkout -b pdh-broadcom
  2. merge the vendor branch
    git merge broadcom
    1. if there were merge conflicts, fix them up; don’t do any other edits at this stage
    2. once the merge conflicts are fixed, commit (to the merge branch)
      git commit
  3. compile and test the result
  4. commit any compile or test fixes to the merge branch
  5. test some more, hold code reviews, commit more fixes, etc., until you’re satisfied that it’s ready to merge to master

At this stage, history looks like the picture on the right: a bit like the “ready to merge” stage of a normal bow-shaped branch, except of course with the long tail of the vendor branch heading off into the distant past.

So let’s get hold of the current state of master:

git checkout master
git pull --rebase

Aha, see on the right, someone’s beaten us to the punch again. We need to rebase our work on top of theirs. But here’s the point at which a bit of caution needs to be applied. Blithely using git rebase as if the vendor branch weren’t there will instruct git to rebase the entire vendor branch on top of master, which will typically be somewhere between undesirable and complete carnage.

There’s a -p (--preserve-merges) option to git rebase which will stop it doing that, but there is still the issue that re-doing our merge branch will involve re-doing the merge itself – which will involve re-resolving all the conflicts that we got in stage 2 above.

And that is where git rerere comes in. Because we enabled rerere above, each time we committed the fix to a merge conflict, git remembered the conflict and our resolution of the conflict. And when we re-encounter the same conflict again, as a result of rebasing, git re-resolves it for us:

git checkout pdh-broadcom
git rebase -p master
Resolved 'thirdparty/broadcom/Wiced/WWD/internal/wwd_wifi.c' using previous resolution.
Automatic merge failed; fix conflicts and then commit the result.
Error redoing merge 7a485d1d

In fact, Git is being unduly alarmist here. The merge did technically fail, but rerere has in fact already solved all the issues automatically. So we just need to convince Git of that, noting which files or directories it thought were a problem:

git add thirdparty/broadcom/Wiced/WWD/internal
git commit

So we’ve now got a fully rebased merge branch, ready to merge back to master – as at right. The merge itself is now just like a normal bow-shaped branch:

git checkout master
git merge --no-ff pdh-broadcom

So at last (as at left) we’ve got where we need to be: it’s a merge from a vendor branch, but it looks like a feature branch.

The more I use Git, the more I feel the truth of these two rules of thumb about it: there is always a way to do X, and there is always a use for command Y. Blogging about Git is really an exercise in providing the Y corresponding to a certain X, which is not always obvious.

Sunday, 2 June 2013

Bow-shaped branches: a Git workflow

Distributed version control systems, and Git in particular, have for all practical purposes wiped out CVS, Subversion and similar systems. But Git is new enough that there’s still discussion to be had on the best ways of using it: the best Git “workflows”. One goal of a good workflow is that Git history – the graph of all previous revisions – remains readable and comprehensible. Or, put conversely: one symptom of a bad workflow is a convoluted history that looks like a Tube map, or worse.

One way in which git history can end up looking like a tangled mess is if the history graph includes lots of merges caused by doing git pull when upstream has also been updated. Those merges don’t really add that much information – only that two things happened simultaneously, which is a perfectly normal occurrence in teams of more than one person. So most development teams, at least once they’ve been using Git for a while, decide that merges caused by git pull cause unnecessary tangles, and git pull --rebase gets mandated instead.

But what if development occurs on “feature branches” which are always getting merged? This, too, can be dealt with using rebasing: by rebasing the branch before merging it. But a pure rebase workflow serialises everything into a straight line, losing the information that these commits, in some sense, belong together.

So what’s really needed is a hybrid of the rebasing and merging styles: something that keeps the feature commits together, but also keeps history looking neat (and, in case of disaster, allows easy reverting of the whole feature). What’s needed is bow-shaped feature branches.

Let’s look at the desired end-result first. On the right is a snapshot from gitk --all, showing a feature branch that I’ve just merged, ready to push. As always with gitk, the newest commits are shown at the top. Two different routes join the same two endpoints, spanning the addition of my new feature: one direct, and another taking many small steps, so that the overall effect is a bow shape. This means that, though each individual commit is readily accessible, so too (by comparing the top and bottom of the bow) is the overall effect of the whole feature. Notice how the commits are all bunched together, but with a limited, local amount of branching: a run of N such branches consecutively would still, in some sense, have O(1) complexity, not O(N). (At least, it would have O(1) concurrently-active development branches, which is believably a useful measure of the complexity, and the comprehensibility burden, of history.)

Someone looking back through history who’s only interested in the large-scale changes can thus move back via the “express train” lines and not the “stopping train” (or “local train” in US parlance) that goes via every commit. But they still wouldn’t have to follow many different branch lines, as they would in a pure merge workflow, because each bow-shape happens on top of the last, not in parallel with it.

So how to build history graphs that look like that? On the left is gitk’s view of my local repository after doing some work on a feature: in this case a feature made up of seven commits. (Commit early, commit often; but stay bisectable.) If I just did a git pull followed by git push, I’d get a merge with commits on both sides; if I did the same but pulling with --rebase, I’d get a straight-line history with my commits at the top.

In order to get the bow shape, though, I’m going to need to retrospectively turn my commits into a branch. Fortunately, that’s straightforward in Git. First, I create a new branch, named after my initials and my feature (the branch itself won’t be pushed upstream, but its name appears in the merge message, so it ought to be fairly descriptive):

git checkout -b pdh-regexp

At this point my repository looks like the picture on the right.

Then I need to wind back my local master branch to match origin/master:

git checkout master
git reset --hard origin/master

Now my repository looks like the picture on the left: a feature-branch ready to merge. (If at this point you can't see your feature branch, make sure you used the --all option to gitk.) This is the point at which to do any smartening-up of the branch, using interactive rebase or similar, that is needed. It’s also the point at which to do a code review with one of your colleagues – a practice that’s strongly recommended, for various reasons.

[Edited 2014-Apr-05→] Now if by this stage master has moved on – in other words, if it’s not still pointing to the base of my feature-branch as shown in the pictures – then I’ll first need to rebase my branch on top of master, so that it looks like the gitk pictures:

git checkout master
git pull --rebase
git checkout pdh-regexp
git rebase master
git checkout master

And now I’m once more at the ready to merge stage, as seen in gitk.

At this point a normal git merge would do a fast-forward merge, returning me to the picture above on the right – so what’s needed instead is an explicitly no-fast-forwards merge (which can never produce conflicts, because nothing else has happened to the top of master). First I’d better just check that I’m ready to merge:

git checkout master
git branch --contains HEAD

which will print a list of branches, including at least master itself, that contain the commit at the top of master. If my branch isn’t in the list, then I wasn’t, in fact, ready to merge – which means that I need to rebase on top of master again. If, though, my branch is in the list, then it’s time to do the bow-shaped merge itself:

git merge --no-ff pdh-regexp

And now I’m where I need to be, like the picture above right. [←end edits]

So it’s time to push:

git push
To peter@git.electricimp.com
! [rejected] master -> master (non-fast-forward)
error: failed to push some refs to 'peter@git.electricimp.com/ei.git'
To prevent you from losing history, non-fast-forward updates were rejected
Merge the remote changes (e.g. 'git pull') before pushing again. See the
'Note about fast-forwards' section of 'git push --help' for details.

Ahh, someone else has pushed a different feature while I was doing all that. So I need to go round the loop again.

git reset --hard origin/master

This has the effect of undoing the (trivial) merge commit, and the repository ends up like the picture on the left.

Now I can fetch the new commits to the top of master:

git pull --rebase
git checkout pdh-regexp
git rebase master
git checkout master

Of course, git rebase master might produce conflicts, which should be cleared up in the usual way. (And if the conflicts are too hard to clear up, maybe a real merge is called for, so that history records that the project had a hard time here.) But if all goes smoothly, I end up with the picture on the right: a feature branch ready to merge, as above – the only difference being that the branch is now on top of a newer commit as the head of origin/master. So I can carry on from that point.
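
For the record, “the usual way” is to edit each conflicted file to resolve it, then (the path being whatever git rebase reported):

git add path/to/conflicted/file
git rebase --continue

And if it all gets too hairy, git rebase --abort puts everything back as it was, ready for that real merge instead.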

Of course, sometimes a feature is too small to merit all of this ... mechanism. If something goes in as just one or two commits, then it’s probably not complex enough for its presence in history to be controversial: for those, just use the pure rebase workflow. But once a feature has grown to three or four commits, it’s worthwhile making it as clear as possible, to those who follow us, what’s gone on. As for an upper limit: I’ve seen bow shapes with 20–30 commits which still looked fairly sane. Any feature which takes more commits than that could probably do with being subdivided anyway: not least because the rebasing is probably becoming a major pain at that scale. Again, perhaps a real merge commit is called for, and again this helps signal to future historians that here is a point at which perhaps the project was having a bit of a hard time.

Wednesday, 4 January 2012

Giving An FTDI Serial Port A Persistent Device Node

There are plenty of perfectly good reasons why one might have several FTDI USB-to-serial adaptors attached to the same PC at the same time. Anyone who does, though, will notice that they’re a royal pain, because every time you unplug and replug one, it gets a different /dev/ttyUSBn device node.

Fortunately, there’s a way to use udev to set up unchanging aliases for those ever-changing device nodes. This is done by setting up a udev rule for each FTDI adaptor that you care about, that picks it out by serial number and gives it a custom symlink.

Some of the details are here, but just to sew it all together, here’s what you need to do.

  • First attach your FTDI adaptor, making sure all others are unplugged.

  • ls /dev/ttyUSB*
    — only one should be listed.
  • udevadm info --name=/dev/ttyUSB0 --attribute-walk
    — (using the ttyUSB number listed in the previous step) should dump a huge list of attributes
  • udevadm info --name=/dev/ttyUSB0 --attribute-walk | grep serial
    — will list only a few lines, such as the following —
    SUBSYSTEMS=="usb-serial"
    ATTRS{serial}=="XR00U1BU"
    ATTRS{serial}=="000000000000"
    ATTRS{serial}=="0000:00:1d.7"
  • The first ATTRS line is the one we’re interested in. That’s the USB serial number of the FTDI adaptor. (The subsequent lines are the USB serial number of an intermediate hub, and the PCI address of the USB controller.)
  • Now you need to go and find the udev rules directory: in a standard udev install, that’s /etc/udev/rules.d. In that directory, make a file called 99-local.rules (you’ll need to be root), and add a line like the following (and it must be all on one line; this blog might show it split) —
    KERNEL=="ttyUSB?", ATTRS{serial}=="XR00U1BU", SYMLINK+="ftdiDUINO", MODE="0666"
  • (An earlier version of this page suggested the name 10-local.rules, which with modern versions of udev is checked too early in the process; using 99-local.rules makes it work with any version of udev.)
  • Obviously the ATTRS clause must match the one your adaptor listed above. The SYMLINK clause can be anything (that doesn’t clash with any other device), so name it after the thing it’s plugged into at the other end.
  • Unplug and replug your FTDI adaptor. Whatever ttyUSB number it gets, the symlink /dev/ftdiDUINO will be pointing to it.
  • Oh yes, that udev rule also makes the device node world-writable. So you won’t need to be root to issue
    miniterm.py /dev/ftdiDUINO 115200
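
One extra note: if the symlink stubbornly fails to appear even after replugging, some versions of udev may need telling about the new rules file explicitly, with

sudo udevadm control --reload-rules

before the unplug-and-replug will take effect.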

Saturday, 9 April 2011

The Failure Mode Of Agile Development

Here’s the “traditional” waterfall model of software development:

+------------+------------+------------+
| Design     | Implement  | Test       |
+------------+------------+------------+
^ Start                                ^ Deadline

And here’s its almost inevitable failure mode: design takes too long, or implementation takes too long, but the deadline doesn’t move, so test is cut short and the product ships full of bugs:

+------------>>>+------------>>>+------+.....+
| Design        | Implement     | Test |
+------------>>>+------------>>>+------+.....+
^ Start                                ^ Deadline

Software development organisations have been wearily familiar with this outcome, whether in their own code or other people’s, since at least Brooks (1975). Both the cause and the effect of schedule crunch are widely-known and well-understood: often the effect is so well-known that it can be specifically detected by reviewers and end-users.

Here, by contrast, is the fashionable new agile model:

+------------+------------+------------+
| Red        | Green      | Refactor   |
+------------+------------+------------+
^ Start                                ^ Deadline

In this case, “red” means you write the unit-tests first, but of course they fail (red light), because you haven’t written the code yet. Then you write the code, so they pass (“green” light). But so far what you’ve done is “the simplest thing that could possibly work” – in other words, you’ve been deliberately closely-focussed, overtly short-termist, to get your tests to pass, and a refactoring stage is needed to reintegrate your new functionality with the bigger picture.

Agile, of course, involves many of these cycles between project start and project deadline, not just one. (Indeed, some say that each cycle should be as small as a small tomato. I find that the going rate is two small tomatoes to one cup of tea.) So the agile diagram isn’t drawn to quite the same scale as the waterfall one: still, though, developers acquire the same sense of schedule crunch, they skip on to the next task too soon, and the corresponding failure mode occurs:

+------------>>>+------------>>>+------+.....+
| Red           | Green         | Refac|
+------------>>>+------------>>>+------+.....+
^ Start                                ^ Deadline

The cause is the same, but what’s the effect? The code was complete, and – if not perfect – then at least unit-tested, at the end of the green phase. So the product as actually shipped works, which is more than can be said for its waterfallen equivalent.

All that’s missing, in fact, is some of the refactoring effort. Unfortunately, that’s the only place in agile development where any large-scale design work gets done: the design debt that an agile shop takes out by not doing Big Design Up-Front is paid off only in these refactoring instalments. This means that in fact the effect of schedule crunch on agile projects is that the system ends up under-designed and directionless at any other than the lowest level.

And unlike the paper-bag obviousness of the waterfall model’s failure mode, the agile model’s failure mode is subtle and pernicious. Product 1.0 ships and works – because agile development “releases” every sprint, and thus is perfectly fitted for triaging features against time. But the system is a ball of mud. Feature development on Product 1.5 and Product 2.0 takes longer than expected – which agile development also helps to hide, given its stubborn reluctance to countenance long-term planning – because developers eventually spend all their time battling, not intrinsic problems of the target domain, but incidental problems caused by previous instances of themselves.

Only the most obsessed agiliste would claim that agile development doesn’t have a failure mode. But because agile development is new, its failure mode is unfamiliar to us; and because that failure mode is less visibly catastrophic than Brooks’s, it’s easier to overlook. It is, however, real; its very subtlety requires us to pay particular care to look out for it, and to get right on top of fixing it once we see it start to happen.

Conversely, the fact that there are problems that agile development can’t solve, isn’t a fatal blow. Inevitably such problems are the most visible ones – because all the problems which agile development does solve easily, come and go without anyone really noticing. And the failure mode of agile development – the system’s complexity spiralling out of control – can be fixed without doing too much damage to the theory.

And how to fix it? Schedule in some serious refactoring, one subsystem at a time. In his paper Checklist for Planning Software System Production, RW Bemer, writing in August 1966 (August 1966!) says:

Is periodic recoding recommended when a routine has lost a clean structural form?

Nearly 45 years later, that’s still effectively the best available advice. Whether you call it refactoring or periodic recoding, it of course takes advantage of all the unit tests that the ball of mud already contains. This time round, it also takes advantage of knowledge about how all the parts of the subsystem operate. That knowledge is unlikely to be acquired in a single two-week sprint, so unless you can put someone on the task who already knows the subsystem inside-out (and most of the time you’re in this state, there won’t even be such a person), you’ll find yourselves breaking some of the rules of agile development by formally or otherwise block-booking someone’s time for a larger period. (Agile development aims at having any team member able to take on any task in any sprint – but for that to be okay, there mustn’t be any software problems complex enough to require more than two weeks’ thought. Some software problems just are that big, and context-switching can break the developers’ train of thought.)

This is the answer to the sometimes-asked question, “If, in agile development, everyone does design [as often as once per small tomato], what’s the rôle of the architect?”. Agile development is, in a way, Christopher Alexander’s observation that most things can be made piecemeal. But simplicity cannot be made piecemeal. The contribution of the software architect is simplicity.

Saturday, 22 January 2011

Is “Factory Method” An Anti-Pattern?

Let’s take another look at this version of the “two implementations, one interface” code from that one about portability:

// Event.h v8
class Event {
public:
  virtual ~Event() {}
  virtual void Wake() = 0;
  ...
};

std::auto_ptr<Event> CreateEvent();

// Event.cpp
std::auto_ptr<Event> CreateEvent()
{
  ... return whichever derived class of Event is appropriate ...
}

What I didn’t say at the time is that this is sort-of the Factory Method pattern, though a strict following of that pattern would instead have us put the CreateEvent function inside the class as a static member, Event::Create(). And the pattern also includes designs where CreateEvent is a factory method on a different class from Event, but it‘s specifically “self-factory methods” such as Event::Create that I’m concerned with here.

(As an aside: the patterns literature comes in for a lot of criticism for being simplistic. Which it is: the GoF book could be a quarter the size if it engaged in less spoon-feeding and pedagoguery. (And by pedagoguery I mean pedagoguery.) But in a sense the simplicity and obviousness of the patterns they’re describing is the whole point: thanks to the patternistas (patternauts?), a lot of simple and obvious things that often come up in programmers’ natural discourse about code, and that didn’t previously have names, now do. Reading a patterns book might not make your code much better unless you’re a n00b, but that’s not what it’s for. It’s for making your discourse about code better. In any n>1 team, being able to discourse well about code is a vital skill.)

But what I also didn’t say at the time, is that whether CreateEvent is a member or not, so long as it’s in event.h, this code appears to have cyclic dependencies — to be a violation of the principle of “levelisability” set out in the Lakos book.

What’s going on, as you can see on the left, is that, although the source file dependencies themselves don’t exhibit any cycles, viewing, as Lakos does, each header and its corresponding .cpp file as forming a component — the three grey rectangles — produces a component dependency graph with cycles: win32/event ↔ event ↔ posix/event.

One way around that would be to move CreateEvent out into its own component — a freestanding event factory — as seen on the right. With this change, the design is fairly clearly levelisable at both the file level and the component level. This refactoring is an example of what Lakos (5.2) calls escalation: knowledge of the multifarious nature of events has been kicked upstairs to a higher-level class that, being higher-level, is allowed to know everything. (The file event.cpp now gets a question-mark because, as the implementation file for what may now be a completely abstract class, it might not exist at all — or it might exist and contain the unit tests.)

But is it worth it? We’ve arguably complicated the code — requiring users of events to know also about event factories — for what’s essentially the synthetic goal of levelisability: synthetic in the sense that it won’t be appearing in any user-story. Any subsequent programmer working on the code would be very tempted to fold the factory back into event.h under the banner of the Factory Method pattern.

Moreover, in this case the warning is basically a false positive: if Event is truly an abstract class (give or take its factory method), then the apparent coupling between, say, posix/event and event is not in fact present: posix/event can be unit-tested without linking against either event.o or win32/event.o. (Not that, in this particular example, posix/event and win32/event would both exist for the same target — but factory methods obviously also get used for cases where both potential concrete products exist in the same build.) Though conversely, if Event had any non-abstract methods — any with implementations in event.cpp — then it’d be a true positive not a false positive, as all the different event types would be undesirably link-time coupled together.

One reason that the refactoring is worth it, is the same sort of reason that fixing compiler warnings, i.e. altering the code so they don’t trigger, is worth it, even in instances when the warning doesn’t point out a bug: because if you let warnings proliferate, real ones will get lost in the noise, and ideally you aim for the zero-warnings state in order that the introduction of any new warning is an easily-visible alert telling you that there’s a new potential bug to check for. Steve Maguire is talking here about unit tests, but the same applies to compiler warnings: “[W]hen they corner a bug, they grab it by the antennae, drag it to the broadcast studio, and interrupt your regularly-scheduled program”.

Exactly like compiler warnings, cyclic-dependency warnings — which are really design warnings — are sometimes false positives, but likewise it’s worth aiming for the “zero design warnings” state, because it makes new design warnings stand out so. I ran a cycle-checker script (in the spirit of the ones in Lakos) over my own Chorale project, and the result was that it effectively shone a spotlight on all the parts of the code where the design was a bit icky. Every cycle it produced — one was dvb::Service ↔ dvb::Recording, another was all the parts of db::steam depending on each other in a big loop — was a place where I’d at some time thought, “Hmm, this isn’t quite right, but it works for now and I’ll come back and do it properly”. And of course without anything to remind me, I never had gone back and done it properly.

So it turns out that you can’t have both factory methods and levelisability. You have to pick one or the other. And levelisability is better.
