Not in fact any relation to the famous large Greek meal of the same name.

Wednesday 25 September 2013

A Git workflow at EI

Electric Imp have used Git from the very beginning of the company, and in that time we’ve evolved what I at least reckon is a useful way of using it, a useful workflow.

It’s ended up similar to, but not quite the same as, Vincent Driessen’s “Gitflow” model, and this blog post purposely uses similar diagrams, terminology, and colour-coding to that one, to make comparisons easier (though hopefully it also stands alone, for those who haven’t read it).

The big picture

There’s a single central Git repository, origin, from which all releases are made and in which all tags reside. Because Git is “decentralised”, each developer has one or more local repositories too.

This diagram, like Vincent Driessen’s original, is drawn with oldest at the top, newest at the bottom, which is the opposite of the convention used by gitk.

Quick summary of differences from “Gitflow”

  • The yellow (main, integration) branch is, for historical reasons, called master;
  • The blue (deployment) branch is called production;
  • Bug-fixes are cherry-picked out from yellow to release branches wherever possible, rather than being merged from release branches back to yellow;
  • Pink feature branches (those done by single individuals, at least) are done as bow-shaped merges;
  • Because of the bow-shaped merges, yellow is never merged out to feature branches: if a feature branch needs some new stuff that’s landed on yellow, it gets rebased on top of yellow;
  • Because we do two kinds of releases from the same codebase – server deployments which are lightweight and rapid, and client firmware upgrades which are more heavyweight and intrusive – there are two kinds of green release branch which are treated slightly differently. (But the server deployment one works much like the “Gitflow” equivalent.)
These are mostly fairly minor differences. (But notice how there are very few non-rebased merges.)

The two long-lived branches and their relationship

The integration branch (“master”, yellow) and the deployment branch (“production”, blue) are the only branches that continue to get new commits indefinitely.

All new work happens on master; work that consists of only one or two commits goes straight in, and work that’s more involved than that lands via the merging of a feature branch, of which more below.

A one-commit story:

$ git checkout master
$ git pull
hack ... hack ... hack
test ... test ... test
$ git commit
$ git pull --rebase
test ... test ... test
$ git push

The Jenkins continuous-integration server runs whenever new commits are made to master: it builds the whole codebase for all relevant platforms, runs all the unit tests, runs all the integration tests, and finally runs some system-tests on a test farm of real hardware. The quality bar for pushes to master is clean runs on all of these test suites; any failures are stop-the-line emergencies. If a build or test is failing, the very next push must be the fix, or other developers can’t continue pushing (because they can’t know whether their own work passes that test or not). This is usually known as “do not commit on red” – although with Git, it’s actually the “push”, not “commit”, operation that’s the relevant one.

This achieves the goal that “There is a known production branch, so you don’t have to think. If you checkout the equivalent of production, it’s either exactly what’s currently in production or it’s what’s about to be in production.”

Also, “The production branch is known-good. It is never a mistake to push the production branch to production servers, ever.” This eases communication with the Operations team. New work is never done directly onto production: it arrives there due to either merges or cherry-picks from master (possibly via an intermediate release or hot-fix branch).

Feature branches

Feature branches are usually short-lived, and indeed usually exist as named branches only in developers’ local repositories. (With Git, if you merge a branch locally into master and then push the result, the branching structure is pushed to origin and becomes part of permanent history, but the branch name isn’t pushed, and doesn’t appear in the origin repository except perhaps in the commit comment of the merge.)

Feature branches are usually named with the developer’s initials and a brief hint to the branch’s purpose: for instance, pdh-regexp was my branch for implementing a regular-expressions feature.

Starting a feature branch:

$ git checkout master
$ git pull
$ git checkout -b pdh-modbus
hack ... hack ... hack
test ... test ... test
There are two exceptions to the above description, both (hopefully) pretty rare: the first is that, if a feature branch is getting so big or so long-lived that it could do with living on the origin server too purely as a backup strategy, then its developer can push it to origin. Prefixing the name with the initials, though, makes clear that it’s a private branch, in the sense that it is likely to get forcibly rebased by that developer, so caveat emptor.

The second exception is when several developers are working on the same feature. This is also probably relatively rare (Kanban and Agile encourage single-developer, or single-pair, working), but it doesn’t fit the same model, because a branch that gets commits from two different sources can’t be rebased without messing up the other developers. So in that situation, you’d keep the feature branch on origin, and the co-operating developers would pull it using git pull --rebase and push it using git push. Once the feature is reviewed, QA’d, and delivered, the collaborative feature branch can be merged to master. This is the only situation in which a non-rebased branch gets merged to master. (“Gitflow” also suggests the use of developer-to-developer, not developer-to-origin, Git pulls and pushes for managing this case, but that sounds to me like a recipe for confusion, plus it’s hard to do with a rebase workflow.)
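The shared-branch case can be tried out in a throwaway sandbox. The following is only a sketch: the branch name modbus-shared, the directory names alice and bob, and the identity in the environment variables are all made up for illustration, and a local bare repository stands in for origin.

```shell
set -e
work=$(mktemp -d) && cd "$work"
# Hypothetical identity, set through the environment so it applies to every clone below.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.invalid"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.invalid"

# A local bare repository stands in for origin.
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

# First developer creates master and the shared feature branch, and pushes both.
git clone -q origin.git alice 2>/dev/null
cd alice
git symbolic-ref HEAD refs/heads/master
echo base > base.txt && git add base.txt && git commit -q -m "base"
git checkout -q -b modbus-shared
git push -q origin master modbus-shared
cd ..

# Second developer picks up the shared branch and commits to it.
git clone -q origin.git bob
cd bob
git checkout -q -b modbus-shared origin/modbus-shared
echo bob > bob.txt && git add bob.txt && git commit -q -m "bob: work"

# Meanwhile the first developer pushes more work to the same branch.
cd ../alice
echo alice > alice.txt && git add alice.txt && git commit -q -m "alice: work"
git push -q origin modbus-shared

# The second developer replays the local commit on top of the new work, then pushes.
cd ../bob
git pull -q --rebase origin modbus-shared
git push -q origin modbus-shared

merges=$(git rev-list --merges --count origin/modbus-shared)
commits=$(git rev-list --count origin/modbus-shared)
echo "shared branch: $commits commits, $merges merge commits"
```

Only each developer's unpushed commits are rebased, so nothing already on origin is rewritten and the shared branch stays linear.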

Once a feature on a branch is complete (and reviewed, and tested), the feature branch can be merged back to master. This is done by rebasing the feature branch on top of master, then doing a no-fast-forward (--no-ff) merge; the thinking behind that style of merge, and full information and walk-throughs of how to perform one, can be found at Bow-shaped branches: a Git workflow.
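As a minimal sketch of that rebase-then-merge sequence, the commands below build a toy repository in a temporary directory. The branch name pdh-demo, the file names, and the identity are assumptions made purely for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the initial branch master on any Git version
git config user.name "Example Dev"        # hypothetical identity, local to this sandbox
git config user.email "dev@example.invalid"

echo base > base.txt && git add base.txt && git commit -q -m "base"

# Feature work on a private branch.
git checkout -q -b pdh-demo
echo one > feature.txt && git add feature.txt && git commit -q -m "feature: part 1"
echo two >> feature.txt && git commit -q -am "feature: part 2"

# Meanwhile master moves on.
git checkout -q master
echo other > other.txt && git add other.txt && git commit -q -m "unrelated work"

# Rebase the feature on top of master, then merge without fast-forwarding.
git checkout -q pdh-demo
git rebase -q master
git checkout -q master
git merge -q --no-ff -m "Merge branch 'pdh-demo'" pdh-demo

# The tip of master is now a merge commit; count its parents.
parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
git log --graph --oneline
```

The graph printed at the end should show the two feature commits as a loop off the main line, closed by the merge commit.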

Because, in order to do a bow-shaped merge, every feature branch eventually gets rebased on top of master, there shouldn’t be any merges from master out to a feature branch. If the feature branch needs some functionality that only landed on master after the feature branch started, it should be rebased on top of master instead. Indeed, it’s good practice to rebase all your feature branches on top of master fairly regularly, as it eases and subdivides the final rebasing process that happens before the delivery merge.

Notice that with the bow-shaped merge construction, although there can be several current unmerged feature branches at any time – mostly in developers’ local repositories – the merging process serialises them completely (by always rebasing before pushing), so that Git permanent history never contains overlapping or nested ones. This makes it easier to find problems using git bisect.

Two different release patterns

Electric Imp has a single repository from which all parts of the system are built: this eases system testing, and the addition of system-wide features, but it does mean that two different types or cadences of “release” happen from the same codebase.

Server releases are deployed to our cloud service. As is best-practice in the server software culture, this is (close to) continuous deployment. Releases are made really quite often, sometimes several times per day – so often, in fact, that it’s pointless even to tag or number them (we’d be in the hundreds). This is achievable because it’s relatively easy for automated testing to cover the entire gamut of server functionality, because upgrades themselves and reverts or hot-fixes are so straightforward as to be virtually push-button, and because (assuming the revert script works as-tested) the impact of a “bad” release is relatively minor. The pace of server releases demands a lightweight release process.

None of those considerations apply to client firmware releases: covering the gamut of firmware functionality can require custom hardware, upgrades get downloaded over the Internet and programmed into flash memory (which is a bit disruptive and can be time-consuming) – and, in theory at least, a “bad” release could be quite awkward to recover from (requiring careful actions by individual end-users). So firmware releases are performed with considerably more caution: the QA, beta-test, and qualification process for a newly-made release branch typically takes a number of weeks. This is (by our standards at least) a heavyweight release process.

Another important difference is that the end-user can at any time get bored of the device, put it away in a drawer for an arbitrary length of time, then rekindle their interest, retrieve the device, and try to use it. This means that the current server release must work with all previous client releases (at least enough for them to upgrade themselves), a criterion fortunately not present in the reverse direction. This concern makes it worth our while keeping the total number of client releases down (and getting cross when “beta” or “test” releases go out without being tagged).

The heavyweight release process

The heavyweight release process, which we use for firmware releases, is based mainly on an abundance of caution.

Once the required collection of new functionality has landed on master, a new release branch is made. This is named after the first release that’s expected to be made from the branch: every release is numbered, with (for instance) releases 25, 25.1, and 25.2 all coming from the release-25-dev branch.

Once the branch is made, it is subjected to the unblinking eye of QA – even a culture of good unit-tests, integration tests, and system tests does not rule out the need for exploratory testing before release.

For major new functionality there may even be a closed beta process, where end-users hand-picked for both eagerness and cluefulness get given tagged beta releases from the branch to supplement our internal testing.

Once a release branch is made, the only subsequent changes are bug fixes. If and when issues are found on a release branch, we adopt GCC’s rule that fixes must (wherever possible) be made on master first and then cherry-picked out to the release branch. This is what ensures that the fix will also end up in subsequent releases: unlike in “Gitflow”, the release branch is not merged back to master.

And if (horrors!) an issue should crop up in a release after it is tagged and rolled out, it again gets fixed on master first and cherry-picked out to the release branch. A point release gets tagged and rolled out: release-27.1, say.

Only if master has moved on so much in the meantime that the fix for master doesn’t apply on the branch would fixing take place directly on the release branch.
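The fix-on-master-first rule can be sketched in a sandbox repository as follows. The file names and commit messages are invented, the tag names simply follow the release-N pattern mentioned above, and the -x flag (which records the original commit hash in the cherry-picked commit’s message) is a suggestion rather than something the process is known to require.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"        # hypothetical identity, local to this sandbox
git config user.email "dev@example.invalid"

echo v1 > app.txt && git add app.txt && git commit -q -m "feature complete for release 25"

# Cut the release branch from master and tag the first release from it.
git branch release-25-dev master
git tag release-25 release-25-dev

# master moves on with work destined for a later release...
echo new > next.txt && git add next.txt && git commit -q -m "new work for a later release"

# ...and a bug is fixed on master first.
echo v1-fixed > app.txt && git commit -q -am "fix: correct app behaviour"
fix=$(git rev-parse HEAD)

# Cherry-pick the fix out to the release branch and tag a point release.
git checkout -q release-25-dev
git cherry-pick -x "$fix" >/dev/null
git tag release-25.1

git log --oneline --decorate release-25-dev
```

The release branch ends up with the fix but none of the newer work, and because the fix landed on master first, later release branches get it automatically.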

The lightweight release process

The lightweight release process, which we use for server releases, is based on responding with alacrity to new requirements or to current events – for instance, unexpected load on the servers might require new logging or instrumentation to be added basically immediately.

Releases are made so often that they don’t even get names (and nobody would remember or use them if they did). So to indicate the current state of the production servers, a deployment branch is used. (This is the same as the “blue branch” of “Gitflow”, except that we call it production rather than master.) It’s also the case that, because when we upgrade the server everyone gets it straightaway, previous versions are dead and gone: they don’t hang around in the way that previous firmware releases do. To a much larger extent than with firmware, at any given time only the most recent release matters at all.

As for updating production: if major replumbing or massive new functionality has landed in the server code, it might sometimes be useful to use the heavyweight process – except, with the success event being merging out to production rather than tagging and releasing. More often, though, the necessary alacrity is achieved by a reduced process: picking a suitable version of master, testing it (perhaps by deploying it to staging servers), applying fixes directly to master where necessary, and then simply merging out to production and pushing.
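A sketch of the reduced process in a sandbox repository, with invented file contents and no real staging servers. The merge here happens to be a fast-forward because production has no commits of its own; whether real deployments record a merge commit instead is not something this sketch takes a position on.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"        # hypothetical identity, local to this sandbox
git config user.email "dev@example.invalid"

echo v1 > server.txt && git add server.txt && git commit -q -m "server v1"
git branch production                     # production starts at the deployed state

echo v2 > server.txt && git commit -q -am "server v2"
echo v3 > server.txt && git commit -q -am "server v3 (not yet tested)"

# Pick the version of master that was tested (here: one commit behind the tip).
candidate=$(git rev-parse master~1)

# Merge it out to production; pushing would follow in a real setup.
git checkout -q production
git merge -q "$candidate"
# git push origin production             # omitted: this sandbox has no remote

echo "production now serves: $(cat server.txt)"
```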

Hot-fixes, small patches to the production code done for emergency situations, can be written on master, cherry-picked locally into production, passed by code-reviewers and/or QA, and then pushed to origin/production. (In “Gitflow”, hot-fixes are landed via a short-lived hot-fix branch. That would be useful where a hot-fix itself consists of a series of commits, not just one – but that seems like it would rarely actually happen.)
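The single-commit hot-fix path might look like this in a sandbox repository. File names and messages are made up, and the review and QA steps are represented only by a comment.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"        # hypothetical identity, local to this sandbox
git config user.email "dev@example.invalid"

echo v1 > server.txt && echo "log: off" > logging.txt
git add server.txt logging.txt && git commit -q -m "deployed state"
git branch production

# master has moved on with work that isn't ready to deploy.
echo v2 > server.txt && git commit -q -am "new work, not ready"

# The hot-fix is written on master...
echo "log: on" > logging.txt && git commit -q -am "hot-fix: enable logging"
fix=$(git rev-parse HEAD)

# ...then cherry-picked locally into production.
git checkout -q production
git cherry-pick -x "$fix" >/dev/null
# (code review and/or QA would happen here, then: git push origin production)

echo "production: $(cat server.txt), $(cat logging.txt)"
```

Production gets the logging change without the unfinished v2 work.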

Scaling out to enormous development organisations

All of the above assumes that the development organisation is small enough to operate as a single team. Above a certain size, this starts to become awkward: even the rare bad commits on master start to happen too often, and the (lock-free but not wait-free) bow-shaped merge process starts to become a bottle-neck.

In this situation, all you can do is introduce more process (and hope that the increase in developer numbers offsets the decrease in per-developer productivity – an outcome far from guaranteed). What you end up doing is dividing into teams and running the heavyweight release process – but, instead of releasing directly, releasing to an internal “meta-integration” branch where the “best available” versions of each team’s work are combined, to then face further automated and manual testing before actual release.

Really enormous organisations would end up with meta-meta-integration branches, or worse. Releases become great tides that ripple through the organisation, to be taken at the flood or omitted as necessary: the magic phrase to Google for to read more about Agile-in-the-large seems to be “release train”...

Saturday 24 August 2013

Bow-shaped branches: using vendor branches in Git

LATER EDIT: the method described here turned out not to scale very well for large vendored libraries with lots of changes. If your Broadcom (or similar) merges are killing you, you should probably be using a rebasing vendor flow instead.

Following on from the one about bow-shaped branches, one of the remaining ways that Git history can get untidy, despite following those precepts, is when you’ve got a long-running “vendor branch” for some third-party code. The usual pattern is to have a branch containing successive deliveries of the third-party code; when a new delivery needs to be integrated, it’s checked-in unaltered on that branch, and then the branch is merged back to master. This means that the branch effectively has a diff that corresponds only to the vendor’s changes in the new release (relative to the previous release), so that Git’s merge infrastructure helps you integrate those changes with any changes you’ve made yourself to the third-party code.

Which is all to the good, but presents a problem for those striving for tidy Git history. Especially in the case of widespread changes or ugly merges, integrating the new delivery could end up being as much work as a new feature in its own right – which means it really ought to take place on a feature branch. But if you try and do that feature branch as a bow-shaped branch, starting off with a commit that’s the merge from the vendor branch, then the bow-shaped workflow (and indeed the simple rebase workflow) won’t work out of the box: every time you attempt to rebase the feature branch, Git tries to rebase the entire vendor branch, which is hopelessly not what you want.

The obscure Git feature which in this case turns out to be exactly what’s wanted, as if designed for it, is git rerere. Git can be instructed to remember merge-conflict resolutions, and replay them automatically if the precise same conflict is encountered again. This feature isn’t enabled by default, no doubt because it’d confuse people about how their conflicts somehow disappeared, but it’s just what’s needed for rebasing a feature branch that includes a merge from a vendor branch.
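To see rerere in isolation, the following sandbox sketch provokes a conflict, resolves it by hand, throws the merge away, and repeats it. The branch name vendor and the file contents are invented, and rerere is enabled only for the sandbox repository rather than globally.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"        # hypothetical identity, local to this sandbox
git config user.email "dev@example.invalid"
git config rerere.enabled true            # local to this sandbox repository

echo "original line" > file.txt && git add file.txt && git commit -q -m "base"
git checkout -q -b vendor
echo "vendor change" > file.txt && git commit -q -am "vendor delivery"
git checkout -q master
echo "local change" > file.txt && git commit -q -am "local edit"

# First merge: it conflicts, so resolve by hand and commit.
# rerere records the conflict, and then the resolution at commit time.
git merge vendor >/dev/null 2>&1 || true
echo "merged by hand" > file.txt
git add file.txt && git commit -q -m "merge vendor"

# Throw the merge away and redo it: the same conflict comes back...
git reset -q --hard HEAD~1
git merge vendor >/dev/null 2>&1 || true

# ...but rerere has already rewritten the file with the recorded resolution.
resolved=$(cat file.txt)
echo "file.txt after replayed merge: $resolved"

# The merge still has to be concluded by staging the file and committing.
git add file.txt && git commit -q -m "merge vendor (replayed)"
```

Git still reports the second merge as failed, but the working-tree file already holds the remembered resolution, so staging and committing is all that remains.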

Let’s start by having a look at the desired result. On the right is a screenshot from gitk, which shows the newest commits at the top. It shows that the long-lived vendor branch “broadcom” has been merged following a new delivery of the third-party code, and that the work of integrating the new release itself happened on a bow-shaped branch. No superfluous merge commits are visible.

Here’s how to make history that looks like that. First of all I need to turn on the rerere feature:

git config --global rerere.enabled true
Then import the new delivery. This part is much like the process of using vendor branches in any other source-control system:
  1. checkout the vendor branch
  2. remove the old code entirely
  3. untar/unzip the new delivery
  4. rename if necessary (usually, at least, removing version numbers from directory names – the delivery depicted here untarred into a directory called WICED-SDK-2.3.1, but we wanted it called thirdparty/broadcom)
  5. but don’t touch the source in any other way
  6. add the changes, additions, and removals to Git (git add -A)
  7. commit everything to the vendor branch with a commit message along the lines of “import Broadcom Wiced 2.3.1”
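The import steps above can be sketched as shell commands in a sandbox. Here a directory called SDK-2.0 stands in for the freshly untarred delivery; that name, the file contents, and the commit messages are all made up for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"        # hypothetical identity, local to this sandbox
git config user.email "dev@example.invalid"

# Starting point: the vendor branch holds the previous delivery.
mkdir -p thirdparty/broadcom
echo "v1" > thirdparty/broadcom/sdk.c
echo "gone in v2" > thirdparty/broadcom/old.c
git add -A && git commit -q -m "import vendor SDK 1.0"
git branch broadcom

# Stand-in for untarring the new delivery somewhere outside the repository.
delivery=$(mktemp -d)
mkdir "$delivery/SDK-2.0"
echo "v2" > "$delivery/SDK-2.0/sdk.c"
echo "new in v2" > "$delivery/SDK-2.0/new.c"

git checkout -q broadcom                         # 1. check out the vendor branch
rm -rf thirdparty/broadcom                       # 2. remove the old code entirely
cp -R "$delivery/SDK-2.0" thirdparty/broadcom    # 3, 4. unpack, dropping the version from the name
                                                 # 5. no other edits to the source
git add -A                                       # 6. stage changes, additions, and removals
git commit -q -m "import vendor SDK 2.0"         # 7. commit to the vendor branch

git show --stat --oneline HEAD
```

Removing the old tree before copying is what lets Git notice files the vendor deleted, such as old.c here.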
But then it’s time to merge those changes back to master. The steps are as follows:
  1. create a merge branch, here called “pdh-broadcom” (this is the branch that will eventually look bow-shaped)
    git checkout master
    git checkout -b pdh-broadcom
  2. merge the vendor branch
    git merge broadcom
    1. if there were merge conflicts, fix them up; don’t do any other edits at this stage
    2. once the merge conflicts are fixed, commit (to the merge branch)
      git commit
  3. compile and test the result
  4. commit any compile or test fixes to the merge branch
  5. test some more, hold code reviews, commit more fixes, etc., until you’re satisfied that it’s ready to merge to master

At this stage, history looks like the picture on the right: a bit like the “ready to merge” stage of a normal bow-shaped branch, except of course with the long tail of the vendor branch heading off into the distant past.

So let’s get hold of the current state of master:

git checkout master
git pull --rebase

Aha, see on the right, someone’s beaten us to the punch again. We need to rebase our work on top of theirs. But here’s the point at which a bit of caution needs to be applied. Blithely using git rebase as if the vendor branch weren’t there, will instruct git to rebase the entire vendor branch on top of master, which will typically be somewhere between undesirable and complete carnage.

There’s a -p (--preserve-merges) option to git rebase which will stop it doing that, but there is still the issue that re-doing our merge branch will involve re-doing the merge itself – which will involve re-resolving all the conflicts that we got in stage 2 above.

And that is where git rerere comes in. Because we enabled rerere above, each time we committed the fix to a merge conflict, git remembered the conflict and our resolution of the conflict. And when we re-encounter the same conflict again, as a result of rebasing, git re-resolves it for us:

git checkout pdh-broadcom
git rebase -p master
Resolved 'thirdparty/broadcom/Wiced/WWD/internal/wwd_wifi.c' using previous resolution.
Automatic merge failed; fix conflicts and then commit the result.
Error redoing merge 7a485d1d

In fact, Git is being unduly alarmist here. The merge did technically fail, but rerere has in fact already solved all the issues automatically. So we just need to convince Git of that, noting which files or directories it thought were a problem:

git add thirdparty/broadcom/Wiced/WWD/internal
git commit

So we’ve now got a fully rebased merge branch, ready to merge back to master – as at right. The merge itself is now just like a normal bow-shaped branch:

git checkout master
git merge --no-ff pdh-broadcom

So at last (as at left) we’ve got where we need to be: it’s a merge from a vendor branch, but it looks like a feature branch.

The more I use Git, the more I feel the truth of these two rules of thumb about it: there is always a way to do X, and there is always a use for command Y. Blogging about Git is really an exercise in providing the Y corresponding to a certain X, which is not always obvious.

Sunday 2 June 2013

Bow-shaped branches: a Git workflow

Distributed version control systems, and Git in particular, have for all practical purposes wiped out CVS, Subversion and similar systems. But Git is new enough that there’s still discussion to be had on the best ways of using it: the best Git “workflows”. One goal of a good workflow is that Git history – the graph of all previous revisions – remains readable and comprehensible. Or, put conversely: one symptom of a bad workflow is a convoluted history that looks like a Tube map, or worse.

One way in which git history can end up looking a tangled mess, is if the history graph includes lots of merges caused by doing git pull when upstream has also been updated. Those merges don’t really add that much information – only that two things happened simultaneously, which is a perfectly normal occurrence in teams of more than one person. So most development teams, at least once they’ve been using Git for a while, decide that merges caused by git pull cause unnecessary tangles, and git pull --rebase gets mandated instead.
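One way to make the rebasing pull the default, rather than relying on everyone remembering the flag, is Git’s pull.rebase setting. This is a suggestion added here, not something the workflow is known to depend on. The sketch sets it in a sandbox repository only.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q

# Repository-local here, so the sketch leaves global settings alone;
# use --global instead to apply it to every repository for the current user.
git config pull.rebase true

setting=$(git config --get pull.rebase)
echo "pull.rebase = $setting"
```

With this set, a plain git pull behaves like git pull --rebase in that repository.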

But what if development occurs on “feature branches” which are always getting merged? This case, too, can be dealt with using rebasing: in this case, rebasing the branch before merging it. But a pure rebase workflow serialises everything into a straight line, losing the information that these commits, in some sense, belong together.

So what’s really needed is a hybrid of the rebasing and merging styles: something that keeps the feature commits together, but also keeps history looking neat (and, in case of disaster, allows easy reverting of the whole feature). What’s needed is bow-shaped feature branches.

Let’s look at the desired end-result first. On the right is a snapshot from gitk --all, showing a feature branch that I’ve just merged, ready to push. As always with gitk, the newest commits are shown at the top. Two different routes join the same two endpoints, spanning the addition of my new feature: one direct, and another taking many small steps, so that the overall effect is a bow shape. This means that, though each individual commit is readily accessible, so too (by comparing the top and bottom of the bow) is the overall effect of the whole feature. Notice how the commits are all bunched together, but with a limited, local amount of branching: a run of N such branches consecutively would still, in some sense, have O(1) complexity, not O(N). (At least, it would have O(1) concurrently-active development branches, which is believably a useful measure of the complexity, and the comprehensibility burden, of history.)

Someone looking back through history who’s only interested in the large-scale changes, can thus move back via the “express train” lines and not the “stopping train” (or “local train” in US parlance) that goes via every commit. But they still wouldn’t have to follow many different branch lines, as they would in a pure merge workflow, because each bow-shape happens on top of the last, not in parallel with it.
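On the command line, the “express train” view corresponds to following only first parents. A small sketch, using empty commits and made-up branch names to build two consecutive bow-shaped branches:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"        # hypothetical identity, local to this sandbox
git config user.email "dev@example.invalid"

git commit -q --allow-empty -m "base"

# Two features, each three commits long, merged one after the other as bows.
for feature in one two; do
  git checkout -q -b "pdh-$feature"
  for step in 1 2 3; do
    git commit -q --allow-empty -m "$feature: step $step"
  done
  git checkout -q master
  git merge -q --no-ff -m "Merge branch 'pdh-$feature'" "pdh-$feature"
done

express=$(git rev-list --first-parent --count master)   # base plus one merge per feature
all=$(git rev-list --count master)                      # every commit, feature steps included
echo "express line: $express commits; stopping train: $all commits"
git log --first-parent --oneline master
```

The first-parent log shows one line per feature, while the full log still has every individual step.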

So how to build history graphs that look like that? On the left is gitk’s view of my local repository after doing some work on a feature: in this case a feature made up of seven commits. (Commit early, commit often; but stay bisectable.) If I just did a git pull followed by git push, I’d get a merge with commits on both sides; if I did the same but pulling with --rebase, I’d get a straight-line history with my commits at the top.

In order to get the bow shape, though, I’m going to need to retrospectively turn my commits into a branch. Fortunately, that’s straightforward in Git. First, I create a new branch, named after my initials and my feature (the branch itself won’t be pushed upstream, but its name appears in the merge message, so it ought to be fairly descriptive):

git checkout -b pdh-regexp

At this point my repository looks like the picture on the right.

Then I need to wind back my local master branch to match origin/master:

git checkout master
git reset --hard origin/master

Now my repository looks like the picture on the left: a feature-branch ready to merge. (If at this point you can't see your feature branch, make sure you used the --all option to gitk.) This is the point at which to do any smartening-up of the branch, using interactive rebase or similar, that is needed. It’s also the point at which to do a code review with one of your colleagues – a practice that’s strongly recommended, for various reasons.

[Edited 2014-Apr-05→] Now if by this stage master has moved on – in other words, if it’s not still pointing to the base of my feature-branch as shown in the pictures – then I’ll first need to rebase my branch on top of master, so that it looks like the gitk pictures:

git checkout master
git pull --rebase
git checkout pdh-regexp
git rebase master
git checkout master

And now I’m once more at the ready to merge stage, as seen in gitk.

At this point a normal git merge would do a fast-forward merge, returning me to the picture above on the right – so what’s needed instead, is an explicitly no-fast-forwards merge (which can never produce conflicts, because nothing else has happened to the top of master). First I’d better just check that I’m ready to merge:

git checkout master
git branch --contains HEAD

which will print a list of branches, including at least master itself, that contain the commit at the top of master. If my branch isn’t in the list, then I wasn’t, in fact, ready to merge – which means that I need to rebase on top of master again. If, though, my branch is in the list, then it’s time to do the bow-shaped merge itself:

git merge --no-ff pdh-regexp

And now I’m where I need to be, like the picture above right. [←end edits]
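The readiness check before the merge can also be scripted. One possible way, offered as an alternative to reading the output of git branch --contains by eye, is git merge-base --is-ancestor, which exits successfully when the first commit is an ancestor of the second. The branch name pdh-demo and the commits below are invented for the sketch.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"        # hypothetical identity, local to this sandbox
git config user.email "dev@example.invalid"

git commit -q --allow-empty -m "base"
git checkout -q -b pdh-demo
git commit -q --allow-empty -m "feature work"
git checkout -q master

# Ready to merge: the tip of master is contained in the feature branch.
if git merge-base --is-ancestor master pdh-demo; then ready=yes; else ready=no; fi
echo "ready to merge: $ready"

# Once master moves on, the branch needs rebasing again before the merge.
git commit -q --allow-empty -m "someone else's work"
if git merge-base --is-ancestor master pdh-demo; then ready_after=yes; else ready_after=no; fi
echo "ready to merge after master moved: $ready_after"
```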

So it’s time to push:

git push
To peter@git.electricimp.com
! [rejected] master -> master (non-fast-forward)
error: failed to push some refs to 'peter@git.electricimp.com/ei.git'
To prevent you from losing history, non-fast-forward updates were rejected
Merge the remote changes (e.g. 'git pull') before pushing again. See the
'Note about fast-forwards' section of 'git push --help' for details.

Ahh, someone else has pushed a different feature while I was doing all that. So I need to go round the loop again.

git reset --hard origin/master

This has the effect of undoing the (trivial) merge commit, and the repository ends up like the picture on the left.

Now I can fetch the new commits to the top of master:

git pull --rebase
git checkout pdh-regexp
git rebase master
git checkout master

Of course, git rebase master might produce conflicts, which should be cleared up in the usual way. (And if the conflicts are too hard to clear up, maybe a real merge is called-for, so that history records that the project had a hard time here.) But if all goes smoothly, I end up with the picture on the right: a feature branch ready to merge, as above – the only difference being that the branch is now on top of a newer commit as the head of origin/master. So I can carry on from that point.
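The whole go-round-the-loop procedure can be sketched as a retry loop. This is an illustration in a sandbox, not a script to run blindly: a local bare repository stands in for origin, a second clone plays the colleague who pushes first, all names are invented, and any rebase conflict simply stops the script.

```shell
set -e
work=$(mktemp -d) && cd "$work"
# Hypothetical identity, set through the environment so it applies to every clone below.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.invalid"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.invalid"

git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

git clone -q origin.git me 2>/dev/null
cd me
git symbolic-ref HEAD refs/heads/master
echo base > base.txt && git add base.txt && git commit -q -m "base"
git push -q origin master

# A one-commit feature branch, ready to merge.
git checkout -q -b pdh-demo
echo f > feature.txt && git add feature.txt && git commit -q -m "feature work"
git checkout -q master

# A colleague pushes first (simulated with a second clone).
git clone -q ../origin.git ../colleague
( cd ../colleague && echo c > c.txt && git add c.txt \
    && git commit -q -m "colleague's work" && git push -q origin master )

# Merge and push; on rejection undo the merge, catch up, rebase, and try again.
attempts=0
while :; do
  attempts=$((attempts + 1))
  [ "$attempts" -le 5 ] || exit 1          # give up rather than loop forever
  git merge -q --no-ff -m "Merge branch 'pdh-demo'" pdh-demo
  if git push -q origin master 2>/dev/null; then break; fi
  git reset -q --hard origin/master        # undo the trivial merge commit
  git pull -q --rebase origin master       # fetch the new commits
  git checkout -q pdh-demo
  git rebase -q master                     # stops the script if there are conflicts
  git checkout -q master
done

parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "pushed after $attempts attempts"
git log --graph --oneline
```

The first push is rejected, and the second succeeds with the bow sitting on top of the colleague’s commit.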

Of course, sometimes a feature is too small to merit all of this ... mechanism. If something goes in in just one or two commits, then it’s probably not complex enough for its presence in history to be controversial: for those, just use the pure rebase workflow. But once a feature has grown to three or four commits, it’s worthwhile making it as clear as possible, to those who follow us, what’s gone on. As for an upper limit: I’ve seen bow shapes with 20–30 commits which still looked fairly sane. Any feature which takes more commits than that, could probably do with being subdivided anyway: not least because the rebasing is probably becoming a major pain at that scale. Again, perhaps a real merge commit is called-for, and again this helps signal to future historians that here is a point at which perhaps the project was having a bit of a hard time.

CC0 To the extent possible under law, the author of this work has waived all copyright and related or neighboring rights to this work.