Distributed version control systems, and Git in particular, have for all practical purposes wiped out CVS, Subversion and similar systems. But Git is new enough that there’s still discussion to be had on the best ways of using it: the best Git “workflows”. One goal of a good workflow is that Git history – the graph of all previous revisions – remains readable and comprehensible. Or, put conversely: one symptom of a bad workflow is a convoluted history that looks like a Tube map, or worse.
One way in which Git history can end up looking like a tangled mess is if the history graph includes lots of merges caused by doing git pull when upstream has also been updated. Those merges don’t really add that much information – only that two things happened simultaneously, which is a perfectly normal occurrence in teams of more than one person. So most development teams, at least once they’ve been using Git for a while, decide that merges caused by git pull cause unnecessary tangles, and git pull --rebase gets mandated instead.
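One way a team might make that mandate stick is through configuration rather than habit. The following is a minimal sketch, assuming a Git recent enough to know the pull.rebase setting; the throwaway repository is there only to keep the snippet self-contained.

```shell
set -e

# Throwaway repository, for illustration only.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Make a plain "git pull" in this repository behave like "git pull --rebase".
git config pull.rebase true

# Read the setting back to confirm it took effect.
git config --get pull.rebase
```

Passing --global to git config would apply the same setting to every repository belonging to the current user.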
But what if development occurs on “feature branches” which are always getting merged? This case, too, can be dealt with using rebasing: in this case, rebasing the branch before merging it. But a pure rebase workflow serialises everything into a straight line, losing the information that these commits, in some sense, belong together.
So what’s really needed is a hybrid of the rebasing and merging styles: something that keeps the feature commits together, but also keeps history looking neat (and, in case of disaster, allows easy reverting of the whole feature). What’s needed is bow-shaped feature branches.
Let’s look at the desired end result first. On the right is a snapshot from gitk --all, showing a feature branch that I’ve just merged, ready to push. As always with gitk, the newest commits are shown at the top. Two different routes join the same two endpoints, spanning the addition of my new feature: one direct, and another taking many small steps, so that the overall effect is a bow shape. This means that, though each individual commit is readily accessible, so too (by comparing the top and bottom of the bow) is the overall effect of the whole feature. Notice how the commits are all bunched together, but with a limited, local amount of branching: a run of N such branches, one after another, would still, in some sense, have O(1) complexity, not O(N). (At least, it would have O(1) concurrently-active development branches, which is plausibly a useful measure of the complexity, and the comprehensibility burden, of history.)
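To make the top-versus-bottom comparison concrete, here is a small sketch in a throwaway repository. The branch name demo-feature, the file notes.txt and the commit identity are all made up for illustration. The point is only that the merge commit’s first parent (HEAD^1) is the bottom of the bow, so one diff shows the whole feature and one revert undoes it.

```shell
set -e

# Throwaway repository with a made-up identity, for illustration only.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

# Bottom of the bow: a single commit on master.
echo "base" > notes.txt
git add notes.txt
git commit -q -m "base"

# The feature: three small commits on a hypothetical branch.
git checkout -q -b demo-feature
for n in 1 2 3; do
    echo "feature step $n" >> notes.txt
    git commit -q -am "feature step $n"
done

# Top of the bow: an explicit merge commit on master.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'demo-feature'" demo-feature

# The whole feature as one diff: bottom of the bow against the top.
git diff HEAD^1 HEAD

# Undo the whole feature in one commit; -m 1 keeps the first-parent side.
git revert --no-edit -m 1 HEAD
cat notes.txt
```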
Someone looking back through history who’s only interested in the large-scale changes can thus move back via the “express train” lines and not the “stopping train” (or “local train” in US parlance) that goes via every commit. But they still wouldn’t have to follow many different branch lines, as they would in a pure merge workflow, because each bow-shape happens on top of the last, not in parallel with it.
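On the command line, the express train corresponds roughly to following only first parents. A sketch, again in a throwaway repository with a hypothetical branch name and identity, and with empty commits standing in for real work:

```shell
set -e

# Throwaway repository with a made-up identity, for illustration only.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

# One base commit, then a three-commit feature merged bow-style.
git commit -q --allow-empty -m "base"
git checkout -q -b demo-feature
for n in 1 2 3; do
    git commit -q --allow-empty -m "feature step $n"
done
git checkout -q master
git merge -q --no-ff -m "Merge branch 'demo-feature'" demo-feature

# Express train: only the merge commit and the base commit.
git log --first-parent --oneline master

# Stopping train: every commit, including the three feature steps.
git log --oneline master
```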
So how to build history graphs that look like that? On the left is gitk’s view of my local repository after doing some work on a feature: in this case a feature made up of seven commits. (Commit early, commit often; but stay bisectable.) If I just did a git pull followed by git push, I’d get a merge with commits on both sides; if I did the same but pulling with --rebase, I’d get a straight-line history with my commits at the top.
In order to get the bow shape, though, I’m going to need to retrospectively turn my commits into a branch. Fortunately, that’s straightforward in Git. First, I create a new branch, named after my initials and my feature (the branch itself won’t be pushed upstream, but its name appears in the merge message, so it ought to be fairly descriptive):
git checkout -b pdh-regexp
At this point my repository looks like the picture on the right.
Then I need to wind back my local master branch to match origin/master:
git checkout master
git reset --hard origin/master
Now my repository looks like the picture on the left: a feature branch ready to merge. (If at this point you can’t see your feature branch, make sure you used the --all option to gitk.) This is the point at which to do any smartening-up the branch needs, using interactive rebase or similar. It’s also the point at which to do a code review with one of your colleagues – a practice that’s strongly recommended, for various reasons.
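The branch-then-reset step can be rehearsed safely in a throwaway repository before trying it on real work. In this sketch the branch name demo-feature and the identity are hypothetical, and a hand-made refs/remotes/origin/master ref stands in for a real remote, which is an assumption made purely to keep the example short.

```shell
set -e

# Throwaway repository with a made-up identity, for illustration only.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

git commit -q --allow-empty -m "base"
# Stand-in for a real remote: pretend upstream master is at "base".
git update-ref refs/remotes/origin/master HEAD

# Local work committed straight onto master.
for n in 1 2 3; do
    git commit -q --allow-empty -m "feature step $n"
done

# Turn those commits into a branch, then wind master back.
git checkout -q -b demo-feature
git checkout -q master
git reset -q --hard origin/master

# The commits that now live only on the feature branch.
git log --oneline master..demo-feature
```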
[Edited 2014-Apr-05→] Now if by this stage master has moved on – in other words, if it’s not still pointing to the base of my feature-branch as shown in the pictures – then I’ll first need to rebase my branch on top of master, so that it looks like the gitk pictures:
git checkout master
git pull --rebase
git checkout pdh-regexp
git rebase master
git checkout master
And now I’m once more at the ready-to-merge stage, as seen in gitk.
At this point a normal git merge would do a fast-forward merge, returning me to the picture above on the right – so what’s needed instead is an explicit no-fast-forward merge (which can never produce conflicts, because nothing else has happened to the top of master). First I’d better just check that I’m ready to merge:
git checkout master
git branch --contains HEAD
which will print a list of branches, including at least master itself, that contain the commit at the top of master. If my branch isn’t in the list, then I wasn’t, in fact, ready to merge – which means that I need to rebase on top of master again. If, though, my branch is in the list, then it’s time to do the bow-shaped merge itself:
git merge --no-ff pdh-regexp
And now I’m where I need to be, like the picture above right. [←end edits]
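For anyone who would rather script the readiness check than read a list of branches, git merge-base --is-ancestor answers the same question with an exit status. The following is a sketch under the same assumptions as before (throwaway repository, hypothetical branch name demo-feature, empty commits standing in for real work), not a replacement for looking at gitk.

```shell
set -e

# Throwaway repository with a made-up identity, for illustration only.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

git commit -q --allow-empty -m "base"
git checkout -q -b demo-feature
for n in 1 2 3; do
    git commit -q --allow-empty -m "feature step $n"
done
git checkout -q master

# Ready to merge only if the top of master is already part of the branch.
if git merge-base --is-ancestor master demo-feature; then
    git merge -q --no-ff -m "Merge branch 'demo-feature'" demo-feature
else
    echo "demo-feature needs rebasing onto master first" >&2
    exit 1
fi

# The merge commit followed by its two parents: old master, then branch tip.
git rev-list --parents -n 1 HEAD
```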
So it’s time to push:
git push
To peter@git.electricimp.com
! [rejected] master -> master (non-fast-forward)
error: failed to push some refs to 'peter@git.electricimp.com/ei.git'
To prevent you from losing history, non-fast-forward updates were rejected
Merge the remote changes (e.g. 'git pull') before pushing again. See the
'Note about fast-forwards' section of 'git push --help' for details.
Ahh, someone else has pushed a different feature while I was doing all that. So I need to go round the loop again.
git reset --hard origin/master
This has the effect of undoing the (trivial) merge commit, and the repository ends up like the picture on the left.
Now I can fetch the new commits to the top of master:
git pull --rebase
git checkout pdh-regexp
git rebase master
git checkout master
Of course, git rebase master might produce conflicts, which should be cleared up in the usual way. (And if the conflicts are too hard to clear up, maybe a real merge is called for, so that history records that the project had a hard time here.) But if all goes smoothly, I end up with the picture on the right: a feature branch ready to merge, as above – the only difference being that the branch is now based on the newer commit at the head of origin/master. So I can carry on from that point.
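The whole round trip, rejected push included, can be reproduced locally. In this sketch a bare repository stands in for the shared upstream and a second clone plays the colleague; the directory names, file names, branch name and identity are all hypothetical.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Made-up identity for the throwaway commits.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

# A local bare repository stands in for the shared upstream.
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

# My clone: one base commit, pushed upstream.
git clone -q origin.git mine 2>/dev/null
cd mine
git symbolic-ref HEAD refs/heads/master
echo "base" > base.txt
git add base.txt
git commit -q -m "base"
git push -q -u origin master

# A colleague's clone, used below to move upstream on.
git clone -q ../origin.git ../theirs

# My feature, merged bow-style onto my local master.
git checkout -q -b demo-feature
for n in 1 2 3; do
    echo "step $n" >> feature.txt
    git add feature.txt
    git commit -q -m "feature step $n"
done
git checkout -q master
git merge -q --no-ff -m "Merge branch 'demo-feature'" demo-feature

# Meanwhile the colleague pushes first.
(
    cd ../theirs
    echo "other" > other.txt
    git add other.txt
    git commit -q -m "someone else's feature"
    git push -q origin master
)

# My push is now rejected as a non-fast-forward.
if git push -q origin master 2>/dev/null; then
    echo "unexpected: push succeeded" >&2
    exit 1
fi

# Go round the loop: drop the merge, update, rebase, merge again, push.
git reset -q --hard origin/master
git pull -q --rebase
git checkout -q demo-feature
git rebase -q master
git checkout -q master
git merge -q --no-ff -m "Merge branch 'demo-feature'" demo-feature
git push -q origin master

# The bow now sits on top of the colleague's commit.
git log --graph --oneline master
```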
Of course, sometimes a feature is too small to merit all of this ... mechanism. If something goes in as just one or two commits, then it’s probably not complex enough for its presence in history to be controversial: for those, just use the pure rebase workflow. But once a feature has grown to three or four commits, it’s worthwhile making it as clear as possible, to those who follow us, what’s gone on. As for an upper limit: I’ve seen bow shapes with 20–30 commits which still looked fairly sane. Any feature which takes more commits than that could probably do with being subdivided anyway: not least because the rebasing is probably becoming a major pain at that scale. Again, perhaps a real merge commit is called for, and again this helps signal to future historians that here is a point at which perhaps the project was having a bit of a hard time.
It would appear that XKCD also advocates bow-shaped branches: http://xkcd.com/1296/ Your commit messages appear to be of higher quality though.