LATER EDIT: the method described here turned out not to scale very well for large vendored libraries with lots of changes. If your Broadcom (or similar) merges are killing you, you should probably be using a rebasing vendor flow instead.
Following on from the post about bow-shaped branches, one of the remaining ways that Git history can get untidy, despite following those precepts, is when you’ve got a long-running “vendor branch” for some third-party code. The usual pattern is to have a branch containing successive deliveries of the third-party code; when a new delivery needs to be integrated, it’s checked in unaltered on that branch, and then the branch is merged back to master. This means that each new commit on the branch is a diff corresponding only to the vendor’s changes in the new release (relative to the previous release), so that Git’s merge infrastructure helps you integrate those changes with any changes you’ve made yourself to the third-party code.
Which is all to the good, but presents a problem for those striving for tidy Git history. Especially in the case of widespread changes or ugly merges, integrating the new delivery could end up being as much work as a new feature in its own right – which means it really ought to take place on a feature branch. But if you try and do that feature branch as a bow-shaped branch, starting off with a commit that’s the merge from the vendor branch, then the bow-shaped workflow (and indeed the simple rebase workflow) won’t work out of the box: every time you attempt to rebase the feature branch, Git tries to rebase the entire vendor branch, which is hopelessly not what you want.
The obscure Git feature which in this case turns out to be exactly what’s wanted, as if designed for it, is git rerere. Git can be instructed to remember merge-conflict resolutions, and replay them automatically if the precise same conflict is encountered again. This feature isn’t enabled by default, no doubt because it’d confuse people about how their conflicts somehow disappeared, but it’s just what’s needed for rebasing a feature branch that includes a merge from a vendor branch.
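To see rerere in isolation, here is a minimal sketch in a throwaway repository. The file name, branch name "side", and file contents are all made up for illustration, and rerere is enabled only locally so nothing global is touched. It resolves a conflict once, discards the merge, and redoes it.

```shell
#!/bin/sh
# Toy demonstration of rerere; every name here is hypothetical.
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"
git config user.email "user@example.com"
git config rerere.enabled true            # local to this scratch repo

printf 'one\n' > file.txt
git add file.txt
git commit -q -m "base"

# Two branches change the same line, so merging them must conflict.
git checkout -q -b side
printf 'side\n' > file.txt
git commit -q -am "side change"
git checkout -q master
printf 'master\n' > file.txt
git commit -q -am "master change"

# First merge: resolve by hand; committing lets rerere record the resolution.
git merge side >/dev/null 2>&1 || true
printf 'resolved\n' > file.txt
git add file.txt
git commit -q -m "merge side"

# Throw the merge away and redo it: the same conflict comes back,
# and rerere should rewrite file.txt with the recorded resolution.
git reset -q --hard HEAD~1
git merge side >/dev/null 2>&1 || true
replayed=$(cat file.txt)
echo "file.txt after second merge: $replayed"
```

Note that the second merge still stops and reports a failure; rerere only fixes up the working-tree file, which matters later on.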
Let’s start by having a look at the desired result. On the right is a screenshot from gitk, which shows the newest commits at the top. It shows that the long-lived vendor branch “broadcom” has been merged following a new delivery of the third-party code, and that the work of integrating the new release itself happened on a bow-shaped branch. No superfluous merge commits are visible.
Here’s how to make history that looks like that. First of all I need to turn on the rerere feature:
git config --global rerere.enabled true
Then import the new delivery. This part is much like the process of using vendor branches in any other source-control system:
- check out the vendor branch
- remove the old code entirely
- untar/unzip the new delivery
- rename if necessary (usually, at least, removing version numbers from directory names – the delivery depicted here untarred into a directory called WICED-SDK-2.3.1, but we wanted it called thirdparty/broadcom)
- but don’t touch the source in any other way
- add the changes, additions, and removals to Git (git add -A)
- commit everything to the vendor branch with a commit message along the lines of “import Broadcom Wiced 2.3.1”
- create a merge branch, here called “pdh-broadcom” (this is the branch that will eventually look bow-shaped)
git checkout master
git checkout -b pdh-broadcom
- merge the vendor branch
git merge broadcom
- if there were merge conflicts, fix them up; don’t do any other edits at this stage
- once the merge conflicts are fixed, commit (to the merge branch)
git commit
- compile and test the result
- commit any compile or test fixes to the merge branch
- test some more, hold code reviews, commit more fixes, etc., until you’re satisfied that it’s ready to merge to master
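The import-and-merge steps above can be sketched end to end in a scratch repository. This is only an illustration under stated assumptions: the two "deliveries" are fake directories standing in for untarred tarballs, the file names (wwd.c, extra.c, local_patch.h) are invented, and the merge here happens to be conflict-free so the script can run unattended.

```shell
#!/bin/sh
# Toy walk-through of the vendor import; names and contents are hypothetical.
set -eu
work=$(mktemp -d)

# Two fake deliveries, standing in for untarred vendor releases.
mkdir -p "$work/WICED-SDK-1" "$work/WICED-SDK-2"
printf 'v1\n' > "$work/WICED-SDK-1/wwd.c"
printf 'v2\n' > "$work/WICED-SDK-2/wwd.c"
printf 'new file\n' > "$work/WICED-SDK-2/extra.c"

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
git config rerere.enabled true

# Existing history: delivery 1 on the vendor branch, merged to master,
# followed by a local addition to the vendor code on master.
printf 'our code\n' > main.c
git add -A
git commit -q -m "initial commit"
git checkout -q -b broadcom
mkdir thirdparty
cp -R "$work/WICED-SDK-1" thirdparty/broadcom
git add -A
git commit -q -m "import delivery 1"
git checkout -q master
git merge -q --no-ff -m "merge delivery 1" broadcom
printf 'local\n' > thirdparty/broadcom/local_patch.h
git add -A
git commit -q -m "local addition to vendor code"

# The steps from the list: import delivery 2 unaltered on the vendor branch...
git checkout -q broadcom
rm -rf thirdparty/broadcom                       # remove the old code entirely
cp -R "$work/WICED-SDK-2" thirdparty/broadcom    # new delivery, renamed on the way in
git add -A                                       # changes, additions, and removals
git commit -q -m "import delivery 2"

# ...then merge it on a fresh merge branch cut from master.
git checkout -q master
git checkout -q -b pdh-broadcom
git merge -q --no-edit broadcom
git log --graph --oneline
```

In a real import the merge would usually stop with conflicts, and you would fix them and run git commit by hand as described in the list.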
At this stage, history looks like the picture on the right: a bit like the “ready to merge” stage of a normal bow-shaped branch, except of course with the long tail of the vendor branch heading off into the distant past.
So let’s get hold of the current state of master:
git checkout master
git pull --rebase
Aha, see on the right, someone’s beaten us to the punch again. We need to rebase our work on top of theirs. But here’s the point at which a bit of caution needs to be applied. Blithely using git rebase as if the vendor branch weren’t there will instruct Git to rebase the entire vendor branch on top of master, which will typically be somewhere between undesirable and complete carnage.
There’s a -p (--preserve-merges) option to git rebase which will stop it doing that (newer versions of Git have replaced it with --rebase-merges), but there is still the issue that redoing our merge branch will involve redoing the merge itself, which will involve re-resolving all the conflicts that we fixed when we first merged the vendor branch above.
And that is where git rerere comes in. Because we enabled rerere above, each time we committed the fix to a merge conflict, Git remembered the conflict and our resolution of it. And when we encounter the same conflict again, as a result of rebasing, Git re-resolves it for us:
git checkout pdh-broadcom
git rebase -p master
Resolved 'thirdparty/broadcom/Wiced/WWD/internal/wwd_wifi.c' using previous resolution.
Automatic merge failed; fix conflicts and then commit the result.
Error redoing merge 7a485d1d
In fact, Git is being unduly alarmist here. The merge did technically fail, but rerere has already solved all the issues automatically. So we just need to convince Git of that, noting which files or directories it thought were a problem:
git add thirdparty/broadcom/Wiced/WWD/internal
git commit
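Before staging everything on trust, it can be reassuring to confirm that rerere really did resolve every file. The sketch below is one way to do that, not part of the original workflow: files_with_markers is a hypothetical helper, and the repository, branch name "vendor", and file contents are invented so the script runs on its own. It lists the paths Git still considers unmerged and checks whether any of them still contain conflict markers.

```shell
#!/bin/sh
# Toy check that a rerere-replayed merge has no conflict markers left.
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
git config rerere.enabled true
git config rerere.autoupdate false   # leave replayed files unstaged, as in the post

# Build a conflict, resolve it once, then undo the merge so rerere can replay it.
printf 'base\n' > wwd_wifi.c
git add -A
git commit -q -m "base"
git checkout -q -b vendor
printf 'vendor\n' > wwd_wifi.c
git commit -q -am "vendor change"
git checkout -q master
printf 'local\n' > wwd_wifi.c
git commit -q -am "local change"
git merge vendor >/dev/null 2>&1 || true
printf 'local plus vendor\n' > wwd_wifi.c
git add -A
git commit -q -m "merge vendor"
git reset -q --hard HEAD~1
git merge vendor >/dev/null 2>&1 || true   # "fails", but rerere replays the fix

# Hypothetical helper: print unmerged paths that still contain conflict markers.
files_with_markers() {
    git diff --name-only --diff-filter=U | while IFS= read -r path; do
        if grep -q '^<<<<<<< ' "$path"; then
            printf '%s\n' "$path"
        fi
    done
}

unmerged=$(git diff --name-only --diff-filter=U)
leftover=$(files_with_markers)

if [ -z "$leftover" ]; then
    # Nothing left to fix by hand: tell Git so, as in the post.
    git add -A
    git commit -q -m "merge vendor (replayed by rerere)"
    echo "merge completed"
else
    echo "still conflicted: $leftover"
fi
```

If the helper prints anything, those files contain conflicts that rerere has not seen before, and they need resolving by hand before the git add.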
So we’ve now got a fully rebased merge branch, ready to merge back to master – as at right. The merge itself is now just like a normal bow-shaped branch:
git checkout master
git merge --no-ff pdh-broadcom
So at last (as at left) we’ve got where we need to be: it’s a merge from a vendor branch, but it looks like a feature branch.
The more I use Git, the more I feel the truth of these two rules of thumb about it: there is always a way to do X, and there is always a use for command Y. Blogging about Git is really an exercise in providing the Y corresponding to a certain X, which is not always obvious.