Not in fact any relation to the famous large Greek meal of the same name.

Tuesday, 4 December 2018

A rebasing vendor-branch workflow for Git

An earlier post laid out a neat way of managing third-party code, in a subdirectory of a larger Git repository, as a vendor branch. Here at Electric Imp we followed that scheme for the subsequent five years, and bitter experience over that time has now convinced us that it doesn’t work well in all circumstances, and a better plan is needed.

The issue is that essentially all local changes to the repository accrete over time into one massive merge commit. If we’re talking about a sizeable amount of code, and if we’ve made lots of changes to it over time, and if upstream have also made lots of changes to it over time, then re-merging that single merge commit soon becomes horrendous. Basically, the original bow-shaped vendor branch workflow does not scale.

So what to do instead? We need to deconstruct that overwhelming single merge commit; we need, ideally, to move from a merging workflow to a rebasing workflow.

The broad outline of the situation is as follows: some time ago we merged an upstream version of some vendor code, but we’ve been hacking on it since. Meanwhile, upstream has produced a newer version whose updates we’d like to take – while keeping our changes, where relevant.

To achieve this, we’re first going to create a new branch, re-import a stock copy of the current upstream version, and then re-apply our changes, one by one, so that the tip of that branch exactly matches the tip of master.

So far, this sounds like a lot of work to achieve literally nothing! But what we now have is a branch containing just our changes, in a sequence of individual commits. In other words, it’s exactly what’s needed in order to rebase those commits on top of a newer upstream release. So, starting from our re-import, we create a second new branch, then import a stock copy of the new upstream (so that there’s a single commit containing all upstream’s changes between the two), and then rebase our re-apply branch on top of that.

The upstream source in the example below is the same WICED SDK as used in the previous blog post; note that although the whole WICED project was sold by Broadcom to new owners Cypress in the meantime, many references to “Broadcom” still remain in our code, notably in the directory names. This ticks all three boxes for needing the heavyweight integration process: it’s large – 7,000 source files – we’ve made lots of changes, and upstream have also made lots of changes.

Here’s exactly what to do. First we need to know which commits, since the last time we took an upstream version, changed the vendor directory – here, thirdparty/broadcom. Fortunately, git log is able to tell us exactly that. We need to go back in history (perhaps using gitk’s “find commits touching paths” option) and find the commit where we originally took our current upstream version. In the repo I’m using, that’s ac64f159.

The following command logs all commits that have changed the thirdparty/broadcom directory since then:

git log --format=%H --reverse --ancestry-path ac64f159..master -- thirdparty/broadcom
For me, that lists 198 commits! Because of the --reverse, they’re listed in forwards chronological order: in other words, in the order in which we need to re-apply them on our re-merge branch. Let’s put that list of commits in a file, commits.txt:

git log --format=%H --reverse --ancestry-path ac64f159..master -- thirdparty/broadcom > commits.txt
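The same listing can be wrapped in a small function, which makes it easier to reuse for a different base commit or vendor directory. This is only a sketch: list_vendor_commits is a hypothetical helper name, and the arguments shown in the usage comment are the ones from this post.

```shell
#!/bin/bash
# Sketch only: list_vendor_commits is a hypothetical helper, not part of Git.
# Prints the hashes of all commits between BASE and TIP that touched PATH,
# oldest first, i.e. in the order in which they need to be replayed.
list_vendor_commits() {
    local base=$1 tip=$2 path=$3
    git log --format=%H --reverse --ancestry-path "$base..$tip" -- "$path"
}

# Usage (values from this post):
#   list_vendor_commits ac64f159 master thirdparty/broadcom > commits.txt
#   wc -l < commits.txt    # how many commits will be replayed
```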

Now we start a new branch:

git checkout -b pdh-wiced-remerge
...and re-import the current upstream version, here WICED-3.5.2:
rm -rf thirdparty/broadcom
cd thirdparty
7zr x WICED-SDK-3.5.2.7z
cd ..
mv thirdparty/WICED-SDK-3.5.2 thirdparty/broadcom
# (clean up the source as necessary, fix CR/LFs etc.)
git add -A thirdparty/broadcom
git commit -m"Import stock WICED 3.5.2"
git tag pdh-wiced-3.5.2-stock

Notice that we tag this commit, as we’ll be referring to it again later.

Now we’d like to replay our long list of local commits on top of that release – to get back, as it were, to where we started. The thing to note here is that we can do this in a completely automated way – there should be no chance of merge conflicts, as we’re re-applying the same commits on top of the same upstream drop. It’s so very automated that we can do each one using a script, which I called apply-just-broadcom:

#! /bin/bash
git checkout "$1" -- thirdparty/broadcom
git add -u thirdparty/broadcom
git commit -C "$1"
This says, first checkout (or, really, apply) the commit named in the script’s first argument – but only where it affects the thirdparty/broadcom directory. Any other parts of the commit aren’t applied. This automatically adds any new files, but it doesn’t delete anything deleted by the commit we’re applying – so we delete those using git add -u. Finally we want to commit those changes to our new branch, but using the commit message they originally had – for which, git commit’s -C option is exactly what we want.

Armed with that script, we can then apply, one-by-one, all the commits we identified before:

git checkout pdh-wiced-remerge
for i in `cat commits.txt` ; do scripts/apply-just-broadcom $i ; done
This will churn through the commits, one by one, re-applying them to the pdh-wiced-remerge branch. Because they’re all getting applied in the same context in which they were originally committed, they should all go in one-after-the-other with no conflicts or warnings. (If they don’t, perhaps your re-import of the old upstream didn’t quite match the original import, so fix that and start again.)
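With a couple of hundred commits to churn through, it can be handy for the loop to stop at the first commit that fails rather than ploughing on. The following is a sketch of one way to do that, not the script we actually used: replay_vendor_commits is a hypothetical name, and it simply inlines the same three Git commands as apply-just-broadcom.

```shell
#!/bin/bash
# Sketch only: replay_vendor_commits is a hypothetical helper, not part of Git.
# Replays each commit listed in LIST (one hash per line) onto the current
# branch, restricted to PATH, and stops at the first commit that fails.
replay_vendor_commits() {
    local list=$1 path=$2 commit
    # Read the list on file descriptor 3 so Git's own stdin is left alone.
    while read -r commit <&3; do
        # Same three steps as apply-just-broadcom, but bailing out on failure.
        git checkout "$commit" -- "$path" \
            || { echo "checkout failed at $commit" >&2; return 1; }
        git add -u "$path"
        git commit -q -C "$commit" \
            || { echo "commit failed at $commit" >&2; return 1; }
    done 3< "$list"
}

# Usage:
#   replay_vendor_commits commits.txt thirdparty/broadcom
```

If it does stop, the hash printed on stderr tells you which commit to go and look at before fixing the re-import and starting again.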

And now we should be, quite circuitously, back where we started, with the tip of the pdh-wiced-remerge branch matching master exactly:

git diff pdh-wiced-remerge master
...which should show no differences at all. What you’d see in gitk is something like the image to the right, showing master with your branch coming off it. And the branch contains the (re-)import commit, then all the work that’s been done since. Scrolled way, way off the top of that screenshot, about 150 commits further up, is the tip of the pdh-wiced-remerge branch.

Optionally but usefully, you can now tidy up the branch to make it easier to apply to the new release. For instance, if the branch contains patches that were later reverted, you can use interactive rebase to remove both the patch and the revert, for the same result but avoiding any chance of the patch later causing conflicts. Doing this should still leave no diff between your branch and master.
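Since “no diff” is the invariant both after the replay and after any tidying, it may be worth scripting the check. One possible sketch (same_tree is a hypothetical helper name) compares the tree objects of the two revisions, which are identical exactly when the two snapshots have the same content:

```shell
#!/bin/bash
# Sketch only: same_tree is a hypothetical helper, not part of Git.
# Succeeds (exit 0) if the two revisions have identical content, fails with
# exit 1 if they differ, and with exit 2 if either revision doesn't exist.
same_tree() {
    local a b
    a=$(git rev-parse --verify --quiet "$1^{tree}") || return 2
    b=$(git rev-parse --verify --quiet "$2^{tree}") || return 2
    [ "$a" = "$b" ]
}

# Usage:
#   same_tree pdh-wiced-remerge master && echo "trees match"
```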

Even more optionally, but still usefully, I needed at this stage to rewrite a bunch of the commit messages. Those numbers in square brackets in the commit summaries are instructions to our git server to link the commits with the related user story (or bug) in Pivotal Tracker. (The reason they don’t all have one is that some of the original commits were themselves on bow-shaped branches, and only the merge commit carried the Pivotal link.) I wanted to remove those links, so that pushing my branch didn’t spam every Pivotal story associated with a thirdparty/broadcom commit, some of which were years old by this stage. But there are tons of them, so I wanted to rewrite the messages programmatically. It’s definitely out of scope for this blog post, but I ended up using the powerful and not-to-be-trifled-with git filter-branch command, in conjunction with a tiny sed script:

git filter-branch --msg-filter "sed -e 's/\[\#\([0-9]*\)\]/(\1)/g'" master..pdh-wiced-remerge
This monstrosity rewrites every commit message between master and pdh-wiced-remerge, to replace square-bracket Pivotal links with curved-bracket “links” that don’t trigger the automatic Pivotal integration.
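Before letting filter-branch loose on real history, it’s worth trying the sed expression on its own. Here is a small sketch of that: the function name rewrite_pivotal_links and the sample commit message are made up for illustration, and the backslash before the # is dropped because # needs no escaping in a sed pattern.

```shell
#!/bin/bash
# Sketch only: rewrite_pivotal_links is a hypothetical name for the same sed
# expression as above, reading a commit message on stdin.
rewrite_pivotal_links() {
    sed -e 's/\[#\([0-9]*\)\]/(\1)/g'
}

# A made-up commit summary, just to show the effect:
printf '%s\n' 'Fix Wi-Fi scan timeout [#123456]' | rewrite_pivotal_links
# prints: Fix Wi-Fi scan timeout (123456)
```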

Anyway, that aside, we’re now in a position to merge the new WICED release, upstream version 3.6.3. So let’s tag where we are now, so we can refer to it later:

git checkout pdh-wiced-remerge
git tag pdh-wiced-3.5.2-merged

We want to branch our WICED-3.6.3 branch from the stock 3.5.2 re-import, so that there’s a diff that just shows upstream’s changes between 3.5.2 and 3.6.3. So that’s:

git checkout pdh-wiced-3.5.2-stock
git checkout -b pdh-wiced-3.6.3
And now we can do the same dance as before, removing the old thirdparty/broadcom and replacing it with the new one:
rm -rf thirdparty/broadcom
cd thirdparty
7zr x WICED-SDK-3.6.3.7z
cd ..
mv thirdparty/WICED-SDK-3.6.3 thirdparty/broadcom
# (clean up the source as necessary, fix CR/LFs etc.)
git add -A thirdparty/broadcom
git commit -m"Import stock WICED 3.6.3"
git tag pdh-wiced-3.6.3-stock

Again, because this is a complete replacement, there’s no chance of merge conflicts.

Now just the rebasing operation itself remains. Because we tagged all the important points, we can use those tag names in the rebase command:

git rebase --onto pdh-wiced-3.6.3-stock pdh-wiced-3.5.2-stock pdh-wiced-remerge

This will be a lengthy rebasing operation, and merge conflicts are likely to occur. These can be fixed “in the usual way” – this isn’t a process usable by people who aren’t already experienced in fixing rebase conflicts, so I won’t say much more about that fixing here. But note that the rebase operation always tells you which commit it got stuck on – so you can go and look at the original version of that commit to check what its intention was, and also at the pdh-wiced-3.6.3-stock commit, containing exactly upstream’s changes between 3.5.2 and 3.6.3, in the hope that you can glean some insight into what upstream’s intention was.
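For the “what did upstream do here?” half of that detective work, the two stock tags make it easy to narrow the diff down to just the conflicting file. A minimal sketch follows: upstream_changes is a hypothetical helper name, the tag names are the ones created earlier in this post, and the REBASE_HEAD reference mentioned in the comments assumes a reasonably recent Git.

```shell
#!/bin/bash
# Sketch only: upstream_changes is a hypothetical helper, not part of Git.
# Shows what upstream changed between the two stock imports, restricted to
# the given files or directories.
upstream_changes() {
    git diff pdh-wiced-3.5.2-stock pdh-wiced-3.6.3-stock -- "$@"
}

# While the rebase is stopped on a conflict:
#   git show REBASE_HEAD              # the original commit being replayed
#   upstream_changes path/to/file.c   # upstream's changes to the same file
```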

If you’ve previously been using our original vendor-branch scheme, the first of those commits will be the worst trouble to integrate, as it will be the single big-bang merge commit from the previous integration. But at least you can console yourself now that future integrations will be no harder than this – as opposed to the previous scheme where they just kept getting harder.

Once you’ve completed the rebase, the repository will look basically as it is on the right. Notice that the rebase operation has “moved” the branch pdh-wiced-remerge so that it’s now on top of the 3.6.3 import; it’s not still where the pdh-wiced-3.5.2-merged tag points to. (Rebasing always moves the branch that you’re rebasing; it never moves any tags.)
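If you’d rather confirm that from the command line than from gitk, something like the following sketch would do it. The name is_based_on is hypothetical; the underlying git merge-base --is-ancestor check just asks whether one commit is reachable from another.

```shell
#!/bin/bash
# Sketch only: is_based_on is a hypothetical helper, not part of Git.
# Succeeds if commit BASE is an ancestor of (i.e. contained in) TIP.
is_based_on() {
    git merge-base --is-ancestor "$1" "$2"
}

# Usage after the rebase:
#   is_based_on pdh-wiced-3.6.3-stock pdh-wiced-remerge && echo "branch now sits on 3.6.3"
#   is_based_on pdh-wiced-3.6.3-stock pdh-wiced-3.5.2-merged || echo "tag did not move"
```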

Now you get to build and test the pdh-wiced-remerge branch; it’s likely that it currently doesn’t even compile, and here is where you make any major changes needed to deal with the new drop. (Minor, textual changes may have already been fixed during the rebase process.) Add any new commits as necessary, on the pdh-wiced-remerge branch, until everything builds, runs, and passes all its tests. This may or may not be particularly arduous, depending on the nature of the changes made upstream. (But either way it’s still less arduous than the same thing would have been with the merging workflow.)

Now all that remains is to merge the pdh-wiced-remerge branch back to master as a normal bow-shaped branch:

git checkout master
git merge --no-ff pdh-wiced-remerge

And you should end up with a repository that looks like the one at left: a bow-shaped branch containing, in order, the re-imported 3.5.2; the newly-imported 3.6.3; the rebased versions of all your previous fixes; and any new fixes occasioned by 3.6.3 itself.

Notice that the replayed fixes leading up to the tag pdh-wiced-3.5.2-merged don’t appear in the branch as finally merged, but that the stock WICED 3.5.2 commit does. This is probably what you want: it’s important to have a commit that denotes only the changes made upstream between releases, but it’s not very important to record the replayed fixes – after all, those commits were generated by a script to start with, so the information in them can’t be that significant.

So now 3.6.3 is merged. And when the next upstream release comes along, you get to repeat the whole process.

But what if several upstream releases have happened since you last took one? There are two options: either you do the above process once, taking the latest release – or, you do it repeatedly, once for each intermediate release. The latter is the better option if widespread changes have occurred, as it lets you fix each conflict separately, rather than all in one go. And in fact if you know you’ve got several upstream releases to merge, you don’t have to bother merging back to master each time (especially if you too have 198 commits on your remerge branch): you can keep branching and rebasing and then merge the final result to master in one go, as seen on the right. The branch, as finally merged, would contain, in order:

  • stock 3.5.2
  • stock 3.6.3
  • stock 3.7.0
  • ... further upstream releases ... (red)
  • fixes from 3.5.2 and earlier (pink)
  • fixes from 3.6.3 (pale blue)
  • fixes from 3.7.0 (mid-blue)
  • ... further sets of fixes ... (green)

All of which means that the tip of that branch would be the latest upstream version, with all relevant fixes applied. Which is what’s needed to be merged.

CC0 To the extent possible under law, the author of this work has waived all copyright and related or neighboring rights to this work.