
Tuesday, 4 December 2018

A rebasing vendor-branch workflow for Git

An earlier post laid out a neat way of managing third-party code, in a subdirectory of a larger Git repository, as a vendor branch. Here at Electric Imp we followed that scheme for the subsequent five years, and bitter experience over that time has now convinced us that it doesn’t work well in all circumstances, and a better plan is needed.

The issue is that essentially all local changes to the repository accrete over time into one massive merge commit. If we’re talking about a sizeable amount of code, and if we’ve made lots of changes to it over time, and if upstream have also made lots of changes to it over time, then re-merging that single merge commit soon becomes horrendous. Basically, the original bow-shaped vendor branch workflow does not scale.

So what to do instead? We need to deconstruct that overwhelming single merge commit; we need, ideally, to move from a merging workflow to a rebasing workflow.

The broad outline of the situation is as follows: some time ago we merged an upstream version of some vendor code, but we’ve been hacking on it since. Meanwhile, upstream has produced a newer version whose updates we’d like to take – while keeping our changes, where relevant.

To achieve this, we’re first going to create a new branch, re-import a stock copy of the current upstream version, and then re-apply our changes, one by one, so that the tip of that branch exactly matches the tip of master.

So far, this sounds like a lot of work to achieve literally nothing! But what we now have, is a branch containing just our changes, in a sequence of individual commits. In other words, it’s exactly what’s needed in order to rebase those commits on top of a newer upstream release. So, starting from our re-import, we create a second new branch, then import a stock copy of the new upstream (so that there’s a single commit containing all upstream’s changes between the two), and then rebase our re-apply branch on top of that.

The upstream source in the example below is the same WICED SDK as used in the previous blog post; note that although the whole WICED project was sold by Broadcom to new owners Cypress in the meantime, many references to “Broadcom” still remain in our code, notably in the directory names. This ticks all three boxes for needing the heavyweight integration process: it’s large – 7,000 source files – we’ve made lots of changes, and upstream have also made lots of changes.

Here’s exactly what to do. First we need to know which commits, since the last time we took an upstream version, changed the vendor directory – here, thirdparty/broadcom. Fortunately, git log is able to tell us exactly that. We need to go back in history (perhaps using gitk’s “find commits touching paths” option) and find the commit where we originally took our current upstream version. In the repo I’m using, that’s ac64f159.

The following command logs all commits that have changed the thirdparty/broadcom directory since then:

git log --format=%H --reverse --ancestry-path ac64f159..master -- thirdparty/broadcom
For me, that lists 198 commits! Because of the --reverse, they’re listed in forwards chronological order: in other words, in the order in which we need to re-apply them on our re-merge branch. Let’s put that list of commits in a file, commits.txt:

git log --format=%H --reverse --ancestry-path ac64f159..master -- thirdparty/broadcom > commits.txt

Now we start a new branch:

git checkout -b pdh-wiced-remerge
...and re-import the current upstream version, here WICED-3.5.2:
rm -rf thirdparty/broadcom
cd thirdparty
7zr x WICED-SDK-3.5.2.7z
cd ..
mv thirdparty/WICED-SDK-3.5.2 thirdparty/broadcom
(clean up the source as necessary, fix CR/LFs etc.)
git add -A thirdparty/broadcom
git commit -m"Import stock WICED 3.5.2"
git tag pdh-wiced-3.5.2-stock

Notice that we tag this commit, as we’ll be referring to it again later.

Now we’d like to replay our long list of local commits on top of that release – to get back, as it were, to where we started. The thing to note here is that we can do this in a completely automated way – there should be no chance of merge conflicts, as we’re re-applying the same commits on top of the same upstream drop. It’s so very automated that we can do each one using a script, which I called apply-just-broadcom:

#! /bin/bash
git checkout $1 -- thirdparty/broadcom
git add -u thirdparty/broadcom
git commit -C $1
This says, first checkout (or, really, apply) the commit named in the script’s first argument – but only where it affects the thirdparty/broadcom directory. Any other parts of the commit aren’t applied. This automatically adds any new files, but it doesn’t delete anything deleted by the commit we’re applying – so we delete those using git add -u. Finally we want to commit those changes to our new branch, but using the commit message they originally had – for which, git commit’s -C option is exactly what we want.

Armed with that script, we can then apply, one-by-one, all the commits we identified before:

git checkout pdh-wiced-remerge
for i in `cat commits.txt` ; do scripts/apply-just-broadcom $i ; done
This will churn through the commits, one by one, re-applying them to the pdh-wiced-remerge branch. Because they’re all getting applied in the same context in which they were originally committed, they should all go in one-after-the-other with no conflicts or warnings. (If they don’t, perhaps your re-import of the old upstream didn’t quite match the original import, so fix that and start again.)

And now we should be, quite circuitously, back where we started, with the tip of the pdh-wiced-remerge branch matching master exactly:

git diff pdh-wiced-remerge master
...which should show no differences at all. What you’d see in gitk is something like the image to the right, showing master with your branch coming off it. And the branch contains the (re-)import commit, then all the work that’s been done since. Scrolled way, way off the top of that screenshot, about 150 commits further up, is the tip of the pdh-wiced-remerge branch.

Optionally but usefully, you can now tidy up the branch to make it easier to apply to the new release. For instance, if the branch contains patches that were later reverted, you can use interactive rebase to remove both the patch and the revert, for the same result but avoiding any chance of the patch later causing conflicts. Doing this should still leave no diff between your branch and master.
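Such a tidy-up can even be scripted rather than done in an editor session. Here’s a sketch: the tag and branch names match those above, but the commit subject “Add experimental patch” is a made-up example, and sed’s in-place flag is the GNU form.

```shell
# Drop a patch-and-revert pair non-interactively. GIT_SEQUENCE_EDITOR
# rewrites the rebase todo list in place of an interactive editor;
# "Add experimental patch" is an invented commit subject.
GIT_SEQUENCE_EDITOR="sed -i '/Add experimental patch/d'" \
    git rebase -i pdh-wiced-3.5.2-stock pdh-wiced-remerge
# One sed expression catches both commits, because the revert's subject
# quotes the original: Revert "Add experimental patch".
```

Because the patch and its revert cancel out, the branch tree after the rebase should be identical to what it was before – which you can confirm with git diff against the old tip.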

Even more optionally, but still usefully, I needed at this stage to rewrite a bunch of the commit messages. Those numbers in square-brackets in the commit summaries, are instructions to our git server to link the commits with the related user-story (or bug) in Pivotal Tracker. (The reason they don’t all have one, is that some of the original commits were themselves on bow-shaped branches, and only the merge commit carried the Pivotal link.) I wanted to remove those links, so that pushing my branch didn’t spam every Pivotal story associated with a thirdparty/broadcom commit, some of which were years old by this stage. But there are tons of them, so I wanted to rewrite the messages programmatically. It’s definitely out-of-scope for this blog post, but I ended up using the powerful and not-to-be-trifled-with git filter-branch command, in conjunction with a tiny sed script:

git filter-branch --msg-filter "sed -e 's/\[\#\([0-9]*\)\]/(\1)/g'" master..pdh-wiced-remerge
This monstrosity rewrites every commit message between master and pdh-wiced-remerge, to replace square-bracket Pivotal links with curved-bracket “links” that don’t trigger the automatic Pivotal integration.

Anyway, that aside, we’re now in a position to merge the new WICED release, upstream version 3.6.3. So let’s tag where we are now, so we can refer to it later:

git checkout pdh-wiced-remerge
git tag pdh-wiced-3.5.2-merged

We want to branch our WICED-3.6.3 branch from the stock 3.5.2 re-import, so that there’s a diff that just shows upstream’s changes between 3.5.2 and 3.6.3. So that’s:

git checkout pdh-wiced-3.5.2-stock
git checkout -b pdh-wiced-3.6.3
And now we can do the same dance as before, removing the old thirdparty/broadcom and replacing it with the new one:
rm -rf thirdparty/broadcom
cd thirdparty
7zr x WICED-SDK-3.6.3.7z
cd ..
mv thirdparty/WICED-SDK-3.6.3 thirdparty/broadcom
(clean up the source as necessary, fix CR/LFs etc.)
git add -A thirdparty/broadcom
git commit -m"Import stock WICED 3.6.3"
git tag pdh-wiced-3.6.3-stock

Again, because this is a complete replacement, there’s no chance of merge conflicts.

Now just the rebasing operation itself remains. Because we tagged all the important points, we can use those tag names in the rebase command:

git rebase --onto pdh-wiced-3.6.3-stock pdh-wiced-3.5.2-stock pdh-wiced-remerge

This will be a lengthy rebasing operation, and merge conflicts are likely to occur. These can be fixed “in the usual way” – this isn’t a process usable by people who aren’t already experienced in fixing rebase conflicts, so I won’t say much more about that fixing here. But note that the rebase operation always tells you which commit it got stuck on – so you can go and look at the original version of that commit to check what its intention was, and also at the pdh-wiced-3.6.3-stock commit, containing exactly upstream’s changes between 3.5.2 and 3.6.3, in the hope that you can glean some insight into what upstream’s intention was.

If you’ve previously been using our original vendor-branch scheme, the first of those commits will be the worst trouble to integrate, as it will be the single big-bang merge commit from the previous integration. But at least you can console yourself now that future integrations will be no harder than this – as opposed to the previous scheme where they just kept getting harder.

Once you’ve completed the rebase, the repository will look basically as it is on the right. Notice that the rebase operation has “moved” the branch pdh-wiced-remerge so that it’s now on top of the 3.6.3 merge; it’s not still where the pdh-wiced-3.5.2-merged tag points to. (Rebasing always moves the branch that you’re rebasing; it never moves any tags.)

Now you get to build and test the pdh-wiced-remerge branch; it’s likely that it currently doesn’t even compile, and here is where you make any major changes needed to deal with the new drop. (Minor, textual changes may have already been fixed during the rebase process.) Add any new commits as necessary, on the pdh-wiced-remerge branch, until everything builds, runs, and passes all its tests. This may or may not be particularly arduous, depending on the nature of the changes made upstream. (But either way it’s still less arduous than the same thing would have been with the merging workflow.)

Now all that remains is to merge the pdh-wiced-remerge branch back to master as a normal bow-shaped branch:

git checkout master
git merge --no-ff pdh-wiced-remerge

And you should end up with a repository that looks like the one at left: a bow-shaped branch containing, in order, the re-imported 3.5.2; the newly-imported 3.6.3; the rebased versions of all your previous fixes; and any new fixes occasioned by 3.6.3 itself.

Notice that the replayed fixes leading up to the tag pdh-wiced-3.5.2-merged don’t appear in the branch as finally merged, but that the stock WICED 3.5.2 commit does. This is probably what you want: it’s important to have a commit that denotes only the changes made upstream between releases, but it’s not very important to record the replayed fixes – after all, those commits were generated by a script to start with, so the information in them can’t be that significant.

So now 3.6.3 is merged. And when the next upstream release comes along, you get to repeat the whole process.

But what if several upstream releases have happened since you last took one? There are two options: either you do the above process once, taking the latest release – or, you do it repeatedly, once for each intermediate release. The latter is the best option if widespread changes have occurred, as it lets you fix each conflict separately, rather than all in one go. And in fact if you know you’ve got several upstream releases to merge, you don’t have to bother merging back to master each time (especially if you too have 198 commits on your remerge branch): you can keep branching and rebasing and then merge the final result to master in one go, as seen on the right. The branch, as finally merged, would contain, in order:

  • stock 3.5.2
  • stock 3.6.3
  • stock 3.7.0
  • ... further upstream releases ... (red)
  • fixes from 3.5.2 and earlier (pink)
  • fixes from 3.6.3 (pale blue)
  • fixes from 3.7.0 (mid-blue)
  • ... further sets of fixes ... (green)

All of which means that the tip of that branch would be the latest upstream version, with all relevant fixes applied. Which is exactly what needs to be merged.
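As commands, chaining one further release looks like this – version 3.7.0 and the tag names are illustrative, following the pattern established above:

```shell
# Chain a further upstream release without merging to master in between.
# Branch from the previous stock import, so the new commit contains
# exactly upstream's changes between the two releases:
git checkout -b pdh-wiced-3.7.0 pdh-wiced-3.6.3-stock
rm -rf thirdparty/broadcom
# ...unpack stock 3.7.0 into thirdparty/broadcom, clean up as before...
git add -A thirdparty/broadcom
git commit -m"Import stock WICED 3.7.0"
git tag pdh-wiced-3.7.0-stock

# Move the accumulated fixes up onto the new stock import:
git rebase --onto pdh-wiced-3.7.0-stock pdh-wiced-3.6.3-stock pdh-wiced-remerge

# Repeat for each further release; only then merge the lot:
git checkout master
git merge --no-ff pdh-wiced-remerge
```

Each rebase moves the same pdh-wiced-remerge branch one release further forward, so the final merge to master carries everything in one bow.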

Friday, 14 September 2018

Enable and disable methods are an anti-pattern

Here's a C++ API for an abstracted I2C master peripheral, such as might be found on a microcontroller – an API which I invented:

class I2C
{
public:
    virtual ~I2C() {}

    virtual void setClockSpeed(ClockSpeed speed) = 0;
    virtual void enable() = 0;
    virtual void disable() = 0;
    virtual ssize_t read(uint8_t target, const uint8_t *subAddress,
                         size_t subAddressBytes, uint8_t *buffer,
                         size_t bufferBytes) = 0;
    virtual ssize_t write(uint8_t target, const uint8_t *message,
                          size_t messageLength) = 0;
};

The details of how an I2C bus actually works aren't so relevant to the current issue, so I've removed the comments. What's interesting here is what the calling code needs to look like: the caller has to set the desired clock-speed, enable the peripheral, read and/or write using it, and then disable it again.

This code has been happily chugging away inside Electric Imp devices of various types for about seven years now. Which is why it comes as such a surprise to me that I now think it's terrible.

The realisation came after spending some time learning the Rust language, which trained me into caring a lot about object lifetimes – because the compiler actually rejects code containing use-after-free errors and whole swathes of further bugs of that sort. I've seen bugs that we diagnosed via telemetry reports from crashing units in production, which would have been caught as compile-time errors in Rust. So Rust is great, once you've got your head round it; but in fact it's so great that I now find myself writing better C++ because of the experience of writing some Rust.

From an object-lifetime point-of-view, the API listed above is error-prone and so undesirable. It's too easy for client code to get things wrong: calling read() before you've called enable(), calling read() after you've called disable(), or calling setClockSpeed() – er, wait, when should you call setClockSpeed()? Is it only allowed before enable? Or is it only allowed afterwards? Or can it be in either position? What guarantees on that point are offered by implementations of this interface?

An act of API design should be an act of humility; you should, to a certain extent, be aiming to make other people's lives easier at the expense of making your own life harder. I failed to do that here. Offering a read() method that just returns ERR_NOT_ENABLED if you haven't called enable() yet, is just leading callers down the garden path. That ordering should be expressed in the API, so that, in the Rust style, it's a compile-time error to get it wrong (or alternatively, in another situation familiar to Rust programmers, so that the error simply isn't expressible in the first place).

And how can that be arranged? By not giving the caller access to the read() method until they've called the enable() method. What if we remove the read() and write() methods from class I2C altogether, and put them in a "handle" class which callers can't get hold of except by enabling the peripheral? This ends up looking like the API below:

class EnabledI2C
{
public:
    virtual ~EnabledI2C() {}

    virtual ssize_t read(uint8_t target, const uint8_t *subAddress,
                         size_t subAddressBytes, uint8_t *buffer,
                         size_t bufferBytes) = 0;
    virtual ssize_t write(uint8_t target, const uint8_t *message,
                          size_t messageLength) = 0;
};

class I2C
{
public:
    virtual ~I2C() {}

    virtual std::unique_ptr<EnabledI2C> open(ClockSpeed cs);
};

The disable() method has disappeared altogether; when the calling code has finished with its I2C controller, it can just reset or destroy the unique_ptr, and the destructor of EnabledI2C (or rather, the destructors of its subclasses) can do the work of disable(). Likewise, the work of enable() can be done in the constructor, bringing some RAII to the situation and making the implementations, as well as the calling code, less error-prone.

Without a method called "disable", "enable" looked like the wrong name, so it became "open". And this refactoring has also cleared up the ambiguity around setClockSpeed(); in fact, one implementation of this I2C interface allowed it to be called at any time, even in-between reads, but in another implementation it only took effect at enable() time. In practice, the calling code all expected it to work in the latter way, so we can codify that directly into the new API by incorporating it into the "enable" ("open") call itself, so that it's impossible to call it too early or too late.

And again but in C

C++'s support for RAII definitely adds both neatness and robustness to the code here. But part of the principle – that where API calls must be ordered or nested in order to work properly, that ordering or nesting should be enforced by the code rather than meekly whined-about by its documentation – applies even in a C version of the same API:

typedef struct I2C I2C;               // opaque
typedef struct EnabledI2C EnabledI2C; // opaque

EnabledI2C *I2C_OpenEnabled(I2C *i2c, unsigned clockSpeed);

ssize_t EnabledI2C_Read(EnabledI2C *ei2c, uint8_t target,
                        const uint8_t *subAddress,
                        size_t subAddressBytes, uint8_t *buffer,
                        size_t bufferBytes);
ssize_t EnabledI2C_Write(EnabledI2C *ei2c, uint8_t target,
                         const uint8_t *message, size_t messageLength);
void EnabledI2C_Close(EnabledI2C **pei2c);

Without RAII, the "close" corresponding to the "open" must be explicit, but at least it's still not possible for the client to call things in the wrong order.

And again but in Squirrel

This part is a bit of a cheat; the Electric Imp Squirrel API isn't actually implemented in Squirrel; rather, it's a set of bindings from Squirrel to the C++ code in the improm. But even so the same pattern can apply; here's (the native-Squirrel equivalent of) what it currently looks like:

class I2C {
    function configure(clockSpeed) { ... }
    function read(deviceAddress, registerAddress, numberOfBytes) { ... }
    function write(deviceAddress, registerPlusData) { ... }
    function disable() { ... }
};

This API is used by our customers left, right, and centre, and we can't ever change it. (We could extend it, perhaps.) But if we had our time over again, I'd have done it like this:

class I2C {
    function open(clockSpeed) { ... return EnabledI2C(...); }
};

class EnabledI2C {
    function read(deviceAddress, registerAddress, numberOfBytes) { ... }
    function write(deviceAddress, registerPlusData) { ... }
};

As in C++, the disable() call has disappeared, because it's all done in the destructor of the EnabledI2C object, which is called when the Squirrel reference-counting system determines that the object is no longer being used. The calling code might look like this:

local e = hardware.i2cAB.open(100000);
e.read(...);
e.write(...);
e = null;

Alternatively, it could just rely on e going out of scope at the end of the function, instead of explicitly setting it to null. But the cheating here has intensified: native Squirrel does not include destructors, and so doesn't allow for RAII. However, Squirrel objects that are bindings to other languages – such as the C++ bindings used by the Electric Imp API – can have destructors using Squirrel's sq_setreleasehook mechanism, and so can be used with RAII.
