Whitebait, Kleftiko, Ekmek Special

Not in fact any relation to the famous large Greek meal of the same name.

Monday, 2 December 2024

Things I learned writing a USB host stack in Rust for RP2040

Previously on #rust:

“I have of late – but wherefore I know not –” had it occur to me to get to know USB a bit better by implementing USB host from scratch. At a former employer, where an embedded USB host stack was going to be needed, I remember one grizzled developer saying to me, “Why are we licensing one in? Just read the spec and write one!” The company licensed one in (briefly, before canning the entire project) and, speaking as someone who has, over the past month or so, just read the spec and written one: they were probably right to do so.

But for learning about USB – or, put differently, when it’s the journey that’s important, not the destination – writing a USB host stack turns out to be not a completely ludicrous thing to attempt, especially with simplifying assumptions such as USB1.1 and 2.0 only, and no power-delivery or battery-charging. This blog post is the origin story of the cotton-usb-host crate, and details some of the things I learned when writing it.

Wiring-up async/await to IRQ handlers

I won’t repeat here the good bits of this excellent article which finally cleared up for me how the very high-level concept of awaiting on a Future meets up with the very low-level concept of an IRQ handler (and which you should just read), but I’ll give the gist of it.

A Future in Rust gets polled, and when that happens it can either detect that the thing it’s waiting for has happened – and return Ready – or it can return Pending. The key thing it must do, is that if it does return Pending, it needs to set in motion events that will cause the calling task to be reawoken when the Future’s status has changed. Most Future tutorials deal with networking servers of various sorts, where the task gets reawoken when a file descriptor becomes readable; in the embedded world, things are a bit lower-level, and the impetus to resume probably comes in the form of a hardware interrupt, handled by an IRQ handler routine.

The way that these are tied together, is that Future::poll gets passed a Context object, which contains a Waker object. Waker is Clone, Send, and Sync, so it can be freely passed through the system to where it’s needed: in this case, most likely stored in a static variable so that it can be accessed from the IRQ handler. So this is exactly what the Future::poll method for, say, Rp2040ControlEndpoint does. The RTIC 2 framework provides a wordily-named but vital component here, CriticalSectionWakerRegistration. This object is capable of safely storing a Waker using internal mutability, making it ideal for keeping in a global static. (It’s also completely independent of the rest of RTIC, so it’s just as usable in Embassy-based applications.)

A bit of care is needed to avoid race conditions – interrupts slipping through the gaps and going unanswered – but the general pattern is:

fn poll() {
    if ...hardware condition is met... {
        Poll::Ready(result)
    } else {
        ...store the waker away...
        ...ensure the interrupt is enabled...
        Poll::Pending
    }
}

Then all the IRQ handler has to do, is figure out (by inspecting peripheral registers) exactly what caused the IRQ, i.e. which of the various successes and error conditions has occurred, and call wake() on the correct Waker. In cotton-usb-host, you can see this mechanism in action in UsbShared::on_irq().

Again a small amount of care is needed to avoid both lost interrupts (when the interrupt is accidentally not enabled at the time it actually occurs) and interrupt storms (when the IRQ handler doesn’t acknowledge or clear the hardware interrupt, so it’s immediately re-entered and nothing gets done). It’s vital that the IRQ handler, and not Future::poll(), acknowledges the interrupt: Future::poll() is not run in IRQ context, but in normal thread context, and if the IRQ handler keeps getting re-entered then Future::poll() will never get to run! The interrupt handler for RP2040 USB does its best to acknowledge interrupts without disabling them, but for those which cannot be acknowledged without consuming important data, it just disables them in the interrupt-enable register and relies on the corresponding Future::poll() to re-enable the ones it needs, as in this pattern:

fn irq_handler() {
    if ...hardware condition is met... {
        waker.wake();
        ...acknowledge interrupt if possible...
    }
    ...disable any interrupts that are still pending...
}

Unit tests of async code needn’t themselves be async

This is something I wish I’d learned before writing some of the cotton-netif and cotton-ssdp unit tests. Asynchronous Rust exhibits something a bit like wave/particle duality, which in this case might be called async/poll duality. (The duality is so close that you can define async methods in traits, and then in the impl block write them as normal functions returning impl Future – or vice versa.) You can unit-test asynchronous code “in async-land” with the #[tokio::test] attribute and explicit awaits, but you can also test it “in poll-land” (biało czerwoni 🇵🇱) by observing that under the hood, the result of making a non-async call to an async function isn’t a compiler error or any sort of magic woo, it’s just a Future (or perhaps a Stream), and you can quite happily call the async function from non-async code and then just poll the future manually to drive it through its states to complete the test.

Typically, to get good unit-test coverage, unit tests of async code come in threes: the case where poll succeeds, the case where it fails, and the case where it pends. But all three can be tested in non-async tests; here’s a unit test from UsbBus that checks that UsbBus::set_address returns pending when the underlying USB transfer returns pending:

#[test]
fn set_address_pends() {
    do_test(
        |hc| {
            hc.expect_control_transfer()
                .times(1)
                .withf(is_set_address::<5>)
                .returning(control_transfer_pending);
        },
        |mut f| {
            let mut fut = pin!(f.bus.set_address(unaddressed_device(), 5));
            let poll = fut.as_mut().poll(&mut f.c);
            assert!(poll.is_pending());
            let poll = fut.as_mut().poll(&mut f.c);
            assert!(poll.is_pending());
        }
    );
}

I should mention do_test(): when writing the unit-tests for UsbBus (there are about 100), I found there was a lot of boilerplate which kept getting repeated. So I abstracted it away into a function, leaving two closures to represent the unique bits of each test: one to set up the expectations (using the excellent Mockall crate), and then one to run the test (using a Fixture struct containing everything it might need) and verify the outcomes:

fn do_test<
    SetupFn: FnMut(&mut MockHostControllerInner),
    TestFn: FnMut(Fixture),
>(
    mut setup: SetupFn,
    mut test: TestFn,
) {
    let w = Waker::from(Arc::new(NoOpWaker));
    let mut c = core::task::Context::from_waker(&w);

    let mut hc = MockHostController::default();

    setup(&mut hc.inner);

    let f = Fixture {
        c: &mut c,
        hub_state: HubState::default(),
        bus: UsbBus::new(hc),
    };

    test(f);
}

This cleaned the tests up so much, that I’ve started adopting this pattern in other crates too.

Rust unit tests don’t have to be in the same file

The central UsbBus structure in cotton-usb-host has about 700 lines of code in its implementation (according to test coverage); it also has about 3,500 lines of unit tests. Keeping both in the same file, which is a pattern encouraged by Rust, made it unwieldy to read the implementation or to search for things in it. But that pattern isn’t compulsory: the tests need to be a submodule of the usb_bus module (in order to have white-box-test access to the innards of UsbBus), but submodules can always be in separate files. The alternative pattern, which I adopted, is to put the tests – not just for usb_bus.rs, but for every module – in a tests/ subdirectory; the file src/usb_bus.rs ends like this:

#[cfg(all(test, feature = "std"))]
#[path = "tests/usb_bus.rs"]
mod tests;
and all the unit tests are in src/tests/usb_bus.rs.

This has the secondary advantage that, for some reason, code-coverage tools ignore the coverage of files in a tests directory – meaning that you don’t get annoying region-coverage misses for test code (which you don’t care about) bringing down your average for production code (which you do care about).

USB is a bus, and USB hubs are hubs (not switches)

Sometimes the clue is right there in the name. But, being a perennial encapsulator, an ardent hider of messy details from client code, I’d initially hoped to implement USB hubs as self-contained entities that each operate independently from the rest of the bus.

But that’s just not how USB works. In particular, when a new USB device is detected and in the Default state, it’s listening on address zero (and is surely expecting to immediately receive a SetAddress request to give it a proper unique address in the range 1-127). And, well, USB is a bus. So (give or take some considerations about operating at different speeds), all devices receive all packets and simply ignore the ones that aren’t addressed to them. Which means that only one device on the whole bus can be in the Default state (listening as device zero) at any one time – which, in turn, means there needs to be a bus-wide hub state machine that ensures that no port on any hub can be reset while any port on any other hub is being reset. This is workable, but does slightly offend my sense of proper encapsulation: it could have been avoided if SetAddress (which is the only thing that devices in the Default state need to receive) had instead been a port-directed message to the hub, rather than a message to the device itself.

(How the bus-wide state machine works in UsbBus::device_events is a bit subtle: once a hub interrupt packet has been received – indicating the presence of a newly-detected device – that newly-detected device is processed from Powered to Default all the way to Address without returning to the top of the function to check for fresh interrupt packets.)

The zerocopy crate is bad for code-coverage, but bytemuck is better

Both bytemuck and zerocopy set out to solve the same problem: to make it easier to soundly convert between Rust #[repr(C)] structures and byte slices. Such conversions are commonplace in cases where structured data is transferred over a byte-stream connection: networking, of course, but also USB. Jack Wrenn, one of the maintainers of zerocopy, wrote an excellent summary of why this matters, and what subtleties are lurking in that seemingly straightforward constraint of “soundness”. (That post also introduces a third solution, TransmuteFrom, currently in nightly and hoping one day to land in the standard library.)

Unfortunately, zerocopy’s marker traits work, in part, by having a hidden trait function which is implemented in the auto-derived implementations of those traits. This function is never called – it’s just there to police against non-auto-derived implementations, which the zerocopy soundness guarantees rely on – and so it appears as untested in code-coverage analysis. The alternative bytemuck crate does not use this mechanism, and so does not pollute code-coverage analysis with unfixable false negatives.

Many heads are better than one

Especially, it turns out, when they’re the heads of receive queues. As well as the RP2040 host-controller implementation in cotton-usb-host, I’ve been looking at supporting the STM32 USB host peripheral – specifically, the Synopsys DWC one in STM32F7 and other “F-series” STM32 parts.

And actually the Synopsys one is decidedly more awkward to deal with. On RP2040, interrupt endpoints are polled in hardware (you tell it the polling interval in milliseconds/SOFs), and if a packet arrives, it appears in a memory-mapped buffer specific to that endpoint. On STM32F7, however, there is a single receive queue for all endpoints, plus a register that tells you which endpoint the current head-of-queue packet was received on.

That’s actually a bit unfortunate, as it means that, for interrupt endpoints especially, the application must be prepared to receive data on any endpoint at any time – if a task opens an interrupt endpoint but then waits on anything else happening, a packet arriving on that interrupt endpoint will jam all USB traffic until somebody gets round to reading it. That feels like an unacceptable constraint on how clients of cotton-usb-host must be written – and indeed, cotton-usb-host’s own USB hub support would fall foul of such a restriction. So it’s likely that the implementation will have to read the packet queue in the IRQ handler and store any interrupt packets somewhere in RAM (in global statics) to be read later, just to unblock the queue so that subsequent packets can be read and tasks can make progress.

The USB peripheral in STM32H5 and STM32C0 is a different one, which does have per-endpoint receive buffers and so doesn’t suffer from head-of-line blocking problems; it should be easier to deal with, especially in RAM-constrained systems.

There isn’t an ArrayOfOptionStream stream, but you can just write one

The USB hub support in cotton-usb-host needs to listen out for interrupt-endpoint packets from all hubs on the bus simultaneously. The futures-util crate contains several functions for composing streams and futures into other streams and futures, including select() and select_all(), but none quite fit the bill (select_all() comes closest, but requires alloc, which nothing else in cotton-usb-host does).

But, in a similar epiphany to the one I had about unit-tests, I realised that by working in “poll-land” I could just write one. (And after all, someone just wrote select_all(); there’s no magic compiler support for it.) So cotton-usb-host now includes a HubStateStream, containing an array of Option<Stream>, whose poll_next method just polls all the array elements which are Some.

Except that I got halfway through writing it and realised that I couldn’t. In order to call poll_next on the contained Stream, it needed to be a Pin<&mut Stream>. But even though, inside its own poll_next, HubStateStream must surely itself be pinned – meaning that its directly-contained streams must implicitly also be pinned – the compiler didn’t know that. This problem is referred to as “pin projection”, and there are crates that help deal with it, but again none seemed to quite fit the bill. And then I realised that select_all() and friends impose an additional constraint on the streams they work with: they must be Unpin, which means they support the Stream::poll_next_unpin() method, which can be called on a plain &mut Stream. Just so long as the interrupt-endpoint pipes I was using could be made Unpin (by declaring in trait HostController that associated type InterruptPipe must have Unpin as a supertrait), I wouldn’t need any pin-projection crates.

Not every Stream, of course, is Unpin. A Stream that is the result of an async block (with an anonymous type) might not be Unpin, as such streams typically embed internal pointers – but streams of named types, written in poll-land, are usually Unpin, and indeed the one which I’d already written for RP2040 already was. So it didn’t seem inappropriate to impose that constraint on other host-controller implementors.

Monday, 9 September 2024

Rustifying Lakos: Avoiding circular dependencies in Rust

Previously on #rust:


It would be unfair to John Lakos to gloss his monumental book Large-Scale C++ Software Design as basically just saying “Don’t have circular dependencies”. The book contains lots of hard-earned useful advice on the architecting of C/C++ systems, and if the words that make up its title, large-scale C++ software design, form any part of your daily life then I can only suggest that you just go and read it. (I... won’t wait. It’s a hefty tome. I confess I haven’t read any of his ongoing Peter-Jackson-esque project to expand it across three volumes.)

But it is fair to say that much of it consists of very C++ solutions to very C++ problems, and by no means all of it is relevant to large-scale software systems in other languages. Amongst all the pieces of good advice imparted, though, “Don’t have circular dependencies” is perhaps the most widely-applicable.

Rust’s cargo package manager effectively rules out circular dependencies between crates (i.e., libraries). But there’s no such built-in barrier to creating circular dependencies between modules within a crate. This blog post explains the third-party tooling that can be used to erect such a barrier.

But why, though?

Some of the benefits of non-circularity, or levelisability, lauded in the Lakos book apply more to C++ than to Rust: compilation-speed improvements, for instance, where in C++ the translation unit is the source file but in Rust it is the crate. But others still very much apply:

  • Integrity of structural induction: By “structural induction” I mean the idea that unit-testing can be seen as proving that your code is correct; the “proofs” in question aren’t always quite as rigorous as what a mathematician would recognise as a proof, but mathematics does provide a useful structure and language for reasoning about these “proofs”. For instance, a mathematician would say, if you’ve proven that certain pieces of code A and B do their jobs correctly – and that another piece of code C(x,y) does its own job correctly, just so long as the dependencies x and y that it gets given each do their own jobs correctly – then all told, you’ve proven that the overall combination C(A,B) is itself correct. With this mental model, you can use unit-testing to go about making a tree of proofs: first that each lowest-level component in your system does its job correctly, then that the components one level higher (with lowest-level components as dependencies) do their jobs correctly, and so on all the way up to your system as a whole. If your system’s dependency “tree” isn’t a tree at all but has loops or cycles, this proof falls to pieces – or, at the very best, you have to awkwardly test the whole loop of components as a single entity.

    If you didn’t do Test-Driven Development (perhaps you did some “Spike-Driven Exploration” first), and are only just now adding unit-tests to an existing crate, then this is exactly the information you need in order to direct your test-writing efforts where they’re most useful: write tests for the leaves of the dependency graph first, which are usually the foundational parts where failings would cause hard-to-understand cascading failures throughout the crate, then work your way up to the more complex modules.

  • Reusability: Since the days of CORBA and COM, efforts to foster a wide ecosystem of reusable software components have either failed, or succeeded in ways that were worse than failure. But reusability within an organisation or a project is still important, and it’s eased when components can be cleanly separated, without false dependencies on huge swathes of the surrounding code. (Extra “extremely online” points if you guessed what story that link would lead to before clicking on it.)
  • Ease of understanding: Just like an induction proof or like a reuse attempt, any attempt to understand the operation of the code – by any “new” developer, which includes future-you-who’s-forgotten-why-you-did-that – is easier if it can proceed on a component-by-component basis, as opposed to needing a whole interlinked cycle of components to be comprehended in one go.
  • Cleanliness, and its proximity to godliness: As I wrote on this very blog more than thirteen years ago, cyclic dependencies are a “design warning” – like a compiler warning, but for your design instead of your implementation. And just like a compiler warning, sometimes there’s a false positive – but keeping your code warning-free is still helpful, because it makes sure that when a true positive crops up, it’s breaking news, and isn’t lost in the noise.

But how, though?

#!/bin/bash -xe
#
# usage:
#    bin/do-deps
#    (browse to target/deps/index.htm)

PACKAGES=`cargo metadata --no-deps --format-version 1 | jq '.packages[].name' --raw-output`

mkdir -p target/deps
echo "" > target/deps/index.htm

for PKG in $PACKAGES ; do
    echo "<img src=$PKG-deps.png><pre>" >> target/deps/index.htm
    cargo modules dependencies --package $PKG --lib \
          --no-externs --no-sysroot --no-fns --no-traits --no-types \
          --layout dot > target/$PKG-deps.dot
    sed -E -e 's/\[constraint=false\]//' -e 's/splines="line",//' \
        -e 's/rankdir=LR,//' \
        -e 's/label="(.*)", f/label="{\1}", f/' \
        < target/$PKG-deps.dot > target/$PKG-deps2.dot
    tred < target/$PKG-deps2.dot > target/$PKG-deps3.dot 2>> target/deps/index.htm
    dot -Tpng -Gdpi=72 < target/$PKG-deps3.dot > target/deps/$PKG-deps.png
    echo "</pre><hr/>" >> target/deps/index.htm
done

First of all, the (built-in to Cargo) cargo metadata command is used to list all the crates in the current workspace (usefully, if invoked in a non-workspace crate, it just returns that crate’s name).

Then the very excellent cargo modules command does most of the work. I’m only using a tiny part of its functionality here; you should read its documentation to learn about the rest, which might be of especial interest if you’d like to model your project’s dependencies in a different way from that presented here. That page also describes how to install it (but it’s just “cargo install cargo-modules”).

By default, cargo modules shows all the individual functions, traits and types that link the dependency graph together; for all but the smallest project, that’s too much clutter for these purposes, as we just want to see which modules depend on which other ones. Temporarily removing the --no-fns --no-traits --no-types arguments will show the whole thing, which can be helpful if it’s not obvious exactly why one module is reported as depending on another. (Currently, cargo modules has some issues around reporting impl X for Y blocks and around static data items – but issues caused by those are hopefully rare, and it’s still enormously better than nothing.)

The sed command has the effect of making the dependency graph run top-to-bottom rather than left-to-right; that seemed easier to read, especially in the blog-post format used here.

Each graph-definition (*.dot) file is drawn as a PNG image, and a simple HTML page is generated that just inlines all the images. This script is run in CI and the HTML page (with its images) is included in the output artifacts of that job.

Before and after

The (as yet unpublished) cotton-net crate exhibited several dependency cycles when analysed with that script; in this image there are three upward-pointing arrows, each indicating a cycle. (The colours represent visibility: almost everything here is green for “public visibility”.)

warning: %1 has cycle(s), transitive reduction not unique
cycle involves edge cotton_net::arp -> cotton_net

The graph can’t, of course, really indicate which components should be seen as “at fault” here – though, for each cycle, at least one of the components in that cycle must be changed for the cycle to be broken. Instead, determining which of the components is “at fault” is a very human endeavour, usually involving the train of thought, “Wait, why does thing-I-think-of-as-low-level depend on thing-I-think-of-as-higher-level?” In this particular case, some very low-level abstractions indeed (IPv4 address, MAC address) were in the crate root “for convenience”; the convenience factor is real but it can be achieved in a levelisable way by defining everything in its right place, and then exporting them again from the top-level (“pub use ethernet::MacAddress”) for library-users’ convenience. (Such exports of course count as a dependency of the root on the low-level module, which is the direction we don’t mind.)

Sometimes a cycle can be broken by separating out a trait from a contentious structure: in this case, the new interface_data trait was extracted from interface, which meant that all the individual protocols (TCP, UDP) could depend on the trait, not on the implementation. This is a bit like Lakos’s “Dumb Data” refactoring from “Large Scale C++ Software Design”, section 5.5; indeed, the whole of Chapter 5 of that book presents refactorings that can help disentangle cycles in Rust just as well as they can in C++.

Because of the way Graphviz’s tred tool works – indeed, because of the way the mathematical “transitive reduction” operation inherently works – it’s not actually very important how many cycles appear in the graph, only whether there’s some or none. In this case, as well as the three cyclic dependencies via the crate root, the modules interface and tcp depended directly on each other in a pathologically-small cycle; the graph doesn’t depict the tcp→interface arrow directly, as it’s implied by the larger cycle via tcp→ethernet→root→dhcp→interface. So in fact there were more than three issues with the original cotton-net (which, in my defence, was one of the first things I tried to write in Rust; some of the commits in there are over eight years old) – the fixing of dependency cycles is an iterative process, and each time you fix one you should re-check for the existence of further ones until there are none left.

Saturday, 10 August 2024

How to recover from ill-advisedly installing gcc-13 on Ubuntu 22.04


I thought I’d try compiling Chorale with gcc-13, which is newer than the default gcc-12 version in Ubuntu 22.04 – compiler diversity drives out bugs, and all that. Easy peasy, there’s a “toolchain testing” repository that packages gcc-13 for 22.04. Adding it was as straightforward as:

 sudo add-apt-repository ppa:ubuntu-toolchain-r/test
 sudo apt install g++-13
and, with a couple of new warnings fixed, everything compiled with gcc-13 as fine as ever. So I installed gcc-13 on the CI host in the same way, and pushed the branch.

And all the builds failed.

It turns out that installing gcc-13 updates the libstdc++ headers – to ones which neither the default clang-14 nor clang-18 is able to compile. I thought about making clang use (LLVM’s) libc++ instead of (GCC’s, the default) libstdc++ – but you can’t have both libc++ and libstdc++ installed in parallel, because they disagree about who owns libunwind. Oddly, libgstreamer depends directly on libstdc++’s libunwind, so installing libc++’s libunwind instead would uninstall GStreamer. That’s not really on.

So there was nothing else to do but uninstall gcc-13 again. But unwinding that installation process was not straightforward: as well as the gcc-13 and g++-13 packages themselves, which live happily alongside gcc-12 and g++-12, the upgrade also upgraded the single, system-wide copies of several foundational packages such as libgcc-s1. Undoing all those upgrades was necessary too: partly because Clang unconditionally uses the highest installed version number of libstdc++ to locate its headers, but also because having these packages stuck at 13.x would mean never getting Ubuntu security updates for them ever again (updates which would have 12.x version numbers).

Somewhat naively I imagined that just un-adding the repository might solve that:

sudo apt-add-repository -r ppa:ubuntu-toolchain-r/test
but even after apt update the hoped-for, “You’ve got all these packages installed that match none of my repositories, shall I replace them with pukka ones?” message abjectly failed to appear. Nor does apt install pkg or apt upgrade override a newer locally-installed package with an older one from a repository. (Which I grudgingly admit is usually the correct thing to do.) And completely uninstalling these foundational packages, even momentarily, seemed like a bad idea – especially as this is actually Kubuntu, and the entire desktop is in C++ and links against libstdc++.

I ended up manually searching for packages using dpkg -l – fortunately, all the packages installed alongside gcc-13 shared its exact version number (13.1.0), which I could just grep for.

Each one of those packages needed to be replaced with the corresponding Ubuntu 22.04 version, which is indicated by using Ubuntu release codenames – for 22.04, that’s “Jammy”:

sudo apt install libgcc-s1/jammy
... etc., for about 30 packages

But worse, some of these packages depended on each other, so ideally there’d be just one apt install command-line. I ended up doing this:

 sudo apt-add-repository -r ppa:ubuntu-toolchain-r/test
 sudo apt remove g++-13 gcc-13
 sudo apt install `dpkg -l | fgrep 13.1.0 | grep -v -- -13 | sed -E 's/[ \t]+/\t/g' | cut -f 2 | sed -e 's,$,/jammy,'`
 sudo apt remove cpp-13 gcc-13-base
The long shell substitution in the install line uses dpkg to list all packages, then filters on the version 13.1.0, then filters out packages with “-13” in the actual name (which won’t have 22.04 equivalents), then uses sed and cut to extract just the package name from the verbose dpkg output, then adds the trailing /jammy and passes all the results as arguments to a single apt install invocation.

The apt install command politely told me exactly what it was going to do – downgrade those packages, no more or less than that – so it wasn’t stressful to give it the Y to proceed.

Tuesday, 21 May 2024

The Austin Test: A software checklist for silicon vendors

^photo by John Robert Shepherd under CC-BY-2.0
Silicon chips are usually made by chipmakers[citation needed]. These are companies largely staffed by chip designers — who often, quite naturally, view the chip as the product. But a more holistic view would say that the product is the value that the chip provides to their customer — and for all but the simplest passive parts, that value is provided partly through software.

Software design is a related but decidedly different skill-set to chip design. That means that the difference between silicon products succeeding or failing, can be nothing to do with the merit or otherwise of the chip design itself, but instead down to the merit or otherwise of the accompanying software. A sufficiently large chip buyer, aiming their own product at a sufficiently lucrative market, can often overcome poor “developer experience” by throwing internal software engineering at the problem: for a product that eventually sells in the millions, it can indeed be worth shaving cents off the bill-of-materials cost by specifying a chip that’s cheaper but harder-to-use than the alternative, even when taking into account the increased spending on software development.

But if you’re a chipmaker, many of your customers will not be in that position — or may accept that they are, but still underestimate the software costs, and release the product in a bug-ridden state or not at all — and ultimately sell fewer of their own products to end-users and thus buy fewer of your chips.

So the product isn’t finished just because the chip itself is sitting there, taped-out and fabbed and packaged and black and square and on reels of 1,000. For any complex or high-value chip — a microcontroller, for instance — the product is not complete until there’s also a software story, usually in the form of a Software Development Kit (SDK) and accompanying documentation. But a chipmaker staffed unremittingly and at too high a level with only chip-design experts may not even, corporately, realise when the state of the art in software has moved on. So here is a checklist — in the spirit of the famous “Joel Test” — to aid hardware specialists in assessing the maturity of a chipmaker’s software process; I’ve named it after the home-town of a chipmaker I once worked for.

The Austin Test

  1. Could your SDK have been written using only your public documentation?
  2. Is your register documentation generated from the chip source?
  3. Are the C headers for your registers generated from the chip source?
  4. Are your register definitions available in machine-readable form (e.g. SVD)?
  5. Is your pinmux definition available in machine-readable form?
  6. Are all the output pins tristated when the chip is held in reset, and on cold boot?
  7. Is it straightforward to get notifications when your public documentation
    changes (e.g. new datasheet revision released)?
  8. Is it straightforward to use your SDK as a software component in a larger system?

1. Could your SDK have been written using only your public documentation?

Sometimes when the silicon comes back from the fab, it doesn’t work quite the way the designers expected. There’s no shame in that. And quite often when that happens, there’s a software workaround for whatever the issue is, so you put that workaround into your SDK. There’s no shame in that either. But if the problem and the workaround are not documented, you’ve just laid a trap for anyone who’s not using your entire SDK in just the way you expected. (See Questions 4 and 8.)

Perhaps your customer is using Rust, or Micropython. Perhaps they have a range of products, based on a range of different chips, of which yours is just one. If there’s “magic” hidden in your C SDK to quietly work around chip issues, then those customers are going to have a bad time.

(The original Mastodon post of which this blog post is a lengthy elaboration.)

2. Is your register documentation generated from the chip source?

I’m pretty sure that even the chipmakers with the most mature and sensible software operations don’t actually do this: they don’t have an automated process that ingests Verilog or VHDL and emits Markdown or RTF or some other output that gets pasted straight into the “Register Layout” sections of their public documentation. (You can sort of tell they don’t, from the changelogs in reference manuals.) But it’s the best way of guaranteeing accuracy — certainly superior to having human beings painstakingly compare the two.

Because I’ve worked at chip companies, I do realise that one reason not to do this is because not everything is public. Because silicon design cycles are so long, and significant redesign is so arduous, what chipmakers do is speculatively design all manner of stuff onto the chip, then test it and only document the parts that actually work or that they can find uses for. Sometimes this is visible as mysterious gaps in memory maps or in peripheral-enable registers; sometimes it’s less visible. I can personally vouch that there is a whole bunch of stuff in the silicon of the DisplayLink DL-3000 chip that has never actually been used by the published firmware or software. But this is easily dealt with by equipping the automated process with a filter that just lets through the publicly-attested blocks. It’s still a win to have an automated process for the documentation of just those blocks!

3. Are the C headers for your registers generated from the chip source?

This is again essentially the question, do you have a process that inherently guarantees correctness, or do you have human employees laboriously curate correctness? The sheer volume of C headers for a sophisticated modern microcontroller can be enormous, and if it’s not automatically-generated then you have only your example code — or worse, customers building their own products — to chase out corner cases where they’re incorrect.

Really these first three questions are closely interlinked: if you find that you can’t write your SDK against only publicly-attested headers, that should be a big hint that you’ve filtered-out too much: that your customers won’t be able to write their own code against those headers either.

4. Are your register definitions available in machine-readable form (e.g. SVD)?

ARM’s SVD (“System View Description”) format was created as a machine-readable description of the register layout of Cortex-M-based microcontrollers, for the consumption of development tools — so that a debugger, for instance, could describe “a write to address 0x5800_0028” more helpfully as “a write to RTC.SSR”. But the utility of such a complete description is not limited to debuggers: in the embedded Rust ecosystem, the peripheral access crates or PACs that consist of the low-level register read and write functions which enable targeting each specific microcontroller — a sort of crowd-sourced SDK — are themselves generated straight from SVD files. (Higher-level abstractions are then layered on top by actual software engineers.)
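
To make the debugger example concrete, here is a sketch of the kind of typed register-block definition that gets generated from an SVD file. The layout is hypothetical, loosely modelled on the RTC.SSR example above, and it is backed by plain host memory so that the sketch actually runs; in a real PAC the pointer would instead be the peripheral’s base address (0x5800_0000), baked in from the SVD file.

```rust
use core::ptr::{read_volatile, write_volatile};

// Hypothetical register block in the style of SVD-derived code: the
// offsets put SSR at +0x28, matching "a write to RTC.SSR" above.
#[allow(dead_code)]
#[repr(C)]
struct Rtc {
    tr: u32,             // +0x00
    dr: u32,             // +0x04
    _reserved: [u32; 8], // +0x08 .. +0x27
    ssr: u32,            // +0x28
}

fn main() {
    // Plain memory standing in for the peripheral, so this runs anywhere.
    let mut fake_peripheral = [0u32; 11];
    let rtc = fake_peripheral.as_mut_ptr() as *mut Rtc;
    unsafe {
        // What the debugger would report as "a write to 0x5800_0028"
        // reads, in source form, as a write to RTC.SSR:
        write_volatile(&mut (*rtc).ssr, 0x1234);
        assert_eq!(read_volatile(&(*rtc).ssr), 0x1234);
    }
}
```

The point of generating this rather than writing it by hand is that the offset arithmetic lives in exactly one machine-checked place, instead of being scattered through driver code.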

Even for C/C++ codebases that are more directly compatible with the vendor’s SDK, it might sometimes be preferable to generate at least the low-level register-accessing code in-house rather than use the vendor SDK: for instance to generate test mocks of the register-access API — or, for code targeting a range of similar microcontrollers, to separate common peripherals (e.g. the STM32 GPIO block, which is identical across a large number of different STM32 variants) from per-target peripherals where each chip variant needs its own code (e.g. the STM32 “RCC” clock-control block).

At Electric Imp we did both of those things: our range of internet-of-things products spanned several generations of (closely-related) microcontroller, with all of these products remaining in support and building in parallel from the same codebase for many years, and so we needed a better answer to Question 8 than our silicon vendor provided at the time. (In Rust terms, we needed something that looked like stm32-metapac, not stm32-rs.) And using the SVD files to generate test mocks of the register API, let us achieve good unit-test coverage even of the lowest-level device-driver code (a topic I hope to return to in a future blog post).

Basically, having a good SVD file available (perhaps itself generated from the chip source) gives your customers an “escape hatch” if they find their needs aren’t met by your published SDK. Although SVD was invented by ARM to assist take-up of their very successful Cortex-M line, it is so obviously useful that SVD files are becoming the standard way of programmatically defining the peripheral registers of non-ARM-based microcontrollers too.

5. Is your pinmux definition available in machine-readable form?

Most microcontrollers have far more peripherals on-board than they have pins available to dedicate to them, so several different peripherals or signals are multiplexed onto each physical pin; that way, customers interested in, say, UART-heavy designs and those interested in SPI-heavy designs, can all use the same microcontroller and just set the multiplex, the pinmux, appropriately to connect the desired peripherals with the available pins. Often, especially in low-pin-count packages, up to sixteen different functions can be selected on each pin.

This muxing information, like the register definitions themselves, is helpful metadata about the chip — and, thus, about software targeting it. A machine-readable version of this information can be used to make driver code more readable, and more amenable to linting or other automated checks for correctness.

Pinmux information in datasheets is typically organised into a big table where each row is a pin and the columns are the signals available; at the very least, having this mapping available as a CSV or similar would make it easy to invert it in order to allow the reverse lookup: for this signal, or this peripheral, what pins is it available on? Laboriously manually creating that inverse map was always one of the first tasks to be done whenever a new microcontroller crossed my path at Electric Imp.
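
Once the table is machine-readable, that inversion is a few lines of code. A sketch in Rust — the pins and signals here are invented for illustration, not taken from any real datasheet:

```rust
use std::collections::BTreeMap;

// Invert a pin -> signals table into the reverse map: signal -> pins.
fn invert<'a>(by_pin: &[(&'a str, &'a [&'a str])]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut by_signal: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(pin, signals) in by_pin {
        for &signal in signals {
            by_signal.entry(signal).or_default().push(pin);
        }
    }
    by_signal
}

fn main() {
    // A fragment of a pinmux table as the datasheet presents it:
    // each row is a pin, listing the signals selectable on it.
    let by_pin: &[(&str, &[&str])] = &[
        ("PA2", &["USART2_TX", "TIM5_CH3", "ADC1_IN2"]),
        ("PA9", &["USART1_TX", "TIM1_CH2"]),
        ("PB6", &["USART1_TX", "I2C1_SCL"]),
    ];

    // "What pins can USART1_TX come out on?" is now a lookup:
    let by_signal = invert(by_pin);
    assert_eq!(by_signal["USART1_TX"], ["PA9", "PB6"]);
}
```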

6. Are all the output pins tristated when the chip is held in reset, and on cold boot?

Honestly this question is a specific diss-track of one particular microcontroller which failed to do this. I mean, lots of perfectly sensible microcontrollers tristate everything on cold boot except the JTAG or SWD pins, and that exception is completely reasonable. But this part drove a square wave out of some of its output pins while held in reset. It’s hard to fathom how that could have come about, without some pretty fundamental communication issues inside that chipmaker about what a reset signal even is and what it’s for.

(The microcontroller in question was part of a product line later sold-on to Cypress Semi; it might have been more fitting to sell it to Cypress Hill, as the whole thing was insane in the brain.)

7. Is it straightforward to get notifications when your public documentation changes (e.g. new datasheet revision released)?

I’ve said a few somewhat negative things about chipmakers so far, so here comes a solidly positive one: Renesas do this really well. Every time you download a datasheet or chip manual from their website, you get a popup offering to email you whenever a newer version is released of the document that you’ve just downloaded. Particularly at the start of a chip’s lifecycle, when these documents may include newly-discovered chip errata that require customer code changes, this service can be a huge time-saver. Chipmakers who don’t already do this should seriously consider it: it’s not rocket-science to implement, especially as compared to, say, the designing of a microcontroller.

8. Is it straightforward to use your SDK as a software component in a larger system?

The “developer experience” of a microcontroller SDK typically looks like: “Thank you for choosing the Arfle Barfle 786DX/4, please select how you’d like it configured, clickety-click, ta-da! here’s the source for a hello-world binary, now just fill in these blanks with your product’s actual functionality.” And so it should, of course, because that’s where every customer starts out. But it’s not where every customer ends up, especially in the case of a successful product (which, really, is the case you ought to be optimising for): there’s a bigger, more complex version 2 of the customer’s hardware, or there’s a newer generation of your microcontroller that’s faster or has more SRAM — or perhaps you’ve “won the socket” to replace your competitor’s microcontroller in a certain product, but the customer has a big wodge of their own existing hard-fought code that they’d like to bring across to the new version of that product.

In all of these cases, your SDK suddenly stops being a framework with little bits of customer functionality slotted-in in well-documented places, and starts being just one, somewhat replaceable, component in a larger framework organised by your customer. You don’t get to pick the source-tree layout. You probably don’t get to write main() (or the reset handler); if there are things that need doing there for your specific chip to work, then see Question 1.

Being a software component in a larger system doesn’t preclude also having a friendly out-of-box developer experience; it just means that, when customers peer beneath the hood of the software framework that you’ve built them, what they see is that the core of the framework is a small collection of well-designed fundamental components — built with all the usual software-engineering values such as separation-of-concerns, composability, unit-testing, and documentation — which can be used on their own without the customer needing to understand your entire SDK.

When I worked at Sigmatel on their SDKs for microcontroller-DSPs for the portable media player market, it was clear that successful customers came in two types: there were the opportunist, low-engineering folks who took the SDK’s example media player firmware, stuck their own branding on it if that, and shipped the whole thing straight out again to their own end-users; and the established, high-engineering folks who already had generations of media-player firmware in-house, and just wanted the device drivers so that they could launch a new product with the familiar UI of their existing products. And there was nobody in-between these extremes, so it was not a good use of time to try to serve that middle-ground well.

In a sense the importance of a good answer to this question was emphasised by Finnish architect — that’s buildings architect, not software architect — Eliel Saarinen, who famously said “Always design a thing by considering it in its next larger context — a chair in a room, a room in a house, a house in an environment, an environment in a city plan.” (Quoted posthumously by his son, also an architect.)

I wish I’d seen that quote when I was just starting in software engineering. One of the most useful and widely-applicable tenets that I’ve been learning the hard way since, is this: Always keep in mind whether you’re designing a system, or whether you’re designing a component in a larger system. Hint: it’s never the former.

Tuesday, 2 April 2024

Solved: KDE Sleep immediately reawakens (Nvidia vs systemd)

Previously on #homelab:

     
I’ve had this PC, a self-assembled Linux (Kubuntu 22.04) desktop with Intel DX79SR motherboard and Core i7-3930K CPU, since late 2012. And today I finally got sleep working. (Though in fairness it’s not like I spent the entire intervening 11½ years investigating the problems.)

Problem One was that the USB3 controllers never came back after a resume (the USB2 sockets continued to work). And not just resume-after-suspend: USB3 stopped working if I power-cycled using the front panel button instead of the physical switch on the back of the PSU. This turns out to be a silicon (or at least firmware) bug in the suspend modes of the NEC/Renesas uPD720200 and/or uPD720201 controllers on the motherboard; newer firmware is probably available but, last I checked, could only be applied under Windows (not even DOS). The workaround is to edit /etc/default/grub and add usbcore.autosuspend=-1 to GRUB_CMDLINE_LINUX_DEFAULT.

Fixing that got power-cycling using the front-panel power button working, but exposed Problem Two — choosing “Sleep” in KDE’s shutdown menu (or on the icon bar) or pressing the dedicated sleep (“⏾”) button on the keyboard (Dell KB522) successfully momentarily blanked the screen but then immediately woke up again to the login prompt. But I discovered that executing pm-suspend worked (the system powered-down, the power LED started blinking, and any keyboard key woke it up again), as did executing /lib/systemd/systemd-sleep suspend.

So something must have been getting systemd in a pickle in-between it deciding to suspend, and it actually triggering the kernel-level suspending (which it does in /usr/lib/systemd/system/systemd-suspend.service). Eventually I found advice to check journalctl | grep logind and it showed this:

Apr 01 10:06:23 amd64 systemd-logind[806]:\
      Error during inhibitor-delayed operation (already returned success to client): Unit nvidia-suspend.service is masked.
Apr 01 10:14:28 amd64 systemd-logind[806]: Suspend key pressed.
Apr 01 10:14:28 amd64 systemd-logind[806]:\
      Error during inhibitor-delayed operation (already returned success to client): Unit nvidia-resume.service is masked.

This PC has an Nvidia graphics card (it’s a GeForce 1050Ti / GP107), but it uses the Nouveau driver, not the proprietary Nvidia one to which those suspend and resume services belong. And that turned out to be the issue: with those services masked (due to the driver not being present), there were dangling symlinks to their service files present as /etc/systemd/system/systemd-suspend.service.requires/nvidia-{suspend,resume}.service. Removing those dangling symlinks made both the Sleep menu option and the keyboard button work.

It’s possible that I once had the Nvidia proprietary driver installed (dpkg -S isn’t prepared to own up to who put those Nvidia symlinks there) but that, being “configuration files”, removing the driver didn’t remove them.

If you had to name the two most controversial parts of a modern-day Linux install, I think you’d probably come up with (1) systemd and (2) the proprietary Nvidia drivers. I’m not usually a follower of internet hate mobs, but I do have to say: it turned out that the issue was an interaction between systemd and the proprietary Nvidia drivers, which weren’t even installed.

SEO bait: Kubuntu 22.04, Ubuntu 22.04, won't sleep, suspend fails, wakes up, systemd-suspend, pm-suspend, solved.

System-testing embedded code in Rust, part three: A CI test-runner

Previously on #rust:

     
Thanks to earlier parts of this series, developers of Cotton now have the ability to run automated system-tests of the embedded builds using their own computer as the test host — if they have the right STM32F746-Nucleo development board to hand. What we need to do now, is add the ability for continuous integration to run those tests automatically on request (e.g. whenever a feature branch is pushed to the central git server). For at least the third time on this blog, we’re going to employ a Raspberry Pi 3; the collection of those sitting unused in my desk drawer somehow never seems to get any smaller. (And the supply shortages of recent years seem to have abated.)

First, set up Ubuntu 22.04 with USB boot and headless full-disk encryption in the usual way.

OK, perhaps doing so isn’t all that everyday (that remark was slightly a dig at Larousse Gastronomique, at least one recipe in which starts, “First, make a brioche in the usual way”). But this very blog has exactly the instructions you need. This time, instead of a 512GB Samsung USB SSD, I used a 32GB Sandisk USB flash drive — the test-runner won’t be needing tons of storage.

Alternatively, you could use any other reasonable computer you’ve got lying around — an Intel NUC, say, or a cast-off laptop. Most or all of the following steps will apply to any freshly-installed Ubuntu box, and many of them to other Linux distributions too. But you’ll have the easiest time of it if your CI test-runner has the same architecture and OS as your main CI server, which is another reason I’m pressing yet another Raspberry Pi into service here.

However you get there, what you need to proceed with the rest of this post (at least the way I proceeded) is:

  • An existing Laminar CI server, on your local Ethernet.
  • A Raspberry Pi to become the test-runner;
  • with Ubuntu 22.04 freshly installed;
  • plugged into your local Ethernet, able to DHCP (I gave mine a fixed address in OpenWRT’s DHCP server setup) — or, at least, somewhere where the CI server can SSH to it;
  • with your own user, that you can SSH in as, and use sudo.
  • A USB-to-Ethernet adaptor (I used the Amazon Basics one);
  • an Ethernet switch, at least 100Mbit (I used this TP-Link LS1008);
  • and an STM32F746-Nucleo development board.

In this blog post we’re going to:

  1. Connect everything together, making a separate test network
  2. Automate the remaining setup of the test-runner, using Ansible
  3. Arrange that the CI server can SSH to the test-runner autonomously
  4. Run a trivial CI job on the test-runner
  5. Package up the system-tests and run a real CI job on the test-runner
  6. Go and sit down and think about what we’ve done

1. Connect everything together, making a separate test network

For once, instead of being a whimsical stock photo of something tangentially related, the image at the top of this blog post is an actual photograph of the actual physical items being discussed herein (click to enlarge if need be). The only connections to the outside world are from my home network to the Raspberry Pi’s built-in Ethernet (lower left) and power to the Raspberry Pi and to the Ethernet switch (top left and centre bottom). The test network is otherwise self-contained: the STM32 development board is on a private Ethernet segment with just the Raspberry Pi’s USB Ethernet for company. This network has its own RFC1918 address range, 192.168.3.x, distinct from the rest of the home network. (The Wifi interface on the Raspberry Pi is not currently being used.)

The breadboard is attached to the Raspberry Pi’s GPIO connector, and at present is only used to provide a “testing in progress” LED attached to GPIO13 (glowing white in the photo, and therefore hard to see). GPIO usage could become more sophisticated in the future: for instance, if I was writing a HAL crate for a new embedded device, I could connect the Raspberry Pi’s GPIO inputs to the GPIO outputs of the embedded device (and vice-versa) and system-test my GPIO code.

The Raspberry Pi programs the STM32 over USB; apart from that and the Ethernet adaptor, the USB flash drive also takes up one of the four USB sockets, leaving just one spare for future enhancements. (But hubs are a thing, of course.)

2. Automate the remaining setup of the test-runner, using Ansible

The initial setup of a new Ubuntu installation is arguably best done manually, as you might need to react to things that the output of the commands is telling you. But once the basics are in place, the complexities of setting up everything else that the test-runner needs are best automated — so they can be repeatable, documented, and explainable to others (in this case: you).

Automating server setup is not new news to cloud engineers, who often use tools such as Chef or Puppet to bring one or more new hosts or containers up-to-speed in a single command. Electric Imp used Chef, which I never really got on with, partly because of the twee yet mixed metaphors (“Let’s knife this cookbook — solo!”, which is a thing all chefs say), but mostly because it was inherently bound up with Ruby. Yet I felt I needed a bit more structure than just “copy a shell script on and run it”. So for #homelab purposes, I thought I’d try Ansible.

Ansible is configured using YAML files, which at least is one step up from Ruby as a configuration language. The main configuration file is called a “playbook”, which contains “plays” (think sports, not theatre), which are in turn made up of individual “tasks”. A task can be as simple as executing a single shell command, but the benefit of Ansible is that it comes with a huge variety of add-ons which allow tasks to be written in a more expressive way. For instance, instead of faffing about to automate cron, there’s a “cron” add-on which even knows about the @reboot directive and lets you write:

    - name: init gpios on startup
      cron:
        name: init-gpios
        special_time: reboot
        job: /usr/local/bin/init-gpios

Tasks can be linked to the outcome of earlier tasks, so that for instance it can restart the DHCP server if, and only if, the DHCP configuration has been changed. With most types of task, the Ansible configuration is “declarative”: it describes what the situation ought to be, and Ansible checks whether that’s already the case, and changes things only where they’re not as desired.
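
That check-before-acting shape is easy to sketch outside Ansible. Here is a minimal Rust rendition — the filename and contents are illustrative — of a declarative “ensure” that reports whether it changed anything, which is exactly the signal a restart-the-DHCP-server handler needs:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Declarative "ensure" in the Ansible style: make `path` contain
/// `desired`, and return true only if anything actually changed.
fn ensure_file_content(path: &Path, desired: &str) -> io::Result<bool> {
    if fs::read_to_string(path).ok().as_deref() == Some(desired) {
        Ok(false) // already in the declared state: nothing to do
    } else {
        fs::write(path, desired)?;
        Ok(true) // changed: any dependent handler (restart, etc.) should fire
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("ensure-demo.conf");
    let _ = fs::remove_file(&path); // start from a known state
    assert!(ensure_file_content(&path, "interface eth1\n")?); // first run: changed
    assert!(!ensure_file_content(&path, "interface eth1\n")?); // re-run: no change
    fs::remove_file(&path)
}
```

Running it twice demonstrates the idempotence: only the first run reports a change, so only the first run would trigger the restart.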

Ansible can be used for considerably more complex setups than the one in this blog post — many, many servers of different types that all need different things doing to them — but I’ve made an effort at least to split up the playbook into plays relevant to basically any machine (hosts: all), or ones relevant just to the Raspberry Pi (hosts: raspberrypis), or ones specific to the rôle of being a system-test runner (hosts: testrunners).

I ended up with 40 or so tasks in the one playbook, which between them install all the needed system packages as root, then install Rust, Cargo and probe-rs as the laminar user, then set up the USB Ethernet adaptor as eth1 and run a DHCP server on it.

Declaring which actual machines make up “all”, “testrunners”, etc., is the job of a separate “inventory” file; the one included in the repository matches my setup at home. The inventory file is also the place to specify per-host information that shouldn’t be hard-coded into the tasks: in this case, the DHCP setup tasks need to know the MAC address of the USB Ethernet adaptor, but that’s host-specific, so it goes in the inventory file.

All the tasks in the main YAML file are commented, so I won’t re-explain them here, except to say that the “mark wifi optional”, “rename eth1”, and “set static IP for eth1” tasks do nothing to dispel my suspicion that Linux networking is nowadays just a huge seven-layer dip of xkcd-927 all the way down, with KDE and probably Gnome only adding their own frothy outpourings on top.

I added a simple Makefile to the systemtests/ansible directory, just to automate running the one playbook with the one inventory.

The name “Ansible” comes originally from science-fiction, where it’s used to mean a device that can communicate across deep space without experiencing speed-of-light latency. I only mention this because when communicating across about a metre of my desk, it’s bewilderingly slow — taking seconds to update each line of sshd_config. That’s about the same as speed-of-light latency would be if the test-runner was on the Moon.

But having said all that, there’s still value in just being able to re-run Ansible and know that everything is set consistently and repeatably.

I did wonder about running Ansible itself under CI — after all, it’s software, it needs to be correct when infrequently called upon, so it ought therefore to be automatically tested to avoid bugs creeping in. But running Ansible needs root (or sudo) access on the test-runner, which in turn means it needs SSH access to the test-runner as a user which is capable of sudo — and I don’t want to leave either of those capabilities lying around unencrypted-at-rest in CI. So for the time being it’s down to an unreliable human agent — that’s me — to periodically run make in the ansible directory.

3. Arrange that the CI server can SSH to the test-runner autonomously

Most of the hosts in this series of #homelab posts are set up with, effectively, zero-trust networking: they can only be accessed via SSH, and all SSH sessions start with me personally logged-in somewhere and running ssh-agent (and using agent-forwarding). But because the CI server needs to be able to start system-test jobs on the test-runner, it needs to be able to login via SSH completely autonomously.

This isn’t as alarming as it sounds, as the user it logs into (the laminar user on the test-runner) isn’t very privileged; in particular, it’s not in the sudo group and thus can’t use sudo at all. (The Ansible setup explicitly grants that user permissions to the hardware it needs to access.)

Setting this up is a bit like setting up your own SSH for the first time. First generate a key-pair:

ssh-keygen -t ed25519

— when prompted, give the empty passphrase, and choose “laminar-ssh” as the output file. The public key will be written to “laminar-ssh.pub”.

The public key needs to be added to ~laminar/.ssh/authorized_keys (American spelling!) on the test-runner; the Ansible setup already does this for my own CI server’s public key.

Once authorized_keys is in place, you can test the setup using:

ssh -i laminar-ssh laminar@scotch

Once you’re happy that it works, copy the file laminar-ssh as ~laminar/.ssh/id_ed25519 on the main CI server (not the test-runner!):

sudo cp ~peter/laminar-ssh ~laminar/.ssh/id_ed25519
sudo chown -R laminar.laminar ~laminar/.ssh
sudo chmod 0700 ~laminar/.ssh

You can test that setup by using this command on the CI server (there should be no password or pass-phrase prompt):

sudo -u laminar ssh scotch

— indeed, you need to do this at least once, in order to reassure the CI server’s SSH client that you trust the test-runner’s host key.

4. Run a trivial CI job on the test-runner

Now that the laminar user on the CI server can SSH freely to scotch, what remains is mostly Laminar setup. This part is adapted very closely from Laminar’s own documentation: we set up a “context” for the test-runner, separate from the default context used by all the other jobs (because the test-runner can carry on when other CPU-intensive jobs are running on the CI server), then add a remote job that executes in that context.

/var/lib/laminar/cfg/contexts/test-runner-scotch.conf
EXECUTORS=1
/var/lib/laminar/cfg/contexts/test-runner-scotch.env
RUNNER=scotch

The context is partly named after the test-runner host, but it also includes the name of the test-runner host as an environment variable. This means that the job-running scripts don’t need to hard-code that name.

As before, the actual job file in the jobs directory defers all the complexity to a generic do-system-tests script in the scripts directory:

/var/lib/laminar/cfg/jobs/cotton-system-tests-dev.run
#!/bin/bash -xe
exec do-system-tests

In keeping with the Laminar philosophy of not poorly reinventing things that already exist, Laminar itself has no built-in support for running jobs remotely — because that’s what SSH is for. This, too, is closely-inspired by the Laminar documentation:

/var/lib/laminar/cfg/scripts/do-system-tests
#!/bin/bash -xe

ssh laminar@$RUNNER /bin/bash -xe << "EOF"
  echo 1 > /sys/class/gpio/gpio13/value
  uname -a
  run-parts /etc/update-motd.d
  sleep 20
  echo 0 > /sys/class/gpio/gpio13/value
EOF

This, of course, doesn’t run any actual tests, but it provides validation that the remote-job mechanism is working. The LED attached to GPIO13 on the test-runner Raspberry Pi serves as the “testing in progress” indicator. (And the run-parts invocation is an Ubuntu-ism: it reproduces the “message of the day”, the welcome message that’s printed when you log in. Most of it is adverts these days.)

Tying the job to the context is the $JOB.conf file:

/var/lib/laminar/cfg/jobs/cotton-system-tests-dev.conf
CONTEXTS=test-runner-*

Due to the judicious use of a wildcard, the job can run on any test-runner; my setup only has the one, but if you found yourself in a team with heavy contention for the system-test hardware, this setup would allow you to build a second, identical Raspberry Pi with all the same hardware attached — called shepherds, maybe — and add it as a separate Laminar context. Because Laminar runs a job as soon as any context it’s fit for becomes available, this would automatically split queued system-test jobs across the two test-runners.

With all of this in place, it’s time to trigger the job on the CI server:

laminarc queue cotton-system-tests-dev

After a few teething troubles, including the thing I mentioned above about making sure that the SSH client accepts scotch’s host key, I was pleased to see the “testing in progress” LED come on and the message-of-the-day spool out in the Laminar logs.

5. Package up the system-tests and run a real CI job on the test-runner

We didn’t come here just to read some Ubuntu adverts in the message-of-the-day. Now we need to do the real work of building Cotton for the STM32 target, packaging-up the results, and transferring them to the test-runner where they can be run on the target hardware. First we build:

/var/lib/laminar/cfg/jobs/cotton-embedded-dev.run
#!/bin/bash -xe

PROJECT=cotton
RUST=stable
BRANCH=${BRANCH-main}
SOURCE=/var/lib/laminar/run/$PROJECT/workspace

(
    flock 200
    cd $SOURCE/$PROJECT
    git checkout $BRANCH
    cd -
    cp -al $SOURCE/$PROJECT $PROJECT
) 200>$SOURCE/lock

source $HOME/.cargo/env
rustup default $RUST

cd $PROJECT
cargo build -p systemtests -F arm,stm32f746-nucleo --all-targets
cargo test --no-run -p systemtests -F arm,stm32f746-nucleo 2> $ARCHIVE/binaries.txt
grep "Executable tests/" $ARCHIVE/binaries.txt  | cut -d'(' -f 2 | cut -d')' -f 1 > binaries.2.txt

tar cf $ARCHIVE/binaries.tar `find cross/*/target -type f -a -executable \
        | grep -v /deps/ | grep -v /build/` `cat binaries.2.txt`
laminarc queue cotton-system-tests-dev PARENT_RUN=$RUN
exec prune-archives cotton-embedded-dev 10

The actual build commands look much like many of the other Laminar jobs but with the extra Cargo features added which enable the cross-compiled targets; the interesting parts of this script come once the cargo build is done and the results must be tarred-up ready to be sent to the test-runner.

Finding all the target binaries is fairly easy using find cross/*/target, but we also need to find the host binary from the systemtests package. The easiest way to do that is to parse the output of cargo test --no-run, which includes lines such as:

   Compiling systemtests v0.0.1 (/home/peter/src/cotton/systemtests)
    Finished test [unoptimized + debuginfo] target(s) in 2.03s
  Executable unittests src/lib.rs (target/debug/deps/systemtests-4a9b67de54149231)
  Executable tests/device/main.rs (target/debug/deps/device-cfdcb3ff3e5eaaa5)

The line with “Executable tests” is the one we’re looking for. (The string of hex digits after the executable name changes every time the sources change.) It’s possible that we could “cheat” here and just pick the first file we find starting with target/debug/deps/device-, as this is CI so we’re always building from clean — but this is a more robust way of determining the most recent binary.
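To make the plumbing concrete, here's the same grep/cut pipeline from the job script, run against a captured sample of that cargo output (the hex suffix is taken from the sample above; a real run's would differ):

```shell
# Recreate a sample of cargo's `test --no-run` stderr, then extract the
# path between the parentheses on the "Executable tests/" line --
# exactly what the job script writes to binaries.2.txt.
cat > binaries.txt <<'EOF'
   Compiling systemtests v0.0.1 (/home/peter/src/cotton/systemtests)
    Finished test [unoptimized + debuginfo] target(s) in 2.03s
  Executable unittests src/lib.rs (target/debug/deps/systemtests-4a9b67de54149231)
  Executable tests/device/main.rs (target/debug/deps/device-cfdcb3ff3e5eaaa5)
EOF
grep "Executable tests/" binaries.txt | cut -d'(' -f 2 | cut -d')' -f 1
# -> target/debug/deps/device-cfdcb3ff3e5eaaa5
```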

(You might feel that this section is a bit of a cop-out, a bit white-box: knowing that there’s only one host-side binary does make the packaging a lot easier. If there were a lot of host-side binaries to package, and this technique started creaking at the seams, I’d look into cargo-nextest which has features specifically designed for packaging and unpackaging suites of Cargo tests.)

Once everything the system-tests job will need is stored in $ARCHIVE/binaries.tar, we can trigger the system-tests job — making sure to tell it, in $PARENT_RUN, which build in the archive it should be testing. (Initially I had the system-tests job use “latest”, but that’s wrong: it doesn’t handle multiple queued jobs correctly, and has a race condition even without queued jobs. The “latest” archive is that of the most recent successfully-finished job — but the build job hasn’t yet finished at the time it triggers the test job.)

The final prune-archives command is something I added after the initial Laminar writeup when some of the archive directories (particularly doc and coverage) started getting big: it just deletes all but the most recent N non-empty archives:

/var/lib/laminar/cfg/scripts/prune-archives
#!/bin/bash -xe

PROJECT=$1
KEEP=${2-2}

cd /var/lib/laminar/archive/$PROJECT
for d in `find * -maxdepth 0 -type d -a \! -empty | sort -n | head -n -$KEEP`; do
    rm -r /var/lib/laminar/archive/$PROJECT/$d
done

No-one likes deleting data, but in this case older archives should all be recoverable at any time, if the need arises, just by building the source again at that revision.
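The selection logic deserves a second look: Laminar names each archive directory after its run number, so a numeric sort puts the newest runs last and `head -n -$KEEP` yields everything except the KEEP most recent. Simulating ten runs:

```shell
# Stand-in for the archive directory listing: runs 1..10.
KEEP=2
seq 1 10 | sort -n | head -n -$KEEP
# Prints 1 through 8: runs 9 and 10 (the newest two) survive the prune.
```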

Now the cotton-system-tests-dev.run job needs to pass the $PARENT_RUN variable on to the underlying script:

/var/lib/laminar/cfg/jobs/cotton-system-tests-dev.run
#!/bin/bash -xe

exec do-system-tests cotton-embedded-dev $PARENT_RUN

and the do-system-tests script can use it to recover the tarball and scp it off to the test-runner:

/var/lib/laminar/cfg/scripts/do-system-tests
#!/bin/bash -xe

PARENT_JOB=$1
PARENT_RUN=$2

scp /var/lib/laminar/archive/$PARENT_JOB/$PARENT_RUN/binaries.tar laminar@$RUNNER:

ssh laminar@$RUNNER /bin/bash -xeE << "EOF"
  echo 1 > /sys/class/gpio/gpio13/value
  cleanup_function() {
    echo 0 > /sys/class/gpio/gpio13/value
    exit 1
  }
  trap 'cleanup_function' ERR
  export PS4='+ \t '
  export PATH=/home/laminar/.cargo/bin:$PATH
  rm -rf tests
  mkdir -p tests/systemtests
  ( cd tests
    tar xf ../binaries.tar
    cd systemtests
    export CARGO_MANIFEST_DIR=`pwd`
    ../target/debug/deps/device-* --list
    ../target/debug/deps/device-* --test
  )
  echo 0 > /sys/class/gpio/gpio13/value
EOF

The rest of the script has also gained in features and complexity. It now includes a trap handler to make sure that the testing-in-progress LED is extinguished even if the tests fail with an error. (See here for why this requires the -E flag to /bin/bash.)
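The -E (errtrace) subtlety is easy to demonstrate in isolation. Bash only propagates an ERR trap into shell functions, command substitutions, and subshells when -E is set; without it, a failure inside a function exits the script (via -e) without ever running the trap — which here would have meant the LED staying on:

```shell
# With -E the ERR trap fires when `false` fails inside f();
# without -E the shell just exits and the trap is skipped.
with_E=$(bash -eE -c 'trap "echo cleanup" ERR; f() { false; }; f' || true)
without_E=$(bash -e -c 'trap "echo cleanup" ERR; f() { false; }; f' || true)
echo "with -E:    ${with_E:-<trap never ran>}"
echo "without -E: ${without_E:-<trap never ran>}"
```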

The script goes on to add timestamps to the shell output (and thus to the logs) by adding \t to PS4, and add the Cargo bin directory to the path (because that’s where probe-rs got installed by Ansible).

The tests themselves need to be executed as if from the systemtests directory of a full checkout of Cotton — which we don’t have here on the test-runner — so the directories must be created manually. With all that in place, we can finally run the host-side test binary, which will run all the device-side tests including flashing the STM32 with the binaries, found via relative paths from $CARGO_MANIFEST_DIR.

That’s a lot, but it is all we need to successfully list and run all the device-side tests on our test-runner. Here’s (the best part of) the Laminar logs from a successful run:

+ 16:46:54 ../target/debug/deps/device-453652d9c9dda7c1 --list
stm32f746_nucleo::arm_stm32f746_nucleo_dhcp: test
stm32f746_nucleo::arm_stm32f746_nucleo_hello: test
stm32f746_nucleo::arm_stm32f746_nucleo_ssdp: test

3 tests, 0 benchmarks
+ 16:46:54 ../target/debug/deps/device-453652d9c9dda7c1 --test

running 3 tests
test stm32f746_nucleo::arm_stm32f746_nucleo_dhcp ... ok
test stm32f746_nucleo::arm_stm32f746_nucleo_hello has been running for over 60 seconds
test stm32f746_nucleo::arm_stm32f746_nucleo_ssdp has been running for over 60 seconds
test stm32f746_nucleo::arm_stm32f746_nucleo_ssdp ... ok
test stm32f746_nucleo::arm_stm32f746_nucleo_hello ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 66.37s

+ 16:48:00 echo 0

And now that everything’s working, we can add it to the chain of events that’s triggered whenever a branch is pushed to the CI server:

/var/lib/laminar/cfg/job/cotton-dev.run
#!/bin/bash -xe

BRANCH=${BRANCH-main}
do-checkout cotton $BRANCH
export LAMINAR_REASON="built $BRANCH at `cat git-revision`"
laminarc queue \
         cotton-embedded-dev BRANCH=$BRANCH \
         cotton-doc-dev BRANCH=$BRANCH \
         cotton-grcov-dev BRANCH=$BRANCH \
         cotton-msrv-dev BRANCH=$BRANCH \
         cotton-beta-dev BRANCH=$BRANCH \
         cotton-nightly-dev BRANCH=$BRANCH \
         cotton-minver-dev BRANCH=$BRANCH

If you’re wondering about the -dev suffix on all those jobs: I set up Laminar with two parallel sets of identical build jobs. There’s the ones with -dev which are triggered when pushing feature branches, and the ones without the -dev which are triggered when pushing to main. This is arranged by the Git server post-receive hook:

git/cotton.git/hooks/post-receive
#!/bin/bash -ex

while read oldrev newrev ref
do
    if [ "${ref:0:11}" == "refs/heads/" -a "$newrev" != "0000000000000000000000000000000000000000" ];
    then
        export BRANCH=${ref:11}
        export LAMINAR_REASON="git push $BRANCH"
        if [ "$BRANCH" == "main" ];
        then
           laminarc queue cotton BRANCH=$BRANCH
        else
           laminarc queue cotton-dev BRANCH=$BRANCH
        fi
    fi
done
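The only mildly cryptic part of the hook is the bash substring expansion used to split the ref: `refs/heads/` is exactly eleven characters, so `${ref:0:11}` is the prefix compared against it and `${ref:11}` is the remainder:

```shell
# What the post-receive hook sees for a pushed feature branch.
ref=refs/heads/feature/usb-host
echo "${ref:0:11}"   # the prefix checked against "refs/heads/"
echo "${ref:11}"     # the branch name passed on as $BRANCH
```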

In a sense there’s no engineering need for this duplication: exactly the same actual work is undertaken in either case. But there’s a social need: feature branches (with -dev) aren’t expected to always pass all tests — indeed, such branches are often pushed for the express purpose of determining whether or not they pass all tests. But the main branch is expected to pass all tests, all the time, and a regression is to be taken seriously. That is: if the cotton-dev build or one of its downstreams fails, that’s got a very different implication from the cotton build or one of its downstreams failing. The cotton build itself should enjoy long, uninterrupted streaks of regression-free passes (and indeed it does; the last failure was in November 2023 due to this issue causing intermittent unit-test failures).

6. Go and sit down and think about what we’ve done

Well, what have we done? We have, over the course of three blog posts, taken a bit of knowledge of bash, cron, and SSH that we already had, then gone and learned a bit about Laminar, Ansible, and Cargo, and the “full stack” that we then engineered for ourselves using all that knowledge is this: any time we push a feature branch, we (eventually) get told automatically, as a straight yes/no answer, whether it’s OK for main or not. That’s an immensely powerful thing to have, an immensely useful property to have an oracle for.

Having that facility available is surely expected these days in other areas of software engineering — where test-harnesses are easier to create — but I hope I’ve demonstrated in these blog posts that those of us working on embedded systems can also enjoy the reliability (of main in particular) that this type of workflow brings.

(Other versions of the workflow could be constructed if preferred. Perhaps you don’t want every push to start a system-test sequence — in that case, you could either write a more complex Git post-receive hook, or set up an alternative Git remote, called perhaps “tests”, so that pushing to that remote kicked off the test sequence. Or you could tie the test sequence in to your pull-request process somehow.)
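The "alternative remote" variant is only a few lines to set up. This is a sketch, not from my actual setup: the /tmp path and hook body are illustrative, and in real use the bare repository would live on the Git server alongside cotton.git:

```shell
# Create a bare repository to act as the "tests" remote, with a
# post-receive hook that queues only the system-test job.
mkdir -p /tmp/cotton-tests.git
git init --bare -q /tmp/cotton-tests.git
cat > /tmp/cotton-tests.git/hooks/post-receive <<'EOF'
#!/bin/bash -ex
while read oldrev newrev ref
do
    export BRANCH=${ref:11}
    export LAMINAR_REASON="git push tests $BRANCH"
    laminarc queue cotton-system-tests-dev BRANCH=$BRANCH
done
EOF
chmod +x /tmp/cotton-tests.git/hooks/post-receive
# Then, in a working checkout:
#   git remote add tests /tmp/cotton-tests.git
#   git push tests my-feature-branch   # kicks off the hardware tests
```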

To repeat a line from the previous instalment, any software you need to test can be tested automatically in some way that is superior to not testing it at all. Compared to Rust’s unit tests, which are always just a cargo test away, it took us three quite detailed blog posts and a small pile of physical hardware to get us to the stage where this embedded code could be tested automatically. If I had to distill the message of these three posts from their ~11,000 words down to just six, they’d be: this sort of effort is worthwhile. If your product is, or runs on, a specific platform or piece of hardware, it’s worth spending a lot of effort arranging to test automatically on the actual target hardware, or as near to the actual target as is practical. (Sometimes embedded products are locked-down, fused-off, potted, or otherwise rendered inaccessible; testing, in that case, requires access to unlocked variants.)

That is to say: is the effort worth it for the cotton-ssdp crate — a few thousand lines of code, about 60% of which is already tests, and the rest of which has 100% test coverage? Arguably yes, but also arguably no, especially as almost all of cotton-ssdp can be tested in hosted builds. The cotton-ssdp crate has acted here more as a spike, a proof-of-concept. But the point is, the concept was proved, a baseline has now been set, and all the right testing infrastructure is in place if I want to write a power-aware RTOS, or implement HAL crates for some of these weird development boards in my desk drawer, or if I want to disrupt the way PAC crates are generated in order to improve the testing story of HAL crates. Now when I start doing those things, I can start defending the functionality with system-tests from the outset. If I want to do those more complex, more embedded-centric things — which I do — then all the effort expended so far will ultimately be very beneficial indeed. If you, too, aim to do complex or embedded-centric things, then similar levels of effort will benefit your projects.

6.1 War stories from the front lines of not-system-testing

I have some war stories for you. I heard tell of a company back in the day whose product, a hardware peripheral device, was (for sound commercial reasons) sold as working with numerous wonky proprietary Unixes. But for some of the more obscure platforms, there had been literally no automated testing: a release was declared by development, thrown over the wall into the QA department, and in due course a human tester would physically insert the CD-R into one of these wonky old machines and manually run through a test script, ensuring that everything appeared to work. This was such a heavyweight process that it was run very rarely — meaning that, if an issue was found on, say, AIX, then the code change that caused it had probably happened months earlier, with significant newer work built on top of it. And of course such a discovery at “QA time” meant that the whole lumbering manual release process had to be started again from scratch once the issue was fixed. This was exactly the pathology that CI was invented to fix! I mean, I’m pretty sure Laminar doesn’t support antediluvian AIX versions out of the box, but given the impact of any issue on release schedules, it was definitely worth their putting in quite significant development effort to bring the installation process under CI — automatically on main (at least nightly, if not on every push), and by request on any feature branch. (Developers need a way to have confidence they haven’t broken AIX before merging to main.) They should have asked themselves, “What can be done to automate testing of the install CD, in some way that is superior to not testing it at all?” — to place it under the control of a CI test-runner, as surely as the STM32F746-Nucleo is under the control of this Raspberry Pi. Well — what’s the simplest thing that can act as a fake CD-ROM drive well enough to fool an AIX box? Do those things have USB host? Can you bitbang SCSI-1 target on a Raspberry Pi? Would a BlueSCSI help?

Or even if they were to decide that “CI-ing” the actual install CD image was too hard — how often were the failures specifically CD-related? Could they just copy the installer tarballs over SSH and test every other part of the process? How did this pathology last for more than one single release?

I also heard tell of a different company whose product was an embedded system, and was under automated test, including before merging to main — but following a recent “urgent” porting exercise (again for sound commercial reasons), many of the tests didn’t pass. The test harness they used supported marking tests as expected-failure — but no-one bothered doing so. So every test run “failed”, and developers had to manually pore over the entire list of test results to determine whether they’d regressed anything. In a sense the hard part of testing was automated, but the easy part not! This company had put in 99% of the effort towards automated testing, but were reaping just a tiny fraction of the benefits, because the very final step — looking at the test results and turning them into a yes/no for whether the code is OK for main — was not automated. How did this pathology last for more than one single afternoon?

6.2 People

The rhetorical-looking questions posed above about the AIX CI and the expected-fail tests (“How did these obviously wrong situations continue?”) did in fact have an answer in the real world. Indeed, it was the same answer in both cases: people.

In the first company, the head of QA presided over a large and (self-)important department — which it needed to be in order to have enough staff to manually perform quite so much automatable work. If QA was run with the attitude that human testers are exploratory testers, squirrelers-out of corner-cases, stern critics of developers’ assumptions — if the testers’ work-product, including during early-development-phase involvement, was more and better automated tests — then they probably wouldn’t need anything like so many testers, and the rôle of “Head of QA” would perhaps be viewed as a less big cheese than hitherto. Although the product quality would benefit, the company’s bottom-line would benefit, and even the remaining testers’ career-progression would benefit — the Head of QA’s incentives were misaligned with all of that, and they played the game they were given by the rules that they found to be in effect.

The second company seems harder to diagnose, but fundamentally the questions are, “Who is in charge of setting the quality bar for merges to main?” and “Who is in charge of what happens when that bar is not met?”. Those are likely to be two different people — they require very different skills — but if you find that a gulf is opening up between your team’s best practices and your team’s typical practices, then both those people are needed in order to bring the two closer together again. (In my own career I’ve a few times acted as the first one, but I’ve never acted as the second one: as Dr. Anthony Fauci famously didn’t say, “I don’t know how to explain to you that you should care about software quality.”)

This post, and this blog, and this blogger, cannot help you to deal with people. But often the talking point of the people you need to convince (or whose boss you need to convince to overrule them) is that better automated testing isn’t technologically feasible, isn’t worth attempting. I hope I’ve done a little to dispel that myth at least.

CC0 To the extent possible under law, the author of this work has waived all copyright and related or neighboring rights to this work.