Whitebait, Kleftiko, Ekmek Special

Not in fact any relation to the famous large Greek meal of the same name.

Monday, 9 September 2024

Rustifying Lakos: Avoiding circular dependencies in Rust

Previously on #rust:


It would be unfair to John Lakos to gloss his monumental book Large-Scale C++ Software Design as basically just saying “Don’t have circular dependencies”. The book contains lots of hard-earned useful advice on the architecting of C/C++ systems, and if the words that make up its title, large-scale C++ software design, form any part of your daily life then I can only suggest that you just go and read it. (I... won’t wait. It’s a hefty tome. I confess I haven’t read any of his ongoing Peter-Jackson-esque project to expand it across three volumes.)

But it is fair to say that much of it consists of very C++ solutions to very C++ problems, and by no means all of it is relevant to large-scale software systems in other languages. Amongst all the pieces of good advice imparted, though, “Don’t have circular dependencies” is perhaps the most widely-applicable.

Rust’s cargo package manager effectively rules out circular dependencies between crates (i.e., libraries). But there’s no such built-in barrier to creating circular dependencies between modules within a crate. This blog post explains the third-party tooling that can be used to erect such a barrier.
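
To see what that means in practice, here is a minimal sketch (with made-up module names) of the sort of thing rustc and cargo will happily accept: two modules in the same crate, each depending on the other.

// src/lib.rs -- compiles without complaint, despite the module cycle
pub mod alpha {
    pub fn ping(n: u32) -> u32 {
        // alpha depends on beta...
        if n == 0 { 0 } else { crate::beta::pong(n - 1) }
    }
}

pub mod beta {
    pub fn pong(n: u32) -> u32 {
        // ...and beta depends on alpha: a circular dependency within one crate
        if n == 0 { 0 } else { crate::alpha::ping(n - 1) }
    }
}

Neither the compiler nor Cargo objects; spotting this sort of thing is exactly what the tooling below is for.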

But why, though?

Some of the benefits of non-circularity, or levelisability, lauded in the Lakos book apply more to C++ than to Rust: compilation-speed improvements, for instance, where in C++ the translation unit is the source file but in Rust it is the crate. But others still very much apply:

  • Integrity of structural induction: By “structural induction” I mean the idea that unit-testing can be seen as proving that your code is correct; the “proofs” in question aren’t always quite as rigorous as what a mathematician would recognise as a proof, but mathematics does provide a useful structure and language for reasoning about these “proofs”. For instance, a mathematician would say, if you’ve proven that certain pieces of code A and B do their jobs correctly – and that another piece of code C(x,y) does its own job correctly, just so long as the dependencies x and y that it gets given each do their own jobs correctly – then all told, you’ve proven that the overall combination C(A,B) is itself correct. With this mental model, you can use unit-testing to go about making a tree of proofs: first that each lowest-level component in your system does its job correctly, then that the components one level higher (with lowest-level components as dependencies) do their jobs correctly, and so on all the way up to your system as a whole. If your system’s dependency “tree” isn’t a tree at all but has loops or cycles, this proof falls to pieces – or, at the very best, you have to awkwardly test the whole loop of components as a single entity.

    If you didn’t do Test-Driven Development (perhaps you did some “Spike-Driven Exploration” first), and are only just now adding unit-tests to an existing crate, then this is exactly the information you need in order to direct your test-writing efforts where they’re most useful: write tests for the leaves of the dependency graph first, which are usually the foundational parts where failings would cause hard-to-understand cascading failures throughout the crate, then work your way up to the more complex modules. (There’s a small code sketch of this idea just after this list.)

  • Reusability: Since the days of CORBA and COM, efforts to foster a wide ecosystem of reusable software components have either failed, or succeeded in ways that were worse than failure. But reusability within an organisation or a project is still important, and it’s eased when components can be cleanly separated, without false dependencies on huge swathes of the surrounding code. (Extra “extremely online” points if you guessed what story that link would lead to before clicking on it.)
  • Ease of understanding: Just like an induction proof or like a reuse attempt, any attempt to understand the operation of the code – by any “new” developer, which includes future-you-who’s-forgotten-why-you-did-that – is easier if it can proceed on a component-by-component basis, as opposed to needing a whole interlinked cycle of components to be comprehended in one go.
  • Cleanliness, and its proximity to godliness: As I wrote on this very blog more than thirteen years ago, cyclic dependencies are a “design warning” – like a compiler warning, but for your design instead of your implementation. And just like a compiler warning, sometimes there’s a false positive – but keeping your code warning-free is still helpful, because it makes sure that when a true positive crops up, it’s breaking news, and isn’t lost in the noise.
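
To make the structural-induction point above concrete, here is a minimal sketch (hypothetical modules, not taken from any real crate): the leaf module is tested in isolation, and the tests one level up are entitled to assume that the leaf already does its job.

// A leaf module: depends on nothing else in the crate
pub mod checksum {
    pub fn sum(data: &[u8]) -> u8 {
        data.iter().fold(0u8, |acc, b| acc.wrapping_add(*b))
    }

    #[cfg(test)]
    mod tests {
        #[test]
        fn empty_slice_sums_to_zero() {
            assert_eq!(super::sum(&[]), 0);
        }
    }
}

// One level up: depends on `checksum`, never the other way round
pub mod frame {
    pub fn is_valid(payload: &[u8], expected: u8) -> bool {
        crate::checksum::sum(payload) == expected
    }

    #[cfg(test)]
    mod tests {
        #[test]
        fn matching_checksum_is_valid() {
            // the "proof" here relies on checksum's own tests having passed
            assert!(super::is_valid(&[1, 2, 3], 6));
        }
    }
}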

But how, though?

#!/bin/bash -xe
#
# usage:
#    bin/do-deps
#    (browse to target/deps/index.htm)

PACKAGES=`cargo metadata --no-deps --format-version 1 | jq '.packages[].name' --raw-output`

mkdir -p target/deps
echo "" > target/deps/index.htm

for PKG in $PACKAGES ; do
    echo "<img src=$PKG-deps.png><pre>" >> target/deps/index.htm
    cargo modules dependencies --package $PKG --lib \
          --no-externs --no-sysroot --no-fns --no-traits --no-types \
          --layout dot > target/$PKG-deps.dot
    # Make the graph top-to-bottom (drop rankdir=LR) and tidy the node labels
    sed -E -e 's/\[constraint=false\]//' -e 's/splines="line",//' \
        -e 's/rankdir=LR,//' \
        -e 's/label="(.*)", f/label="{\1}", f/' \
        < target/$PKG-deps.dot > target/$PKG-deps2.dot
    # Transitive reduction; tred's cycle warnings (stderr) land in the HTML page
    tred < target/$PKG-deps2.dot > target/$PKG-deps3.dot 2>> target/deps/index.htm
    dot -Tpng -Gdpi=72 < target/$PKG-deps3.dot > target/deps/$PKG-deps.png
    echo "</pre><hr/>" >> target/deps/index.htm
done

First of all, the cargo metadata command (built into Cargo) is used to list all the crates in the current workspace (usefully, if invoked in a non-workspace crate, it just returns that crate’s name).

Then the very excellent cargo modules command does most of the work. I’m only using a tiny part of its functionality here; you should read its documentation to learn about the rest, which might be of especial interest if you’d like to model your project’s dependencies in a different way from that presented here. That page also describes how to install it (but it’s just “cargo install cargo-modules”).

By default, cargo modules shows all the individual functions, traits and types that link the dependency graph together; for all but the smallest project, that’s too much clutter for these purposes, as we just want to see which modules depend on which other ones. Temporarily removing the --no-fns --no-traits --no-types arguments will show the whole thing, which can be helpful if it’s not obvious exactly why one module is reported as depending on another. (Currently, cargo modules has some issues around reporting impl X for Y blocks and around static data items – but issues caused by those are hopefully rare, and it’s still enormously better than nothing.)

The sed command has the effect of making the dependency graph run top-to-bottom rather than left-to-right; that seemed easier to read, especially in the blog-post format used here.

Each graph-definition (*.dot) file is drawn as a PNG image, and a simple HTML page is generated that just inlines all the images. This script is run in CI and the HTML page (with its images) is included in the output artifacts of that job.

Before and after

The (as yet unpublished) cotton-net crate exhibited several dependency cycles when analysed with that script; in this image there are three upward-pointing arrows, each indicating a cycle. (The colours represent visibility: almost everything here is green for “public visibility”.)

warning: %1 has cycle(s), transitive reduction not unique
cycle involves edge cotton_net::arp -> cotton_net

The graph can’t, of course, really indicate which components should be seen as “at fault” here – though, for each cycle, at least one of the components in that cycle must be changed for the cycle to be broken. Instead, determining which of the components is “at fault” is a very human endeavour, usually involving the train of thought, “Wait, why does thing-I-think-of-as-low-level depend on thing-I-think-of-as-higher-level?” In this particular case, some very low-level abstractions indeed (IPv4 address, MAC address) were in the crate root “for convenience”; the convenience factor is real but it can be achieved in a levelisable way by defining everything in its right place, and then exporting them again from the top-level (“pub use ethernet::MacAddress”) for library-users’ convenience. (Such exports of course count as a dependency of the root on the low-level module, which is the direction we don’t mind.)
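
The shape of that fix, in a minimal sketch (these aren’t the actual cotton-net definitions): define the type in its low-level home, then re-export it from the crate root.

// src/lib.rs -- illustrative only
pub mod ethernet {
    /// The low-level type lives in its "right place"...
    pub struct MacAddress(pub [u8; 6]);
}

/// ...and is re-exported from the crate root purely for callers' convenience.
/// The root now depends on `ethernet` (downwards), not the other way round.
pub use ethernet::MacAddress;

Users can then write either cotton_net::MacAddress or cotton_net::ethernet::MacAddress, and the module graph stays levelisable.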

Sometimes a cycle can be broken by separating out a trait from a contentious structure: in this case, the new interface_data trait was extracted from interface, which meant that all the individual protocols (TCP, UDP) could depend on the trait, not on the implementation. This is a bit like Lakos’s “Dumb Data” refactoring from “Large Scale C++ Software Design”, section 5.5; indeed, the whole of Chapter 5 of that book presents refactorings that can help disentangle cycles in Rust just as well as they can in C++.
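
Sketched with invented names (the real cotton-net types differ), that refactoring looks something like this: the trait lives in its own low-level module, the protocols depend only on the trait, and the concrete interface type implements it.

// Illustrative only: module and type names are invented
pub mod interface_data {
    /// The extracted trait: just the part of "an interface" that protocols need.
    pub trait InterfaceData {
        fn mtu(&self) -> usize;
    }
}

pub mod tcp {
    use crate::interface_data::InterfaceData;

    /// TCP depends only on the trait module, so there is no tcp -> interface
    /// edge (and hence no interface <-> tcp cycle).
    pub fn max_segment_size(iface: &dyn InterfaceData) -> usize {
        iface.mtu().saturating_sub(40) // room for IPv4 + TCP headers
    }
}

pub mod interface {
    use crate::interface_data::InterfaceData;

    /// The concrete type lives here and implements the trait;
    /// `interface` depends on `interface_data`, never on `tcp`.
    pub struct Interface {
        pub mtu: usize,
    }

    impl InterfaceData for Interface {
        fn mtu(&self) -> usize {
            self.mtu
        }
    }
}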

Because of the way Graphviz’s tred tool works – indeed, because of the way the mathematical “transitive reduction” operation inherently works – it’s not actually very important how many cycles appear in the graph, only whether there’s some or none. In this case, as well as the three cyclic dependencies via the crate root, the modules interface and tcp depended directly on each other in a pathologically-small cycle; the graph doesn’t depict the tcp→interface arrow directly, as it’s implied by the larger cycle via tcp→ethernet→root→dhcp→interface. So in fact there were more than three issues with the original cotton-net (which, in my defence, was one of the first things I tried to write in Rust; some of the commits in there are over eight years old) – the fixing of dependency cycles is an iterative process, and each time you fix one you should re-check for the existence of further ones until there are none left.

Saturday, 10 August 2024

How to recover from ill-advisedly installing gcc-13 on Ubuntu 22.04


I thought I’d try compiling Chorale with gcc-13, which is newer than the default gcc-12 version in Ubuntu 22.04 – compiler diversity drives out bugs, and all that. Easy peasy, there’s a “toolchain testing” repository that packages gcc-13 for 22.04. Adding it was as straightforward as:

 sudo add-apt-repository ppa:ubuntu-toolchain-r/test
 sudo apt install g++-13
and, with a couple of new warnings fixed, everything compiled with gcc-13 as fine as ever. So I installed gcc-13 on the CI host in the same way, and pushed the branch.

And all the builds failed.

It turns out that installing gcc-13 updates the libstdc++ headers – to ones which neither the default clang-14 nor clang-18 is able to compile. I thought about making clang use (LLVM’s) libc++ instead of (GCC’s, the default) libstdc++ – but you can’t have both libc++ and libstdc++ installed in parallel, because they disagree about who owns libunwind. Oddly, libgstreamer depends directly on libstdc++’s libunwind, so installing libc++’s libunwind instead would uninstall GStreamer. That’s not really on.

So there was nothing else to do but uninstall gcc-13 again. But unwinding that installation process was not straightforward: as well as the gcc-13 and g++-13 packages themselves, which live happily alongside gcc-12 and g++-12, the upgrade also upgraded the single, system-wide copies of several foundational packages such as libgcc-s1. Undoing all those upgrades was necessary too: partly because Clang unconditionally uses the highest installed version number of libstdc++ to locate its headers, but also because having these packages stuck at 13.x would mean never getting Ubuntu security updates for them ever again (updates which would have 12.x version numbers).

Somewhat naively I imagined that just un-adding the repository might solve that:

sudo apt-add-repository -r ppa:ubuntu-toolchain-r/test
but even after apt update the hoped-for, “You’ve got all these packages installed that match none of my repositories, shall I replace them with pukka ones?” message abjectly failed to appear. Nor does apt install pkg or apt upgrade override a newer locally-installed package with an older one from a repository. (Which I grudgingly admit is usually the correct thing to do.) And completely uninstalling these foundational packages, even momentarily, seemed like a bad idea – especially as this is actually Kubuntu, and the entire desktop is in C++ and links against libstdc++.

I ended up manually searching for packages using dpkg -l – fortunately, all the packages installed alongside gcc-13 shared its exact version number (13.1.0), which I could just grep for.

Each one of those packages needed to be replaced with the corresponding Ubuntu 22.04 version, which is indicated by using Ubuntu release codenames – for 22.04, that’s “Jammy”:

sudo apt install libgcc-s1/jammy
... etc., for about 30 packages

But worse, some of these packages depended on each other, so ideally there’d be just one apt install command-line. I ended up doing this:

 sudo apt-add-repository -r ppa:ubuntu-toolchain-r/test
 sudo apt remove g++-13 gcc-13
 sudo apt install `dpkg -l | fgrep 13.1.0 | grep -v -- -13 | sed -E 's/[ \t]+/\t/g' | cut -f 2 | sed -e 's,$,/jammy,'`
 sudo apt remove cpp-13 gcc-13-base
The long shell substitution in the install line uses dpkg to list all packages, then filters on the version 13.1.0, then filters out packages with “-13” in the actual name (which won’t have 22.04 equivalents), then uses sed and cut to extract just the package name from the verbose dpkg output, then adds the trailing /jammy and passes all the results as arguments to a single apt install invocation.

The apt install command politely told me exactly what it was going to do – downgrade those packages, no more or less than that – so it wasn’t stressful to give it the Y to proceed.

Tuesday, 21 May 2024

The Austin Test: A software checklist for silicon vendors

photo by John Robert Shepherd under CC-BY-2.0

Silicon chips are usually made by chipmakers[citation needed]. These are companies largely staffed by chip designers — who often, quite naturally, view the chip as the product. But a more holistic view would say that the product is the value that the chip provides to their customer — and for all but the simplest passive parts, that value is provided partly through software.

Software design is a related but decidedly different skill-set to chip design. That means that the difference between a silicon product succeeding or failing can have nothing to do with the merit or otherwise of the chip design itself, but instead be down to the merit or otherwise of the accompanying software. A sufficiently large chip buyer, aiming their own product at a sufficiently lucrative market, can often overcome poor “developer experience” by throwing internal software engineering at the problem: for a product that eventually sells in the millions, it can indeed be worth shaving cents off the bill-of-materials cost by specifying a chip that’s cheaper but harder-to-use than the alternative, even when taking into account the increased spending on software development.

But if you’re a chipmaker, many of your customers will not be in that position — or may accept that they are, but still underestimate the software costs, and release the product in a bug-ridden state or not at all — and ultimately sell fewer of their own products to end-users and thus buy fewer of your chips.

So the product isn’t finished just because the chip itself is sitting there, taped-out and fabbed and packaged and black and square and on reels of 1,000. For any complex or high-value chip — a microcontroller, for instance — the product is not complete until there’s also a software story, usually in the form of a Software Development Kit (SDK) and accompanying documentation. But a chipmaker staffed unremittingly and at too high a level with only chip-design experts may not even, corporately, realise when the state of the art in software has moved on. So here is a checklist — in the spirit of the famous “Joel Test” — to aid hardware specialists in assessing the maturity of a chipmaker’s software process; I’ve named it after the home-town of a chipmaker I once worked for.

The Austin Test

  1. Could your SDK have been written using only your public documentation?
  2. Is your register documentation generated from the chip source?
  3. Are the C headers for your registers, generated from the chip source?
  4. Are your register definitions available in machine-readable form (e.g. SVD)?
  5. Is your pinmux definition available in machine-readable form?
  6. Are all the output pins tristated when the chip is held in reset, and on cold boot?
  7. Is it straightforward to get notifications when your public documentation changes (e.g. new datasheet revision released)?
  8. Is it straightforward to use your SDK as a software component in a larger system?

1. Could your SDK have been written using only your public documentation?

Sometimes when the silicon comes back from the fab, it doesn’t work quite the way the designers expected. There’s no shame in that. And quite often when that happens, there’s a software workaround for whatever the issue is, so you put that workaround into your SDK. There’s no shame in that either. But if the problem and the workaround are not documented, you’ve just laid a trap for anyone who’s not using your entire SDK in just the way you expected. (See Questions 4 and 8.)

Perhaps your customer is using Rust, or Micropython. Perhaps they have a range of products, based on a range of different chips, of which yours is just one. If there’s “magic” hidden in your C SDK to quietly work around chip issues, then those customers are going to have a bad time.

(The original Mastodon post of which this blog post is a lengthy elaboration.)

2. Is your register documentation generated from the chip source?

I’m pretty sure that even the chipmakers with the most mature and sensible software operations don’t actually do this: they don’t have an automated process that ingests Verilog or VHDL and emits Markdown or RTF or some other output that gets pasted straight into the “Register Layout” sections of their public documentation. (You can sort of tell they don’t, from the changelogs in reference manuals.) But it’s the best way of guaranteeing accuracy — certainly superior to having human beings painstakingly compare the two.

Because I’ve worked at chip companies, I do realise that one reason not to do this is because not everything is public. Because silicon design cycles are so long, and significant redesign is so arduous, what chipmakers do is speculatively design all manner of stuff onto the chip, then test it and only document the parts that actually work or that they can find uses for. Sometimes this is visible as mysterious gaps in memory maps or in peripheral-enable registers; sometimes it’s less visible. I can personally vouch that there is a whole bunch of stuff in the silicon of the DisplayLink DL-3000 chip that has never actually been used by the published firmware or software. But this is easily dealt with by equipping the automated process with a filter that just lets through the publicly-attested blocks. It’s still a win to have an automated process for the documentation just of those blocks!

3. Are the C headers for your registers, generated from the chip source?

This is again essentially the question, do you have a process that inherently guarantees correctness, or do you have human employees laboriously curate correctness? The sheer volume of C headers for a sophisticated modern microcontroller can be enormous, and if it’s not automatically-generated then you have only your example code — or worse, customers building their own products — to chase out corner cases where they’re incorrect.

Really these first three questions are closely interlinked: if you find that you can’t write your SDK against only publicly-attested headers, that should be a big hint that you’ve filtered-out too much: that your customers won’t be able to write their own code against those headers either.

4. Are your register definitions available in machine-readable form (e.g. SVD)?

ARM’s SVD (“System View Definition”) format was created as a machine-readable description of the register layout of Cortex-M-based microcontrollers, for the consumption of development tools — so that a debugger, for instance, could describe “a write to address 0x5800_0028” more helpfully as “a write to RTC.SSR”. But the utility of such a complete description is not limited to debuggers: in the embedded Rust ecosystem, the peripheral access crates or PACs that consist of the low-level register read and write functions which enable targetting each specific microcontroller — a sort of crowd-sourced SDK — are themselves generated straight from SVD files. (Higher-level abstractions are then layered on top by actual software engineers.)
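
As a flavour of what generation from a machine-readable description can look like, here is a hand-rolled miniature (not the output of svd2rust or of any real PAC, just the shape of the idea), using the RTC.SSR example above:

pub mod rtc {
    const BASE: usize = 0x5800_0000; // from the SVD <baseAddress>
    const SSR_OFFSET: usize = 0x28;  // from the SVD <addressOffset>

    /// Sub-second register, read-only: a name and a type instead of a bare address.
    pub fn ssr() -> u32 {
        // Safety: only meaningful on a device where this address really is RTC_SSR.
        unsafe { core::ptr::read_volatile((BASE + SSR_OFFSET) as *const u32) }
    }
}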

Even for C/C++ codebases that are more directly compatible with the vendor’s SDK, it might sometimes be preferable to generate at least the low-level register-accessing code in-house rather than use the vendor SDK: for instance to generate test mocks of the register-access API — or, for code targetting a range of similar microcontrollers, separating common peripherals (e.g. the STM32 GPIO block, which is identical across a large number of different STM32 variants) from per-target peripherals where each chip variant needs its own code (e.g. the STM32 “RCC” clock-control block).

At Electric Imp we did both of those things: our range of internet-of-things products spanned several generations of (closely-related) microcontroller, with all of these products remaining in support and building in parallel from the same codebase for many years, and so we needed a better answer to Question 8 than our silicon vendor provided at the time. (In Rust terms, we needed something that looked like stm32-metapac, not stm32-rs.) And using the SVD files to generate test mocks of the register API, let us achieve good unit-test coverage even of the lowest-level device-driver code (a topic I hope to return to in a future blog post).
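
In Rust terms (Electric Imp’s code was C++, and the names below are invented for illustration), the mocking trick looks something like this: route the lowest-level register access through a trait, and the driver above it can be unit-tested on the host against a fake.

/// The register-access API that generated code (or a test mock) implements.
pub trait UartRegs {
    fn read_status(&self) -> u32;
    fn write_data(&mut self, byte: u8);
}

/// Driver code is written against the trait, so it neither knows nor cares
/// whether it is talking to real hardware or to a mock.
pub fn send(regs: &mut impl UartRegs, byte: u8) {
    const TX_EMPTY: u32 = 1 << 7;
    while regs.read_status() & TX_EMPTY == 0 {}
    regs.write_data(byte);
}

#[cfg(test)]
mod tests {
    use super::*;

    /// A host-side mock: records writes, reports "TX empty" immediately.
    struct MockUart {
        written: Vec<u8>,
    }

    impl UartRegs for MockUart {
        fn read_status(&self) -> u32 {
            1 << 7
        }
        fn write_data(&mut self, byte: u8) {
            self.written.push(byte);
        }
    }

    #[test]
    fn send_waits_for_tx_empty_then_writes() {
        let mut uart = MockUart { written: Vec::new() };
        send(&mut uart, 0x42);
        assert_eq!(uart.written, vec![0x42]);
    }
}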

Basically, having a good SVD file available (perhaps itself generated from the chip source) gives your customers an “escape hatch” if they find their needs aren’t met by your published SDK. Although SVD was invented by ARM to assist takeup of their very successful Cortex-M line, it is so obviously useful that SVD files are becoming the standard way of programmatically defining the peripheral registers of non-ARM-based microcontrollers too.

5. Is your pinmux definition available in machine-readable form?

Most microcontrollers have far more peripherals on-board than they have pins available to dedicate to them, so several different peripherals or signals are multiplexed onto each physical pin; that way, customers interested in, say, UART-heavy designs and those interested in SPI-heavy designs, can all use the same microcontroller and just set the multiplex, the pinmux, appropriately to connect the desired peripherals with the available pins. Often, especially in low-pin-count packages, up to sixteen different functions can be selected on each pin.

This muxing information, like the register definitions themselves, is helpful metadata about the chip — and, thus, about software targetting it. A machine-readable version of this information can be used to make driver code more readable, and more amenable to linting or other automated checks for correctness.

Pinmux information in datasheets is typically organised into a big table where each row is a pin and the columns are the signals available; at the very least, having this mapping available as a CSV or similar would make it easy to invert it in order to allow the reverse lookup: for this signal, or this peripheral, what pins is it available on? Laboriously manually creating that inverse map was always one of the first tasks to be done whenever a new microcontroller crossed my path at Electric Imp.
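
A sketch of that inversion in Rust (the pins and signal names below are made up, not from any real datasheet): read the per-pin rows, and build the reverse map from signal to pins.

use std::collections::BTreeMap;

/// One row of the datasheet pinmux table: a pin and the signals it can carry.
fn pinmux_rows() -> Vec<(&'static str, Vec<&'static str>)> {
    vec![
        ("PA2", vec!["USART2_TX", "TIM2_CH3", "ADC1_IN2"]),
        ("PA3", vec!["USART2_RX", "TIM2_CH4", "ADC1_IN3"]),
        ("PB6", vec!["USART1_TX", "I2C1_SCL"]),
    ]
}

/// Invert the table: for each signal, which pins can carry it?
fn pins_by_signal(
    rows: &[(&'static str, Vec<&'static str>)],
) -> BTreeMap<&'static str, Vec<&'static str>> {
    let mut map: BTreeMap<&'static str, Vec<&'static str>> = BTreeMap::new();
    for (pin, signals) in rows {
        for signal in signals {
            map.entry(*signal).or_default().push(*pin);
        }
    }
    map
}

fn main() {
    let inverse = pins_by_signal(&pinmux_rows());
    // e.g. "USART2_TX is available on: PA2"
    for (signal, pins) in &inverse {
        println!("{signal} is available on: {}", pins.join(", "));
    }
}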

6. Are all the output pins tristated when the chip is held in reset, and on cold boot?

Honestly this question is a specific diss-track of one particular microcontroller which failed to do this. I mean, lots of perfectly sensible microcontrollers tristate everything on cold boot except the JTAG or SWD pins, and that exception is completely reasonable. But this part drove a square wave out of some of its output pins while held in reset. It’s hard to fathom how that could have come about, without some pretty fundamental communication issues inside that chipmaker about what a reset signal even is and what it’s for.

(The microcontroller in question was part of a product line later sold-on to Cypress Semi; it might have been more fitting to sell it to Cypress Hill, as the whole thing was insane in the brain.)

7. Is it straightforward to get notifications when your public documentation changes (e.g. new datasheet revision released)?

I’ve said a few somewhat negative things about chipmakers so far, so here comes a solidly positive one: Renesas do this really well. Every time you download a datasheet or chip manual from their website, you get a popup offering to email you whenever a newer version is released of the document that you’ve just downloaded. Particularly at the start of a chip’s lifecycle, when these documents may include newly-discovered chip errata that require customer code changes, this service can be a huge time-saver. Chipmakers who don’t already do this should seriously consider it: it’s not rocket-science to implement, especially as compared to, say, the designing of a microcontroller.

8. Is it straightforward to use your SDK as a software component in a larger system?

The “developer experience” of a microcontroller SDK typically looks like: “Thank you for choosing the Arfle Barfle 786DX/4, please select how you’d like it configured, clickety-click, ta-da! here’s the source for a hello-world binary, now just fill in these blanks with your product’s actual functionality.” And so it should, of course, because that’s where every customer starts out. But it’s not where every customer ends up, especially in the case of a successful product (which, really, is the case you ought to be optimising for): there’s a bigger, more complex version 2 of the customer’s hardware, or there’s a newer generation of your microcontroller that’s faster or has more SRAM — or perhaps you’ve “won the socket” to replace your competitor’s microcontroller in a certain product, but the customer has a big wodge of their own existing hard-fought code that they’d like to bring across to the new version of that product.

In all of these cases, your SDK suddenly stops being a framework with little bits of customer functionality slotted-in in well-documented places, and starts being just one, somewhat replaceable, component in a larger framework organised by your customer. You don’t get to pick the source-tree layout. You probably don’t get to write main() (or the reset handler); if there are things that need doing there for your specific chip to work, then see Question 1.

Being a software component in a larger system doesn’t preclude also having a friendly out-of-box developer experience; it just means that, when customers peer beneath the hood of the software framework that you’ve built them, what they see is that the core of the framework is a small collection of well-designed fundamental components — built with all the usual software-engineering values such as separation-of-concerns, composability, unit-testing, and documentation — which can be used on their own without the customer needing to understand your entire SDK.

When I worked at Sigmatel on their SDKs for microcontroller-DSPs for the portable media player market, it was clear that successful customers came in two types: there were the opportunist, low-engineering folks who took the SDK’s example media player firmware, stuck their own branding on it if that, and shipped the whole thing straight out again to their own end-users; and the established, high-engineering folks who already had generations of media-player firmware in-house, and just wanted the device drivers so that they could launch a new product with the familiar UI of their existing products. And there was nobody in-between these extremes, so it was not a good use of time to try to serve that middle-ground well.

In a sense the importance of a good answer to this question was emphasised by Finnish architect — that’s buildings architect, not software architect — Eliel Saarinen, who famously said “Always design a thing by considering it in its next larger context — a chair in a room, a room in a house, a house in an environment, an environment in a city plan.” (Quoted posthumously by his son, also an architect.)

I wish I’d seen that quote when I was just starting in software engineering. One of the most useful and widely-applicable tenets that I’ve been learning the hard way since, is this: Always keep in mind whether you’re designing a system, or whether you’re designing a component in a larger system. Hint: it’s never the former.

Tuesday, 2 April 2024

Solved: KDE Sleep immediately reawakens (Nvidia vs systemd)

Previously on #homelab:

I’ve had this PC, a self-assembled Linux (Kubuntu 22.04) desktop with Intel DX79SR motherboard and Core i7-3930K CPU, since late 2012. And today I finally got sleep working. (Though in fairness it’s not like I spent the entire intervening 11½ years investigating the problems.)

Problem One was that the USB3 controllers never came back after a resume (the USB2 sockets continued to work). And not just resume-after-suspend: USB3 stopped working if I power-cycled using the front panel button instead of the physical switch on the back of the PSU. This turns out to be a silicon (or at least firmware) bug in the suspend modes of the NEC/Renesas uPD720200 and/or uPD720201 controllers on the motherboard; newer firmware is probably available but, last I checked, could only be applied under Windows (not even DOS). The workaround is to edit /etc/default/grub and add usbcore.autosuspend=-1 to GRUB_CMDLINE_LINUX_DEFAULT.

Fixing that got power-cycling using the front-panel power button working, but exposed Problem Two — choosing “Sleep” in KDE’s shutdown menu (or on the icon bar) or pressing the dedicated sleep (“⏾”) button on the keyboard (Dell KB522) successfully momentarily blanked the screen but then immediately woke up again to the login prompt. But I discovered that executing pm-suspend worked (the system powered-down, the power LED started blinking, and any keyboard key woke it up again), as did executing /lib/systemd/systemd-sleep suspend.

So something must have been getting systemd in a pickle in-between it deciding to suspend, and it actually triggering the kernel-level suspending (which it does in /usr/lib/systemd/system/systemd-suspend.service). Eventually I found advice to check journalctl | grep logind and it showed this:

Apr 01 10:06:23 amd64 systemd-logind[806]:\
      Error during inhibitor-delayed operation (already returned success to client): Unit nvidia-suspend.service is masked.
Apr 01 10:14:28 amd64 systemd-logind[806]: Suspend key pressed.
Apr 01 10:14:28 amd64 systemd-logind[806]:\
      Error during inhibitor-delayed operation (already returned success to client): Unit nvidia-resume.service is masked.

This PC has an Nvidia graphics card (it’s a GeForce 1050Ti / GP107), but it uses the Nouveau driver, not the proprietary Nvidia one to which those suspend and resume services belong. And that turned out to be the issue: with those services masked (due to the driver not being present), there were dangling symlinks to their service files present as /etc/systemd/system/systemd-suspend.service.requires/nvidia-{suspend,resume}.service. Removing those dangling symlinks made both the Sleep menu option and the keyboard button work.

It’s possible that I once had the Nvidia proprietary driver installed (dpkg -S isn’t prepared to own up to who put those Nvidia symlinks there) but that, being “configuration files”, removing the driver didn’t remove them.

If you had to name the two most controversial parts of a modern-day Linux install, I think you’d probably come up with (1) systemd and (2) the proprietary Nvidia drivers. I’m not usually a follower of internet hate mobs, but I do have to say: it turned out that the issue was an interaction between systemd and the proprietary Nvidia drivers, which weren’t even installed.

SEO bait: Kubuntu 22.04, Ubuntu 22.04, won't sleep, suspend fails, wakes up, systemd-suspend, pm-suspend, solved.

System-testing embedded code in Rust, part three: A CI test-runner

Previously on #rust:

Thanks to earlier parts of this series, developers of Cotton now have the ability to run automated system-tests of the embedded builds using their own computer as the test host — if they have the right STM32F746-Nucleo development board to hand. What we need to do now, is add the ability for continuous integration to run those tests automatically on request (e.g. whenever a feature branch is pushed to the central git server). For at least the third time on this blog, we’re going to employ a Raspberry Pi 3; the collection of those sitting unused in my desk drawer somehow never seems to get any smaller. (And the supply shortages of recent years seem to have abated.)

First, set up Ubuntu 22.04 with USB boot and headless full-disk encryption in the usual way.

OK, perhaps doing so isn’t all that everyday (that remark was slightly a dig at Larousse Gastronomique, at least one recipe in which starts, “First, make a brioche in the usual way”). But this very blog has exactly the instructions you need. This time, instead of a 512GB Samsung USB SSD, I used a 32GB Sandisk USB flash drive — the test-runner won’t be needing tons of storage.

Alternatively, you could use any other reasonable computer you’ve got lying around — an Intel NUC, say, or a cast-off laptop. Most or all of the following steps will apply to any freshly-installed Ubuntu box, and many of them to other Linux distributions too. But you’ll have the easiest time of it if your CI test-runner has the same architecture and OS as your main CI server, which is another reason I’m pressing yet another Raspberry Pi into service here.

However you get there, what you need to proceed with the rest of this post (at least the way I proceeded) is:

  • An existing Laminar CI server, on your local Ethernet.
  • A Raspberry Pi to become the test-runner;
  • with Ubuntu 22.04 freshly installed;
  • plugged into your local Ethernet, able to DHCP (I gave mine a fixed address in OpenWRT’s DHCP server setup) — or, at least, somewhere where the CI server can SSH to it;
  • with your own user, that you can SSH in as, and use sudo.
  • A USB-to-Ethernet adaptor (I used the Amazon Basics one);
  • an Ethernet switch, at least 100Mbit (I used this TP-Link LS1008);
  • and an STM32F746-Nucleo development board.

In this blog post we’re going to:

  1. Connect everything together, making a separate test network
  2. Automate the remaining setup of the test-runner, using Ansible
  3. Arrange that the CI server can SSH to the test-runner autonomously
  4. Run a trivial CI job on the test-runner
  5. Package up the system-tests and run a real CI job on the test-runner
  6. Go and sit down and think about what we’ve done

1. Connect everything together, making a separate test network

For once, instead of being a whimsical stock photo of something tangentially related, the image at the top of this blog post is an actual photograph of the actual physical items being discussed herein (click to enlarge if need be). The only connections to the outside world are from my home network to the Raspberry Pi’s built-in Ethernet (lower left) and power to the Raspberry Pi and to the Ethernet switch (top left and centre bottom). The test network is otherwise self-contained: the STM32 development board is on a private Ethernet segment with just the Raspberry Pi’s USB Ethernet for company. This network has its own RFC1918 address range, 192.168.3.x, distinct from the rest of the home network. (The Wifi interface on the Raspberry Pi is not currently being used.)

The breadboard is attached to the Raspberry Pi’s GPIO connector, and at present is only used to provide a “testing in progress” LED attached to GPIO13 (glowing white in the photo, but therefore hard to see). GPIO usage could become more sophisticated in the future: for instance, if I was writing a HAL crate for a new embedded device, I could connect the Raspberry Pi’s GPIO inputs to the GPIO outputs of the embedded device (and vice-versa) and system-test my GPIO code.

The Raspberry Pi programs the STM32 over USB; apart from that and the Ethernet adaptor, the USB flash drive also takes up one of the four USB sockets, leaving just one spare for future enhancements. (But hubs are a thing, of course.)

2. Automate the remaining setup of the test-runner, using Ansible

The initial setup of a new Ubuntu installation is arguably best done manually, as you might need to react to things that the output of the commands is telling you. But once the basics are in place, the complexities of setting up everything else that the test-runner needs, are best automated — so they can be repeatable, documented, and explainable to others (in this case: you).

Automating server setup is not new news to cloud engineers, who often use tools such as Chef or Puppet to bring one or more new hosts or containers up-to-speed in a single command. Electric Imp used Chef, which I never really got on with, partly because of the twee yet mixed metaphors (“Let’s knife this cookbook — solo!”, which is a thing all chefs say), but mostly because it was inherently bound up with Ruby. Yet I felt I needed a bit more structure than just “copy on a shell script and run it”. So for #homelab purposes, I thought I’d try Ansible.

Ansible is configured using YAML files, which at least is one step up from Ruby as a configuration language. The main configuration file is called a “playbook”, which contains “plays” (think sports, not theatre), which are in turn made up of individual “tasks”. A task can be as simple as executing a single shell command, but the benefit of Ansible is that it comes with a huge variety of add-ons which allow tasks to be written in a more expressive way. For instance, instead of faffing about to automate cron, there’s a “cron” add-on which even knows about the @reboot directive and lets you write:

    - name: init gpios on startup
      cron:
        name: init-gpios
        special_time: reboot
        job: /usr/local/bin/init-gpios

Tasks can be linked to the outcome of earlier tasks, so that for instance it can restart the DHCP server if, and only if, the DHCP configuration has been changed. With most types of task, the Ansible configuration is “declarative”: it describes what the situation ought to be, and Ansible checks whether that’s already the case, and changes things only where they’re not as desired.

Ansible can be used for considerably more complex setups than the one in this blog post — many, many servers of different types that all need different things doing to them — but I’ve made an effort at least to split up the playbook into plays relevant to basically any machine (hosts: all), or ones relevant just to the Raspberry Pi (hosts: raspberrypis), or ones specific to the rôle of being a system-test runner (hosts: testrunners).

I ended up with 40 or so tasks in the one playbook, which between them install all the needed system packages as root, then install Rust, Cargo and probe-rs as the laminar user, then set up the USB Ethernet adaptor as eth1 and run a DHCP server on it.

Declaring which actual machines make up “all”, “testrunners”, etc., is the job of a separate “inventory” file; the one included in the repository matches my setup at home. The inventory file is also the place to specify per-host information that shouldn’t be hard-coded into the tasks: in this case, the DHCP setup tasks need to know the MAC address of the USB Ethernet adaptor, but that’s host-specific, so it goes in the inventory file.

All the tasks in the main YAML file are commented, so I won’t re-explain them here, except to say that the “mark wifi optional”, “rename eth1”, and “set static IP for eth1” tasks do nothing to dispel my suspicion that Linux networking is nowadays just a huge seven-layer dip of xkcd-927 all the way down, with KDE and probably Gnome only adding their own frothy outpourings on top.

I added a simple Makefile to the systemtests/ansible directory, just to automate running the one playbook with the one inventory.

The name “Ansible” comes originally from science-fiction, where it’s used to mean a device that can communicate across deep space without experiencing speed-of-light latency. I only mention this because when communicating across about a metre of my desk, it’s bewilderingly slow — taking seconds to update each line of sshd_config. That’s about the same as speed-of-light latency would be if the test-runner was on the Moon.

But having said all that, there’s still value in just being able to re-run Ansible and know that everything is set consistently and repeatably.

I did wonder about running Ansible itself under CI — after all, it’s software, it needs to be correct when infrequently called upon, so it ought therefore to be automatically tested to avoid bugs creeping in. But running Ansible needs root (or sudo) access on the test-runner, which in turn means it needs SSH access to the test-runner as a user which is capable of sudo — and I don’t want to leave either of those capabilities lying around unencrypted-at-rest in CI. So for the time being it’s down to an unreliable human agent — that’s me — to periodically run make in the ansible directory.

3. Arrange that the CI server can SSH to the test-runner autonomously

Most of the hosts in this series of #homelab posts are set up with, effectively, zero-trust networking: they can only be accessed via SSH, and all SSH sessions start with me personally logged-in somewhere and running ssh-agent (and using agent-forwarding). But because the CI server needs to be able to start system-test jobs on the test-runner, it needs to be able to login via SSH completely autonomously.

This isn’t as alarming as it sounds, as the user it logs into (the laminar user on the test-runner) isn’t very privileged; in particular, it’s not in the sudo group and thus can’t use sudo at all. (The Ansible setup explicitly grants that user permissions to the hardware it needs to access.)

Setting this up is a bit like setting up your own SSH for the first time. First generate a key-pair:

ssh-keygen -t ed25519

— when prompted, give the empty passphrase, and choose “laminar-ssh” as the output file. The public key will be written to “laminar-ssh.pub”.

The public key needs to be added to ~laminar/.ssh/authorized_keys (American spelling!) on the test-runner; the Ansible setup already does this for my own CI server’s public key.

Once authorized_keys is in place, you can test the setup using:

ssh -i laminar-ssh laminar@scotch

Once you’re happy that it works, copy the file laminar-ssh as ~laminar/.ssh/id_ed25519 on the main CI server (not the test-runner!):

sudo cp ~peter/laminar-ssh ~laminar/.ssh/id_ed25519
sudo chown -R laminar.laminar ~laminar/.ssh
sudo chmod 0700 ~laminar/.ssh

You can test that setup by using this command on the CI server (there should be no password or pass-phrase prompt):

sudo -u laminar ssh scotch

— indeed, you need to do this at least once, in order to reassure the CI server’s SSH client that you trust the test-runner’s host key.

4. Run a trivial CI job on the test-runner

Now that the laminar user on the CI server can SSH freely to scotch, what remains is mostly Laminar setup. This part is adapted very closely from Laminar’s own documentation: we set up a “context” for the test-runner, separate from the default context used by all the other jobs (because the test-runner can carry on when other CPU-intensive jobs are running on the CI server), then add a remote job that executes in that context.

/var/lib/laminar/cfg/contexts/test-runner-scotch.conf
EXECUTORS=1
/var/lib/laminar/cfg/contexts/test-runner-scotch.env
RUNNER=scotch

The context is partly named after the test-runner host, but it also includes the name of the test-runner host as an environment variable. This means that the job-running scripts don’t need to hard-code that name.

As before, the actual job file in the jobs directory defers all the complexity to a generic do-system-tests script in the scripts directory:

/var/lib/laminar/cfg/jobs/cotton-system-tests-dev.run
#!/bin/bash -xe
exec do-system-tests

In keeping with the Laminar philosophy of not poorly reinventing things that already exist, Laminar itself has no built-in support for running jobs remotely — because that’s what SSH is for. This, too, is closely-inspired by the Laminar documentation:

/var/lib/laminar/cfg/scripts/do-system-tests
#!/bin/bash -xe

ssh laminar@$RUNNER /bin/bash -xe << "EOF"
  echo 1 > /sys/class/gpio/gpio13/value
  uname -a
  run-parts /etc/update-motd.d
  sleep 20
  echo 0 > /sys/class/gpio/gpio13/value
EOF

This, of course, doesn’t run any actual tests, but it provides validation that the remote-job mechanism is working. The LED attached to GPIO13 on the test-runner Raspberry Pi serves as the “testing in progress” indicator. (And the run-parts invocation is an Ubuntu-ism: it reproduces the “message of the day”, the welcome message that’s printed when you log in. Most of it is adverts these days.)

Tying the job to the context is the $JOB.conf file:

/var/lib/laminar/cfg/jobs/cotton-system-tests-dev.conf
CONTEXTS=test-runner-*

Due to the judicious use of a wildcard, the job can run on any test-runner; my setup only has the one, but if you found yourself in a team with heavy contention for the system-test hardware, this setup would allow you to build a second, identical Raspberry Pi with all the same hardware attached — called shepherds, maybe — and add it as a separate Laminar context. Because Laminar runs a job as soon as any context it’s fit for becomes available, this would automatically split queued system-test jobs across the two test-runners.

With all of this in place, it’s time to trigger the job on the CI server:

laminarc queue cotton-system-tests-dev

After a few teething troubles, including the thing I mentioned above about making sure that the SSH client accepts scotch’s host key, I was pleased to see the “testing in progress” LED come on and the message-of-the-day spool out in the Laminar logs.

5. Package up the system-tests and run a real CI job on the test-runner

We didn’t come here just to read some Ubuntu adverts in the message-of-the-day. Now we need to do the real work of building Cotton for the STM32 target, packaging-up the results, and transferring them to the test-runner where they can be run on the target hardware. First we build:

/var/lib/laminar/cfg/jobs/cotton-embedded-dev.run
#!/bin/bash -xe

PROJECT=cotton
RUST=stable
BRANCH=${BRANCH-main}
SOURCE=/var/lib/laminar/run/$PROJECT/workspace

(
    flock 200
    cd $SOURCE/$PROJECT
    git checkout $BRANCH
    cd -
    cp -al $SOURCE/$PROJECT $PROJECT
) 200>$SOURCE/lock

source $HOME/.cargo/env
rustup default $RUST

cd $PROJECT
cargo build -p systemtests -F arm,stm32f746-nucleo --all-targets
cargo test --no-run -p systemtests -F arm,stm32f746-nucleo 2> $ARCHIVE/binaries.txt
grep "Executable tests/" $ARCHIVE/binaries.txt  | cut -d'(' -f 2 | cut -d')' -f 1 > binaries.2.txt

tar cf $ARCHIVE/binaries.tar `find cross/*/target -type f -a -executable \
        | grep -v /deps/ | grep -v /build/` `cat binaries.2.txt`
laminarc queue cotton-system-tests-dev PARENT_RUN=$RUN
exec prune-archives cotton-embedded-dev 10

The actual build commands look much like many of the other Laminar jobs but with the extra Cargo features added which enable the cross-compiled targets; the interesting parts of this script come once the cargo build is done and the results must be tarred-up ready to be sent to the test-runner.

Finding all the target binaries is fairly easy using find cross/*/target, but we also need to find the host binary from the systemtests package. The easiest way to do that is to parse the output of cargo test --no-run, which includes lines such as:

   Compiling systemtests v0.0.1 (/home/peter/src/cotton/systemtests)
    Finished test [unoptimized + debuginfo] target(s) in 2.03s
  Executable unittests src/lib.rs (target/debug/deps/systemtests-4a9b67de54149231)
  Executable tests/device/main.rs (target/debug/deps/device-cfdcb3ff3e5eaaa5)

The line with “Executable tests” is the one we’re looking for. (The string of hex digits after the executable name changes every time the sources change.) It’s possible that we could “cheat” here and just pick the first file we find starting with target/debug/deps/device-, as this is CI so we’re always building from clean — but this is a more robust way of determining the most recent binary.

(You might feel that this section is a bit of a cop-out, a bit white-box: knowing that there’s only one host-side binary does make the packaging a lot easier. If there were a lot of host-side binaries to package, and this technique started creaking at the seams, I’d look into cargo-nextest which has features specifically designed for packaging and unpackaging suites of Cargo tests.)

Once everything the system-tests job will need is stored in $ARCHIVE/binaries.tar, we can trigger the system-tests job — making sure to tell it, in $PARENT_RUN, which build in the archive it should be testing. (Initially I had the system-tests job use “latest”, but that’s wrong: it doesn’t handle multiple queued jobs correctly, and has a race condition even without queued jobs. The “latest” archive is that of the most recent successfully-finished job — but the build job hasn’t yet finished at the time it triggers the test job.)

The final prune-archives command is something I added after the initial Laminar writeup when some of the archive directories (particularly doc and coverage) started getting big: it just deletes all but the most recent N non-empty archives:

/var/lib/laminar/cfg/scripts/prune-archives
#!/bin/bash -xe

PROJECT=$1
KEEP=${2-2}

cd /var/lib/laminar/archive/$PROJECT
for d in `find * -maxdepth 0 -type d -a \! -empty | sort -n | head -n -$KEEP`; do
    rm -r /var/lib/laminar/archive/$PROJECT/$d
done

No-one likes deleting data, but in this case older archives should all be recoverable at any time, if the need arises, just by building the source again at that revision.

Now the cotton-system-tests-dev.run job needs to pass the $PARENT_RUN variable on to the underlying script:

/var/lib/laminar/cfg/jobs/cotton-system-tests-dev.run
#!/bin/bash -xe

exec do-system-tests cotton-embedded-dev $PARENT_RUN

and the do-system-tests script can use it to recover the tarball and scp it off to the test-runner:

/var/lib/laminar/cfg/scripts/do-system-tests
#!/bin/bash -xe

PARENT_JOB=$1
PARENT_RUN=$2

scp /var/lib/laminar/archive/$PARENT_JOB/$PARENT_RUN/binaries.tar laminar@$RUNNER:

ssh laminar@$RUNNER /bin/bash -xeE << "EOF"
  echo 1 > /sys/class/gpio/gpio13/value
  cleanup_function() {
    echo 0 > /sys/class/gpio/gpio13/value
    exit 1
  }
  trap 'cleanup_function' ERR
  export PS4='+ \t '
  export PATH=/home/laminar/.cargo/bin:$PATH
  rm -rf tests
  mkdir -p tests/systemtests
  ( cd tests
    tar xf ../binaries.tar
    cd systemtests
    export CARGO_MANIFEST_DIR=`pwd`
    ../target/debug/deps/device-* --list
    ../target/debug/deps/device-* --test
  )
  echo 0 > /sys/class/gpio/gpio13/value
EOF

The rest of the script has also gained in features and complexity. It now includes a trap handler to make sure that the testing-in-progress LED is extinguished even if the tests fail with an error. (See here for why this requires the -E flag to /bin/bash.)

The script goes on to add timestamps to the shell output (and thus to the logs) by adding \t to PS4, and add the Cargo bin directory to the path (because that’s where probe-rs got installed by Ansible).

The tests themselves need to be executed as if from the systemtests directory of a full checkout of Cotton — which we don’t have here on the test-runner — so the directories must be created manually. With all that in place, we can finally run the host-side test binary, which will run all the device-side tests including flashing the STM32 with the binaries, found via relative paths from $CARGO_MANIFEST_DIR.

That’s a lot, but it is all we need to successfully list and run all the device-side tests on our test-runner. Here’s (the best part of) the Laminar logs from a successful run:

+ 16:46:54 ../target/debug/deps/device-453652d9c9dda7c1 --list
stm32f746_nucleo::arm_stm32f746_nucleo_dhcp: test
stm32f746_nucleo::arm_stm32f746_nucleo_hello: test
stm32f746_nucleo::arm_stm32f746_nucleo_ssdp: test

3 tests, 0 benchmarks
+ 16:46:54 ../target/debug/deps/device-453652d9c9dda7c1 --test

running 3 tests
test stm32f746_nucleo::arm_stm32f746_nucleo_dhcp ... ok
test stm32f746_nucleo::arm_stm32f746_nucleo_hello has been running for over 60 seconds
test stm32f746_nucleo::arm_stm32f746_nucleo_ssdp has been running for over 60 seconds
test stm32f746_nucleo::arm_stm32f746_nucleo_ssdp ... ok
test stm32f746_nucleo::arm_stm32f746_nucleo_hello ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 66.37s

+ 16:48:00 echo 0

And now that everything’s working, we can add it to the chain of events that’s triggered whenever a branch is pushed to the CI server:

/var/lib/laminar/cfg/job/cotton-dev.run
#!/bin/bash -xe

BRANCH=${BRANCH-main}
do-checkout cotton $BRANCH
export LAMINAR_REASON="built $BRANCH at `cat git-revision`"
laminarc queue \
	 cotton-embedded-dev BRANCH=$BRANCH \
	 cotton-doc-dev BRANCH=$BRANCH \
         cotton-grcov-dev BRANCH=$BRANCH \
         cotton-msrv-dev BRANCH=$BRANCH \
         cotton-beta-dev BRANCH=$BRANCH \
         cotton-nightly-dev BRANCH=$BRANCH \
	 cotton-minver-dev BRANCH=$BRANCH

If you’re wondering about the -dev suffix on all those jobs: I set up Laminar with two parallel sets of identical build jobs. There’s the ones with -dev which are triggered when pushing feature branches, and the ones without the -dev which are triggered when pushing to main. This is arranged by the Git server post-receive hook:

git/cotton.git/hooks/post-receive
#!/bin/bash -ex

while read oldrev newrev ref
do
    if [ "${ref:0:11}" == "refs/heads/" -a "$newrev" != "0000000000000000000000000000000000000000" ];
    then
        export BRANCH=${ref:11}
        export LAMINAR_REASON="git push $BRANCH"
        if [ "$BRANCH" == "main" ];
        then
           laminarc queue cotton BRANCH=$BRANCH
        else
           laminarc queue cotton-dev BRANCH=$BRANCH
        fi
    fi
done

In a sense there’s no engineering need for this duplication: exactly the same actual work is undertaken in either case. But there’s a social need: feature branches (with -dev) aren’t expected to always pass all tests — indeed, such branches are often pushed for the express purpose of determining whether or not they pass all tests. But the main branch is expected to pass all tests, all the time, and a regression is to be taken seriously. That is: if the cotton-dev build or one of its downstreams fails, that’s got a very different implication from the cotton build or one of its downstreams failing. The cotton build itself should enjoy long, uninterrupted streaks of regression-free passes (and indeed it does; the last failure was in November 2023 due to this issue causing intermittent unit-test failures).

6. Go and sit down and think about what we’ve done

Well, what have we done? We have, over the course of three blog posts, taken a bit of knowledge of bash, cron, and SSH that we already had, then gone and learned a bit about Laminar, Ansible, and Cargo, and the “full stack” that we then engineered for ourselves using all that knowledge is this: any time we push a feature branch, we (eventually) get told automatically, as a straight yes/no answer, whether it’s OK for main or not. That’s an immensely powerful thing to have, an immensely useful property to have an oracle for.

Having that facility available is surely expected these days in other areas of software engineering — where test-harnesses are easier to create — but I hope I’ve demonstrated in these blog posts that even those working on embedded systems can also enjoy and benefit from the reliability (of main in particular) that’s due to this type of workflow.

(Other versions of the workflow could be constructed if preferred. Perhaps you don’t want every push to start a system-test sequence — in that case, you could either write a more complex Git post-receive hook, or set up an alternative Git remote, called perhaps “tests”, so that pushing to that remote kicked off the test sequence. Or you could tie the test sequence in to your pull-request process somehow.)

To repeat a line from the previous instalment, any software you need to test can be tested automatically in some way that is superior to not testing it at all. Compared to Rust’s unit tests, which are always just a cargo test away, it took us three quite detailed blog posts and a small pile of physical hardware to get us to the stage where this embedded code could be tested automatically. If I had to distill the message of these three posts from their ~11,000 words down to just six, they’d be: this sort of effort is worthwhile. If your product is, or runs on, a specific platform or piece of hardware, it’s worth spending a lot of effort arranging to test automatically on the actual target hardware, or as near to the actual target as is practical. (Sometimes embedded products are locked-down, fused-off, potted, or otherwise rendered inaccessible; testing, in that case, requires access to unlocked variants.)

That is to say: is the effort worth it for the cotton-ssdp crate — a few thousand lines of code, about 60% of which is already tests, and the rest of which has 100% test coverage? Arguably yes, but also arguably no, especially as almost all of cotton-ssdp can be tested in hosted builds. The cotton-ssdp crate has acted here more as a spike, a proof-of-concept. But the point is, the concept was proved, a baseline has now been set, and all the right testing infrastructure is in place if I want to write a power-aware RTOS, or implement HAL crates for some of these weird development boards in my desk drawer, or if I want to disrupt the way PAC crates are generated in order to improve the testing story of HAL crates. Now when I start doing those things, I can start defending the functionality with system-tests from the outset. If I want to do those more complex, more embedded-centric things — which I do — then all the effort expended so far will ultimately be very beneficial indeed. If you, too, aim to do complex or embedded-centric things, then similar levels of effort will benefit your projects.

6.1 War stories from the front lines of not-system-testing

I have some war stories for you. I heard tell of a company back in the day whose product, a hardware peripheral device, was (for sound commercial reasons) sold as working with numerous wonky proprietary Unixes. But for some of the more obscure platforms, there had been literally no automated testing: a release was declared by development, it was thrown over the wall into the QA department, and in due course a human tester would physically insert the CD-R into one of these wonky old machines and manually run through a test script ensuring that everything appeared to work. This was such a heavyweight process that it was run very rarely — meaning that, if an issue was found on, say, AIX, then the code change that caused it probably happened months ago and with significant newer work built on top of it. And of course such a discovery at “QA time” meant that the whole, lumbering manual release process had to be started again from scratch once the issue was fixed. This was exactly the pathology that CI was invented to fix! I mean, I’m pretty sure Laminar doesn’t support antediluvian AIX versions out of the box, but given the impact of any issues on release schedules, it was definitely worth their putting in quite significant development effort to bring the installation process under CI — automatically on main (at least nightly, if not on every push), and by request on any feature branch. (Developers need a way to have confidence they haven’t broken AIX before merging to main.) They should have asked themselves, “What can be done to automate testing of the install CD, in some way that is superior to not testing it at all?” — to place it under the control of a CI test-runner, as surely as the STM32F746-Nucleo is under the control of this Raspberry Pi? Well — what’s the simplest thing that can act as a fake CD-ROM drive well enough to fool an AIX box? Do those things have USB host? Can you bitbang SCSI-1 target on a Raspberry Pi? Would a BlueSCSI help? Or even if they were to decide that “CI-ing” the actual install CD image is too hard — how often are the failures specifically CD-related? Could they just copy the installer tarballs over SSH and test every other part of the process? How did this pathology last for more than one single release?

I also heard tell of a different company whose product was an embedded system, and was under automated test, including before merging to main — but following a recent “urgent” porting exercise (again for sound commercial reasons), many of the tests didn’t pass. The test harness they used supported marking tests as expected-failure — but no-one bothered doing so. So every test run “failed”, and developers had to manually pore over the entire list of test results to determine whether they’d regressed anything. In a sense the hard part of testing was automated, but the easy part not! This company had put in 99% of the effort towards automated testing, but were reaping just a tiny fraction of the benefits, because the very final step — looking at the test results and turning them into a yes/no for whether the code is OK for main — was not automated. How did this pathology last for more than one single afternoon?

6.2 People

The rhetorical-looking questions posed above about the AIX CI and the expected-fail tests (“How did these obviously wrong situations continue?”) did in fact have an answer in the real world. Indeed, it was the same answer in both cases: people.

In the first company, the head of QA presided over a large and (self-)important department — which it needed to be in order to have enough staff to manually perform quite so much automatable work. If QA was run with the attitude that human testers are exploratory testers, squirrelers-out of corner-cases, stern critics of developers’ assumptions — if the testers’ work-product, including during early-development-phase involvement, was more and better automated tests — then they probably wouldn’t need anything like so many testers, and the rôle of “Head of QA” would perhaps come to be seen as less of a big cheese than hitherto. Although the product quality would benefit, the company’s bottom-line would benefit, and even the remaining testers’ career-progression would benefit — the Head of QA’s incentives were misaligned with all of that, and they played the game they were given by the rules that they found to be in effect.

The second company seems harder to diagnose, but fundamentally the questions are, “Who is in charge of setting the quality bar for merges to main?” and “Who is in charge of what happens when that bar is not met?”. Those are likely to be two different people — they require very different skills — but if you find that a gulf is opening up between your team’s best practices and your team’s typical practices, then both those people are needed in order to bring the two closer together again. (In my own career I’ve a few times acted as the first one, but I’ve never acted as the second one: as Dr. Anthony Fauci famously didn’t say, “I don’t know how to explain to you that you should care about software quality.”)

This post, and this blog, and this blogger, cannot help you to deal with people. But often the talking point of the people you need to convince (or whose boss you need to convince to overrule them) is that better automated testing isn’t technologically feasible, isn’t worth attempting. I hope I’ve done a little to dispel that myth at least.

Monday, 18 March 2024

System-testing embedded code in Rust, part two: Things I learned testing SSDP

Previously on #rust:

     
With the basic system-test infrastructure now in place thanks to the previous post in this series, it’s time to wire the STM32F746-Nucleo development board up to Ethernet and start testing actual code: the cotton-ssdp crate.

Having said that, the first thing to do after plugging in Ethernet is in fact to write a test that verifies basic connectivity. If packets can’t flow between the Nucleo and the rest of the network for any reason, then there’s no point disparaging the SSDP code for its failure to communicate. The basic test will establish that simple networking is operational: that the Ethernet interface sees link (i.e., that the Ethernet cable is actually connected to the Nucleo, and also to something at the other end), and also that DHCP can succeed and the Nucleo obtain an IP address.

And of course writing that code isn’t throwaway effort: every other network-related test will need to do all those things first before getting on with more specific tasks. So the setup code will form part of all subsequent SSDP test binaries.

As always, I encountered problems along the way, because Rust. But all of those problems eventually had solutions, also because Rust.

The aim of the tests

It’s worth just going over what the goals are here. Why spend development time on writing these tests, and on cabling up these test rigs? What is the payback?

My answer is, that I’d like to be able to work on cotton-ssdp, and eventually other similar crates, knowing that I have automated testing that verifies new functionality and defends against regressions in existing functionality. When I push a new feature branch to the CI server, I want it to tell me, as straightforwardly and clearly as possible, whether or not my changes are OK for main.

I don’t think the testing constitutes a promise that the code is completely bug-free (even less, that the functionality it provides is objectively useful). But it does, at the very least, constitute a promise that certain types or certain show-stopping severities of bug are absent. Tom DeMarco, writing in Peopleware, describes Gilb’s Law: “Anything you need to quantify can be measured in some way that is superior to not measuring it at all”. Something similar applies here: any software you need to test can be tested automatically in some way that is superior to not testing it at all.

As a target, “superior to doing nothing” is not a very high bar to clear. These system-tests aren’t very comprehensive in terms of, say, line coverage of the cotton-ssdp crate. But then the crate, after all, is thoroughly unit-tested. These system-tests are more about testing the platform integration code — here, with the RTIC real-time operating system, and the smoltcp networking stack — and about systematically verifying the crate’s original goal of being useful to implementers of embedded systems.

Concretely, the tests presented here really just check that the device can discover resources on the network, and advertise its own resources. Once that two-way communication is proven to work, everything else about exactly what is communicated is already covered by the unit tests.

I’m not saying that the existence of a unit test automatically renders useless any system testing of the same function. Unit tests and system tests have different (human, organisational) readership — typically, unit tests are only interesting to developers, whereas system tests are often high-level enough and visible enough to serve as technology demonstrations to project managers and beyond — and both audiences are entitled to ask for and to see evidence of all claimed functionality. But in this case, the developers, project managers and beyond are all me, and anyway adding a huge variety of tests would clog up the narrative of this blog post, which focuses more on describing the framework.

Being a good citizen of Ethernet: MAC addresses

We’ll need to start by getting this Nucleo board onto the local Ethernet. In order to participate in SSDP, it’s going to need an IP address, as handed out by the DHCP server in my router. But in order to even participate in Ethernet enough to communicate with the DHCP server, it’s going to need its own Ethernet address. This is also called a hardware address or MAC address, it’s 48 bits (six bytes) long, every device on an Ethernet network has one, and it’s often printed on the back of routers and suchlike as twelve hex digits separated by colons. Some networking hardware comes with an officially-allocated MAC address built-in (as a company, you can get ranges of them allocated to you, like IP addresses) — but STM32s don’t, probably because ST Micro sell a lot of STM32s, many of them into designs (such as the Electric Imp) where they never even use their Ethernet circuitry, and it’d be a waste of a finite resource for ST Micro to allocate each one its own MAC address from the fixed pool.

For our purposes it’d be overkill to get an official address block allocated (though you’d need to if you intended to sell actual products), so it’s fortunate that an alternative way of obtaining an address is possible. One of those 48 bits is set to zero in every official (“Universally Administered”) address, but can be set to one to indicate a “Locally Administered” address, i.e. one chosen by the local network administrator. Which is also me! So in the tests, we set that bit, then pick a device-specific value for the other 47 bits (in fact 46 as there’s another reserved one), and use the result as our MAC address. So long as the 46 device-specific bits are chosen randomly enough, the chance of an accidental collision is suitably negligible.

So we need a calculation that always provides the same answer when performed on the same device, but always provides different answers when performed on different devices. Fortunately each STM32 does include a unique chip ID, burned into each individual die at chip-manufacture time, as described in the STM32F74x reference manual (“RM0385”) section 41.1.

But it’s not a good idea to just use the raw chip ID as the MAC address, for several reasons: it’s the wrong size, it’s quite predictable (it’s not 96 random bits per chip, it encodes the die position on the wafer, so two different STM32s might have IDs that differ only in one or two bits, meaning we can’t just pick any 46 bits from the 96 in case we accidentally pick seldom-changing ones) — and, worst of all, if anyone were to use the same ID for anything else later, they might be surprised if it were very closely correlated with the device’s MAC address.

So the thing to do is to hash the unique ID along with a key, or salt, which indicates what we’re using it for. You can see this on Github or right here:

use core::hash::Hasher; // brings write() and finish() into scope for SipHasher

pub fn stm32_unique_id() -> &'static [u32; 3] {
    // SAFETY: this address only valid when running on STM32
    unsafe {
        let ptr = 0x1ff0_f420 as *const [u32; 3];
        &*ptr
    }
}

pub fn unique_id(salt: &[u8]) -> u64 {
    let id = stm32_unique_id();
    let key1 = (u64::from(id[0]) << 32) + u64::from(id[1]);
    let key2 = u64::from(id[2]);
    let mut h = siphasher::sip::SipHasher::new_with_keys(key1, key2);
    h.write(salt);
    h.finish()
}

pub fn mac_address() -> [u8; 6] {
    let mut mac_address = [0u8; 6];
    let r = unique_id(b"stm32-eth-mac").to_ne_bytes();
    mac_address.copy_from_slice(&r[0..6]);
    mac_address[0] &= 0xFE; // clear multicast bit
    mac_address[0] |= 2; // set local bit
    mac_address
}

This is quite a general mechanism for producing device-specific but deterministic unique IDs; it’s also used by the SSDP test for generating the UUID for the example resource. And it can be made more sophisticated by hashing-in more information. Want a unique, but deterministic, Wifi MAC address that’s per-network to defeat tracking? Hash in the Wifi network name, or the router's MAC address. (It’s harder to avoid tracking on Ethernet, because you need the MAC address before you even start using DHCP, i.e. before you know anything about the network you’re joining. But of course passive tracking isn’t really a threat model on Ethernet, where you must have chosen to plug in the cable.) Want different UUIDs for different UPnP services? Hash in the service name. One thing you mustn’t do, though, is to use these IDs as cryptographic identifiers, as the hash function in question hasn’t been analysed for the collision-resistance and irreversibility properties which you need for such identifiers.
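
To make that concrete, here is a hedged sketch of reusing unique_id() with other salts; the salt strings below are invented for illustration, not taken from the cotton codebase.

// Hypothetical reuses of unique_id() with different salts; each value is
// stable per device but uncorrelated with the others.
fn example_ids() -> (u64, u64) {
    let ssdp_uuid_seed = unique_id(b"ssdp-uuid");            // seed for a UPnP/SSDP UUID
    let wifi_mac_seed = unique_id(b"wifi-mac:HomeNetwork");  // per-network Wifi MAC material
    (ssdp_uuid_seed, wifi_mac_seed)
}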

Spike then refactor: Adding smoltcp support

Getting the cotton-ssdp crate tested on embedded devices was always going to involve porting it to use smoltcp as an alternative to the standard-library socket implementation that it currently uses. I fondly imagined that the result would look like the idealised “triangle of initialisation”:

fn main() {
    let mut a = A::new();
    let mut b = B::new(&a);
    let mut c = C::new(&a, &b);
    let mut d = D::new(&a, &b, &c);
    ...
    do_the_thing(&a, &b, &c, &d, ...);
}

so, in this case, perhaps:

fn main() {
    let mut stm32 = Stm32::<STM32::F746, STM32::Power::Reliable3V3>::default();
    let mut smoltcp = Smoltcp::new(stm32.ethernet());
    run_dhcp_test(&mut stm32, &mut smoltcp);
}

I didn’t quite get to that level of neatness — there’s a lot of boilerplate in most embedded software, and anyway that pseudocode appears to allocate everything on the stack, where overruns are a runtime failure, as opposed to being in the data segment, where overrun would be (as an improvement) a link-time failure.

But the first version of the DHCP test was over 550 lines of boilerplate, so most of the commits on the pdh-stm32-ssdp branch are just about tidying away the common code to make the intent of the test more obvious.

Once DHCP was working, implementing a simple test for SSDP could commence. (And adding a second network-related test spurred the factoring-out of yet more common code between the two.) Rather than immediately wade in to changing cotton-ssdp itself, though, I was able to use the exposed networking traits which cotton-ssdp already contained, to start implementing smoltcp support directly in the test. This was a fortuitous case of “pre-adaptation”: the abstractions that let cotton-ssdp’s core be agnostic on the matter of mio versus tokio, turned out to be exactly those needed for also abstracting away smoltcp. (Well, okay, that wasn’t completely fortuitous, as I did have embedded systems in mind even when developing hosted cotton-ssdp.)
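
For a flavour of what such an abstraction can look like, here is a hedged sketch of the kind of backend trait meant; the actual traits in cotton-ssdp are named and shaped differently, but the idea is the same: the SSDP core asks only for multicast membership and the ability to send a datagram, and a mio, tokio, or smoltcp backend can each supply that. (The address types themselves turn out to be a wrinkle, as a later section describes.)

use std::net::{IpAddr, SocketAddr};

/// A sketch only, not cotton-ssdp's real API: roughly what the SSDP engine
/// needs from whichever network stack it happens to be running on.
pub trait UdpBackend {
    type Error;

    /// Join the SSDP multicast group on the given local interface.
    fn join_multicast(&mut self, group: IpAddr, interface: IpAddr)
        -> Result<(), Self::Error>;

    /// Send one UDP datagram to `to`, sourced from the interface owning `from`.
    fn send_from(&mut self, payload: &[u8], to: SocketAddr, from: IpAddr)
        -> Result<(), Self::Error>;
}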

Once the test was working, I could then move those trait implementations into cotton-ssdp, hiding them behind a new Cargo feature smoltcp in order not to introduce needless new dependencies for those using cotton-ssdp on hosted platforms.

I’m not sure I’m quite your (IPv4) type

If you look at the cotton-ssdp additions for smoltcp support, you’ll see that only a small part of it (lines 220-300) is the actual smoltcp API usage. A much larger part is taken up by conversions between types representing IP addresses.

The issue is that networking APIs are typically system-specific, and not present on many embedded systems, so the Rust people very sensibly and usefully left those functions out of the embedded, no_std configuration. But, a bit less usefully for our purposes, they also left out the types representing IPv4 and IPv6 addresses. These types are not platform-specific — they’re straight from the RFCs — but, as they weren’t available to no_std builds of smoltcp, the smoltcp people were forced to invent their own versions.

Subsequently a crate called “no-std-net” was released, containing IP address types (structurally) identical to the standard-library ones when built with no_std, and just renaming the standard-library ones when built hosted. The cotton-ssdp crate uses the no-std-net names.

Now in fairness the Rust people did then realise that this situation wasn’t ideal, and standardised types are set to land in Rust 1.77 — but that’s much newer than most people’s minimum-supported-Rust-version (MSRV), and anyway tons of smoltcp users in the field are still using the smoltcp versions. So conversions were necessary.

Rust has very well-defined idioms for type conversion, via implementing the From trait — so on the face of it, all that’s needed is to implement From for the standard types on the smoltcp types, and vice versa. That doesn’t work, though, because of Rust’s “Orphan Rule”, which allows trait implementations only in the crate defining the trait or the one defining the type being extended — and cotton doesn’t define From, std, or smoltcp. The best we can do, it seems, is to invent yet another set of “Generic” IP address types so that we can define the conversions both ways. This leads to lots of double-conversions; stm32f746-nucleo-ssdp-rtic has the likes of

no_std_net::IpAddr::V4(GenericIpv4Address::from(ip).into())

and

GenericSocketAddr::from(sender.endpoint).into()

but it keeps the core code looking sane.
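
For concreteness, here is a minimal sketch of that newtype trick, assuming the pre-0.12 smoltcp wire API; the real GenericIpv4Address in cotton-ssdp covers more cases than this.

// Because GenericIpv4Address is defined in this crate, the orphan rule
// permits From implementations in both directions, even though the types
// on the other side are defined elsewhere.
pub struct GenericIpv4Address(no_std_net::Ipv4Addr);

impl From<smoltcp::wire::Ipv4Address> for GenericIpv4Address {
    fn from(ip: smoltcp::wire::Ipv4Address) -> Self {
        let o = ip.as_bytes();
        Self(no_std_net::Ipv4Addr::new(o[0], o[1], o[2], o[3]))
    }
}

impl From<GenericIpv4Address> for smoltcp::wire::Ipv4Address {
    fn from(ip: GenericIpv4Address) -> Self {
        let o = ip.0.octets();
        smoltcp::wire::Ipv4Address::new(o[0], o[1], o[2], o[3])
    }
}

impl From<GenericIpv4Address> for no_std_net::Ipv4Addr {
    fn from(ip: GenericIpv4Address) -> Self {
        ip.0
    }
}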

Parts of the generic-types code are more complex than I’d hoped because the smoltcp stack itself can be built either with or without IPv6 (and IPv4 for that matter), using Cargo features. As it stands, cotton-ssdp only uses IPv4, so it depends on smoltcp with the IPv4 feature. But users of cotton-ssdp will of course be using it in some wider system, and, because of the way Rust’s crate feature resolution works, if any part of that system enables the IPv6 feature of smoltcp, then everyone gets a smoltcp with IPv6 enabled. But the enumerated IP address types inside smoltcp — such as wire::IpEndpoint — change if IPv6 is enabled! That means that cotton-ssdp has no way of knowing whether its usage of smoltcp will result in it getting handed the one-variant version of those enumerated types, or two-variant versions. That’s why the conversions accepting those smoltcp types must look like this:

        // smoltcp may or may not have been compiled with IPv6 support, and
        // we can't tell
        #[allow(unreachable_patterns)]
        match ip {
            wire::IpAddress::Ipv4(v4) => ...
            _ => ...
        }

in order to compile successfully however many variants ip has; if there’s only one variant, it needs the allow() in order to compile, but if there are two or more, it needs the “_ =>” arm.

“Don’t Panic” in large, friendly letters

The first version of the DHCP test worked much like the existing “Hello World” test, which was refactored to match: spawn probe-run as a child process, listen to (and run tests against) its trace output, and then shut it down (in a Drop handler) once all tests have passed.

This was a plausible first stab, and worked well every time that the tests passed, but turned out not to be sound if a test ever failed. (Edit: A previous version of this page blamed the way that a failing test panics, for not unwinding the stack, meaning Drop handlers don’t run. But that’s not actually the case — panics do unwind the stack and do run Drop handlers — so it’s now not clear why the following change was needed. But at least it’s still a valid way of doing it, even though it’s not required.)

I found the solution on the Eric Opines blog: run the test inside panic::catch_unwind(). That function takes a closure to run, and returns a Result indicating either successful exit or a (contained) panic. This required refactoring the DeviceTest struct to also use a closure, but actually the resulting tests look quite neat and well-defined — here’s “Hello World”:

#[test]
fn arm_stm32f746_nucleo_hello() {
    nucleo_test(
        "../cross/stm32f746-nucleo/target/thumbv7em-none-eabi/debug/stm32f746-nucleo-hello",
        |t| {
            t.expect("Hello STM32F746 Nucleo", Duration::from_secs(5));
        },
    );
}

The two parameters to nucleo_test() are of course the compiled binary to run — remembering that the build system ensures that these are up-to-date before starting the test — and a closure containing the body of the test. The closure gets passed the DeviceTest object, on which the available methods are expect(), which waits up to the given timeout for a message to appear on the device’s (virtualised, RTT) standard output (and if the timeout elapses without seeing it, fails the test), and expect_stderr() which does just the same for the device’s standard-error stream.
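
To show how those pieces fit together, here is a hedged sketch of a nucleo_test() built around panic::catch_unwind(); the DeviceTest internals and the start()/shutdown() names are assumptions for illustration, not the real cotton test-harness code.

use std::panic;
use std::time::Duration;

struct DeviceTest; // in reality: the probe-run child process plus its RTT output streams

impl DeviceTest {
    fn start(_binary: &str) -> Self { DeviceTest } // spawn probe-run, attach to its output (elided)
    fn shutdown(self) {}                           // kill and reap the child process (elided)
    fn expect(&self, _msg: &str, _timeout: Duration) { /* wait for the stdout line, or panic (elided) */ }
    fn expect_stderr(&self, _msg: &str, _timeout: Duration) { /* likewise for stderr (elided) */ }
}

fn nucleo_test<F>(binary: &str, body: F)
where
    F: FnOnce(&DeviceTest) + panic::UnwindSafe,
{
    let t = DeviceTest::start(binary);
    // Contain any panic from a failing expectation, so that shutdown always runs...
    let result = panic::catch_unwind(|| body(&t));
    t.shutdown();
    // ...then re-raise it, so that the test is still reported as a failure.
    if let Err(payload) = result {
        panic::resume_unwind(payload);
    }
}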

Because panic::catch_unwind() requires its closure to be “unwind-safe”, so does nucleo_test(); so far, this hasn’t been an issue in practice, so I haven’t looked deeply into what to do about it otherwise. The DHCP test, at least on the host side, is just as straightforward:

#[test]
fn arm_stm32f746_nucleo_dhcp() {
    nucleo_test(
        "../cross/stm32f746-nucleo/target/thumbv7em-none-eabi/debug/stm32f746-nucleo-dhcp-rtic",
        |t| {
            t.expect_stderr("(HOST) INFO  success!", Duration::from_secs(30));
            t.expect("DHCP config acquired!", Duration::from_secs(10));
        },
    );
}

The closure pattern was so appealing that I made the SSDP test use the same design — not least to ensure that it, too, was correctly shut down even following a failing test:

#[test]
fn arm_stm32f746_nucleo_ssdp() {
    nucleo_test(
        "../cross/stm32f746-nucleo/target/thumbv7em-none-eabi/debug/stm32f746-nucleo-ssdp-rtic",
        |nt| {
            nt.expect_stderr("(HOST) INFO  success!", Duration::from_secs(30));
            nt.expect("DHCP config acquired!", Duration::from_secs(10));
            ssdp_test(
                Some("cotton-test-server-stm32f746".to_string()),
                |st| {
                    nt.expect("SSDP! cotton-test-server-stm32f746",
                              Duration::from_secs(20));
                    st.expect_seen("stm32f746-nucleo-test",
                              Duration::from_secs(10));
                }
            );
        }
    );
}

The implementation of ssdp_test() itself is a little more involved, because it must spawn a temporary background thread to start and run the host’s SSDP engine which communicates with the one on the device. The two parameters are an optional SSDP notification-type to advertise to the device, and a closure to contain the body of the test. Here the available method on the SsdpTest object passed to the closure, is expect_seen(), which waits with a timeout for someone on the network (hopefully, the device under test) to advertise a specific notification-type. Here the nt.expect() line checks that the device has seen the host’s advertisement, and the st.expect_seen() line checks that the host has seen the one from the device.

Those two events can occur in either order in practice, but both DeviceTest and SsdpTest buffer up notifications, so that an expectation that has already come to pass before the expect call is made completes immediately. In the future it might be interesting to investigate using async/await to express the asynchronous nature of this test more explicitly.
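
As an illustration of that buffering, here is a hedged sketch of how an expect_seen() built on a channel can behave that way; the shapes and names are assumptions, not cotton’s actual implementation.

use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

struct SsdpTest {
    rx: mpsc::Receiver<String>, // notification-types seen on the network, in arrival order
}

impl SsdpTest {
    fn expect_seen(&self, notification_type: &str, timeout: Duration) {
        let deadline = Instant::now() + timeout;
        loop {
            let remaining = deadline.saturating_duration_since(Instant::now());
            match self.rx.recv_timeout(remaining) {
                // A match succeeds whether it arrived before or after this call:
                Ok(nt) if nt == notification_type => return,
                // Some other advertisement: keep waiting.
                Ok(_) => continue,
                Err(_) => panic!("never saw {notification_type} within the timeout"),
            }
        }
    }
}

fn ssdp_test<F: FnOnce(&SsdpTest)>(advertise: Option<String>, body: F) {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The real version runs the host-side SSDP engine here, advertising
        // `advertise` (if any) and forwarding every notification-type it
        // sees into `tx`; elided in this sketch.
        let _ = (advertise, tx);
    });
    body(&SsdpTest { rx });
}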

The SSDP test only works if the Nucleo board and the test running on the host, can exchange packets. Typically this means that they must be on the same Ethernet network — or, if the host is on Wifi, that the Wifi network must be bridged to the Ethernet (e.g., by cabling the Nucleo to one of the Ethernet LAN sockets on the Wifi router).

Putting it all together

As of the merge of the pdh-stm32-ssdp branch, the following command runs the system-tests on an attached STM32F746-Nucleo:

cargo test -F arm,stm32f746-nucleo

And for those without a Nucleo, the following commands still work, building all the device code but testing only the host code:

cargo build -F arm,stm32f746-nucleo
cargo test -F arm

And for those without even the cross-compiler installed, which is probably most people, the following commands still work, building and testing only the host code:

cargo build
cargo test
cargo build-all-features --all-targets
cargo test-all-features --all-targets

This all makes it easy for a developer to determine, before pushing to the central git server, whether their branch is likely to be OK for main — but the most definitive answer to that question is only available when going to the trouble of using a local Nucleo development board. If you’re working on something you think probably won’t affect embedded builds, you really don’t want to faff about with development boards (particularly multiple ones): what you want is for continuous integration to perform all of these system-tests as part of its mission to answer the question of whether your branch is OK for main. Adding a CI runner that can run these tests automatically on every push is the topic of the third post in this series.
