Whitebait, Kleftiko, Ekmek Special

Not in fact any relation to the famous large Greek meal of the same name.

Tuesday 2 April 2024

Solved: KDE Sleep immediately reawakens (Nvidia vs systemd)

Previously on #homelab:

I’ve had this PC, a self-assembled Linux (Kubuntu 22.04) desktop with Intel DX79SR motherboard and Core i7-3930K CPU, since late 2012. And today I finally got sleep working. (Though in fairness it’s not like I spent the entire intervening 11½ years investigating the problems.)

Problem One was that the USB3 controllers never came back after a resume (the USB2 sockets continued to work). And not just resume-after-suspend: USB3 stopped working if I power-cycled using the front panel button instead of the physical switch on the back of the PSU. This turns out to be a silicon (or at least firmware) bug in the suspend modes of the NEC/Renesas uPD720200 and/or uPD720201 controllers on the motherboard; newer firmware is probably available but, last I checked, could only be applied under Windows (not even DOS). The workaround is to edit /etc/default/grub and add usbcore.autosuspend=-1 to GRUB_CMDLINE_LINUX_DEFAULT.

Fixing that got power-cycling using the front-panel power button working, but exposed Problem Two — choosing “Sleep” in KDE’s shutdown menu (or on the icon bar), or pressing the dedicated sleep (“⏾”) button on the keyboard (Dell KB522), blanked the screen momentarily but then the machine immediately woke up again to the login prompt. But I discovered that executing pm-suspend worked (the system powered down, the power LED started blinking, and any keyboard key woke it up again), as did executing /lib/systemd/systemd-sleep suspend.

So something must have been getting systemd in a pickle between its deciding to suspend and its actually triggering the kernel-level suspend (which it does in /usr/lib/systemd/system/systemd-suspend.service). Eventually I found advice to check journalctl | grep logind and it showed this:

Apr 01 10:06:23 amd64 systemd-logind[806]:\
      Error during inhibitor-delayed operation (already returned success to client): Unit nvidia-suspend.service is masked.
Apr 01 10:14:28 amd64 systemd-logind[806]: Suspend key pressed.
Apr 01 10:14:28 amd64 systemd-logind[806]:\
      Error during inhibitor-delayed operation (already returned success to client): Unit nvidia-resume.service is masked.

This PC has an Nvidia graphics card (a GeForce 1050Ti / GP107), but it uses the Nouveau driver, not the proprietary Nvidia one to which those suspend and resume services belong. And that turned out to be the issue: with those services masked (because the driver isn’t installed), dangling symlinks to their service files were still present as /etc/systemd/system/systemd-suspend.service.requires/nvidia-{suspend,resume}.service. Removing those dangling symlinks made both the Sleep menu option and the keyboard button work.

It’s possible that I once had the Nvidia proprietary driver installed (dpkg -S isn’t prepared to own up to who put those Nvidia symlinks there) but that, because the symlinks count as “configuration files”, removing the driver didn’t remove them.

If you had to name the two most controversial parts of a modern-day Linux install, I think you’d probably come up with (1) systemd and (2) the proprietary Nvidia drivers. I’m not usually a follower of internet hate mobs, but I do have to say: it turned out that the issue was an interaction between systemd and the proprietary Nvidia drivers, which weren’t even installed.

SEO bait: Kubuntu 22.04, Ubuntu 22.04, won't sleep, suspend fails, wakes up, systemd-suspend, pm-suspend, solved.

System-testing embedded code in Rust, part three: A CI test-runner

Previously on #rust:

Thanks to earlier parts of this series, developers of Cotton now have the ability to run automated system-tests of the embedded builds using their own computer as the test host — if they have the right STM32F746-Nucleo development board to hand. What we need to do now is add the ability for continuous integration to run those tests automatically on request (e.g. whenever a feature branch is pushed to the central git server). For at least the third time on this blog, we’re going to employ a Raspberry Pi 3; the collection of those sitting unused in my desk drawer somehow never seems to get any smaller. (And the supply shortages of recent years seem to have abated.)

First, set up Ubuntu 22.04 with USB boot and headless full-disk encryption in the usual way.

OK, perhaps doing so isn’t all that everyday (that remark was a slight dig at Larousse Gastronomique, at least one recipe in which starts, “First, make a brioche in the usual way”). But this very blog has exactly the instructions you need. This time, instead of a 512GB Samsung USB SSD, I used a 32GB Sandisk USB flash drive — the test-runner won’t be needing tons of storage.

Alternatively, you could use any other reasonable computer you’ve got lying around — an Intel NUC, say, or a cast-off laptop. Most or all of the following steps will apply to any freshly-installed Ubuntu box, and many of them to other Linux distributions too. But you’ll have the easiest time of it if your CI test-runner has the same architecture and OS as your main CI server, which is another reason I’m pressing yet another Raspberry Pi into service here.

However you get there, what you need to proceed with the rest of this post (at least the way I proceeded) is:

  • An existing Laminar CI server, on your local Ethernet.
  • A Raspberry Pi to become the test-runner;
  • with Ubuntu 22.04 freshly installed;
  • plugged into your local Ethernet, able to DHCP (I gave mine a fixed address in OpenWRT’s DHCP server setup) — or, at least, somewhere where the CI server can SSH to it;
  • with your own user, which you can SSH in as and which can use sudo.
  • A USB-to-Ethernet adaptor (I used the Amazon Basics one);
  • an Ethernet switch, at least 100Mbit (I used a TP-Link LS1008);
  • and an STM32F746-Nucleo development board.

In this blog post we’re going to:

  1. Connect everything together, making a separate test network
  2. Automate the remaining setup of the test-runner, using Ansible
  3. Arrange that the CI server can SSH to the test-runner autonomously
  4. Run a trivial CI job on the test-runner
  5. Package up the system-tests and run a real CI job on the test-runner
  6. Go and sit down and think about what we’ve done

1. Connect everything together, making a separate test network

For once, instead of being a whimsical stock photo of something tangentially related, the image at the top of this blog post is an actual photograph of the actual physical items being discussed herein (click to enlarge if need be). The only connections to the outside world are from my home network to the Raspberry Pi’s built-in Ethernet (lower left) and power to the Raspberry Pi and to the Ethernet switch (top left and centre bottom). The test network is otherwise self-contained: the STM32 development board is on a private Ethernet segment with just the Raspberry Pi’s USB Ethernet for company. This network has its own RFC1918 address range, 192.168.3.x, distinct from the rest of the home network. (The Wifi interface on the Raspberry Pi is not currently being used.)

The breadboard is attached to the Raspberry Pi’s GPIO connector, and at present is only used to provide a “testing in progress” LED attached to GPIO13 (glowing white in the photo, but therefore hard to see). GPIO usage could become more sophisticated in the future: for instance, if I was writing a HAL crate for a new embedded device, I could connect the Raspberry Pi’s GPIO inputs to the GPIO outputs of the embedded device (and vice-versa) and system-test my GPIO code.

The Raspberry Pi programs the STM32 over USB; that, the Ethernet adaptor, and the USB flash drive between them take up three of the four USB sockets, leaving just one spare for future enhancements. (But hubs are a thing, of course.)

2. Automate the remaining setup of the test-runner, using Ansible

The initial setup of a new Ubuntu installation is arguably best done manually, as you might need to react to things that the output of the commands is telling you. But once the basics are in place, the complexities of setting up everything else that the test-runner needs are best automated — so they can be repeatable, documented, and explainable to others (in this case: you).

Automating server setup is nothing new to cloud engineers, who often use tools such as Chef or Puppet to bring one or more new hosts or containers up-to-speed in a single command. Electric Imp used Chef, which I never really got on with, partly because of the twee yet mixed metaphors (“Let’s knife this cookbook — solo!”, which is a thing all chefs say), but mostly because it was inherently bound up with Ruby. Yet I felt I needed a bit more structure than just “copy over a shell script and run it”. So for #homelab purposes, I thought I’d try Ansible.

Ansible is configured using YAML files, which at least is one step up from Ruby as a configuration language. The main configuration file is called a “playbook”, which contains “plays” (think sports, not theatre), which are in turn made up of individual “tasks”. A task can be as simple as executing a single shell command, but the benefit of Ansible is that it comes with a huge variety of add-ons which allow tasks to be written in a more expressive way. For instance, instead of faffing about to automate cron, there’s a “cron” add-on which even knows about the @reboot directive and lets you write:

    - name: init gpios on startup
      cron:
        name: init-gpios
        special_time: reboot
        job: /usr/local/bin/init-gpios

Tasks can be linked to the outcome of earlier tasks, so that, for instance, a playbook can restart the DHCP server if, and only if, the DHCP configuration has been changed. With most types of task, the Ansible configuration is “declarative”: it describes what the situation ought to be, and Ansible checks whether that’s already the case, and changes things only where they’re not as desired.
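
For example, here’s roughly what that looks like: a hypothetical sketch, not one of the actual Cotton playbook tasks (the template file name and the choice of isc-dhcp-server are assumptions). A task that rewrites the DHCP configuration notifies a handler, and the handler only runs if that task actually changed something.

    # In the play’s "tasks:" section:
    - name: configure dhcp server for eth1
      template:
        src: dhcpd.conf.j2          # illustrative file name
        dest: /etc/dhcp/dhcpd.conf
      notify: restart dhcp

    # In the play’s "handlers:" section:
    - name: restart dhcp
      service:
        name: isc-dhcp-server
        state: restarted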

Ansible can be used for considerably more complex setups than the one in this blog post — many, many servers of different types that all need different things doing to them — but I’ve made an effort at least to split up the playbook into plays relevant to basically any machine (hosts: all), or ones relevant just to the Raspberry Pi (hosts: raspberrypis), or ones specific to the rôle of being a system-test runner (hosts: testrunners).

I ended up with 40 or so tasks in the one playbook, which between them install all the needed system packages as root, then install Rust, Cargo and probe-rs as the laminar user, then set up the USB Ethernet adaptor as eth1 and run a DHCP server on it.

Declaring which actual machines make up “all”, “testrunners”, etc., is the job of a separate “inventory” file; the one included in the repository matches my setup at home. The inventory file is also the place to specify per-host information that shouldn’t be hard-coded into the tasks: in this case, the DHCP setup tasks need to know the MAC address of the USB Ethernet adaptor, but that’s host-specific, so it goes in the inventory file.
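
As a sketch of what that looks like, here’s a minimal inventory in Ansible’s INI format; the group names match the playbook, but the host variable name and the MAC address are illustrative, not the real contents of the repository’s inventory file:

    [raspberrypis]
    scotch

    [testrunners]
    scotch usb_ether_mac=aa:bb:cc:dd:ee:ff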

All the tasks in the main YAML file are commented, so I won’t re-explain them here, except to say that the “mark wifi optional”, “rename eth1”, and “set static IP for eth1” tasks do nothing to dispel my suspicion that Linux networking is nowadays just a huge seven-layer dip of xkcd-927 all the way down, with KDE and probably Gnome only adding their own frothy outpourings on top.

I added a simple Makefile to the systemtests/ansible directory, just to automate running the one playbook with the one inventory.

The name “Ansible” comes originally from science-fiction, where it’s used to mean a device that can communicate across deep space without experiencing speed-of-light latency. I only mention this because when communicating across about a metre of my desk, it’s bewilderingly slow — taking seconds to update each line of sshd_config. That’s about the same as speed-of-light latency would be if the test-runner was on the Moon.

But having said all that, there’s still value in just being able to re-run Ansible and know that everything is set consistently and repeatably.

I did wonder about running Ansible itself under CI — after all, it’s software, it needs to be correct when infrequently called upon, so it ought therefore to be automatically tested to avoid bugs creeping in. But running Ansible needs root (or sudo) access on the test-runner, which in turn means it needs SSH access to the test-runner as a user which is capable of sudo — and I don’t want to leave either of those capabilities lying around unencrypted-at-rest in CI. So for the time being it’s down to an unreliable human agent — that’s me — to periodically run make in the ansible directory.

3. Arrange that the CI server can SSH to the test-runner autonomously

Most of the hosts in this series of #homelab posts are set up with, effectively, zero-trust networking: they can only be accessed via SSH, and all SSH sessions start with me personally logged-in somewhere and running ssh-agent (and using agent-forwarding). But because the CI server needs to be able to start system-test jobs on the test-runner, it needs to be able to login via SSH completely autonomously.

This isn’t as alarming as it sounds, as the user it logs into (the laminar user on the test-runner) isn’t very privileged; in particular, it’s not in the sudo group and thus can’t use sudo at all. (The Ansible setup explicitly grants that user permissions to the hardware it needs to access.)

Setting this up is a bit like setting up your own SSH for the first time. First generate a key-pair:

ssh-keygen -t ed25519

— when prompted, give the empty passphrase, and choose “laminar-ssh” as the output file. The public key will be written to “laminar-ssh.pub”.

The public key needs to be added to ~laminar/.ssh/authorized_keys (American spelling!) on the test-runner; the Ansible setup already does this for my own CI server’s public key.

Once authorized_keys is in place, you can test the setup using:

ssh -i laminar-ssh laminar@scotch

Once you’re happy that it works, copy the file laminar-ssh as ~laminar/.ssh/id_ed25519 on the main CI server (not the test-runner!):

sudo cp ~peter/laminar-ssh ~laminar/.ssh/id_ed25519
sudo chown -R laminar.laminar ~laminar/.ssh
sudo chmod 0700 ~laminar/.ssh

You can test that setup by using this command on the CI server (there should be no password or pass-phrase prompt):

sudo -u laminar ssh scotch

— indeed, you need to do this at least once, in order to reassure the CI server’s SSH client that you trust the test-runner’s host key.

4. Run a trivial CI job on the test-runner

Now that the laminar user on the CI server can SSH freely to scotch, what remains is mostly Laminar setup. This part is adapted very closely from Laminar’s own documentation: we set up a “context” for the test-runner, separate from the default context used by all the other jobs (because the test-runner can carry on when other CPU-intensive jobs are running on the CI server), then add a remote job that executes in that context.

/var/lib/laminar/cfg/contexts/test-runner-scotch.conf
EXECUTORS=1
/var/lib/laminar/cfg/contexts/test-runner-scotch.env
RUNNER=scotch

The context is partly named after the test-runner host, but it also includes the name of the test-runner host as an environment variable. This means that the job-running scripts don’t need to hard-code that name.

As before, the actual job file in the jobs directory defers all the complexity to a generic do-system-tests script in the scripts directory:

/var/lib/laminar/cfg/jobs/cotton-system-tests-dev.run
#!/bin/bash -xe
exec do-system-tests

In keeping with the Laminar philosophy of not poorly reinventing things that already exist, Laminar itself has no built-in support for running jobs remotely — because that’s what SSH is for. This, too, is closely-inspired by the Laminar documentation:

/var/lib/laminar/cfg/scripts/do-system-tests
#!/bin/bash -xe

ssh laminar@$RUNNER /bin/bash -xe << "EOF"
  echo 1 > /sys/class/gpio/gpio13/value
  uname -a
  run-parts /etc/update-motd.d
  sleep 20
  echo 0 > /sys/class/gpio/gpio13/value
EOF

This, of course, doesn’t run any actual tests, but it provides validation that the remote-job mechanism is working. The LED attached to GPIO13 on the test-runner Raspberry Pi serves as the “testing in progress” indicator. (And the run-parts invocation is an Ubuntu-ism: it reproduces the “message of the day”, the welcome message that’s printed when you log in. Most of it is adverts these days.)

Tying the job to the context is the $JOB.conf file:

/var/lib/laminar/cfg/jobs/cotton-system-tests-dev.conf
CONTEXTS=test-runner-*

Due to the judicious use of a wildcard, the job can run on any test-runner; my setup only has the one, but if you found yourself in a team with heavy contention for the system-test hardware, this setup would allow you to build a second, identical Raspberry Pi with all the same hardware attached — called shepherds, maybe — and add it as a separate Laminar context. Because Laminar runs a job as soon as any context it’s fit for becomes available, this would automatically split queued system-test jobs across the two test-runners.
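
Concretely, adding that hypothetical second test-runner would just mean giving it its own pair of context files on the CI server, mirroring the ones above:

/var/lib/laminar/cfg/contexts/test-runner-shepherds.conf
EXECUTORS=1
/var/lib/laminar/cfg/contexts/test-runner-shepherds.env
RUNNER=shepherds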

With all of this in place, it’s time to trigger the job on the CI server:

laminarc queue cotton-system-tests-dev

After a few teething troubles, including the thing I mentioned above about making sure that the SSH client accepts scotch’s host key, I was pleased to see the “testing in progress” LED come on and the message-of-the-day spool out in the Laminar logs.

5. Package up the system-tests and run a real CI job on the test-runner

We didn’t come here just to read some Ubuntu adverts in the message-of-the-day. Now we need to do the real work of building Cotton for the STM32 target, packaging-up the results, and transferring them to the test-runner where they can be run on the target hardware. First we build:

/var/lib/laminar/cfg/jobs/cotton-embedded-dev.run
#!/bin/bash -xe

PROJECT=cotton
RUST=stable
BRANCH=${BRANCH-main}
SOURCE=/var/lib/laminar/run/$PROJECT/workspace

(
    flock 200
    cd $SOURCE/$PROJECT
    git checkout $BRANCH
    cd -
    cp -al $SOURCE/$PROJECT $PROJECT
) 200>$SOURCE/lock

source $HOME/.cargo/env
rustup default $RUST

cd $PROJECT
cargo build -p systemtests -F arm,stm32f746-nucleo --all-targets
cargo test --no-run -p systemtests -F arm,stm32f746-nucleo 2> $ARCHIVE/binaries.txt
grep "Executable tests/" $ARCHIVE/binaries.txt  | cut -d'(' -f 2 | cut -d')' -f 1 > binaries.2.txt

tar cf $ARCHIVE/binaries.tar `find cross/*/target -type f -a -executable \
        | grep -v /deps/ | grep -v /build/` `cat binaries.2.txt`
laminarc queue cotton-system-tests-dev PARENT_RUN=$RUN
exec prune-archives cotton-embedded-dev 10

The actual build commands look much like many of the other Laminar jobs but with the extra Cargo features added which enable the cross-compiled targets; the interesting parts of this script come once the cargo build is done and the results must be tarred-up ready to be sent to the test-runner.

Finding all the target binaries is fairly easy using find cross/*/target, but we also need to find the host binary from the systemtests package. The easiest way to do that is to parse the output of cargo test --no-run, which includes lines such as:

   Compiling systemtests v0.0.1 (/home/peter/src/cotton/systemtests)
    Finished test [unoptimized + debuginfo] target(s) in 2.03s
  Executable unittests src/lib.rs (target/debug/deps/systemtests-4a9b67de54149231)
  Executable tests/device/main.rs (target/debug/deps/device-cfdcb3ff3e5eaaa5)

The line with “Executable tests” is the one we’re looking for. (The string of hex digits after the executable name changes every time the sources change.) It’s possible that we could “cheat” here and just pick the first file we find starting with target/debug/deps/device-, as this is CI so we’re always building from clean — but this is a more robust way of determining the most recent binary.

(You might feel that this section is a bit of a cop-out, a bit white-box: knowing that there’s only one host-side binary does make the packaging a lot easier. If there were a lot of host-side binaries to package, and this technique started creaking at the seams, I’d look into cargo-nextest which has features specifically designed for packaging and unpackaging suites of Cargo tests.)

Once everything the system-tests job will need is stored in $ARCHIVE/binaries.tar, we can trigger the system-tests job — making sure to tell it, in $PARENT_RUN, which build in the archive it should be testing. (Initially I had the system-tests job use “latest”, but that’s wrong: it doesn’t handle multiple queued jobs correctly, and has a race condition even without queued jobs. The “latest” archive is that of the most recent successfully-finished job — but the build job hasn’t yet finished at the time it triggers the test job.)

The final prune-archives command is something I added after the initial Laminar writeup when some of the archive directories (particularly doc and coverage) started getting big: it just deletes all but the most recent N non-empty archives:

/var/lib/laminar/cfg/scripts/prune-archives
#!/bin/bash -xe

PROJECT=$1
KEEP=${2-2}

cd /var/lib/laminar/archive/$PROJECT
for d in `find * -maxdepth 0 -type d -a \! -empty | sort -n | head -n -$KEEP`; do
    rm -r /var/lib/laminar/archive/$PROJECT/$d
done

No-one likes deleting data, but in this case older archives should all be recoverable at any time, if the need arises, just by building the source again at that revision.

Now the cotton-system-tests-dev.run job needs to pass the $PARENT_RUN variable on to the underlying script:

/var/lib/laminar/cfg/jobs/cotton-system-tests-dev.run
#!/bin/bash -xe

exec do-system-tests cotton-embedded-dev $PARENT_RUN

and the do-system-tests script can use it to recover the tarball and scp it off to the test-runner:

/var/lib/laminar/cfg/scripts/do-system-tests
#!/bin/bash -xe

PARENT_JOB=$1
PARENT_RUN=$2

scp /var/lib/laminar/archive/$PARENT_JOB/$PARENT_RUN/binaries.tar laminar@$RUNNER:

ssh laminar@$RUNNER /bin/bash -xeE << "EOF"
  echo 1 > /sys/class/gpio/gpio13/value
  cleanup_function() {
    echo 0 > /sys/class/gpio/gpio13/value
    exit 1
  }
  trap 'cleanup_function' ERR
  export PS4='+ \t '
  export PATH=/home/laminar/.cargo/bin:$PATH
  rm -rf tests
  mkdir -p tests/systemtests
  ( cd tests
    tar xf ../binaries.tar
    cd systemtests
    export CARGO_MANIFEST_DIR=`pwd`
    ../target/debug/deps/device-* --list
    ../target/debug/deps/device-* --test
  )
  echo 0 > /sys/class/gpio/gpio13/value
EOF

The rest of the script has also gained in features and complexity. It now includes a trap handler to make sure that the testing-in-progress LED is extinguished even if the tests fail with an error. (See here for why this requires the -E flag to /bin/bash.)

The script goes on to add timestamps to the shell output (and thus to the logs) by adding \t to PS4, and add the Cargo bin directory to the path (because that’s where probe-rs got installed by Ansible).

The tests themselves need to be executed as if from the systemtests directory of a full checkout of Cotton — which we don’t have here on the test-runner — so the directories must be created manually. With all that in place, we can finally run the host-side test binary, which will run all the device-side tests including flashing the STM32 with the binaries, found via relative paths from $CARGO_MANIFEST_DIR.

That’s a lot, but it is all we need to successfully list and run all the device-side tests on our test-runner. Here’s (the best part of) the Laminar logs from a successful run:

+ 16:46:54 ../target/debug/deps/device-453652d9c9dda7c1 --list
stm32f746_nucleo::arm_stm32f746_nucleo_dhcp: test
stm32f746_nucleo::arm_stm32f746_nucleo_hello: test
stm32f746_nucleo::arm_stm32f746_nucleo_ssdp: test

3 tests, 0 benchmarks
+ 16:46:54 ../target/debug/deps/device-453652d9c9dda7c1 --test

running 3 tests
test stm32f746_nucleo::arm_stm32f746_nucleo_dhcp ... ok
test stm32f746_nucleo::arm_stm32f746_nucleo_hello has been running for over 60 seconds
test stm32f746_nucleo::arm_stm32f746_nucleo_ssdp has been running for over 60 seconds
test stm32f746_nucleo::arm_stm32f746_nucleo_ssdp ... ok
test stm32f746_nucleo::arm_stm32f746_nucleo_hello ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 66.37s

+ 16:48:00 echo 0

And now that everything’s working, we can add it to the chain of events that’s triggered whenever a branch is pushed to the CI server:

/var/lib/laminar/cfg/job/cotton-dev.run
#!/bin/bash -xe

BRANCH=${BRANCH-main}
do-checkout cotton $BRANCH
export LAMINAR_REASON="built $BRANCH at `cat git-revision`"
laminarc queue \
	 cotton-embedded-dev BRANCH=$BRANCH \
	 cotton-doc-dev BRANCH=$BRANCH \
         cotton-grcov-dev BRANCH=$BRANCH \
         cotton-msrv-dev BRANCH=$BRANCH \
         cotton-beta-dev BRANCH=$BRANCH \
         cotton-nightly-dev BRANCH=$BRANCH \
	 cotton-minver-dev BRANCH=$BRANCH

If you’re wondering about the -dev suffix on all those jobs: I set up Laminar with two parallel sets of identical build jobs. There’s the ones with -dev which are triggered when pushing feature branches, and the ones without the -dev which are triggered when pushing to main. This is arranged by the Git server post-receive hook:

git/cotton.git/hooks/post-receive
#!/bin/bash -ex

while read oldrev newrev ref
do
    if [ "${ref:0:11}" == "refs/heads/" -a "$newrev" != "0000000000000000000000000000000000000000" ];
    then
        export BRANCH=${ref:11}
        export LAMINAR_REASON="git push $BRANCH"
        if [ "$BRANCH" == "main" ];
        then
           laminarc queue cotton BRANCH=$BRANCH
        else
           laminarc queue cotton-dev BRANCH=$BRANCH
        fi
    fi
done

In a sense there’s no engineering need for this duplication: exactly the same actual work is undertaken in either case. But there’s a social need: feature branches (with -dev) aren’t expected to always pass all tests — indeed, such branches are often pushed for the express purpose of determining whether or not they pass all tests. But the main branch is expected to pass all tests, all the time, and a regression is to be taken seriously. That is: if the cotton-dev build or one of its downstreams fails, that’s got a very different implication from the cotton build or one of its downstreams failing. The cotton build itself should enjoy long, uninterrupted streaks of regression-free passes (and indeed it does; the last failure was in November 2023 due to this issue causing intermittent unit-test failures).

6. Go and sit down and think about what we’ve done

Well, what have we done? We have, over the course of three blog posts, taken a bit of knowledge of bash, cron, and SSH that we already had, then gone and learned a bit about Laminar, Ansible, and Cargo, and the “full stack” that we then engineered for ourselves using all that knowledge is this: any time we push a feature branch, we (eventually) get told automatically, as a straight yes/no answer, whether it’s OK for main or not. That’s an immensely powerful thing to have, an immensely useful property to have an oracle for.

Having that facility available is surely expected these days in other areas of software engineering — where test-harnesses are easier to create — but I hope I’ve demonstrated in these blog posts that even those working on embedded systems can also enjoy and benefit from the reliability (of main in particular) that’s due to this type of workflow.

(Other versions of the workflow could be constructed if preferred. Perhaps you don’t want every push to start a system-test sequence — in that case, you could either write a more complex Git post-receive hook, or set up an alternative Git remote, called perhaps “tests”, so that pushing to that remote kicked off the test sequence. Or you could tie the test sequence in to your pull-request process somehow.)

To repeat a line from the previous instalment, any software you need to test can be tested automatically in some way that is superior to not testing it at all. Compared to Rust’s unit tests, which are always just a cargo test away, it took us three quite detailed blog posts and a small pile of physical hardware to get us to the stage where this embedded code could be tested automatically. If I had to distill the message of these three posts from their ~11,000 words down to just six, they’d be: this sort of effort is worthwhile. If your product is, or runs on, a specific platform or piece of hardware, it’s worth spending a lot of effort arranging to test automatically on the actual target hardware, or as near to the actual target as is practical. (Sometimes embedded products are locked-down, fused-off, potted, or otherwise rendered inaccessible; testing, in that case, requires access to unlocked variants.)

That is to say: is the effort worth it for the cotton-ssdp crate — a few thousand lines of code, about 60% of which is already tests, and the rest of which has 100% test coverage? Arguably yes, but also arguably no, especially as almost all of cotton-ssdp can be tested in hosted builds. The cotton-ssdp crate has acted here more as a spike, a proof-of-concept. But the point is, the concept was proved, a baseline has now been set, and all the right testing infrastructure is in place if I want to write a power-aware RTOS, or implement HAL crates for some of these weird development boards in my desk drawer, or if I want to disrupt the way PAC crates are generated in order to improve the testing story of HAL crates. Now when I start doing those things, I can start defending the functionality with system-tests from the outset. If I want to do those more complex, more embedded-centric things — which I do — then all the effort expended so far will ultimately be very beneficial indeed. If you, too, aim to do complex or embedded-centric things, then similar levels of effort will benefit your projects.

6.1 War stories from the front lines of not-system-testing

I have some war stories for you. I heard tell of a company back in the day whose product, a hardware peripheral device, was (for sound commercial reasons) sold as working with numerous wonky proprietary Unixes. But for some of the more obscure platforms, there had been literally no automated testing: a release was declared by development, it was thrown over the wall into the QA department, and in due course a human tester would physically insert the CD-R into one of these wonky old machines and manually run through a test script ensuring that everything appeared to work. This was such a heavyweight process that it was run very rarely — meaning that, if an issue was found on, say, AIX, then the code change that caused it probably happened months ago and with significant newer work built on top of it. And of course such a discovery at “QA time” meant that the whole, lumbering manual release process had to be started again from scratch once the issue was fixed. This was exactly the pathology that CI was invented to fix! I mean, I’m pretty sure Laminar doesn’t support antediluvian AIX versions out of the box, but given the impact of any issues on release schedules, it was definitely worth their putting in quite significant development effort to bring the installation process under CI — automatically on main (at least nightly, if not on every push), and by request on any feature branch. (Developers need a way to have confidence they haven’t broken AIX before merging to main.) They should have asked themselves, “What can be done to automate testing of the install CD, in some way that is superior to not testing it at all?” — to place it under the control of a CI test-runner, as surely as the STM32F746-Nucleo is under the control of this Raspberry Pi? Well — what’s the simplest thing that can act as a fake CD-ROM drive well enough to fool an AIX box? Do those things have USB host? Can you bitbang SCSI-1 target on a Raspberry Pi? Would a BlueSCSI help? Or even if they were to decide that “CI-ing” the actual install CD image is too hard — how often are the failures specifically CD-related? Could they just SSH on the installer tarballs and test every other part of the process? How did this pathology last for more than one single release?

I also heard tell of a different company whose product was an embedded system, and was under automated test, including before merging to main — but following a recent “urgent” porting exercise (again for sound commercial reasons), many of the tests didn’t pass. The test harness they used supported marking tests as expected-failure — but no-one bothered doing so. So every test run “failed”, and developers had to manually pore over the entire list of test results to determine whether they’d regressed anything. In a sense the hard part of testing was automated, but the easy part not! This company had put in 99% of the effort towards automated testing, but were reaping just a tiny fraction of the benefits, because the very final step — looking at the test results and turning them into a yes/no for whether the code is OK for main — was not automated. How did this pathology last for more than one single afternoon?

6.2 People

The rhetorical-looking questions posed above about the AIX CI and the expected-fail tests (“How did these obviously wrong situations continue?”) did in fact have an answer in the real world. Indeed, it was the same answer in both cases: people.

In the first company, the head of QA presided over a large and (self-)important department — which it needed to be in order to have enough staff to manually perform quite so much automatable work. If QA was run with the attitude that human testers are exploratory testers, squirrelers-out of corner-cases, stern critics of developers’ assumptions — if the testers’ work-product, including during early-development-phase involvement, was more and better automated tests — then they probably wouldn’t need anything like so many testers, and the rôle of “Head of QA” would perhaps be viewed as less of a big cheese than hitherto. Although the product quality would benefit, the company’s bottom-line would benefit, and even the remaining testers’ career-progression would benefit — the Head of QA’s incentives were misaligned with all of that, and they played the game they were given by the rules that they found to be in effect.

The second company seems harder to diagnose, but fundamentally the questions are, “Who is in charge of setting the quality bar for merges to main?” and “Who is in charge of what happens when that bar is not met?”. Those are likely to be two different people — they require very different skills — but if you find that a gulf is opening up between your team’s best practices and your team’s typical practices, then both those people are needed in order to bring the two closer together again. (In my own career I’ve a few times acted as the first one, but I’ve never acted as the second one: as Dr. Anthony Fauci famously didn’t say, “I don’t know how to explain to you that you should care for other people’s software quality.”)

This post, and this blog, and this blogger, cannot help you to deal with people. But often the talking point of the people you need to convince (or whose boss you need to convince to overrule them) is that better automated testing isn’t technologically feasible, isn’t worth attempting. I hope I’ve done a little to dispel that myth at least.

Monday 18 March 2024

System-testing embedded code in Rust, part two: Things I learned testing SSDP

Previously on #rust:

With the basic system-test infrastructure now in place thanks to the previous post in this series, it’s time to wire the STM32F746-Nucleo development board up to Ethernet and start testing actual code: the cotton-ssdp crate.

Having said that, in fact the first thing to do after plugging-in Ethernet, is to write a test that verifies basic connectivity. If packets can’t flow between the Nucleo and the rest of the network for any reason, then there’s no point disparaging the SSDP code for its failure to communicate. The basic test will establish that simple networking is operational: that the Ethernet interface sees link (i.e., that the Ethernet cable is actually connected to the Nucleo, and also to something at the other end), and also that DHCP can succeed and the Nucleo obtain an IP address.

And of course writing that code isn’t throwaway effort: every other network-related test will need to do all those things first before getting on with more specific tasks. So the setup code will form part of all subsequent SSDP test binaries.

As always, I encountered problems along the way, because Rust. But all of those problems eventually had solutions, also because Rust.

The aim of the tests

It’s worth just going over what the goals are here. Why spend development time on writing these tests, and on cabling up these test rigs? What is the payback?

My answer is, that I’d like to be able to work on cotton-ssdp, and eventually other similar crates, knowing that I have automated testing that verifies new functionality and defends against regressions in existing functionality. When I push a new feature branch to the CI server, I want it to tell me, as straightforwardly and clearly as possible, whether or not my changes are OK for main.

I don’t think the testing constitutes a promise that the code is completely bug-free (even less, that the functionality it provides is objectively useful). But it does, at the very least, constitute a promise that certain types or certain show-stopping severities of bug are absent. Tom DeMarco, writing in Peopleware, describes Gilb’s Law: “Anything you need to quantify can be measured in some way that is superior to not measuring it at all”. Something similar applies here: any software you need to test can be tested automatically in some way that is superior to not testing it at all.

As a target, “superior to doing nothing” is not a very high bar to clear. These system-tests aren’t very comprehensive in terms of, say, line coverage of the cotton-ssdp crate. But then the crate, after all, is thoroughly unit-tested. These system-tests are more about testing the platform integration code — here, with the RTIC real-time operating system, and the smoltcp networking stack — and about systematically verifying the crate’s original goal of being useful to implementers of embedded systems.

Concretely, the tests presented here really just check that the device can discover resources on the network, and advertise its own resources. Once that two-way communication is proven to work, everything else about exactly what is communicated, is already covered by the unit tests.

I’m not saying that the existence of a unit test automatically renders useless any system testing of the same function. Unit tests and system tests have different (human, organisational) readership — typically, unit tests are only interesting to developers, whereas system tests are often high-level enough and visible enough to serve as technology demonstrations to project managers and beyond — and both audiences are entitled to ask for and to see evidence of all claimed functionality. But in this case, the developers, project managers and beyond are all me, and anyway adding a huge variety of tests would clog up the narrative of this blog post, which focuses more on describing the framework.

Being a good citizen of Ethernet: MAC addresses

We’ll need to start by getting this Nucleo board onto the local Ethernet. In order to participate in SSDP, it’s going to need an IP address, as handed out by the DHCP server in my router. But in order to even participate in Ethernet enough to communicate with the DHCP server, it’s going to need its own Ethernet address. This is also called a hardware address or MAC address, it’s 48 bits (six bytes) long, every device on an Ethernet network has one, and it’s often printed on the back of routers and suchlike as twelve hex digits separated by colons. Some networking hardware comes with an officially-allocated MAC address built-in (as a company, you can get ranges of them allocated to you, like IP addresses) — but STM32s don’t, probably because ST Micro sell a lot of STM32s, many of them into designs (such as the Electric Imp) where they never even use their Ethernet circuitry, and it’d be a waste of a finite resource for ST Micro to allocate each one its own MAC address from the fixed pool.

For our purposes it’d be overkill to get an official address block allocated (though you’d need to if you intended to sell actual products), so it’s fortunate that an alternative way of obtaining an address is possible. One of those 48 bits is set to zero in every official (“Universally Administered”) address, but can be set to one to indicate a “Locally Administered” address, i.e. one chosen by the local network administrator. Which is also me! So in the tests, we set that bit, then pick a device-specific value for the other 47 bits (in fact 46 as there’s another reserved one), and use the result as our MAC address. So long as the 46 device-specific bits are chosen randomly enough, the chance of an accidental collision is suitably negligible.

So we need a calculation that always provides the same answer when performed on the same device, but always provides different answers when performed on different devices. Fortunately each STM32 does include a unique chip ID, burned into each individual die at chip-manufacture time, as described in the STM32F74x reference manual (“RM0385”) section 41.1.

But it’s not a good idea to just use the raw chip ID as the MAC address, for several reasons: it’s the wrong size, it’s quite predictable (it’s not 96 random bits per chip, it encodes the die position on the wafer, so two different STM32s might have IDs that differ only in one or two bits, meaning we can’t just pick any 46 bits from the 96 in case we accidentally pick seldom-changing ones) — and, worst of all, if anyone were to use the same ID for anything else later, they might be surprised if it were very closely correlated with the device’s MAC address.

So the thing to do, is to hash the unique ID along with a key, or salt, which indicates what we’re using it for. You can see this on Github or right here:

use core::hash::Hasher;

pub fn stm32_unique_id() -> &'static [u32; 3] {
    // SAFETY: this address only valid when running on STM32
    unsafe {
        let ptr = 0x1ff0_f420 as *const [u32; 3];
        &*ptr
    }
}

pub fn unique_id(salt: &[u8]) -> u64 {
    let id = stm32_unique_id();
    let key1 = (u64::from(id[0]) << 32) + u64::from(id[1]);
    let key2 = u64::from(id[2]);
    let mut h = siphasher::sip::SipHasher::new_with_keys(key1, key2);
    h.write(salt);
    h.finish()
}

pub fn mac_address() -> [u8; 6] {
    let mut mac_address = [0u8; 6];
    let r = unique_id(b"stm32-eth-mac").to_ne_bytes();
    mac_address.copy_from_slice(&r[0..6]);
    mac_address[0] &= 0xFE; // clear multicast bit
    mac_address[0] |= 2; // set local bit
    mac_address
}

This is quite a general mechanism for producing device-specific but deterministic unique IDs; it’s also used by the SSDP test for generating the UUID for the example resource. And it can be made more sophisticated by hashing-in more information. Want a unique, but deterministic, Wifi MAC address that’s per-network to defeat tracking? Hash in the Wifi network name, or the router's MAC address. (It’s harder to avoid tracking on Ethernet, because you need the MAC address before you even start using DHCP, i.e. before you know anything about the network you’re joining. But of course passive tracking isn’t really a threat model on Ethernet, where you must have chosen to plug in the cable.) Want different UUIDs for different UPnP services? Hash in the service name. One thing you mustn’t do, though, is to use these IDs as cryptographic identifiers, as the hash function in question hasn’t been analysed for the collision-resistance and irreversibility properties which you need for such identifiers.
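
By way of illustration, here’s a hypothetical sketch of that idea (not part of cotton; the salt and the function name are made up), deriving a per-network Wifi MAC address by hashing the SSID alongside a purpose-specific salt:

use core::hash::Hasher;

/// Hypothetical: a locally-administered Wifi MAC address that is stable
/// for a given device and network, but different on every network.
pub fn wifi_mac_address(ssid: &str) -> [u8; 6] {
    let id = stm32_unique_id();
    let key1 = (u64::from(id[0]) << 32) + u64::from(id[1]);
    let key2 = u64::from(id[2]);
    let mut h = siphasher::sip::SipHasher::new_with_keys(key1, key2);
    h.write(b"stm32-wifi-mac"); // different salt: uncorrelated with the Ethernet MAC
    h.write(ssid.as_bytes());   // different network: different address
    let r = h.finish().to_ne_bytes();
    let mut mac = [0u8; 6];
    mac.copy_from_slice(&r[0..6]);
    mac[0] &= 0xFE; // clear multicast bit
    mac[0] |= 2;    // set locally-administered bit
    mac
}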

Spike then refactor: Adding smoltcp support

Getting the cotton-ssdp crate tested on embedded devices was always going to involve porting it to use smoltcp as an alternative to the standard-library socket implementation that it currently uses. I fondly imagined that the result would look like the idealised “triangle of initialisation”:

fn main() {
    let mut a = A::new();
    let mut b = B::new(&a);
    let mut c = C::new(&a, &b);
    let mut d = D::new(&a, &b, &c);
    ...
    do_the_thing(&a, &b, &c, &d, ...);
}

so, in this case, perhaps:

fn main() {
    let mut stm32 = Stm32::<STM32::F746, STM32::Power::Reliable3V3>::default();
    let mut smoltcp = Smoltcp::new(stm32.ethernet());
    run_dhcp_test(&mut stm32, &mut smoltcp);
}

I didn’t quite get to that level of neatness — there’s a lot of boilerplate in most embedded software, and anyway that pseudocode appears to allocate everything on the stack, where overruns are a runtime failure, as opposed to being in the data segment, where overrun would be (as an improvement) a link-time failure.

But the first version of the DHCP test was over 550 lines of boilerplate, so most of the commits on the pdh-stm32-ssdp branch are just about tidying away the common code to make the intent of the test more obvious.

Once DHCP was working, implementing a simple test for SSDP could commence. (And adding a second network-related test spurred the factoring-out of yet more common code between the two.) Rather than immediately wade in to changing cotton-ssdp itself, though, I was able to use the exposed networking traits which cotton-ssdp already contained, to start implementing smoltcp support directly in the test. This was a fortuitous case of “pre-adaptation”: the abstractions that let cotton-ssdp’s core be agnostic on the matter of mio versus tokio, turned out to be exactly those needed for also abstracting away smoltcp. (Well, okay, that wasn’t completely fortuitous, as I did have embedded systems in mind even when developing hosted cotton-ssdp.)

Once the test was working, I could then move those trait implementations into cotton-ssdp, hiding them behind a new Cargo feature smoltcp in order not to introduce needless new dependencies for those using cotton-ssdp on hosted platforms.

I’m not sure I’m quite your (IPv4) type

If you look at the cotton-ssdp additions for smoltcp support, you’ll see that only a small part of it (lines 220-300) is the actual smoltcp API usage. A much larger part is taken up by conversions between types representing IP addresses.

The issue is that networking APIs are typically system-specific, and not present on many embedded systems, so the Rust people very sensibly and usefully left those functions out of the embedded, no_std configuration. But, a bit less usefully for our purposes, they also left out the types representing IPv4 and IPv6 addresses. These types are not platform-specific — they’re straight from the RFCs — but, as they weren’t available to no_std builds of smoltcp, the smoltcp people were forced to invent their own versions.

Subsequently a crate called “no-std-net” was released, containing IP address types (structurally) identical to the standard-library ones when built with no_std, and simply renaming the standard-library ones when built hosted. The cotton-ssdp crate uses the no-std-net names.

Now in fairness the Rust people did then realise that this situation wasn’t ideal, and standardised types are set to land in Rust 1.77 — but that’s much newer than most people’s minimum-supported-Rust-version (MSRV), and anyway tons of smoltcp users in the field are still using the smoltcp versions. So conversions were necessary.

Rust has very well-defined idioms for type conversion, via implementing the From trait — so on the face of it, all that’s needed is to implement From for the standard types on the smoltcp types, and vice versa. That doesn’t work, though, because of Rust’s “Orphan Rule”, which allows trait implementations only in the crate defining the trait or the one defining the type being extended — and cotton doesn’t define From, std, or smoltcp. The best we can do, it seems, is to invent yet another set of “Generic” IP address types so that we can define the conversions both ways. This leads to lots of double-conversions; stm32f746-nucleo-ssdp-rtic has the likes of

no_std_net::IpAddr::V4(GenericIpv4Address::from(ip).into())

and

GenericSocketAddr::from(sender.endpoint).into()

but it keeps the core code looking sane.
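
The shape of the workaround, reduced to a minimal sketch: the two “foreign” types below are stand-ins rather than the real smoltcp or no-std-net definitions, and the real GenericIpv4Address conversions are fiddlier, but the Orphan Rule mechanics are the same.

// Stand-ins for address types owned by two different upstream crates.
pub struct TheirWireAddress(pub [u8; 4]); // e.g. a smoltcp-style address
pub struct TheirStdAddress(pub [u8; 4]);  // e.g. a no-std-net-style address

// Our own intermediate type: because this crate owns it, the Orphan Rule
// lets us write From implementations in both directions, even though we
// own neither of the types above.
pub struct GenericIpv4Address(pub [u8; 4]);

impl From<TheirWireAddress> for GenericIpv4Address {
    fn from(ip: TheirWireAddress) -> Self { Self(ip.0) }
}
impl From<GenericIpv4Address> for TheirWireAddress {
    fn from(ip: GenericIpv4Address) -> Self { Self(ip.0) }
}
impl From<TheirStdAddress> for GenericIpv4Address {
    fn from(ip: TheirStdAddress) -> Self { Self(ip.0) }
}
impl From<GenericIpv4Address> for TheirStdAddress {
    fn from(ip: GenericIpv4Address) -> Self { Self(ip.0) }
}

// Hence the double-conversions seen above, via the type in the middle:
fn example(ip: TheirWireAddress) -> TheirStdAddress {
    TheirStdAddress::from(GenericIpv4Address::from(ip))
}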

Parts of the generic-types code are more complex than I’d hoped because the smoltcp stack itself can be built either with or without IPv6 (and IPv4 for that matter), using Cargo features. As it stands, cotton-ssdp only uses IPv4, so it depends on smoltcp with the IPv4 feature. But users of cotton-ssdp will of course be using it in some wider system, and, because of the way Rust’s crate feature resolution works, if any part of that system enables the IPv6 feature of smoltcp, then everyone gets a smoltcp with IPv6 enabled. But the enumerated IP address types inside smoltcp (such as wire::Endpoint) change if IPv6 is enabled! That means that cotton-ssdp has no way of knowing whether its usage of smoltcp will result in it getting handed the one-variant version of those enumerated types, or two-variant versions. That’s why the conversions accepting those smoltcp types must look like this:

        // smoltcp may or may not have been compiled with IPv6 support, and
        // we can't tell
        #[allow(unreachable_patterns)]
        match ip {
            wire::IpAddress::Ipv4(v4) => ...
            _ => ...
        }

in order to compile successfully however many variants ip has; if there’s only one variant, it needs the allow() in order to compile, but if there’s two or more, it needs the “_ =>” in order to compile.

“Don’t Panic” in large, friendly letters

The first version of the DHCP test worked much like the existing “Hello World” test, which was refactored to match: spawn probe-run as a child process, listen to (and run tests against) its trace output, and then shut it down (in a Drop handler) once all tests have passed.

This was a plausible first stab, and worked well every time that the tests passed, but turned out not to be sound if a test ever failed. (Edit: A previous version of this page blamed the way that a failing test panics, for not unwinding the stack, meaning Drop handlers don’t run. But that’s not actually the case — panics do unwind the stack and do run Drop handlers — so it’s now not clear why the following change was needed. But at least it’s still a valid way of doing it, even though it’s not required.)

I found the solution on the Eric Opines blog: run the test inside panic::catch_unwind(). That function takes a closure to run, and returns a Result indicating either successful exit or a (contained) panic. This required refactoring the DeviceTest struct to also use a closure, but actually the resulting tests look quite neat and well-defined — here’s “Hello World”:

fn arm_stm32f746_nucleo_hello() {
    nucleo_test(
        "../cross/stm32f746-nucleo/target/thumbv7em-none-eabi/debug/stm32f746-nucleo-hello",
        |t| {
            t.expect("Hello STM32F746 Nucleo", Duration::from_secs(5));
        },
    );
}

The two parameters to nucleo_test() are of course the compiled binary to run — remembering that the build system ensures that these are up-to-date before starting the test — and a closure containing the body of the test. The closure gets passed the DeviceTest object, on which the available methods are expect(), which waits up to the given timeout for a message to appear on the device’s (virtualised, RTT) standard output (and if the timeout elapses without seeing it, fails the test), and expect_stderr() which does just the same for the device’s standard-error stream.
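
For reference, here’s roughly how nucleo_test() can be built around panic::catch_unwind(). This is a hedged sketch, not the actual cotton test harness: the DeviceTest stub and its spawn() and shutdown() methods are stand-ins for whatever really wraps the probe-run child process.

use std::panic;

struct DeviceTest;

impl DeviceTest {
    fn spawn(_firmware: &str) -> Self {
        DeviceTest // stand-in: really this would start probe-run as a child process
    }
    fn shutdown(&mut self) {
        // stand-in: really this would shut the probe-run child process down
    }
}

fn nucleo_test<F>(firmware: &str, test: F)
where
    F: FnOnce(&mut DeviceTest) + panic::UnwindSafe,
{
    let mut device = DeviceTest::spawn(firmware);
    // Run the test body under catch_unwind so that, even if an expectation
    // panics, we still get control back in order to shut down the child.
    let result = panic::catch_unwind(panic::AssertUnwindSafe(|| test(&mut device)));
    device.shutdown();
    if let Err(payload) = result {
        // Re-raise the panic so the surrounding #[test] still fails.
        panic::resume_unwind(payload);
    }
}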

Because panic::catch_unwind() requires its closure to be “unwind-safe”, so does nucleo_test(); so far, this hasn’t been an issue in practice, so I haven’t looked deeply into what to do about it otherwise. The DHCP test, at least on the host side, is just as straightforward:

fn arm_stm32f746_nucleo_dhcp() {
    nucleo_test(
        "../cross/stm32f746-nucleo/target/thumbv7em-none-eabi/debug/stm32f746-nucleo-dhcp-rtic",
        |t| {
            t.expect_stderr("(HOST) INFO  success!", Duration::from_secs(30));
            t.expect("DHCP config acquired!", Duration::from_secs(10));
        },
    );
}

The closure pattern was so appealing that I made the SSDP test use the same design — not least to ensure that it, too, was correctly shut down even following a failing test:

fn arm_stm32f746_nucleo_ssdp() {
    nucleo_test(
        "../cross/stm32f746-nucleo/target/thumbv7em-none-eabi/debug/stm32f746-nucleo-ssdp-rtic",
        |nt| {
            nt.expect_stderr("(HOST) INFO  success!", Duration::from_secs(30));
            nt.expect("DHCP config acquired!", Duration::from_secs(10));
            ssdp_test(
                Some("cotton-test-server-stm32f746".to_string()),
                |st| {
                    nt.expect("SSDP! cotton-test-server-stm32f746",
                              Duration::from_secs(20));
                    st.expect_seen("stm32f746-nucleo-test",
                              Duration::from_secs(10));
                }
            );
        }
    );
}

The implementation of ssdp_test() itself is a little more involved, because it must spawn a temporary background thread to start and run the host’s SSDP engine which communicates with the one on the device. The two parameters are an optional SSDP notification-type to advertise to the device, and a closure to contain the body of the test. Here the available method on the SsdpTest object passed to the closure, is expect_seen(), which waits with a timeout for someone on the network (hopefully, the device under test) to advertise a specific notification-type. Here the nt.expect() line checks that the device has seen the host’s advertisement, and the st.expect_seen() line checks that the host has seen the one from the device.

Those two events can occur in either order in practice, but both DeviceTest and SsdpTest buffer-up notifications, so that an expectation that has already come to pass before the expect call is made, completes immediately. In the future it might be interesting to investigate using async/await to express the asynchronous nature of this test more explicitly.
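
One way to get that buffering behaviour (a speculative sketch, not the actual implementation; the helper name and the use of an mpsc channel are assumptions) is to have the reader thread push every notification into a channel, and have each expectation drain the channel against a deadline:

use std::sync::mpsc::Receiver;
use std::time::{Duration, Instant};

/// Hypothetical helper: succeed as soon as any buffered or newly-arriving
/// message contains `needle`, panicking (failing the test) on timeout.
fn expect_within(rx: &Receiver<String>, needle: &str, timeout: Duration) {
    let deadline = Instant::now() + timeout;
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(msg) if msg.contains(needle) => return, // satisfied, perhaps by a buffered message
            Ok(_) => continue,                         // some other notification; keep waiting
            Err(_) => panic!("timed out waiting for {needle:?}"),
        }
    }
}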

The SSDP test only works if the Nucleo board and the test running on the host, can exchange packets. Typically this means that they must be on the same Ethernet network — or, if the host is on Wifi, that the Wifi network must be bridged to the Ethernet (e.g., by cabling the Nucleo to one of the Ethernet LAN sockets on the Wifi router).

Putting it all together

As of the merge of the pdh-stm32-ssdp branch, the following command runs the system-tests on an attached STM32F746-Nucleo:

cargo test -F arm,stm32f746-nucleo

And for those without a Nucleo, the following commands still work, building all the device code but testing only the host code:

cargo build -F arm,stm32f746-nucleo
cargo test -F arm

And for those without even the cross-compiler installed, which is probably most people, the following commands still work, building and testing only the host code:

cargo build
cargo test
cargo build-all-features --all-targets
cargo test-all-features --all-targets

This all makes it easy for a developer to determine, before pushing to the central git server, whether their branch is likely to be OK for main — but the most definitive answer to that question is only available when going to the trouble of using a local Nucleo development board. If you’re working on something you think probably won’t affect embedded builds, what you really want is to not faff about with development boards (particularly multiple ones): what you want is for continuous integration to perform all of these system-tests as part of its mission to answer the question of whether your branch is OK for main. Adding a CI runner that can run these tests automatically on every push, is the topic of the third post in this series.

Sunday 18 February 2024

System-testing embedded code in Rust, part one: Infrastructure

Previously on #rust:

One of the goals of the Rust crates I’ve been working on, cotton-netif and cotton-ssdp, is for them to be useful on embedded systems: microcontroller-based devices only capable of running simple real-time operating systems, as opposed to full-size Linux systems.

There is good support in Rust itself for targeting such platforms: it’s relatively easy to write such code (with the no_std attribute), and not even much harder to cross-compile it (using Cargo) and even run it on the target (the probe-rs folks do great work there). But if you’ve read some of the other posts here, you’ll be familiar with the idea that software isn’t done until it can be repeatably shown to be done. Someone — perhaps not Themis, the goddess of Justice, as pictured; more likely Laminar, the goddess of CI — must solemnly, dispassionately, objectively weigh the code’s activities against (some representation of) its specification, and hold it in judgement if it falls short.

Less fancifully, what’s needed is an automated way for CI (running on a rich, non-embedded host platform) to run the code on a genuine, embedded, target platform and check its functionality. Of course, it’s best to arrange that as much code as possible is abstracted away from the hardware, so that it can be unit-tested on the host, run through Miri and Valgrind and other dynamic-analysis tools on the host, and just plain debugged on the host, where everything is a little less awkward. But the proof of the pudding is still in the eating: only tests that run on the target can be the final arbiter.

(In particular, whether the target hardware appears well-documented or not, it’s all too easy to have misconceptions about how peripherals behave, leading to a situation where the code and the host-side unit-test agree about what’s going on, but they’re both wrong because the actual hardware does something completely different.)

At Electric Imp we had (what became) a large subsystem of Python scripts which our CI system ran, and which in turn installed the newly-built device firmware on some dozens of Imp devices, running through system-tests including thorough regression-testing of the wifi connectivity, the “curated” peripherals, and really the entirety of the firmware functionality. This worked extremely well, and over the years caught simply oodles of bugs which had passed unit-testing but would in some way have scuppered our customers’ actual devices. But all of that Python always felt like an add-on to the main C++ build system, needing different skills to maintain. As Rust (or at least Cargo), by contrast, quite rightly treats testing as a first-class language feature, I wondered whether Cargo’s own facilities could be used to system-test embedded Rust without having to invent lots of extra infrastructure bolted on the side.

In this blog post I’ll focus on getting the testing infrastructure set up, literally just far enough to get a target-side test that prints “Hello, World” and a host-side test runner which checks that it has done so. Adding actual tests for the crates’ functionality (SSDP, to start with) will likely come in a subsequent post; CI considerations in yet another.

Goals of the Cotton automated system-tests

  1. Joel Test #2 compatibility — Joel Spolsky, writing some years ago now, has some pithy questions to ask software development teams. “Can you make a build in one step?” is always valid to ask, for the reasons he lists — mostly about not forgetting intermediate steps — but also because in practice what’s easy is what most people will do. If the easy thing to do is run a command that builds absolutely everything, then most of the time developers will run that command. (Except perhaps if it starts to take too long, but that’s a different issue.) And the more that’s built by the one command that everybody uses each day, the harder it is to accidentally introduce a bug that affects some builds or facets of the system but not others.
  2. Easy for me/anyone/CI to test everything on the host — Because most of the code compiles for the host (potentially even parts that are only useful on the target), host-side development should remain straightforward. In particular, it mustn’t require the presence of a cross-toolchain, or nightly Rust, or any target hardware.

    It certainly mustn’t require people using the crates from crates.io in normal host-side builds in the normal way to install or attach anything special. (But Cargo makes sure of that anyway.)

    Someone wanting to work on a Cotton crate in combination with their own project can check out the Cotton repository alongside their own, and use a path dependency when importing the Cotton crate (just as they would for a one-crate upstream repository).

  3. Easy for me/anyone/CI to test everything on one target — Even if Cotton eventually targets many different embedded systems, almost all embedded developers will only have one type of hardware attached to their development host at any one time. It must be straightforward to run all the tests that can run on (say) an STM32F746-Nucleo, but none that requires different hardware.
  4. Easy for me/anyone/CI to test one crate on one target — Eventually several crates (not just SSDP) will share the same system-test infrastructure; it must remain possible to test just one crate at a time, for the sake of development cycle time.

  5. Possible for CI to test everything on many/all targets — Once Cotton targets several different embedded systems, each one should have its tests run by CI. But this mustn’t require one CI host per target — it must be possible to attach several development boards to the same CI host and have it run the right tests on the right boards. Notice that this goal is for it to be “possible”, not “easy”: having lots of different development boards attached is going to be uncommon, so it’s okay for it to need slightly more awkward setup.
  6. Separation of concerns — The cotton-ssdp crate, say, can probably be tested on each of several different target devices. Adding a new target device mustn’t require changes to cotton-ssdp itself.

Outline of the solution

You can see the merge that creates this infrastructure at commit 181a8fdc and the whole tree at that commit here on Github.

There were definitely a few false starts along the way to achieving those goals. The first thing I tried was “per-package targets”; this is a Rust facility that in theory should make it possible to mark certain packages in a workspace as building for a different platform than the rest of the workspace. The inspiration for the feature was people building web apps where the server end compiles to x86_64 or ARM64 or whatever’s cheap in AWS these days, but the client end compiles to WASM to run in browsers. It’s more-or-less exactly what’s needed here too — but sadly it’s more complicated to implement than you’d first think, and is only available in nightly Rust where it doesn’t work very well. (When I tried it, cargo test kept trying to run my STM32 binaries on the host.)
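
For reference, the nightly-only feature is enabled in the package’s own manifest; a hypothetical use (illustrative only, following Cargo’s unstable-features documentation, and not something Cotton ended up using) would look roughly like this:

Cargo.toml (hypothetical, nightly-only)
cargo-features = ["per-package-target"]

[package]
name = "stm32f746-nucleo"
version = "0.1.0"
edition = "2021"
# Always build this package for the embedded target, even when the rest of
# the workspace is being built for the host
forced-target = "thumbv7em-none-eabi"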

Without per-package targets, I was going to need different invocations of Cargo for host and device builds (because each invocation of Cargo can only build for a single platform). So I tried having a build.rs build script that, when invoked for the host platform, re-runs Cargo for the device platform. Yee-hah, right? Sheer cowboyery. It doesn’t work, because Cargo deadlocks trying to rebuild the same crate it’s already building. Can’t really blame it, either.

I looked for a while into setting rustflags in Cargo.toml, but that can’t be set per-package in a single workspace, let alone per-test in a single package. Each different set of rustflags must currently be a separate invocation of Cargo.

The answer seems to be, to take parts of each of those ideas:

  • Have the workspace as a whole continue to build native for the host,
  • and have a build script that re-invokes Cargo for each target platform,
  • but have each target platform’s root crate be separate, and never built for the host, using an exclusion in the workspace Cargo.toml:
    Cargo.toml
    [workspace]
    members = [
        "cotton-netif",
        "cotton-ssdp",
        "systemtests",
    ]
    
    exclude = [
        "cross",
    ]

The resulting workspace structure looks like this:

cotton
├── Cargo.toml
├── cotton-netif
│   ├── Cargo.toml
│   └── ...
├── cotton-ssdp
│   ├── Cargo.toml
│   └── ...
├── cross
│   └── stm32f746-nucleo
│       ├── .cargo
│       │   └── config.toml
│       ├── Cargo.toml
│       ├── memory.x
│       └── src
│           └── bin
│               └── hello.rs
└── systemtests
    ├── build.rs
    ├── Cargo.toml
    ├── src
    │   └── lib.rs
    └── tests
        └── stm32f746-nucleo.rs
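
The .cargo/config.toml inside the stm32f746-nucleo crate holds the target-specific build settings. Its exact contents are in the repository; purely as an illustration of the kind of thing that belongs there, such a file typically looks something like this (a sketch, not a copy of the real one):

cross/stm32f746-nucleo/.cargo/config.toml (illustrative sketch)
[target.thumbv7em-none-eabi]
# Pull in the cortex-m-rt linker script (which consumes memory.x) and
# defmt's linker script for the RTT logging used by hello.rs
rustflags = [
    "-C", "link-arg=-Tlink.x",
    "-C", "link-arg=-Tdefmt.x",
]
# Let "cargo run" inside this crate flash and run a binary on an attached board
runner = "probe-run --chip STM32F746ZGTx"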


The cross subdirectory is excluded from the root workspace (in the root Cargo.toml), and the crates inside it (at present, only stm32f746-nucleo) are built by a recursive Cargo invocation in systemtests/build.rs; recursively invoking Cargo is a bit subtle, as you need to unset a bunch of environment variables in order that the sub-Cargo runs mostly as a new top-level Cargo (otherwise, the deadlocking issues reappear). Here’s the build-script section that achieves that, with the cross-compilation guarded by a (Cargo) feature called arm:

systemtests/build.rs (partial)
    if env::var("CARGO_FEATURE_ARM").is_ok() {
        /* Run the inner Cargo without any Cargo-related environment variables
         * from this outer Cargo.
         */
        let filtered_env: HashMap<String, String> = env::vars()
            .filter(|(k, _)| !k.starts_with("CARGO"))
            .collect();
        let child = Command::new("cargo")
            .arg("build")
            .arg("-vv")
            .arg("--all-targets")
            .arg("--target")
            .arg("thumbv7em-none-eabi")
            .current_dir("../cross/stm32f746-nucleo")
            .env_clear()
            .envs(&filtered_env)
            .output()
            .expect("failed to cross-compile for ARM");
        io::stdout().write_all(&child.stderr).unwrap();
        io::stdout().write_all(&child.stdout).unwrap();
        assert!(child.status.success());
    }

The obvious downside here is that build scripts aren’t run with a live terminal: unless the outer Cargo is invoked with -vv, the script’s standard output and standard error are written only to files, and shown only if the script fails. If the recursive Cargo invocation succeeds, all you see is a long pause in your build, though at least if the recursive invocation fails, you do see its error output.

For the time being, this acts as extra encouragement to keep complex code out of the only-compiled-for-target crates — though if need be you can always do a normal top-level Cargo build inside the STM32 crate, to see live standard output and standard error:

cargo -C cross/stm32f746-nucleo build --target thumbv7em-none-eabi

Host-side runner for a device-side test

To start with (and to validate the rest of the solution), there is only one actual system-test in the first merge: writing the “Hello World” binary onto an STM32F746-Nucleo development board, running it, and checking the output. Cargo will have run our build script before running the test, so we know that our device-side binaries have all been built and are up-to-date (another Joel Test benefit), and it’s just a question of using (the excellent) probe-run, which is built on probe-rs and already knows about STM32 development boards, to write the binary to the STM32 chip and then run it. The STM32 binary uses defmt-rtt for its logging output, via RTT (the Real-Time Transfer mechanism), support for which is again built in to probe-run (no semihosting! no UARTs!), so the device side is as simple as this:

cross/stm32f746-nucleo/src/bin/hello.rs
#![no_std]
#![no_main]
 
use defmt_rtt as _; // global logger
use panic_probe as _;
use cortex_m::asm;
 
#[cortex_m_rt::entry]
fn main() -> ! {
    defmt::println!("Hello STM32F746 Nucleo!");
 
    loop {
        asm::bkpt()
    }
}
and the host side only a little less simple:
systemtests/tests/stm32f746-nucleo.rs
use assertables::*;
use serial_test::*;
use std::env;
use std::path::Path;
use std::process::Command;
 
use std::io::{self, Write};
 
#[test]
#[serial]
#[cfg_attr(miri, ignore)]
fn arm_stm32f7_hello() {
    let elf = Path::new(env!("CARGO_MANIFEST_DIR")).join(
        "../cross/stm32f746-nucleo/target/thumbv7em-none-eabi/debug/hello",
    );
 
    let mut cmd = Command::new("probe-run");
    if let Ok(serial) = env::var("COTTON_PROBE_STM32F746_NUCLEO") {
        cmd.arg("--probe");
        cmd.arg(serial);
    }
    let output = cmd
        .arg("--chip")
        .arg("STM32F746ZGTx")
        .arg(elf)
        .output()
        .expect("failed to execute probe-run");
 
    println!("manifest: {}", env!("CARGO_MANIFEST_DIR"));
    println!("status: {}", output.status);
    io::stdout().write_all(&output.stderr).unwrap();
    io::stdout().write_all(&output.stdout).unwrap();
    assert!(output.status.success());
 
    let stdout = String::from_utf8_lossy(&output.stdout);
    assert_contains!(stdout, "Hello STM32F746 Nucleo");
}

Notice the use of serial_test to defeat Rust’s default behaviour of running multiple integration tests in parallel — that wouldn’t end well if they were all competing for one single physical development board. (Although obviously there’s only one test at the moment.)

Clearly much more complex logging and checking could (and will) be implemented via the same mechanism, but as a proof-of-concept this suffices for this initial blog post.

Does this meet our goals?

  1. Joel Test — the commands:
    cargo build
    cargo test
    cargo build-all-features --all-targets
    cargo test-all-features --all-targets
    all do as expected, and none need any cross-toolchains, special hardware, or even nightly Rust. (The all-features ones work because the arm feature is carefully excluded from all-features builds.) The same generic “CI for Rust” scripts that worked for simple old host-side Cotton continue to work fine with the exciting new multi-platform Cotton.
  2. Test everything on the host — as above, the easy, everyday commands build and test all the host-compatible crates. The device-only crates can be built with:
    cargo build -F arm
    cargo test -F arm
    This needs the thumbv7em-none-eabi target to be installed for the current Rust toolchain:
    rustup target add thumbv7em-none-eabi
    but does not need any hardware, nor nightly Rust.
  3. Test everything on one target — The Cargo.toml in the systemtests crate declares a Cargo feature stm32f746-nucleo, which depends on feature arm and enables the integration test whose host side is shown above. So building and running this test (i.e., running all the tests for that particular development board) looks like this:
    cargo build -F arm,stm32f746-nucleo
    cargo test -F arm,stm32f746-nucleo
    This needs the cross-toolchain installed as above, and of course it needs an actual physical STM32F746-Nucleo development board attached via USB to the host computer running the tests.
  4. Test one crate on one target — Because the system-tests infrastructure is shared, this would have to be accomplished by careful naming of tests, and the use of Cargo’s wildcard test name option:
    cargo test -F arm,stm32f746-nucleo --test '*ssdp*'
  5. Test everything on many targets — the main issue here is that, if several different (probe-rs compatible) development boards are attached to the same host, the probe-run command needs to be told each time which one to use. This is the idea behind the optional COTTON_PROBE_STM32F746_NUCLEO environment variable seen in the host-side runner above: a CI or other setup that has several development boards attached needs to use these environment variables to specify the unique “probe identifier” of each one. This feature will get more of a workout in a future blog post when a second target platform is added.
  6. Separation of concerns — So far, this is good but not perfect. Device-side code that can be compiled for the host goes in the top-level workspace (perhaps alongside host-side code that can’t be compiled for the device). Device-side code that can’t be compiled for the host goes in one of the crates under the “cross” directory. (Maybe one day those too should become workspaces?) Entire device-side applications (or, at the very least, example ones) can live there too.

    At some level it would be appealing for (say) system-tests for SSDP to go somewhere under the cotton-ssdp crate. But device-side tests are inherently device-specific, and given N crates and M devices and a potentially N×M-sized testing matrix, it seemed better to keep the tests by device rather than by crate-under-test, on the basis that all engineers working with embedded Rust (in the device directories) are surely also familiar with host-side Rust, but the reverse (engineers working on host-side Rust in, say, the SSDP crate being familiar with embedded Rust) seems less guaranteed.

    Also, all the code in the systemtests crate, including much of build.rs, is generic across any crate wanting system-testing, and isn’t Cotton-specific. As of this blog post, no better advice is being offered here than to copy-and-paste it into your own projects, but in the future it would be more Rust-like to offer this functionality in free-standing crates that other projects desiring system-testing could just import in the normal way.

That Github link once again

You can see the merge that creates this infrastructure at commit 181a8fdc and the whole tree at that commit here on Github.

Similar blogs elsewhere

Ferrous Systems have a series of blog posts covering very similar topics, though they use cargo-xtask to construct their single build commands.
