Previously on #rust:

Thanks to earlier parts of this series, developers of Cotton now have the ability to run automated system-tests of the embedded builds using their own computer as the test host — if they have the right STM32F746-Nucleo development board to hand. What we need to do now is add the ability for continuous integration to run those tests automatically on request (e.g. whenever a feature branch is pushed to the central git server). For at least the third time on this blog, we’re going to employ a Raspberry Pi 3; the collection of those sitting unused in my desk drawer somehow never seems to get any smaller. (And the supply shortages of recent years seem to have abated.)
  
First, set up Ubuntu 22.04 with USB boot and headless full-disk
    encryption in the usual way.
  OK, perhaps doing so isn’t all that everyday (that remark
    was slightly a dig at Larousse Gastronomique, at least one
    recipe in which starts, “First, make a brioche in the usual
    way”).
    But this
    very blog has exactly the instructions you need. This time,
    instead of a 512GB Samsung USB SSD, I used a 32GB Sandisk USB
    flash drive — the test-runner won’t be needing tons of
    storage.
  Alternatively, you could use any other reasonable computer you’ve
    got lying around — an Intel NUC, say, or a cast-off laptop. Most
    or all of the following steps will apply to any freshly-installed
    Ubuntu box, and many of them to other Linux distributions too. But
    you’ll have the easiest time of it if your CI test-runner has the
    same architecture and OS as your main CI server, which is another
    reason I’m pressing yet another Raspberry Pi into service
    here.
  However you get there, what you need to proceed with the rest of this
    post (at least the way I proceeded) is:
  
- An existing Laminar CI server, on your local Ethernet.
- A Raspberry Pi to become the test-runner:
  - with Ubuntu 22.04 freshly installed;
  - plugged into your local Ethernet, able to DHCP (I gave mine a fixed address in OpenWRT’s DHCP server setup) — or, at least, somewhere where the CI server can SSH to it;
  - with your own user, that you can SSH in as, and use sudo.
- A USB-to-Ethernet adaptor (I used the Amazon Basics one);
- an Ethernet switch, at least 100Mbit (I used this TP-Link LS1008);
- and an STM32F746-Nucleo development board.
  
In this blog post we’re going to:
  
- Connect everything together, making a separate test network
- Automate the remaining setup of the test-runner, using Ansible
- Arrange that the CI server can SSH to the test-runner autonomously
- Run a trivial CI job on the test-runner
- Package up the system-tests and run a real CI job on the test-runner
- Go and sit down and think about what we’ve done
  
1. Connect everything together, making a separate test network
  For once, instead of being a whimsical stock photo of something
    tangentially related, the image at the top of this blog post is an
    actual photograph of the actual physical items being discussed
    herein (click to enlarge if need be). The only
    connections to the outside world are from my home network to the
    Raspberry Pi’s built-in Ethernet (lower left) and power
    to the Raspberry Pi and to the Ethernet switch (top left and
    centre bottom). The test network is otherwise self-contained: the
    STM32 development board is on a private Ethernet segment
    with just the Raspberry Pi’s USB Ethernet for company.
    This network has its
    own RFC1918
    address range, 192.168.3.x, distinct from the rest of the home
    network. (The Wifi interface on the Raspberry Pi is not
    currently being used.)
  The breadboard is attached to the Raspberry Pi’s GPIO
    connector, and at present is only used to provide a “testing
    in progress” LED attached to GPIO13 (glowing white in the
    photo, but therefore hard to see). GPIO usage could become more
    sophisticated in the future: for instance, if I was writing a HAL
    crate for a new embedded device, I could connect the
    Raspberry Pi’s GPIO inputs to the GPIO outputs of the
    embedded device (and vice-versa) and system-test my GPIO code.
  The Raspberry Pi programs the STM32 over USB; between that, the
    Ethernet adaptor, and the USB flash drive, three of the four USB
    sockets are occupied, leaving just one spare for future
    enhancements. (But hubs are a thing, of course.)
2. Automate the remaining setup of the test-runner, using Ansible
  The initial setup of a new Ubuntu installation is arguably best
    done manually, as you might need to react to things that the
    output of the commands is telling you. But once the basics are in
    place, the complexities of setting up everything else that the
    test-runner needs are best automated — so that the setup is
    repeatable, documented, and explainable to others (in this case:
    you).
  Automating server setup is not new news to cloud engineers, who
    often use tools such as Chef or Puppet to bring one or more new
    hosts or containers up to speed in a single command. Electric Imp
    used Chef, which I never really got on with, partly because of the
    twee yet mixed metaphors (“Let’s knife this cookbook
    — solo!”, which is a thing all chefs say), but mostly
    because it was inherently bound up with Ruby. Yet I felt I needed
    a bit more structure than just “copy over a shell script and
    run it”. So for #homelab purposes, I thought I’d try
    Ansible.
  Ansible is configured using YAML files, which at least is one
    step up from Ruby as a configuration language. The main
    configuration file is called a “playbook”, which
    contains “plays” (think sports, not theatre), which
    are in turn made up of individual “tasks”. A task can
    be as simple as executing a single shell command, but the benefit
    of Ansible is that it comes
    with a
    huge variety of add-ons which allow tasks to be written in a
    more expressive way. For instance, instead of faffing about to
    automate cron, there’s a “cron” add-on which
    even knows about the @reboot directive and lets you
    write:
    - name: init gpios on startup
      cron:
        name: init-gpios
        special_time: reboot
        job: /usr/local/bin/init-gpios
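The init-gpios script itself is part of the test-runner setup rather than anything shown in this post. As a rough sketch of the sort of thing it needs to do (assuming the sysfs GPIO interface that the test jobs use later on, and a “gpio” group that the laminar user belongs to; both are my assumptions, not the repository’s actual script), it might look like:

    #!/bin/bash -e
    # Sketch only: export GPIO13 (the "testing in progress" LED) via sysfs
    # and make it writable by the unprivileged test user.
    echo 13 > /sys/class/gpio/export || true     # ignore "already exported"
    echo out > /sys/class/gpio/gpio13/direction
    chgrp gpio /sys/class/gpio/gpio13/value      # the "gpio" group is an assumption
    chmod g+w /sys/class/gpio/gpio13/value
    echo 0 > /sys/class/gpio/gpio13/value        # LED off until a test runs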
  Tasks can be linked to the outcome of earlier tasks, so that, for
    instance, Ansible can restart the DHCP server if, and only if, the DHCP
    configuration has been changed. With most types of task, the
    Ansible configuration is “declarative”: it describes
    what the situation
    ought to be, and Ansible checks whether that’s already the
    case, and changes things only where they’re not as desired.
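For example, the usual Ansible idiom for that restart-only-on-change behaviour is a handler notified by the task that writes the configuration. This is a sketch rather than an excerpt from the actual playbook; the template name, destination path, and service name are all assumptions:

    # task (in the play's "tasks:" list)
    - name: install DHCP server config for eth1
      template:
        src: dhcpd.conf.j2               # hypothetical template name
        dest: /etc/dhcp/dhcpd.conf
      notify: restart dhcp server

    # handler (in the play's "handlers:" list)
    - name: restart dhcp server
      service:
        name: isc-dhcp-server            # assumption about which DHCP server is used
        state: restarted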
  Ansible can be used for considerably more complex setups than the
    one in this blog post — many, many servers of different
    types that all need different things doing to them — but
    I’ve made an effort at least to split up the playbook into plays
    relevant to basically any machine (hosts: all), or ones
    relevant just to the Raspberry Pi (hosts:
    raspberrypis), or ones specific to the rôle of being a
    system-test runner (hosts: testrunners).
 I ended up with 40 or so tasks in the one playbook, which between
   them install all the needed system packages as root, then install
   Rust, Cargo and probe-rs as
   the laminar user, then set up the USB Ethernet adaptor
   as eth1 and run a DHCP server on it.
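Once the playbook has run, a quick manual sanity-check on the test-runner is worthwhile; something along these lines, bearing in mind that isc-dhcp-server here is my guess at the service name rather than whatever the playbook actually installs:

    ip addr show eth1                      # expect the static 192.168.3.x address
    systemctl status isc-dhcp-server       # or whichever DHCP server the playbook set up
    journalctl -u isc-dhcp-server | tail   # look for a DHCPACK once the Nucleo powers up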
  Declaring which actual machines make up “all”,
    “testrunners”, etc., is the job of a separate
    “inventory” file; the one included in the repository
    matches my setup at home. The inventory file is also the place to
    specify per-host information that shouldn’t be hard-coded
    into the tasks: in this case, the DHCP setup tasks need to know
    the MAC address of the USB Ethernet adaptor, but that’s
    host-specific, so it goes in the inventory file.
  All the tasks in the
  main YAML file are
  commented, so I won’t re-explain them here, except to say that
  the “mark wifi optional”, “rename eth1”, and
  “set static IP for eth1” tasks do nothing to dispel my
  suspicion that Linux networking is nowadays just a huge seven-layer
  dip of xkcd-927 all the way
  down, with KDE and probably Gnome only adding their own frothy
  outpourings on top.
  I added a simple Makefile to the systemtests/ansible
    directory, just to automate running the one playbook with the one
    inventory.
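The Makefile is only a convenience; underneath it is a single command, along these lines (the file names are illustrative rather than copied from the repository):

    # run the one playbook against the one inventory, prompting for the sudo password
    ansible-playbook -i inventory.yaml -K playbook.yaml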
  The name
    “Ansible”
    comes originally from science-fiction, where it’s used to mean a
    device that can communicate across deep space without experiencing
    speed-of-light latency. I only mention this because when
    communicating across about a metre of my desk, it’s bewilderingly
    slow — taking seconds to update each line
    of sshd_config. That’s about the same as speed-of-light
    latency would be if the test-runner was on the Moon.
  But having said all that, there’s still value in just being able
    to re-run Ansible and know that everything is set consistently and
    repeatably.
  I did wonder about running Ansible itself under CI — after
    all, it’s software, it needs to be correct when infrequently
    called upon, so it ought therefore to be automatically tested to
    avoid bugs creeping in. But running Ansible needs root (or sudo)
    access on the test-runner, which in turn means it needs SSH access
    to the test-runner as a user which is capable of sudo — and
    I don’t want to leave either of those capabilities lying around
    unencrypted-at-rest in CI. So for the time being it’s down to an
    unreliable human agent — that’s me — to
    periodically run make in the ansible
    directory.
3. Arrange that the CI server can SSH to the test-runner autonomously
  Most of the hosts in this series
  of #homelab
  posts are set up with, effectively, zero-trust networking: they can
  only be accessed via SSH, and all SSH sessions start with me
  personally logged-in somewhere and
  running ssh-agent
  (and using agent-forwarding). But because the CI server needs to be
  able to start system-test jobs on the test-runner, it needs to be
  able to login via SSH completely autonomously.
  This isn’t as alarming as it sounds, as the user it logs
    into (the laminar user on the test-runner) isn’t very
    privileged; in particular, it’s not in the sudo group and
    thus can’t use sudo at all. (The Ansible setup explicitly
    grants that user permissions to the hardware it needs to
    access.)
  Setting this up is a bit like setting up your own SSH for the
    first time. First generate a key-pair:
  ssh-keygen -t ed25519
  — when prompted, give the empty passphrase, and choose
    “laminar-ssh” as the output file. The public key will
    be written to “laminar-ssh.pub”.
  The public key needs to be added
    to ~laminar/.ssh/authorized_keys (American spelling!) on
    the test-runner; the Ansible setup already does this for my own CI
    server’s public key.
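If you were doing it by hand rather than via Ansible, the equivalent on the test-runner would look something like this sketch (run as your own, sudo-capable user):

    sudo install -d -m 0700 -o laminar -g laminar ~laminar/.ssh
    cat laminar-ssh.pub | sudo tee -a ~laminar/.ssh/authorized_keys
    sudo chown laminar:laminar ~laminar/.ssh/authorized_keys
    sudo chmod 0600 ~laminar/.ssh/authorized_keys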
  Once authorized_keys is in place, you can test the setup using:
  ssh -i laminar-ssh laminar@scotch
  Once you’re happy that it works, copy the
  file laminar-ssh as
    ~laminar/.ssh/id_ed25519 on the main CI server
    (not the test-runner!):
sudo cp ~peter/laminar-ssh ~laminar/.ssh/id_ed25519
sudo chown -R laminar.laminar ~laminar/.ssh
sudo chmod 0700 ~laminar/.ssh
  You can test that setup by using
    this command on the CI server (there should be no password or
    pass-phrase prompt):
  sudo -u laminar ssh scotch
  — indeed, you need to do this at least once, in order
    to reassure the CI server’s SSH client that you trust the
    test-runner’s host key.
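If you prefer to avoid that interactive step, ssh-keyscan can pre-populate the known-hosts file instead, with the caveat that it simply trusts whatever is answering to that hostname at the moment you run it:

    ssh-keyscan scotch | sudo -u laminar tee -a ~laminar/.ssh/known_hosts >/dev/null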
4. Run a trivial CI job on the test-runner
Now that the laminar user on the CI server can SSH freely
  to scotch, what remains is mostly Laminar setup. This part
  is adapted very closely from Laminar’s own documentation: we
  set up a “context” for the test-runner, separate from
  the default context used by all the other jobs (because the
  test-runner can carry on when other CPU-intensive jobs are running
  on the CI server), then add a remote job that executes in that
  context.
/var/lib/laminar/cfg/contexts/test-runner-scotch.conf:

    EXECUTORS=1

/var/lib/laminar/cfg/contexts/test-runner-scotch.env:

    RUNNER=scotch
  The context is partly named after the test-runner host, but it
    also includes the name of the test-runner host as an environment
    variable. This means that the job-running scripts don’t need to
    hard-code that name.
As before, the actual
  job file in the jobs directory defers all the complexity to
  a generic do-system-tests script in the scripts
  directory:
/var/lib/laminar/cfg/jobs/cotton-system-tests-dev.run:

    #!/bin/bash -xe
    exec do-system-tests
In keeping with the Laminar philosophy of not poorly
  reinventing things that already exist, Laminar itself has no
  built-in support for running jobs remotely — because that’s
  what SSH is for. This, too, is closely-inspired by the Laminar
  documentation:
/var/lib/laminar/cfg/scripts/do-system-tests:

    #!/bin/bash -xe
    ssh laminar@$RUNNER /bin/bash -xe << "EOF"
      echo 1 > /sys/class/gpio/gpio13/value
      uname -a
      run-parts /etc/update-motd.d
      sleep 20
      echo 0 > /sys/class/gpio/gpio13/value
    EOF
  This, of course, doesn’t run any actual tests, but it
    provides validation that the remote-job mechanism is working. The
    LED attached to GPIO13 on the test-runner Raspberry Pi serves
    as the “testing in progress” indicator. (And
    the run-parts invocation is an Ubuntu-ism: it reproduces
    the “message of the day”, the welcome message
    that’s printed when you log in. Most of it is adverts these
    days.)
  Tying the job to the context is the $JOB.conf file:
/var/lib/laminar/cfg/jobs/cotton-system-tests-dev.conf:

    CONTEXTS=test-runner-*
  Due to the judicious use of a wildcard, the job can run
    on any test-runner; my setup only has the one, but if you
    found yourself in a team with heavy contention for the system-test
    hardware, this setup would allow you to build a second, identical
    Raspberry Pi with all the same hardware attached —
    called shepherds, maybe — and add it as a separate
    Laminar context. Because Laminar runs a job as soon as any
    context it’s fit for becomes available, this
    would automatically split queued system-test jobs across
    the two test-runners.
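For the record, such a second test-runner would only need its own pair of context files alongside the existing ones; sketched here rather than taken from my actual setup:

/var/lib/laminar/cfg/contexts/test-runner-shepherds.conf:

    EXECUTORS=1

/var/lib/laminar/cfg/contexts/test-runner-shepherds.env:

    RUNNER=shepherds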
 With all of this in place, it’s time to trigger the job on the
   CI server:
laminarc queue cotton-system-tests-dev
 After a few teething troubles, including the thing I mentioned
   above about making sure that the SSH client
   accepts scotch’s host key, I was pleased to see the
   “testing in progress” LED come on and the
   message-of-the-day spool out in the Laminar logs.
 5. Package up the system-tests and run a real CI job on the test-runner
 We didn’t come here just to read some Ubuntu adverts in the
   message-of-the-day. Now we need to do the real work of building
   Cotton for the STM32 target, packaging-up the results, and
   transferring them to the test-runner where they can be run on the
   target hardware. First we build:
/var/lib/laminar/cfg/jobs/cotton-embedded-dev.run:

    #!/bin/bash -xe
    PROJECT=cotton
    RUST=stable
    BRANCH=${BRANCH-main}
    SOURCE=/var/lib/laminar/run/$PROJECT/workspace
    (
        flock 200
        cd $SOURCE/$PROJECT
        git checkout $BRANCH
        cd -
        cp -al $SOURCE/$PROJECT $PROJECT
    ) 200>$SOURCE/lock
    source $HOME/.cargo/env
    rustup default $RUST
    cd $PROJECT
    cargo build -p systemtests -F arm,stm32f746-nucleo --all-targets
    cargo test --no-run -p systemtests -F arm,stm32f746-nucleo 2> $ARCHIVE/binaries.txt
    grep "Executable tests/" $ARCHIVE/binaries.txt | cut -d'(' -f 2 | cut -d')' -f 1 > binaries.2.txt
    tar cf $ARCHIVE/binaries.tar `find cross/*/target -type f -a -executable \
            | grep -v /deps/ | grep -v /build/` `cat binaries.2.txt`
    laminarc queue cotton-system-tests-dev PARENT_RUN=$RUN
    exec prune-archives cotton-embedded-dev 10
The actual build commands look much
like many
of the other Laminar jobs but with the extra Cargo features
added which
enable the cross-compiled targets; the interesting parts of this
script come once the cargo build is done and the results must
be tarred-up ready to be sent to the test-runner.
Finding all the target binaries is fairly easy using find
cross/*/target, but we also need to find the host binary from the
  systemtests package. The easiest way to do that is to parse the output
  of cargo test --no-run, which includes lines such as:
       Compiling systemtests v0.0.1 (/home/peter/src/cotton/systemtests)
        Finished test [unoptimized + debuginfo] target(s) in 2.03s
      Executable unittests src/lib.rs (target/debug/deps/systemtests-4a9b67de54149231)
      Executable tests/device/main.rs (target/debug/deps/device-cfdcb3ff3e5eaaa5)
  The line with “Executable tests” is the one
    we’re looking for. (The string of hex digits after the
    executable name changes every time the sources change.) It’s
    possible that we could “cheat” here and just pick the
    first file we find starting
    with target/debug/deps/device-, as this is CI so
    we’re always building from clean — but this is a more
    robust way of determining the most recent binary.
  (You might feel that this section is a bit of a cop-out, a bit
    white-box: knowing that there’s only one host-side binary
    does make the packaging a lot easier. If there were a lot of
    host-side binaries to package, and this technique started creaking
    at the seams, I’d look
    into cargo-nextest which
    has features
    specifically designed for packaging and unpackaging suites of
    Cargo tests.)
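For reference, the nextest workflow would look roughly like this; I haven’t adopted it here, and the exact flags are quoted from memory of its documentation, so treat them as assumptions to be checked:

    # on the build host: bundle the compiled test binaries and metadata
    cargo nextest archive --archive-file systemtests.tar.zst
    # on the test-runner: run the archived tests, remapping paths to the local checkout
    cargo nextest run --archive-file systemtests.tar.zst --workspace-remap .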
  Once everything the system-tests job will need is stored
    in $ARCHIVE/binaries.tar, we can trigger the system-tests
    job — making sure to tell it,
    in $PARENT_RUN, which build in the archive it
    should be testing. (Initially I had the system-tests job use
    “latest”, but that’s wrong: it doesn’t
    handle multiple queued jobs correctly, and has a race condition
    even without queued jobs. The “latest” archive is that
    of the most recent successfully-finished job — but
    the build job hasn’t yet finished at the time it
    triggers the test job.)
  The final prune-archives command is something I added
  after the initial Laminar writeup when some of the
  archive directories (particularly doc and coverage) started getting
  big: it just deletes all but the most recent N non-empty archives:
/var/lib/laminar/cfg/scripts/prune-archives:

    #!/bin/bash -xe
    PROJECT=$1
    KEEP=${2-2}
    cd /var/lib/laminar/archive/$PROJECT
    for d in `find * -maxdepth 0 -type d -a \! -empty | sort -n | head -n -$KEEP`; do
        rm -r /var/lib/laminar/archive/$PROJECT/$d
    done
  No-one likes deleting data, but in this case older archives
    should all be recoverable at any time, if the need arises, just by
    building the source again at that revision.
  Now the cotton-system-tests-dev.run job needs to pass the
    $PARENT_RUN variable on to the underlying script:
/var/lib/laminar/cfg/jobs/cotton-system-tests-dev.run:

    #!/bin/bash -xe
    exec do-system-tests cotton-embedded-dev $PARENT_RUN
  and the do-system-tests script can use it to recover the tarball
    and scp it off to the test-runner:
/var/lib/laminar/cfg/scripts/do-system-tests:

    #!/bin/bash -xe
    PARENT_JOB=$1
    PARENT_RUN=$2
    scp /var/lib/laminar/archive/$PARENT_JOB/$PARENT_RUN/binaries.tar laminar@$RUNNER:
    ssh laminar@$RUNNER /bin/bash -xeE << "EOF"
      echo 1 > /sys/class/gpio/gpio13/value
      cleanup_function() {
        echo 0 > /sys/class/gpio/gpio13/value
        exit 1
      }
      trap 'cleanup_function' ERR
      export PS4='+ \t '
      export PATH=/home/laminar/.cargo/bin:$PATH
      rm -rf tests
      mkdir -p tests/systemtests
      ( cd tests
        tar xf ../binaries.tar
        cd systemtests
        export CARGO_MANIFEST_DIR=`pwd`
        ../target/debug/deps/device-* --list
        ../target/debug/deps/device-* --test
      )
      echo 0 > /sys/class/gpio/gpio13/value
    EOF
  The rest of the script has also gained in features and
  complexity. It now includes
  a trap
  handler to make sure that the testing-in-progress LED is
  extinguished even if the tests fail with an error.
  (See here
    for why this requires the -E flag to /bin/bash.)
  The script goes on to add timestamps to the shell output (and
    thus to the logs)
    by adding \t
    to PS4, and add the Cargo bin directory to the path
    (because that’s where probe-rs got installed by Ansible).
  The tests themselves need to be executed as if from
    the systemtests directory of a full checkout of Cotton
    — which we don’t have here on the test-runner — so the
    directories must be created manually. With all that in place, we
    can finally run the host-side test binary, which will run all the
    device-side tests including flashing the STM32 with the binaries, found via
    relative paths
    from $CARGO_MANIFEST_DIR.
  That’s a lot, but it is all we need to successfully list
    and run all the device-side tests on our test-runner. Here’s (the
    best part of) the Laminar logs from a successful run:
    + 16:46:54 ../target/debug/deps/device-453652d9c9dda7c1 --list
    stm32f746_nucleo::arm_stm32f746_nucleo_dhcp: test
    stm32f746_nucleo::arm_stm32f746_nucleo_hello: test
    stm32f746_nucleo::arm_stm32f746_nucleo_ssdp: test
    3 tests, 0 benchmarks
    + 16:46:54 ../target/debug/deps/device-453652d9c9dda7c1 --test
    running 3 tests
    test stm32f746_nucleo::arm_stm32f746_nucleo_dhcp ... ok
    test stm32f746_nucleo::arm_stm32f746_nucleo_hello has been running for over 60 seconds
    test stm32f746_nucleo::arm_stm32f746_nucleo_ssdp has been running for over 60 seconds
    test stm32f746_nucleo::arm_stm32f746_nucleo_ssdp ... ok
    test stm32f746_nucleo::arm_stm32f746_nucleo_hello ... ok
    test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 66.37s
    + 16:48:00 echo 0
  And now that everything’s working, we can add it to the chain of
    events that’s triggered whenever a branch is pushed to the CI server:
/var/lib/laminar/cfg/jobs/cotton-dev.run:

    #!/bin/bash -xe
    BRANCH=${BRANCH-main}
    do-checkout cotton $BRANCH
    export LAMINAR_REASON="built $BRANCH at `cat git-revision`"
    laminarc queue \
        cotton-embedded-dev BRANCH=$BRANCH \
        cotton-doc-dev BRANCH=$BRANCH \
        cotton-grcov-dev BRANCH=$BRANCH \
        cotton-msrv-dev BRANCH=$BRANCH \
        cotton-beta-dev BRANCH=$BRANCH \
        cotton-nightly-dev BRANCH=$BRANCH \
        cotton-minver-dev BRANCH=$BRANCH
  If you’re wondering about the -dev suffix on all those
    jobs: I set up Laminar with two parallel sets of identical build
    jobs. There’s the ones with -dev which are triggered when
    pushing feature branches, and the ones without the -dev
    which are triggered when pushing to main. This is arranged by the
    Git server post-receive hook:
git/cotton.git/hooks/post-receive:

    #!/bin/bash -ex
    while read oldrev newrev ref
    do
        if [ "${ref:0:11}" == "refs/heads/" -a "$newrev" != "0000000000000000000000000000000000000000" ];
        then
            export BRANCH=${ref:11}
            export LAMINAR_REASON="git push $BRANCH"
            if [ "$BRANCH" == "main" ];
            then
               laminarc queue cotton BRANCH=$BRANCH
            else
               laminarc queue cotton-dev BRANCH=$BRANCH
            fi
        fi
    done
  In a sense there’s no engineering need for this duplication:
    exactly the same actual work is undertaken in either case. But
    there’s a social need: feature branches
    (with -dev) aren’t expected to always pass all tests
    — indeed, such branches are often pushed for the express
    purpose of determining whether or not they pass all tests. But the
    main branch is expected to pass all tests, all the time,
    and a regression is to be taken seriously. That is: if
    the cotton-dev build or one of its downstreams fails,
    that’s got a very different implication from the
    cotton build or one of its downstreams failing. The
    cotton build itself should enjoy long, uninterrupted
    streaks of regression-free passes (and indeed it does; the last
    failure was in November 2023 due
    to this
    issue causing intermittent unit-test failures).
6. Go and sit down and think about what we’ve done
  Well, what have we done? We have, over the course of three
    blog posts, taken a bit of knowledge of bash, cron, and SSH that
    we already had, then gone and learned a bit about Laminar,
    Ansible, and Cargo, and the “full stack” that we then
    engineered for ourselves using all that knowledge is this: any
    time we push a feature branch, we (eventually) get told
    automatically, as a straight yes/no answer, whether it’s OK
    for main or not. That’s an immensely powerful thing to have,
    an immensely useful property to have an oracle for.
  Having that facility available is surely expected these days in
    other areas of software engineering — where
    test-harnesses are easier to create — but I hope I’ve
    demonstrated in these blog posts that even those working on
    embedded systems can also enjoy and benefit from the reliability
    (of main in particular) that’s due to this type of workflow.
    
  (Other versions of the workflow could be constructed if
    preferred. Perhaps you don’t want every push to start
    a system-test sequence — in that case, you could either
    write a more complex Git post-receive hook, or set up an
    alternative Git remote, called perhaps “tests”, so
    that pushing to that remote kicked off the test sequence.
    Or you could tie the test sequence in to your pull-request process
    somehow.)
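The alternative-remote version is as simple as it sounds; something like this on the developer side, where the server URL is of course specific to your setup and a bare repository with a suitable post-receive hook is assumed to exist at the far end:

    # one-off setup: a second remote whose hook queues the system-test jobs
    git remote add tests ssh://git@my-git-server/srv/git/cotton-tests.git
    # then, whenever a branch deserves the full system-test treatment:
    git push tests my-feature-branch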
  To repeat a line from the previous instalment, any software
      you need to test can be tested automatically in some way that is
      superior to not testing it at all. Compared to
      Rust’s unit tests, which are always just
      a cargo test away, it took us three quite detailed blog
      posts and a small pile of physical hardware to get us to the
      stage where this embedded code could be tested automatically. If
      I had to distill the message of these three posts from their
      ~11,000 words down to just six, they’d be: this sort of
      effort is worthwhile. If your product is, or runs on, a
      specific platform or piece of hardware, it’s worth
      spending a lot of effort arranging to test automatically
      on the actual target hardware, or as near to the actual target
      as is practical. (Sometimes embedded products are locked-down,
      fused-off, potted, or otherwise rendered inaccessible; testing,
      in that case, requires access to unlocked variants.)
  
  That is to say: is the effort worth it for the cotton-ssdp crate
    — a few thousand lines of code, about 60% of which
    is already tests, and the rest of which
    has 100% test
    coverage? Arguably yes, but also arguably no, especially as
    almost all of cotton-ssdp can be tested in hosted builds. The
    cotton-ssdp crate has acted here more as a spike, a
    proof-of-concept. But the point is, the concept was proved,
    a baseline has now been set, and all the right testing
    infrastructure is in place if I want to write a power-aware RTOS,
    or implement HAL crates for some of these weird development boards
    in my desk drawer, or if I want to disrupt the way PAC crates are
    generated in order to improve the testing story of HAL crates. Now
    when I start doing those things, I can start defending the
    functionality with system-tests from the outset. If I want to do
    those more complex, more embedded-centric things — which I
    do — then all the effort expended so far will ultimately be
    very beneficial indeed. If you, too, aim to do complex or
    embedded-centric things, then similar levels of effort will
    benefit your projects.
  6.1 War stories from the front lines of not-system-testing
  I have some war stories for you. I heard tell of a company back
    in the day whose product, a hardware peripheral device, was (for
    sound commercial reasons) sold as working with numerous wonky
    proprietary Unixes. But for some of the more obscure platforms,
    there had been literally no automated testing: a release
    was declared by development, it was thrown over the wall into the
    QA department, and in due course a human tester would physically
    insert the CD-R into one of these wonky old machines and manually
    run through a test script ensuring that everything appeared to
    work. This was such a heavyweight process that it was run very
    rarely — meaning that, if an issue was found on, say, AIX,
    then the code change that caused it probably happened months ago
    and with significant newer work built on top of it. And of course
    such a discovery at “QA time” meant that the whole,
    lumbering manual release process had to be started again from
    scratch once the issue was fixed. This was exactly the pathology
    that CI was invented to fix! I mean, I’m pretty sure Laminar
    doesn’t support antediluvian AIX versions out of the box,
    but given the impact of any issues on release schedules, it
    was definitely worth their putting in quite significant
    development effort to bring the installation process under CI
    — automatically on main (at least nightly, if not on every
    push), and by request on any feature branch. (Developers need a
    way to have confidence they haven’t broken AIX before
    merging to main.) They should have asked
    themselves, “What can be done to automate testing of the
    install CD, in some way that is superior to not testing it at
    all?” — to place it under the control of a CI
    test-runner, as surely as the STM32F746-Nucleo is under the
    control of this Raspberry Pi? Well — what’s the
    simplest thing that can act as a fake CD-ROM drive well enough to
    fool an AIX box? Do those things have USB host? Can you bitbang
    SCSI-1 target on a Raspberry Pi? Would
    a BlueSCSI help? Or even if
    they were to decide that “CI-ing” the actual install
    CD image is too hard — how often are the failures
    specifically CD-related? Could they just copy the installer
    tarballs over SSH and test every other part of the process? How did this
    pathology last for more than one single release?
  I also heard tell of a different company whose product was an
    embedded system, and was under automated test, including
    before merging to main — but following a recent “urgent”
    porting exercise (again for sound commercial reasons), many of the
    tests didn’t pass. The test harness they used supported
    marking tests as expected-failure — but no-one bothered
    doing so. So every test run “failed”, and
    developers had to manually pore over the entire list of test
    results to determine whether they’d regressed anything. In a
    sense the hard part of testing was automated, but the easy part
    not! This company had put in 99% of the effort towards automated
    testing, but were reaping just a tiny fraction of the benefits,
    because the very final step — looking at the test results
    and turning them into a yes/no for whether the code is OK for main
    — was not automated. How did this pathology last
    for more than one single afternoon?
  
  6.2 People
  The rhetorical-looking questions posed above about the AIX CI and
    the expected-fail tests (“How did these obviously wrong
    situations continue?”) did in fact have an answer in the
    real world. Indeed, it was the same answer in both
    cases: people.
  In the first company, the head of QA presided over a large and
    (self-)important department — which it needed to be in order
    to have enough staff to manually perform quite so much automatable
    work. If QA was run with the attitude that human testers are
    exploratory testers, squirrelers-out of corner-cases, stern
    critics of developers’ assumptions — if the
    testers’ work-product, including during
    early-development-phase involvement, was more and better
    automated tests — then they probably wouldn’t need
    anything like so many testers, and the holder of the rôle of
    “Head of QA” would perhaps be viewed as a less big
    cheese than hitherto. Although the product quality would benefit,
    the company’s bottom-line would benefit, and even the
    remaining testers’ career-progression would benefit —
    the Head of QA’s incentives were misaligned with all of
    that, and they played the game they were given by the rules that
    they found to be in effect.
  
  The second company seems harder to diagnose, but fundamentally
    the questions are, “Who is in charge of setting the
    quality bar for merges to main?” and “Who is in
    charge of what happens when that bar is not met?”. Those
    are likely to be two different people — they require very
    different skills — but if you find that a gulf is opening up
    between your team’s best practices and your
    team’s typical practices, then both those people are needed
    in order to bring the two closer together again. (In my own career
    I’ve a few times acted as the first one, but I’ve never acted as
    the second one: as Dr. Anthony Fauci famously didn’t say,
    “I
    don’t know how to explain to you that you should care
    for ~~other people~~ software quality.”)
    
  This post, and this blog, and this blogger, cannot help you to
    deal with people. But often the talking point of the people you
    need to convince (or whose boss you need to convince to overrule
    them) is that better automated testing
    isn’t technologically feasible, isn’t worth attempting. I
    hope I’ve done a little to dispel that myth at least.