
Tuesday 2 April 2024

System-testing embedded code in Rust, part three: A CI test-runner

Previously on #rust:

     
Thanks to earlier parts of this series, developers of Cotton now have the ability to run automated system-tests of the embedded builds using their own computer as the test host — if they have the right STM32F746-Nucleo development board to hand. What we need to do now is add the ability for continuous integration to run those tests automatically on request (e.g. whenever a feature branch is pushed to the central git server). For at least the third time on this blog, we’re going to employ a Raspberry Pi 3; the collection of those sitting unused in my desk drawer somehow never seems to get any smaller. (And the supply shortages of recent years seem to have abated.)

First, set up Ubuntu 22.04 with USB boot and headless full-disk encryption in the usual way.

OK, perhaps doing so isn’t all that everyday (that remark was a bit of a dig at Larousse Gastronomique, at least one recipe in which starts, “First, make a brioche in the usual way”). But this very blog has exactly the instructions you need. This time, instead of a 512GB Samsung USB SSD, I used a 32GB Sandisk USB flash drive — the test-runner won’t be needing tons of storage.

Alternatively, you could use any other reasonable computer you’ve got lying around — an Intel NUC, say, or a cast-off laptop. Most or all of the following steps will apply to any freshly-installed Ubuntu box, and many of them to other Linux distributions too. But you’ll have the easiest time of it if your CI test-runner has the same architecture and OS as your main CI server, which is another reason I’m pressing yet another Raspberry Pi into service here.

However you get there, what you need to proceed with the rest of this post (at least the way I proceeded) is:

  • An existing Laminar CI server, on your local Ethernet;
  • A Raspberry Pi to become the test-runner;
  • with Ubuntu 22.04 freshly installed;
  • plugged into your local Ethernet, able to DHCP (I gave mine a fixed address in OpenWRT’s DHCP server setup) — or, at least, somewhere the CI server can SSH to it;
  • with your own user account, which you can SSH in as and use sudo from;
  • A USB-to-Ethernet adaptor (I used the Amazon Basics one);
  • an Ethernet switch, at least 100Mbit (I used this TP-Link LS1008);
  • and an STM32F746-Nucleo development board.

In this blog post we’re going to:

  1. Connect everything together, making a separate test network
  2. Automate the remaining setup of the test-runner, using Ansible
  3. Arrange that the CI server can SSH to the test-runner autonomously
  4. Run a trivial CI job on the test-runner
  5. Package up the system-tests and run a real CI job on the test-runner
  6. Go and sit down and think about what we’ve done

1. Connect everything together, making a separate test network

For once, instead of being a whimsical stock photo of something tangentially related, the image at the top of this blog post is an actual photograph of the actual physical items being discussed herein (click to enlarge if need be). The only connections to the outside world are from my home network to the Raspberry Pi’s built-in Ethernet (lower left) and power to the Raspberry Pi and to the Ethernet switch (top left and centre bottom). The test network is otherwise self-contained: the STM32 development board is on a private Ethernet segment with just the Raspberry Pi’s USB Ethernet for company. This network has its own RFC1918 address range, 192.168.3.x, distinct from the rest of the home network. (The Wifi interface on the Raspberry Pi is not currently being used.)

The breadboard is attached to the Raspberry Pi’s GPIO connector, and at present is only used to provide a “testing in progress” LED attached to GPIO13 (glowing white in the photo, but therefore hard to see). GPIO usage could become more sophisticated in the future: for instance, if I was writing a HAL crate for a new embedded device, I could connect the Raspberry Pi’s GPIO inputs to the GPIO outputs of the embedded device (and vice-versa) and system-test my GPIO code.

The Raspberry Pi programs the STM32 over USB; that cable, the Ethernet adaptor, and the USB flash drive between them take up three of the four USB sockets, leaving just one spare for future enhancements. (But hubs are a thing, of course.)

2. Automate the remaining setup of the test-runner, using Ansible

The initial setup of a new Ubuntu installation is arguably best done manually, as you might need to react to things that the output of the commands is telling you. But once the basics are in place, the complexities of setting up everything else that the test-runner needs are best automated — so that the setup is repeatable, documented, and explainable to others (in this case: you).

Automating server setup is nothing new to cloud engineers, who often use tools such as Chef or Puppet to bring one or more new hosts or containers up-to-speed in a single command. Electric Imp used Chef, which I never really got on with, partly because of the twee yet mixed metaphors (“Let’s knife this cookbook — solo!”, which is a thing all chefs say), but mostly because it was inherently bound up with Ruby. Yet I felt I needed a bit more structure than just “copy a shell script on and run it”. So for #homelab purposes, I thought I’d try Ansible.

Ansible is configured using YAML files, which at least is one step up from Ruby as a configuration language. The main configuration file is called a “playbook”, which contains “plays” (think sports, not theatre), which are in turn made up of individual “tasks”. A task can be as simple as executing a single shell command, but the benefit of Ansible is that it comes with a huge variety of add-ons which allow tasks to be written in a more expressive way. For instance, instead of faffing about to automate cron, there’s a “cron” add-on which even knows about the @reboot directive and lets you write:

    - name: init gpios on startup
      cron:
        name: init-gpios
        special_time: reboot
        job: /usr/local/bin/init-gpios
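
The init-gpios script itself isn’t shown in this post; a minimal sketch of what it might contain, assuming the sysfs GPIO interface that the test scripts later in this post rely on, would be something like this:

/usr/local/bin/init-gpios (hypothetical sketch, not the real script)
#!/bin/bash -e
# export GPIO13 so that the testing-in-progress LED can be driven from userspace
if [ ! -d /sys/class/gpio/gpio13 ]; then
    echo 13 > /sys/class/gpio/export
fi
echo out > /sys/class/gpio/gpio13/direction
# let the unprivileged laminar user drive the LED; one way of granting the
# hardware permissions mentioned below, though the real playbook may differ
chown laminar:laminar /sys/class/gpio/gpio13/value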

Tasks can be linked to the outcome of earlier tasks, so that, for instance, a play can restart the DHCP server if, and only if, the DHCP configuration has been changed. With most types of task, the Ansible configuration is “declarative”: it describes what the situation ought to be, and Ansible checks whether that’s already the case, and changes things only where they’re not as desired.

Ansible can be used for considerably more complex setups than the one in this blog post — many, many servers of different types that all need different things doing to them — but I’ve made an effort at least to split up the playbook into plays relevant to basically any machine (hosts: all), or ones relevant just to the Raspberry Pi (hosts: raspberrypis), or ones specific to the rôle of being a system-test runner (hosts: testrunners).

I ended up with 40 or so tasks in the one playbook, which between them install all the needed system packages as root, then install Rust, Cargo and probe-rs as the laminar user, then set up the USB Ethernet adaptor as eth1 and run a DHCP server on it.
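
The Rust-related tasks boil down to roughly the following commands, run as the laminar user (a sketch only: the playbook expresses them as idempotent Ansible tasks, and the exact probe-rs installation step is an assumption that depends on which probe-rs release you want):

# install rustup, and with it Cargo, non-interactively
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
# add the cross-compilation target for the STM32F746's Cortex-M7
~/.cargo/bin/rustup target add thumbv7em-none-eabihf
# install probe-rs's command-line tools for flashing and running code on the target
~/.cargo/bin/cargo install probe-rs-tools --locked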

Declaring which actual machines make up “all”, “testrunners”, etc., is the job of a separate “inventory” file; the one included in the repository matches my setup at home. The inventory file is also the place to specify per-host information that shouldn’t be hard-coded into the tasks: in this case, the DHCP setup tasks need to know the MAC address of the USB Ethernet adaptor, but that’s host-specific, so it goes in the inventory file.

All the tasks in the main YAML file are commented, so I won’t re-explain them here, except to say that the “mark wifi optional”, “rename eth1”, and “set static IP for eth1” tasks do nothing to dispel my suspicion that Linux networking is nowadays just a huge seven-layer dip of xkcd-927 all the way down, with KDE and probably Gnome only adding their own frothy outpourings on top.
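
For the curious, the effect those networking tasks are aiming for, expressed as one-off commands rather than as the configuration files the playbook actually writes, looks something like the following sketch (the host addresses are made up within the 192.168.3.x range mentioned above, and dnsmasq-as-DHCP-server is an assumption):

# give eth1 (the USB Ethernet adaptor) a static address on the test network
ip addr add 192.168.3.1/24 dev eth1
ip link set eth1 up
# hand out DHCP leases to the STM32 board on that interface only
dnsmasq --interface=eth1 --bind-interfaces \
        --dhcp-range=192.168.3.100,192.168.3.200,12h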

I added a simple Makefile to the systemtests/ansible directory, just to automate running the one playbook with the one inventory.
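
The Makefile is essentially a one-liner; the command it wraps looks something like this (the file names here are guesses of mine, not anything Ansible mandates):

# -i names the inventory; -K prompts for the sudo password on the targets
ansible-playbook -i inventory.yaml -K testrunner.yaml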

The name “Ansible” comes originally from science-fiction, where it’s used to mean a device that can communicate across deep space without experiencing speed-of-light latency. I only mention this because when communicating across about a metre of my desk, it’s bewilderingly slow — taking seconds to update each line of sshd_config. That’s about the same as speed-of-light latency would be if the test-runner was on the Moon.

But having said all that, there’s still value in just being able to re-run Ansible and know that everything is set consistently and repeatably.

I did wonder about running Ansible itself under CI — after all, it’s software, it needs to be correct when infrequently called upon, so it ought therefore to be automatically tested to avoid bugs creeping in. But running Ansible needs root (or sudo) access on the test-runner, which in turn means it needs SSH access to the test-runner as a user which is capable of sudo — and I don’t want to leave either of those capabilities lying around unencrypted-at-rest in CI. So for the time being it’s down to an unreliable human agent — that’s me — to periodically run make in the ansible directory.

3. Arrange that the CI server can SSH to the test-runner autonomously

Most of the hosts in this series of #homelab posts are set up with, effectively, zero-trust networking: they can only be accessed via SSH, and all SSH sessions start with me personally logged-in somewhere and running ssh-agent (and using agent-forwarding). But because the CI server needs to be able to start system-test jobs on the test-runner, it needs to be able to login via SSH completely autonomously.

This isn’t as alarming as it sounds, as the user it logs into (the laminar user on the test-runner) isn’t very privileged; in particular, it’s not in the sudo group and thus can’t use sudo at all. (The Ansible setup explicitly grants that user permissions to the hardware it needs to access.)

Setting this up is a bit like setting up your own SSH for the first time. First generate a key-pair:

ssh-keygen -t ed25519

— when prompted, give the empty passphrase, and choose “laminar-ssh” as the output file. The public key will be written to “laminar-ssh.pub”.

The public key needs to be added to ~laminar/.ssh/authorized_keys (American spelling!) on the test-runner; the Ansible setup already does this for my own CI server’s public key.
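
If you ever need to do that step by hand, on some machine the playbook doesn’t cover, something like the following would do it, run from your own sudo-capable account on the test-runner with laminar-ssh.pub copied alongside (a sketch, not what Ansible actually executes):

sudo install -d -m 0700 -o laminar -g laminar ~laminar/.ssh
sudo tee -a ~laminar/.ssh/authorized_keys < laminar-ssh.pub > /dev/null
sudo chown laminar:laminar ~laminar/.ssh/authorized_keys
sudo chmod 0600 ~laminar/.ssh/authorized_keys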

Once authorized_keys is in place, you can test the setup using:

ssh -i laminar-ssh laminar@scotch

Once you’re happy that it works, copy the file laminar-ssh as ~laminar/.ssh/id_ed25519 on the main CI server (not the test-runner!):

sudo cp ~peter/laminar-ssh ~laminar/.ssh/id_ed25519
sudo chown -R laminar.laminar ~laminar/.ssh
sudo chmod 0700 ~laminar/.ssh

You can test that setup by using this command on the CI server (there should be no password or pass-phrase prompt):

sudo -u laminar ssh scotch

— indeed, you need to do this at least once, in order to reassure the CI server’s SSH client that you trust the test-runner’s host key.
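
(If you’d rather script that one-off host-key acceptance than answer the prompt by hand, OpenSSH 7.6 and later can be told to accept a previously-unseen host key automatically, just for a single invocation, along these lines:)

sudo -u laminar ssh -o StrictHostKeyChecking=accept-new scotch true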

4. Run a trivial CI job on the test-runner

Now that the laminar user on the CI server can SSH freely to scotch, what remains is mostly Laminar setup. This part is adapted very closely from Laminar’s own documentation: we set up a “context” for the test-runner, separate from the default context used by all the other jobs (because the test-runner can carry on when other CPU-intensive jobs are running on the CI server), then add a remote job that executes in that context.

/var/lib/laminar/cfg/contexts/test-runner-scotch.conf
EXECUTORS=1
/var/lib/laminar/cfg/contexts/test-runner-scotch.env
RUNNER=scotch

The context is partly named after the test-runner host, but it also includes the name of the test-runner host as an environment variable. This means that the job-running scripts don’t need to hard-code that name.

As before, the actual job file in the jobs directory defers all the complexity to a generic do-system-tests script in the scripts directory:

/var/lib/laminar/cfg/jobs/cotton-system-tests-dev.run
#!/bin/bash -xe
exec do-system-tests

In keeping with the Laminar philosophy of not poorly reinventing things that already exist, Laminar itself has no built-in support for running jobs remotely — because that’s what SSH is for. This, too, is closely-inspired by the Laminar documentation:

/var/lib/laminar/cfg/scripts/do-system-tests
#!/bin/bash -xe

ssh laminar@$RUNNER /bin/bash -xe << "EOF"
  echo 1 > /sys/class/gpio/gpio13/value
  uname -a
  run-parts /etc/update-motd.d
  sleep 20
  echo 0 > /sys/class/gpio/gpio13/value
EOF

This, of course, doesn’t run any actual tests, but it provides validation that the remote-job mechanism is working. The LED attached to GPIO13 on the test-runner Raspberry Pi serves as the “testing in progress” indicator. (And the run-parts invocation is an Ubuntu-ism: it reproduces the “message of the day”, the welcome message that’s printed when you log in. Most of it is adverts these days.)

Tying the job to the context is the $JOB.conf file:

/var/lib/laminar/cfg/jobs/cotton-system-tests-dev.conf
CONTEXTS=test-runner-*

Due to the judicious use of a wildcard, the job can run on any test-runner; my setup only has the one, but if you found yourself in a team with heavy contention for the system-test hardware, this setup would allow you to build a second, identical Raspberry Pi with all the same hardware attached — called shepherds, maybe — and add it as a separate Laminar context. Because Laminar runs a job as soon as any context it’s fit for becomes available, this would automatically split queued system-test jobs across the two test-runners.
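
Laminar-side, adding such a second test-runner would need nothing more than another pair of context files mirroring the ones above (a sketch, assuming the new host had also been set up by the Ansible playbook):

/var/lib/laminar/cfg/contexts/test-runner-shepherds.conf
EXECUTORS=1
/var/lib/laminar/cfg/contexts/test-runner-shepherds.env
RUNNER=shepherds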

With all of this in place, it’s time to trigger the job on the CI server:

laminarc queue cotton-system-tests-dev

After a few teething troubles, including the thing I mentioned above about making sure that the SSH client accepts scotch’s host key, I was pleased to see the “testing in progress” LED come on and the message-of-the-day spool out in the Laminar logs.

5. Package up the system-tests and run a real CI job on the test-runner

We didn’t come here just to read some Ubuntu adverts in the message-of-the-day. Now we need to do the real work of building Cotton for the STM32 target, packaging-up the results, and transferring them to the test-runner where they can be run on the target hardware. First we build:

/var/lib/laminar/cfg/jobs/cotton-embedded-dev.run
#!/bin/bash -xe

PROJECT=cotton
RUST=stable
BRANCH=${BRANCH-main}
SOURCE=/var/lib/laminar/run/$PROJECT/workspace

(
    flock 200
    cd $SOURCE/$PROJECT
    git checkout $BRANCH
    cd -
    cp -al $SOURCE/$PROJECT $PROJECT
) 200>$SOURCE/lock

source $HOME/.cargo/env
rustup default $RUST

cd $PROJECT
cargo build -p systemtests -F arm,stm32f746-nucleo --all-targets
cargo test --no-run -p systemtests -F arm,stm32f746-nucleo 2> $ARCHIVE/binaries.txt
grep "Executable tests/" $ARCHIVE/binaries.txt  | cut -d'(' -f 2 | cut -d')' -f 1 > binaries.2.txt

tar cf $ARCHIVE/binaries.tar `find cross/*/target -type f -a -executable \
        | grep -v /deps/ | grep -v /build/` `cat binaries.2.txt`
laminarc queue cotton-system-tests-dev PARENT_RUN=$RUN
exec prune-archives cotton-embedded-dev 10

The actual build commands look much like many of the other Laminar jobs but with the extra Cargo features added which enable the cross-compiled targets; the interesting parts of this script come once the cargo build is done and the results must be tarred-up ready to be sent to the test-runner.

Finding all the target binaries is fairly easy using find cross/*/target, but we also need to find the host binary from the systemtests package. The easiest way to do that is to parse the output of cargo test --no-run, which includes lines such as:

   Compiling systemtests v0.0.1 (/home/peter/src/cotton/systemtests)
    Finished test [unoptimized + debuginfo] target(s) in 2.03s
  Executable unittests src/lib.rs (target/debug/deps/systemtests-4a9b67de54149231)
  Executable tests/device/main.rs (target/debug/deps/device-cfdcb3ff3e5eaaa5)

The line with “Executable tests” is the one we’re looking for. (The string of hex digits after the executable name changes every time the sources change.) It’s possible that we could “cheat” here and just pick the first file we find starting with target/debug/deps/device-, as this is CI so we’re always building from clean — but this is a more robust way of determining the most recent binary.

(You might feel that this section is a bit of a cop-out, a bit white-box: knowing that there’s only one host-side binary does make the packaging a lot easier. If there were a lot of host-side binaries to package, and this technique started creaking at the seams, I’d look into cargo-nextest which has features specifically designed for packaging and unpackaging suites of Cargo tests.)
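
For reference, nextest’s archive feature looks roughly like this (a sketch based on its documentation, not something this project currently uses):

# on the build machine: compile the tests and bundle the binaries
cargo nextest archive -p systemtests --features arm,stm32f746-nucleo \
    --archive-file systemtests.tar.zst
# on the test-runner: run the bundled tests without needing a full checkout
cargo nextest run --archive-file systemtests.tar.zst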

Once everything the system-tests job will need is stored in $ARCHIVE/binaries.tar, we can trigger the system-tests job — making sure to tell it, in $PARENT_RUN, which build in the archive it should be testing. (Initially I had the system-tests job use “latest”, but that’s wrong: it doesn’t handle multiple queued jobs correctly, and has a race condition even without queued jobs. The “latest” archive is that of the most recent successfully-finished job — but the build job hasn’t yet finished at the time it triggers the test job.)

The final prune-archives command is something I added after the initial Laminar writeup when some of the archive directories (particularly doc and coverage) started getting big: it just deletes all but the most recent N non-empty archives:

/var/lib/laminar/cfg/scripts/prune-archives
#!/bin/bash -xe

PROJECT=$1
KEEP=${2-2}

cd /var/lib/laminar/archive/$PROJECT
for d in `find * -maxdepth 0 -type d -a \! -empty | sort -n | head -n -$KEEP`; do
    rm -r /var/lib/laminar/archive/$PROJECT/$d
done

No-one likes deleting data, but in this case older archives should all be recoverable at any time, if the need arises, just by building the source again at that revision.

Now the cotton-system-tests-dev.run job needs to pass the $PARENT_RUN variable on to the underlying script:

/var/lib/laminar/cfg/jobs/cotton-system-tests-dev.run
#!/bin/bash -xe

exec do-system-tests cotton-embedded-dev $PARENT_RUN

and the do-system-tests script can use it to recover the tarball and scp it off to the test-runner:

/var/lib/laminar/cfg/scripts/do-system-tests
#!/bin/bash -xe

PARENT_JOB=$1
PARENT_RUN=$2

scp /var/lib/laminar/archive/$PARENT_JOB/$PARENT_RUN/binaries.tar laminar@$RUNNER:

ssh laminar@$RUNNER /bin/bash -xeE << "EOF"
  echo 1 > /sys/class/gpio/gpio13/value
  cleanup_function() {
    echo 0 > /sys/class/gpio/gpio13/value
    exit 1
  }
  trap 'cleanup_function' ERR
  export PS4='+ \t '
  export PATH=/home/laminar/.cargo/bin:$PATH
  rm -rf tests
  mkdir -p tests/systemtests
  ( cd tests
    tar xf ../binaries.tar
    cd systemtests
    export CARGO_MANIFEST_DIR=`pwd`
    ../target/debug/deps/device-* --list
    ../target/debug/deps/device-* --test
  )
  echo 0 > /sys/class/gpio/gpio13/value
EOF

The rest of the script has also gained in features and complexity. It now includes a trap handler to make sure that the testing-in-progress LED is extinguished even if the tests fail with an error. (See here for why this requires the -E flag to /bin/bash.)

The script goes on to add timestamps to the shell output (and thus to the logs) by adding \t to PS4, and add the Cargo bin directory to the path (because that’s where probe-rs got installed by Ansible).

The tests themselves need to be executed as if from the systemtests directory of a full checkout of Cotton — which we don’t have here on the test-runner — so the directories must be created manually. With all that in place, we can finally run the host-side test binary, which will run all the device-side tests including flashing the STM32 with the binaries, found via relative paths from $CARGO_MANIFEST_DIR.

That’s a lot, but it is all we need to successfully list and run all the device-side tests on our test-runner. Here’s (the best part of) the Laminar logs from a successful run:

+ 16:46:54 ../target/debug/deps/device-453652d9c9dda7c1 --list
stm32f746_nucleo::arm_stm32f746_nucleo_dhcp: test
stm32f746_nucleo::arm_stm32f746_nucleo_hello: test
stm32f746_nucleo::arm_stm32f746_nucleo_ssdp: test

3 tests, 0 benchmarks
+ 16:46:54 ../target/debug/deps/device-453652d9c9dda7c1 --test

running 3 tests
test stm32f746_nucleo::arm_stm32f746_nucleo_dhcp ... ok
test stm32f746_nucleo::arm_stm32f746_nucleo_hello has been running for over 60 seconds
test stm32f746_nucleo::arm_stm32f746_nucleo_ssdp has been running for over 60 seconds
test stm32f746_nucleo::arm_stm32f746_nucleo_ssdp ... ok
test stm32f746_nucleo::arm_stm32f746_nucleo_hello ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 66.37s

+ 16:48:00 echo 0

And now that everything’s working, we can add it to the chain of events that’s triggered whenever a branch is pushed to the CI server:

/var/lib/laminar/cfg/jobs/cotton-dev.run
#!/bin/bash -xe

BRANCH=${BRANCH-main}
do-checkout cotton $BRANCH
export LAMINAR_REASON="built $BRANCH at `cat git-revision`"
laminarc queue \
         cotton-embedded-dev BRANCH=$BRANCH \
         cotton-doc-dev BRANCH=$BRANCH \
         cotton-grcov-dev BRANCH=$BRANCH \
         cotton-msrv-dev BRANCH=$BRANCH \
         cotton-beta-dev BRANCH=$BRANCH \
         cotton-nightly-dev BRANCH=$BRANCH \
         cotton-minver-dev BRANCH=$BRANCH

If you’re wondering about the -dev suffix on all those jobs: I set up Laminar with two parallel sets of identical build jobs. There are the ones with -dev, which are triggered when pushing feature branches, and the ones without -dev, which are triggered when pushing to main. This is arranged by the Git server post-receive hook:

git/cotton.git/hooks/post-receive
#!/bin/bash -ex

while read oldrev newrev ref
do
    if [ "${ref:0:11}" == "refs/heads/" -a "$newrev" != "0000000000000000000000000000000000000000" ];
    then
        export BRANCH=${ref:11}
        export LAMINAR_REASON="git push $BRANCH"
        if [ "$BRANCH" == "main" ];
        then
           laminarc queue cotton BRANCH=$BRANCH
        else
           laminarc queue cotton-dev BRANCH=$BRANCH
        fi
    fi
done

In a sense there’s no engineering need for this duplication: exactly the same actual work is undertaken in either case. But there’s a social need: feature branches (with -dev) aren’t expected to always pass all tests — indeed, such branches are often pushed for the express purpose of determining whether or not they pass all tests. But the main branch is expected to pass all tests, all the time, and a regression is to be taken seriously. That is: if the cotton-dev build or one of its downstreams fails, that’s got a very different implication from the cotton build or one of its downstreams failing. The cotton build itself should enjoy long, uninterrupted streaks of regression-free passes (and indeed it does; the last failure was in November 2023 due to this issue causing intermittent unit-test failures).

6. Go and sit down and think about what we’ve done

Well, what have we done? We have, over the course of three blog posts, taken a bit of knowledge of bash, cron, and SSH that we already had, then gone and learned a bit about Laminar, Ansible, and Cargo, and the “full stack” that we then engineered for ourselves using all that knowledge is this: any time we push a feature branch, we (eventually) get told automatically, as a straight yes/no answer, whether it’s OK for main or not. That’s an immensely powerful thing to have, an immensely useful property to have an oracle for.

Having that facility available is surely expected these days in other areas of software engineering — where test-harnesses are easier to create — but I hope I’ve demonstrated in these blog posts that those working on embedded systems can also enjoy and benefit from the reliability (of main in particular) that this type of workflow brings.

(Other versions of the workflow could be constructed if preferred. Perhaps you don’t want every push to start a system-test sequence — in that case, you could either write a more complex Git post-receive hook, or set up an alternative Git remote, called perhaps “tests”, so that pushing to that remote kicked off the test sequence. Or you could tie the test sequence in to your pull-request process somehow.)

To repeat a line from the previous instalment, any software you need to test can be tested automatically in some way that is superior to not testing it at all. Compared to Rust’s unit tests, which are always just a cargo test away, it took us three quite detailed blog posts and a small pile of physical hardware to get us to the stage where this embedded code could be tested automatically. If I had to distill the message of these three posts from their ~11,000 words down to just six, they’d be: this sort of effort is worthwhile. If your product is, or runs on, a specific platform or piece of hardware, it’s worth spending a lot of effort arranging to test automatically on the actual target hardware, or as near to the actual target as is practical. (Sometimes embedded products are locked-down, fused-off, potted, or otherwise rendered inaccessible; testing, in that case, requires access to unlocked variants.)

That is to say: is the effort worth it for the cotton-ssdp crate — a few thousand lines of code, about 60% of which is already tests, and the rest of which has 100% test coverage? Arguably yes, but also arguably no, especially as almost all of cotton-ssdp can be tested in hosted builds. The cotton-ssdp crate has acted here more as a spike, a proof-of-concept. But the point is, the concept was proved, a baseline has now been set, and all the right testing infrastructure is in place if I want to write a power-aware RTOS, or implement HAL crates for some of these weird development boards in my desk drawer, or if I want to disrupt the way PAC crates are generated in order to improve the testing story of HAL crates. Now when I start doing those things, I can start defending the functionality with system-tests from the outset. If I want to do those more complex, more embedded-centric things — which I do — then all the effort expended so far will ultimately be very beneficial indeed. If you, too, aim to do complex or embedded-centric things, then similar levels of effort will benefit your projects.

6.1 War stories from the front lines of not-system-testing

I have some war stories for you. I heard tell of a company back in the day whose product, a hardware peripheral device, was (for sound commercial reasons) sold as working with numerous wonky proprietary Unixes. But for some of the more obscure platforms, there had been literally no automated testing: a release was declared by development, it was thrown over the wall into the QA department, and in due course a human tester would physically insert the CD-R into one of these wonky old machines and manually run through a test script ensuring that everything appeared to work. This was such a heavyweight process that it was run very rarely — meaning that, if an issue was found on, say, AIX, then the code change that caused it probably happened months ago and with significant newer work built on top of it. And of course such a discovery at “QA time” meant that the whole, lumbering manual release process had to be started again from scratch once the issue was fixed.

This was exactly the pathology that CI was invented to fix! I mean, I’m pretty sure Laminar doesn’t support antediluvian AIX versions out of the box, but given the impact of any issues on release schedules, it was definitely worth their putting in quite significant development effort to bring the installation process under CI — automatically on main (at least nightly, if not on every push), and by request on any feature branch. (Developers need a way to have confidence they haven’t broken AIX before merging to main.)

They should have asked themselves, “What can be done to automate testing of the install CD, in some way that is superior to not testing it at all?” — to place it under the control of a CI test-runner, as surely as the STM32F746-Nucleo is under the control of this Raspberry Pi. Well — what’s the simplest thing that can act as a fake CD-ROM drive well enough to fool an AIX box? Do those things have USB host? Can you bitbang SCSI-1 target on a Raspberry Pi? Would a BlueSCSI help? Or even if they were to decide that “CI-ing” the actual install CD image is too hard — how often are the failures specifically CD-related? Could they just copy the installer tarballs on over SSH and test every other part of the process? How did this pathology last for more than one single release?

I also heard tell of a different company whose product was an embedded system, and was under automated test, including before merging to main — but following a recent “urgent” porting exercise (again for sound commercial reasons), many of the tests didn’t pass. The test harness they used supported marking tests as expected-failure — but no-one bothered doing so. So every test run “failed”, and developers had to manually pore over the entire list of test results to determine whether they’d regressed anything. In a sense the hard part of testing was automated, but the easy part not! This company had put in 99% of the effort towards automated testing, but were reaping just a tiny fraction of the benefits, because the very final step — looking at the test results and turning them into a yes/no for whether the code is OK for main — was not automated. How did this pathology last for more than one single afternoon?

6.2 People

The rhetorical-looking questions posed above about the AIX CI and the expected-fail tests (“How did these obviously wrong situations continue?”) did in fact have an answer in the real world. Indeed, it was the same answer in both cases: people.

In the first company, the head of QA presided over a large and (self-)important department — which it needed to be in order to have enough staff to manually perform quite so much automatable work. If QA was run with the attitude that human testers are exploratory testers, squirrelers-out of corner-cases, stern critics of developers’ assumptions — if the testers’ work-product, including during early-development-phase involvement, was more and better automated tests — then they probably wouldn’t need anything like so many testers, and the rôle of “Head of QA” would perhaps be viewed as less of a big cheese than hitherto. Although the product quality would benefit, the company’s bottom-line would benefit, and even the remaining testers’ career-progression would benefit — the Head of QA’s incentives were misaligned with all of that, and they played the game they were given by the rules that they found to be in effect.

The second company seems harder to diagnose, but fundamentally the questions are, “Who is in charge of setting the quality bar for merges to main?” and “Who is in charge of what happens when that bar is not met?”. Those are likely to be two different people — they require very different skills — but if you find that a gulf is opening up between your team’s best practices and your team’s typical practices, then both those people are needed in order to bring the two closer together again. (In my own career I’ve a few times acted as the first one, but I’ve never acted as the second one: as Dr. Anthony Fauci famously didn’t say, “I don’t know how to explain to you that you should care for other people software quality.”)

This post, and this blog, and this blogger, cannot help you to deal with people. But often the talking point of the people you need to convince (or whose boss you need to convince to overrule them) is that better automated testing isn’t technologically feasible, isn’t worth attempting. I hope I’ve done a little to dispel that myth at least.

