Previously on #rust:
Thanks
to
earlier parts
of this series, developers of Cotton now have the ability to run
automated system-tests of the embedded builds using their own
computer as the test host —
if they have the right
STM32F746-Nucleo development board to hand. What we need to do
now is add the ability
for
continuous
integration to run those tests automatically on request (e.g.
whenever a feature branch is pushed to the central git server).
For at least the third time on this blog, we’re going to
employ a Raspberry Pi 3; the collection of those sitting
unused in my desk drawer somehow never seems to get any smaller.
(And the supply shortages of recent
years
seem to have
abated.)
First, set up Ubuntu 22.04 with USB boot and headless full-disk
encryption in the usual way.
OK, perhaps doing so isn’t all that everyday (that remark
was slightly a dig at Larousse Gastronomique, at least one
recipe in which starts, “First, make a brioche in the usual
way”).
But this
very blog has exactly the instructions you need. This time,
instead of a 512GB Samsung USB SSD, I used a 32GB Sandisk USB
flash drive — the test-runner won’t be needing tons of
storage.
Alternatively, you could use any other reasonable computer you’ve
got lying around — an Intel NUC, say, or a cast-off laptop. Most
or all of the following steps will apply to any freshly-installed
Ubuntu box, and many of them to other Linux distributions too. But
you’ll have the easiest time of it if your CI test-runner has the
same architecture and OS as your main CI server, which is another
reason I’m pressing yet another Raspberry Pi into service
here.
However you get there, what you need to proceed with the rest of this
post (at least the way I proceeded) is:
- An existing
Laminar CI server, on your local Ethernet.
- A Raspberry Pi to become the test-runner;
- with Ubuntu 22.04 freshly installed;
- plugged into your local Ethernet, able to DHCP (I gave mine a
fixed address in OpenWRT’s DHCP server setup) — or,
at least, somewhere where the CI server can SSH to it;
- with your own user account, which you can SSH in as and which can use sudo.
- A USB-to-Ethernet adaptor (I used
the
Amazon Basics one);
- an Ethernet switch, at least 100Mbit (I used this
TP-Link LS1008);
- and an STM32F746-Nucleo development board.
In this blog post we’re going to:
- Connect everything together, making a
separate test network
- Automate the remaining setup of the
test-runner, using Ansible
- Arrange that the CI server can SSH to the
test-runner autonomously
- Run a trivial CI job on the test-runner
- Package up the system-tests and run a real
CI job on the test-runner
- Go and sit down and think about what we’ve
done
1. Connect everything together, making a
separate test network
For once, instead of being a whimsical stock photo of something
tangentially related, the image at the top of this blog post is an
actual photograph of the actual physical items being discussed
herein (click to enlarge if need be). The only
connections to the outside world are from my home network to the
Raspberry Pi’s built-in Ethernet (lower left) and power
to the Raspberry Pi and to the Ethernet switch (top left and
centre bottom). The test network is otherwise self-contained: the
STM32 development board is on a private Ethernet segment
with just the Raspberry Pi’s USB Ethernet for company.
This network has its
own RFC1918
address range, 192.168.3.x, distinct from the rest of the home
network. (The Wifi interface on the Raspberry Pi is not
currently being used.)
The breadboard is attached to the Raspberry Pi’s GPIO
connector, and at present is only used to provide a “testing
in progress” LED attached to GPIO13 (glowing white in the
photo, but therefore hard to see). GPIO usage could become more
sophisticated in the future: for instance, if I was writing a HAL
crate for a new embedded device, I could connect the
Raspberry Pi’s GPIO inputs to the GPIO outputs of the
embedded device (and vice-versa) and system-test my GPIO code.
The Raspberry Pi programs the STM32 over USB; between that,
the Ethernet adaptor, and the USB flash drive, three of the four
USB sockets are spoken for, leaving just one spare for future
enhancements. (But hubs are a thing, of course.)
2. Automate the remaining setup of the
test-runner, using Ansible
The initial
setup of a new Ubuntu installation is arguably best done
manually, as you might need to react to things that the output of
the commands is telling you. But once the basics are in place,
the complexities of setting up everything else that the test-runner
needs are best automated — so they can be repeatable,
documented, and explainable to others (in this case:
you).
Automating server setup is not new news to cloud engineers, who
often use tools such as Chef or Puppet to bring one or more new
hosts or containers up-to-speed in a single command. Electric Imp
used Chef, which I never really got on with, partly because of the
twee yet mixed metaphors (“Let’s knife this cookbook
— solo!”, which is a thing all chefs say), but mostly
because it was inherently bound up with Ruby. Yet I felt I needed
a bit more structure than just “copy a shell script over and
run it”. So for #homelab purposes, I thought I’d try
Ansible.
Ansible is configured using YAML files, which at least is one
step up from Ruby as a configuration language. The main
configuration file is called a “playbook”, which
contains “plays” (think sports, not theatre), which
are in turn made up of individual “tasks”. A task can
be as simple as executing a single shell command, but the benefit
of Ansible is that it comes
with a
huge variety of add-ons which allow tasks to be written in a
more expressive way. For instance, instead of faffing about to
automate cron, there’s a “cron” add-on which
even knows about the @reboot directive and lets you
write:
- name: init gpios on startup
  cron:
    name: init-gpios
    special_time: reboot
    job: /usr/local/bin/init-gpios
Tasks can be linked to the outcome of earlier tasks, so that, for
instance, Ansible restarts the DHCP server if, and only if, the DHCP
configuration has changed. With most types of task, the
Ansible configuration is “declarative”: it describes
what the situation
ought to be, and Ansible checks whether that’s already the
case, and changes things only where they’re not as desired.
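(That “restart only if changed” trick is spelled with notify and a handler; a minimal sketch, with task and service names that are illustrative rather than lifted from the Cotton repository, looks like this:)
- hosts: testrunners
  become: yes
  tasks:
    - name: install dhcpd config for eth1
      template:
        src: dhcpd.conf.j2
        dest: /etc/dhcp/dhcpd.conf
      notify: restart dhcpd
  handlers:
    - name: restart dhcpd
      service:
        name: isc-dhcp-server
        state: restarted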
Ansible can be used for considerably more complex setups than the
one in this blog post — many, many servers of different
types that all need different things doing to them — but
I’ve made an effort at least to split up the playbook into plays
relevant to basically any machine (hosts: all), or ones
relevant just to the Raspberry Pi (hosts:
raspberrypis), or ones specific to the rôle of being a
system-test runner (hosts: testrunners).
I ended up with 40 or so tasks in the one playbook, which between
them install all the needed system packages as root, then install
Rust, Cargo and probe-rs as
the laminar user, then set up the USB Ethernet adaptor
as eth1 and run a DHCP server on it.
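In outline (and only in outline; the package names and commands below are placeholders, not lifted from the real playbook), the shape is something like:
- hosts: all
  become: yes
  tasks:
    - name: install system packages
      apt:
        name: [git, curl, build-essential]
        state: present

- hosts: raspberrypis
  become: yes
  tasks:
    - name: init gpios on startup
      cron:
        name: init-gpios
        special_time: reboot
        job: /usr/local/bin/init-gpios

- hosts: testrunners
  become: yes
  become_user: laminar
  tasks:
    - name: install rustup (and hence Rust and Cargo) as the laminar user
      shell: curl https://sh.rustup.rs -sSf | sh -s -- -y
      args:
        creates: /home/laminar/.cargo/bin/cargo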
Declaring which actual machines make up “all”,
“testrunners”, etc., is the job of a separate
“inventory” file; the one included in the repository
matches my setup at home. The inventory file is also the place to
specify per-host information that shouldn’t be hard-coded
into the tasks: in this case, the DHCP setup tasks need to know
the MAC address of the USB Ethernet adaptor, but that’s
host-specific, so it goes in the inventory file.
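In YAML form, an inventory matching the setup described here might look like this (the hostname is real, but the MAC address and the variable name are stand-ins):
all:
  children:
    raspberrypis:
      hosts:
        scotch:
    testrunners:
      hosts:
        scotch:
          usb_ethernet_mac: "aa:bb:cc:dd:ee:ff"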
All the tasks in the
main YAML file are
commented, so I won’t re-explain them here, except to say that
the “mark wifi optional”, “rename eth1”, and
“set static IP for eth1” tasks do nothing to dispel my
suspicion that Linux networking is nowadays just a huge seven-layer
dip of xkcd-927 all the way
down, with KDE and probably Gnome only adding their own frothy
outpourings on top.
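(If you’re curious what the “rename eth1” and “set static IP for eth1” tasks amount to: on Ubuntu 22.04 my understanding is that they boil down to dropping a netplan fragment onto the box, something like the following, with an invented MAC address and assuming the Pi takes 192.168.3.1 as its end of the test network:)
# /etc/netplan/99-eth1.yaml (illustrative)
network:
  version: 2
  ethernets:
    eth1:
      match:
        macaddress: "aa:bb:cc:dd:ee:ff"
      set-name: eth1
      addresses: [192.168.3.1/24]
      dhcp4: false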
I added a simple Makefile to the systemtests/ansible
directory, just to automate running the one playbook with the one
inventory.
The name
“Ansible”
comes originally from science-fiction, where it’s used to mean a
device that can communicate across deep space without experiencing
speed-of-light latency. I only mention this because when
communicating across about a metre of my desk, it’s bewilderingly
slow — taking seconds to update each line
of sshd_config. That’s about the same as speed-of-light
latency would be if the test-runner was on the Moon.
But having said all that, there’s still value in just being able
to re-run Ansible and know that everything is set consistently and
repeatably.
I did wonder about running Ansible itself under CI — after
all, it’s software, it needs to be correct when infrequently
called upon, so it ought therefore to be automatically tested to
avoid bugs creeping in. But running Ansible needs root (or sudo)
access on the test-runner, which in turn means it needs SSH access
to the test-runner as a user which is capable of sudo — and
I don’t want to leave either of those capabilities lying around
unencrypted-at-rest in CI. So for the time being it’s down to an
unreliable human agent — that’s me — to
periodically run make in the ansible
directory.
3. Arrange that the CI server can SSH to
the test-runner autonomously
Most of the hosts in this series
of #homelab
posts are set up with, effectively, zero-trust networking: they can
only be accessed via SSH, and all SSH sessions start with me
personally logged-in somewhere and
running ssh-agent
(and using agent-forwarding). But because the CI server needs to be
able to start system-test jobs on the test-runner, it needs to be
able to login via SSH completely autonomously.
This isn’t as alarming as it sounds, as the user it logs
into (the laminar user on the test-runner) isn’t very
privileged; in particular, it’s not in the sudo group and
thus can’t use sudo at all. (The Ansible setup explicitly
grants that user permissions to the hardware it needs to
access.)
Setting this up is a bit like setting up your own SSH for the
first time. First generate a key-pair:
ssh-keygen -t ed25519
— when prompted, give the empty passphrase, and choose
“laminar-ssh” as the output file. The public key will
be written to “laminar-ssh.pub”.
The public key needs to be added
to ~laminar/.ssh/authorized_keys (American spelling!) on
the test-runner; the Ansible setup already does this for my own CI
server’s public key.
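(In Ansible terms that’s a one-task job, along these lines; the public-key filename here is invented:)
- name: authorize the CI server's key for the laminar user
  authorized_key:
    user: laminar
    state: present
    key: "{{ lookup('file', 'files/ci-server.pub') }}"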
Once authorized_keys is in place, you can test the setup using:
ssh -i laminar-ssh laminar@scotch
Once you’re happy that it works, copy the
file laminar-ssh as
~laminar/.ssh/id_ed25519 on the main CI server
(not the test-runner!):
sudo cp ~peter/laminar-ssh ~laminar/.ssh/id_ed25519
sudo chown -R laminar.laminar ~laminar/.ssh
sudo chmod 0700 ~laminar/.ssh
You can test that setup by using
this command on the CI server (there should be no password or
pass-phrase prompt):
sudo -u laminar ssh scotch
— indeed, you need to do this at least once, in order
to reassure the CI server’s SSH client that you trust the
test-runner’s host key.
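(If you’d rather avoid even that one interactive step, pre-seeding the known_hosts file should also work; it’s not what I did, but something like this ought to do it:)
ssh-keyscan scotch | sudo -u laminar tee -a ~laminar/.ssh/known_hosts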
4. Run a trivial CI job on the test-runner
Now that the laminar user on the CI server can SSH freely
to scotch, what remains is mostly Laminar setup. This part
is adapted very closely from Laminar’s own documentation: we
set up a “context” for the test-runner, separate from
the default context used by all the other jobs (because the
test-runner can carry on when other CPU-intensive jobs are running
on the CI server), then add a remote job that executes in that
context.
/var/lib/laminar/cfg/contexts/test-runner-scotch.conf |
EXECUTORS=1 |
/var/lib/laminar/cfg/contexts/test-runner-scotch.env |
RUNNER=scotch |
The context is partly named after the test-runner host, but it
also includes the name of the test-runner host as an environment
variable. This means that the job-running scripts don’t need to
hard-code that name.
As before, the actual
job file in the jobs directory defers all the complexity to
a generic do-system-tests script in the scripts
directory:
/var/lib/laminar/cfg/jobs/cotton-system-tests-dev.run |
#!/bin/bash -xe
exec do-system-tests |
In keeping with the Laminar philosophy of not poorly
reinventing things that already exist, Laminar itself has no
built-in support for running jobs remotely — because that’s
what SSH is for. This, too, is closely-inspired by the Laminar
documentation:
/var/lib/laminar/cfg/scripts/do-system-tests |
#!/bin/bash -xe
ssh laminar@$RUNNER /bin/bash -xe << "EOF"
echo 1 > /sys/class/gpio/gpio13/value
uname -a
run-parts /etc/update-motd.d
sleep 20
echo 0 > /sys/class/gpio/gpio13/value
EOF |
This, of course, doesn’t run any actual tests, but it
provides validation that the remote-job mechanism is working. The
LED attached to GPIO13 on the test-runner Raspberry Pi serves
as the “testing in progress” indicator. (And
the run-parts invocation is an Ubuntu-ism: it reproduces
the “message of the day”, the welcome message
that’s printed when you log in. Most of it is adverts these
days.)
Tying the job to the context is the $JOB.conf file:
/var/lib/laminar/cfg/jobs/cotton-system-tests-dev.conf |
CONTEXTS=test-runner-* |
Due to the judicious use of a wildcard, the job can run
on any test-runner; my setup only has the one, but if you
found yourself in a team with heavy contention for the system-test
hardware, this setup would allow you to build a second, identical
Raspberry Pi with all the same hardware attached —
called shepherds, maybe — and add it as a separate
Laminar context. Because Laminar runs a job as soon as any
context it’s fit for becomes available, this
would automatically split queued system-test jobs across
the two test-runners.
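(Concretely, the hypothetical shepherds would need nothing more than its own pair of context files, mirroring the ones above, plus the same SSH arrangements:)
/var/lib/laminar/cfg/contexts/test-runner-shepherds.conf |
EXECUTORS=1 |
/var/lib/laminar/cfg/contexts/test-runner-shepherds.env |
RUNNER=shepherds |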
With all of this in place, it’s time to trigger the job on the
CI server:
laminarc queue cotton-system-tests-dev
After a few teething troubles, including the thing I mentioned
above about making sure that the SSH client
accepts scotch’s host key, I was pleased to see the
“testing in progress” LED come on and the
message-of-the-day spool out in the Laminar logs.
5. Package up the system-tests and run a real CI job on the test-runner
We didn’t come here just to read some Ubuntu adverts in the
message-of-the-day. Now we need to do the real work of building
Cotton for the STM32 target, packaging-up the results, and
transferring them to the test-runner where they can be run on the
target hardware. First we build:
/var/lib/laminar/cfg/jobs/cotton-embedded-dev.run |
#!/bin/bash -xe
PROJECT=cotton
RUST=stable
BRANCH=${BRANCH-main}
SOURCE=/var/lib/laminar/run/$PROJECT/workspace
(
    flock 200
    cd $SOURCE/$PROJECT
    git checkout $BRANCH
    cd -
    cp -al $SOURCE/$PROJECT $PROJECT
) 200>$SOURCE/lock
source $HOME/.cargo/env
rustup default $RUST
cd $PROJECT
cargo build -p systemtests -F arm,stm32f746-nucleo --all-targets
cargo test --no-run -p systemtests -F arm,stm32f746-nucleo 2> $ARCHIVE/binaries.txt
grep "Executable tests/" $ARCHIVE/binaries.txt | cut -d'(' -f 2 | cut -d')' -f 1 > binaries.2.txt
tar cf $ARCHIVE/binaries.tar `find cross/*/target -type f -a -executable \
| grep -v /deps/ | grep -v /build/` `cat binaries.2.txt`
laminarc queue cotton-system-tests-dev PARENT_RUN=$RUN
exec prune-archives cotton-embedded-dev 10 |
The actual build commands look much
like many
of the other Laminar jobs but with the extra Cargo features
added which
enable the cross-compiled targets; the interesting parts of this
script come once the cargo build is done and the results must
be tarred-up ready to be sent to the test-runner.
Finding all the target binaries is fairly easy using find
cross/*/target, but we also need to find the host binary from the
systemtests package. The easiest way to do that is to parse the output
of cargo test --no-run, which includes lines such as:
Compiling systemtests v0.0.1 (/home/peter/src/cotton/systemtests)
Finished test [unoptimized + debuginfo] target(s) in 2.03s
Executable unittests src/lib.rs (target/debug/deps/systemtests-4a9b67de54149231)
Executable tests/device/main.rs (target/debug/deps/device-cfdcb3ff3e5eaaa5)
The line with “Executable tests” is the one
we’re looking for. (The string of hex digits after the
executable name changes every time the sources change.) It’s
possible that we could “cheat” here and just pick the
first file we find starting
with target/debug/deps/device-, as this is CI so
we’re always building from clean — but this is a more
robust way of determining the most recent binary.
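(That’s all the grep and cut pipeline in the build job above is doing; fed the example line, it extracts just the path inside the parentheses:)
echo 'Executable tests/device/main.rs (target/debug/deps/device-cfdcb3ff3e5eaaa5)' \
| grep "Executable tests/" | cut -d'(' -f 2 | cut -d')' -f 1
# prints: target/debug/deps/device-cfdcb3ff3e5eaaa5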
(You might feel that this section is a bit of a cop-out, a bit
white-box: knowing that there’s only one host-side binary
does make the packaging a lot easier. If there were a lot of
host-side binaries to package, and this technique started creaking
at the seams, I’d look
into cargo-nextest which
has features
specifically designed for packaging and unpackaging suites of
Cargo tests.)
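(For the record, and going from memory rather than from anything in the Cotton build, the nextest version would be roughly as follows; check the cargo-nextest documentation for the exact flags:)
# On the build host:
cargo nextest archive -p systemtests --archive-file systemtests.tar.zst
# On the test-runner:
cargo nextest run --archive-file systemtests.tar.zst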
Once everything the system-tests job will need is stored
in $ARCHIVE/binaries.tar, we can trigger the system-tests
job — making sure to tell it,
in $PARENT_RUN, which build in the archive it
should be testing. (Initially I had the system-tests job use
“latest”, but that’s wrong: it doesn’t
handle multiple queued jobs correctly, and has a race condition
even without queued jobs. The “latest” archive is that
of the most recent successfully-finished job — but
the build job hasn’t yet finished at the time it
triggers the test job.)
The final prune-archives command is something I added
after the initial Laminar writeup when some of the
archive directories (particularly doc and coverage) started getting
big: it just deletes all but the most recent N non-empty archives:
/var/lib/laminar/cfg/scripts/prune-archives |
#!/bin/bash -xe
PROJECT=$1
KEEP=${2-2}
cd /var/lib/laminar/archive/$PROJECT
for d in `find * -maxdepth 0 -type d -a \! -empty | sort -n | head -n -$KEEP`; do
    rm -r /var/lib/laminar/archive/$PROJECT/$d
done |
No-one likes deleting data, but in this case older archives
should all be recoverable at any time, if the need arises, just by
building the source again at that revision.
Now the cotton-system-tests-dev.run job needs to pass the
$PARENT_RUN variable on to the underlying script:
/var/lib/laminar/cfg/jobs/cotton-system-tests-dev.run |
#!/bin/bash -xe
exec do-system-tests cotton-embedded-dev $PARENT_RUN |
and the do-system-tests script can use it to recover the tarball
and scp it off to the test-runner:
/var/lib/laminar/cfg/scripts/do-system-tests |
#!/bin/bash -xe
PARENT_JOB=$1
PARENT_RUN=$2
scp /var/lib/laminar/archive/$PARENT_JOB/$PARENT_RUN/binaries.tar laminar@$RUNNER:
ssh laminar@$RUNNER /bin/bash -xeE << "EOF"
echo 1 > /sys/class/gpio/gpio13/value
cleanup_function() {
    echo 0 > /sys/class/gpio/gpio13/value
    exit 1
}
trap 'cleanup_function' ERR
export PS4='+ \t '
export PATH=/home/laminar/.cargo/bin:$PATH
rm -rf tests
mkdir -p tests/systemtests
( cd tests
  tar xf ../binaries.tar
  cd systemtests
  export CARGO_MANIFEST_DIR=`pwd`
  ../target/debug/deps/device-* --list
  ../target/debug/deps/device-* --test
)
echo 0 > /sys/class/gpio/gpio13/value
EOF |
The rest of the script has also gained in features and
complexity. It now includes
a trap
handler to make sure that the testing-in-progress LED is
extinguished even if the tests fail with an error.
(See here
for why this requires the -E flag to /bin/bash.)
The script goes on to add timestamps to the shell output (and
thus to the logs)
by adding \t
to PS4, and add the Cargo bin directory to the path
(because that’s where probe-rs got installed by Ansible).
The tests themselves need to be executed as if from
the systemtests directory of a full checkout of Cotton
— which we don’t have here on the test-runner — so the
directories must be created manually. With all that in place, we
can finally run the host-side test binary, which will run all the
device-side tests including flashing the STM32 with the binaries, found via
relative paths
from $CARGO_MANIFEST_DIR.
That’s a lot, but it is all we need to successfully list
and run all the device-side tests on our test-runner. Here’s (the
best part of) the Laminar logs from a successful run:
+ 16:46:54 ../target/debug/deps/device-453652d9c9dda7c1 --list
stm32f746_nucleo::arm_stm32f746_nucleo_dhcp: test
stm32f746_nucleo::arm_stm32f746_nucleo_hello: test
stm32f746_nucleo::arm_stm32f746_nucleo_ssdp: test
3 tests, 0 benchmarks
+ 16:46:54 ../target/debug/deps/device-453652d9c9dda7c1 --test
running 3 tests
test stm32f746_nucleo::arm_stm32f746_nucleo_dhcp ... ok
test stm32f746_nucleo::arm_stm32f746_nucleo_hello has been running for over 60 seconds
test stm32f746_nucleo::arm_stm32f746_nucleo_ssdp has been running for over 60 seconds
test stm32f746_nucleo::arm_stm32f746_nucleo_ssdp ... ok
test stm32f746_nucleo::arm_stm32f746_nucleo_hello ... ok
test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 66.37s
+ 16:48:00 echo 0
And now that everything’s working, we can add it to the chain of
events that’s triggered whenever a branch is pushed to the CI server:
/var/lib/laminar/cfg/jobs/cotton-dev.run |
#!/bin/bash -xe
BRANCH=${BRANCH-main}
do-checkout cotton $BRANCH
export LAMINAR_REASON="built $BRANCH at `cat git-revision`"
laminarc queue \
cotton-embedded-dev BRANCH=$BRANCH \
cotton-doc-dev BRANCH=$BRANCH \
cotton-grcov-dev BRANCH=$BRANCH \
cotton-msrv-dev BRANCH=$BRANCH \
cotton-beta-dev BRANCH=$BRANCH \
cotton-nightly-dev BRANCH=$BRANCH \
cotton-minver-dev BRANCH=$BRANCH
|
If you’re wondering about the -dev suffix on all those
jobs: I set up Laminar with two parallel sets of identical build
jobs. There’s the ones with -dev which are triggered when
pushing feature branches, and the ones without the -dev
which are triggered when pushing to main. This is arranged by the
Git server post-receive hook:
git/cotton.git/hooks/post-receive |
#!/bin/bash -ex
while read oldrev newrev ref
do
    if [ "${ref:0:11}" == "refs/heads/" -a "$newrev" != "0000000000000000000000000000000000000000" ];
    then
        export BRANCH=${ref:11}
        export LAMINAR_REASON="git push $BRANCH"
        if [ "$BRANCH" == "main" ];
        then
            laminarc queue cotton BRANCH=$BRANCH
        else
            laminarc queue cotton-dev BRANCH=$BRANCH
        fi
    fi
done |
In a sense there’s no engineering need for this duplication:
exactly the same actual work is undertaken in either case. But
there’s a social need: feature branches
(with -dev) aren’t expected to always pass all tests
— indeed, such branches are often pushed for the express
purpose of determining whether or not they pass all tests. But the
main branch is expected to pass all tests, all the time,
and a regression is to be taken seriously. That is: if
the cotton-dev build or one of its downstreams fails,
that’s got a very different implication from the
cotton build or one of its downstreams failing. The
cotton build itself should enjoy long, uninterrupted
streaks of regression-free passes (and indeed it does; the last
failure was in November 2023 due
to this
issue causing intermittent unit-test failures).
6. Go and sit down and think about what we’ve
done
Well, what have we done? We have, over the course of three
blog posts, taken a bit of knowledge of bash, cron, and SSH that
we already had, then gone and learned a bit about Laminar,
Ansible, and Cargo, and the “full stack” that we then
engineered for ourselves using all that knowledge is this: any
time we push a feature branch, we (eventually) get told
automatically, as a straight yes/no answer, whether it’s OK
for main or not. That’s an immensely powerful thing to have,
an immensely useful property to have an oracle for.
Having that facility available is surely expected these days in
other areas of software engineering — where
test-harnesses are easier to create — but I hope I’ve
demonstrated in these blog posts that even those working on
embedded systems can also enjoy and benefit from the reliability
(of main in particular) that’s due to this type of workflow.
(Other versions of the workflow could be constructed if
preferred. Perhaps you don’t want every push to start
a system-test sequence — in that case, you could either
write a more complex Git post-receive hook, or set up an
alternative Git remote, called perhaps “tests”, so
that pushing to that remote kicked off the test sequence.
Or you could tie the test sequence in to your pull-request process
somehow.)
To repeat a line from the previous instalment, any software
you need to test can be tested automatically in some way that is
superior to not testing it at all. Compared to
Rust’s unit tests, which are always just
a cargo test away, it took us three quite detailed blog
posts and a small pile of physical hardware to get us to the
stage where this embedded code could be tested automatically. If
I had to distill the message of these three posts from their
~11,000 words down to just six, they’d be: this sort of
effort is worthwhile. If your product is, or runs on, a
specific platform or piece of hardware, it’s worth
spending a lot of effort arranging to test automatically
on the actual target hardware, or as near to the actual target
as is practical. (Sometimes embedded products are locked-down,
fused-off, potted, or otherwise rendered inaccessible; testing,
in that case, requires access to unlocked variants.)
That is to say: is the effort worth it for the cotton-ssdp crate
— a few thousand lines of code, about 60% of which
is already tests, and the rest of which
has 100% test
coverage? Arguably yes, but also arguably no, especially as
almost all of cotton-ssdp can be tested in hosted builds. The
cotton-ssdp crate has acted here more as a spike, a
proof-of-concept. But the point is, the concept was proved,
a baseline has now been set, and all the right testing
infrastructure is in place if I want to write a power-aware RTOS,
or implement HAL crates for some of these weird development boards
in my desk drawer, or if I want to disrupt the way PAC crates are
generated in order to improve the testing story of HAL crates. Now
when I start doing those things, I can start defending the
functionality with system-tests from the outset. If I want to do
those more complex, more embedded-centric things — which I
do — then all the effort expended so far will ultimately be
very beneficial indeed. If you, too, aim to do complex or
embedded-centric things, then similar levels of effort will
benefit your projects.
6.1 War stories from the front lines of not-system-testing
I have some war stories for you. I heard tell of a company back
in the day whose product, a hardware peripheral device, was (for
sound commercial reasons) sold as working with numerous wonky
proprietary Unixes. But for some of the more obscure platforms,
there had been literally no automated testing: a release
was declared by development, it was thrown over the wall into the
QA department, and in due course a human tester would physically
insert the CD-R into one of these wonky old machines and manually
run through a test script ensuring that everything appeared to
work. This was such a heavyweight process that it was run very
rarely — meaning that, if an issue was found on, say, AIX,
then the code change that caused it probably happened months ago
and with significant newer work built on top of it. And of course
such a discovery at “QA time” meant that the whole,
lumbering manual release process had to be started again from
scratch once the issue was fixed. This was exactly the pathology
that CI was invented to fix! I mean, I’m pretty sure Laminar
doesn’t support antediluvian AIX versions out of the box,
but given the impact of any issues on release schedules, it
was definitely worth their putting in quite significant
development effort to bring the installation process under CI
— automatically on main (at least nightly, if not on every
push), and by request on any feature branch. (Developers need a
way to have confidence they haven’t broken AIX before
merging to main.) They should have asked
themselves, “What can be done to automate testing of the
install CD, in some way that is superior to not testing it at
all?” — to place it under the control of a CI
test-runner, as surely as the STM32F746-Nucleo is under the
control of this Raspberry Pi? Well — what’s the
simplest thing that can act as a fake CD-ROM drive well enough to
fool an AIX box? Do those things have USB host? Can you bitbang
SCSI-1 target on a Raspberry Pi? Would
a BlueSCSI help? Or even if
they were to decide that “CI-ing” the actual install
CD image is too hard — how often are the failures
specifically CD-related? Could they just SSH on the installer
tarballs and test every other part of the process? How did this
pathology last for more than one single release?
I also heard tell of a different company whose product was an
embedded system, and was under automated test, including
before merging to main — but following a recent “urgent”
porting exercise (again for sound commercial reasons), many of the
tests didn’t pass. The test harness they used supported
marking tests as expected-failure — but no-one bothered
doing so. So every test run “failed”, and
developers had to manually pore over the entire list of test
results to determine whether they’d regressed anything. In a
sense the hard part of testing was automated, but the easy part
not! This company had put in 99% of the effort towards automated
testing, but were reaping just a tiny fraction of the benefits,
because the very final step — looking at the test results
and turning them into a yes/no for whether the code is OK for main
— was not automated. How did this pathology last
for more than one single afternoon?
6.2 People
The rhetorical-looking questions posed above about the AIX CI and
the expected-fail tests (“How did these obviously wrong
situations continue?”) did in fact have an answer in the
real world. Indeed, it was the same answer in both
cases: people.
In the first company, the head of QA presided over a large and
(self-)important department — which it needed to be in order
to have enough staff to manually perform quite so much automatable
work. If QA was run with the attitude that human testers are
exploratory testers, squirrelers-out of corner-cases, stern
critics of developers’ assumptions — if the
testers’ work-product, including during
early-development-phase involvement, was more and better
automated tests — then they probably wouldn’t need
anything like so many testers, and the rôle of
“Head of QA” would perhaps be viewed as a less big
cheese than hitherto. Although the product quality would benefit,
the company’s bottom-line would benefit, and even the
remaining testers’ career-progression would benefit —
the Head of QA’s incentives were misaligned with all of
that, and they played the game they were given by the rules that
they found to be in effect.
The second company seems harder to diagnose, but fundamentally
the questions are, “Who is in charge of setting the
quality bar for merges to main?” and “Who is in
charge of what happens when that bar is not met?”. Those
are likely to be two different people — they require very
different skills — but if you find that a gulf is opening up
between your team’s best practices and your
team’s typical practices, then both those people are needed
in order to bring the two closer together again. (In my own career
I’ve a few times acted as the first one, but I’ve never acted as
the second one: as Dr. Anthony Fauci famously didn’t say,
“I
don’t know how to explain to you that you should care
for other people’s software quality.”)
This post, and this blog, and this blogger, cannot help you to
deal with people. But often the talking point of the people you
need to convince (or whose boss you need to convince to overrule
them) is that better automated testing
isn’t technologically feasible, isn’t worth attempting. I
hope I’ve done a little to dispel that myth at least.