Not in fact any relation to the famous large Greek meal of the same name.

Saturday 12 August 2023

Rust crate release checklist

Previously on #rust:

There’s been a few public releases now of the Rust crates I’ve been working on, cotton-netif and cotton-ssdp. The SSDP one even has a merged pull request from a contributor! But because it’s often a little while between releases, I struggle to remember all the steps required. (Not so many, as Cotton is no massive failer of the Joel Test, but there are a few moving parts just because of all the fine, free open-source tools in use.) This document collects them all in one place, as much for my own benefit as anyone else’s.

  1. Check Github for third-party pull requests as I don’t want to annoy contributors by seeming to ignore their work. This check can be automated using a badge: GitHub pull requests
  2. Update Cargo.toml and CHANGELOG.md for all packages being released, like in commit 4ba3675c. If the package being released is depended on by other parts of Cotton, update their Cargo.toml dependencies too, like in commit c989d9f1.
  3. Check that everything is pushed upstream, both to self-hosted CI and to Github.
    git push origin main
    git push github main
  4. Check that both CI pipelines are passing; again, there’s a badge for the Github one: CI status

    If your CI includes statistical metrics (as opposed to pass/fail ones: coverage, for example), check that those are in acceptable ranges too.

  5. Do a dry-run publish, remembering to cd to the crate directory, not the workspace root:
    cargo publish --dry-run
    Cargo will check that the package is buildable; if any errors occur, fix them and go back to Step 3.
  6. Tag the release, using multiple tags if multiple crates are being released:
    git tag cotton-ssdp-0.0.3
  7. Push the new tag to both upstreams:
    git push origin cotton-ssdp-0.0.3
    git push github cotton-ssdp-0.0.3
  8. Actually publish the crate on crates.io:
    cargo publish
    Hopefully there won’t be any errors, given that the dry-run succeeded.
  9. Let any contributors know that their stuff is now in a release – if any pull requests have been merged, now is the time to let those contributors know that they can go back to using real upstream releases of your crate, and potentially stop maintaining their forks.
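
The git-and-cargo steps above lend themselves to a script. Here’s a hedged sketch: the crate name, version, and the remote names “origin” and “github” are assumptions, and the run wrapper only echoes each command, so you can eyeball the plan before letting it loose:

```shell
#!/bin/sh
# Sketch of checklist steps 3 and 5-8 for one crate. CRATE, VERSION and the
# remote names are assumptions; 'run' just prints each command for review.
CRATE=${1:-cotton-ssdp}
VERSION=${2:-0.0.3}
TAG="$CRATE-$VERSION"

run() { echo "+ $*"; }

run git push origin main
run git push github main
run sh -c "cd $CRATE && cargo publish --dry-run"
run git tag "$TAG"
run git push origin "$TAG"
run git push github "$TAG"
run sh -c "cd $CRATE && cargo publish"
```

Once the printed plan looks right, change run to execute its arguments instead of echoing them – and keep the dry-run before the tag, so a failed build doesn’t leave a stale tag to clean up.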

Tuesday 2 May 2023

Three SSH settings which aren’t the default, but which you probably want

Previously on #homelab:

Everybody uses OpenSSH to securely log in to remote machines. And they have done for ages. But that’s actually a problem, because OpenSSH has been around for so long that some of the security decisions made earlier in the project’s history no longer match current best practices. Here are a few things which would probably be the default if OpenSSH was starting out today, but which – for sound backward-compatibility reasons – you’ll need to arrange for yourself.

ssh-add -c

One criticism of the ssh-agent system is that when using it, you lose visibility of exactly when you’re signing for things. One way to mitigate this is to add the -c option when using ssh-add to load local private keys into the agent. This makes ssh-agent ask for confirmation (on console or in a pop-up dialog) every time a private-key operation is requested; unexpected pop-ups could be a sign that nefarious software is trying to use your key.

This also somewhat mitigates the risk, when using the agent-forwarding (ssh -A) feature, of attacks by a malicious actor who has root on the remote computer.

“ssh-add -D” on screen lock

On most reasonable systems, you have to give your local Unix password to unlock the screen once the screensaver has kicked in. But if you feel that your SSH private key is more valuable than your local password – which you probably do, otherwise you wouldn’t’ve bothered encrypting your SSH private key in the first place – then that’s effectively a privilege-escalation attack: if you’ve left an ssh-agent session running, then knowing only your local password gives an attacker login ability using your SSH private key.

The obvious way to mitigate that is to empty all saved keys from the ssh-agent session every time the screen locks. Honestly it’s a bit surprising that KDE and Gnome don’t already have this built in – but (at least in KDE) there’s a hook that lets you do just that.

In KDE “System Settings”, go to “Notifications” then next to “Applications:” click “Configure...”. Scroll down to “Screen Saver” and click “Configure Events...”. In the resulting pop-up window, choose “Screen locked” then below tick “Run command” and enter:

/usr/bin/ssh-add -D

Alternatively, manually add the following to ~/.config/ksmserver.notify:

[Event/locked]
Action=Execute
Execute=/usr/bin/ssh-add -D
Logfile=
Sound=
TTS=

Under Gnome it’s less straightforward; there’s a script available called lockheed, but as it stands it only listens for unlock events; perhaps it would be possible to modify it to listen for lock events too. (Or perhaps wiping the keys from ssh-agent only on unlock is actually okay.)
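
For the adventurous, here is a sketch of what a lock-event listener could look like. It is an assumption-laden outline rather than tested code: it assumes GNOME’s org.gnome.ScreenSaver interface emits an ActiveChanged signal with boolean true on lock, and that dbus-monitor prints that argument as the literal text “boolean true”:

```shell
#!/bin/sh
# Sketch: wipe ssh-agent keys when the GNOME screen locks. The interface,
# member, and output format below are assumptions -- verify on your desktop.
on_lock_line() {
  case "$1" in
    *"boolean true"*) echo ssh-add -D ;;  # drop 'echo' once you trust it
  esac
}

# dbus-monitor --session \
#   "type='signal',interface='org.gnome.ScreenSaver',member='ActiveChanged'" |
# while read -r line; do on_lock_line "$line"; done
```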

ssh-keygen -a 1000

Over the years, OpenSSH has gone through a few design iterations on how to encrypt SSH private keys – in other words, what the passphrase you give to unlock the key actually does. If your private key file (usually ~/.ssh/id_rsa) starts with

-----BEGIN RSA PRIVATE KEY-----
Proc-Type: 4,ENCRYPTED
...

then it is encrypted in a very obsolete way using PKCS#1; such passphrases are vulnerable to brute-forcing, should an attacker get their hands on the file. If, alternatively, it starts with:

-----BEGIN ENCRYPTED PRIVATE KEY-----
MII...

then it’s PKCS#8, which is better but still not great. What you hope to see is the OpenSSH key format, which allows specifying the number of rounds of the key-derivation function to use (i.e., how much work an attacker would have to do per guess in order to brute-force the passphrase):

-----BEGIN OPENSSH PRIVATE KEY-----

This particular recommendation slightly undermines the title of the blog post, because the OpenSSH format, which you probably want, is the default when creating new keys nowadays – but only since the OpenSSH 7.8 release of 2018-Aug-24, and OpenSSH does not upgrade from one format to another automatically. (Which is the Right Answer of course, as doing so would break compatibility with older versions if an upgrade ever had to be rolled-back.) This means that if your key was created with an older version of OpenSSH, the encryption used was likely to be, and is likely to still remain, one of the weaker forms.

Fortunately, there’s an OpenSSH facility to update just the encryption of any private key to the newest, most secure format, without altering the actual key:

ssh-keygen -p -f id_rsa -a 1000

This command upgrades the passphrase protection. It asks for the old passphrase (to decrypt the key) and then twice for the new passphrase (to re-encrypt it). If you’re confident that your old private key hasn’t leaked anywhere, you can re-use the same passphrase. (If you aren’t confident of that, you probably need to generate all-new keys anyway.) The -a 1000 sets the number of rounds to 1,000 – up from the default of 16. On this oldish Core-i7 machine, the setting of 1,000 makes checking the passphrase take about ten seconds. (Whether successful or unsuccessful!) This is a slight annoyance for you, but each time you’re waiting those ten seconds you can be thinking about how any attacker trying to brute-force your passphrase will be using up all that CPU on every single wrong guess they make.

Unfortunately, although the number of rounds is stored unencrypted in the key file, there appears to be no straightforward way of reading it out again directly. Following the directions given in a stackoverflow answer, you can use this command to get a hex dump of the base64-decoded encrypted key structure:

cat id_rsa | head -n -1 | tail -n +2 | base64 -d | hexdump -C | head

On a key I had lying around, this produced the bytes shown below. What you’re looking for is the string “bcrypt”, followed by 24 bytes you don’t care about (the KDF-descriptor length 0x0000_0018, the salt length 0x0000_0010, and the salt itself as 16 random bytes) followed by a 32-bit big-endian value which is the number of rounds:

00000000  6f 70 65 6e 73 73 68 2d  6b 65 79 2d 76 31 00 00  |openssh-key-v1..|
00000010  00 00 0a 61 65 73 32 35  36 2d 63 74 72 00 00 00  |...aes256-ctr...|
00000020  06 62 63 72 79 70 74 00  00 00 18 00 00 00 10 1c  |.bcrypt.........|
00000030  d1 ab a0 6b cd 50 a7 8e  01 8c 9a f7 98 32 a6 00  |...k.P.......2..|
00000040  00 00 10 00 00 00 01 00  00 00 33 00 00 00 0b 73  |..........3....s|

In this file it’s 16 (0x0000_0010), the default. Once I’d run the ssh-keygen command on it, it instead appears as 1,000 (0x0000_03e8):

00000000  6f 70 65 6e 73 73 68 2d  6b 65 79 2d 76 31 00 00  |openssh-key-v1..|
00000010  00 00 0a 61 65 73 32 35  36 2d 63 74 72 00 00 00  |...aes256-ctr...|
00000020  06 62 63 72 79 70 74 00  00 00 18 00 00 00 10 cb  |.bcrypt.........|
00000030  d4 4a be 47 b2 26 c9 15  d4 8d 0d d0 36 4c 62 00  |.J.G.&......6Lb.|
00000040  00 03 e8 00 00 00 01 00  00 00 33 00 00 00 0b 73  |..........3....s|
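
If you’d rather not count hexdump columns by eye, the same lookup can be scripted. This is a sketch resting on the layout described above – the rounds count is a 32-bit big-endian integer 30 bytes past the start of the string “bcrypt” (6 bytes of “bcrypt” itself plus the 24 bytes you don’t care about) – and on GNU head/tail, as in the earlier command:

```shell
#!/bin/sh
# Sketch: print the bcrypt round count from an OpenSSH-format private key.
# Relies on the layout described in the text: rounds = 32-bit big-endian
# integer 30 bytes after the first "bcrypt" string in the decoded blob.
kdf_rounds() {
  bin=$(mktemp)
  head -n -1 "$1" | tail -n +2 | base64 -d > "$bin"  # strip BEGIN/END lines
  off=$(grep -abo bcrypt "$bin" | head -n 1 | cut -d: -f1)  # offset of "bcrypt"
  dd if="$bin" bs=1 skip=$((off + 30)) count=4 2>/dev/null |
    od -An -t u1 | awk '{ print $1*16777216 + $2*65536 + $3*256 + $4 }'
  rm -f "$bin"
}

# Usage: kdf_rounds ~/.ssh/id_rsa
```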

Whenever you need to generate a new SSH key, the ssh-keygen command accepts the -a 1000 option in that case too.

Bonus fourth SSH thing while you’re here

The encfs-agent script lets you set up EncFS encrypted filesystems in such a way that you can use ssh-agent signing operations to unlock (mount) them – with no need to remember or enter a separate passphrase. Use it with ssh-add -c!

Note that the filesystem then remains mounted/decrypted until manually unmounted; ssh-agent is only consulted during the mount operation. The ssh-add -D command does not unmount the filesystem (analogously, it doesn’t close existing SSH sessions either). If you want screen-locking to umount these filesystems, consider the command:

umount -a -t fuse.encfs

That’s a slightly dangerous thing to do, though; it’s not guaranteed to work if any process is holding the filesystem open (by holding a file on it open, or having a current-directory inside it). Even if it does work, running processes might get very confused – perhaps, for instance, by writing important files into the unencrypted mount-point directory outside the EncFS, instead of inside the EncFS where you wanted them.

LATER EDIT: Extra bonus fifth SSH thing

TIL you can put

AddKeysToAgent confirm

in your .ssh/config and it’ll automatically do the equivalent of “ssh-add -c” for any key whose passphrase you supply to an ordinary ssh invocation. Semi-life-changing given the number of times I think my key is in ssh-agent, but it isn’t, and I end up having to enter the passphrase all over again...

Monday 24 April 2023

Things I learned writing an SSDP implementation in Rust

Previously on #rust:

A while ago I wrote Chorale, a collection of open-source media-library and networking software, mostly based on the UPnP (Universal Plug-’n’-Play) AV standards (but including other things as well, such as support for Rio Receiver and Empeg car-player devices, whose firmware I’d worked on at those companies, an even longer while ago).

Chorale never got much interest (or, as the tagline on this very blog puts it, “Waits for audience applause ... not a sossinge”); this is no doubt partly due to (closed) cloud streaming platforms for music hugely overtaking in popularity all dealings with (open) local media libraries, but no doubt also due to the completely zero amount of effort I actually spent promoting Chorale (or even, really, mentioning its existence).

So I’ve resolved to be a bit more talkative about open-source things I do from here on in. Inspired by a post by Simon Willison, I’m trying to adopt his mantra that anything you do, at least in the software world, isn’t finished until you’ve told people that you’ve done it – especially if part of the reason you’re doing it in the first place, is in the hope of spreading useful information and good practices.

In this post I’ll tell you what I did, when I went to implement one of the first, lowest-level parts of the UPnP protocol suite in Rust: SSDP. This is the backstory of the cotton-ssdp crate.

Background: a Simple Service Discovery Protocol

SSDP, the Simple Service Discovery Protocol, is aimed at making it easy for separately-purchased and separately-designed devices to discover each other’s resources over a local network, so that they can later interact using higher-level protocols such as UPnP itself. A resource might be a streaming-media server, or a router, or a network printer, or anything else that someone might want to search for or enumerate on a network. Like most protocols, there are two basic rôles involved: a server advertises the availability of a resource, and a client discovers available resources.

What precisely is advertised, or discovered, is, for each resource, a unique identifier for that particular resource (Unique Service Name, USN), an identifier for the type of resource (Notification Type, NT), and the location of the resource in the form of a URL.

SSDP is mainly used by UPnP systems, such as for media libraries and local streaming of music and video – it is how diskless media players such as the Roku Soundbridge discover the available music servers – but the mechanism is quite generic, and could as easily be used for any type of device or resource that must be discoverable over a network. Effectively it enables the automatic self-assembly of (locally-) distributed systems, including in ad hoc settings which don’t necessarily have expert network administrators close at hand to configure them manually.

For a learning project in Rust, SSDP is ideal as it’s about as simple as a network protocol could possibly be without being uninteresting. SSDP servers send out advertisements over UDP multicast to a “well-known” multicast group, which can be picked up passively by clients in that group; and clients can also actively send out search requests, also over UDP multicast, to which servers will respond.
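
For a flavour of the wire format: a search request is an HTTP-like message carried in a single UDP datagram. A typical M-SEARCH – the shape comes from the UPnP Device Architecture document, though the MX and ST values here are just example choices – looks like this, with each line ending in CRLF and a blank line terminating the message:

```
M-SEARCH * HTTP/1.1
HOST: 239.255.255.250:1900
MAN: "ssdp:discover"
MX: 3
ST: ssdp:all
```

Servers whose resources match the ST (search target) then unicast an HTTP-style response back to the sender, carrying the USN, ST, and LOCATION headers described above.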

There is no Internet RFC as such for SSDP – merely some expired drafts. The protocol is, instead, documented in the UPnP Device Architecture documents.

SSDP is very similar in motivation, and (it turns out) fairly similar in implementation, to the Multicast DNS (mDNS) protocol – but is not directly compatible.

Particular goals of the cotton-ssdp crate

In no specific order:

  1. Produce a useful SSDP implementation, which other Rust users who need SSDP might reasonably hear about and decide to use;
  2. Personally learn more about Rust and about the packaging, distributing, and publicising of Rust crates;
  3. Implement both client and server;
  4. Support use both in full-size/desktop/hosted environments, and in embedded/freestanding/no_std environments;
  5. Do the Right Thing on multi-homed hosts (ones with multiple active network interfaces), which were the source of a lot of complexity in Chorale and in proprietary network protocols which I’ve worked on in my day job;
  6. Follow, and indeed embody, if not evangelise, software engineering best practices: lots of documentation, lots of tests.

Things I learned about

  1. Effective techniques for getting good test coverage
  2. Automate your quality checks as far as possible
  3. Not picking a lane when it comes to third-party ecosystems
  4. Effective techniques for multi-homed hosts
  5. The journey to no_std
  6. Using the remarkable free tools available

1. Effective techniques for getting good test coverage

Coming from a C++ background, I was very impressed with the “batteries-included” nature of the Rust (Cargo) test setup. It’s as easy to start writing tests as it is to start writing the code in the first place – which is a Good Thing, as it means that every project can start out on the right side of history by including good testing from the very beginning.

But even here, there are better and worse ways to write tests, and it’s worth learning how to structure your tests in order to be most effective. As well as including the test harness itself, the Rust/Cargo system has baked into it the difference between unit tests and integration tests: outside the Rust world, this is a distinction about which philosophical debates can be had, but in Rust these two terms refer to two very specific and distinct things. When I started out in Rust, I was bundling all the tests into a mod tests in the same .rs file as the implementation: these are Rust’s unit tests. And I soon got in a pickle, because on the one hand I wanted to mock out (or dependency-inject) fake versions of some of the dependencies (specifically, system calls such as setsockopt whose failure is hard to arrange for in a test, but needs to be handled if it does happen in reality), but on the other hand I didn’t want the hassle of reimplementing the actual functionality of the real calls when testing the success case. The Rust mocking library mockall, for instance, replaces the real dependency completely when compiled in cfg(test) mode, as is done when compiling unit tests.

But Rust was way ahead of me here, and had already solved that problem. Tests for the success case – tests which can be written against the public API of the library – can be written as Rust integration tests, in the crate’s tests directory. Those are linked against the real version of the crate, compiled without cfg(test), so they get the actual, non-mock dependencies.

(As it stands, cotton-ssdp does not use mockall, but that’s because I had difficulty with “false negatives” in code-coverage, where it said not all regions were covered, but then didn’t show any actually non-covered regions when drilling-down into the source file. I suspect, but can’t prove, that this is because the coverage tool gets confused when a piece of code is monomorphised twice, once for real and once for mock, and can’t always correctly match up equivalent regions between the two.)

1.1 An example of dependency injection

Here’s the code I first wrote for creating a new UDP socket and setting the particular socket options that cotton-ssdp will need:

fn new_socket(port: u16) -> Result<mio::net::UdpSocket, std::io::Error> {
    let socket = socket2::Socket::new(
        socket2::Domain::IPV4,
        socket2::Type::DGRAM,
        None,
    )?;
    socket.set_nonblocking(true)?;
    socket.set_reuse_address(true)?;
    let addr =
        std::net::SocketAddrV4::new(std::net::Ipv4Addr::UNSPECIFIED, port);
    socket.bind(&socket2::SockAddr::from(addr))?;
    setsockopt(socket.as_raw_fd(), Ipv4PacketInfo, &true)?;
    let socket: std::net::UdpSocket = socket.into();
    Ok(mio::net::UdpSocket::from_std(socket))
}

Now how does one test the error code paths from set_nonblocking and friends – all those question-marks? Well, one way is to use dependency injection – that is, to arrange that in unit-test builds, a “fake” version of the system-call is made (one which, for testing purposes, can be arranged to fail on demand), yet in the finished crate the actual, non-artificially-failing system-call is of course used. That change looks like this:

type NewSocketFn = fn() -> std::io::Result<socket2::Socket>;
type SockoptFn = fn(&socket2::Socket, bool) -> std::io::Result<()>;
type RawSockoptFn =
    fn(&socket2::Socket, bool) -> Result<(), nix::errno::Errno>;
type BindFn =
    fn(&socket2::Socket, std::net::SocketAddrV4) -> std::io::Result<()>;
 
fn new_socket_inner(
    port: u16,
    new_socket: NewSocketFn,
    nonblocking: SockoptFn,
    reuse_address: SockoptFn,
    bind: BindFn,
    ipv4_packetinfo: RawSockoptFn,
) -> std::io::Result<mio::net::UdpSocket> {
    let socket = new_socket()?;
    nonblocking(&socket, true)?;
    reuse_address(&socket, true)?;
    bind(
        &socket,
        std::net::SocketAddrV4::new(std::net::Ipv4Addr::UNSPECIFIED, port),
    )?;
    ipv4_packetinfo(&socket, true)?;
    Ok(mio::net::UdpSocket::from_std(socket.into()))
}
 
fn new_socket(port: u16) -> Result<mio::net::UdpSocket, std::io::Error> {
    new_socket_inner(
        port,
        || {
            socket2::Socket::new(
                socket2::Domain::IPV4,
                socket2::Type::DGRAM,
                None,
            )
        },
        socket2::Socket::set_nonblocking,
        socket2::Socket::set_reuse_address,
        |s, a| s.bind(&socket2::SockAddr::from(a)),
        |s, b| setsockopt(s.as_raw_fd(), Ipv4PacketInfo, &b),
    )
}

Now all the tests can call new_socket_inner, passing versions of new_socket, nonblocking, and so on which are as fallible as can be – but users of the real library are none the wiser; indeed, the API they see has not changed. (The actual tests are visible in commit d6cf9af2; they subsequently moved around a bit as later refactorings were done.)

This is also a standard dependency-injection technique in C++, but one of the hidden wonders of Rust is that the language design allows for a compiler to make lots of optimisations that, through technicalities, might be harder to prove correct in C++ (it’s not like I’ve quizzed the designers on whether this was an explicit goal of the design – but things have worked out that way so often, that really it must have been). The standard compiler is very good at optimising away abstractions like new_socket_inner: it spots that (in non-cfg(test) builds) the only call site of new_socket_inner is in new_socket and it inlines it completely; it also inlines all the lambdas (closures) and turns them back into direct function calls. In this case I checked, and the two versions of the code shown above produce identical assembler: a literally zero-cost abstraction.

Good code coverage is not, of course, any indicator of good code or even of good testing. But poor code coverage is very much an indicator of poor testing – so, much like a compiler warning, it’s always something worth looking at just in case it indicates a real underlying problem of poor code.

2. Automate your quality checks as far as possible

Engineering is the art of the repeatable; art is the engineering of the unrepeatable. Or, put differently: if it costs you actual human effort to make sure an engineered product, such as a particular version of a piece of software, is proper and correct, then that effort itself is, in a sense, written on the wind – as it tells no-one anything about whether any other versions of that software are proper or correct. So effort is usually much better applied on automating quality-assurance work than it is on performing (rote) quality-assurance work; it’s both dehumanising and non-cost-effective to send a person to do the job of a machine. (Though of course some quality-assurance work isn’t automatable: such work is often known as exploratory testing and works best when viewed as an exercise in creatively finding corner-cases or oversights in the automated test suite, which can then be automated in their turn.)

But there’s more to quality-assurance than testing. It’s also a discipline of quality-assurance to run code through analysis tools, to pick up on any potential errors. In the C++ world it’s long been received wisdom to enable most or all available compiler warnings, across multiple alternative C++ compilers if possible: the small iatrogenic downside of avoiding false positives from compiler warnings is outweighed by the benefits accrued to the project each time a warning pinpoints an actual mistake. In Rust, compiler warnings are usually very good and actionable (and peculiar idioms for avoiding them seem rare), and Clippy lints even more so. The “cargo clippy” command could equally aptly have been called “cargo clear-up-my-misconceptions-about-how-this-language-works”, although “clippy” is definitely easier to type.

The principle of “the more automation the better” applies of course to continuous integration (automatic building and testing) too. An earlier post on this blog described how to set up a local CI system, though for packages that aren’t aimed at embedded systems, CI services are also available in the cloud (and some SaaS engineers speak as if the term “CI” implies a cloud-based service, having seen no other design). Projects hosted in public Github repositories can take advantage of Github’s CI infrastructure (and should do, even if only as a quality filter on incoming pull requests); more on that later on.

Once a crate’s functionality starts to be split up into Cargo “features”, configuring CI to test all feature combinations can be a chore. The cargo-all-features crate can help with this, by running “cargo build” (or check, or test) on all combinations of features in one command. This also helps to surface any issues where features are interdependent, but where that interdependency isn’t spelled out in Cargo.toml. (For instance, in cotton-ssdp, the “async” feature doesn’t compile without the “std” feature enabled, so Cargo.toml specifies that dependency.) Running “cargo test-all-features” takes, in theory, exponential time in the number of features – but, at least in cotton-ssdp’s case, the great bulk of the work is done just once, and the 2ⁿ−1 additional builds are all very fast.

At the top of cotton-ssdp’s lib.rs are the lines

#![warn(missing_docs)]
#![warn(rustdoc::missing_crate_level_docs)]

which automate some quality checks on the documentation. Also assisting the documentation are these lines:

#![cfg_attr(nightly, feature(doc_auto_cfg))]
#![cfg_attr(nightly, feature(doc_cfg_hide))]
#![cfg_attr(nightly, doc(cfg_hide(doc)))]
which, between them, arrange for those “only available with feature blah” labels on optional structs and methods in the docs.rs documentation – see later on for how to get docs.rs documentation builds to set the “nightly” attribute, as it isn’t built-in.
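
The usual way of getting docs.rs to set such a cfg – shown here as the common idiom, an assumption rather than a copy of cotton-ssdp’s actual Cargo.toml – is a metadata table that adds rustdoc flags on docs.rs builds:

```toml
# docs.rs builds documentation with nightly rustdoc; these flags make it
# define the "nightly" cfg and build with all features enabled.
[package.metadata.docs.rs]
all-features = true
rustdoc-args = ["--cfg", "nightly"]
```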

One last quality check, if you can call it that, is provided by Rust’s borrow-checker and ownership rules. A common refrain from Rust developers is, “Oh, I’m currently fighting the borrow-checker”. But what this reminds me of most, is an anecdote about a famous bull-fighter being asked what exercises he does to keep in shape for his toreadoring: “Exercises? There is something you don’t understand, my friend. I don’t wrestle the bull.” The anecdote appears in DeMarco and Lister’s “Peopleware” (and, slightly worryingly, apparently nowhere else on the whole of Google) – where the authors use it to suggest that if some activity feels like you’re wrestling a bull, you’re probably not going about it in the most efficient or masterful way. Hand-to-hand combat with the borrow-checker can sometimes be reframed as not heeding its suggestion that the design you’re pursuing could be done differently with more clarity around memory-safety, and thus better reliability or maintainability. The borrow-checker is like a compiler warning, but for your design instead of your implementation, and it’s actionable in the same way: it’s telling you that you might be able to barrel on regardless, but you should really stop and think about whether you’re doing things the best way.

Though admittedly, by that theory, it’s a shame that the borrow-checker only offers that suggestion at quite a late stage of the design process (viz., during actual coding); it’s a bit like being told, “You can’t lay that next brick there.” – “Why not?” – “Because the house that you’re building is going to be ugly.” It would have been better if the ugliness of the house had been spotted at the blueprint stage! There’s a gap in the market, and indeed in the discourse as a whole, for a book on Rust design patterns which would help stave off these issues at an earlier conceptual stage.

3. Not picking a lane when it comes to third-party ecosystems

There are two, not very compatible, ways of writing Rust networking code: synchronous and asynchronous. When hoping to write a library that’s useful to people (one of the stated goals), it makes most sense to me to try and be useful on both sides of this divide. After all, SSDP itself can be boiled down to, packets arrive, other packets go out in response (or in response to a timer expiring). That’s a usefully narrow-sounding interface. So what cotton-ssdp does, is hide all the actual SSDP work away in a struct called Engine, and offer two as-thin-as-possible wrappers over it: one for the synchronous (mio) case (Service) and a second for the asynchronous (tokio) case (AsyncService).

This also involved abstracting over the difference between tokio::net::UdpSocket and mio::net::UdpSocket, using traits which are implemented for both and over which the relevant methods of Engine are generic:

    /// Notify the `Engine` that data is ready on one of its sockets
    pub fn on_data<SCK: udp::TargetedSend + udp::Multicast>(
        &mut self,
        buf: &[u8],
        socket: &SCK,
        wasto: IpAddr,
        wasfrom: SocketAddr,
    ) 

Both the Engine and UDP abstractions will hopefully also turn out to be exactly what’s needed when porting SSDP to embedded platforms using smoltcp or a similar embedded networking stack. (So in fact, it’s not quite right to say that the SSDP implementation is “hidden” away in Engine – Engine is part of the public library API.)

4. Effective techniques for multi-homed hosts

Networking applications that run on multi-homed hosts – machines with more than one TCP/IP-capable network interface – often need to take particular care over which networks services are offered to: it’s commonplace for a host such as a firewall to have interfaces onto networks with different security policies. (This is so rarely dealt with well, that it’s also commonplace for such setups to resort to containerisation to ensure that certain services are only offered on certain interfaces – which does have the advantage that all such containers or services can then have their network access configured in the same way, rather than with service-specific ways.)

But supporting multi-homed hosts well in a networking server isn’t a huge challenge, so long as it’s thought-about early in the design process. The fundamental idea is always to keep track of which network interface is being used to talk to any given client; Linux, for instance, assigns long-lived “interface indexes” (smallish integers) to each available hardware interface. Knowing which interface packets arrived on (especially in the UDP case) gives the server enough information to know how to send its reply – and in the particular case of a discovery protocol such as SSDP, can inform the content of the reply.

For example, if the SSDP resource being advertised is, from the server’s point-of-view, http://127.0.0.1:3333/test, and its Ethernet IP address is 192.168.1.3, and its wifi IP address is 10.0.4.7, then anyone listening to SSDP on Ethernet should be told that the resource is at http://192.168.1.3:3333/test and anyone listening on wifi should instead be told http://10.0.4.7:3333/test.

In a UDPv4 service such as SSDP, the Linux IP_PKTINFO facility can be used to obtain, at recvmsg time, the interface index on which a packet arrived; similarly, sendmsg can be told which interface to send on. (Fortunately, unlike SO_BINDTODEVICE, this does not require CAP_NET_RAW, although that restriction on SO_BINDTODEVICE was lifted in kernel 5.7.) The SSDP server can then determine which of the host’s IP addresses that particular client should use, and rewrite the resource URL accordingly when replying.

With care, this can even be used to do the Right Thing when more than one network interface has the same IP network and netmask, just so long as they still have different IP addresses – a situation which is particularly likely to arise if RFC3927 “Zeroconf” networking is in use on multiple physical networks.

Although cotton-ssdp is currently IPv4-only, it appears that IPv6 has noticed and ameliorated the same problem: APIs that take IPv6 link-local addresses also take a “Scope ID”, which is intended to be used very similarly to this use of the interface-index.

(Every time I see “multi-homed” written down, even if I wrote it myself, I read it the first time as “multi-horned”, which is a metaphor so much more apposite that it makes me wonder whether “homed” was just a typo that stuck.)

5. The journey to no_std

Rust draws a formal distinction between (what in the C world are called) hosted and freestanding implementations: hosted implementations are servers, desktops, and mobile platforms with the usual wide range of standard library facilities, and freestanding implementations are embedded systems where even standard library facilities are not necessarily available. (In fact C draws this distinction too, but as the standard C library is itself so low-level, only the most deeply-embedded systems are freestanding by C’s definition.)

Rust target platforms indicate their freestanding nature with “-none-” as the OS part of their target triple, such as in “thumbv7em-none-eabi”; Rust crates targeting such platforms indicate that by a crate-level #![no_std] attribute. In a no_std crate, much of the standard library is still available (very core parts such as Result and Option are in a crate called core, which works even if no heap is available; facilities that require an allocator live in a second crate called alloc), but anything which requires system calls is not available.

As cotton-ssdp is aimed at both hosted and embedded users, it needs to be possible (but optional) to compile it in no_std mode. The standard idiom for this is to have a (Cargo) feature called std, which is usually enabled by default, but can be disabled to produce a no_std-compatible crate. This is taken care of in the code by adding, at the top of lib.rs, the line...

#![cfg_attr(not(feature = "std"), no_std)]

...and then in Cargo.toml, declaring std as a feature and making any non-no_std-compatible dependencies optional and dependent on std; the Cargo.toml changes are in commit 47a0290d, which also marks the examples and integration-tests as depending on std.
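In outline, the Cargo.toml side of that idiom looks something like this (the dependency name and versions here are illustrative, not the exact contents of that commit):

```toml
[features]
default = ["std"]
# Enabling "std" also enables the optional dependencies that require it.
std = ["mio"]

[dependencies]
mio = { version = "0.8", optional = true }
```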

The issues that might arise when enabling no_std mode in a crate revolve around: system types which live under different names (in core or alloc rather than std); system types which aren’t available at all; dependencies which themselves need to have their no_std build enabled; and dependencies which have no no_std mode, and must either be encapsulated away or worked around altogether. All of those issues cropped up in cotton-ssdp.

The entire journey to no_std support can be seen in the pull request, though it might look a bit odd because in fact I only created the PR retrospectively, after the branch was already merged, for the purposes of this blog post; that seemed the easiest way to get all the commits listed in one place. The eleven commits on that branch are (in chronological order):

  1. 364e7c5 Split UDP implementations into core, std, mio, tokio – once again, something seemed a bit tiresome (annotating everything in udp.rs with cfg(feature = "std")) but that tiresomeness actually represented a “compiler warning” about the design; the UDP traits belong in a separate source file to the std-using implementations, and they in turn belong in separate source files to each other. Now only the mod declarations need to have cfg annotations attached!
  2. e590562 Make message.rs no_std-compatible – some of the std facilities previously used here are available in core or alloc, but HashMap is not (because it relies on a random number generator), and nor is io::Cursor (as it uses io::Error, which is in std because it needs to be OS-specific in order to represent system-call errors). Fortunately, reimplementing Cursor from scratch is not hard.
  3. 0380edf Make UDP errors more ergonomic (and also more no_std-compatible) – rather than forcing SSDP errors into the io::Error template, use a custom error class (preparatory to cfg-ing out io::Error altogether). This actually makes the crate’s errors easier to use, by following the suggestions in a recent blog post I read about modularising errors.
  4. 47a0290 Featureise std, sync, and async – this is the above-mentioned commit which actually introduces the cargo-level features. As well as enabling no_std, making “async” a feature means that clients who need only the synchronous version do not need to pull in the sizeable tokio dependencies. This commit also adds the no_std_net crate, which usefully provides the common IP address types in no_std builds, while being a pass-through for the standard-library ones in std builds. (A future release of Rust will move those types to core, but I didn’t want to wait.)
  5. fc344f0 Update dependencies – fixes test breakage from the previous commit, as cotton-ssdp’s tests require cotton-netif’s sync feature, not just its std one. Fortunately, using cargo test-all-features made this issue show up straightaway.
  6. 0005fe0 Factor RefreshTimer out of Engine – it’s desirable for Engine to compile in no_std mode, because that’s intended to be the basis even of embedded implementations of SSDP. But Engine used std::time::Instant, which (especially in the now method) relies on system-specific clock facilities (std::time::Duration does not rely on those facilities, and is available in no_std builds as core::time::Duration). In practice, most embedded platforms support some variant of rtic_monotonic::Monotonic for clock facilities (a list of implementations of that trait for common microcontrollers can be found on crates.io), but abstracting over both that and the standard library felt like an abstraction too far, so instead this commit moves the whole refresh-timer mechanism out into a separate struct, so that an embedded implementation depending on Monotonic can be provided later and just slotted-in.
  7. 6899933 Actually set no_std for non-feature-"std" – this is where the crate is first built with no_std set, which necessitated use of the alloc crate to find String and format!, as well as cfg’ing off some uses of std that couldn’t be ported. In particular, the variant of the udp::Error enum which wraps an underlying system-call error is not, of course, no_std. The #[non_exhaustive] tag tells the Rust compiler that more variants (perhaps one encapsulating a smoltcp error) may be added in future.
  8. 5ac330e Remove "url" crate from public API – this crate was a hard dependency used within Engine (for rewriting resource URLs on a per-IP-address basis, see above), but it does not compile under no_std. Fortunately only a tiny fraction of the crate’s functionality is actually used in cotton-ssdp – functionality which can, as the least-worst option, just be reimplemented from scratch. But the first thing to do is eliminate the url::Url type from the public interface; this is a breaking change which would cause a semver major change, if it weren’t for the fact that the semver major already needed changing due to other API changes since 0.0.1.
  9. 38694d1 Remove usage of "url" crate – this is where the url.set_ip_host call is replaced with a simpler equivalent. Notably this new code has a bug, only fixed later in b92ded1, which causes it to do the Wrong Thing with any URL containing an explicit port (such as, er, the example ones discussed above where the need for the rewriting is explained).
  10. 261a6bb Replace HashMap usage with BTreeMap – as mentioned, HashMap is not available in no_std, but BTreeMap is, and has essentially the same interface.
  11. 0faacd5 Actually compile ssdp including Engine with no_std – finally cargo build --no-default-features works (including Engine and Message). Moreover, so does cargo build --no-default-features --target thumbv7em-none-eabi – that’s a freestanding target with no std support whatsoever.
Oh, plus there’s another commit that wasn’t on that branch, but should have been:
  1. b92ded1 Fix rewrite_host bogusly stripping ":port" from URLs – fixes a bug introduced when removing the “url” crate. It seems pretty likely that the only forms of URL that make any sense as SSDP locations start with http://127.0.0.1..., not even HTTPS – but the Principle of Least Surprise suggests that the code should at least not flake out if given one in a different form.

Between them, those commits hopefully tell the whole story of taking a crate initially written in hosted Rust, and making it work in both hosted and freestanding applications.

6. Using the remarkable free tools available

Most of the anecdotes in this post are hopefully applicable to both proprietary and open-source software development. But in the open-source case in particular, it’s worth looking into the freely-available online tools that support such development efforts. Even getting started in Rust development introduces everyone to crates.io and to docs.rs, but it’s also worth looking into:

Github Actions cloud CI
Continuous-integration servers run by Github, and automatically triggered on pushes to Github repositories (if your project contains YAML files for workflows, perhaps a bit like cotton’s one).
docs.rs
Documentation pages on docs.rs are built automatically for all crates published to crates.io. The build is always done with a nightly toolchain, so you can use nightly-only features such as doc_auto_cfg – just make sure you leave your crate buildable by stable and other non-nightly toolchains, perhaps by guarding the doc_auto_cfg attribute like this:
#![cfg_attr(nightly, feature(doc_auto_cfg))]
and then adding this to your Cargo.toml such that docs.rs builds always enable the “nightly” cfg:
[package.metadata.docs.rs]
all-features = true
rustdoc-args = ["--cfg", "nightly"]
deps.rs
Checks the dependencies in your Cargo.toml, and warns you about out-of-date versions (and warns you strongly about ones with known security problems). It’s wired up to Github, so can be used on any project published there. (For Cotton, that means there’s one page for the whole monorepo, not one page per crate, but it’s subdivided into the dependencies of each crate.)
codecov.io
Code-coverage as a service. You have to sign up for an account, but it’s free for public repositories. Support can be incorporated into Github CI workflows, which is how it works for cotton-ssdp.

The neat thing is that many of these services provide “badges”: little status icons that you can add to README.md on Github, or to any web page. The statuses below aren’t just a pretty picture: each one is the live status, on most of them you can click through to the underlying data, and as you’re reading them right now, you might be noticing an issue before I do:

Cotton project: GitHub last commit GitHub Repo stars GitHub contributors
cotton-netif:
cotton-ssdp:

Tuesday 7 February 2023

Network firewall on a Raspberry Pi using OpenWRT

Previously on #homelab:

     
For a while I’ve been using a TP-Link TL-R860 router as the main firewall and switch for my home ethernet network. This has worked well and reliably (unlike the Linksys WRT54-G which I eventually put on a Christmas lights timer to reboot it every night), but it’s old now and is no longer getting security updates (if it ever did). Its HTTPS configuration UI understands nothing newer than SSL3.0, and I wouldn’t even know where to find a browser prepared to lower itself to that these days. As it’s the final line of defence for my home network, I wanted to upgrade to something a bit newer.

I’ve still got a handful of Raspberry Pi 3 devices knocking around, so I thought I’d press one of them into service as the new ethernet-to-ethernet firewall – plus, OpenWRT is open-source and widely used, and so probably more trustworthy than a proprietary router that’s (searches “Amazon” folder in Evolution) over ten years old now. And gigabit switches are cheap these days, which might provide a handy speedup, at least for internal traffic, from the 10/100 switch in the TP-Link. That means that the goal is for the network to look like this:

So here’s a complete list of parts:

  • Raspberry Pi 3B (or a Raspberry Pi 4 if you can get one)
  • Branded Raspberry Pi power supply
  • A USB flash drive to be the root filesystem (it doesn’t need to be high-performance; I used a Sandisk one which is literally “Amazon’s Choice” for the search “flash drive”)
  • A USB ethernet adaptor supported by Linux (I used an Amazon Basics one)
  • A gigabit ethernet switch (any one will do; I picked a Zyxel one mostly because I wanted one with the LEDs on the front and the sockets on the back)

Just as when setting up a home server, if you’re using a Raspberry Pi 3 you’ll need to enable boot-from-USB; the process is described in the linked post and you’ll need to boot once from a microSD card to be able to make the configuration change. The Raspberry Pi 4, though, can boot from USB out-of-the-box.

Alternatively, you could just stick with a microSD card for the whole process, but it’s persistently rumoured that running a Raspberry Pi from a microSD card is bad for long-term reliability, and that running from USB is better.

This blog post will cover:

  1. Installing OpenWRT
  2. Adding a second ethernet interface via USB
  3. The security setup
  4. Conclusion

The security setup will, as in previous posts, follow the principle of minimising potential attack surface. This post is as much about writing down for my own benefit what I did here, as it is for any wider purpose!

1. Installing OpenWRT

OpenWRT is at heart just a very custom Linux distribution, and it’s installed onto the Raspberry Pi like any other Linux. The OpenWRT folks have a specific page on using the Raspberry Pi. Download the latest “Firmware OpenWRT Install” img.gz file from the Installation section there; when I did it, that was version 22.03.3.

Install it on your flash drive (or microSD card) using rpi-imager (on Ubuntu, sudo apt install rpi-imager). Click “Choose OS” then “Use custom” then select the img.gz file you just downloaded. Then click “Choose storage” and select your USB flash drive (or microSD card); be careful because other disks that you don’t want wiped may also be listed. Then click “Write”.

The default image contains a boot partition that assumes that the root filesystem will be on the microSD card (/dev/mmcblk0p2); if you’re using a USB flash drive then that’s not going to work. So in that case you need to mount the boot partition from the flash drive, and edit cmdline.txt to change the part of the command-line that specifies the root partition. Find the part that says root=/dev/mmcblk0p2 and change it to root=/dev/sda2. Don’t alter the rest of the command-line. Unmount the partition and unplug the flash drive.

With that flash drive, your Raspberry Pi is now able to run OpenWRT. Initially the new OpenWRT install will set itself up with IP address 192.168.1.1 on the Raspberry Pi ethernet interface; you can connect to this directly with an ethernet cable from a laptop (or desktop), setting the laptop’s IP address to 192.168.1.2. You can then use SSH from the laptop to log into the Raspberry Pi:

ssh root@192.168.1.1

Now would be a good time, as OpenWRT itself now suggests, to change the root password from the default blank one.

The Raspberry Pi isn’t yet a very useful firewall, though, because it only has the one ethernet port. It does additionally have wifi, but that’s not enabled by default and anyway I’m specifically after an ethernet-to-ethernet firewall here. So it’s time to add the second ethernet interface.

2. Adding a second ethernet interface via USB

The goal is for the Raspberry Pi’s built-in ethernet to be the inside of the firewall, connected to the switch, and for the USB ethernet to be the outside, connected to my ISP’s router.

Parts of this section are based on Vladimír Záhradník’s article on Medium (also here) – but, like me, you’ll need to go off-piste from that script a little if it’s awkward to connect the firewall to the internet straightaway.

And it’s likely that the firewall won’t be on the internet straightaway, because although the laptop can use SSH or HTTP to talk to the Raspberry Pi, the laptop probably won’t be set up to forward packets, so the Raspberry Pi itself will not be able to access the internet.

The first thing to do is work out what Linux driver is needed for your chosen USB ethernet adaptor; the easiest way to do that is just to plug it into an existing Linux box and see what module gets auto-loaded; in my case it was ax88179_178a.

You need to find the equivalent driver package starting from downloads.openwrt.org: I clicked on “OpenWRT 22.03.3” then “bcm27xx” then “bcm2710” (that’s for Raspberry Pi 3, for a 4 it would be “bcm2711”), then “packages”, then I looked along the packages starting with “kmod-usb-net” until I found one with “ax88179” in the name: this turned out to be kmod-usb-net-asix-ax88179_5.10.161-1_aarch64_cortex-a53.ipk.

Now from a fully-operational OpenWRT install, I could just install that using “opkg install kmod-usb-net-asix-ax88179” and it would fetch and install it directly; for this initial setup, though, I had to download it manually onto the laptop, then scp it to the firewall, then install locally:

on the laptop
scp kmod-usb-net-asix-ax88179_5.10.161-1_aarch64_cortex-a53.ipk root@192.168.1.1:
ssh root@192.168.1.1
now on the Pi
opkg install ./kmod-usb-net-asix-ax88179_5.10.161-1_aarch64_cortex-a53.ipk

And the opkg command failed: it needed some further modules as dependencies. An online opkg would fetch and install dependencies automatically, but I had to do that manually, by repeating the download-and-scp process for the following:

  • kmod-libphy_5.10.161-1_aarch64_cortex-a53.ipk
  • kmod-mii_5.10.161-1_aarch64_cortex-a53.ipk
  • kmod-usb-net_5.10.161-1_aarch64_cortex-a53.ipk

Once those were installed, the main ax88179 package could also be successfully installed.

When I did this, the ethernet adaptor was still not being detected, and it took me far too long to realise that it was still plugged into the laptop from when I was working out what driver it needed. As soon as I plugged it back into the Raspberry Pi, it worked fine and appeared in OpenWRT’s web UI as eth1 (with eth0 being the built-in ethernet).

3. The security setup

Almost everything else can now be configured from the web UI. Under the “Network” menu, choose “Interfaces”, then “Add new interface”, call it “wan”, choose “DHCP client”, and assign it to eth1.

Under “Network” and “Firewall” the default will probably already be correct: LAN-to-WAN traffic gets forwarded, WAN-to-LAN traffic gets rejected except that masquerading (i.e., NAT) is enabled.

Under “System” and “Administration” you can set up SSH access: have it listen on the LAN interface only and give it your SSH public key. Once you’re sure that SSH public-key authentication is working, you can disable SSH password authentication. (In the worst-case scenario, you can still log in to the firewall on console, using a keyboard and monitor.)

Under “HTTP(S) Access”, make sure “Redirect to HTTPS” is turned off; this is less secure but we’ll be re-securing it a bit later.

Back under “Network” and “Interfaces”, choose “Edit” for the LAN interface and you can set the IP address and configure the DHCP server. You get a warning message when changing the settings, because that’s the interface from which you’re using the web UI, but if you’re confident in your settings then make the change; worst case you might need to unplug and re-plug the network cable in order for the laptop to realise it needs to renegotiate its own IP address on ethernet.

You can now connect the firewall’s WAN interface to the upstream (ISP) router and use SSH to log in to the firewall (via its LAN interface) under its final LAN IP address. The firewall now has internet connectivity via the upstream router, so now is a good time to log in and install any additional packages you might need:

opkg install nano

Now is also a good time to set up anything required on the upstream router – for instance a static DHCP address, or any port forwarding.

I made one final security-related configuration change. By default the web UI listens on all interfaces, on both IPv4 and IPv6. That’s not terrible, because a further login is still required to actually do anything, and because firewall rules on the WAN interface will drop connection requests anyway. But just to reduce attack surface, I changed that to only listen for connections from localhost. To do that, SSH into the firewall and edit /etc/config/uhttpd. As first installed, it will have lines a bit like these:

/etc/config/uhttpd
...
	list listen_http	0.0.0.0:80
	list listen_http	[::]:80
	list listen_https	0.0.0.0:443
	list listen_https	[::]:443
...

And to change it to listen on localhost only, replace those four lines with this one (leaving the rest of the file unchanged):

/etc/config/uhttpd
...
	list listen_http '127.0.0.1:80'
...

Now the only way to access the web UI is via SSH tunnelling – in other words, only an attacker with SSH credentials can change anything. So I use a script to run the SSH command, in order to avoid remembering the command parameters (“donk” is the hostname of the firewall):

ssh-donk
#!/bin/bash

# 8001: openwrt

exec ssh -t \
     -L 8001:localhost:80 \
     root@donk

And with that running on the laptop, the OpenWRT web UI is accessible in the laptop’s web browser as http://localhost:8001.

4. Conclusion

Setting up a Raspberry Pi 3 as a firewall was relatively straightforward. OpenWRT is well-supported and gets ongoing security maintenance, and is much less likely to contain unexpected backdoors (or even bugs) than an ancient proprietary firewall. And its performance is good – or at least, better than the dusty old TP-Link thing. From inside the firewall, my laptop now gets 94Mbits/s download, 20Mbits/s upload – up from 28 down and 8 up when using the previous firewall. Quite possibly a Raspberry Pi 4 could do even better; 94Mbits/s is very close to saturating the Raspberry Pi 3’s onboard 10/100 ethernet interface, whereas the Raspberry Pi 4 has gigabit ethernet onboard.

Friday 27 January 2023

Self-hosted CI for Rust and C++ using Laminar

Previously on #homelab:

     
I’ve been a keen user of the Jenkins continuous-integration server (build daemon) since the days when it was called Hudson. I set it up on a shadow IT basis under my desk at Displaylink, and it was part of Electric Imp’s infrastructure from the word go. But on the general principle that you don’t appreciate a thing unless you know what the alternatives are like, I’ve recently been looking at Laminar for my homelab CI setup.

Laminar is a much more opinionated application, mostly in the positive sense of that term, than Jenkins. If Jenkins (as its icon suggests) is an obsequious English butler or valet, then Laminar is the stereotype of the brusque New Yorker: forgot to mark your $JOB.init script as executable? “Hey pal. I’m walking here.”

But after struggling occasionally with the complexity and sometimes opacity of Jenkins (which SSH key is it using?) the simplicity and humility of Laminar comes as a relief. Run a sequence of commands? That’s what a shell is for; it’s not Laminar’s job; Laminar just runs a single script. Run a command on a timer? That’s what cron (or anacron) is for; it’s not Laminar’s job; Laminar provides a trigger command that you can add to your crontab.
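For instance, a weekly trigger is just a crontab line for the laminar user, using that trigger command (the job name here is illustrative):

```
# Every Monday at 02:30, queue the weekly toolchain-update job.
30 2 * * 1  laminarc queue rustup-check
```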

So what does it provide? Mostly sequencing, monitoring, statistics-gathering, artifact-gathering, and a web UI. (Unlike Jenkins, the web UI is read-only – but as it exposes the contents of all packages built using it, it’s still best to keep it secure.) I have mine set up to, once a week, do a rustup update and then check that all my projects still build and pass their tests with the newest nightly build (and beta, and stable, and the oldest supported version). It’s very satisfying to glance at the Laminar page and be reassured that everything still builds and works, even if I’ve been occupied with other things that week. (And conversely, on the rare occasions when a new nightly breaks something, at least I find out about it early, as opposed to it suddenly being in my way at a time when I’m filled with the urge to be writing some new thing.)

This blog post will cover:

  1. Installing Laminar
  2. CI for Chorale, a C++ package
  3. CI for Cotton, a Rust package
  4. Setting up Git to build on push
  5. CI for rustup

You should probably skim at least the first part of the C++ section even if you’re mostly interested in Rust, as it introduces some basic Laminar concepts and techniques.

By the way, it’s reasonable to wonder whether, or why, self-hosted CI is even a thing, considering that Github Actions offer free CI for open-source projects (up to a certain, but very generous, usage limit). One perfectly adequate answer is that the hobby of #homelab is all about learning how things work – learning which doesn’t happen if someone else’s cloud service is already doing all the work. But there are other good answers too: eventually (but not in this blog post) I’m going to want CI to run tests on embedded systems, STM32s and RP2040s and the like – real physical hardware, which is attached to servers here but very much not attached to Github’s CI servers. (Emulation can cover some of this, but not for instance driver work, where the main source of bugs is probably misconceptions about how the actual hardware works.) Yet a third reason is trust: for a released open source project there’s, by definition, no point in keeping the source secret. But part of the idea of these blog posts is to document best practices which commercial organisations, too, might wish to adopt – and they might have very different opinions on uploading their secret sauce to third-party services, even ones sworn to secrecy by NDA. And even a project determined to make the end result open-source, won’t necessarily be making all their tentative early steps out in the open. Blame Apple, if you like, for that attitude; blame their habit of saying, “By the way, also, this unexpected thing. And you can buy it today!”

1. Installing Laminar

This is the part where it becomes clear that Laminar is quite a niche choice of CI engine. It is packaged for both Debian and Ubuntu, but there is a bug in both the Debian and Ubuntu packages – it’s not upstream, it’s in the Debian patchset – which basically results in nothing working. So you could either follow the suggestions in the upstream bug report of using a third-party Ubuntu PPA or the project’s own binary .deb releases, or you could do what I did and install the broken Ubuntu package anyway (to get the laminar user, the systemd scripts, etc. set up), then build Laminar 1.2 from upstream sources and install it over the top.

Either way, if you navigate to the Laminar web UI (on port 8080 of the server) and see even the word “Laminar” on the page, your installation is working and you’ve avoided the bug.

The default install sets the home directory for the laminar user to /var/lib/laminar; this is the Debian standard for system users, but to make things less weird for some of the tools I’m expecting Laminar to run (e.g., Cargo), I changed it (in /etc/passwd) to be /home/laminar.

2. CI for Chorale, a C++ package

I use Laminar to do homelab builds of Chorale, a C++ project comprising a UPnP media-server, plus some related bits and pieces. For the purposes of this blog post, it’s a fairly typical C++ program with a typical (and hopefully pretty sensible) Autotools-based build system.

Laminar is configured, in solid old-school Unix fashion, by a collection of text files and shell scripts. These all live (in the default configuration) under /var/lib/laminar/cfg, which you should probably chown -R to you and also check into a Git repository, to track changes and keep backups. (The installation process sets up a user laminar, which should remain a no-login user.)

All build jobs execute in a context, which allows sophisticated build setups involving multiple build machines and so on; for my purposes everything executes in a simple context called local:

/var/lib/laminar/cfg/contexts/local.conf
EXECUTORS=1
JOBS=*

This specifies that only one build job can be run at a time (but it can be any job), overriding Laminar’s default context which allows for up to six executors: it’s just a Raspberry Pi, after all, we don’t want to overstress it.

2.1 C++: building a package

When it comes to the build directories for its build jobs, Laminar is much more disciplined (or, again, opinionated) than Jenkins: it keeps a rigid distinction between (a) the build directory itself, which is temporary, (b) a persistent directory shared between runs of the same job, the workspace, and (c) a persistent directory dedicated to each run, the archive. So the usual pattern is to keep the git checkout in the workspace (to save re-downloading the whole repo each time), then each run can do a git pull, make a copy of the sources into the build directory, do the build (leaving the workspace with a clean checkout), and finally store its built artifacts into the archive. All of which is fine except for the very first build, which needs to do the git clone. In Laminar this is dealt with by giving the job an init script (remember to mark it executable!):

/var/lib/laminar/cfg/jobs/chorale.init
#!/bin/bash -xe

git clone /home/peter/git/chorale.git .

as well as its normal run script (which also needs to be marked executable):

/var/lib/laminar/cfg/jobs/chorale.run
#!/bin/bash -xe

(
    flock 200
    cd $WORKSPACE/chorale
    git pull --rebase
    cd -
    cp -al $WORKSPACE/chorale chorale
) 200>$WORKSPACE/lock

cd chorale
libtoolize -c
aclocal -I autotools
autoconf
autoheader
echo timestamp > stamp-h.in
./configure
make -j4 release
cp chorale*.tar.bz2 $ARCHIVE/
laminarc queue chorale-package

There’s a few things going on here, so let’s break it down. The business with flock is a standard way, suggested in Laminar’s own documentation, of arranging that only one job at a time gets to execute the commands inside the parentheses. That isn’t necessarily likely to be an issue, as we’ve set EXECUTORS=1, but git would get into such a pickle if two runs did overlap that it’s a sensible precaution anyway. These protected commands update the repository from upstream (here, a git server on the same machine), then copy the sources into the build directory (via hard-linking, cp’s -l, to save time and space).

Once that’s done, we can proceed to do the actual build; the commands from libtoolize as far as make are the standard sequence for bootstrapping an Autotools-based C++ project from bare sources. (It’s not exactly Joel Test #2 compliant, mostly for Autotools reasons, although at least any subsequent builds from the same tree would be single-command.)

Chorale, as is standard for C++ packages, is released and distributed as a source tarball, which in this case is produced by the release target in the Makefile. The final cp command copies this tarball to the Laminar archive directory corresponding to this run of this job. (The archive directory will have a name like /var/lib/laminar/archive/chorale/33, where the “33” is a sequential build number.)

The final command, laminarc for “laminar client”, queues up the next job in the chain, testing the contents of the Chorale package. (The bash -xe at the top ensures that if the build process produces any errors, the script will terminate with an error and not get as far as kicking off the test job.)

That’s all that’s needed to set up a simple C++ build job – Laminar doesn’t have any concept of registering or enrolling a job; just the existence of the $JOB.run file is enough for the job to exist. To run it (remembering that the web UI is read-only), execute laminarc queue chorale and you should see the web UI spring into life as the job gets run. Of course, it will fail if any of the prerequisites (gcc, make, autoconf, etc.) are missing from the build machine; add them either manually (sudo apt-get install ...) or perhaps using Chef, Ansible or similar. Once the build succeeds (or fails) you can click around in the web UI to find the logs or perhaps download the finished tarball.

2.2 C++: running tests

The next job in the chain, chorale-package, tests that the packaging process was successful (and didn’t leave out any important files, for instance); it replicates what the user of Chorale would do after downloading a release. This time the run script gets the sources not from git, but from the package created by (the last successful run of) the chorale job, so no init script is needed:

/var/lib/laminar/cfg/jobs/chorale-package.run
#!/bin/bash -xe

PACKAGE=/var/lib/laminar/archive/chorale/latest/chorale-*.tar.bz2
tar xf $PACKAGE

cd chorale*
./configure
make -j4 EXTRA_CCFLAGS=-Werror
make -j4 EXTRA_CCFLAGS=-Werror check
laminarc queue chorale-gcc12 chorale-clang

Like a user of Chorale, the script just untars the package and expects configure and make to work. The build fails if that doesn’t happen. This job also runs Chorale’s unit-tests using make check. This time, we build with the C++ compiler’s -Werror option, to turn all compiler warnings into hard errors which will fail the build.

If everything passes, it’s clear that everything is fine when using the standard Ubuntu C++ compiler. The final two jobs, kicked-off whenever the chorale-package job succeeds, build with alternative compilers just to get a second opinion on the validity of the code (and to avoid unpleasant surprises when the standard compiler is upgraded in subsequent Ubuntu releases):

/var/lib/laminar/cfg/jobs/chorale-gcc12.run
#!/bin/bash -xe

PACKAGE=/var/lib/laminar/archive/chorale/latest/chorale-*.tar.bz2
tar xf $PACKAGE

GCC_FLAGS="-Werror"

cd chorale*
./configure CC=gcc-12 CXX=g++-12
make -j4 CC=gcc-12 EXTRA_CCFLAGS="$GCC_FLAGS"
make -j4 CC=gcc-12 EXTRA_CCFLAGS="$GCC_FLAGS" GCOV="gcov-12" check

New compiler releases sometimes introduce new, useful warnings; this script is a good place to evaluate them before adding them to configure.ac. Similarly, the chorale-clang job checks that the sources compile with Clang, a compiler which has often found issues that G++ misses (and vice versa). Clang also has some useful extra features, the undefined-behaviour sanitiser and address sanitiser, which help to detect code which compiles but then can misbehave at runtime:

/var/lib/laminar/cfg/jobs/chorale-clang.run
#!/bin/bash -xe

PACKAGE=/var/lib/laminar/archive/chorale/latest/chorale-*.tar.bz2
tar xf $PACKAGE

cd chorale*
./configure CC=clang CXX=clang++

# -fsanitize=thread incompatible with gcov
# -fsanitize=memory needs special libc++
#
for SANE in undefined address ; do
    CLANG_FLAGS="-Werror -fsanitize=$SANE -fno-sanitize-recover=all"
    make -j4 CC=clang EXTRA_CCFLAGS="$CLANG_FLAGS"
    make -j4 CC=clang EXTRA_CCFLAGS="$CLANG_FLAGS" GCOV="llvm-cov gcov" tests
    make clean
done

If the Chorale code passes all of these hurdles, then it’s probably about as ready-to-release as it’s possible to programmatically assess.

3. CI for Cotton, a Rust package

All the tools and dependencies required to build a typical C++ package are provided by Ubuntu packages and are system-wide. But Rust’s build system encourages the use of per-user toolchains and utilities (as well as per-project dependencies). So before we do anything else, we need to install Rust for the laminar user – i.e., as the laminar user – which requires a moment’s thought, as we carefully set up laminar to be a no-login user. We can’t just su to laminar and run rustup-init normally; instead we use sudo -u laminar to execute one command at a time from a normal user account.

So start by downloading the right rustup-init binary for your system – here, on a Raspberry Pi, that’s the aarch64-unknown-linux-gnu one. But then execute it (and then use it to download extra toolchains) as the laminar user (bearing in mind that rustup-init’s careful setup of the laminar user’s $PATH will not be in effect):

$ sudo -u laminar /home/peter/rustup-init
$ sudo -u laminar /home/laminar/.cargo/bin/rustup toolchain install beta
$ sudo -u laminar /home/laminar/.cargo/bin/rustup toolchain install nightly
$ sudo -u laminar /home/laminar/.cargo/bin/rustup toolchain install 1.56
$ sudo -u laminar /home/laminar/.cargo/bin/rustup +nightly component add llvm-tools-preview
$ sudo -u laminar /home/laminar/.cargo/bin/cargo install rustfilt

The standard rustup-init installs the stable toolchain, so we just need to add beta, nightly, and 1.56 – that last chosen because it’s Cotton’s “minimum supported Rust version” (MSRV), which in turn was selected because it was the first version to support the 2021 Edition of Rust, and that seemed to be as far back as it was reasonable to go. We also install llvm-tools-preview and rustfilt, which we’ll be using for code-coverage later.
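An MSRV can also be recorded in the crate manifest, so that Cargo itself rejects builds with older toolchains. A hypothetical excerpt (Cotton’s actual manifest may differ):

```toml
[package]
edition = "2021"
rust-version = "1.56"   # cargo refuses to build with anything older
```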

So to the $JOB.run scripts for Cotton. Here I noticed that I’ve actually got a few different Rust packages to build, and they all need basically the same things doing to them. So I took advantage of the Laminar /var/lib/laminar/cfg/scripts directory, and made all the infrastructure common to the Rust packages. When running a job, Laminar arranges for the scripts directory to be on the shell’s $PATH (and note that it’s in the cfg directory, so it will be captured and versioned if you set that up as a Git checkout). This means that, as far as Cotton is concerned – after an init script that’s really just like the C++ one:

/var/lib/laminar/cfg/jobs/cotton.init
#!/bin/bash -xe

git clone /home/peter/git/cotton.git .

– the other build scripts come in pairs: one that’s Cotton-specific but really just runs a shared script which is generic across projects, and then the generic one which does the actual work. We’ll look at the specific one first:

/var/lib/laminar/cfg/jobs/cotton.run
#!/bin/bash -xe

BRANCH=${BRANCH-main}
do-checkout cotton $BRANCH
export LAMINAR_REASON="built $BRANCH"
laminarc queue cotton-doc BRANCH=$BRANCH \
         cotton-grcov BRANCH=$BRANCH \
         cotton-1.56 BRANCH=$BRANCH \
         cotton-beta BRANCH=$BRANCH \
         cotton-nightly BRANCH=$BRANCH

The assignment to BRANCH is a shell idiom which means, “use the variable $BRANCH if it is set, but if it isn’t, default to main”. This is usually what we want (in particular, a plain laminarc queue cotton will build main), but making it flexible will come in handy later when we build the Git push hook. All the actual building is done by the do-checkout script, and then on success (remembering that bash -xe means the script gets aborted on any failure) we go on to queue all the downstream jobs. Note that when parameterising jobs using laminarc’s VAR=VALUE facility, each VAR applies only to the job name it follows, not to all the jobs named.
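The expansion behaves like this (a quick illustration, runnable in any POSIX shell):

```shell
# ${BRANCH-main}: expands to $BRANCH if set, otherwise to the literal "main".
unset BRANCH
echo "${BRANCH-main}"       # prints: main
BRANCH=feature/ssdp
echo "${BRANCH-main}"       # prints: feature/ssdp
```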

The do-checkout script is very like the one for Chorale, including the flock arrangement to serialise the git operations, and differing only in that it takes the project and branch to build as command-line parameters – and of course includes the usual Rust build commands instead of the C++/Autotools ones. (This time we can take advantage of the $PATH setup performed by rustup-init, but only if we source Cargo’s environment file directly.)

/var/lib/laminar/cfg/scripts/do-checkout
#!/bin/bash -xe

PROJECT=$1
BRANCH=$2
# WORKSPACE is predefined by Laminar itself

(
    flock 200
    cd $WORKSPACE/$PROJECT
    git fetch
    git checkout $BRANCH
    git pull --rebase
    cd -
    cp -al $WORKSPACE/$PROJECT $PROJECT
) 200>$WORKSPACE/lock

source $HOME/.cargo/env
rustup default stable

cd $PROJECT
cargo build --all-targets
cargo test

Notice that this job explicitly uses the stable toolchain, minimising the chance of version-to-version breakage. We also want to test on beta, nightly, and MSRV though, which is what three of those downstream jobs are for. Here I’ll just show the setup for nightly, because the other two are exactly analogous. Again there’s a pair of scripts; firstly, there’s the specific one:

/var/lib/laminar/cfg/jobs/cotton-nightly.run
#!/bin/bash -xe

exec do-buildtest cotton nightly ${BRANCH-main}

Really not much to see there. All the work, as before, is done in the generic script, which is parameterised by project and toolchain:

/var/lib/laminar/cfg/scripts/do-buildtest
#!/bin/bash -xe

PROJECT=$1
RUST=$2
BRANCH=$3
SOURCE=/var/lib/laminar/run/$PROJECT/workspace

(
    flock 200
    cd $SOURCE/$PROJECT
    git checkout $BRANCH
    cd -
    cp -al $SOURCE/$PROJECT $PROJECT
) 200>$SOURCE/lock

source $HOME/.cargo/env
rustup default $RUST

cd $PROJECT
cargo build --all-targets --offline
cargo test --offline

Here we lock the workspace again, just to avoid any potential clashes with a half-finished git update, but we don’t of course do another git update – we want to build the same version of the code that we just built with stable. For similar reasons, we run Cargo in offline mode, just in case anyone published a newer version of a dependency since we last built.

That’s the cotton-beta, cotton-nightly, and cotton-1.56 downstream jobs dealt with. There are two more: cotton-doc and cotton-grcov, which deal with cargo doc and code coverage respectively. The documentation one is the more straightforward:

/var/lib/laminar/cfg/jobs/cotton-doc.run
#!/bin/bash -xe

exec do-doc cotton ${BRANCH-main}

And even the generic script (parameterised by project) is quite simple:

/var/lib/laminar/cfg/scripts/do-doc
#!/bin/bash -xe

PROJECT=$1
BRANCH=$2
SOURCE=/var/lib/laminar/run/$PROJECT/workspace

(
    flock 200
    cd $SOURCE/$PROJECT
    git checkout $BRANCH
    cd -
    cp -al $SOURCE/$PROJECT $PROJECT
) 200>$SOURCE/lock

source $HOME/.cargo/env
rustup default stable

cd $PROJECT
cargo doc --no-deps --offline
cp -a target/doc $ARCHIVE

It much resembles the normal build, except for running cargo doc instead of a normal build. On completion, though, it copies the finished documentation into Laminar’s $ARCHIVE directory, which makes it accessible from Laminar’s web UI afterwards.

The code-coverage scripts are more involved, largely because I couldn’t initially get grcov to work, and ended up switching to using LLVM’s own coverage tools instead. (But the scripts still have “grcov” in the names.) Once more the per-project script is simple:

/var/lib/laminar/cfg/jobs/cotton-grcov.run
#!/bin/bash -xe

exec do-grcov cotton ${BRANCH-main}

And the generic script does the bulk of it (I cribbed this recipe from the rustc book, q.v.; I didn’t come up with it all myself):

/var/lib/laminar/cfg/scripts/do-grcov
#!/bin/bash -xe

PROJECT=$1
BRANCH=$2
SOURCE=/var/lib/laminar/run/$PROJECT/workspace

(
    flock 200
    cd $SOURCE/$PROJECT
    git checkout $BRANCH
    cd -
    cp -al $SOURCE/$PROJECT $PROJECT
) 200>$SOURCE/lock

source $HOME/.cargo/env
rustup default nightly

cd $PROJECT

export RUSTFLAGS="-Cinstrument-coverage"
export LLVM_PROFILE_FILE="$PROJECT-%p-%m.profraw"
cargo test --offline --lib
rustup run nightly llvm-profdata merge -sparse `find . -name '*.profraw'` -o cotton.profdata
rustup run nightly llvm-cov show \
    $( \
      for file in \
        $( \
            cargo test --offline --lib --no-run --message-format=json \
              | jq -r "select(.profile.test == true) | .filenames[]" \
              | grep -v dSYM - \
        ); \
      do \
        printf "%s %s " -object $file; \
      done \
    ) \
  --instr-profile=cotton.profdata --format=html --output-dir=$ARCHIVE \
  --show-line-counts-or-regions --ignore-filename-regex='/.cargo/' \
  --ignore-filename-regex='rustc/'

Honestly? Bit of a mouthful. But it does the job. Notice that the output directory is set to Laminar’s $ARCHIVE directory so that, again, the results are viewable through Laminar’s web UI. (Rust profiling doesn’t produce branch coverage as such, but “Region coverage” – which counts what a compiler would call basic blocks – amounts to much the same thing in practice.) The results will look a bit like this:

Why yes, that is very good coverage, thank you for noticing!

4. Setting up Git to build on push

So far in our CI journey, we have plenty of integration, but it’s not very continuous. What’s needed is for all this mechanism to swing into action every time new code is pushed to the (on-prem) Git repositories for Chorale or Cotton.

Fortunately, this is quite straightforward – or, at least, good inspiration is available online. Pushes to the Git repository for Cotton can be hooked by adding a script as hooks/post-receive under the Git server’s cotton.git directory (the hooks directory is probably already there). In one of those Git features that at first makes you think, “this is a bit over-engineered”, but then makes you realise, “wait, this couldn’t actually be made any simpler while still working in full generality”, the Git server passes to this script, on its standard input, a line for every Git “ref” being pushed – for these purposes, refs are mostly branches – along with the Git revisions at the old tip and new tip of the branch.

Laminar comes with an example hook which builds every commit on every branch pushed. I admire this but don’t follow it; it’s great for preserving bisectability, but seems like it would lead to a lot of interactive rebasing every time a feature branch is rebased on a later main – not to mention a lot of building by the CI server. So the hook I actually use just builds the tip of each branch:

git/cotton.git/hooks/post-receive
#!/bin/bash -ex

while read oldrev newrev ref
do
    if [ "${ref:0:11}" == "refs/heads/" ];
    then
        export BRANCH=${ref:11}
        export LAMINAR_REASON="git push $BRANCH"
        laminarc queue cotton BRANCH=$BRANCH
    fi
done
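The branch-filtering logic in that loop can be exercised on its own by feeding it a hypothetical push line (the revisions are invented, and a tag ref is included to show it being skipped):

```shell
# Simulate the Git server's stdin: "<old-rev> <new-rev> <ref>" per pushed ref.
printf '%s\n' "aaaaaaa bbbbbbb refs/heads/main" \
              "ccccccc ddddddd refs/tags/v1" |
while read oldrev newrev ref
do
    if [ "${ref:0:11}" = "refs/heads/" ]; then
        echo "would queue BRANCH=${ref:11}"   # branches only, not tags
    fi
done
```

Only the `refs/heads/main` line produces a queued build; the tag push is silently ignored.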

The LAMINAR_REASON appears in the web UI, indicating which branch each run is building.

5. CI for rustup

The final piece of the puzzle, at least for today, is the continuous integration of Rust itself. As new nightlies, and betas, and even stable toolchains come out, I’d like it to be a computer’s job, not that of a person, to rebuild everything with the new version. (Especially if that person would be me.)

This too, however, is straightforward with all the infrastructure put in place by the rest of this blog post. All that’s needed is a new job file which runs rustup-update:

/var/lib/laminar/cfg/jobs/rustup-update.run
#!/bin/bash -ex

export LAMINAR_REASON="rustup update"
source $HOME/.cargo/env
rustup update
laminarc queue cotton assay sparkle

The rustup update command updates all the toolchains; once that is done, the script queues-up builds of all the Rust packages I build. I schedule a weekly build in the small hours of Thursday morning, using cron:

edit crontab using “crontab -e”
0 3 * * 4 LAMINAR_REASON="Weekly rustup" laminarc queue rustup-update

With a bit of luck, this means that by the time I sit down at my desk on Thursday morning, all the jobs have run and Laminar is showing a clean bill of health. As I’ve been playing with Rust for quite a long elapsed time, but really only taking it seriously in quite occasional bursts of energy, having everything kept up-to-date automatically during periods when I’m distracted by a different shiny thing is a real pleasure.

CC0 To the extent possible under law, the author of this work has waived all copyright and related or neighboring rights to this work.