Previously on #rust:
Chorale never got much interest (or, as the tagline on this very blog puts it, “Waits for audience applause ... not a sossinge”); this is no doubt partly due to (closed) cloud streaming platforms for music hugely overtaking in popularity all dealings with (open) local media libraries, but no doubt also due to the complete lack of effort I actually spent promoting Chorale (or even, really, mentioning its existence).
So I’ve resolved to be a bit more talkative about open-source things I do from here on in. Inspired by a post by Simon Willison, I’m trying to adopt his mantra that anything you do, at least in the software world, isn’t finished until you’ve told people that you’ve done it – especially if part of the reason you’re doing it in the first place is in the hope of spreading useful information and good practices.
In this post I’ll tell you what I did, when I went to implement one of the first, lowest-level parts of the UPnP protocol suite in Rust: SSDP. This is the backstory of the cotton-ssdp crate.
Background: a Simple Service Discovery Protocol
SSDP, the Simple Service Discovery Protocol, is aimed at making it easy for separately-purchased and separately-designed devices to discover each other’s resources over a local network, so that they can later interact using higher-level protocols such as UPnP itself. A resource might be a streaming-media server, or a router, or a network printer, or anything else that someone might want to search for or enumerate on a network. Like most protocols, there are two basic rôles involved: a server advertises the availability of a resource, and a client discovers available resources.
What precisely is advertised or discovered is, for each resource: a unique identifier for that particular resource (its Unique Service Name, USN), an identifier for the type of resource (its Notification Type, NT), and the location of the resource, in the form of a URL.
SSDP is mainly used by UPnP systems, such as for media libraries and local streaming of music and video – it is how diskless media players such as the Roku Soundbridge discover the available music servers – but the mechanism is quite generic, and could as easily be used for any type of device or resource that must be discoverable over a network. Effectively it enables the automatic self-assembly of (locally-) distributed systems, including in ad hoc settings which don’t necessarily have expert network administrators close at hand to configure them manually.
For a learning project in Rust, SSDP is ideal as it’s about as simple as a network protocol could possibly be without being uninteresting. SSDP servers send out advertisements over UDP multicast to a “well-known” multicast group, which can be picked up passively by clients in that group; and clients can also actively send out search requests, also over UDP multicast, to which servers will respond.
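Concretely, SSDP messages are small, HTTP-like text datagrams sent to the well-known multicast group 239.255.255.250, port 1900. A client’s search request looks like this (ST is the notification-type being searched for – here the “everything” wildcard – and MX is the longest time, in seconds, a server may wait before replying):

M-SEARCH * HTTP/1.1
HOST: 239.255.255.250:1900
MAN: "ssdp:discover"
MX: 3
ST: ssdp:all

Servers’ responses, and their unsolicited advertisements, are datagrams of the same shape, carrying the USN, notification type, and location URL described above.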
There is no Internet RFC as such for SSDP – merely some expired drafts. The protocol is, instead, documented in the UPnP Device Architecture documents.
SSDP is very similar in motivation, and (it turns out) fairly similar in implementation, to the Multicast DNS (mDNS) protocol – but is not directly compatible.
Particular goals of the cotton-ssdp crate
In no specific order:
- Produce a useful SSDP implementation, which other Rust users who need SSDP might reasonably hear about and decide to use;
- Personally learn more about Rust and about the packaging, distributing, and publicising of Rust crates;
- Implement both client and server;
- Support use both in full-size/desktop/hosted environments, and in embedded/freestanding/no_std environments;
- Do the Right Thing on multi-homed hosts (ones with multiple active network interfaces), which were the source of a lot of complexity in Chorale and in proprietary network protocols which I’ve worked on in my day job;
- Follow, and indeed embody, if not evangelise, software engineering best practices: lots of documentation, lots of tests.
Things I learned about
- Effective techniques for getting good test coverage
- Automate your quality checks as far as possible
- Not picking a lane when it comes to third-party ecosystems
- Effective techniques for multi-homed hosts
- The journey to no_std
- Using the remarkable free tools available
1. Effective techniques for getting good test coverage
Coming from a C++ background, I was very impressed with the “batteries-included” nature of the Rust (Cargo) test setup. It’s as easy to start writing tests as it is to start writing the code in the first place – which is a Good Thing, as it means that every project can start out on the right side of history by including good testing from the very beginning.
But even here, there are better and worse ways to write tests, and it’s worth learning how to structure your tests in order to be most effective. As well as including the test harness itself, the Rust/Cargo system has baked into it the difference between unit tests and integration tests: outside the Rust world, this is a distinction about which philosophical debates can be had, but in Rust these two terms refer to two very specific and distinct things. When I started out in Rust, I was bundling all the tests into a mod tests in the same .rs file as the implementation: these are Rust’s unit tests. And I soon got in a pickle, because on the one hand I wanted to mock out (or dependency-inject) fake versions of some of the dependencies (specifically, system calls such as setsockopt whose failure is hard to arrange for in a test, but needs to be handled if it does happen in reality), but on the other hand I didn’t want the hassle of reimplementing the actual functionality of the real calls when testing the success case. The Rust mocking library mockall, for instance, replaces the real dependency completely when compiled in cfg(test) mode, as is done when compiling unit tests.
But Rust was way ahead of me here, and had already solved that problem. Tests for the success case – tests which can be written against the public API of the library – can be written as Rust integration tests, in the crate’s tests directory. Those are linked against the real version of the crate, compiled without cfg(test), so they get the actual, non-mock dependencies.
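For illustration, here is the shape of the two kinds of test, using a hypothetical crate mycrate and a made-up helper function, rather than cotton-ssdp’s real API:

// src/lib.rs -- unit tests live alongside the implementation, are
// compiled only under cfg(test), and can see private items (and mocks)
pub fn parse_max_age(s: &str) -> Option<u32> {
    s.strip_prefix("max-age=")?.parse().ok()
}

#[cfg(test)]
mod tests {
    #[test]
    fn parses_max_age() {
        assert_eq!(super::parse_max_age("max-age=1800"), Some(1800));
    }
}

// tests/api.rs -- integration tests form a separate crate, linked
// against the real library without cfg(test), so the success path
// exercises the real dependencies
#[test]
fn parses_max_age_via_public_api() {
    assert_eq!(mycrate::parse_max_age("max-age=1800"), Some(1800));
}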
(As it stands, cotton-ssdp does not use mockall, but that’s because I had difficulty with “false negatives” in code coverage, where it said not all regions were covered, but then didn’t show any actually non-covered regions when drilling down into the source file. I suspect, but can’t prove, that this is because the coverage tool gets confused when a piece of code is monomorphised twice, once for real and once for mock, and can’t always correctly match up equivalent regions between the two.)
1.1 An example of dependency injection
Here’s the code I first wrote for creating a new UDP socket and setting the particular socket options that cotton-ssdp will need:
fn new_socket(port: u16) -> Result<mio::net::UdpSocket, std::io::Error> {
    let socket = socket2::Socket::new(
        socket2::Domain::IPV4,
        socket2::Type::DGRAM,
        None,
    )?;
    socket.set_nonblocking(true)?;
    socket.set_reuse_address(true)?;
    let addr = std::net::SocketAddrV4::new(std::net::Ipv4Addr::UNSPECIFIED, port);
    socket.bind(&socket2::SockAddr::from(addr))?;
    setsockopt(socket.as_raw_fd(), Ipv4PacketInfo, &true)?;
    let socket: std::net::UdpSocket = socket.into();
    Ok(mio::net::UdpSocket::from_std(socket))
}
Now how does one test the error code paths from set_nonblocking and friends – all those question-marks? Well, one way is to use dependency injection – that is, to arrange that in unit-test builds, a “fake” version of the system-call is made (one which, for testing purposes, can be arranged to fail on demand), yet in the finished crate the actual, non-artificially-failing system-call is of course used. That change looks like this:
type NewSocketFn = fn() -> std::io::Result<socket2::Socket>;
type SockoptFn = fn(&socket2::Socket, bool) -> std::io::Result<()>;
type RawSockoptFn =
    fn(&socket2::Socket, bool) -> Result<(), nix::errno::Errno>;
type BindFn =
    fn(&socket2::Socket, std::net::SocketAddrV4) -> std::io::Result<()>;

fn new_socket_inner(
    port: u16,
    new_socket: NewSocketFn,
    nonblocking: SockoptFn,
    reuse_address: SockoptFn,
    bind: BindFn,
    ipv4_packetinfo: RawSockoptFn,
) -> std::io::Result<mio::net::UdpSocket> {
    let socket = new_socket()?;
    nonblocking(&socket, true)?;
    reuse_address(&socket, true)?;
    bind(
        &socket,
        std::net::SocketAddrV4::new(std::net::Ipv4Addr::UNSPECIFIED, port),
    )?;
    ipv4_packetinfo(&socket, true)?;
    Ok(mio::net::UdpSocket::from_std(socket.into()))
}

fn new_socket(port: u16) -> Result<mio::net::UdpSocket, std::io::Error> {
    new_socket_inner(
        port,
        || {
            socket2::Socket::new(
                socket2::Domain::IPV4,
                socket2::Type::DGRAM,
                None,
            )
        },
        socket2::Socket::set_nonblocking,
        socket2::Socket::set_reuse_address,
        |s, a| s.bind(&socket2::SockAddr::from(a)),
        |s, b| setsockopt(s.as_raw_fd(), Ipv4PacketInfo, &b),
    )
}
Now all the tests can call new_socket_inner, passing versions of new_socket, nonblocking, and so on which are as fallible as can be – but users of the real library are none the wiser; indeed, the API they see has not changed. (The actual tests are visible in commit d6cf9af2; they subsequently moved around a bit as later refactorings were done.)
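For instance, a test of the bind-failure path might look like this sketch (the real tests in d6cf9af2 differ in detail):

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn new_socket_fails_when_bind_fails() {
        let result = new_socket_inner(
            5000,
            || {
                socket2::Socket::new(
                    socket2::Domain::IPV4,
                    socket2::Type::DGRAM,
                    None,
                )
            },
            socket2::Socket::set_nonblocking,
            socket2::Socket::set_reuse_address,
            // The injected bind() always fails...
            |_, _| Err(std::io::Error::new(std::io::ErrorKind::Other, "bind failed")),
            |_, _| Ok(()),
        );
        // ...and new_socket_inner must pass that failure on
        assert!(result.is_err());
    }
}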
This sort of dependency injection is also a standard technique in C++, but one of the hidden wonders of Rust is that the language design allows the compiler to make many optimisations that, on technicalities, would be harder to prove correct in C++ (it’s not as if I’ve quizzed the designers on whether this was an explicit goal of the design – but things have worked out that way so often that really it must have been). The standard compiler is very good at optimising away abstractions like new_socket_inner: it spots that (in non-cfg(test) builds) the only call site of new_socket_inner is in new_socket, and inlines it completely; it also inlines all the lambdas (closures) and turns them back into direct function calls. In this case I checked, and the two versions of the code shown above produce identical assembler: a literally zero-cost abstraction.
Good code coverage is not, of course, any indicator of good code or even of good testing. But poor code coverage is very much an indicator of poor testing – so, much like a compiler warning, it’s always something worth looking at just in case it indicates a real underlying problem of poor code.
2. Automate your quality checks as far as possible
Engineering is the art of the repeatable; art is the engineering of the unrepeatable. Or, put differently: if it costs you actual human effort to make sure an engineered product, such as a particular version of a piece of software, is proper and correct, then that effort itself is, in a sense, written on the wind – as it tells no-one anything about whether any other versions of that software are proper or correct. So effort is usually much better applied on automating quality-assurance work than it is on performing (rote) quality-assurance work; it’s both dehumanising and non-cost-effective to send a person to do the job of a machine. (Though of course some quality-assurance work isn’t automatable: such work is often known as exploratory testing and works best when viewed as an exercise in creatively finding corner-cases or oversights in the automated test suite, which can then be automated in their turn.)
But there’s more to quality-assurance than testing. It’s also a discipline of quality-assurance to run code through analysis tools, to pick up on any potential errors. In the C++ world it’s long been received wisdom to enable most or all available compiler warnings, across multiple alternative C++ compilers if possible: the small iatrogenic downside of avoiding false positives from compiler warnings is outweighed by the benefit accrued each time a warning pinpoints an actual mistake. In Rust, compiler warnings are usually very good and actionable (and peculiar idioms for avoiding them seem rare), and Clippy lints even more so. The “cargo clippy” command could equally aptly have been called “cargo clear-up-my-misconceptions-about-how-this-language-works”, although “clippy” is definitely easier to type.
The principle of “the more automation the better” applies of course to continuous integration (automatic building and testing) too. An earlier post on this blog described how to set up a local CI system, though for packages that aren’t aimed at embedded systems, CI services are also available in the cloud (and some SaaS engineers speak as if the term “CI” implies a cloud-based service, having seen no other design). Projects hosted in public Github repositories can take advantage of Github’s CI infrastructure (and should do, even if only as a quality filter on incoming pull requests); more on that later on.
Once a crate’s functionality starts to be split up into Cargo “features”, configuring CI to test all feature combinations can be a chore. The cargo-all-features crate can help with this, by running “cargo build” (or check, or test) on all combinations of features in one command. This also helps to surface any issues where features are interdependent, but where that interdependency isn’t spelled out in Cargo.toml. (For instance, in cotton-ssdp, the “async” feature doesn’t compile without the “std” feature enabled, so Cargo.toml specifies that dependency.) Running “cargo test-all-features” takes, in theory, time exponential in the number of features – but, at least in cotton-ssdp’s case, the great bulk of the work is done just once, and the 2ⁿ−1 additional builds are all very fast.
At the top of cotton-ssdp’s lib.rs are the lines
#![warn(missing_docs)]
#![warn(rustdoc::missing_crate_level_docs)]

which make undocumented public items (and a missing crate-level overview) into compiler warnings, and

#![cfg_attr(nightly, feature(doc_auto_cfg))]
#![cfg_attr(nightly, feature(doc_cfg_hide))]
#![cfg_attr(nightly, doc(cfg_hide(doc)))]

which enable nightly-only rustdoc features under a “nightly” cfg – of which more in section 6, as docs.rs builds set exactly that cfg.
One last quality check, if you can call it that, is provided by Rust’s borrow-checker and ownership rules. A common refrain from Rust developers is, “Oh, I’m currently fighting the borrow-checker”. But what this reminds me of most is an anecdote about a famous bull-fighter being asked what exercises he does to keep in shape for his toreadoring: “Exercises? There is something you don’t understand, my friend. I don’t wrestle the bull.” The anecdote appears in DeMarco and Lister’s “Peopleware” (and, slightly worryingly, apparently nowhere else on the whole of Google) – where the authors use it to suggest that if some activity feels like wrestling a bull, you’re probably not going about it in the most efficient or masterful way. Hand-to-hand combat with the borrow-checker usually means you’re not heeding its suggestion: that the design you’re pursuing could be done differently, with more clarity around ownership and memory-safety, and thus with better reliability and maintainability. The borrow-checker is like a compiler warning, but for your design instead of your implementation, and it’s actionable in the same way: it’s telling you that you might be able to barrel on regardless, but you should really stop and think about whether you’re doing things the best way.
Though admittedly, by that theory, it’s a shame that the borrow-checker only offers that suggestion at quite a late stage of the design process (viz., during actual coding); it’s a bit like being told, “You can’t lay that next brick there.” – “Why not?” – “Because the house that you’re building is going to be ugly.” It would have been better if the ugliness of the house had been spotted at the blueprint stage! There’s a gap in the market, and indeed in the discourse as a whole, for a book on Rust design patterns which would help stave off these issues at an earlier conceptual stage.
3. Not picking a lane when it comes to third-party ecosystems
There are two, not very compatible, ways of writing Rust networking code: synchronous and asynchronous. When hoping to write a library that’s useful to people (one of the stated goals), it makes most sense to me to try to be useful on both sides of this divide. After all, SSDP itself boils down to: packets arrive, and other packets go out in response – either to an incoming packet or to an expiring timer. That’s a usefully narrow-sounding interface. So what cotton-ssdp does is hide all the actual SSDP work away in a struct called Engine, and offer two as-thin-as-possible wrappers over it: one for the synchronous (mio) case (Service) and a second for the asynchronous (tokio) case (AsyncService).
This also involved abstracting over the difference between tokio::net::UdpSocket and mio::net::UdpSocket, using traits which are implemented for both and over which the relevant methods of Engine are generic:
/// Notify the `Engine` that data is ready on one of its sockets
pub fn on_data<SCK: udp::TargetedSend + udp::Multicast>(
    &mut self,
    buf: &[u8],
    socket: &SCK,
    wasto: IpAddr,
    wasfrom: SocketAddr,
)
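A simplified sketch of the pattern (hypothetical code – the real traits also let the caller pick the source address, and manage multicast group membership):

trait TargetedSend {
    fn send_datagram(
        &self,
        buf: &[u8],
        to: std::net::SocketAddr,
    ) -> std::io::Result<()>;
}

impl TargetedSend for mio::net::UdpSocket {
    fn send_datagram(
        &self,
        buf: &[u8],
        to: std::net::SocketAddr,
    ) -> std::io::Result<()> {
        self.send_to(buf, to).map(|_| ())
    }
}

impl TargetedSend for tokio::net::UdpSocket {
    fn send_datagram(
        &self,
        buf: &[u8],
        to: std::net::SocketAddr,
    ) -> std::io::Result<()> {
        // tokio sockets offer a non-blocking try_send_to, callable
        // from synchronous code
        self.try_send_to(buf, to).map(|_| ())
    }
}

Methods of Engine, like on_data above, can then be generic over these traits, and remain ignorant of which event loop is in charge.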
Both the Engine and UDP abstractions will hopefully also turn out to be exactly what’s needed when porting SSDP to embedded platforms using smoltcp or a similar embedded networking stack. (So in fact, it’s not quite right to say that the SSDP implementation is “hidden” away in Engine – Engine is part of the public library API.)
4. Effective techniques for multi-homed hosts
Networking applications that run on multi-homed hosts – machines with more than one TCP/IP-capable network interface – often need to take particular care over which networks services are offered to: it’s commonplace for a host such as a firewall to have interfaces onto networks with different security policies. (This is so rarely dealt with well that it’s also commonplace for such setups to resort to containerisation to ensure that certain services are only offered on certain interfaces – which does at least have the advantage that all such containers or services can have their network access configured in the same way, rather than in service-specific ways.)
But supporting multi-homed hosts well in a networking server isn’t a huge challenge, so long as it’s thought about early in the design process. The fundamental idea is always to keep track of which network interface is being used to talk to any given client; Linux, for instance, assigns long-lived “interface indexes” (smallish integers) to each available hardware interface. Knowing which interface packets arrived on (especially in the UDP case) gives the server enough information to know how to send its reply – and, in the particular case of a discovery protocol such as SSDP, can inform the content of the reply.
For example, if the SSDP resource being advertised is, from the server’s point-of-view, http://127.0.0.1:3333/test, and its Ethernet IP address is 192.168.1.3, and its wifi IP address is 10.0.4.7, then anyone listening to SSDP on Ethernet should be told that the resource is at http://192.168.1.3:3333/test and anyone listening on wifi should instead be told http://10.0.4.7:3333/test.
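A sketch of that rewriting step (hypothetical code – cotton-ssdp’s real rewrite_host differs in detail, and see section 5 for a bug that this very area grew):

fn rewrite_host(url: &str, ip: &std::net::Ipv4Addr) -> String {
    // Swap the host part of e.g. "http://127.0.0.1:3333/test" for the
    // given address, preserving any explicit ":port"
    if let Some((scheme, rest)) = url.split_once("//") {
        if let Some((host_port, path)) = rest.split_once('/') {
            return match host_port.split_once(':') {
                Some((_host, port)) => format!("{scheme}//{ip}:{port}/{path}"),
                None => format!("{scheme}//{ip}/{path}"),
            };
        }
    }
    url.to_owned()
}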
In a UDPv4 service such as SSDP, the Linux IP_PKTINFO facility can be used to obtain, at recvmsg time, the interface index on which a packet arrived; similarly, sendmsg can be told which interface to send on. (Fortunately, unlike SO_BINDTODEVICE, this does not require CAP_NET_RAW – and even that restriction on SO_BINDTODEVICE was lifted in kernel 5.7.) The SSDP server can then determine which of the host’s IP addresses that particular client should use, and rewrite the resource URL accordingly when replying.
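The receive side might look like this sketch, assuming a circa-2023 version of the nix crate (its exact signatures shift a little between releases):

use nix::sys::socket::{recvmsg, ControlMessageOwned, MsgFlags, SockaddrIn};
use std::io::IoSliceMut;
use std::os::unix::io::AsRawFd;

fn receive_one(socket: &std::net::UdpSocket) -> nix::Result<()> {
    let mut buf = [0u8; 1500];
    let mut iov = [IoSliceMut::new(&mut buf)];
    // Space for the IP_PKTINFO control message alongside the data
    let mut cmsg = nix::cmsg_space!(nix::libc::in_pktinfo);
    let msg = recvmsg::<SockaddrIn>(
        socket.as_raw_fd(),
        &mut iov,
        Some(&mut cmsg),
        MsgFlags::empty(),
    )?;
    for c in msg.cmsgs() {
        if let ControlMessageOwned::Ipv4PacketInfo(pi) = c {
            // pi.ipi_ifindex is the interface the packet arrived on,
            // which tells us which of our addresses to advertise
            let _ifindex = pi.ipi_ifindex;
        }
    }
    Ok(())
}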
With care, this can even be used to do the Right Thing when more than one network interface has the same IP network and netmask, just so long as they still have different IP addresses – a situation which is particularly likely to arise if RFC3927 “Zeroconf” networking is in use on multiple physical networks.
Although cotton-ssdp is currently IPv4-only, IPv6’s designers evidently noticed and ameliorated the same problem: APIs which take IPv6 link-local addresses also take a “Scope ID”, which is intended to be used very much as the interface-index is used here.
(Every time I see “multi-homed” written down, even if I wrote it myself, I read it the first time as “multi-horned”, which is a metaphor so much more apposite that it makes me wonder whether “homed” was just a typo that stuck.)
5. The journey to no_std
Rust draws a formal distinction between (what in the C world are called) hosted and freestanding implementations: hosted implementations are servers, desktops, and mobile platforms with the usual wide range of standard library facilities, and freestanding implementations are embedded systems where even standard library facilities are not necessarily available. (In fact C draws this distinction too, but as the standard C library is itself so low-level, only the most deeply-embedded systems are freestanding by C’s definition.)
Rust target platforms indicate their freestanding nature with “-none-” as the OS part of their target triple, such as in “thumbv7em-none-eabi”; Rust crates targeting such platforms indicate that by a crate-level #![no_std] attribute. In a no_std crate, much of the standard library is still available (very core parts such as Result and Option are in a crate called core, which works even if no heap is available; facilities that require an allocator live in a second crate called alloc), but anything which requires system-calls is not available.
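To illustrate (a sketch, not cotton-ssdp code):

#![no_std]

extern crate alloc; // fine, so long as the target provides an allocator

use core::time::Duration; // needs neither an OS nor a heap
use alloc::string::String; // needs a heap, but still no OS

fn label(d: Duration) -> String {
    alloc::format!("{} seconds", d.as_secs())
}

// ...whereas anything that ultimately makes a system call -- std::fs,
// std::net, std::time::Instant -- is simply absent.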
As cotton-ssdp is aimed at both hosted and embedded users, it needs to be possible (but optional) to compile it in no_std mode. The standard idiom for this, is to have a (Cargo) feature called std, which is usually enabled by default, but can be disabled to produce a no_std-compatible crate. This is taken care of in the code by adding, at the top of lib.rs, the line...
#![cfg_attr(not(feature = "std"), no_std)]
...and then in Cargo.toml, declaring std as a feature and making any non-no_std-compatible dependencies optional and dependent on std; the Cargo.toml changes are in commit 47a0290d, which also marks the examples and integration-tests as depending on std.
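The Cargo.toml side of the idiom looks something like this sketch (the dependency names here are illustrative, not cotton-ssdp’s exact manifest):

[features]
default = ["std"]
std = ["dep:mio"]            # dependencies usable only on hosted targets
async = ["std", "dep:tokio"] # async additionally requires std

[dependencies]
mio = { version = "0.8", optional = true }
tokio = { version = "1", optional = true }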
The issues that might arise when enabling no_std mode in a crate revolve around: system types whose names (or libraries) are different; system types which aren’t available at all; dependencies which themselves need their no_std builds enabled; and dependencies which have no no_std mode, and must be either encapsulated away or worked around altogether. All of those issues cropped up in cotton-ssdp.
The entire journey to no_std support can be seen in the pull request, though it might look a bit odd, because I only created the PR retrospectively, after the branch was already merged, for the purposes of this blog post; that seemed the easiest way to get all the commits listed in one place. The commits on that branch are, in chronological order:
- 364e7c5 Split UDP implementations into core, std, mio, tokio – once again, something seemed a bit tiresome (annotating everything in udp.rs with cfg(feature = "std")) but that tiresomeness actually represented a “compiler warning” about the design; the UDP traits belong in a separate source file to the std-using implementations, and they in turn belong in separate source files to each other. Now only the mod declarations need to have cfg annotations attached!
- e590562 Make message.rs no_std-compatible – some of the std facilities previously used here are available in core or alloc, but HashMap is not (because it relies on a random number generator), and nor is io::Cursor (as it uses io::Error, which is in std because it needs to be OS-specific in order to represent system-call errors). Fortunately, reimplementing the needed parts of Cursor from scratch is not hard (a sketch appears after this list).
- 0380edf Make UDP errors more ergonomic (and also more no_std-compatible) – rather than forcing SSDP errors into the io::Error template, use a custom error class (preparatory to cfg-ing out io::Error altogether). This actually makes the crate’s errors easier to use, by following the suggestions in a recent blog post I read about modularising errors.
- 47a0290 Featureise std, sync, and async – this is the above-mentioned commit which actually introduces the cargo-level features. As well as enabling no_std, making “async” a feature means that clients who need only the synchronous version, do not need to pull in the sizeable tokio dependencies. This commit also adds the no_std_net crate, which usefully provides the common IP address types in no_std builds, while being a pass-through for the standard-library ones in std builds. (A future release of Rust will move those types to core, but I didn’t want to wait.)
- fc344f0 Update dependencies – fixes test breakage from the previous commit, as cotton-ssdp’s tests require cotton-netif’s sync feature, not just its std one. Fortunately, using cargo test-all-features made this issue show up straightaway.
- 0005fe0 Factor RefreshTimer out of Engine – it’s desirable for Engine to compile in no_std mode, because that’s intended to be the basis even of embedded implementations of SSDP. But Engine used std::time::Instant, which (especially in the now method) relies on system-specific clock facilities (std::time::Duration does not rely on those facilities, and is available in no_std builds as core::time::Duration). In practice, most embedded platforms support some variant of rtic_monotonic::Monotonic for clock facilities (a list of implementations of that trait for common microcontrollers can be found on crates.io), but abstracting over both that and the standard library felt like an abstraction too far, so instead this commit moves the whole refresh-timer mechanism out into a separate struct, so that an embedded implementation depending on Monotonic can be provided later and just slotted-in.
- 6899933 Actually set no_std for non-feature-"std" – this is where the crate is first built with no_std set, which necessitated use of the alloc crate to find String and format!, as well as cfg-ing off some uses of std that couldn’t be ported. In particular, the variant of the udp::Error enum which wraps an underlying system-call error is, of course, not no_std-compatible. The #[non_exhaustive] attribute tells the Rust compiler that more variants (perhaps one encapsulating a smoltcp error) may be added in future; a sketch of the resulting enum appears after this list.
- 5ac330e Remove "url" crate from public API – the url crate was a hard dependency, used within Engine (for rewriting resource URLs on a per-IP-address basis, see above), which does not compile under no_std. Fortunately only a tiny fraction of its functionality is actually used in cotton-ssdp – functionality which can, as the least-worst option, just be reimplemented from scratch. The first thing to do is to eliminate the url::Url type from the public interface; this is a breaking change which would force a semver major-version bump, were it not for the fact that the major version already needed bumping due to other API changes since 0.0.1.
- 38694d1 Remove usage of "url" crate – this is where the url.set_ip_host call is replaced with a simpler equivalent. Notably this new code has a bug, only fixed later in b92ded1, which causes it to do the Wrong Thing with any URL containing an explicit port (such as, er, the example ones discussed above where the need for the rewriting is explained).
- 261a6bb Replace HashMap usage with BTreeMap – as mentioned, HashMap is not available in no_std, but BTreeMap is, and has essentially the same interface.
- 0faacd5 Actually compile ssdp including Engine with no_std – finally cargo build --no-default-features works (including Engine and Message). Moreover, so does cargo build --no-default-features --target thumbv7em-none-eabi – that’s a freestanding target with no std support whatsoever.
- b92ded1 Fix rewrite_host bogusly stripping ":port" from URLs – fixes a bug introduced when removing the “url” crate. It seems pretty likely that the only forms of URL that make any sense as SSDP locations start with http://127.0.0.1..., not even HTTPS – but the Principle of Least Surprise suggests that the code should at least not flake out if given one in a different form.
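Here, as promised, is a sketch of a minimal no_std Cursor replacement – just enough to write message bytes into a caller-supplied buffer (cotton-ssdp’s real one differs in detail):

struct Cursor<'a> {
    buf: &'a mut [u8],
    pos: usize,
}

struct Overflow; // hypothetical error type

impl<'a> Cursor<'a> {
    fn new(buf: &'a mut [u8]) -> Self {
        Self { buf, pos: 0 }
    }

    fn write_all(&mut self, data: &[u8]) -> Result<(), Overflow> {
        let end = self.pos + data.len();
        if end > self.buf.len() {
            return Err(Overflow); // no room left in the buffer
        }
        self.buf[self.pos..end].copy_from_slice(data);
        self.pos = end;
        Ok(())
    }
}

And the cfg’d error enum from 6899933 has roughly this shape (again a sketch, not the exact cotton-ssdp type):

#[non_exhaustive]
#[derive(Debug)]
pub enum Error {
    Parse,
    #[cfg(feature = "std")]
    Socket(std::io::Error), // only exists where system calls do
}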
Between them, those commits hopefully tell the whole story of taking a crate initially written in hosted Rust, and making it work in both hosted and freestanding applications.
6. Using the remarkable free tools available
Most of the anecdotes in this post are hopefully applicable to both proprietary and open-source software development. But in the open-source case in particular, it’s worth looking into the freely-available online tools that support such development efforts. Even getting started in Rust development introduces everyone to crates.io and to docs.rs, but it’s also worth looking into:
- Github Actions cloud CI
- Continuous-integration servers run by Github, and automatically triggered on pushes to Github repositories (if your project contains YAML files for workflows, perhaps a bit like cotton’s one).
- docs.rs
Documentation pages on docs.rs are built automatically for all crates published to crates.io. The build is always done with a nightly toolchain, so you can use nightly-only features such as doc_auto_cfg – just make sure you leave your crate buildable by stable and other non-nightly toolchains, perhaps by guarding the doc_auto_cfg attribute like this:
#![cfg_attr(nightly, feature(doc_auto_cfg))]
[package.metadata.docs.rs]
all-features = true
rustdoc-args = ["--cfg", "nightly"]
- deps.rs
- Checks the dependencies in your Cargo.toml, and warns you about out-of-date versions (and warns you strongly about ones with known security problems). It’s wired up to Github, so can be used on any project published there. (For Cotton, that means there’s one page for the whole monorepo, not one page per crate, but it’s subdivided into the dependencies of each crate.)
- codecov.io
- Code-coverage as a service. You have to sign up for an account, but it’s free for public repositories. Support can be incorporated into Github CI workflows, which is how it works for cotton-ssdp.
The neat thing is that many of these services provide “badges”: little status icons that you can add to README.md on Github, or to any web page. The statuses below aren’t just a pretty picture: each one is the live status; on most of them you can click through to the underlying data; and, as you’re reading them right now, you might be noticing an issue before I do: