Not in fact any relation to the famous large Greek meal of the same name.

Monday, 2 December 2024

Things I learned writing a USB host stack in Rust for RP2040

“I have of late – but wherefore I know not –” had it occur to me to get to know USB a bit better by implementing USB host from scratch. At a former employer, where an embedded USB host stack was going to be needed, I remember one grizzled developer saying to me, “Why are we licensing one in? Just read the spec and write one!” The company licensed one in (briefly, before canning the entire project) and, speaking as someone who has, over the past month or so, just read the spec and written one: they were probably right to do so.

But for learning about USB – or, put differently, when it’s the journey that’s important, not the destination – writing a USB host stack turns out to be not a completely ludicrous thing to attempt, especially with simplifying assumptions such as USB1.1 and 2.0 only, and no power-delivery or battery-charging. This blog post is the origin story of the cotton-usb-host crate, and details some of the things I learned when writing it.

Wiring-up async/await to IRQ handlers

I won’t repeat here the good bits of this excellent article which finally cleared up for me how the very high-level concept of awaiting on a Future meets up with the very low-level concept of an IRQ handler (and which you should just read), but I’ll give the gist of it.

A Future in Rust gets polled, and when that happens it can either detect that the thing it’s waiting for has happened – and return Ready – or it can return Pending. The key thing it must do, is that if it does return Pending, it needs to set in motion events that will cause the calling task to be reawoken when the Future’s status has changed. Most Future tutorials deal with networking servers of various sorts, where the task gets reawoken when a file descriptor becomes readable; in the embedded world, things are a bit lower-level, and the impetus to resume probably comes in the form of a hardware interrupt, handled by an IRQ handler routine.

The way that these are tied together, is that Future::poll gets passed a Context object, which contains a Waker object. Waker is Clone, Send, and Sync, so it can be freely passed through the system to where it’s needed: in this case, most likely stored in a static variable so that it can be accessed from the IRQ handler. So this is exactly what the Future::poll method for, say, Rp2040ControlEndpoint does. The RTIC 2 framework provides a wordily-named but vital component here, CriticalSectionWakerRegistration. This object is capable of safely storing a Waker using interior mutability, making it ideal for keeping in a global static. (It’s also completely independent of the rest of RTIC, so it’s just as usable in Embassy-based applications.)

A bit of care is needed to avoid race conditions – interrupts slipping through the gaps and going unanswered – but the general pattern is:

fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Result> {
    if ...hardware condition is met... {
        Poll::Ready(result)
    } else {
        ...store cx.waker() away...
        ...ensure the interrupt is enabled...
        Poll::Pending
    }
}

Then all the IRQ handler has to do, is figure out (by inspecting peripheral registers) exactly what caused the IRQ, i.e. which of the various successes and error conditions has occurred, and call wake() on the correct Waker. In cotton-usb-host, you can see this mechanism in action in UsbShared::on_irq().

Again a small amount of care is needed to avoid both lost interrupts (when the interrupt is accidentally not enabled at the time it actually occurs) and interrupt storms (when the IRQ handler doesn’t acknowledge or clear the hardware interrupt, so it’s immediately re-entered and nothing else gets done). It’s vital that the IRQ handler, and not Future::poll(), acknowledges the interrupt: Future::poll() is not run in IRQ context, but in normal thread context, and if the IRQ handler keeps getting re-entered then Future::poll() will never get to run! The interrupt handler for RP2040 USB does its best to acknowledge interrupts without disabling them, but for those which cannot be acknowledged without consuming important data, it just disables them in the interrupt-enable register and relies on the corresponding Future::poll() to re-enable the ones it needs, as in this pattern:

fn irq_handler() {
    if ...hardware condition is met... {
        waker.wake();
        ...acknowledge interrupt if possible...
    }
    ...disable any interrupts that are still pending...
}
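The whole round trip can be exercised on the host with a toy version of the mechanism. This is a minimal sketch, not the real driver: an AtomicBool stands in for the hardware status register, a Mutex<Option<Waker>> stands in for CriticalSectionWakerRegistration, and fake_irq() stands in for the IRQ handler.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Stand-in for the shared state: in the real driver this is a
// peripheral status register plus a CriticalSectionWakerRegistration
// in a global static.
struct Shared {
    irq_fired: AtomicBool,
    waker: Mutex<Option<Waker>>,
}

struct IrqFuture(Arc<Shared>);

impl Future for IrqFuture {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0.irq_fired.load(Ordering::Acquire) {
            return Poll::Ready(());
        }
        // Store the waker, then re-check the flag: an "interrupt"
        // arriving between the first check and the store is not lost.
        *self.0.waker.lock().unwrap() = Some(cx.waker().clone());
        if self.0.irq_fired.load(Ordering::Acquire) {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

// What the IRQ handler does: record the event, then wake the task.
fn fake_irq(shared: &Shared) {
    shared.irq_fired.store(true, Ordering::Release);
    if let Some(w) = shared.waker.lock().unwrap().take() {
        w.wake();
    }
}

struct NoOpWaker;
impl Wake for NoOpWaker {
    fn wake(self: Arc<Self>) {}
}

// Drive the future by hand: Pending before the "IRQ", Ready after.
fn demo() -> (bool, bool) {
    let shared = Arc::new(Shared {
        irq_fired: AtomicBool::new(false),
        waker: Mutex::new(None),
    });
    let mut fut = pin!(IrqFuture(shared.clone()));
    let w = Waker::from(Arc::new(NoOpWaker));
    let mut cx = Context::from_waker(&w);
    let before = fut.as_mut().poll(&mut cx).is_pending();
    fake_irq(&shared);
    let after = fut.as_mut().poll(&mut cx).is_ready();
    (before, after)
}
```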

Unit tests of async code needn’t themselves be async

This is something I wish I’d learned before writing some of the cotton-netif and cotton-ssdp unit tests. Asynchronous Rust exhibits something a bit like wave/particle duality, which in this case might be called async/poll duality. (The duality is so close that you can define async methods in traits, and then in the impl block write them as normal functions returning impl Future – or vice versa.) You can unit-test asynchronous code “in async-land” with the #[tokio::test] attribute and explicit awaits, but you can also test it “in poll-land” (biało czerwoni 🇵🇱) by observing that under the hood, the result of making a non-async call to an async function isn’t a compiler error or any sort of magic woo, it’s just a Future (or perhaps a Stream), and you can quite happily call the async function from non-async code and then just poll the future manually to drive it through its states to complete the test.
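For instance (a trivial sketch, nothing to do with USB): calling an async fn from non-async code just hands back the future, which can then be polled by hand with a no-op waker.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// An async fn is just sugar for a function returning impl Future.
async fn add_one(x: u32) -> u32 {
    x + 1
}

struct NoOpWaker;
impl Wake for NoOpWaker {
    fn wake(self: Arc<Self>) {}
}

// Non-async code: calling add_one() runs none of its body, it just
// returns a Future, which we can poll manually to completion.
fn run_by_hand() -> u32 {
    let mut fut = pin!(add_one(41));
    let w = Waker::from(Arc::new(NoOpWaker));
    let mut cx = Context::from_waker(&w);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(v) => v,
        // An async fn with no awaits completes on its first poll.
        Poll::Pending => unreachable!(),
    }
}
```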

Typically, to get good unit-test coverage, unit tests of async code come in threes: the case where poll succeeds, the case where it fails, and the case where it pends. But all three can be tested in non-async tests; here’s a unit test from UsbBus that checks that UsbBus::set_address returns pending when the underlying USB transfer returns pending:

#[test]
fn set_address_pends() {
    do_test(
        |hc| {
            hc.expect_control_transfer()
                .times(1)
                .withf(is_set_address::<5>)
                .returning(control_transfer_pending);
        },
        |mut f| {
            let mut fut = pin!(f.bus.set_address(unaddressed_device(), 5));
            let poll = fut.as_mut().poll(&mut f.c);
            assert!(poll.is_pending());
            let poll = fut.as_mut().poll(&mut f.c);
            assert!(poll.is_pending());
        }
    );
}

I should mention do_test(): when writing the unit tests for UsbBus (there are about 100 of them), I found there was a lot of boilerplate which kept getting repeated. So I abstracted it away into a function, leaving two closures to represent the unique bits of each test: one to set up the expectations (using the excellent Mockall crate), and one to run the test (using a Fixture struct containing everything it might need) and verify the outcomes:

fn do_test<
    SetupFn: FnMut(&mut MockHostControllerInner),
    TestFn: FnMut(Fixture),
>(
    mut setup: SetupFn,
    mut test: TestFn,
) {
    let w = Waker::from(Arc::new(NoOpWaker));
    let mut c = core::task::Context::from_waker(&w);

    let mut hc = MockHostController::default();

    setup(&mut hc.inner);

    let f = Fixture {
        c: &mut c,
        hub_state: HubState::default(),
        bus: UsbBus::new(hc),
    };

    test(f);
}

This cleaned the tests up so much that I’ve started adopting the pattern in other crates too.

Rust unit tests don’t have to be in the same file

The central UsbBus structure in cotton-usb-host has about 700 lines of code in its implementation (according to test coverage); it also has about 3,500 lines of unit tests. Keeping both in the same file, which is a pattern encouraged by Rust, made it unwieldy to read the implementation or to search for things in it. But that pattern isn’t compulsory: the tests need to be a submodule of the usb_bus module (in order to have white-box-test access to the innards of UsbBus), but submodules can always be in separate files. The alternative pattern, which I adopted, is to put the tests – not just for usb_bus.rs, but for every module – in a tests/ subdirectory; the file src/usb_bus.rs ends like this:

#[cfg(all(test, feature = "std"))]
#[path = "tests/usb_bus.rs"]
mod tests;
and all the unit tests are in src/tests/usb_bus.rs.

This has the secondary advantage that, for some reason, code-coverage tools ignore the coverage of files in a tests directory – meaning that you don’t get annoying region-coverage misses for test code (which you don’t care about) bringing down your average for production code (which you do care about).

USB is a bus, and USB hubs are hubs (not switches)

Sometimes the clue is right there in the name. But, being a perennial encapsulator, an ardent hider of messy details from client code, I’d initially hoped to implement USB hubs as self-contained entities that each operate independently from the rest of the bus.

But that’s just not how USB works. In particular, when a new USB device is detected and in the Default state, it’s listening on address zero (and is expecting to immediately receive a SetAddress request giving it a proper unique address in the range 1-127). And, well, USB is a bus. So (give or take some considerations about operating at different speeds), all devices receive all packets and simply ignore the ones that aren’t addressed to them. Which means that only one device on the whole bus can be in the Default state (listening as device zero) at any one time – which, in turn, means there needs to be a bus-wide hub state machine that ensures that no port on any hub can be reset while any port on any other hub is being reset. This is workable, but does slightly offend my sense of proper encapsulation: it could have been avoided if SetAddress (the only thing that devices in the Default state need to receive) had been a port-directed message to the hub rather than a message to the device itself.

(How the bus-wide state machine works in UsbBus::device_events is a bit subtle: once a hub interrupt packet has been received – indicating the presence of a newly-detected device – that newly-detected device is processed from Powered to Default all the way to Address without returning to the top of the function to check for fresh interrupt packets.)
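The heart of that bus-wide constraint can be sketched as a single mutual-exclusion state machine. This is a hypothetical illustration (the names and types here are invented, not cotton-usb-host’s actual implementation): “address zero is in use” acts as a lock that every hub port must take before it can be reset.

```rust
// Hypothetical sketch: only one port bus-wide may have a device in
// the Default state (listening on address zero) at any one time.
enum BusState {
    Idle,
    // Which hub/port currently owns address zero.
    Enumerating { hub: u8, port: u8 },
}

impl BusState {
    // A port may only start a reset when no other port is mid-reset.
    fn try_start_reset(&mut self, hub: u8, port: u8) -> bool {
        match self {
            BusState::Idle => {
                *self = BusState::Enumerating { hub, port };
                true
            }
            // Some other port's device is still on address zero.
            BusState::Enumerating { .. } => false,
        }
    }

    // Called once the new device has been given a real address
    // (1-127) by SetAddress, freeing address zero again.
    fn finish_enumeration(&mut self) {
        *self = BusState::Idle;
    }
}
```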

The zerocopy crate is bad for code-coverage, but bytemuck is better

Both bytemuck and zerocopy set out to solve the same problem: making it easier to soundly convert between Rust #[repr(C)] structures and byte slices. Such conversions are commonplace wherever structured data is transferred over a byte-stream connection: networking, of course, but also USB. Jack Wrenn, one of the maintainers of zerocopy, wrote an excellent summary of why this matters, and what subtleties are lurking in that seemingly straightforward constraint of “soundness”. (That post also introduces a third solution, TransmuteFrom, currently in nightly and hoping one day to land in the standard library.)

Unfortunately, zerocopy’s marker traits work, in part, by having a hidden trait function which is implemented in the auto-derived implementations of those traits. This function is never called – it’s just there to police against non-auto-derived implementations, which the zerocopy soundness guarantees rely on – and so it appears as untested in code-coverage analysis. The alternative bytemuck crate does not use this mechanism, and so does not pollute code-coverage analysis with unfixable false negatives.

Many heads are better than one

Especially, it turns out, when they’re the heads of receive queues. As well as the RP2040 host-controller implementation in cotton-usb-host, I’ve been looking at supporting the STM32 USB host peripheral – specifically, the Synopsys DWC one in STM32F7 and other “F-series” STM32 parts.

And actually the Synopsys one is decidedly more awkward to deal with. On RP2040, interrupt endpoints are polled in hardware (you tell it the polling interval in milliseconds/SOFs), and if a packet arrives, it appears in a memory-mapped buffer specific to that endpoint. On STM32F7, however, there is a single receive queue for all endpoints, plus a register that tells you which endpoint the current head-of-queue packet was received on.

That’s actually a bit unfortunate, as it means that, for interrupt endpoints especially, the application must be prepared to receive data on any endpoint at any time – if a task opens an interrupt endpoint but then waits on anything else happening, a packet arriving on that interrupt endpoint will jam all USB traffic until somebody gets round to reading it. That feels like an unacceptable constraint on how clients of cotton-usb-host must be written – and indeed, cotton-usb-host’s own USB hub support would fall foul of such a restriction. So it’s likely that the implementation will have to read the packet queue in the IRQ handler and store any interrupt packets somewhere in RAM (in global statics) to be read later, just to unblock the queue so that subsequent packets can be read and tasks can make progress.
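A host-runnable sketch of that plan (illustrative only: in the real driver the drain step runs in the IRQ handler against the hardware FIFO, and the per-endpoint buffers live in global statics, not a VecDeque in heap):

```rust
use std::collections::VecDeque;

const MAX_ENDPOINTS: usize = 4;

// A packet as it comes off the single shared receive queue, tagged
// with the endpoint it arrived on (as the STM32F7 register reports).
struct Packet {
    endpoint: usize,
    data: Vec<u8>,
}

// Per-endpoint buffers: draining the shared queue into these means an
// unread interrupt packet on one endpoint can't block traffic on
// every other endpoint.
struct Demux {
    per_endpoint: [VecDeque<Vec<u8>>; MAX_ENDPOINTS],
}

impl Demux {
    fn new() -> Self {
        Self {
            per_endpoint: std::array::from_fn(|_| VecDeque::new()),
        }
    }

    // What the IRQ handler would do: always pop the head of the
    // shared queue, stashing packets for endpoints nobody is
    // currently reading.
    fn drain(&mut self, shared_queue: &mut VecDeque<Packet>) {
        while let Some(pkt) = shared_queue.pop_front() {
            self.per_endpoint[pkt.endpoint].push_back(pkt.data);
        }
    }

    // What a task's Future::poll would do later, at its leisure.
    fn read(&mut self, endpoint: usize) -> Option<Vec<u8>> {
        self.per_endpoint[endpoint].pop_front()
    }
}
```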

The USB peripheral in STM32H5 and STM32C0 is a different one, which does have per-endpoint receive buffers and so doesn’t suffer from head-of-line blocking problems; it should be easier to deal with, especially in RAM-constrained systems.

There isn’t an ArrayOfOptionStream stream, but you can just write one

The USB hub support in cotton-usb-host needs to listen out for interrupt-endpoint packets from all hubs on the bus simultaneously. The futures-util crate contains several functions for composing streams and futures into other streams and futures, including select() and select_all(), but none quite fit the bill (select_all() comes closest, but requires alloc, which nothing else in cotton-usb-host does).

But, in a similar epiphany to the one I had about unit tests, I realised that by working in “poll-land” I could just write one. (After all, someone just wrote select_all(); there’s no magic compiler support for it.) So cotton-usb-host now includes a HubStateStream, containing an array of Option<Stream>, whose poll_next method just polls all the array elements which are Some.

Except that I got halfway through writing it and realised that I couldn’t. In order to call poll_next on the contained Stream, it needed to be a Pin<&mut Stream>. But, even though inside its own poll_next, HubStateStream must surely itself be pinned, meaning that its directly-contained streams must implicitly also be pinned – the compiler didn’t know that. This problem is referred to as “pin projection”, and there are crates that help deal with it, but again none seemed to quite fit the bill. And then I realised that select_all() and friends impose an additional constraint on the streams they work with: they must be Unpin, which means they support the Stream::poll_next_unpin() method, which can be called on a plain &mut Stream. Just so long as the interrupt-endpoint pipes I was using could be made Unpin (by declaring in trait HostController that associated type InterruptPipe must have Unpin as a supertrait), I wouldn’t need any pin-projection crates.

Not every Stream, of course, is Unpin. A Stream that is the result of an async block (with an anonymous type) might not be Unpin, as such streams typically embed internal pointers – but streams of named types, written in poll-land, usually are Unpin, and indeed the one I’d written for RP2040 already was. So it didn’t seem inappropriate to impose that constraint on other host-controller implementors.
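The shape of the combinator, sketched with a home-grown stand-in trait so the example stays dependency-free (the real code implements futures_core::Stream and uses poll_next_unpin; all names here are illustrative):

```rust
use std::task::{Context, Poll};

// Stand-in for Stream: the Unpin bound is what lets us poll through
// a plain &mut without any pin projection.
trait SimpleStream: Unpin {
    type Item;
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

// The "ArrayOfOptionStream": polls every slot that is Some.
struct MultiStream<S: SimpleStream, const N: usize> {
    streams: [Option<S>; N],
}

impl<S: SimpleStream, const N: usize> MultiStream<S, N> {
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<S::Item>> {
        for slot in self.streams.iter_mut() {
            if let Some(s) = slot {
                match s.poll_next(cx) {
                    // A finished stream vacates its slot.
                    Poll::Ready(None) => *slot = None,
                    Poll::Ready(item) => return Poll::Ready(item),
                    Poll::Pending => {}
                }
            }
        }
        Poll::Pending
    }
}

// A toy stream yielding items from a Vec, for demonstration.
struct Iterish(std::vec::IntoIter<u32>);
impl SimpleStream for Iterish {
    type Item = u32;
    fn poll_next(&mut self, _cx: &mut Context<'_>) -> Poll<Option<u32>> {
        Poll::Ready(self.0.next())
    }
}

// Collect everything the combined stream yields before it pends.
fn demo() -> Vec<u32> {
    use std::sync::Arc;
    use std::task::{Wake, Waker};
    struct NoOp;
    impl Wake for NoOp {
        fn wake(self: Arc<Self>) {}
    }
    let w = Waker::from(Arc::new(NoOp));
    let mut cx = Context::from_waker(&w);
    let mut ms = MultiStream {
        streams: [Some(Iterish(vec![1, 2].into_iter())), None],
    };
    let mut out = Vec::new();
    while let Poll::Ready(Some(x)) = ms.poll_next(&mut cx) {
        out.push(x);
    }
    out
}
```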


About Me

Cambridge, United Kingdom
Waits for audience applause ... not a sossinge.
CC0 To the extent possible under law, the author of this work has waived all copyright and related or neighboring rights to this work.