Not in fact any relation to the famous large Greek meal of the same name.

Monday 9 September 2024

Rustifying Lakos: Avoiding circular dependencies in Rust

Previously on #rust:

     

It would be unfair to John Lakos to gloss his monumental book Large-Scale C++ Software Design as basically just saying “Don’t have circular dependencies”. The book contains lots of hard-earned useful advice on the architecting of C/C++ systems, and if the words that make up its title, large-scale C++ software design, form any part of your daily life then I can only suggest that you just go and read it. (I... won’t wait. It’s a hefty tome. I confess I haven’t read any of his ongoing Peter-Jackson-esque project to expand it across three volumes.)

But it is fair to say that much of it consists of very C++ solutions to very C++ problems, and by no means all of it is relevant to large-scale software systems in other languages. Amongst all the pieces of good advice imparted, though, “Don’t have circular dependencies” is perhaps the most widely-applicable.

Rust’s cargo package manager effectively rules out circular dependencies between crates (i.e., libraries). But there’s no such built-in barrier to creating circular dependencies between modules within a crate. This blog post explains the third-party tooling that can be used to erect such a barrier.

But why, though?

Some of the benefits of non-circularity, or levelisability, lauded in the Lakos book apply more to C++ than to Rust: compilation-speed improvements, for instance, where in C++ the translation unit is the source file but in Rust it is the crate. But others still very much apply:

  • Integrity of structural induction: By “structural induction” I mean the idea that unit-testing can be seen as proving that your code is correct; the “proofs” in question aren’t always quite as rigorous as what a mathematician would recognise as a proof, but mathematics does provide a useful structure and language for reasoning about these “proofs”. For instance, a mathematician would say, if you’ve proven that certain pieces of code A and B do their jobs correctly – and that another piece of code C(x,y) does its own job correctly, just so long as the dependencies x and y that it gets given each do their own jobs correctly – then all told, you’ve proven that the overall combination C(A,B) is itself correct. With this mental model, you can use unit-testing to go about making a tree of proofs: first that each lowest-level component in your system does its job correctly, then that the components one level higher (with lowest-level components as dependencies) do their jobs correctly, and so on all the way up to your system as a whole. If your system’s dependency “tree” isn’t a tree at all but has loops or cycles, this proof falls to pieces – or, at the very best, you have to awkwardly test the whole loop of components as a single entity.

    If you didn’t do Test-Driven Development (perhaps you did some “Spike-Driven Exploration” first), and are only just now adding unit-tests to an existing crate, then this is exactly the information you need in order to direct your test-writing efforts where they’re most useful: write tests for the leaves of the dependency graph first, which are usually the foundational parts where failings would cause hard-to-understand cascading failures throughout the crate, then work your way up to the more complex modules.

  • Reusability: Since the days of CORBA and COM, efforts to foster a wide ecosystem of reusable software components have either failed, or succeeded in ways that were worse than failure. But reusability within an organisation or a project is still important, and it’s eased when components can be cleanly separated, without false dependencies on huge swathes of the surrounding code. (Extra “extremely online” points if you guessed what story that link would lead to before clicking on it.)
  • Ease of understanding: Just like an induction proof or like a reuse attempt, any attempt to understand the operation of the code – by any “new” developer, which includes future-you-who’s-forgotten-why-you-did-that – is easier if it can proceed on a component-by-component basis, as opposed to needing a whole interlinked cycle of components to be comprehended in one go.
  • Cleanliness, and its proximity to godliness: As I wrote on this very blog more than thirteen years ago, cyclic dependencies are a “design warning” – like a compiler warning, but for your design instead of your implementation. And just like a compiler warning, sometimes there’s a false positive – but keeping your code warning-free is still helpful, because it makes sure that when a true positive crops up, it’s breaking news, and isn’t lost in the noise.

But how, though?

#!/bin/bash -xe
#
# usage:
#    bin/do-deps
#    (browse to target/deps/index.html)

PACKAGES=`cargo metadata --no-deps --format-version 1 | jq '.packages[].name' --raw-output`

mkdir -p target/deps
echo "" > target/deps/index.htm

for PKG in $PACKAGES ; do
    echo "<img src=$PKG-deps.png><pre>" >> target/deps/index.htm
    cargo modules dependencies --package $PKG --lib \
          --no-externs --no-sysroot --no-fns --no-traits --no-types \
          --layout dot > target/$PKG-deps.dot
    sed -E -e 's/\[constraint=false\]//' -e 's/splines="line",//' \
        -e 's/rankdir=LR,//' \
        -e 's/label="(.*)", f/label="{\1}", f/' \
        < target/$PKG-deps.dot > target/$PKG-deps2.dot
    tred < target/$PKG-deps2.dot > target/$PKG-deps3.dot 2>> target/deps/index.htm
    dot -Tpng -Gdpi=72 < target/$PKG-deps3.dot > target/deps/$PKG-deps.png
    echo "</pre><hr/>" >> target/deps/index.htm
done

First of all, the (built-in to Cargo) cargo metadata command is used to list all the crates in the current workspace (usefully, if invoked in a non-workspace crate, it just returns that crate’s name).

Then the very excellent cargo modules command does most of the work. I’m only using a tiny part of its functionality here; you should read its documentation to learn about the rest, which might be of especial interest if you’d like to model your project’s dependencies in a different way from that presented here. That page also describes how to install it (but it’s just “cargo install cargo-modules”).

By default, cargo modules shows all the individual functions, traits and types that link the dependency graph together; for all but the smallest project, that’s too much clutter for these purposes, as we just want to see which modules depend on which other ones. Temporarily removing the --no-fns --no-traits --no-types arguments will show the whole thing, which can be helpful if it’s not obvious exactly why one module is reported as depending on another. (Currently, cargo modules has some issues around reporting impl X for Y blocks and around static data items – but issues caused by those are hopefully rare, and it’s still enormously better than nothing.)

The sed command has the effect of making the dependency graph run top-to-bottom rather than left-to-right; that seemed easier to read, especially in the blog-post format used here.

Each graph-definition (*.dot) file is drawn as a PNG image, and a simple HTML page is generated that just inlines all the images. This script is run in CI and the HTML page (with its images) is included in the output artifacts of that job.

Before and after

The (as yet unpublished) cotton-net crate exhibited several dependency cycles when analysed with that script; in this image there are three upward-pointing arrows, each indicating a cycle. (The colours represent visibility: almost everything here is green for “public visibility”.)

warning: %1 has cycle(s), transitive reduction not unique
cycle involves edge cotton_net::arp -> cotton_net

The graph can’t, of course, really indicate which components should be seen as “at fault” here – though, for each cycle, at least one of the components in that cycle must be changed for the cycle to be broken. Instead, determining which of the components is “at fault” is a very human endeavour, usually involving the train of thought, “Wait, why does thing-I-think-of-as-low-level depend on thing-I-think-of-as-higher-level?” In this particular case, some very low-level abstractions indeed (IPv4 address, MAC address) were in the crate root “for convenience”; the convenience factor is real but it can be achieved in a levelisable way by defining everything in its right place, and then exporting them again from the top-level (“pub use ethernet::MacAddress”) for library-users’ convenience. (Such exports of course count as a dependency of the root on the low-level module, which is the direction we don’t mind.)

Sometimes a cycle can be broken by separating out a trait from a contentious structure: in this case, the new interface_data trait was extracted from interface, which meant that all the individual protocols (TCP, UDP) could depend on the trait, not on the implementation. This is a bit like Lakos’s “Dumb Data” refactoring from “Large Scale C++ Software Design”, section 5.5; indeed, the whole of Chapter 5 of that book presents refactorings that can help disentangle cycles in Rust just as well as they can in C++.

Because of the way Graphviz’s tred tool works – indeed, because of the way the mathematical “transitive reduction” operation inherently works – it’s not actually very important how many cycles appear in the graph, only whether there’s some or none. In this case, as well as the three cyclic dependencies via the crate root, the modules interface and tcp depended directly on each other in a pathologically-small cycle; the graph doesn’t depict the tcpinterface arrow directly, as it’s implied by the larger cycle via tcpethernet→root→dhcpinterface. So in fact there were more than three issues with the original cotton-net (which, in my defence, was one of the first things I tried to write in Rust; some of the commits in there are over eight years old) – the fixing of dependency cycles is an iterative process, and each time you fix one you should re-check for the existence of further ones until there are none left.

No comments:

Post a Comment

About Me

Cambridge, United Kingdom
Waits for audience applause ... not a sossinge.
CC0 To the extent possible under law, the author of this work has waived all copyright and related or neighboring rights to this work.