
Saturday 12 September 2009

Recursive Make Is A Time-Waster

Now you’ve arranged to divide your project into libraries, how do you go about compiling them?

Clearly you want to be able to do the whole thing in one go: to do otherwise would be to fail the Joel test. And you want all the dependencies to be checked every time: there’s no bug harder to find than one that no longer actually exists in the source you’re looking at. (And stale objects, not updated following a header change, can lead to ODR violations: in other words, to random spookiness and borkage that the compiler and linker aren’t required to warn you about, and in many cases can’t even theoretically do so.)
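
Checking the header dependencies every time doesn't have to be expensive, by the way. One widely-used approach, sketched here on the assumption that you're using GCC or Clang and with made-up file names, is to have the compiler write out a dependency fragment as a side-effect of each compile and then pull those fragments back into the Makefile:

    # Sketch only: compiler-generated dependency files, GCC/Clang style.
    SRCS := foo.c bar.c
    OBJS := $(SRCS:.c=.o)
    DEPS := $(OBJS:.o=.d)

    all: $(OBJS)
    .PHONY: all

    # -MMD writes foo.d alongside foo.o; -MP adds dummy rules so that
    # deleting a header doesn't break the build.  (Recipe lines begin
    # with a tab, as ever.)
    %.o: %.c
            $(CC) $(CFLAGS) -MMD -MP -c $< -o $@

    # Pull the generated fragments back in; the leading '-' stops make
    # complaining on the first build, before any .d files exist.
    -include $(DEPS)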

GNU Automake is one answer to the problem, and makefiles in the same spirit as the ones it generates are popular even in projects that don’t use Automake. Such makefiles express inter-module dependencies, such as that of an application on a library, by recursively invoking make in the library directory before invoking it again in the application directory.
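
If you've not met one, the top level of such a scheme looks roughly like this; it's a hand-rolled imitation of the Automake pattern rather than Automake's actual output, with invented directory names:

    # Recursive style: the top-level Makefile knows only about directories,
    # and visits them one after another in a fixed order.
    SUBDIRS = libfoo libbar app     # libraries first, application last

    all:
            for dir in $(SUBDIRS); do \
              $(MAKE) -C $$dir all || exit 1; \
            done

    .PHONY: all

Each directory then has its own Makefile, which knows only about its own sources.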

For a variety of reasons, documented once and for all in the famous paper Recursive make Considered Harmful, this is a bad idea. The paper suggests that for some systems, it’s even a bad idea for correctness reasons — that incorrect builds with inadequately-followed dependencies can result. But the example given is that of a system whose module dependencies can’t be serialised; this naturally means Automake’s sequential run over the sub-Makefiles can’t do the Right Thing. However, if the module dependencies can’t be serialised, that means there’s a cycle and they don’t form a true hierarchy; that’s a bad situation for more reasons than just Automake’s — bad for reusability, bad for unit-testability, and bad because it reveals that the system designers haven’t really thought through what the rôles of the modules are.

So if that doesn’t apply to you, if your modules are properly factored into a neat hierarchy, does that mean there’s less incentive to ditch a recursive-make scheme and take the time to write a non-recursive one? Less, perhaps, but decidedly not none — because there are substantial performance benefits from whole-project makefiles on modern systems.
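
For comparison, here is roughly the shape a non-recursive equivalent can take, in the include-fragment style the paper suggests, again with invented names:

    # Non-recursive style: one Makefile, one dependency graph.
    # Each directory contributes a fragment that just appends its sources,
    # e.g. libfoo/module.mk contains:  SRCS += libfoo/a.c libfoo/b.c
    MODULES := libfoo libbar app
    SRCS :=
    include $(patsubst %,%/module.mk,$(MODULES))
    OBJS := $(SRCS:.c=.o)

    all: app/app
    .PHONY: all

    libfoo/libfoo.a: $(filter libfoo/%,$(OBJS))
            $(AR) rcs $@ $^

    libbar/libbar.a: $(filter libbar/%,$(OBJS))
            $(AR) rcs $@ $^

    # The application depends on the library archives themselves, not on
    # "whatever happens when make is run in that directory".
    app/app: $(filter app/%,$(OBJS)) libfoo/libfoo.a libbar/libbar.a
            $(CC) -o $@ $^

Every target in every directory is now visible to a single make invocation, and that is the property the rest of this post cashes in on.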

This effect is, to be fair, mentioned in Recursive make Considered Harmful (7.1), but the author didn’t draw it out on pretty graphs, nor quantify the improvements, so there is scope for it to be better-documented.

Suppose your project has ten components (nine libraries and the application, say), each with between four and twenty source files. These source files won’t, naturally, all take exactly the same length of time to compile — in fact, typically there’ll be quite a wide range of compilation times, especially if only certain files include certain rather profligate library headers. And further suppose that the machine you’re compiling on has eight CPU cores or threads (such machines are desktop-class these days), so you use make -j8.

If your makefiles are like Automake’s, what happens is that make will make the first library using up to eight CPUs, then, once that’s done, make the second library using up to eight CPUs, and so on until it’s done all of them in the serialised order you used in the top-level Makefile. Which is correct, but it’s not efficient, because make needlessly serialises the entirety of each library build against the entirety of the next one, when typically a library’s components will each depend on only some of the previous one’s targets.
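
The granularity is the whole point. In a whole-project graph the dependencies between application and library actually look more like this (invented names again):

    # An application object needs only the library's headers, which are
    # already sitting in the source tree:
    app/main.o: app/main.c libfoo/foo.h

    # ...so it can happily be compiled while libfoo's own objects are
    # still building; only the final link has to wait for the archive:
    app/app: app/main.o libfoo/libfoo.a

A recursive build can't say that: its only unit of dependency is the whole of libfoo before any of app.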

In graphical form, your CPU utilisation would look like the following diagram, where time runs from left to right, each horizontal row represents a different CPU, and blocks of the same colour belong to the same library; the top half of the diagram shows CPU utilisation during a recursive-make-driven recompilation, and the bottom half the same recompilation done with a whole-project Makefile.


[Diagram: CPU utilisation over time for the recursive build (top) and the whole-project build (bottom); click for full-size]

Using one Makefile has allowed make -j8 to keep all eight CPUs busy the whole time — as, whenever a CPU comes free, any pending work from anywhere else in the project can be scheduled on it. By contrast, the top half of the diagram has lots of whitespace, where all the CPUs must wait each time until the last one in that library has finished.

In this example, the build takes scarcely 70% of the time a recursive build would do. If you have more cores, perhaps in a multi-machine build farm, the benefit goes up: building on four cores sees a reduction only to 88%, but 12 cores sees the whole-project build take just 59% as long as the recursive version, and on 16 cores the time is better than halved. If, in the limit, you work for Dr. Evil, your entire company is basically one big compute surface and you can use “make -j one-meelion”, then the improvement factor is essentially the same as the number of libraries — in this case, 10x, though in practice if you went down that route you’d find communication overheads starting to bite into your scalability.
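
For a rough idea of where that limiting factor comes from, here is a back-of-envelope model that ignores link steps and scheduling overheads. Write t(i,j) for the compile time of source file j in library i. With unlimited cores, each recursive sub-build still can't finish before its own slowest compile, whereas the whole-project build is bounded only by the slowest compile anywhere:

    T(recursive)      ≈  sum over i of ( max over j of t(i,j) )
    T(whole-project)  ≈  max over all i,j of t(i,j)

If the slowest file in each of the N libraries takes roughly the same time, the ratio tends to N: ten, in this example.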

