Not in fact any relation to the famous large Greek meal of the same name.

Wednesday 17 June 2009

Type Erasure, boost::mutex, And The617

The boost::mutex class is certainly not considered harmful. It’s jolly good, especially the way it, and its companion boost::condition, work properly on Win32. But it doesn’t half include a lot of header files to obtain all that jolly goodness.

Let’s get GCC to tell us what header files it includes. To do this, we need GCC’s dependency generation features, the ones normally used for generating Makefile fragments listing all the files a translation unit depends on.

$ echo "#include <boost/thread/mutex.hpp>" \
      | g++ -xc++ -E - -MP -M | grep ^/usr | sort
...long list...
OK, that’s indeed quite a lot of headers. How many exactly? (using GCC 4.3.3, amd64-linux, Boost 1.39.0)
$ echo "#include <boost/thread/mutex.hpp>" \
      | g++ -xc++ -E - -MP -M | grep ^/usr | sort | wc -l
618
Yes, a source file that includes nothing but <boost/thread/mutex.hpp> has actually also read six hundred and seventeen other headers by the time it’s done.

Now I’d hate to be one of those bloggers who does nothing but sit around and grouse about the people who are actually doing useful work. But I humbly submit that having to include 618 C++ headers just to get hold of a mutex class, is, in an objective sense, rather a lot.

Really what’s wanted is one of the “complete insulation” or type-erasure techniques from Lakos section 6.4, so that instead of the situation on the left — where every file in the project which needs a mutex, includes the Boost header and its 617 dependencies — we get the situation on the right, where only the encapsulating file needs to know the details, and everyone else can just include the simple leaf header my_mutex.h.

The problem is, every encapsulation technique discussed by Lakos has an inevitable cost in efficiency. Whichever way you slice it — whether you use a mutex protocol class plus a factory, or whether you use the handle/body or pimpl pattern — you tend to end up needing to call new and delete, not just every time you create or destroy a mutex, but every time you create or destroy a mutex scoped-lock object. Especially for rarely-contended locks, that overhead can easily swamp the actual implementation time.

The issue is, that to end up with a boost::mutex, client code must either construct one itself — depending on boost::mutex “in size” is Lakos’s term — or call other code which hands one back, which latter scheme necessarily requires an allocation.

So how can we solve this impasse? How can we depend on boost::mutex “in size”, without depending on all its headers?

Well, we can cheat. We can populate my_mutex.h with classes whose objects are the same size as boost::mutexes — and which, within the file my_mutex.cpp, are boost::mutexes — but whose actual type is invisible to other clients of my_mutex.h. The idea is to behave a bit like this:

class Mutex
{
    char m_data[sizeof(boost::mutex)];

public:
    Mutex()
    {
        new(m_data) boost::mutex;
    }
    ~Mutex()
    {
        ((boost::mutex*)m_data)->~mutex();
    }
};
though of course that doesn’t yet help as-is, as you still need to include the 617 to get sizeof(boost::mutex) to compile.

To eliminate the dependencies, you need to know sizeof(boost::mutex) ahead of time. Probably the best way to do this, is to get your configury to work it out, by adding lines like these to your configure.ac:

AC_LANG_PUSH([C++])
AC_CHECK_SIZEOF([boost::mutex],, [[#include <boost/thread/mutex.hpp>]])
AC_CHECK_SIZEOF([boost::condition],, [[#include <boost/thread/condition.hpp>]])
AC_CHECK_SIZEOF([boost::mutex::scoped_lock],, [[#include <boost/thread/mutex.hpp>]])
AC_LANG_POP
which will leave you with lines like this in your config.h:
#define SIZEOF_BOOST__CONDITION 88
#define SIZEOF_BOOST__MUTEX 40
#define SIZEOF_BOOST__MUTEX__SCOPED_LOCK 16
giving you exactly what you need to write my_mutex.h. (One day when you want to see something deliciously evil, go and look at the way Autoconf determines such size information when cross-compiling, without depending on any particular compiler or linker.) The result looks like this:
/* my_mutex.h */
#include "config.h"

class Mutex
{
    char m_data[SIZEOF_BOOST__MUTEX];

public:
    Mutex();
    ~Mutex();
};


/* my_mutex.cpp */
#include "my_mutex.h"
#include <boost/thread/mutex.hpp>

Mutex::Mutex()
{
    new(m_data) boost::mutex;
}

Mutex::~Mutex()
{
    ((boost::mutex*)m_data)->~mutex();
}
which is indeed good enough to free client code from the burden of the 617.

But if you’re doing this wrapping for more than one class — Chorale wanted at least boost::mutex, boost::mutex::scoped_lock, and boost::condition — you start to realise you’re writing the same thing many times over, and that it really ought to be wrapped up in a template. (An attempt is also made to avoid alignment issues, a worry which might already have been nagging at you after the above snippets; an assertion checks that the configury got the right answer for the size.) Here it is:

template <class T>
struct WrappedType;

/** Wrap up a type so that even clients who depend on it in size,
 * don't have to see its declaration.
 */
template <class T, unsigned int sz>
class Wrapper
{
    union {
        char m_data[sz];
        void *m_align;
    };

    /** Use a nested class, rather than using WrappedType<T>::type
     * directly, so that we can be sure that its destructor is called
     * "~Wrapped" -- if T1 is a typedef-name, its destructor won't be
     * called "~T1".
     */
    class Wrapped: public WrappedType<T>::type
    {
    public:
        Wrapped() {}

        template <class Arg>
        explicit Wrapped(Arg& arg) : WrappedType<T>::type(arg) {}
    };

public:
    Wrapper()
    {
        BOOST_STATIC_ASSERT(sizeof(Wrapped) == sz);
        new (m_data) Wrapped;
    }

    template <class Arg>
    explicit Wrapper(Arg& arg)
    {
        BOOST_STATIC_ASSERT(sizeof(Wrapped) == sz);
        new (m_data) Wrapped(arg);
    }

    ~Wrapper()
    {
        ((Wrapped*)m_data)->~Wrapped();
    }

    /** Calls to Unwrap() will only compile following a definition of
     * WrappedType<T> -- not in client code.
     */
    Wrapped& Unwrap() { return *(Wrapped*)m_data; }
};
Armed with this, my_mutex.h can be very straightforward:
#include "config.h"
#include "wrapper.h"

/** Wrap a boost::mutex so we don't include so very many headers.
 */
class Mutex: public Wrapper<Mutex, SIZEOF_BOOST__MUTEX>
{
public:
    Mutex();
    ~Mutex();

    class Lock: public Wrapper<Lock, SIZEOF_BOOST__MUTEX__SCOPED_LOCK>
    {
    public:
        Lock(Mutex&);
        ~Lock();
    };
};

class Condition: public Wrapper<Condition, SIZEOF_BOOST__CONDITION>
{
public:
    Condition();
    ~Condition();

    bool Wait(Mutex::Lock&, unsigned int sec);

    void NotifyOne();
    void NotifyAll();
};
and my_mutex.cpp not much less straightforward; note that WrappedType<T> is used like a traits class, in that the intended wrapped type is “revealed” by specialising WrappedType for the particular wrapper type — it’s only following such a specialisation, that the constructor, destructor, or Unwrap() calls will compile:
#include "my_mutex.h"
#include <boost/thread/mutex.hpp>
#include <boost/thread/condition.hpp>

template<>
struct WrappedType<Mutex>
{
    typedef boost::mutex type;
};

Mutex::Mutex()
{
}

Mutex::~Mutex()
{
}

template<>
struct WrappedType<Mutex::Lock>
{
    typedef boost::mutex::scoped_lock type;
};

Mutex::Lock::Lock(Mutex& mutex)
    : Wrapper<Lock,SIZEOF_BOOST__MUTEX__SCOPED_LOCK>(mutex.Unwrap())
{
}

Mutex::Lock::~Lock()
{
}

template<>
struct WrappedType<Condition>
{
    typedef boost::condition type;
};

Condition::Condition()
{
}

Condition::~Condition()
{
}

bool Condition::Wait(Mutex::Lock& lock, unsigned int sec)
{
    return Unwrap().timed_wait(lock.Unwrap(), boost::posix_time::seconds(sec));
}

void Condition::NotifyAll()
{
    Unwrap().notify_all();
}

void Condition::NotifyOne()
{
    Unwrap().notify_one();
}

So this technique is neat if a bit icky (with the casting and the explicit size-checking). Is it worthwhile? For Chorale, the answer was certainly “yes”. Chorale is vigorously multi-threaded, and many parts of the system use mutexes. Using a script a bit like the GCC-based dependency counter from the very top of this post, it turns out that Chorale’s 250ish source files and 250ish headers, had between them nearly 240,000 dependencies. Wrapping up just boost::mutex and boost::condition reduced that number to 115,000 — meaning that more than half of all header files compiled while compile the whole of Chorale, a fairly large and complex program, get compiled solely to satisfy the dependencies of boost::mutex. This startling figure is also borne out by the total time taken to compile (with a cold pagecache):

Before wrappingAfter wrapping
real    4m0.304s
user    10m33.296s
sys     1m38.870s
     
real    2m39.516s
user    6m10.523s
sys     1m9.452s
...a 42% improvement. It is just possible that there’s a run-time efficiency impact of this wrapping, as the mutex operations now all get inlined into my_mutex.cpp and not directly into the client code. However, as is the way with these things, it’s really just as likely that this gives a performance improvement — some of these Boost calls inline a lot of code, especially under Win32.

About Me

Cambridge, United Kingdom
Waits for audience applause ... not a sossinge.
CC0 To the extent possible under law, the author of this work has waived all copyright and related or neighboring rights to this work.