Tuesday, 4 February 2014

Ad-hoc RAII

RAII is great, but it's not always the most convenient to use for one-off cases. Let's say we want to open a file:

bool f(const char* filename) {
    FILE* file = fopen(filename, "rb");
    if (!file) {
        LOG("File %s could not be opened", filename);
        return false;
    }

    // Read from file

    fclose(file);
    return true;
}

A seasoned programmer should see that and worry that the file won't be closed properly if an exception is thrown or some other kind of early return occurs while reading the file. A non-C++ programmer may reach for 'finally':

bool f(const char* filename) {
    FILE* file = fopen(filename, "rb");
    if (!file) {
        LOG("File %s could not be opened", filename);
        return false;
    }

    try {
        // Read from file
    }
    finally {
        fclose(file);
    }

    return true;
}

C++ doesn't have finally (not in strictly compliant compilers anyway), so this is not an option. This construct also introduces undesirable nesting (especially if the 'read from file' bit is also deeply nested) and increases the visual distance between the code for the setup and shutdown, making it more likely that the two get out of sync as the function grows.

Instead, C++ has destructors. In true RAII fashion, you would encapsulate the file resource in a class which opens the file on construction and closes it on destruction, then instantiate this class on the stack so that the file is automatically closed when the object goes out of scope, no matter how the function exits.
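For reference, here is a minimal sketch of what such a class might look like. The names ScopedFile and file_size are made up for illustration; a real codebase's file class would likely offer more (moves, error reporting and so on).

```cpp
#include <cstdio>

// Hypothetical minimal RAII wrapper around a C FILE* (illustrative name).
class ScopedFile {
public:
    ScopedFile(const char* filename, const char* mode)
        : file_(std::fopen(filename, mode)) {}

    ~ScopedFile() {
        if (file_) {
            std::fclose(file_);
        }
    }

    // Non-copyable: two owners would close the same handle twice.
    ScopedFile(const ScopedFile&) = delete;
    ScopedFile& operator=(const ScopedFile&) = delete;

    bool is_open() const { return file_ != nullptr; }
    std::FILE* get() const { return file_; }

private:
    std::FILE* file_;
};

// Hypothetical user of the class: returns the file size in bytes, or -1.
long file_size(const char* filename) {
    ScopedFile file(filename, "rb");
    if (!file.is_open()) {
        return -1;
    }
    if (std::fseek(file.get(), 0, SEEK_END) != 0) {
        return -1; // early return: ~ScopedFile still closes the file
    }
    return std::ftell(file.get());
}
```

Every exit path from file_size runs the destructor, so there is no fclose call to forget.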

You're likely to have such a RAII file class kicking around your codebase, but there are plenty of one-off cases where pulling in a whole library for a single class is overkill, or where the work is very specific to the task at hand and doesn't warrant defining a whole new class for this purpose. You also don't really want to have to define a class elsewhere in your file or project, away from the context of whatever it is you're cleaning up.

In these cases you can perform a little bit of ad-hoc RAII by defining and instantiating a local class with a destructor only. Let's stick with the file example:

bool f(const char* filename) {
    FILE* file = fopen(filename, "rb");
    if (!file) {
        LOG("File %s could not be opened", filename);
        return false;
    }

    struct fclose_t {
        ~fclose_t() {
            fclose(file);
        }

        FILE* file;
    } fclose_obj = { file };

    // Read from file

    return true;
}

The definition and instantiation of the local class ensures that its destructor will be called however the function exits, and that's what closes the file. We don't need a dedicated file class with member functions, private state, invariants, etc. when we just want to close the file. We just use an aggregate which we initialise using curly bracket initialiser syntax. Note too that the code which closes the file remains adjacent to the code which opened it, no matter how large 'Read from file' gets. This aids maintenance.
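To see the guard firing on an exceptional exit as well as a normal one, here is a small self-contained sketch. The function read_first_byte and the close_count parameter are hypothetical; the counter exists only so that the destructor's effect can be observed from outside.

```cpp
#include <cstdio>
#include <stdexcept>

// Hypothetical reader: returns the first byte of the file and throws if the
// file cannot be opened or is empty.
int read_first_byte(const char* filename, int& close_count) {
    std::FILE* file = std::fopen(filename, "rb");
    if (!file) {
        throw std::runtime_error("could not open file");
    }

    // Ad-hoc guard: a local aggregate with nothing but a destructor.
    struct fclose_t {
        ~fclose_t() {
            std::fclose(file);
            ++*count; // illustration only: records that the file was closed
        }

        std::FILE* file;
        int* count;
    } fclose_obj = { file, &close_count };

    int byte = std::fgetc(file);
    if (byte == EOF) {
        throw std::runtime_error("file is empty"); // guard closes the file
    }
    return byte; // guard closes the file here too
}
```

Calling this on a non-empty file and then on an empty one should bump the counter both times, even though the second call leaves via an exception.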

The biggest problem with this kind of code, known as a scope guard, is that all this boilerplate is ugly and a pain to type each time you need to utilise it. We'll see what we can do about that in a future post.

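Incidentally, as per normal destructor rules, you shouldn't do anything in your scope guard destructor that may throw an exception. If an exception escapes the destructor during stack unwinding, the program terminates. Since C++11, destructors are also implicitly noexcept, so by default an escaping exception terminates the program even when no unwinding is in progress.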
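If the cleanup step itself can throw, one option is to catch inside the destructor so that nothing escapes. The sketch below assumes a hypothetical flush_cache function that may fail; swallowing the error and counting it is only one possible policy, and logging would be another.

```cpp
#include <stdexcept>

// Hypothetical cleanup step that can fail by throwing.
void flush_cache(bool fail) {
    if (fail) {
        throw std::runtime_error("flush failed");
    }
}

// Hypothetical function whose body and cleanup can both fail.
bool process(bool body_fails, bool cleanup_fails, int& cleanup_errors) {
    struct flush_t {
        ~flush_t() {
            try {
                flush_cache(fail);
            } catch (...) {
                ++*errors; // record and swallow; nothing leaves the destructor
            }
        }

        bool fail;
        int* errors;
    } flush_obj = { cleanup_fails, &cleanup_errors };

    if (body_fails) {
        throw std::runtime_error("body failed");
    }
    return true;
}
```

With this in place, an exception thrown by the body still reaches the caller intact, even when the cleanup fails during unwinding.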

Saturday, 25 January 2014

Parenthesis removal

Let's say you have a macro like this:

#define DECLARE_VAR(type, name, init) type name init;

// Declare pi
DECLARE_VAR(const float, pi, = 3.14f)

// Declare file system object, default initialised
DECLARE_VAR(FileSystem, fs, )

There are several problems with this approach to variable declaration, but it should serve as an example. One of the problems is as follows:

template <typename A, typename B>
struct Pair {
    A a;
    B b;
};

// Error!
DECLARE_VAR(Pair<int, float>, my_pair, = { 10, 5.99f })

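The problem is that macros have no respect for the underlying language, particularly templates and initialisers. The preprocessor only protects commas that sit inside parentheses; angle brackets and braces don't count. So your 'three' argument macro is actually being invoked with five arguments because of those extra commas: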

arg #0: Pair<int
arg #1: float>
arg #2: my_pair
arg #3: = { 10
arg #4: 5.99f }

The preprocessor quickly dismisses this nonsense.
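You can get the preprocessor to confirm the count for you. The COUNT_ARGS helper below is not part of this post's macros; it is a common counting trick, sketched here for one to five arguments and assuming a conforming preprocessor such as GCC's or Clang's.

```cpp
// Hypothetical helper: expands to the number of arguments it was given
// (works for one to five arguments).
#define COUNT_ARGS_IMPL(a, b, c, d, e, n, ...) n
#define COUNT_ARGS(...) COUNT_ARGS_IMPL(__VA_ARGS__, 5, 4, 3, 2, 1, 0)

// Three arguments, as intended.
const int plain_count = COUNT_ARGS(const float, pi, = 3.14f);

// The commas in the template argument list and the initialiser each split
// an argument in two, giving five.
const int unprotected_count =
    COUNT_ARGS(Pair<int, float>, my_pair, = { 10, 5.99f });

// Parentheses protect the commas, so we are back to three.
const int protected_count =
    COUNT_ARGS((Pair<int, float>), my_pair, (= { 10, 5.99f }));

static_assert(plain_count == 3, "three arguments expected");
static_assert(unprotected_count == 5, "unprotected commas split arguments");
static_assert(protected_count == 3, "parentheses protect commas");
```

Note that Pair doesn't even need to be declared here, because the counting macro throws the argument tokens away before the compiler proper ever sees them.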

To solve this in the usual case, we would delimit the arguments with brackets:

DECLARE_VAR((Pair<int, float>), my_pair, (= { 10, 5.99f }))

This solves the argument count problem, but since types and initialisers are not expressions, they can't be bracketed in the same way, and the expansion is illegal code:

// Still an error!
(Pair<int, float>) my_pair (= { 10, 5.99f });

So we need a way of removing the brackets. We can do this as follows:

#define JOIN_AND_EXPAND3(...) __VA_ARGS__
#define JOIN_AND_EXPAND2(a, ...) JOIN_AND_EXPAND3(a##__VA_ARGS__)
#define JOIN_AND_EXPAND(a, ...) JOIN_AND_EXPAND2(a, __VA_ARGS__)
#define RPIMPLRPIMPL
#define RPIMPL(...) RPIMPL __VA_ARGS__

#define REMOVE_PARENS(arg) JOIN_AND_EXPAND(RPIMPL,RPIMPL arg)

Now we can redefine our DECLARE_VAR macro and get the following:

#define DECLARE_VAR(type, name, init) \
    REMOVE_PARENS(type) name REMOVE_PARENS(init);

// const float pi = 3.14f;
DECLARE_VAR(const float, pi, = 3.14f)

// FileSystem fs;
DECLARE_VAR(FileSystem, fs, )

// Pair<int, float> my_pair = { 10, 5.99f };
DECLARE_VAR((Pair<int, float>), my_pair, (= { 10, 5.99f }))

Success! We don't need to bother using it on the name parameter because identifiers can't contain commas anyway and so you're unlikely to bracket it.
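Putting the pieces together, here is a complete translation unit you can paste into a compiler to check the result. The empty FileSystem struct is just a stand-in so the example compiles, and the static_assert is an extra check added for illustration.

```cpp
#include <type_traits>

#define JOIN_AND_EXPAND3(...) __VA_ARGS__
#define JOIN_AND_EXPAND2(a, ...) JOIN_AND_EXPAND3(a##__VA_ARGS__)
#define JOIN_AND_EXPAND(a, ...) JOIN_AND_EXPAND2(a, __VA_ARGS__)
#define RPIMPLRPIMPL
#define RPIMPL(...) RPIMPL __VA_ARGS__

#define REMOVE_PARENS(arg) JOIN_AND_EXPAND(RPIMPL,RPIMPL arg)

#define DECLARE_VAR(type, name, init) \
    REMOVE_PARENS(type) name REMOVE_PARENS(init);

template <typename A, typename B>
struct Pair {
    A a;
    B b;
};

struct FileSystem {}; // stand-in type, assumed for this example only

// const float pi = 3.14f;
DECLARE_VAR(const float, pi, = 3.14f)

// FileSystem fs;
DECLARE_VAR(FileSystem, fs, )

// Pair<int, float> my_pair = { 10, 5.99f };
DECLARE_VAR((Pair<int, float>), my_pair, (= { 10, 5.99f }))

// The bracketed type argument really did become Pair<int, float>.
static_assert(std::is_same<decltype(my_pair), Pair<int, float>>::value,
              "my_pair should be a Pair<int, float>");
```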

How does this work? Well, let's get some boilerplate out of the way first. JOIN_AND_EXPAND is simply a macro which forces its arguments to undergo expansion before concatenating them, and then forces a final extra expansion on the concatenated result. These are fairly commonplace constructs; more reading about them can be found in the Boost docs for BOOST_PP_CAT and BOOST_PP_EXPAND.
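If the need for that extra level of indirection isn't obvious, the following sketch may help. The macro and variable names are invented for the demonstration: the operands of ## are pasted as written, so a macro passed directly to a pasting macro is never expanded, whereas routing it through a forwarding macro expands it first.

```cpp
#define PREFIX value_

// Pastes its operands exactly as written; PREFIX is not expanded.
#define CAT_DIRECT(a, b) a##b

// Forwards to CAT_DIRECT, so the arguments are expanded before pasting.
#define CAT(a, b) CAT_DIRECT(a, b)

int value_1 = 10; // the name CAT(PREFIX, 1) produces
int PREFIX1 = 20; // the name CAT_DIRECT(PREFIX, 1) produces

int via_indirection() { return CAT(PREFIX, 1); }
int via_direct() { return CAT_DIRECT(PREFIX, 1); }
```

Here via_indirection reads value_1 and via_direct reads PREFIX1.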

The magic happens in REMOVE_PARENS. If we pass a non-bracketed argument like FileSystem, it expands to this:

JOIN_AND_EXPAND(RPIMPL, RPIMPL FileSystem)

JOIN_AND_EXPAND then does its stuff, first concatenating the RPIMPLs together:

RPIMPLRPIMPL FileSystem

... then undergoing a final expansion, and since RPIMPLRPIMPL is defined as empty, we get:

FileSystem

If our argument is bracketed, we get a different effect:

JOIN_AND_EXPAND(RPIMPL, RPIMPL (Pair<int, float>))

Now the preprocessor sees the right hand side as an invocation of the parameterised RPIMPL macro, which it expands like this:

RPIMPL Pair<int, float>

... because it just expands the arguments unchanged with a RPIMPL token before it. Now the rest of the macro expansion happens much like before:

JOIN_AND_EXPAND(RPIMPL, RPIMPL Pair<int, float>)
RPIMPLRPIMPL Pair<int, float>
Pair<int, float>

This works because the name of a parameterised macro (RPIMPL in our case) is left unexpanded by the preprocessor if it isn't followed by a bracketed argument list.
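That rule is easy to demonstrate in isolation. In this contrived sketch (the names are made up), TWICE is both a parameterised macro and an ordinary variable, and the preprocessor tells them apart purely by whether an opening bracket follows.

```cpp
#define TWICE(x) ((x) * 2)

// No '(' follows the name, so this is not a macro invocation:
// it declares an ordinary variable called TWICE.
int TWICE = 21;

int doubled() {
    // The outer TWICE is followed by '(' and is expanded; the inner one is
    // not, so this becomes ((TWICE) * 2).
    return TWICE(TWICE);
}
```

Calling doubled() returns 42. You wouldn't write real code like this, but it shows the rule that REMOVE_PARENS relies on.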

Note also that, because a macro is never expanded again inside its own expansion, this only does a single level of parenthesis removal:

REMOVE_PARENS(int)     // int
REMOVE_PARENS((int))   // int
REMOVE_PARENS(((int))) // (int)

Here is a live example of it in action.