Saturday, 25 January 2014

Parenthesis removal

Let's say you have a macro like this:

#define DECLARE_VAR(type, name, init) type name init;

// Declare pi
DECLARE_VAR(const float, pi, = 3.14f)

// Declare file system object, default initialised
DECLARE_VAR(FileSystem, fs, )

There are several problems with this approach to variable declaration, but it should serve as an example. One of the problems is as follows:

template <typename A, typename B>
struct Pair {
    A a;
    B b;
};

// Error!
DECLARE_VAR(Pair<int, float>, my_pair, = { 10, 5.99f })

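The problem is that macros have no respect for the underlying language, particularly templates and initialisers. The preprocessor splits arguments on every comma that isn't inside brackets, and neither the angle brackets of a template nor the braces of an initialiser count. So your 'three' argument invocation actually passes five arguments because of those extra commas: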

arg #0: Pair<int
arg #1: float>
arg #2: my_pair
arg #3: = { 10
arg #4: 5.99f }

The preprocessor quickly dismisses this nonsense.
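
If you want to see the miscount for yourself, one way is an argument-counting macro. This is only a sketch: COUNT_ARGS and COUNT_ARGS_IMPL are hypothetical helper names that are not part of the code in this post, and as written the trick only copes with one to six arguments.

```cpp
// Hypothetical helpers, for illustration only: report how many arguments
// the preprocessor sees in an invocation (handles 1 to 6 arguments).
#define COUNT_ARGS_IMPL(a1, a2, a3, a4, a5, a6, count, ...) count
#define COUNT_ARGS(...) COUNT_ARGS_IMPL(__VA_ARGS__, 6, 5, 4, 3, 2, 1, 0)

// Three arguments, as intended.
static_assert(COUNT_ARGS(const float, pi, = 3.14f) == 3,
              "a simple declaration passes three arguments");

// The template comma and the initialiser comma split this into five.
static_assert(COUNT_ARGS(Pair<int, float>, my_pair, = { 10, 5.99f }) == 5,
              "unprotected commas produce five arguments");
```

The arguments are thrown away by the counting macro, so Pair doesn't even need to be declared for this to compile.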

To solve this in the usual case, we would delimit the arguments with brackets:

DECLARE_VAR((Pair<int, float>), my_pair, (= { 10, 5.99f }))

This solves the argument count problem, but types and initialisers are not expressions, so they can't be bracketed in the same way. The brackets are pasted straight into the output and produce illegal code:

// Still an error!
(Pair<int, float>) my_pair (= { 10, 5.99f });

So we need a way of removing the brackets. We can do this as follows:

#define JOIN_AND_EXPAND3(...) __VA_ARGS__
#define JOIN_AND_EXPAND2(a, ...) JOIN_AND_EXPAND3(a##__VA_ARGS__)
#define JOIN_AND_EXPAND(a, ...) JOIN_AND_EXPAND2(a, __VA_ARGS__)
#define RPIMPLRPIMPL
#define RPIMPL(...) RPIMPL __VA_ARGS__

#define REMOVE_PARENS(arg) JOIN_AND_EXPAND(RPIMPL,RPIMPL arg)

Now we can redefine our DECLARE_VAR macro and get the following:

#define DECLARE_VAR(type, name, init) \
    REMOVE_PARENS(type) name REMOVE_PARENS(init);

// const float pi = 3.14f;
DECLARE_VAR(const float, pi, = 3.14f)

// FileSystem fs;
DECLARE_VAR(FileSystem, fs, )

// Pair<int, float> my_pair = { 10, 5.99f };
DECLARE_VAR((Pair<int, float>), my_pair, (= { 10, 5.99f }))

Success! We don't need to bother using it on the name parameter because identifiers can't contain commas anyway and so you're unlikely to bracket it.
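
For anyone who wants to paste this into a compiler, here is everything so far gathered into one translation unit. The macros are the ones above, unchanged. The FileSystem struct is a hypothetical empty stand-in (the real type is never shown here), and the static_asserts are extra checks added for this sketch to confirm the declared types.

```cpp
#include <type_traits>

#define JOIN_AND_EXPAND3(...) __VA_ARGS__
#define JOIN_AND_EXPAND2(a, ...) JOIN_AND_EXPAND3(a##__VA_ARGS__)
#define JOIN_AND_EXPAND(a, ...) JOIN_AND_EXPAND2(a, __VA_ARGS__)
#define RPIMPLRPIMPL
#define RPIMPL(...) RPIMPL __VA_ARGS__

#define REMOVE_PARENS(arg) JOIN_AND_EXPAND(RPIMPL,RPIMPL arg)

#define DECLARE_VAR(type, name, init) \
    REMOVE_PARENS(type) name REMOVE_PARENS(init);

template <typename A, typename B>
struct Pair {
    A a;
    B b;
};

// Hypothetical stand-in so the example is self-contained.
struct FileSystem {};

// const float pi = 3.14f;
DECLARE_VAR(const float, pi, = 3.14f)

// FileSystem fs;
DECLARE_VAR(FileSystem, fs, )

// Pair<int, float> my_pair = { 10, 5.99f };
DECLARE_VAR((Pair<int, float>), my_pair, (= { 10, 5.99f }))

// Added checks: the variables have the types the comments claim.
static_assert(std::is_same<decltype(pi), const float>::value,
              "pi is a const float");
static_assert(std::is_same<decltype(fs), FileSystem>::value,
              "fs is a FileSystem");
static_assert(std::is_same<decltype(my_pair), Pair<int, float>>::value,
              "my_pair is a Pair<int, float>");
```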

How does this work? Well, let's get some boilerplate out of the way first. JOIN_AND_EXPAND is simply a macro which forces its arguments to undergo expansion before concatenating them, and then forces a final extra expansion on the concatenated result. Both are fairly commonplace constructs; more about them can be found in the Boost docs under BOOST_PP_CAT and BOOST_PP_EXPAND.
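
To make the "expand before concatenating" part concrete, the sketch below compares JOIN_AND_EXPAND with a direct use of ##. NAIVE_JOIN, PREFIX and the two variables are hypothetical names invented for this comparison; only the JOIN_AND_EXPAND macros come from this post.

```cpp
// The JOIN_AND_EXPAND family, as defined earlier in the post.
#define JOIN_AND_EXPAND3(...) __VA_ARGS__
#define JOIN_AND_EXPAND2(a, ...) JOIN_AND_EXPAND3(a##__VA_ARGS__)
#define JOIN_AND_EXPAND(a, ...) JOIN_AND_EXPAND2(a, __VA_ARGS__)

// Hypothetical names, for comparison only.
#define NAIVE_JOIN(a, b) a##b
#define PREFIX value_

constexpr int PREFIX1 = 10;  // the identifier NAIVE_JOIN ends up naming
constexpr int value_1 = 20;  // the identifier JOIN_AND_EXPAND ends up naming

// Operands of ## are not macro-expanded, so PREFIX is pasted as written
// and the result is the single identifier PREFIX1.
static_assert(NAIVE_JOIN(PREFIX, 1) == 10,
              "direct ## pastes the unexpanded macro name");

// The extra layer lets PREFIX expand to value_ before the paste happens,
// so the result is value_1.
static_assert(JOIN_AND_EXPAND(PREFIX, 1) == 20,
              "indirection expands the argument first");
```

REMOVE_PARENS relies on exactly this: RPIMPL arg has to be expanded (or left alone) before it is glued onto the leading RPIMPL.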

The magic happens in REMOVE_PARENS. If we pass a non-bracketed argument like FileSystem, it expands to this:

JOIN_AND_EXPAND(RPIMPL, RPIMPL FileSystem)

JOIN_AND_EXPAND then does its stuff, first concatenating the RPIMPLs together:

RPIMPLRPIMPL FileSystem

... then undergoing a final expansion, and since RPIMPLRPIMPL is defined as empty, we get:

FileSystem

If our argument is bracketed, we get a different effect:

JOIN_AND_EXPAND(RPIMPL, RPIMPL (Pair<int, float>))

Now the preprocessor sees the right hand side as an invocation of the parameterised RPIMPL macro, which it expands like this:

RPIMPL Pair<int, float>

... because it just emits its arguments unchanged with an RPIMPL token in front of them. Now the rest of the macro expansion happens much like before:

JOIN_AND_EXPAND(RPIMPL, RPIMPL Pair<int, float>)
RPIMPLRPIMPL Pair<int, float>
Pair<int, float>

This works because the name of a parameterised macro (RPIMPL in our case) is left untouched by the preprocessor if it isn't followed by a bracketed argument list.
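
That rule is easy to check in isolation. In this small sketch TWICE is a hypothetical macro made up for the demonstration; the same name is used both as a macro and as an ordinary variable, and the preprocessor tells them apart purely by whether a bracket follows.

```cpp
// Hypothetical macro, for illustration only.
#define TWICE(x) ((x) * 2)

// No bracket follows the name here, so the preprocessor leaves it alone
// and this declares an ordinary variable called TWICE.
constexpr int TWICE = 21;

static_assert(TWICE == 21,
              "bare name: not treated as a macro invocation");
static_assert(TWICE(4) == 8,
              "name followed by brackets: the macro is expanded");
```

In REMOVE_PARENS the bare RPIMPL plays the part of the variable, surviving untouched until it is pasted into RPIMPLRPIMPL.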

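Note also that the preprocessor isn't recursive: a macro is never expanded again inside its own expansion. The RPIMPL token that RPIMPL(...) leaves behind therefore won't swallow a second set of brackets, and so this only does a single level of parenthesis removal: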

REMOVE_PARENS(int)     // int
REMOVE_PARENS((int))   // int
REMOVE_PARENS(((int))) // (int)
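
The three cases can be checked at compile time. This is a sketch with assumptions: the macros are copied from above, while the static_asserts and the variable name truncated are hypothetical additions. The last case leans on the fact that (int) followed by a value is a cast, so it only compiles if exactly one level of brackets is left.

```cpp
#include <type_traits>

#define JOIN_AND_EXPAND3(...) __VA_ARGS__
#define JOIN_AND_EXPAND2(a, ...) JOIN_AND_EXPAND3(a##__VA_ARGS__)
#define JOIN_AND_EXPAND(a, ...) JOIN_AND_EXPAND2(a, __VA_ARGS__)
#define RPIMPLRPIMPL
#define RPIMPL(...) RPIMPL __VA_ARGS__

#define REMOVE_PARENS(arg) JOIN_AND_EXPAND(RPIMPL,RPIMPL arg)

// No brackets: the argument passes through unchanged.
static_assert(std::is_same<REMOVE_PARENS(int), int>::value,
              "int stays int");

// One level of brackets: removed.
static_assert(std::is_same<REMOVE_PARENS((int)), int>::value,
              "(int) becomes int");

// Two levels: one is left over, so this line reads "(int) 3.9", a cast.
// Had both levels been removed it would read "int 3.9" and fail to compile.
constexpr int truncated = REMOVE_PARENS(((int))) 3.9;
static_assert(truncated == 3, "((int)) becomes (int)");
```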

Here is a live example of it in action.