Tuesday, 4 February 2014

Ad-hoc RAII

RAII is great, but it's not always the most convenient to use for one-off cases. Let's say we want to open a file:

bool f(const char* filename) {
    FILE* file = fopen(filename, "rb");
    if (!file) {
        LOG("File %s could not be opened", filename);
        return false;
    }

    // Read from file

    fclose(file);
    return true;
}

A seasoned programmer should see that and worry that the file won't be closed properly if an exception is thrown or some other kind of early return occurs while reading the file. A non-C++ programmer may reach for 'finally':

bool f(const char* filename) {
    FILE* file = fopen(filename, "rb");
    if (!file) {
        LOG("File %s could not be opened", filename);
        return false;
    }

    try {
        // Read from file
    }
    finally {
        fclose(file);
    }

    return true;
}

C++ doesn't have finally (not in strictly compliant compilers anyway), so this is not an option. This construct also introduces undesirable nesting (especially if the 'read from file' bit is also deeply nested) and increases the visual distance between the code for the setup and shutdown, making it more likely that the two get out of sync as the function grows.

Instead, C++ has destructors. In true RAII fashion, you would encapsulate the file resource in a class which opens the file on construction and closes it on destruction, then instantiate this class on the stack so that the file is automatically closed when the object goes out of scope, no matter how the function ends.

You're likely to have such a RAII file class kicking around your codebase, but there are plenty of one-off cases where pulling in a whole library for a single class is overkill, or where the work is very specific to the task at hand and doesn't warrant defining a whole new class for this purpose. You also don't really want to have to define a class elsewhere in your file or project, away from the context of whatever it is you're cleaning up.

In these cases you can perform a little bit of ad-hoc RAII by defining and instantiating a local class with a destructor only. Let's stick with the file example:

bool f(const char* filename) {
    FILE* file = fopen(filename, "rb");
    if (!file) {
        LOG("File %s could not be opened", filename);
        return false;
    }

    struct fclose_t {
        ~fclose_t() {
            fclose(file);
        }

        FILE* file;
    } fclose_obj = { file };

    // Read from file

    return true;
}

The definition and instantiation of the local class ensures that its destructor will be called however the function exits, and that's what closes the file. We don't need a dedicated file class with member functions, private state, invariants, etc. when we just want to close the file. We just use an aggregate which we initialise using curly bracket initialiser syntax. Note too that the code which closes the file remains adjacent to the code which opened it, no matter how large 'Read from file' gets. This aids maintenance.

The biggest problem with this kind of code, known as a scope guard, is that all this boilerplate is ugly and a pain to type each time you need to utilise it. We'll see what we can do about that in a future post.

Incidentally, it should be noted that, as per normal destructor rules, you shouldn't do anything in your scope guard destructor that may throw an exception, as this will terminate the program if the code is invoked during stack unwinding.

Saturday, 25 January 2014

Parenthesis removal

Let's say you have a macro like this:

#define DECLARE_VAR(type, name, init) type name init;

// Declare pi
DECLARE_VAR(const float, pi, = 3.14f)

// Declare file system object, default initialised
DECLARE_VAR(FileSystem, fs, )

There are several problems with this approach to variable declaration, but it should serve as an example. One of the problems is as follows:

template <typename A, typename B>
struct Pair {
    A a;
    B b;
};

// Error!
DECLARE_VAR(Pair<int, float>, my_pair, = { 10, 5.99f })

The problem is that macros have no respect for the underlying language, particularly templates and initialisers, and so your 'three' argument macro is actually a five argument macro because of those extra commas:

arg #0: Pair<int
arg #1: float>
arg #2: my_pair
arg #3: = { 10
arg #4: 5.99f }

The preprocessor quickly dismisses this nonsense.

To solve this in the usual case, we would delimit the arguments with brackets:

DECLARE_VAR((Pair<int, float>), my_pair, (= { 10, 5.99f }))

This solves the argument count problem, but since types and initialisers are not expressions, they can't be bracketed in the same way; the result is illegal code:

// Still an error!
(Pair<int, float>) my_pair (= { 10, 5.99f });

So we need a way of removing the brackets. We can do this as follows:

#define JOIN_AND_EXPAND3(...) __VA_ARGS__
#define JOIN_AND_EXPAND2(a, ...) JOIN_AND_EXPAND3(a##__VA_ARGS__)
#define JOIN_AND_EXPAND(a, ...) JOIN_AND_EXPAND2(a, __VA_ARGS__)
#define RPIMPLRPIMPL
#define RPIMPL(...) RPIMPL __VA_ARGS__

#define REMOVE_PARENS(arg) JOIN_AND_EXPAND(RPIMPL,RPIMPL arg)

Now we can redefine our DECLARE_VAR macro and get the following:

#define DECLARE_VAR(type, name, init) \
    REMOVE_PARENS(type) name REMOVE_PARENS(init);

// const float pi = 3.14f;
DECLARE_VAR(const float, pi, = 3.14f)

// FileSystem fs;
DECLARE_VAR(FileSystem, fs, )

// Pair<int, float> my_pair = { 10, 5.99f };
DECLARE_VAR((Pair<int, float>), my_pair, (= { 10, 5.99f }))

Success! We don't need to bother using it on the name parameter because identifiers can't contain commas anyway and so you're unlikely to bracket it.

How does this work? Let's get some boilerplate out of the way first. JOIN_AND_EXPAND is simply a macro which forces its arguments to undergo expansion before concatenating them, and then forces a final extra expansion on the concatenated result. These are fairly commonplace constructs; more reading about them can be found in the Boost docs: BOOST_PP_CAT and BOOST_PP_EXPAND.

The magic happens in REMOVE_PARENS. If we pass a non-bracketed argument like FileSystem, it expands to this:

JOIN_AND_EXPAND(RPIMPL, RPIMPL FileSystem)

JOIN_AND_EXPAND then does its stuff, first concatenating the RPIMPLs together:

RPIMPLRPIMPL FileSystem

... then undergoing a final expansion, and since RPIMPLRPIMPL is defined as empty, we get:

FileSystem

If our argument is bracketed, we get a different effect:

JOIN_AND_EXPAND(RPIMPL, RPIMPL (Pair<int, float>))

Now the preprocessor sees the right hand side as an expansion of the parameterised RPIMPL macro, which it expands like this:

RPIMPL Pair<int, float>

... because it just expands the arguments unchanged with a RPIMPL token before it. Now the rest of the macro expansion happens much like before:

JOIN_AND_EXPAND(RPIMPL, RPIMPL Pair<int, float>)
RPIMPLRPIMPL Pair<int, float>
Pair<int, float>

This works because a parameterised macro token (RPIMPL in our case) is ignored by the preprocessor if it isn't followed by a bracketed argument list.

Note also that, because the preprocessor isn't recursive, this only does a single level parenthesis removal:

REMOVE_PARENS(int)     // int
REMOVE_PARENS((int))   // int
REMOVE_PARENS(((int))) // (int)

Here is a live example of it in action.

Wednesday, 18 January 2012

Iteration

++health;

A typical container iteration loop tends to be written like this:

for (auto i = v.begin(); i != v.end(); ++i)
    i->f();

You see this kind of code all over the place. You can't get any more efficient than that, can you? Let's look at the generated code (Visual Studio 2010, x86, optimised, security tests disabled, and rewritten for readability):

  mov  edi,dword ptr [v]
  mov  esi,dword ptr [edi]
  cmp  esi,dword ptr [edi+4]
  je   finish
  nop
repeat:
  call T::f
  add  esi,4
  cmp  esi,dword ptr [edi+4]
  jne  repeat
finish:

So what do we have here? First we load our container's address into edi and then read a dword from it. This will be the 'begin iterator'. Then we compare it with a dword read from edi+4 (the end iterator) and terminate if they're equal. Then we go into our loop, calling our member function, adding 4 to esi (incrementing the iterator) and then doing the same comparison to repeat the loop.

Let's do the same thing using an algorithm:

std::for_each(v.begin(), v.end(), [](T& t){
    t.f();
});

This is the code produced by that algorithm:

  mov  eax,dword ptr [v]
  mov  esi,dword ptr [eax]
  mov  edi,dword ptr [eax+4]
  cmp  esi,edi
  je   finish
repeat:
  call T::f
  add  esi,4
  cmp  esi,edi
  jne  repeat
finish:

It's similar, but subtly different. First, v's address is being stored in eax instead of edi. No big deal; compilers' register allocation algorithms often result in irrelevant differences like this. The significant difference occurs when it does the end-of-loop test. The end iterator is being read into a register, and it is that register which is being compared against for our end-of-loop condition. Compare this to the previous code where the end iterator is being read from memory every time round the loop. Comparing two registers is faster than comparing a register with a memory location.

So the simple 'for' case has a performance penalty compared to the algorithm case! How can this be? Surely the compiler has to do more work to inline the algorithm and the lambda? It does have to do that work, but the problem is not one of inlining. The problem is aliasing.

Aliasing is the issue of multiple pointers/references pointing to the same memory. In the case of the 'for' loop, we are calling end() each time round the loop, which the compiler translates into a read from memory. Why doesn't the compiler cache the result in a register? Well, it can't, not without violating the rules of C++, because it doesn't know that the end iterator won't change each time round the loop. That is, it doesn't know that the container isn't being aliased by some code executing within the call to T::f(), so it's forced to re-evaluate the call - hence the reread from memory. It's entirely possible that, if it could see enough of the loop body to know everything it touches, it could cache the data that it knows won't change, but it can't do that in general.

The algorithm produces the better code because end() is only called once to produce the argument to the std::for_each function. Once the function executes, it only operates on those iterators, not the container.

It's simple enough to change the original for loop to avoid the problem:

for (auto i = v.begin(), e = v.end(); i != e; ++i)
    i->f();

You'll find that this now produces the same quality of code as the algorithm.

Is this kind of optimisation really significant, in the grand scheme of things? Well, probably not in most cases, but it depends on what the loop is doing, how your data is organised, and what the rest of your program is doing. It's a prime candidate for false sharing to kick in, where the extra unnecessary read on a large multiprocessed algorithm could become a significant problem.

Also, as C++ programmers, we've already conditioned ourselves to use pre-increment on loop iteration instead of a post-increment because of the performance concerns of creating temporary iterators, so it's not much of a stretch to apply this optimisation too.

And all of this will be done for us when the range-based for loop syntax becomes widespread.

Tuesday, 17 January 2012

Nested lambda captures on Visual Studio

Are you using the 'new' lambda support in Visual Studio 2010? Of course you are, they're very useful. However, like some other parts of VC2010, they aren't without problems.

One of those problems is nested lambda captures. What I mean by that is capturing a variable which is itself captured. Imagine you have a matrix defined as follows:

typedef std::vector<int> row_t;
typedef std::vector<row_t> matrix_t;

If you wanted to, say, increment all elements of your matrix by a given value, you might do something like this:

void inc(matrix_t& m, int n)
{
    std::for_each(m.begin(), m.end(), [n](row_t& r) {
        std::for_each(r.begin(), r.end(), [n](int& e) {
            e += n;
        });
    });
}

Unfortunately, this gives the following error:

error C3480: '`anonymous-namespace'::<lambda0>::n': a lambda capture variable must be from an enclosing function scope

This appears to be a known problem. The workaround is to introduce a new variable inside your outer lambda and capture that instead:

void inc(matrix_t& m, int n)
{
    std::for_each(m.begin(), m.end(), [n](row_t& r) {
        int n2 = n;
        std::for_each(r.begin(), r.end(), [n2](int& e) {
            e += n2;
        });
    });
}

Happy lambdaing!

Saturday, 14 January 2012

Construction failure

--health;
employed = false;
yearsMarried = 5;

As mentioned in my previous articles about default construction, a common pattern that you see is the use of the default constructor in order to create a 'zombie' state, followed by a post-construction initialisation function which 'constructs' it properly. This is often to do with deficient frameworks which want to create your objects for you and, because they don't generally know the constructor parameter lists of classes they don't define, they rely on default construction to create the object and let the class author deal with proper initialisation at a later time.

However, it's not always the framework's doing. Often, a post-construction initialisation step is provided in order to return an error code which specifies whether or not initialisation was successful. The argument is that constructors don't provide return values, so how can you possibly know if it succeeded?

ErrorCode::type MyResource::Initialise()
{
    handle_ = AllocateResource();
    if (!handle_)
        return ErrorCode::allocation_failure;

    return ErrorCode::ok;
}

Of course, the real method of signalling errors from constructors is through the use of exceptions. However, quite often the penalty for reporting errors in this way is not one which the executing program can afford. I'm used to the games programming world where exceptions are typically disabled because the cost of enabling them is too high. RTTI (run-time type information) also needs to be enabled, which means that your program footprint is being filled with type metadata, exception tables and stack unwinding hooks, which can cost a considerable amount of memory. Isn't it better to use this memory for something players will appreciate?

We still want our class invariants to hold. That is an important thing to maintain for the sake of code quality and reasoning about the program, so we want to avoid a second initialisation step if we can. We can't throw an exception from our constructor so what do we do?

One option is to pass an error code out from the constructor via a reference:

MyResource::MyResource(ErrorCode::type& error)
{
    handle_ = AllocateResource();
    error = handle_ ? ErrorCode::ok : ErrorCode::allocation_failure;
}

This works, but it's of little use; you still have a 'zombie' state which the class needs to consider, even if it's as simple as having the class handle the case where the user decides to use your object after ignoring the error code. We've failed to maintain a strict invariant.

On the other hand, what does it mean for a class to fail initialisation during execution of a game, or any other realtime system? If you're driving around a city and a landmark pops into view which is unique to the area you've just driven into, and you can't allocate the resources for that landmark, how do you deal with that? You can't just pop up an error and hope that the player forgives you for the grievous interruption of their game. You can't just swallow the error and stop the landmark appearing - imagine if you were driving around San Francisco and you couldn't allocate the resources for the Golden Gate Bridge. There would just be a big hole in the world that you'd fall through.

Pre-allocation is where it's at. By pre-allocating all of your resources at the start of your program, you can do all of that error checking once, in one place, and then you can start dishing them out as the code requires them. Your class constructor simply requests one of these resources:

#define MAX_RESOURCES 256
std::vector<ResourceHandle> g_handles;

int main()
{
    for (int i = 0; i != MAX_RESOURCES; ++i)
    {
        ResourceHandle h = AllocateResource();
        if (!h)
            exit(1); // Your game is never going to run if we can't allocate these
        g_handles.push_back(h);
    }

    // Run game
}

MyResource::MyResource()
{
    // We're assuming we have handles available to us
    assert(!g_handles.empty());

    // Steal a handle from the global vector
    handle_ = g_handles.back();
    g_handles.pop_back();
}

MyResource::~MyResource()
{
    // Put it back again when we're finished with it
    g_handles.push_back(handle_);
}

Now our constructor is guaranteed to run without failure. That's assuming that there are still some handles available, of course, but this will typically be the case where the resource limitations are known up-front and the game code is deliberately constructed in such a way as to not exceed those limitations. We're still taking advantage of RAII by ensuring that the resource is returned to the pool on destruction, and our invariant is strong because the constructor running to completion means that the object has acquired the resource.

Monday, 9 January 2012

Default constructors (4)

Even the exalted STL isn't immune to the allure of default constructors.

In my first post, I mentioned that having a singular state weakens your class invariant, and it does. On the other hand, we have the case where STL explicitly allows singular states in the form of iterators. STL-compatible iterators which are default constructible are assumed to have singular state until assigned to with a non-singular state, before which all operations except assignment and destruction are illegal (from n3242.pdf):

Iterators can also have singular values that are not associated with any sequence. [Example: After the declaration of an uninitialized pointer x (as with int* x;), x must always be assumed to have a singular value of a pointer. —end example] Results of most expressions are undefined for singular values; the only exceptions are destroying an iterator that holds a singular value, the assignment of a non-singular value to an iterator that holds a singular value, and, for iterators that satisfy the DefaultConstructible requirements, using a value-initialized iterator as the source of a copy or move operation.

It is defined this way, of course, because iterators are a generalisation of pointers, which C++ inherited from C, and pointers already work this way. The responsibility for correct iterator usage is then pushed onto iterator users, though considerate iterator authors could provide feedback on incorrect usage as a quality-of-implementation (QoI) point.

Still, it does mean that authors have slightly less to worry about, which is good because iterators are hard enough to define correctly at the best of times. And there is a slight efficiency gain to be had where default construction+assignment should be no less efficient than direct assignment; a valid concern in something as fundamental as iteration.

All of this works because C++ programmers are already conditioned to think this way about pointers, and there isn't a great leap of faith needed to think of iterators in the same way. If you are able to convince your users to think the same way about the classes you provide then perhaps this is another option for you. However, it's a harder sell if you don't have precedent to fall back on.

There is another area of STL where I've seen classes incorrectly crippled with a default constructor, and that's because of std::vector. When faced with a need to reduce the size of a vector, people inevitably reach for resize, which is defined as:

void resize(size_type sz, T c = T());

The defaulted argument is only used when you increase the size of a vector, and specifies the value of the new elements. When you are shrinking the vector, you are having to construct a temporary for no good reason, and that temporary is default constructed if you don't explicitly provide it. You can get away with only creating the temporary when necessary by splitting resize into two overloads:

void resize(size_type sz);
void resize(size_type sz, T c);

Here, the first overload can be written to only default construct a temporary when sz is greater than the vector's current size. That temporary is then used to fill up the space.

However, if your contained class doesn't provide a default constructor, this still won't compile. C++ is a statically-typed language, and the code must still be able to call a default constructor even if that call never executes at run time. Instantiating resize for such a type will cause a compile error.

The answer to all of this is not to add a default constructor to your class. The answer is to use erase instead:

// Shrink vec by n:
vec.resize(vec.size() - n);           // Don't do this
vec.erase (vec.end() - n, vec.end()); // Do this instead

Sunday, 8 January 2012

Default constructors (3)

Last time, I mentioned two options that were available for maintaining your non-default-constructible class without needing to weaken its invariant.

Let's say you have the following class, as provided by Stroustrup:

class File_handle {
public:
    File_handle(const char* n, const char* rw)
    {
        f = fopen(n,rw);
        if (f==0)
            throw Open_failure(n);
    }

    ~File_handle()
    {
        fclose(f);
    }

private:
    File_handle(const File_handle&);
    File_handle& operator=(const File_handle&);

    FILE* f;
};

I've reformatted it slightly, as well as making it non-copyable (pretty important if you're writing it for real!). The class is a perfect example of RAII in action. If the object exists, the file it manages is open. This is a strong invariant.

Now, let's say I wanted to make a new class which encapsulates a file, but that class had to fit into a framework which requires default construction:

class DefaultConstructible : public FrameworkClass
{
public:
    DefaultConstructible()
    : file_(???)
    {
    }

private:
    DefaultConstructible(const DefaultConstructible&);
    DefaultConstructible& operator=(const DefaultConstructible&);

    File_handle file_;
};

We have no arguments to construct the file_ member with. You may think to pre-set some global or class-static variables in advance of creating the DefaultConstructible object, but that would be a concurrency disaster waiting to happen, as well as proving yourself to be morally bankrupt. So we make use of the first of the two suggestions, boost::optional, to delay creation of our member until a point where we are able to provide the arguments:

class DefaultConstructible : public FrameworkClass
{
public:
    DefaultConstructible()
    // File_handle is 'unconstructed' here
    {
    }

    void Initialise(const char* n, const char* rw)
    {
        file_ = boost::in_place(n, rw); // Construct the file_ properly here
    }

private:
    DefaultConstructible(const DefaultConstructible&);
    DefaultConstructible& operator=(const DefaultConstructible&);

    boost::optional<File_handle> file_;
};

If our File_handle class was copy/move constructible, we could have just used direct assignment to initialise it:

file_ = File_handle(n, rw);

However, it isn't, so we utilise another Boost utility called boost::in_place in order to allow the optional to directly construct the File_handle, rather than needing to copy a temporary one.

At this point you may ask, why not just 'new' the File_handle when we need it? For example:

class DefaultConstructible : public FrameworkClass
{
public:
    DefaultConstructible()
    // File_handle is 'unconstructed' here
    {
    }

    void Initialise(const char* n, const char* rw)
    {
        file_.reset(new File_handle(n, rw)); // Construct the file_ properly here
    }

private:
    DefaultConstructible(const DefaultConstructible&);
    DefaultConstructible& operator=(const DefaultConstructible&);

    std::auto_ptr<File_handle> file_;
};

Well, that's an equally valid option too. Generally, I am of the opinion that heap allocation is to be avoided where possible. There's no need to allocate here, so I don't.

However, you may already be doing almost exactly this because you are employing what I mentioned as my second option: the Pimpl idiom. In which case, you're already most of the way there. All you need to do is ensure that you handle the error where your object isn't yet created before you forward on your calls to the implementation:

////////////////////////////
// DefaultConstructible.h //
////////////////////////////
class File_handle;

class DefaultConstructible : public FrameworkClass
{
public:
    DefaultConstructible();
    ~DefaultConstructible();

    void Initialise(const char* n, const char* rw);

    void Read(void* p, size_t bytes);

private:
    DefaultConstructible(const DefaultConstructible&);
    DefaultConstructible& operator=(const DefaultConstructible&);

    std::auto_ptr<File_handle> file_; // Or std::unique_ptr if you're now on C++11
};

//////////////////////////////
// DefaultConstructible.cpp //
//////////////////////////////
#include "File_handle.h"

DefaultConstructible::DefaultConstructible() {}
DefaultConstructible::~DefaultConstructible() {}

void DefaultConstructible::Initialise(const char* n, const char* rw)
{
    assert(!file_.get());

    file_.reset(new File_handle(n, rw)); // Construct the file_ properly here
}

void DefaultConstructible::Read(void* p, size_t bytes)
{
    assert(file_.get());

    file_->Read(p, bytes); // Assuming File_handle has a Read function
}

I have used assert here, but you can use whatever you like: relying on std::auto_ptr to do the check for you, error codes, exceptions or something more cunning.