Fuz's blog: January 2012

Wednesday, 18 January 2012

Iteration

++health;

A typical container iteration loop tends to be written like this:

for (auto i = v.begin(); i != v.end(); ++i)
    i->f();

You see this kind of code all over the place. You can't get any more efficient than that, can you? Let's look at the generated code (Visual Studio 2010, x86, optimised, security tests disabled, and rewritten for readability):

mov  edi,dword ptr [v]
  mov  esi,dword ptr [edi]
  cmp  esi,dword ptr [edi+4]
  je   finish
  nop
repeat:
  call T::f
  add  esi,4
  cmp  esi,dword ptr [edi+4]
  jne  repeat
finish:

So what do we have here? First we load our container's address into edi and then read a dword from it. This will be the 'begin iterator'. Then we compare it with a dword read from edi+4 (the end iterator) and terminate if they're equal. Then we go into our loop, calling our member function, adding 4 to esi (incrementing the iterator) and then doing the same comparison to repeat the loop.

Let's do the same thing using an algorithm:

std::for_each(v.begin(), v.end(), [](T& t){
    t.f();
});

This is the code produced by that algorithm:

mov  eax,dword ptr [v]
  mov  esi,dword ptr [eax]
  mov  edi,dword ptr [eax+4]
  cmp  esi,edi
  je   finish
repeat:
  call T::f
  add  esi,4
  cmp  esi,edi
  jne  repeat
finish:

It's similar, but subtly different. First is that v's address is begin stored in eax instead of edi. No big deal; compilers' register allocation algorithms often results in irrelevant differences like this. The significant difference occurs when it does the end-of-loop test. The end iterator is being read into a register, and it is that register which is being compared against for our end-of-loop condition. Compare this to the previous code where the end iterator is being read from memory every time round the loop. Comparing two registers is faster than comparing a register with a memory location.

So the simple 'for' case has a performance penalty compared to the algorithm case! How can this be? Surely the compiler has to do more work to inline the algorithm and the lambda? It does have to do that work, but the problem is not one of inlining. The problem is aliasing.

Aliasing is the issue of multiple pointers/references pointing to the same memory. In the case of the 'for' loop, we are calling end() each time round the loop, which the compiler translates into a read from memory. Why doesn't the compiler cache the result in a register? Well, it can't, not without violating the rules of C++, because it doesn't know that the end iterator won't change each time round the loop. That is, it doesn't know that the container isn't being an aliased by some code executing within the call to T::f(), so it's forced to re-evaluate the function - hence the reread from memory. It's entirely possible that, if it could see enough of the loop body to know everything it touches, it could cache the data that it knows won't change, but it can't do that in general.

The algorithm produces the better code because end() is only called once to produce the argument to the std::for_each function. Once the function executes, it only operates on those iterators, not the container.

It's simple enough to change the original for loop to avoid the problem:

for (auto i = v.begin(), e = v.end(); i != e; ++i)
    i->f();

You'll find that this now produces the same quality of code as the algorithm.

Is this kind of optimisation really significant, in the grant scheme of things? Well, probably not in most cases, but it depends on what the loop is doing, how your data is organised and what the rest of your program is doing. It's a prime candidate for false sharing to kick in, where the extra unnecessary read on a large multiprocessed algorithm could become a significant problem.

Also, as C++ programmers, we've already conditioned ourselves to use pre-increment on loop iteration instead of a post-increment because of the performance concerns of creating temporary iterators, so it's not much of a stretch to apply this optimisation too.

And all of this will be done for us when the range-based for loop syntax becomes widespread.

Tuesday, 17 January 2012

Nested lambda captures on Visual Studio

Are you using the 'new' lambda support in Visual Studio 2010? Of course you are, they're very useful. However, like some other parts of VC2010, they aren't without problems.

One of those problems is nested lambda captures. What I mean by that is capturing a variable which is itself captured. Imagine you have a matrix defined as follows:

typedef std::vector<int> row_t;
typedef std::vector<row_t> matrix_t;

If you wanted to, say, increment all elements of your matrix by a given value, you might do something like this:

void inc(matrix_t& m, int n)
{
    std::for_each(m.begin(), m.end(), [n](row_t& r) {
        std::for_each(r.begin(), r.end(), [n](int& e) {
            e += n;
        });
    });
}

Unfortunately, this gives the following error:

error C3480: '`anonymous-namespace'::<lambda0>::n': a lambda capture variable must be from an enclosing function scope

This appears to be a known problem. The workaround is to introduce a new variable inside your outer lambda and capture that instead:

void inc(matrix_t& m, int n)
{
    std::for_each(m.begin(), m.end(), [n](row_t& r) {
        int n2 = n;
        std::for_each(r.begin(), r.end(), [n2](int& e) {
            e += n2;
        });
    });
}

Happy lambdaing!

Saturday, 14 January 2012

Construction failure

--health;
employed = false;
yearsMarried = 5;

As mentioned in my previous articles about default construction, a common pattern that you see is the use of the default constructor in order to create a 'zombie' state, followed by a post-construction initialisation function which 'constructs' it properly. This is often to do with deficient frameworks which want to create your objects for you and, because they don't generally know the constructor parameter lists of classes they don't define, they rely on default construction to create the object and let the class author deal with proper initialisation at a later time.

However, it's not always the case. It's often the case that a post-construction initialisation step is provided in order to return an error code which specifies whether or not initialisation was successful. The argument is that constructors don't provide return values so how can you possibly know if it succeeded?

ErrorCode::type MyResource::Initialise()
{
    handle_ = AllocateResource();
    if (!handle_)
        return ErrorCode::allocation_failure;

    return ErrorCode::ok;
}

Of course, the real method of signalling errors from constructors is through the use of exceptions. However, quite often the penalty for reporting errors in this way is not one which the executing program can afford. I'm used to the games programming world where exceptions are typically disabled because the cost of enabling them is too high. RTTI (run-time type information) also needs to be enabled, which means that your program footprint is being filled with type metadata, exception tables and stack unwinding hooks, which can be considerably expensive in terms of memory consumption. Isn't it better to use this memory for something players will appreciate?

We still want our class invariants to hold. That is an important thing to maintain for the sake of code quality and reasoning about the program, so we want to avoid a second initialisation step if we can. We can't throw an exception from our constructor so what do we do?

One option is to pass out an error code out from the constructor via a reference:

MyResource::MyResource(ErrorCode::type& error)
{
    handle_ = AllocateResource();
    error = handle_ ? ErrorCode::ok : ErrorCode::allocation_failure;
}

This works, but it's of little use; you still have a 'zombie' state which the class needs to consider, even if it's as simple as having the class handle the case where the user decides to use your object after ignoring the error code. We've failed to maintain a strict invariant.

On the other hand, what does it mean for a class to fail initialisation during execution of a game, or any other realtime system? If you're driving around a city and a landmark pops into view which is unique to the area you've just driven into, and you can't allocate the resources for that landmark, how do you deal with that? You can't just pop up an error and hope that the player forgives you for the grievous interruption of their game. You can't just swallow the error and stop the landmark appearing - imagine if you were driving around San Francisco and you couldn't allocate the resources for the Golden Gate Bridge. There would just be a big hole in the world that you'd fall through.

Pre-allocation is where it's at. By pre-allocating all of your resources at the start of your program, you can do all of that error checking once, in one place, and then you can start dishing them out as the code requires them. Your class constructor simply requests one of these resources:

#define MAX_RESOURCES 256
std::vector<ResourceHandle> g_handles;

int main()
{
    for (int i = 0; i != MAX_RESOURCES; ++i)
    {
        ResourceHandle h = AllocateResource();
        if (!h)
            exit(1); // Your game is never going to run if we can't allocate these
        g_handles.push_back(h);
    }

    // Run game
}

MyResource::MyResource()
{
    // We're assuming we have handles available to us
    assert(!g_handles.empty());

    // Steal a handle from the global vector
    handle_ = g_handles.pop_back();
}

MyResource::~MyResource()
{
    // Put it back again when we're finished with it
    g_handles.push_back(handle_);
}

Now our constructor is guaranteed to run without failure. That's assuming that there are still some handles available, of course, but this will typically be the case where the resource limitations are known up-front and the game code is deliberately constructed in such a way as to not exceed those limitations. We're still taking advantage of RAII by ensuring that the resource is returned to the pool on destruction, and our invariant is strong because the constructor running to completion means that the object has acquired the resource.

Monday, 9 January 2012

Default constructors (4)

Even the exalted STL isn't immune to the allure of default constructors.

In my first post, I mentioned that having a singular state weakens your class invariant, and it does. On the other hand, we have the case where STL explicitly allows singular states in the form of iterators. STL-compatible iterators which are default constructible are assumed to have singular state until assigned to with a non-singular state, before which all operations except assignment and destruction are illegal (from n3242.pdf):

Iterators can also have singular values that are not associated with any sequence. [Example: After the declaration of an uninitialized pointer x (as with int* x;), x must always be assumed to have a singular value of a pointer. —end example] Results of most expressions are undefined for singular values; the only exceptions are destroying an iterator that holds a singular value, the assignment of a non-singular value to an iterator that holds a singular value, and, for iterators that satisfy the DefaultConstructible requirements, using a value-initialized iterator as the source of a copy or move operation.

It is defined this way, of course, because iterators are a generalisation of pointers, which C++ inherited from C, which already work this way. The responsibility for correct iterator usage is then pushed onto iterator users, though considerate iterator authors could provide feedback on incorrect usage as a QOI point.

Still, it does mean that authors have slightly less to worry about, which is good because iterators are hard enough to define correctly at the best of times. And there is a slight efficiency gain to be had where default construction+assignment should be no less efficient than direct assignment; a valid concern in something as fundamental as iteration.

All of this works because C++ programmers are already conditioned to think this way about pointers and there isn't great leap of faith needed to think of iterators in the same way. If you are able to convince your users to think the same way about the classes you provide then perhaps this is another option for you. However, it's a harder sell if you don't have precedent to fall back on.

There is another area of STL where I've seen classes incorrectly crippled with a default constructor, and that's because of std::vector. When faced with a need to reduce the size of a vector, people inevitably reach for resize, which is defined as:

void resize(size_type sz, T c = T());

The defaulted argument is only used when you increase the size of a vector, and specifies the value of the new elements. When you are shrinking the vector, you are having to construct a temporary for no good reason, and that temporary is default constructed if you don't explicitly provide it. You can get away with only creating the temporary when necessary by splitting resize into two overloads:

void resize(size_type sz);
void resize(size_type sz, T c);

Here, the first overload can be written to only default construct a temporary when sz is greater than the vector's current size. That temporary is then used to fill up the space.

However, if your contained class doesn't provide a default constructor, you still won't work. C++ is a statically-typed language and the code still needs to be able to call a default constructor, even if it's not called. Attempting to do so will cause a compile error.

The answer to all of this is not to add a default constructor to your class. The answer is to use erase instead:

// Shrink vec by n:
vec.resize(vec.size() - n);           // Don't do this
vec.erase (vec.end() - n, vec.end()); // Do this instead

Sunday, 8 January 2012

Default constructors (3)

Last time, I mentioned two options that were available for maintaining your non-default-constructible class without needing to weaken its invariant.

Let's say you have the following class, as provided by Stroustrup:

class File_handle {
public:
    File_handle(const char* n, const char* rw)
    {
        f = fopen(n,rw);
        if (f==0)
            throw Open_failure(n);
    }

    ~File_handle()
    {
        fclose(f);
    }

private:
    File_handle(const File_handle&);
    File_handle& operator=(const File_handle&);

    FILE* f;
};

I've reformatted it slightly, as well as making it non-copyable (pretty important if you're writing it for real!). The class is a perfect example of RAII in action. If the object exists, the file it manages is open. This is a strong invariant.

Now, let's say I wanted to use make a new class which encapsulates a file, but that class had to fit into a framework which requires default construction:

class DefaultConstructible : public FrameworkClass
{
public:
    DefaultConstructible()
    : file_(???)
    {
    }

private:
    DefaultConstructible(const DefaultConstructible&);
    DefaultConstructible& operator=(const DefaultConstructible&);

    File_handle file_;
};

We have no arguments to construct the file_ member with. You may think to pre-set some global or class-static variables in advance of creating the DefaultConstructible object, but that would be a concurrency disaster waiting to happen, as well as proving yourself to be morally bankrupt. So we make use of the first of the two suggestions to delay creation of our member until a point where we are able to provide the arguments. boost::optional:

class DefaultConstructible : public FrameworkClass
{
public:
    DefaultConstructible()
    // File_handle is 'unconstructed' here
    {
    }

    void Initialise(const char* n, const char* rw)
    {
        file_ = boost::in_place(n, rw); // Construct the file_ properly here
    }

private:
    DefaultConstructible(const DefaultConstructible&);
    DefaultConstructible& operator=(const DefaultConstructible&);

    boost::optional<File_handle> file_;
};

If our File_handle class was copy/move constructible, we could have just used direct assignment to initialise it:

file_ = File_handle(n, rw);

However, it isn't, so we utilise another Boost utility called boost::in_place in order to allow the optional to directly construct the File_handle, rather than needing to a copy of a temporary one.

At this point you may ask, why not just 'new' the File_handle when we need it? For example:

class DefaultConstructible : public FrameworkClass
{
public:
    DefaultConstructible()
    // File_handle is 'unconstructed' here
    {
    }

    void Initialise(const char* n, const char* rw)
    {
        file_.reset(new File_handle(n, rw)); // Construct the file_ properly here
    }

private:
    DefaultConstructible(const DefaultConstructible&);
    DefaultConstructible& operator=(const DefaultConstructible&);

    std::auto_ptr<File_handle> file_;
};

Well, that's an equally valid option too. Generally, I am of the opinion that heap allocation is to be avoided where necessary. There's no need to allocate here, so I don't.

However, you may already be doing almost exactly this because you are employing what I mentioned as my second option: the Pimpl idiom. In which case, you're already most of the way there. All you need to do is ensure that you handle the error where your object isn't yet created before you forward on your calls to the implementation:

////////////////////////////
// DefaultConstructible.h //
////////////////////////////
class File_Handle;

class DefaultConstructible : public FrameworkClass
{
public:
    DefaultConstructible();
    ~DefaultConstructible();

    void Initialise(const char* n, const char* rw);

    void Read(const void* p, size_t bytes);

private:
    DefaultConstructible(const DefaultConstructible&);
    DefaultConstructible& operator=(const DefaultConstructible&);

    std::auto_ptr<File_handle> file_; // Or std::unique_ptr if you're now on C++11
};

//////////////////////////////
// DefaultConstructible.cpp //
//////////////////////////////
#include "File_handle.h"

DefaultConstructible:: DefaultConstructible() {}
DefaultConstructible::~DefaultConstructible() {}

void DefaultConstructible::Initialise(const char* n, const char* rw)
{
    assert(!file_.get());

    file_.reset(new File_handle(n, rw)); // Construct the file_ properly here
}

void DefaultConstructible::Read(const void* p, size_t bytes)
{
    assert(file_.get());

    file_->Read(p, bytes); // Assuming File_handle has a Read function
}

I have used assert here, but you can use whatever you like: relying on std::auto_ptr to do the check for you, error codes, exceptions or something more cunning.

Friday, 6 January 2012

Default constructors (2)

As I was saying last post, the provision of a default constructor when a class doesn't naturally lend itself to being configurationless means that you'll end up having to support a weakened invariant which will both make your class more awkward to reason about and harder to maintain.

But sometimes it is necessary for a default constructor to be provided, as jogobom pointed out.

There are APIs out there which rely on being able to default construct objects. Qt has already been mentioned. COM, ATL and MFC are three others. I myself have written a framework in the past which required it, and then ended up having to invent workarounds for classes for which it would be prohibitive to provide one.

No matter how rude it is for those frameworks to require you to bend your classes to their will, and it is undoubtedly rude, the fact remains that your code is going to have to change before theirs does.

What are you able to do about it? Well, it's generally not a good idea to try and force a default state into a class that doesn't want it. What we want is to separate the 'real' class from the singular state. That way, the class can continue to be easily maintained and reasoned about, and the singularity can be solely responsible for ensuring it's not accidentally used.

There are a couple of easy ways of achieving that. First, you might consider using a boost::optional. If you're familiar with C# then it's closely related to a Nullable type. This way, you can defer construction of your 'real' object while the Optional ensure correct access to it.

Another option exists if you already employ the Pimpl idiom. This is the layer of indirection you need between construction of the client-facing object and the real implementation. You just need to defer allocation of the 'impl' and constrain access to it.

Examples coming soon.

Thursday, 5 January 2012

Default constructors

A great many classes I see have default constructors. A great many of those are implemented by zeroing out the class members for them to be 'initialised' properly at a later time.

Let's remind ourselves what constructors are for, from the inventor of C++ himself:

It is the job of every constructor to establish the class invariant, so that every member function can rely on it.

A default constructor is one that takes no arguments. That suggests that there is a state which can be arrived at without user configuration and which doesn't violate the class invariant. And for that to be the case, one of the following must be true:

You have a well-defined invariant which allows a naturally 'empty' state.
You have a weakened invariant which allows a singular state.

Containers are a good example of the first case. Their invariant is such that an empty container is valid and useful, and it can be arrived at easily without any configuration.

A class which zeroes its members in the constructor and requires a further initialisation step is an example of the second case. The majority of operations that may be done on this post-default-construction, pre-initialisation object are illegal, which makes this default state singular.

If this singular state is so useless, why not avoid it entirely by establishing useful states in all constructors and strengthening your class invariant? If a default constructor isn't capable of achieving this, don't provide one.

Wednesday, 4 January 2012

Down-to operator

Speaking of pseudo operators, I was amused recently when a colleague introduced me to the 'down-to' operator:

int n = 10;
while (n --> 0)
    printf("%d\n", n);

This time our pseudo operator is constructed from a post-decrement and a greater-than, causing the loop body to be invoked with values of n from 9 down to 0 inclusive.

Entertaining, but not something I'd advocate in real code.

Tuesday, 3 January 2012

Truth

A quick call out to the truthiness operator in C++, Ruby and Javascript (and probably other languages):

bool b = !!obj;

This will convert an object into its logically true or false value. For example, it will convert a pointer to true if it is non-null, or false if it is null. And it will convert a numerical object to true if it is non-zero, or false if it is zero. And similarly for nil/undefined in the aforementioned languages.

Of course, as anyone who knows about tokenisation in these languages will tell you, it's really just the ! operator applied twice. And the ! operator is really the 'falsiness' operator.

But it's a useful construct to learn, and it's the most succinct way of expressing what it does.

Monday, 2 January 2012

Equality

What does it mean for two objects to have the same state?

A naive answer could be that those objects have the same state if they have the same bit pattern in memory. While it's true that having the same bit pattern means objects have the same state, the converse isn't necessarily true. Think about two std::vector<int>s. They compare equal if they have the same contents, but those contents would exist in two different locations in memory, and so be referenced by different pointers inside each std::vector.

Another answer could be that the objects compare equal with ==. Again, that's true, but not every class defines an operator==, nor does it always make sense to provide one. What would it mean to compare two stream objects, for example?

A better way to think about state equality is in terms of substitutability. Wherever an object is constructed in a program, if it can be substituted with another object without changing the behaviour of the program then those objects have the same state.

Sunday, 1 January 2012

Parameterised error handlers

++year;

Last post I suggested using a default value in place of an exception to be returned when the function fails to return its proper result:

type from_ascii(const char* c, type resultOnFailure)
{
    for (int i = 0; i != sizeof(names)/sizeof(names[0]); ++i)
        if (!strcmp(c, names[i]))
            return (type)i;

    return resultOnFailure;
}

This is fine but its usage is limited to those occasions where you have an appropriate default in advance of calling the function. Also, sometimes the construction of the default value is non-trivial and isn't something you want to waste time doing (or worse, have the construction fail) if you don't even end up using it.

Enter parameterised error handling. The idea is to provide is a functor to be invoked when the return value cannot be generated:

type from_ascii(const char* c, std::function<type()> onError)
{
    for (int i = 0; i != sizeof(names)/sizeof(names[0]); ++i)
        if (!strcmp(c, names[i]))
            return (type)i;

    return onError();
}

Now we can do what we like to handle the error. We can throw an exception. We can return the default value we want. We can prompt the user for a value. We can read it from a file. We can log the error or inform the user before doing any of the above, especially if you forward the function arguments to the functor:

type from_ascii(const char* c, std::function<type(const char*)> onError)
{
    for (int i = 0; i != sizeof(names)/sizeof(names[0]); ++i)
        if (!strcmp(c, names[i]))
            return (type)i;

    return onError(c);
}

Calling this function is simply a matter of passing an appropriately-defined functor. C++11's lambda support makes this particularly easy:

auto e = Direction::from_ascii("banana", [](const char* c) {
    Log("Invalid Direction::type name: %s", c);
    return Up;
});

You can define your functor however you like, passing what you think is important, maybe an error code which tells you why the functor got invoked, or a flag which the functor can set to retry the operation.

Finally, if performance is a concern, you can always take the functor as a templated argument, then you can have your cake and eat it:

template <typename ErrorFunc>
type from_ascii(const char* c, ErrorFunc onError)
{
    for (int i = 0; i != sizeof(names)/sizeof(names[0]); ++i)
        if (!strcmp(c, names[i]))
            return (type)i;

    return onError(c);
}

This approach isn't appropriate for everything but it's another feather to your error handling cap.