Wednesday, December 15, 2010

"Inline" functions and you

There seems to be much confusion about inline functions in C++ and what, if anything, they offer. Many people believe that use of the 'inline' keyword and defining member functions inside class declarations actually cause the compiler to inline functions like macros so that a JMP call (and possibly more) can be avoided. Thus, in the name of efficiency people will define vast numbers of functions inside of headersand invent half-assed coding policies in order to gain the benefit of this optimization. Are they actually gaining anything or are they just causing themselves problems?

The first thing to realize about "inline" functions is that they don't have to be. The C++ standard clearly dictates that a compiler may completely ignore the request to inline a function. When a compiler does this, the only recourse it has is to create multiple definitions of that function, one for each translation unit (compiled source file) that it is used in. When this happens, not only has there been no speed gained through the inlining of the function, but there is now a size increase caused by there being more than one chunk of code that does exactly the same thing.

Of course, people who advocate use of inline functions for performance gain will point out that the linker will then come along and remove duplicates of the function if it was not actually inlined. Is this true? Maybe. The linker is of course free to do so if it is able to. The standard allows implementations to remove any chunk of code that's removal does not change the result of the program. Therefor unused functions and duplicates can be removed. It certainly does not HAVE to do this though.

There is also the point that program size is usually not as important as program speed. Well, this is often true and it is also sometimes false. Size vs. performance requires a cost/benefit analysis that cannot possibly dictate policy beyond any given project. A effort to produce a program on an embedded architecture, for example, will often have exactly the opposite requirement and optimization will be about making the program smaller and use less memory more than making it run faster.

So at this point we can safely say that putting your functions in headers as inline members or globals can increase performance (in which case it increases program space because inlined functions make the functions that use them larger) and may very well use up more space and not provide any performance benefit at all, depending on the implementation.

What about the other end though? It is an apparently wide held belief that not using inline functions makes it impossible for an implementation to perform an inline optimization. Even supposed experts make this very claim. This of course leads those worried about making their programs as fast as possible to spend an inordinate amount of time trying to make everything "inline" in order to gain this boost that may or may not ever occur. Worse, they spend all this effort long before ever actually testing the program to see if it even needs a performance boost at all or that the functions they are doing this with are bottlenecks and need to be inlined at all (something compilers are fairly good at deciding themselves these days).

But is it true that for a function to get inlined it must be in the header so that it is available to the compiler at the point where it is called? In other words, does a function definition have to be available within scope of the place it is called in order for an implementation to perform an inline optimization upon it? The answer to that question is nothing short but an astounding and certain maybe. You see, as with the case of code duplication caused by the inclusion of inlined functions that were not inlined being removed by the linker, today's linkers can also perform function inlining! That's right, an implementation is as free to inline functions at any point in the process as it is to remove duplicates or unused code.

What does your compiler do? Well, you'll just have to read the documentation to find out. The MS VC++ suite calls these linker optimizations "Whole Program Optimization" and you have to turn it on. In the GNU world it is called the standard link-time optimizer and is governed by the -flto switch.

When all is said and done, it is absolutely silly to dictate use of inline functions as policy. You're not necessarily gaining anything at all and can actually be causing bloat without purpose. What you should do, and it's really sad that so few people seem to understand this, is write your program however it seems makes it the most clear, maintainable, and understandable. THEN, if you notice that you're not getting the performance you want or need, you profile your program to find out WHERE the bottleneck is and fix it. This may or may not be answerable by using function inlining.

Too many people start with, "I want my program to be fast and so I must [INSERT ASSUMPTION HERE] in order to make that happen." They are, as in the case of using inline functions, almost certainly in error.

Friday, December 10, 2010

A pet peeve

Quite often on various help sites I frequent I'll run into someone confused about pointers that wonders something like, "I've got two pointers that point at the same thing, do I delete both? One? Which?" etc... I've yet to see anyone but myself correct their use of the terms and I think that's because people aren't really paying attention to exactly what is going on and why the common phrase, "...delete a pointer...," confuses the issue. Experts do know what's going on, but still use the terms incorrectly. I think it's time that we stopped and I bet that by correcting use and correcting those using them wrong, we could improve the understanding of noobs.

The first and foremost thing that all noobs fail to get is that pointers are values. A 'pointer to T' is not anything magical but for the green developer they always seem to be. The reason being is that we don't hammer into them that 'pointer to T' is nothing but another value type. In fact, the only reason that 'T' is involved at all is so that the compiler knows how far to increment it when we do math on the value. Besides that, 'pointer to T' is ENTIRELY unrelated to T or any instance of T. The 'pointer to T' type is simply another value type that holds a number and allows a few additional operators to manipulate it: *, ->, and delete.

The second thing that noobs seem to misunderstand is that they think 'pointer to T' somehow holds a T; this couldn't be further from the truth. The T that is 'pointed to' by a 'pointer to T' is exactly like any other T that you might create. It looks the same, acts the same, it IS the same. There doesn't even have to be any difference in location because 'pointer to T' can still 'point at' a T that was simply created as a value on the 'stack'. Another way of showing it is to point out that you can use new (or malloc) to create a T on the free-store (or heap) and not have any pointer anywhere: 'new T;'. That T exists just like it would with a pointer and is exactly the same shape as any other T created anywhere else. All that the value type 'pointer to T' does is tell you how to find it.

The next thing that noobs misunderstand, and this is the one I think experts should work harder to correct even in themselves, is that you do not delete (or free) pointers; you delete what they point at. Since a 'pointer to T' is a value type just like T is, it behaves just like T does. You don't delete them unless you created them with new. The only difference between 'pointer to T' and T itself (assuming T is not also a pointer type) is that it can be an operand of the delete operator. Passing a pointer into the delete function, by using the delete operator, tells that function to destroy the object pointed to by that pointer; you are NOT deleting the pointer. If that object is not on the free-store, or has been deleted already, then passing that pointer, a value that still exists and has the same value it always had, into the delete function results in very bad things. If you want to delete a pointer then you need to have a pointer to it and use THAT pointer as the delete operand; you will not be deleting the 'pointer to pointer to T' and hopefully the 'pointer to T' that 'pointer to pointer to T' points at was created on the free-store and still exists. In short, you NEVER delete value types that are not created on the free-store with the new operator and 'pointer to T' is not different in this regard.

So, the next time some noob asks something like:

I created an object with new, assigned it to ptr1 and then copied ptr1 into ptr2. Do I need to delete both pointers? Just one? Which one should I delete? What happens to the other one?...etc...


Please, when someone asks this, correct their terminology and your own. Explain to them that...

Neither. You don't delete the pointers, you want to delete what they point at. In order to delete an object you need to pass a value into the delete operator that contains a number representing the location of that object, a pointer. Since both of your pointers have the very same value, you can use either to delete the object they point to. Nothing happens to the pointers after that, and this is very important because they still contain a number representing a location that is no longer being occupied. You need to explicitly set those values to 0 so that you can tell again later or you need to destroy those pointers by allowing them to go out of scope.


Of course some people think that you shouldn't set a pointer to 0 because doing so hides "dual delete errors". Those people are retards though and are trying to debug through undefined behavior, which is just plain stupid.

At any rate, the point is to make sure that people understand that a pointer is simply another type of value. You do not delete it unless you've newed it and nothing is done to it by passing it as an operand to delete.

Wednesday, February 17, 2010

Using mpl::for_each to fill a tuple

This article is actually a paste of something I wrote to comp.lang.c++ quite a while ago.


A while back I posted a question either here or to the boost user list about how to iterate through a vector of strings and perform a lexical cast on them into elements of a tuple. I was working with sqlite3 and found myself repeatedly writing code like so:

t.get<0>() = boost::lexical_cast<double>(vect[0]);
t.get<1>() = boost::lexical_cast<int>(vect[1]);
...

For each query I would invent. Certainly there seemed there should be a way to use metaprogramming to approach this problem. I could get here:

t.get<N>() = boost::lexical_cast< boost::tuples::element<N, TUP>::type >(vect[N]);

However, I couldn't figure how to insert that into a metaprogram. I figured it would have something to do with for_each but couldn't figure it out. When I approached whatever list I posted my question to I was sent to some library...I forget which one but it wasn't the answer (blog note: it was the phoenix library). Well, last night a light in my brain turned on. It's actually very simple once it dawns on one how to do it:

template < typename TUPLE >
struct tuple_assign
{
TUPLE & t;
std::vector< std::string > const& data;

tuple_assign(TUPLE & to, std::vector const& from) : t(to), data(from) {}

template < typename T>
void operator() (T) // T must be an mpl::int_
{
boost::tuples::get<T::value>(t) =
boost::lexical_cast< typename boost::tuples::element<T::value, TUPLE>::type >(data[T::value]);
}
};

Your calling code looks like so:

boost::tuple<double, int, std::string> t;
std::vector<std::string> d;
d += "5.2","42","HELLO!";

boost::mpl::for_each< boost::mpl::range<0,3> >(tuple_assign< boost::tuple<double,int,std::string> >(t,d));

The range can also be derived through the template system like so:

boost::mpl::range< 0, boost::tuples::length<boost::tuple<double,int,std::string> > >

Much safety can be placed on this system. I haven't done so here. This problem solved though, there's nothing stopping a generic query interface that could be used something like so:

tie(x, y, z) = query.run();

as well as an iterative interface that provides a similar tuple interface.

blog note: you'd need to override the = operator between tuples and some type returned by run() in order to get the necessary information to perform the above technique.

Sunday, February 7, 2010

Implementing std::tr1::function - pt 1

This is the first of what will be a series of articles about implementing a general purpose function wrapper that can contain any callable entity that responds to a particular signature. These articles will start with a much simpler problem and slowly add complexity until a full implementation has been created. Some insight will be taken from boost::function, the singly most commonly used implementation.

We'll start with something simple, working with a single, basic signature: void (int).


// abstract invoker
struct impl
{
virtual ~impl() {}
virtual void invoke(int) = 0;
virtual impl * clone() const = 0;
};

// invoker for function pointers.
struct fun_ptr_impl : impl
{
typedef void (*fun_ptr_t)(int);

fun_ptr_impl(fun_ptr_t f) : fun(f) {}
void invoke( int i ) { fun(i); }
impl * clone() const { return new fun_ptr_impl(*this); }

private:
fun_ptr_t fun;
};

// invoker for functors.
template < typename T >
struct object_impl : impl
{
object_impl(T const& t) : object(t) {}

void invoke( int i ) { return object(i); }

impl * clone() const { return new object_impl(object); }

private:
T object;
};

#include <boost/scoped_ptr.hpp>
#include <cassert>
// The function wrapper itself.
struct inheritance_function
{
typedef void (*fun_ptr_t)(int);

// construct with function pointer.
inheritance_function(fun_ptr_t f) : pimpl(new fun_ptr_impl(f))
{}
// construct with functor
template < typename Object >
inheritance_function(Object const& o) : pimpl(new object_impl<Object>(o))
{}
// copy
inheritance_function(inheritance_function const& ifun) : pimpl(ifun.pimpl->clone())
{}

// invoke the stored function.
void operator()(int i) const { return pimpl->invoke(i); }

// assign from inheritance_function
inheritance_function & operator = (inheritance_function const& ifun)
{
pimpl.reset(ifun.pimpl->clone());
return *this;
}
// assign from function pointer
inheritance_function & operator = (fun_ptr_t f)
{
pimpl.reset(new fun_ptr_impl(f));
return *this;
}
// assign from functor
template < typename Object >
inheritance_function & operator = (Object ob)
{
pimpl.reset(new object_impl<Object>(ob));
return *this;
}

private:
boost::scoped_ptr<impl> pimpl;
};

// some test code
#include <iostream>
void function(int x) { std::cout << "function(" << x << ")" << std::endl; }

struct object
{
void operator() (int x) { std::cout << "object::operator()(" << x << ")" << std::endl; }
};

int main()
{
inheritance_function f1 = function;
inheritance_function f2 = object();

f1(5); f2(6);

f2 = function;
f1 = object();

f1(5); f2(6);

f2 = f1;

f1(5); f2(6);
}


Pretty basic so far. There's much that is missing though and I'll continue adding as I go forward. All this code does is replace an internal invoker class based upon the assignment or construction argument passed in. If we get a function pointer then we create an invoker able to handle function pointers. Anything else we assume is a functor object. Granted, this could be in error but then the user of the class is attempting to use it inappropriately.

Now, keep in mind that this does not resemble boost::function at all. This is, however, the most straight forward method of implementing this behavior. If you read the boost::function website you'll find them explain why they do it differently:

The use of virtual functions tends to cause 'code bloat' on many compilers. When a class contains a virtual function, it is necessary to emit an additional function that classifies the type of the object. It has been our experience that these auxiliary functions increase the size of the executable significantly when many boost::function objects are used.

The next article will show how to make the above implementation take any argument and return any type. A new impl will need to be created to account for member function pointers. The article following that will explain how boost::function implements this functionality without using virtual functions.

Saturday, February 6, 2010

iterating a boost::tuple

The boost::tuple class is a subclass of boost::tuples::cons, which is
a list type structure that can be recursed in a manner similar to LISP
lists. The following code is an example of how to accomplish this.


#include <boost/tuple/tuple.hpp>
#include <iostream>

template < typename X >
void print_data(X const& x)
{
std::cout << " Type of elem: " << typeid(X).name() << std::endl;
std::cout << "Value of elem: " << x << std::endl;
}

template < typename H, typename T >
struct do_recurse
{
static void apply(boost::tuples::cons<H,T> const& c)
{
typedef boost::tuples::cons<H,T> current_cons;
typedef typename current_cons::tail_type tail_cons;
typedef typename tail_cons::head_type tail_head;
typedef typename tail_cons::tail_type tail_tail;

print_data(c.get_head());
do_recurse<tail_head, tail_tail>::apply(c.get_tail());
}
};

template < typename H >
struct do_recurse<H, boost::tuples::null_type>
{
static void apply(boost::tuples::cons<H,boost::tuples::null_type> const& c)
{
std::cout << " Type of last elem: " << typeid(H).name() << std::endl;
std::cout << "Value of last elem: " << c.get_head() << std::endl;
}
};

template < typename H, typename T >
void recurse(boost::tuples::cons<H,T> const& c)
{
do_recurse<H,T>::apply(c);
}


int main(int argc, char* argv[])
{
using namespace boost::tuples;
tuple<int, char, double> t(42,'A',3.14);

recurse(t);

std::cin.get();
return 0;
}

Wrapping the function in a struct is necessary because functions
cannot be partially specialized.