Wednesday, December 15, 2010

"Inline" functions and you

There seems to be much confusion about inline functions in C++ and what, if anything, they offer. Many people believe that using the 'inline' keyword, or defining member functions inside class definitions, actually causes the compiler to expand those functions in place like macros, so that the overhead of a function call (the call and return, and possibly more) can be avoided. Thus, in the name of efficiency, people will define vast numbers of functions inside headers and invent half-assed coding policies in order to gain the benefit of this optimization. Are they actually gaining anything, or are they just causing themselves problems?

The first thing to realize about "inline" functions is that they don't have to be inlined. The C++ standard clearly states that a compiler may completely ignore the request to inline a function. When a compiler does this, its only recourse is to emit an out-of-line definition of that function in each translation unit (compiled source file) that uses it. When this happens, not only has no speed been gained through inlining, but the object code has grown, because there is now more than one chunk of code that does exactly the same thing.
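To make that concrete, here is a minimal sketch of the kind of header code being discussed. The names (geometry.h, square, Point, apply) are made up for illustration and come from no real library. What 'inline' actually guarantees is only that the same definition may appear in every translation unit that includes the header; whether any given call is expanded in place is left to the compiler.

```cpp
// geometry.h (hypothetical header, for illustration only)
#include <cassert>

// 'inline' permits this definition to be repeated in every translation unit
// that includes the header. It is a request, not a command: the compiler may
// still emit an ordinary out-of-line function and call it.
inline int square(int x) { return x * x; }

struct Point {
    int x;
    int y;

    // A member function defined inside the class is implicitly inline,
    // with exactly the same "may or may not be expanded" status.
    int dot(const Point& other) const { return x * other.x + y * other.y; }
};

// Calling through a function pointer typically requires a real, addressable
// copy of the function to exist, whatever the 'inline' keyword says.
inline int apply(int (*f)(int), int value) {
    assert(f != nullptr);
    return f(value);
}
```

Something like apply(&square, 3) will usually force an out-of-line copy of square to be emitted in that translation unit, which is exactly the duplication described above.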

Of course, people who advocate use of inline functions for performance gain will point out that the linker will then come along and remove duplicates of the function if it was not actually inlined. Is this true? Maybe. The linker is of course free to do so if it is able to. The standard allows implementations to remove any chunk of code whose removal does not change the observable behavior of the program. Therefore unused functions and duplicates can be removed. An implementation certainly does not HAVE to do this, though.

There is also the point that program size is usually not as important as program speed. Well, this is often true and it is also sometimes false. Size vs. performance requires a cost/benefit analysis that cannot possibly dictate policy beyond any given project. An effort to produce a program for an embedded architecture, for example, will often have exactly the opposite requirement, and optimization will be more about making the program smaller and use less memory than about making it run faster.

So at this point we can safely say that putting your functions in headers as inline members or globals can increase performance (at the cost of program size, because inlined calls make the functions that contain them larger), or it may very well use up more space and provide no performance benefit at all, depending on the implementation.

What about the other end, though? It is an apparently widely held belief that not using inline functions makes it impossible for an implementation to perform an inline optimization. Even supposed experts make this very claim. This of course leads those worried about making their programs as fast as possible to spend an inordinate amount of time trying to make everything "inline" in order to gain a boost that may or may not ever occur. Worse, they spend all this effort long before ever actually testing the program to see whether it needs a performance boost at all, or whether the functions they are doing this with are bottlenecks that would benefit from inlining (something compilers are fairly good at deciding themselves these days).

But is it true that for a function to get inlined it must be in the header so that it is available to the compiler at the point where it is called? In other words, does a function definition have to be visible at the place it is called in order for an implementation to perform an inline optimization upon it? The answer to that question is nothing short of an astounding and certain maybe. You see, just as the linker can remove the duplicates left behind by "inline" functions that were not inlined, today's linkers can also perform function inlining! That's right, an implementation is as free to inline functions at any point in the process as it is to remove duplicates or unused code.

What does your compiler do? Well, you'll just have to read the documentation to find out. The MS VC++ suite calls these linker optimizations "Whole Program Optimization" and you have to turn it on. In the GNU world it is called link-time optimization (LTO) and is enabled by the -flto switch.
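As a rough sketch of the situation where this matters, imagine the code below split across the files named in the comments. Everything here (the file names, add_one, sum_incremented) is hypothetical. It is shown as a single listing only so that it compiles as-is. The point is that the caller sees nothing but a declaration of add_one, so any inlining of it would have to happen at link time.

```cpp
#include <cassert>

// --- math_utils.h (hypothetical) ---
// Declaration only: callers never see the body.
int add_one(int x);

// --- math_utils.cpp (hypothetical) ---
// The definition lives in its own translation unit and is not marked inline.
int add_one(int x) { return x + 1; }

// --- caller.cpp (hypothetical) ---
// Without link-time optimization the compiler has to emit a real call here.
// With it enabled, the implementation is free to inline add_one into this
// loop even though the body was never in a header.
int sum_incremented(const int* values, int count) {
    assert(values != nullptr || count == 0);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += add_one(values[i]);
    }
    return total;
}

// Possible build lines, assuming the three-file layout above:
//   g++ -O2 -flto math_utils.cpp caller.cpp
//   cl /O2 /GL math_utils.cpp caller.cpp /link /LTCG
```

Whether the call really gets inlined is still up to the toolchain. The only way to know is to look at the generated code or to measure.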

When all is said and done, it is absolutely silly to dictate use of inline functions as policy. You're not necessarily gaining anything at all and can actually be causing bloat without purpose. What you should do, and it's really sad that so few people seem to understand this, is write your program in whatever way makes it the most clear, maintainable, and understandable. THEN, if you notice that you're not getting the performance you want or need, you profile your program to find out WHERE the bottleneck is and fix it. The fix may or may not involve function inlining.
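A real profiler is the right tool for finding the bottleneck, but even a crude timing harness beats guessing. The sketch below is one way to time a candidate function using only the standard library. The helper elapsed_ms and the workload sum_of_squares are invented for this example, and the numbers it produces are only meaningful in an optimized build on your own hardware.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `work` `repeats` times and returns the total
// wall-clock time in milliseconds, measured with a monotonic clock.
template <typename F>
double elapsed_ms(F&& work, int repeats) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical workload standing in for whatever function is under suspicion.
long long sum_of_squares(int n) {
    long long total = 0;
    for (int i = 1; i <= n; ++i) {
        total += static_cast<long long>(i) * i;
    }
    return total;
}
```

You would call it with something like elapsed_ms([&] { sink = sink + sum_of_squares(1000); }, 100000), where sink is a volatile variable so the optimizer cannot throw the work away. Compare the figure with and without the change you are considering before you decide it was worth making.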

Too many people start with, "I want my program to be fast and so I must [INSERT ASSUMPTION HERE] in order to make that happen." They are, as in the case of using inline functions, almost certainly in error.
