Optimizations - Advanced C++ Community

Optimizations - Nov. 11th, 2006 @ 09:42 am
I came across this page of C++ optimizations recently (http://www.custard.org/~andrew/optimize.php). Some of them I consider to be fair, while others I consider to either have hidden pitfalls or to be flat-out wrong. Below is a list of my reactions. I would like to hear other people's opinions, especially if I missed a pitfall or misinterpreted what an optimization was for.

Use Initialization Lists - I agree with this one. The author doesn't mention that to initialize a const member, you pretty much have to use the initialization list.

Optimize For Loops - this optimization will create an infinite loop if the variable is unsigned. Also, preferring ++i over i++ matters most when they are applied to a class type. When applied to a built-in type, the two have the same efficiency if the return value is not used.
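A minimal sketch of that pitfall, using a hypothetical helper: with an unsigned index, a condition like "i >= 0" is always true, so the naive count-down form never terminates. Testing "i > 0" instead (and indexing with i - 1 in the body) is the safe spelling.

```cpp
// Hypothetical helper illustrating the unsigned count-down pitfall.
// "i >= 0" would always hold for an unsigned i, so we test "i > 0".
unsigned count_down_visits(unsigned n)
{
    unsigned visits = 0;
    for (unsigned i = n; i > 0; --i)  // stops cleanly when i reaches 0
        ++visits;                     // a real body would use items[i - 1]
    return visits;
}
```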

Use Int - I prefer to use types that specify their bit size. I've seen bugs pop up due to int not being the expected bit size.

Make Local Functions Static - actually, they should be put into an anonymous namespace. The only time I use a static variable is for something like singletons.
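A sketch of the two spellings of internal linkage being contrasted here. Both keep the helper invisible to other translation units; the anonymous namespace is the idiomatic C++ form. All names are hypothetical.

```cpp
// C-style: internal linkage via "static".
static int doubled_static(int x) { return x * 2; }

// C++-style: internal linkage via an anonymous namespace.
namespace {
    int doubled_anon(int x) { return x * 2; }
}

// Both helpers are usable only within this translation unit.
int use_helpers(int x) { return doubled_static(x) + doubled_anon(x); }
```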

Optimize If Statements - I consider this advice to be a true WTF. Can anyone give me both a general case and a specific case example where following the author's advice would give a speed improvement?

Optimize Switch Statements - won't hurt.

Avoid Expensive Operations - this one is obvious to more experienced programmers. However, it also makes me think of the adage about premature optimization.

Initialize on Declaration - always a good idea.

Pass By Reference - fails to point out that the variable passed in may be changed by the function, that this can be prevented by declaring the parameter const, and that built-in types do not need to be passed by reference unless they are being changed.
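A sketch of the guideline as stated: class types go by const reference (no copy, no mutation allowed), built-in types by value. Function names are hypothetical.

```cpp
#include <string>

// Class type: const reference avoids a copy and promises not to modify.
std::string::size_type length_of(const std::string& s)
{
    return s.size();  // read-only access; the caller's string is untouched
}

// Built-in type: passing by value is already cheap, no reference needed.
bool is_even(int n)
{
    return n % 2 == 0;
}
```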

Delay Variable Declarations - I agree with this one completely.

'op=' - This one is true also. Though some (most?) compilers will optimize this for you.

Inline Small Functions - Possible code bloat should be a concern.

Use Nameless Classes - If I am constantly using nameless classes in my code, it would make me wonder if I should start refactoring the code. Otherwise, I do agree that a nameless class is better than a temp variable.
From:0xd34df00d
Date:November 11th, 2006 03:22 pm (UTC)
Use Int - there's a common practice: on one machine, define
typedef int myint32;
typedef short myint16;
typedef char myint8;
Then when we need a 32-bit type we just use myint32. When we port to a machine with different native type sizes, we just change the typedefs to the appropriate native types. This is used in, for example, the Qt library. Sorry for my bad English, hope you understand me :)
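This hand-rolled typedef scheme is essentially what the later-standardized &lt;cstdint&gt; header provides: exact-width integer types whose definitions are already ported for you on each platform. A minimal sketch:

```cpp
#include <cstdint>
#include <climits>

// std::int32_t, std::int16_t, std::int8_t play the role of the
// myint32/myint16/myint8 typedefs, with the porting done by the library.
static_assert(sizeof(std::int32_t) * CHAR_BIT == 32, "exact 32-bit type");
static_assert(sizeof(std::int16_t) * CHAR_BIT == 16, "exact 16-bit type");
static_assert(sizeof(std::int8_t)  * CHAR_BIT == 8,  "exact 8-bit type");

// A small consumer of the exact-width type.
std::int32_t add32(std::int32_t a, std::int32_t b) { return a + b; }
```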

And I use nameless classes very often and don't have any troubles even with the supporting of the code.

And about avoiding expensive operations - you're right, because it's hard to understand and maintain expressions like
long int a = (b << 8 - c) >> 3;
and it's easy to make an error there.

Code bloat seems to be no problem unless you write for embedded devices.

And there is a problem with initialization lists: think about a member class which could throw an exception while initializing (for example, it could not acquire some resources). You have no way to catch it inside the constructor if you initialize it in init lists.
From:bsdcat
Date:November 11th, 2006 03:57 pm (UTC)
And there is a problem with initialization lists: think about a member class which could throw an exception while initializing (for example, it could not acquire some resources). You have no way to catch it inside the constructor if you initialize it in init lists.

And that's a potential problem anyway, because the member class' default constructor is called behind the scenes if you don't include it in the initialization list. Thus we have a different rule about this situation - constructors should be exception safe.
From:iamo
Date:November 11th, 2006 06:05 pm (UTC)
"And there is a problem with initialization lists: think about a member class which could throw an exception while initializing (for example, it could not acquire some resources). You have no way to catch it inside the constructor if you initialize it in init lists."

This is incorrect.

classname::classname() try : m_whatever(1)
{
} catch (...) {
 /* caught here */
}


However, because the state of the enclosing object is uncertain at this point, there is no way to avoid a rethrow (it'll happen either way). But you can log it or translate it.
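A compilable sketch of the constructor function-try-block described above, with hypothetical names. The catch clause can observe or translate the exception, but as noted, it is implicitly rethrown when the handler ends.

```cpp
#include <stdexcept>

// A member whose constructor can fail, e.g. on resource acquisition.
struct Member {
    explicit Member(int v)
    {
        if (v < 0) throw std::runtime_error("could not acquire resource");
    }
};

struct Wrapper {
    Member m;
    explicit Wrapper(int v)
    try : m(v) {
        // constructor body
    } catch (const std::exception&) {
        // log or translate here; the exception is rethrown automatically
    }
};

// Helper to observe whether construction threw.
bool construction_throws(int v)
{
    try { Wrapper w(v); (void)w; return false; }
    catch (...) { return true; }
}
```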
From:spasmaton
Date:November 11th, 2006 06:15 pm (UTC)
> Code bloat seems to be no problem unless you write for embedded devices.

Or a games console. And it's always an issue, because memory access is often a bottleneck even on systems that don't have large caches.

On a similar note, whilst int is your casual default, it's not always suitable or faster. If you're reading across a list of thousands of items it's faster to use smaller types if they give an acceptable range, because of the memory/cache overhead. And size_t should be used as an index, because it's a) guaranteed to cover your machine's limit ('int' is 32 bits on some 64-bit platforms as it's generally more useful) and b) unsigned, so it rightfully blows up horribly when abused.

On a similar similar note, the second point is utterly wrong. Memory and cache architectures have been optimised to read forwards since the days of the 486 and fast-page RAM, reading backwards will still load the start of the cache line first so your backwards-access will be waiting longer every time it needs a new line.

And that second point is where I stopped taking anything on this page seriously. All I can say is that it's not a very good guide to optimising in C++, and I'd generally dissuade people from using it as a reference.
From:the_mart
Date:November 11th, 2006 03:29 pm (UTC)

Isn’t it better to let the compiler decide whether to inline functions? Also, can’t functions only be inlined if they are defined in the same file that they are used in?

To me, the if statement optimisation looks rather stupid. If anything, the intention of the code becomes more difficult to understand.

Always using ints seems a little silly. If you just need an integer number, then fine, but if you’re writing the bytes to a file for another program to read, then your data could end up being read completely wrong.

From:bsdcat
Date:November 11th, 2006 03:48 pm (UTC)
Use Initialization Lists It's also worth pointing out that sometimes the default constructor is used, not just the copy constructor. :-)

Optimize for Loops No, it won't create an infinite loop, and I think preferring the prefix increment operator is a good rule of thumb. Also, making full use of C++ usually means replacing C-style loops with iterators, at which point this is a wash.

Use 'int' For just about anything involving the network, knowing your bitsize is important. I'd say that you have to think about each datatype you use, and if you don't have any other issues to consider, then int may be a good idea.

Make Local Functions Static I have mixed feelings about this one. But I tend to write code that doesn't have local functions, but instead has private instance and class methods, so...

Optimize If Statements That is such a huge hit to clarity that I don't think I would ever apply it.

Optimize Switch Statements Consider something else in place of a switch statement, anywhere this optimization actually improves performance. For instance, a lookup table. Going through a switch is O(n), which is why you should put common cases first. Depending on the properties of the value set being used in the switch, a lookup table should be O(1) or O(log n).
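A sketch of the switch-vs-table contrast: a dense switch compares case by case, while an array lookup is a single indexed load. The values here are hypothetical, chosen only so the two functions agree.

```cpp
// Case-by-case comparison: cost grows with the number of cases tested.
int score_switch(int grade)
{
    switch (grade) {
        case 0:  return 10;
        case 1:  return 25;
        case 2:  return 50;
        default: return 0;
    }
}

// Equivalent lookup table: one bounds check plus one indexed load, O(1).
int score_table(int grade)
{
    static const int table[] = { 10, 25, 50 };
    return (grade >= 0 && grade < 3) ? table[grade] : 0;
}
```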

Avoid Expensive Operations Do this in performance-critical places. Otherwise, ignore it.

Initialize on Declaration Yep.

Pass by Reference Passing built-in types by reference is slower than passing them in by value, because references are simply pointers (at the lowest level) and making use of them requires dereferencing the pointer every time they're accessed. So pass by reference (paying attention to const) for classes, and prefer pass by value for built-in types.

Delay Variable Declarations Yep.

Use 'op=' I don't think compilers will optimize this with nonstandard types, because e.g. operator+ has no requirement to behave like operator+=. If both operators are implemented, then the compiler is bound to use the one you invoke, and not the other one. Therefore, prefer op= because it should be more efficient.
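A sketch of why "a += b" can beat "a = a + b" for a user-defined type: the compound form mutates in place, while operator+ (written here in the usual style, on a hypothetical Vec type) materializes a temporary that is then copied or moved into the destination.

```cpp
#include <vector>

struct Vec {
    std::vector<int> data;

    // In-place append: no temporary object is created.
    Vec& operator+=(const Vec& o)
    {
        data.insert(data.end(), o.data.begin(), o.data.end());
        return *this;
    }
};

// The by-value parameter acts as the temporary that operator+ must build.
Vec operator+(Vec a, const Vec& b)
{
    a += b;
    return a;
}
```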

Inline Small Functions Here's Apple's take on this: http://developer.apple.com/releasenotes/DeveloperTools/GCC3.html (scroll down to the section on Optimization) In short - no.

Use Nameless Classes I agree with you.

In general, think about the implications - particularly in terms of constructors and memory allocation - of your default idioms, and then consider changing your habits. Everywhere, think about the efficiency of the algorithm you're using, and whether it's good enough or you expect O(n^2) to bite you in the ass.
(Deleted comment)
From:notquitezeus
Date:November 11th, 2006 04:25 pm (UTC)
these "optimizations" fall into three classes: writing good c++ (initializer lists, using constructors or at least giving the compiler the opportunity to elide constructors, delaying variable declarations, initializing on declaration), letting the compiler and optimizer do their jobs (inlining, reorganizing switch & if statements, hoisting and optimizing loops), and highly suspicious (everything else).

"use nameless classes" is, well, impossible because you can use explicit temporary class instances which weren't assigned a name, but you don't have nameless classes in c++. the fact that dude is abusing terminology this badly should be a clue as to how much you can trust the rest of what he has to say.

use int: no, do not use int. use the appropriate size int for the task at hand and treat unsigned ints with deep suspicion, since it is exceedingly easy to screw up arithmetic with them.

op=: assumes that op= and op are implemented sanely and do the same thing. in a perfect world, they should be but there's nothing in the standard that requires that.

in short: this is mostly trivial or blowing smoke. there's no discussion of favoring arrays of structures over structures of arrays (improves cache locality), how and why to avoid ragged arrays (bad for caching, requires multiple dereference operations to index a single item — the trick is to have a flat array and do the index calculation yourself), avoiding multiple reallocations where possible, etc.
From:pphaneuf
Date:November 11th, 2006 05:00 pm (UTC)
I mostly agree with this comment, particularly about the "nameless classes" and the conclusion.

Using int "when possible" is okay, though. It is supposed to be the processor's natural word size.

Using "unsigned" is actually very useful to eliminate whole classes of bugs. Instead of having to check at runtime for invalid negative values, you can tell the compiler right off that you don't want negative values and have it caught at compile time. But under/over-runs are a problem you have to watch for at all time.
From:spasmaton
Date:November 11th, 2006 07:25 pm (UTC)
but you don't have nameless classes in c++. the fact that dude is abusing terminology this badly should be a clue as to how much you can trust the rest of what he has to say.

Indeed. For readers at home - this is called an "r-value" (an unnamed temporary), not a "nameless class", which is something completely different in language concepts and, like you say, not featured in C++.

It's also not an optimisation, as the value still has to be generated on the stack but doesn't have a name, which makes it a) messier to read and unclear what your value means and b) harder to trace when the code crashes/asserts/throws as there's no symbol for it.

Agreed with your summary, this article isn't very good at all.
From:pphaneuf
Date:November 11th, 2006 04:42 pm (UTC)
For the "for" loops, doing --i (or ++i) repeatedly and comparing with zero as the test will never do an infinite loop. It might not do the correct thing, but it'll wrap around and hit zero someday.

In any case, testing against a constant (like zero) rather than another variable might be one less load, and I wouldn't be surprised if there was a specific opcode on many platforms to test zeroness or non-zeroness, being such a common operation (used everywhere for boolean logic, and wouldn't need storing the value to compare with in the code, yielding smaller code, which is faster, if only for being easier on the icache). But you'd need to be in deep trouble at the point where you'd be wanting to do this. And the example wasn't so good, because of the test not being the simple "i != 0", but "i >= 0".

The correct way to do this "optimization" is like this (you have to use "i - 1" in the code, though, which might make it worse):

for (i = n; i > 0; --i)


Using prefix rather than postfix ++ or -- is a good habit to have, these days. This way, you just use the prefix one all the time, except when you specifically need the postfix semantics. That much, I consider worthwhile, because some copy constructors are quite expensive, and particularly in templates where you just never know.
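A sketch of that habit applied where it actually matters: incrementing a class type such as an iterator. The postfix form must construct a copy of the iterator to return, the prefix form does not, and when the result is discarded the copy is pure waste.

```cpp
#include <vector>

long sum(const std::vector<int>& v)
{
    long total = 0;
    // Prefix increment: no throwaway iterator copy per step.
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
        total += *it;
    return total;
}
```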

Making local functions static is an excellent idea. Note that he said "functions", not "methods" or "member variables". This prevents external compilation units from using the function or variable, and can allow some more aggressive optimizations. For example, if it inlines every call, the compiler can avoid generating the out-of-line version altogether, making the code smaller. It could keep a static variable in a register when calling a static function. And so on. This is why people make a fuss about "whole-program optimization", because they wish for those optimizations that need to ensure the optimizer "knows all".

The "if" statement optimization is really iffy, though. It'd have to be pretty damned quick to undo. An "else" puts in an extra jump, yes, and jumps disrupt processor pipelines, which is rather expensive on some processors (ever wondered why the Pentium 4 was so slow, clock cycle for clock cycle?).

For the "switch", the order is pretty meaningless, except for the first and last, which both will need one less jump.

All of this is bit twiddling, though.
From:bsdcat
Date:November 11th, 2006 05:12 pm (UTC)
I believe you're wrong regarding "switch": recall that case statements cascade and that the cascade has to be explicitly stopped with a break statement. Thus, the compiler has to iterate through all of the cases, so the more cases there are the more tests there are to perform on average.

As a first pass, it is better to put more common cases at the front of the list if there are a lot of them, because they will require fewer tests before they're executed. However, a lookup table or function dispatch will get you there much quicker.
From:iamo
Date:November 11th, 2006 06:17 pm (UTC)
Switch is usually implemented as a jump table (cmp x, 5; je case_if_5; cmp x, 6; je case_if_6;). The entire code block of the case statements is itself linear, and a break is just a jump to the end of the switch structure, so it only has to go through as many comparisons as it needs to to find the correct case statement, and then it jumps to the right place in the code block, and only jumps out when it hits a break.

So not all tests have to be done. Only those until the first one that matches.
From:iamo
Date:November 11th, 2006 06:18 pm (UTC)
Mind you, a switch can also be implemented by a mini-hash or tree structure, so it can be very fast (it only needs to do O(1) or O(log n) tests). I would expect some implementations do this for very large ones.
From:iamo
Date:November 11th, 2006 06:21 pm (UTC)
Agreed that it's bit twiddling, but the switch one is more or less true for a jump table switch (which is pretty normal). You'll have fewer comparisons if the case that's caught is near the top, and the first incurs just as much as the last.

But it doesn't keep you from the real performance killer these days, branching. I imagine a switch structure completely thrashes the code cache in modern CPUs because of there being so many jumps in a row. And it doesn't matter which one you actually use at that point.
From:tyrdinjer
Date:November 12th, 2006 02:34 am (UTC)
The correct way to do this "optimization" is like this (you have to use "i - 1" in the code, though, which might make it worse):

for (i = n; i > 0; --i)


The original article didn't mention this fix. Thus, the way it was written would create an infinite loop. Also, I think using "i - 1" would cause a lot of people looking over the code to stop and wonder if that was a bug, or an attempt to fix an off by one error.

I believe the purpose of this optimization comes from the fact that, by comparing against zero, the CPU does not need to load/store a target number for the index variable to be compared against. I.e., let's save a couple of assembly instructions. If the author is worried about those minuscule instructions, then I'm betting he would be horrified to see "i - 1" in the code. ;-)

I think the only way to write it safely, and avoid "i - 1" would be like this:

i = item_count();
if (i > 0)
{
    // handle all cases but zero
    for (--i; i > 0; --i)
        // do stuff
    // do stuff at [0]
}


At that point, simply counting up is a lot easier to read / debug. And the above code just looks like something that needs to be fixed.
From:zombywuf
Date:November 11th, 2006 06:33 pm (UTC)
On using int, if you're zipping through a large array use the smallest type possible, cache loads will eat your cycles otherwise. If you're accessing the same bit of memory several times in rapid succession, using ints (which will hopefully be aligned) should see a gain.

On inlining functions, leave it to the compiler. Give it as much chance to inline as possible, but don't force it. Use profiling mode if your compiler supports it to let it make better choices.

The if statement thing is probably best left to the CPU's branch predictor and your compiler.

The for loop optimisation will squeeze a few cycles out of your loop if you're not accessing memory sequentially. Otherwise precaching (if it's being done) will be wasted.
From:spasmaton
Date:November 11th, 2006 07:29 pm (UTC)
I fully endorse this product and/or event.
From:notquitezeus
Date:November 11th, 2006 08:11 pm (UTC)
On using int, if you're zipping through a large array use the smallest type possible, cache loads will eat your cycles otherwise. If you're accessing the same bit of memory several times in rapid succession, using ints (which will hopefully be aligned) should see a gain.

false. read patterson & hennessy and you'll understand that there is absolutely no reason for your loop indices to live outside registers or the stack in almost all cases. on those rare occasions that your loops are heavy enough to force loop variables to actually be allocated on the stack, you need to remember that the top of the stack stays in cache anyways, so you would have to have some pretty hefty calls to push things out from the cache and back into main ram. secondly, since you keep pages of ram in cache, the size of the loop indices is basically meaningless unless you're split over multiple pages. in fact, you can make the argument that doing arithmetic on things that are smaller than word sized is less efficient than using their full sized counterparts because you first have to unpack the relevant byte before you can use it, do your arithmetic, then repack the word, even if it's the assembler that ends up hiding all that detail from you with macros.
From:fastb
Date:November 11th, 2006 07:49 pm (UTC)
I was only somewhat skeptical about this article until I got to the bit on "optimizing" if statements. I can't believe how misguided that tip was. What the author completely ignored was the scenario where the bar() function (which you're trying to undo) has some sort of non-local side effect, for example, modifying a file on disk. How exactly do you undo that?
From:spasmaton
Date:November 12th, 2006 03:20 am (UTC)
'sup

This page is rubbish, it lost me by point 2 but I've reread the if statements 'optimisation' a few times and I still don't get where it's coming from. I endorse what the OP said, really.
From:omnifarious
Date:November 11th, 2006 08:08 pm (UTC)

My thoughts on some of these.
  • Optimize For Loops - Using ++i preferentially is a good habit to get into in general. It's one of those things where it rarely matters if you use i++ or ++i from a semantic standpoint, and when it matters for performance it can be a big deal and hard to spot. So if you just habitually use one unless you really need the other, you don't have to worry that you did the wrong thing someplace and created a performance problem by accident.

    As for counting down... I would like to see some analysis on several processors that shows that actually makes a difference.

  • Use 'int' - I use bit-size specific types when I care, and int when I don't. Though I do generally expect 'int' to be at least 32 bits... I think <cstdint> may solve this problem by providing a 'most efficient integer of at least this size' type.
  • Make Local Functions Static - I've noticed that compilers do a poorer job of optimizing anonymous namespaces. This is pretty stupid IMHO, and I think they should be preferred regardless and compiler makers should make them as good as statics.
  • Optimize If Statements - This piece of advice is really stupid. Most of the time the compiler can do this for you, and doing it makes your code a lot more obscure.
  • Optimize Switch Statements - If you use profile driven optimization, this is of dubious value. The compiler can do this for you, and so it's better to pay more attention to what the clearest way is, not the fastest.
  • 'op=' - Compilers will only optimize this for non-user-defined types. So it's a good habit to get into so you don't have to start remembering whether a type is user-defined or not. Same basic principle as using ++i preferentially.
  • Use Nameless Classes - This is misnamed, and should be called "Use nameless objects" which are also often known as 'temporaries'. Most C++ code already uses temporaries that people are completely unaware of.
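A sketch of that last point: each intermediate result of operator+ below is a nameless temporary object that the source never names. The function itself is hypothetical.

```cpp
#include <string>

std::string full_name(const std::string& first, const std::string& last)
{
    // The result of (first + " ") is a temporary; it feeds the next
    // operator+, whose result is another temporary that gets returned.
    return first + " " + last;
}
```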

My overall impression is that the linked-to piece has a prescriptive recipe quality to it that's rather naive. It discusses one aspect of the proposed coding style and doesn't talk about pros or cons, nor does it show a deep understanding of the language.

From:notquitezeus
Date:November 11th, 2006 08:16 pm (UTC)

Re: My thoughts on some of these.
again, you're working on the assumption that the ++a and a++ operators do the same thing, and there is no requirement or guarantee that that will ever happen.

this entire discussion over a++ v. ++a is even more ridiculous since the optimizer does dead code elimination, meaning that return values that are never used won't be computed in the first place.
From:legolas
Date:November 12th, 2006 09:14 pm (UTC)
Make Local Functions Static - actually, they should be put into an anonymous namespace. The only time I use a static variable is for something like singletons.

What's the difference between the two? AFAIK, both have the same effect: the function will only be usable in that compilation unit.
From:spasmaton
Date:November 13th, 2006 12:01 am (UTC)
Couple of minor differences:

* You can have class & type definitions in the anonymous namespace that are guaranteed never to be linked externally.
* Symbols in the anonymous namespace are guaranteed to take precedence over the same ones that happen to be defined elsewhere. Pretty unusual for this to be an issue, though.

In spirit they're the same thing, the anonymous namespace is the 'modern' way of doing it.
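A sketch of that first difference: a type defined in an anonymous namespace is local to its translation unit, which is something "static" cannot express for types. The type and function names are hypothetical.

```cpp
namespace {
    // A same-named struct in another translation unit cannot collide
    // with this one; "static" has no equivalent for type definitions.
    struct LocalOnly {
        int value;
    };
}

int read_local_value()
{
    LocalOnly x = { 42 };
    return x.value;
}
```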