Nov. 11th, 2006 @ 09:42 am
I came across this page of C++ optimizations recently (http://www.custard.org/~andrew/optimize.php
). Some of them I consider to be fair, while others I consider to either have hidden pitfalls, or possibly even flat out wrong. Below is a list of my reactions. I would like to hear other people's opinions. Especially if I missed a pitfall or misinterpreted what the optimization was for.Use Initialization Lists
- I agree with this one. The author doesn't mention that to initialize a const object, you pretty much have to use initialization lists.Optimize For Loops
- this optimization will create an infinite loop if the variable is unsigned. Also, preferring ++i over i++ is most important if they are being applied to a class. When applied to a built in, they have the same efficiency if the return value is not used.Use Int
- I prefer to use types that specify their bit size. I've seen bugs pop up where due to the fact that int was not the expected bit size.Make Local Functions Static
- actually, they should be put into an anonymous namespace. The only time I use a static variable is for something like singletons.Optimize If Statements
- I consider this advice to be a true WTF. Can anyone give me both a general case and and a specific case example where following the authors advice would give a speed improvement?Optimize Switch Statements
- won't hurt.Avoid Expensive Operations
- this one is obvious to more experience programmers. However, it also makes me think about the adage of premature optimizations.Initialize on Declaration
- always a good idea,Pass By Reference
- fails to point out that the variable passed in may be changed by the function. And that this can be prevented by making the variable const. Or that built in types do not need to be passed in by reference unless they are being changed.Delay Variable Declarations
- I agree with this one completely. 'op='
- This one is true also. Though some (most?) compilers will optimize this for you.Inline Small Functions
- Possible code bloat should be a concern.Use Nameless Classes
- If I am constantly using nameless classes in my code, it would make me wonder if I should start refactoring the code. Otherwise, I do agree that a nameless class is better than a temp variable.
Use Int - there's a practice to do, for example, on a one machine:
typedef int myint32;
typedef short myint16;
typedef char myint8;
Then when we need a 32-bit type we just use myint32. When we port to the machine with different native type bit sizes, we use the other native types for typedefing or own types. This is used in, for example, Qt library. Sorry for my bad English, hope you understand me :)
And I use nameless classes very often and don't have any troubles even with the supporting of the code.
And about avoiding expensive operations - you're right, cause it's hard to understand and maintain expressins like
long int a = (b << 8 - c) >> 3;
and it's easy to make an error there.
Code bloat seems to be no problem unless you write for embedded devices.
And there is a problem with initialization lists: think about a member class which could throw an exception while initializing (for example, it could not acquire some recources). You have no way to catch it inside the constructor if you initialize it in init lists.
|Date:||November 11th, 2006 03:57 pm (UTC)|| |
|(Link)|And there is a problem with initialization lists: think about a member class which could throw an exception while initializing (for example, it could not acquire some recources). You have no way to catch it inside the constructor if you initialize it in init lists.
And that's a potential problem anyway
, because the member class' default constructor is called behind the scenes if you don't include it in the initialization list. Thus we have a different rule about this situation - constructors should be exception safe.
Isn’t it better to let the compiler decide whether to inline functions? Also, can’t functions only be inlined if they are defined in the same file that they are used in?
To me, the if statement optimisation looks rather stupid. If anything, the intention of the code becomes more difficult to understand.
Always using ints seems a little silly. If you just need an integer number, then fine, but if you’re writing the bytes to a file for another program to read, then your data could end up being read completely wrong.
|Date:||November 11th, 2006 03:48 pm (UTC)|| |
|(Link)|Use Initialization Lists
It's also worth pointing out that sometimes the default constructor is used, not just the copy constructor. :-)Optimize for Loops
No, it won't create an infinite loop, and I think preferring the prefix increment operator is a good rule of thumb. Also, making full use of C++ usually means replacing C-style loops with iterators, at which point this is a wash.Use 'int'
For just about anything involving the network, knowing your bitsize is important. I'd say that you have to think about each datatype you use, and if you don't have any other issues to consider, then
int may be a good idea.Make Local Functions Static
I have mixed feelings about this one. But I tend to write code that doesn't have local functions, but instead has private instance and class methods, so...Optimize If Statements
That is such a huge hit to clarity that I don't think I would ever apply it.Optimize Switch Statements
Consider something else in place of a switch statement, anywhere this optimization actally improves performance. For instance, a lookup table. Going through a switch is O(n), which is why you should put common cases first. Depending on the properties of the value set being used in the switch, a lookup table should be O(1) or O(log n).Avoid Expensive Operations
Do this in performance-critical places. Otherwise, ignore it.Initialize on Declaration
Yep.Pass by Reference
Passing built-in types by reference is slower
than passing them in by value, because references are simply pointers (at the lowest level) and making use of them requires dereferencing the pointer every time they're accessed. So pass by reference (paying attention to const) for classes, and prefer pass by value for built-in types.Delay Variable Declarations
I don't think compilers will optimize this with nonstandard types, because e.g. operator+ has no requirement to behave like operator+=. If both operators are implemented, then the compiler is bound to use the one you invoke, and not the other one. Therefore, prefer op= because it should be more efficient.Inline Small Functions
Here's Apple's take on this: http://developer.apple.com/releasenotes/DeveloperTools/GCC3.html
(scroll down to the section on Optimization) In short - no.Use Nameless Classes
I agree with you.
In general, think about the implications - particularly in terms of constructors and memory allocation - of your default idioms, and then consider changing your habits. Everywhere, think about the efficiency of the algorithm you're using, and whether it's good enough or you expect O(n2
) to bite you in the ass.
these "optimizations" fall into three classes: writing good c++ (initializer lists, using constructors or at least giving the compiler the opportunity to elide constructors, delaying variable declarations, initializing on declaration), letting the compiler and optimizer do their jobs (inlining, reorganizing switch & if statements, hoisting and optimizing loops), and highly suspicious (everything else).
"use nameless classes" is, well, impossible because you can use explicit temporary class instances which weren't assigned a name, but you don't have nameless classes in c++. the fact that dude is abusing terminology this badly should be a clue as to how much you can trust the rest of what he has to say.
use int: no, do not use int. use the appropriate size int for the task at hand and treat unsigned ints with deep suspicion, since it is exceedingly easy to screw up arithmetic with them.
op=: assumes that op= and op are implemented sanely and do the same thing. in a perfect world, they should be but there's nothing in the standard that requires that.
in short: this is mostly trivial or blowing smoke. there's no discussion of favoring arrays of structures over structures of arrays (improves cache locality), how and why to avoid ragged arrays (bad for caching, requires multiple dereference operations to index a single item — the trick is to have a flat array and do the index calculation yourself), avoiding multiple reallocations where possible, etc.
I mostly agree with this comment, particularly about the "nameless classes" and the conclusion.
Using int "when possible" is okay, though. It is
supposed to be the processor's natural word size.
Using "unsigned" is actually very useful to eliminate whole classes of bugs. Instead of having to check at runtime for invalid negative values, you can tell the compiler right off that you don't want negative values and have it caught at compile time. But under/over-runs are a problem you have to watch for at all time.
For the "for" loops, doing --i (or ++i) repeatedly and comparing with zero as the test will never
do an infinite loop. It might not do the correct thing, but it'll wrap around and hit zero someday.
In any case, testing against a constant (like zero) rather than another variable might be one less load, and I wouldn't be surprised if there was a specific opcode on many platforms to test zeroness or non-zeroness, being such a common operation (used everywhere for boolean logic, and wouldn't need storing the value to compare with in the code, yielding smaller code, which is faster, if only for being easier on the icache). But you'd need to be in deep trouble at the point where you'd be wanting to do this. And the example wasn't so good, because of the test not being the simple "i != 0", but "i >= 0".
The correct way to do this "optimization" is like this (you have to use "i - 1" in the code, though, which might make it worse):
for (i = n; i > 0; --i)
Using prefix rather than postfix ++ or -- is a good habit to have, these days. This way, you just use the prefix one all the time, except when you specifically need the postfix semantics. That much, I consider worthwhile, because some copy constructors are quite expensive, and particularly in templates where you just never know.
Making local function static is an excellent
idea. Note that he said "functions", not "methods" or "member variable". This prevents external compilation units from using the function or variable, and can allows some more agressive optimizations. For example, if it inlines every call, it can avoid generating the out-of-line version altogether, making the code smaller. It could keep a static variable in a register when calling a static function. And so on. This is why people make a fuss about "whole-program optimization", because they wish for those optimizations that need to ensure the optimizer "knows all".
The "if" statement optimization is really
iffy, though. It'd have to be pretty damned quick to undo. An "else" puts an extra jump, yes, and jumps disrupt processors pipelines, which is rather expensive on some processors (ever wondered why Pentium 4 was so slow, clock cycle for clock cycle?).
For the "switch", the order is pretty meaningless, except for the first and last, which both will need one less jump.
All of this is bit twiddling, though
|Date:||November 11th, 2006 05:12 pm (UTC)|| |
I believe you're wrong regarding "switch": recall that case statements cascade and that the cascade has to be explicitly stopped with a break statement. Thus, the compiler has to iterate through all of the cases, so the more cases there are the more tests there are to perform on average.
As a first pass, it is
better to put more common cases at the front of the list if there are a lot of them, because they will require fewer tests before they're executed. However, a look up table or function dispatch will get you there must quicker.
On using int, if you're zipping through a large array use the smallest type possible, cache loads will eat your cycles otherwise. If you're accessing the same bit of memory sevel times in rapid succession using ints (which will hopefully be aligned) should see a gain.
On inlining functions, leave it to the compiler. Give it as much chance to inline as possible, but don't force it. Use profiling mode if your compiler supports it to let it make better choices.
The if statement thing is probably best left to the CPU's branch predictor and your compiler.
The for loop optimisation will squeeze a few cycles out of your loop if you're not accessing memory sequentially. Otherwise precaching (if it's being done) will be wasted.
I fully endorse this product and/or event.
|Date:||November 11th, 2006 07:49 pm (UTC)|| |
I was only somewhat skeptical about this article until I got to the bit on "optimizing" if statements. I can't believe how misguided that tip was. What the author completely ignored was the scenario where the bar() function (which you're trying to undo) has some sort of non-local side effect, for example, modifying a file on disk. How exactly do you undo that?
This page is rubbish, it lost me by point 2 but I've reread the if statements 'optimisation' a few times and I still don't get where it's coming from. I endorse what the OP said, really.
|Date:||November 11th, 2006 08:08 pm (UTC)|| |
My thoughts on some of these.
My overall impression is that the linked-to piece has a perscriptive recipe quality to it that's rather nieve. It discusses one aspect of the proposed coding style and doesn't talk about pros or cons, nor does it show a deep understanding of the language.
Re: My thoughts on some of these.
again, you're working on the assumption that the ++a and a++ operators do the same thing and there is no requirement or guarantee that that will ever happen
this entire discussion over a++ v. ++a is even more ridiculous since the optimizer does dead code elimination, meaning that returns that are never used won't be done in the first place.
|Date:||November 12th, 2006 09:14 pm (UTC)|| |
|(Link)|Make Local Functions Static - actually, they should be put into an anonymous namespace. The only time I use a static variable is for something like singletons.
What's the difference between the 2? AFAIK, both have the same effect: the function will only be usable in that compilation unit?
Couple of minor differences:
* You can have class & type definitions in the anonymous namespace, that are guaranteed never to be linked.
* Symbols in the anonymous namespace are guaranteed to take precidence over the same ones that happen to be defined elsewhere. Pretty unusual for this to be an issue though.
In spirit they're the same thing, the anonymous namespace is the 'modern' way of doing it.