I'm glad to see people talking about one of the four stages of optimization:
1. Do we have to optimize this? Time is never free, and opportunity cost in terms of engineering effort is usually very significant.
2. Can we do less work? (this article)
3. What's the bottleneck? CPU, FP, memory bandwidth, lock contention?
4. How do we squeeze out better performance? Assembly, loop unrolling, etc.
I usually cringe when I hear about people talking about #4, as very few of them have asked #1, #2, or #3 yet.
And usually you just hear people proposing to do what the compiler already does (rewrite it in assembly? /the compiler already emits assembly/, and does a pretty reasonable job of it at high optimization levels). They just trade maintainability for the warm fuzzy feeling of having been as macho as needed.
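Stage 2 ("do we have to do this work at all?") is usually where the big wins hide. A minimal sketch of the idea in Python -- caching a pure function so repeated work is simply skipped (a hypothetical illustration, not from the article):

```python
from functools import lru_cache

# "Do less work": memoize a deterministic, expensive function so
# each distinct input is computed only once. Naive recursive
# Fibonacci is a stand-in for any costly pure computation.
@lru_cache(maxsize=None)
def expensive(n: int) -> int:
    if n < 2:
        return n
    return expensive(n - 1) + expensive(n - 2)

# Exponential time without the cache; linear with it.
print(expensive(30))
```

No assembly, no loop unrolling -- the speedup comes from not doing the work, which is why stage 2 belongs before stage 4.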
Use the 80-20 rule. 80% of your code's time is spent in 20% of the code.
Optimizing without good profiling is like painting a portrait blindfolded.
Pre-optimizing by coding your system to be "fast" before you get to profile it is like starting to paint a portrait before your model is even sitting.
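A quick sketch of what "get the model sitting first" looks like in practice, using Python's standard-library profiler (the `workload` function here is just a stand-in for your real program):

```python
import cProfile
import io
import pstats

def workload():
    # Stand-in for the real program you want to profile.
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Show where the time actually went. Typically a handful of hot
# functions dominate -- the 80/20 rule in action.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Only after reading output like this do you know which 20% of the code is worth touching.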
When Kent Beck came in to optimize the Chrysler C3 payroll system, he discovered that one of the biggest time-sinks in the system was string concatenation!
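Repeated string concatenation is a classic "accidentally quadratic" sink: each append can copy everything accumulated so far. The C3 system was Smalltalk, but the shape of the problem is the same everywhere; a hedged Python illustration:

```python
def build_slow(parts):
    # Each += may copy the whole string built so far,
    # giving O(n^2) total copying across the loop.
    out = ""
    for p in parts:
        out += p
    return out

def build_fast(parts):
    # join measures once, allocates once, and copies each
    # piece exactly once: O(n) total.
    return "".join(parts)

parts = ["x"] * 10_000
assert build_slow(parts) == build_fast(parts)
```

The point of the anecdote: nobody would have guessed this from reading the code -- it took a profiler to find it.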
It's also useless to optimize before you have a system that runs correctly. Kent also asked how he could validate the C3 system so he could optimize it. He was told it wasn't even producing correct results yet. He replied: "In that case I can make it run really fast!"
There is also another step - define what you want. Targets are important.
Preferably, define targets in real (really real) world terms. This is difficult, but "a user should reasonably be able to complete this macro-process in 5 minutes" is the kind of target you want.
I've seen so many specs that demand "millisecond response time" for the application. Or something ludicrous. (Requirements for business continuity and downtime/etc are usually similarly crazy/unqualified).
When one consultancy I know got a similar requirement forced down their throat contractually -- they simply restructured the application so that a process that had been a few pages became dozens. Each page now responded much faster! The fact is that the user wouldn't even have blinked at a several-second delay when submitting (it was a big operation, so you'd expect a delay). The whole process ended up longer and more cumbersome -- they'd optimised the wrong thing entirely.
It's an extreme example, I know -- but the upshot is that the target was wrong. And it's not just the people writing the specs. In a lot of places it's still rare for developers to sit down with actual users. That always seemed crazy to me (half the time it's because companies worry that developers will go "hey yeah sure, I can implement that in an hour!" -- which does admittedly sometimes happen).
So you're absolutely right! :) Profile, Profile, Profile -- but don't forget to profile your users (imho do this first).
What are the facts? Again and again and again -- what are the facts? Shun wishful thinking, ignore divine revelation, forget what "the stars foretell," avoid opinion, care not what the neighbors think, never mind the unguessable "verdict of history" -- what are the facts, and to how many decimal places? You pilot always into an unknown future; facts are your single clue. Get the facts!
I think all of your points boil down to the same mantra as the article:
1 is the human variant ("do we even need to spend this effort?").
3 is finding which thing to "do less" of (benchmarking/profiling seem to be the ways of doing this task).
4 is the same idea from the low-level CPU perspective.
As I see it, all "four stages of optimization" are the application of the same rule just from different perspectives.