What is hand-coded assembly language used for these days?
To put that another way, in the current marketplace, what kinds of programs are so worth optimizing that it's economically sensible to have a human spend several days hand-tuning assembly to squeeze out every CPU cycle?
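In addition, look at how popular netbooks are becoming. The Intel Atom is an in-order CPU, so it can't reorder instructions around a stall the way an out-of-order desktop chip does, and careful hand scheduling pays off. Imagine a hyperthreaded, 1.6 GHz 486...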
On the iPhone it's even worse. It's got a decent vector unit, but the CPU itself is very slow. You'll see great wins by hand-coding your 3D math for the vector unit yourself.
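To make "3D math" concrete, here is a rough sketch in plain C of the kind of inner loop that tends to get this treatment: transforming a batch of 4-component points by a 4x4 matrix. The function name `transform_points`, the column-major layout, and the packed x/y/z/w point format are all assumptions made for illustration; none of them come from any particular SDK. It's only the portable baseline you'd measure a hand-written vector-unit version against, not that version itself.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (illustration only): transform n points by a 4x4 matrix.
 *
 * Assumed layout:
 *   m   - 16 floats, column-major, so element (row r, col c) is m[4*c + r]
 *   in  - n points, each stored as 4 consecutive floats (x, y, z, w)
 *   out - room for n transformed points, same format; must not alias in
 */
void transform_points(const float m[16], const float *restrict in,
                      float *restrict out, size_t n)
{
    assert(in != out); /* restrict promises the buffers do not overlap */

    for (size_t i = 0; i < n; ++i) {
        const float x = in[4 * i + 0];
        const float y = in[4 * i + 1];
        const float z = in[4 * i + 2];
        const float w = in[4 * i + 3];

        /* Each output row is four multiply-adds: one per matrix column. */
        for (int r = 0; r < 4; ++r) {
            out[4 * i + r] = m[r] * x + m[4 + r] * y
                           + m[8 + r] * z + m[12 + r] * w;
        }
    }
}
```

With column-major storage, each matrix column sits in four adjacent floats, so the inner loop is "scale a column by one component and accumulate", which is the shape a 4-wide vector unit wants. Whether the compiler gets there on its own or you have to write it by hand is exactly the thing to measure.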
As we continue to move to multicore, I could imagine somebody shaving a couple of cycles out of the core message-passing routines, though you're almost certainly bus-bound in those situations...
Computers are getting smaller and people want more out of them; assembly language is back in style!