Hacker News

agreed, Itanium investment should have gone to the Alpha.

Itanium was really good at raw performance as long as you wrote hand-tuned math kernels or kept working with the compiler team to optimize code for your kernel. Took me a while, but I got 97% efficiency with single-core DGEMM.
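For context, "efficiency" here means achieved FLOP rate as a fraction of the core's theoretical peak. A minimal sketch of the arithmetic (the 1.5 GHz clock and 4 flops/cycle are illustrative numbers for an Itanium 2-class part, not a claim about the poster's exact machine):

```python
def dgemm_efficiency(n, seconds, clock_hz, flops_per_cycle):
    """Achieved fraction of theoretical peak for an n x n DGEMM."""
    flops = 2.0 * n**3               # n^3 multiply-adds = 2n^3 flops
    achieved = flops / seconds       # flops/s actually delivered
    peak = clock_hz * flops_per_cycle
    return achieved / peak

# e.g. a 2000x2000 DGEMM finishing in ~2.75 s on a hypothetical 1.5 GHz
# part with 2 fused multiply-adds (4 flops) per cycle lands near 97%:
eff = dgemm_efficiency(2000, 2.75, 1.5e9, 4)
```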



Hand-written code for Itanium was always smoking fast. One-clock microkernel message passes and other insanity. But nobody ever figured out how to write a compiler that could generate code like that for that machine.


Most of it depended on the problem: for a subset of problems it worked well, but once you had branchy code and less-than-consistent memory access it was dismal. I supported a computational science group during that period, and Itanium (and Cell) kept being tested but never made sense: you’d be looking at person-years of work hoping to beat the current systems (or even the previous generation), instead of spending that time on improved application functionality.


> for a subset of problems it worked well but once you had branchy code and less than very consistent memory access it was dismal.

So, a lot like coding for the GPU. Makes sense, given that the low-level architecture is so similar... And it might explain why VLIW itself is not so widely used anymore. AIUI, even the Mill proposed architecture (which boils down to VLIW + lots of tricks to cheaply improve performance on typical workloads) has a hardware-dependent, low-level "compilation" step that's quite reminiscent of what a GPU driver has to do.


The GPU comparison is common and I think it hits the main problem: Intel/HP needed to solve two hard problems to succeed. GPU computing had only one because gamers provided a reliable market for the chips in the meantime.

I’m also curious how this could have gone a generation later: Itanium performance was critically dependent on compilers in an era when they were expensive and every vendor made their own, and the open source movement was just taking off. Things could have gone much better if that had been, say, an LLVM backend plus tools and higher-level libraries, where someone could get updates without licensing costs and wouldn’t be in the common ’90s situation of having to choose between the faster compiler and the more correct one.


> nobody ever figured out how to write a compiler that could generate code like that for that machine

Were a lot of people trying? It was a pretty difficult platform to get hold of and tinker with.


I’m not sure how many people, but it was all the compiler group at HP did for the last twenty years.


HP has a compiler group? I wasn't aware there were all that many commercial compilers still around, and I definitely wouldn't have thought of HP.


Who else do you think would have developed the compilers for HP-UX, VMS, and NonStop?


Present tense. The GP comment was in present tense, not a long time ago.


There were people trying, but there are some real fundamental issues with the approach for general purpose computing. It's extremely hard for a compiler to know if some data is in cache, in memory, or way out in swap. Without this information it's very hard to know how long any memory fetch is going to take. If you're trying to run a lot of computation in parallel that has some interdependencies then this information is paramount.
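A toy illustration of that scheduling problem (my own sketch, not Itanium code): with independent loads, a static scheduler can issue them all up front and overlap their latency, whatever it turns out to be; with a dependent chain, each load's address comes from the previous load, so latency is fully exposed unless the compiler knows how long each fetch will take.

```python
def independent_sum(arr, idx):
    # All addresses are known up front: an EPIC compiler can hoist these
    # loads into wide bundles and hide cache/memory latency behind them.
    return sum(arr[i] for i in idx)

def dependent_chain(arr, start, steps):
    # Pointer chasing: arr[i] tells you where to look next. Every load
    # waits on the previous one, so a statically scheduled machine stalls
    # for however long each fetch actually takes.
    i = start
    for _ in range(steps):
        i = arr[i]
    return i
```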

It's kind of like trying to use a GPU for general purpose computation. Itanium should have been a coprocessor.


> Took me a while, but I got 97% efficiency with single core DGEMM.

In my experience, it's pretty widely accepted that VLIW (and EPIC) can achieve high performance and efficiency on highly regular tasks such as GEMM and FFT. That's why VLIW has been and continues to be popular for DSPs. The struggle for VLIW is general purpose code that doesn't necessarily have that same kind of regularity.
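The regularity being described is visible in the textbook triple loop (a sketch, not a tuned kernel): every inner iteration is the same multiply-add with strides fixed at compile time and no branches in the body, which is exactly what a VLIW/EPIC scheduler needs to software-pipeline the loop and fill every issue slot.

```python
def gemm(A, B, C):
    """C += A @ B for square matrices given as row-major lists of lists."""
    n = len(A)
    for i in range(n):
        for j in range(n):
            acc = C[i][j]
            for k in range(n):           # fixed stride, branch-free body
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
    return C
```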



