"Compute bound" refers to proof generation (mining), not to verification. It looks like ethash does the following computations per random memory access:
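For reference, the per-access work boils down to something like this sketch, loosely following the hashimoto pseudocode from the ethash spec. The row width, row count, and `dataset_lookup` signature are simplified here; the real loop does 64 accesses per hash over 128-byte rows.

```python
FNV_PRIME = 0x01000193  # 16777619

def fnv(v1, v2):
    # the entire per-word mixing step: one 32-bit multiply and one XOR
    return ((v1 * FNV_PRIME) ^ v2) & 0xFFFFFFFF

def mix_round(mix, i, seed_word, dataset_lookup, rows):
    # derive a pseudorandom row index from the current mix state...
    p = fnv(i ^ seed_word, mix[i % len(mix)]) % rows
    # ...perform the one random memory access...
    row = dataset_lookup(p)
    # ...and fold the fetched words back into the mix
    return [fnv(m, r) for m, r in zip(mix, row)]
```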
I admit this doesn't look much more involved than Cuckoo Cycle's single siphash-2-4, but perhaps we can measure it. How much faster does this run if you leave out the actual memory access, i.e. replace dataset_lookup(p + j) with (p + j)?
In Cuckoo Cycle's case, avoiding the memory lookups reduces runtime by a factor 3.
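That substitution can be mocked up outside the miner, e.g. as follows. This is a rough micro-benchmark sketch, not the actual miner code: the dataset size, access count, and function names are made up, and a Python loop will understate the gap you'd see in an optimized implementation.

```python
import random
import time

WORDS = 1 << 22                        # ~4M words, a small stand-in for the DAG
dataset = list(range(WORDS))
indices = [random.randrange(WORDS) for _ in range(200_000)]

def with_lookup():
    acc = 0
    for p in indices:
        acc ^= dataset[p]              # the real random memory access
    return acc

def without_lookup():
    acc = 0
    for p in indices:
        acc ^= p                       # dataset_lookup(p) replaced by just p
    return acc

for f in (with_lookup, without_lookup):
    start = time.perf_counter()
    f()
    print(f.__name__, round(time.perf_counter() - start, 4), "s")
```

Since the stand-in dataset is the identity mapping, both variants compute the same value, which conveniently confirms the substitution only removes the memory access, not any of the arithmetic.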
I'd say the biggest gain from taking out the memory accesses is not latency (since latency mostly gets hidden via pipelining), but power consumption.
When developing kernels, the number of times memory is accessed has a huge impact on power efficiency, and less on actual computational throughput (especially on GPUs), unless you hit bandwidth limitations (which Ethash currently does).
EDIT: NVM - you were asking about comparing FNV1 vs Siphash. I haven't analyzed Siphash, but I can say that FNV1 costs exactly 5 clock cycles on a modern GPU (4 for a 32-bit multiply and 1 for an XOR).
I had someone run tests where ethash accessed only 1KB rather than the whole 1GB DAG, and runtime fell by 40%. So that's roughly the fraction of time the single-threaded CPU miner spends on memory latency. That's less than the 67% measured for Cuckoo Cycle, but still rather high. So I should say that ethash is almost memory-bound, and certainly much more so than other memory-oriented hash functions.
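For the record, the two kinds of measurement convert into each other: a speedup factor f from removing the memory accesses means a fraction 1 - 1/f of runtime was spent on them. The helper name below is just for illustration.

```python
def memory_fraction(speedup):
    # fraction of runtime attributable to memory access, given the
    # speedup observed when the accesses are removed
    return 1 - 1 / speedup

# Cuckoo Cycle: removing lookups gives a factor 3 speedup -> ~67% of time
print(memory_fraction(3))
# ethash: the 40% runtime drop IS the fraction directly, i.e. 0.40,
# equivalent to a speedup of 1 / 0.6 ~= 1.67
print(memory_fraction(1 / 0.6))
```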