Hacker News

The volatile keyword certainly has implications for cache coherency, but it cannot bypass the TLB or somehow magically avoid the need to involve the memory controller. Unless I'm grossly misunderstanding something, the majority of the points on the second slide should also be on the first.


Yep. The slide is completely wrong. It is showing low-level architecture details that would be 100% identical between the two cases. Volatile changes nothing on that list.

Volatile just makes sure the compiler actually emits every access. Otherwise, a pair of writes to the same memory location could be optimized by eliminating the first write; volatile forbids the compiler from doing that. Of course, the CPU itself may still reorder or coalesce accesses, so volatile alone is not good enough for IO.
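A minimal sketch of the dead-store elimination being described (function names are illustrative, not from the slides):

```c
#include <stdint.h>

/* Without volatile, the compiler may treat the first store as dead
   and keep only the last one. */
void plain_writes(uint32_t *p) {
    *p = 1;   /* may be optimized away entirely */
    *p = 2;
}

/* With volatile, both stores must be emitted, in program order. */
void volatile_writes(volatile uint32_t *p) {
    *p = 1;   /* must be performed */
    *p = 2;
}
```

Both functions leave the same final value behind; the difference is only visible in the generated code (and on a device register, where the intermediate write matters).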


> It is showing low-level architecture details that would be 100% identical between the two cases.

To be as charitable as I can possibly be, the only part that could theoretically make sense is that the compiler could emit non-temporal store instructions to bypass the cache. I know compilers currently don't do that for volatile, but I don't know why.
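For reference, this is roughly what a hand-emitted non-temporal store looks like on x86 (a sketch using SSE2 intrinsics; note the streaming store is only a hint to bypass the cache, and the destination must be 16-byte aligned):

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Store 16 bytes with a non-temporal (cache-bypassing) hint.
   dst must be 16-byte aligned. */
void nt_store128(int32_t *dst, int32_t a, int32_t b, int32_t c, int32_t d) {
    __m128i v = _mm_set_epi32(d, c, b, a);  /* dst[0]=a ... dst[3]=d */
    _mm_stream_si128((__m128i *)dst, v);
    _mm_sfence();  /* order the streaming store with later stores */
}
```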


> the only part that could theoretically make sense is that the compiler could emit non-temporal store instructions to bypass the cache. I know compilers currently don't do that for volatile, but I don't know why.

Two reasons:

First, using nontemporal accesses would break mixed volatile and non-volatile accesses to the same memory, something which is not defined by the C standard but which some programs rely on anyway.

Second, more importantly: why would they?

- If the address you’re accessing points to hardware registers, the page table entry should be marked non-cacheable, which makes nontemporal accesses unnecessary. And if for some reason it’s not marked properly, nontemporal accesses wouldn’t be sufficient to guarantee that things work anyway, because nontemporal is just a hint which the hardware may not respect. In any case, at least on x86, AFAIK the only nontemporal instructions access 128+ bits of memory at a time, which wouldn’t even work for hardware registers (which generally require you to use a specific access size).

- If the address you’re using points to regular memory, on the other hand, volatile is probably being used to implement atomics, in which case bypassing the cache is unnecessary and also slow. In theory, compilers could compile volatile into accesses surrounded by memory barrier instructions, which would enforce a stronger memory ordering (while being faster than bypassing the cache entirely), especially useful on architectures with weaker memory models than x86. In fact, that’s what volatile does in Java. But in C, it’s pretty long-established that volatile accesses should just compile to regular load/store instructions, and any necessary barriers must be inserted manually. People writing high-performance code wouldn’t be happy if the compiler started inserting unnecessary barrier instructions for them… In any case, usage of volatile for atomics is deprecated in favor of C/C++11 atomics, which do insert barriers for you.
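To illustrate that last point, this is the kind of ordering C11 atomics give you automatically (a sketch; the release/acquire pair makes the compiler insert whatever barriers the target needs, which a plain volatile flag would not):

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;                 /* ordinary, non-atomic data */
static atomic_bool ready = false;   /* publication flag */

void publish(int value) {
    payload = value;
    /* release store: earlier writes become visible before the flag */
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Returns true and fills *out once the payload has been published. */
bool try_consume(int *out) {
    /* acquire load: pairs with the release store above */
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

With volatile in place of the atomics, the compiler would emit plain loads and stores and, on a weakly ordered architecture, a consumer could see the flag set before the payload.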


I think the reason is that the details are too complicated to be captured by the volatile keyword.

For instance, the processor I use has a controller that enforces consistency on IO memory operations, so volatile works 'fine'. I know that; the compiler doesn't. It is targeting a core, not a specific implementation, so it has no idea.



