Ah, right, misread the table. I think the latency would be more relevant in this case.
Re Intel atomics: yes, being fully fenced by default is annoying, but it more or less falls out of TSO. It would be strange for an RMW to be more relaxed than a normal store, so, like every store, it has to wait for prior stores to drain from the store buffer before it can complete, which is high latency. For a plain store that latency is normally not observable, since the store just sits in the store buffer while later instructions carry on. An RMW, though, necessarily cannot execute its load ahead of its store, so it ends up exposing the drain latency.
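To make the "fully fenced by default" point concrete, here is a minimal C++ sketch. The function names are made up for illustration. On x86-64, mainstream compilers typically lower both RMW variants to a lock-prefixed instruction (something like `lock xadd`), so asking for relaxed ordering in the source does not buy a cheaper RMW, whereas a relaxed store is just a plain `mov`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: relaxed RMW. The C++ memory model only requires
// atomicity here, but on x86-64 this is typically still emitted as a
// lock-prefixed instruction, which acts as a full barrier.
inline std::uint64_t bump_relaxed(std::atomic<std::uint64_t>& c) {
    return c.fetch_add(1, std::memory_order_relaxed);
}

// Hypothetical helper: sequentially consistent RMW. On x86-64 this
// typically compiles to the same lock-prefixed instruction as above.
inline std::uint64_t bump_seq_cst(std::atomic<std::uint64_t>& c) {
    return c.fetch_add(1, std::memory_order_seq_cst);
}

// Hypothetical helper: relaxed store. On x86-64 this is typically a plain
// `mov` that can retire into the store buffer without waiting for it to drain.
inline void publish_relaxed(std::atomic<std::uint64_t>& c, std::uint64_t v) {
    c.store(v, std::memory_order_relaxed);
}

// Single-threaded sanity check of the helpers' return values.
inline void smoke_check() {
    std::atomic<std::uint64_t> c{0};
    assert(bump_relaxed(c) == 0);  // fetch_add returns the previous value
    assert(bump_seq_cst(c) == 1);
    publish_relaxed(c, 10);
    assert(c.load() == 10);
}
```

Comparing the generated assembly for `bump_relaxed` and `bump_seq_cst` (for example with `-O2 -S`) is the easiest way to check this on a given compiler; the exact instruction chosen can differ depending on whether the return value is used.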
I haven't really found much documentation on them, but it is possible that the far atomics will end up having these relaxed semantics (and I suspect that most of the time they won't be implemented as actual far atomics). Do you have any more info on them than what's in the Intel docs?