> but it does shoot down the cache line containing the lock in all cores other than the one that acquired the lock.
Well, as long as you use the test-CAS approach (read the lock first and only attempt the CAS once it looks free) instead of pure CAS, not every loop iteration results in cache line bouncing.
Plus, Intel has introduced the MWAIT[0] instruction to implement something similar to a futex in hardware, i.e. the hyperthread can sleep until another core writes to the cache line in question.
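
To make the test-CAS point concrete, here is a minimal sketch of such a spinlock in C++. The names `TtasLock` and `run_counter_demo` are made up for illustration, and `std::this_thread::yield()` stands in for a CPU pause hint to keep it portable; treat it as one possible shape under those assumptions, not a tuned implementation.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical test-CAS (test-and-test-and-set) spinlock, for illustration only.
class TtasLock {
public:
    void lock() {
        for (;;) {
            // Test step: spin on a plain load. Waiters only read, so the
            // cache line can stay shared instead of bouncing between cores.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();  // assumption: stand-in for a pause hint
            }
            // CAS step: only attempted once the lock looked free.
            bool expected = false;
            if (locked_.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
                return;
            }
        }
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Increments a shared counter from several threads under the lock and
// returns the final value.
long run_counter_demo(int threads, int iterations) {
    TtasLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

A pure-CAS lock would drop the inner `while` loop and retry the `compare_exchange_weak` directly, so every failed attempt would request the line for writing.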
That's true, though MWAIT can only be used from kernel mode. At least the docs say that you get a #UD exception if you attempt to use it from user mode.
[0] https://www.felixcloutier.com/x86/mwait