The argument for why `jne` might be faster is that in the former case the CPU always executes a dependency chain of length 3: `cmpb` -> `sete` -> `addl`. Each of these instructions has to be computed one after the other, as `sete` depends on the result of `cmpb`, and `addl` depends on the result of `sete`.
With `jne`, the CPU might predict the branch is not taken, in which case, the dependency chain is
`mov` -> `addl` (the `mov` of an immediate might be handled by register renaming?).
Or that it is taken, in which case the dependency chain is just `addl`.
I guess you're arguing that the CPU should handle `sete` the same way?
That is, instead of treating `addl` as dependent on the result, predict what `sete` does and start executing `addl` before `sete` finishes, rewinding if that went wrong?
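For concreteness, here is a rough C sketch of the two shapes being compared. The function names and the byte-counting loop are my own assumptions, not the code from the original discussion, and a compiler is free to turn either form into either instruction sequence, so check the generated assembly before drawing conclusions.

```c
#include <stddef.h>

/* Branchless shape: the comparison result feeds the add directly.
 * A compiler may lower the loop body to something like cmpb -> sete -> addl,
 * so every add waits on the compare. (Hypothetical helper name.) */
static int count_branchless(const unsigned char *buf, size_t n, unsigned char c) {
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        count += (buf[i] == c);
    }
    return count;
}

/* Branchy shape: a compiler may emit cmpb + jne here. If the branch is
 * predicted correctly, the add does not have to wait for the compare.
 * (Hypothetical helper name.) */
static int count_branchy(const unsigned char *buf, size_t n, unsigned char c) {
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        int inc = 0;
        if (buf[i] == c) {
            inc = 1;
        }
        count += inc;
    }
    return count;
}
```

Both functions return the same result; the only difference is whether the increment is data-dependent on the compare or control-dependent on a predicted branch.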
In principle it is perfectly possible to speculatively execute cmov (and, vice versa, to turn a jump over one instruction into conditional execution).
But Intel historically didn't do it, as programs tend to use cmov when the condition is unpredictable, so there was little reason to optimize it.
After Spectre, I believe Intel has given an architectural guarantee that cmov is never speculated, so it can be used as part of speculation attack prevention.
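To illustrate the kind of use I mean, here is a minimal sketch of clamping an array index through a data dependency instead of relying only on a bounds-check branch. The name `clamp_index` is made up for this example, and plain C gives no guarantee that the compiler keeps this branch-free (it might emit cmov, a setcc/neg pair, or even a branch); as far as I know, production mitigations typically rely on inline assembly or compiler-specific support to pin the instruction sequence down.

```c
#include <stddef.h>

/* Returns idx when idx < len, and 0 otherwise, computed via a mask so the
 * result is data-dependent on the comparison rather than on a predicted
 * branch. Hypothetical helper, for illustration only. */
static size_t clamp_index(size_t idx, size_t len) {
    /* (idx < len) is 1 or 0; 0 - 1 wraps to all ones, 0 - 0 stays zero. */
    size_t mask = (size_t)0 - (size_t)(idx < len);
    return idx & mask;
}
```

The idea is that a subsequent load such as `table[clamp_index(i, n)]` can only see an in-range index, even if an earlier bounds check was mispredicted, as long as the clamp itself is not predicted.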