C doesn't even originally have true/false, I think you may be conflating the two concepts that "any nonzero int is truthy" and "boolean expressions evaluate to ints". The standard mandates that boolean expressions like equality always evaluate to 0/1.
> The == (equal to) and != (not equal to) operators are analogous to the relational operators except for their lower precedence. Each of the operators yields 1 if the specified relation is true and 0 if it is false.
I'm sure people do that (even though it's not necessary per some year C standard) but generally the pattern is actually for converting things which are not already 0 or 1 into 0 or 1. For example, you might want to use it here:
int num_empty_strings = !!(strlen(s1)) + !!(strlen(s2)) + !!(strlen(s3))
That isn't only more cryptic, it's also potentially a lot more efficient -- strlen takes time proportional to the length of the string, which of course you don't need to do if you only care whether or not the length is zero. You shouldn't use strlen for empty-string tests.
In practice, GCC and Clang don't seem to have any issues inlining the necessary part of strlen at -O1 or higher (https://godbolt.org/z/rM198aYea). But MSVC inlines the empty-string case, while still calling out for nonempty strings, probably since it doesn't realize that the returned length will be nonzero.
I guess since strlen uses an unsigned size, which has specified overflow behavior the compiler not only has to proof the initial iteration, but also all the ULLONG_MAX+1 multiples, which of course refer to the same memory address. But maybe its harder for the optimizer to see.
One thought: If the code is rewritten using bit arithmetic, then potentially the result could be even faster as there need not be a pointer look-up.
A bit arithmetic solution would have a mask created for the characters ‘p’ and ‘s’, and then the result could be AND-ed, and then with more bit arithmetic this all 1s value can be translated to a 1 if and only if all the bits are 1. Following which, there would be a no conditional check and simply be both an add and a subtract operation but where the value to be added will only be 1 if the mask for ‘p’ matches and 1 to be subtracted if the mask for ‘s’ matches respectively. I’m not fully sure if this would necessarily be faster than the pointer look-up solution, but it would be interested to try this version of the code and see how fast it performs.
Update: The bit arithmetic could also be done with an XOR on the mask, and following which the ‘popcnt’ x86 instruction could be used to figure out if all are 0 bits.
People who just skim the headline and article will come away convinced that dropping to assembly is the “way to go fast” even if they never actually do it.
Anyone with a passing understanding of Assembly or compilers would find that idea laughable. As for the others, it turns out not knowing what you don’t know can be very expensive.
https://owen.cafe/posts/the-same-speed-as-c/
And as others have pointed out, you can tweak the input, then vectorize the algo, if you want to go that route.
I considered this a pedagogical exercise and I sincerely hope nobody will start dropping down to assembly without a very good reason to.