As an aside to @kouta-kun's answer for GCC, I looked into what LLVM does.
I found popcnt being generated in LoopIdiomRecognize.cpp, a pass that recognizes idioms and transforms simple loops into a non-loop form. The recognizePopcount and detectPopcountIdiom functions only recognize popcnt implementations that look like this:
for (popcount=0; x; popcount++) x &= x - 1;
However, I could not find a detection for the branchless version.
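To make the two shapes concrete, here is a minimal sketch of both as complete C functions. The function names are mine, and the branchless variant is one common formulation that I am assuming here; the version in the question may differ. With clang at -O2 on a target that has the instruction (for example with -mpopcnt on x86-64), I would expect only the loop form to be turned into popcnt by this pass, but check the generated assembly for your compiler version.

```c
#include <stdint.h>

/* Loop form: each iteration clears the lowest set bit of x,
 * so the loop runs once per set bit. This is the shape the
 * loop idiom pass looks for. */
unsigned popcount_loop(uint32_t x) {
    unsigned popcount;
    for (popcount = 0; x; popcount++)
        x &= x - 1;
    return popcount;
}

/* Branchless form (assumed variant): adds up bits in parallel,
 * first in 2-bit fields, then 4-bit, then per byte, and finally
 * sums the four bytes with a multiply. */
unsigned popcount_branchless(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (x * 0x01010101u) >> 24;
}
```

Both functions return the same value for every input, but the second has no loop at all, so a loop idiom pass never gets to look at it.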
But how would a compiler be able to figure that out, besides hardcoding numerous possible implementations and then comparing until there is a match?
Rather than comparing directly to multiple possible implementations, it's more like pattern matching. Here we detect if the loop:
- is small enough,
- has only one block,
- has only one backedge,
- contains instructions corresponding to "x2 = x1 & (x1 - 1)",
- has the "cnt2 = cnt1 + 1" increment...

And so on.
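As a rough illustration of what checks like these look like in code, here is a sketch over a made-up toy IR. Every type and name in it (Instr, Loop, looks_like_popcount, the size limit of 8) is hypothetical and far simpler than LLVM's real data structures; the actual pass also verifies things this sketch skips, such as how the values feed back into the loop's phi nodes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR, not LLVM's. Values are small non-negative ids. */
typedef enum { OP_ADD_CONST, OP_AND } Opcode;

typedef struct {
    Opcode op;
    int dst;  /* id of the value this instruction defines */
    int lhs;  /* id of the first operand */
    int rhs;  /* OP_AND: id of second operand; OP_ADD_CONST: the constant */
} Instr;

typedef struct {
    const Instr *body;
    size_t len;
    int num_blocks;
    int num_backedges;
    int x_phi;    /* id of the loop-carried x (x1 in the list above) */
    int cnt_phi;  /* id of the loop-carried counter (cnt1) */
} Loop;

bool looks_like_popcount(const Loop *loop) {
    /* Cheap structural checks first: small, one block, one backedge. */
    if (loop->len > 8)
        return false;
    if (loop->num_blocks != 1 || loop->num_backedges != 1)
        return false;

    int dec = -1;            /* id of the value holding x1 - 1, once seen */
    bool has_clear = false;  /* saw x2 = x1 & (x1 - 1) */
    bool has_inc = false;    /* saw cnt2 = cnt1 + 1 */

    for (size_t i = 0; i < loop->len; i++) {
        const Instr *in = &loop->body[i];
        if (in->op == OP_ADD_CONST && in->lhs == loop->x_phi && in->rhs == -1) {
            dec = in->dst;
        } else if (in->op == OP_AND && dec != -1 &&
                   ((in->lhs == loop->x_phi && in->rhs == dec) ||
                    (in->lhs == dec && in->rhs == loop->x_phi))) {
            has_clear = true;  /* AND is commutative, so accept both orders */
        } else if (in->op == OP_ADD_CONST && in->lhs == loop->cnt_phi &&
                   in->rhs == 1) {
            has_inc = true;
        }
    }
    return has_clear && has_inc;
}
```

The point is that nothing here compares against a list of known source implementations: any loop whose instructions have this shape is accepted, whatever the original code looked like.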
Do compilers have a way of 'simulating' the possible inputs and outputs of a function, and determining that they will match the result of a specific assembly instruction?
That would quickly become extremely slow to perform during compilation, just for the sake of detecting this very specific optimization.