Running

    gcc -O3 -fdump-tree-all-all main.c -S -march=haswell

dumps the result of every tree (GIMPLE) optimization pass to a file. From those dumps we can see that this optimization is performed in main.c.036t.forwprop1:
    Pass statistics of "forwprop": ----------------
    Applying pattern match.pd:4684, gimple-match-6.cc:4680
    Applying pattern match.pd:9281, gimple-match-7.cc:2488
Looking up those line numbers in GCC's match.pd file, we can see that this is in fact a hardcoded match for that popcnt implementation, constants and all:
    /* 64- and 32-bits branchless implementations of popcount are detected:
    ...
    (simplify
     (rshift
      (mult
       (bit_and
        (plus:c
         (rshift @8 INTEGER_CST@5)
          (plus:c@8 (bit_and @6 INTEGER_CST@7)
           (bit_and (rshift (minus@6 @0
    ...
      /* Check constants and optab.  */
      (with { unsigned prec = TYPE_PRECISION (type);
        int shift = (64 - prec) & 63;
        unsigned HOST_WIDE_INT c1 = HOST_WIDE_INT_UC (0x0101010101010101) >> shift;
        unsigned HOST_WIDE_INT c2 = HOST_WIDE_INT_UC (0x0F0F0F0F0F0F0F0F) >> shift;
        unsigned HOST_WIDE_INT c3 = HOST_WIDE_INT_UC (0x3333333333333333) >> shift;
        unsigned HOST_WIDE_INT c4 = HOST_WIDE_INT_UC (0x5555555555555555) >> shift;
In this particular case, this implementation of popcnt is probably common enough to warrant its own special case in the compiler.
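For reference, here is a minimal sketch of the kind of source that pattern is looking for, assuming main.c contains the classic branchless (SWAR) popcount. The function names `popcount64`, `popcount32`, `popcount_naive` and `check_popcounts` are mine, not from the question or from GCC. The four magic numbers are the ones that appear as c4, c3, c2 and c1 in the excerpt above, and the 32-bit constants are the 64-bit ones shifted right by 32, which is what the `shift` computation in match.pd produces for a 32-bit type. Whether GCC actually rewrites a given variant depends on the exact expression shape and on the target, so treat this as illustrative rather than as a guaranteed trigger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit branchless popcount using the constants checked in match.pd. */
static int popcount64(uint64_t x)
{
    x -= (x >> 1) & 0x5555555555555555ULL;            /* c4: counts per 2-bit field */
    x = (x & 0x3333333333333333ULL)
      + ((x >> 2) & 0x3333333333333333ULL);           /* c3: counts per 4-bit field */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;       /* c2: counts per byte */
    return (int)((x * 0x0101010101010101ULL) >> 56);  /* c1: sum bytes into the top byte */
}

/* 32-bit variant: same steps, constants truncated to 32 bits, final shift 24. */
static int popcount32(uint32_t x)
{
    x -= (x >> 1) & 0x55555555u;
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (int)((x * 0x01010101u) >> 24);
}

/* Bit-by-bit reference used only to sanity-check the versions above. */
static int popcount_naive(uint64_t x)
{
    int n = 0;
    for (; x != 0; x >>= 1)
        n += (int)(x & 1u);
    return n;
}

/* Compares both branchless versions against the reference on a few inputs. */
static void check_popcounts(void)
{
    const uint64_t samples[] = {
        0, 1, 0xFF, 0x8000000000000000ULL, 0x0123456789ABCDEFULL, UINT64_MAX
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(popcount64(samples[i]) == popcount_naive(samples[i]));
        assert(popcount32((uint32_t)samples[i])
               == popcount_naive((uint32_t)samples[i]));
    }
}
```

Calling `check_popcounts()` from a `main` and compiling with the command at the top of this answer should let you look for the `Applying pattern match.pd:...` lines in the forwprop dump yourself. The line numbers will differ between GCC versions.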