Ah, this is what I was thinking you would do. Floats store numbers in a base-2 mantissa/exponent format. This line is converting the integer to a float, then pulling the exponent bits out of the float representation and manipulating them directly. Notice the minus sign flips it around. __clz() doesn’t directly give you log base 2, but counts how many zeros are on the most significant bit side of the one. That’s going to be 31-log2(a).
All other code remains the same, and I get significant performance boost - but sometimes I get ULF (seems to be because of wrong __clz() return value and, therefore, infinite cycle). This occures on different data sets and on different places each run for one data set.