Compiler doesn't optimize __mul24 with literals: __mul24(10,10) versus 10*10

I just noticed that the compiler does not treat __mul24 with two literal arguments as a multiplication it can resolve at compile time, the way it does with 10*10, so it actually emits instructions for it.

This is just something that caught me unaware while switching macros for variables and the other way around. I noticed it when I looked at the PTX output.
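A minimal repro along these lines might look like the sketch below; the kernel name `mul_literals` and the constant `kExpectedProduct` are made up for illustration. Compiling it with `nvcc -ptx mul_literals.cu` and reading the generated `.ptx` file is one way to compare what each store turns into. The exact PTX depends on the compiler version, so the comments reflect what the post reports rather than a guarantee.

```cuda
// Hypothetical repro kernel: both stores write the value 100,
// only the code nvcc emits for them differs.
__global__ void mul_literals(int *out)
{
    out[0] = 10 * 10;          // plain multiply of two literals: folded to a constant
    out[1] = __mul24(10, 10);  // intrinsic on two literals: reported to emit a multiply instruction
}

// Host-side reference value that both stores are expected to equal.
constexpr int kExpectedProduct = 10 * 10;
```

Until such calls are folded, one practical workaround is to use the plain `*` (or a precomputed constant) whenever both operands are compile-time constants, and keep `__mul24` for runtime operands.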

Also, multiplication by a constant power of two (using either * or __mul24) doesn’t seem to be turned into a left shift automatically. Why not?

This may actually be optimized when the .ptx is converted into device-specific code, though I don’t know whether it is.
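To check the power-of-two case by hand, one can write the multiply by 2^k both ways and compare the PTX for each; the helper and kernel names below are hypothetical. The sketch uses `unsigned int` so the shift is well defined (left-shifting a negative signed value is undefined in C++ before C++20); for unsigned values, `x * (1u << k)` and `x << k` give the same result for any `k` below 32, wraparound included. Writing the shift explicitly, as in `scale_by_8`, avoids depending on the compiler to make the substitution.

```cuda
// Hypothetical helpers: multiply an unsigned value by 2^k, written two ways.
__host__ __device__ constexpr unsigned int times_pow2_mul(unsigned int x, unsigned int k)
{
    return x * (1u << k);   // multiply by 2^k (k must be < 32)
}

__host__ __device__ constexpr unsigned int times_pow2_shift(unsigned int x, unsigned int k)
{
    return x << k;          // same result as the multiply, written as a left shift
}

// Scales every element by 8 using an explicit shift instead of data[i] * 8.
__global__ void scale_by_8(unsigned int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = times_pow2_shift(data[i], 3u);
}
```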


It might be, but I must say I’m a bit dissatisfied with how smart the compiler has been so far.

It seems the same applies to inline functions with static (compile-time constant) arguments. In contrast to Cg, for example, the compiler does not try very hard to do static optimization. This is a real pity for code that runs in 256 threads, where every cycle counts.
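One common way to force this kind of static optimization is to pass the constant as a template parameter instead of a function argument, so every instantiation sees a true compile-time constant that the compiler can fold or strength-reduce. This is a swapped-in technique, not something the post describes, and the names `row_index`, `row_index_rt` and `fill_rows` are hypothetical.

```cuda
// Hypothetical example: the "static" argument becomes a template parameter,
// so inside each instantiation Stride is a compile-time constant.
template <int Stride>
__host__ __device__ constexpr int row_index(int row, int col)
{
    return row * Stride + col;
}

// Same computation with an ordinary runtime argument, for comparison.
__host__ __device__ constexpr int row_index_rt(int row, int col, int stride)
{
    return row * stride + col;
}

// Each thread fills one row of a rows x Stride array with its row number.
template <int Stride>
__global__ void fill_rows(int *out, int rows)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows)
        for (int col = 0; col < Stride; ++col)   // trip count known at compile time
            out[row_index<Stride>(row, col)] = row;
}
```

A launch such as `fill_rows<16><<<blocks, 256>>>(d_out, rows);` (with `blocks`, `d_out` and `rows` standing in for the caller's own values) instantiates a version where `Stride` is 16. The trade-off is one compiled kernel per distinct constant.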