Hi!
I wonder if anyone else has run into this problem before…
perhaps it is a bit esoteric at first glance…
but suppose one uses integer arithmetic extensively in a kernel and wants to squeeze out some extra throughput.
By default the compiler replaces multiplications by powers of 2 with shifts, which is not
always desirable. Consider this example:
__umul24(a, 2) + b, which results in 2 operations (a shift and an add),
while __umul24(a, 3) + b fuses into a single mad24 instruction… grr
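
To make the comparison concrete, here is a minimal sketch of the two cases as separate kernels. The kernel names and the out/a/b/n parameters are made up for illustration, and the instruction counts in the comments are what I observe with my toolchain, so other compiler versions may behave differently.

```cuda
// Two kernels that differ only in the constant multiplier.
// Names and parameters are hypothetical, for illustration only.

__global__ void scale_by_2(unsigned int *out, const unsigned int *a,
                           const unsigned int *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Power-of-2 multiplier: observed to become a shift plus an add
        // (2 instructions) instead of one mad24.
        out[i] = __umul24(a[i], 2) + b[i];
    }
}

__global__ void scale_by_3(unsigned int *out, const unsigned int *a,
                           const unsigned int *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Non-power-of-2 multiplier: observed to fuse into a single mad24.
        out[i] = __umul24(a[i], 3) + b[i];
    }
}
```

Compiling with nvcc -ptx and comparing the PTX generated for the two kernels should show the difference.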
Is there any way to force the compiler to use mad24 in the power-of-2 case as well?
The “heavy weapon” would be to add a mad24 intrinsic manually to the nvopencc sources… but that is a headache
thanks