Does __mul24 slow down my algorithm?

garciav · March 31, 2008, 12:37pm

Hi,

I’ve tried to use

__mul24(blockIdx.x,blockDim.x) + threadIdx.x;

but, in my case, it’s slower than

blockIdx.x * blockDim.x + threadIdx.x;

This is not a big deal, but I’m just trying to understand why =)
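To reproduce the comparison, here is a minimal sketch of the two variants as complete kernels, plus a host helper that checks they produce identical indices. The names `indexMul`, `indexMul24` and `indicesMatch` are hypothetical. `__mul24` multiplies only the low 24 bits of each operand, so the two variants agree only while `blockIdx.x` and `blockDim.x` fit in 24 bits.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Variant A (hypothetical name): plain 32-bit multiply.
__global__ void indexMul(unsigned int *out, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Variant B (hypothetical name): 24-bit multiply intrinsic. __mul24 takes
// int arguments and uses only the low 24 bits of each, so blockIdx.x and
// blockDim.x must fit in 24 bits for the result to match variant A.
__global__ void indexMul24(unsigned int *out, unsigned int n)
{
    unsigned int i = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;
    if (i < n) out[i] = i;
}

// Hypothetical helper: runs both kernels over n elements and returns true
// only if every thread wrote its own global index in both variants.
bool indicesMatch(unsigned int n, unsigned int blockSize)
{
    size_t bytes = n * sizeof(unsigned int);
    unsigned int blocks = (n + blockSize - 1) / blockSize;
    unsigned int *dA = nullptr, *dB = nullptr;
    if (cudaMalloc(&dA, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dB, bytes) != cudaSuccess) { cudaFree(dA); return false; }

    indexMul<<<blocks, blockSize>>>(dA, n);
    indexMul24<<<blocks, blockSize>>>(dB, n);

    std::vector<unsigned int> a(n), b(n);
    cudaMemcpy(a.data(), dA, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(b.data(), dB, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);

    for (unsigned int k = 0; k < n; ++k)
        if (a[k] != k || b[k] != k) return false;
    return true;
}
```

Calling `indicesMatch(1u << 20, 256)` should return true; if it does not, the "slower" kernel is not doing the same work as the original.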

Thanks,

Vince

AndreiB · March 31, 2008, 2:16pm

And how do you measure this? =)

garciav · March 31, 2008, 2:55pm

With the profiler. I duplicated my function and replaced * with __mul24.
Maybe I’ve made a mistake…
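One way to cross-check the profiler numbers is to time each variant directly with CUDA events. The sketch below assumes a hypothetical test kernel, `writeIndex`, that only computes and stores the global index, with a compile-time `Use24` switch selecting `__mul24` or `*`; the helper `timeVariant` is also hypothetical. Because this kernel is dominated by memory stores, it mainly illustrates the measurement method. Timing your real kernel is what matters.

```cuda
#include <cuda_runtime.h>

// Hypothetical test kernel: each thread writes its global index.
// Use24 is a compile-time constant, so each instantiation contains only
// one of the two index computations.
template <bool Use24>
__global__ void writeIndex(unsigned int *out, unsigned int n)
{
    unsigned int i = Use24 ? __mul24(blockIdx.x, blockDim.x) + threadIdx.x
                           : blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Hypothetical helper: average time in milliseconds of `reps` launches of
// one variant, measured with CUDA events. Returns -1.0f on error.
template <bool Use24>
float timeVariant(unsigned int n, unsigned int blockSize, int reps)
{
    unsigned int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(unsigned int)) != cudaSuccess) return -1.0f;
    unsigned int blocks = (n + blockSize - 1) / blockSize;

    // Untimed warm-up launch so one-time startup cost is excluded.
    writeIndex<Use24><<<blocks, blockSize>>>(d, n);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        writeIndex<Use24><<<blocks, blockSize>>>(d, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until all timed launches finish

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    bool ok = cudaGetLastError() == cudaSuccess;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return ok ? ms / reps : -1.0f;
}
```

Compare `timeVariant<false>(n, 256, 100)` with `timeVariant<true>(n, 256, 100)`. Averaging over many launches smooths out timer resolution and launch jitter.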

MisterAnderson42 · March 31, 2008, 3:15pm

Check the register usage in the cubin, or pass “--ptxas-options=-v” on the nvcc command line. In the past I’ve noticed that using __mul24 instead of * for blockIdx.x * blockDim.x + threadIdx.x increased the register usage of my kernels. The resulting decrease in occupancy then hurt performance.
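The register and occupancy effect described above can also be queried at runtime. This sketch defines a hypothetical `writeIndex` test kernel with a compile-time `Use24` switch, then reports `numRegs` from `cudaFuncGetAttributes` together with the resident-blocks-per-multiprocessor figure from `cudaOccupancyMaxActiveBlocksPerMultiprocessor`. The helper `reportRegisters` is hypothetical. The occupancy call postdates this thread, and current compilers and GPUs may allocate registers differently, so treat the output as specific to your toolkit and device.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical test kernel: each thread writes its global index, using
// either __mul24 or the plain * operator depending on Use24.
template <bool Use24>
__global__ void writeIndex(unsigned int *out, unsigned int n)
{
    unsigned int i = Use24 ? __mul24(blockIdx.x, blockDim.x) + threadIdx.x
                           : blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Hypothetical helper: reports the registers per thread assigned to one
// variant and how many blocks of `blockSize` threads can be resident on
// one multiprocessor. Returns false if either runtime query fails.
template <bool Use24>
bool reportRegisters(int blockSize, int *regsOut, int *blocksPerSmOut)
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, writeIndex<Use24>) != cudaSuccess)
        return false;

    int blocksPerSm = 0;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocksPerSm, writeIndex<Use24>, blockSize, 0) != cudaSuccess)
        return false;

    *regsOut = attr.numRegs;
    *blocksPerSmOut = blocksPerSm;
    std::printf("%s: %d registers/thread, %d blocks/SM at %d threads/block\n",
                Use24 ? "__mul24" : "*", attr.numRegs, blocksPerSm, blockSize);
    return true;
}
```

The compile-time equivalent is building with `nvcc --ptxas-options=-v`, which prints each kernel's register count as ptxas compiles it. If the `__mul24` variant reports more registers and fewer resident blocks, that matches the occupancy loss described above.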