Integer division by a constant

[This message is primarily for the NVIDIA folks doing development of the CUDA tools, but it may also interest others.]

The CUDA development tools currently do not optimize integer divisions by an integer constant for device code. This is not difficult to do, since a division by a constant can be replaced by a wide multiplication followed by a right shift. The gcc compiler is using this optimization for a long time. It would be quite useful if the nvcc compiler did the same for device code. The details about how to do this can be found in the following paper:
Torbjörn Granlund and Peter L. Montgomery,
“Division by invariant integers using multiplication,”
ACM SIGPLAN Notices, Volume 29, Issue 6, June 1994.
For those with access, you can find it at
Presumably, details of this optimization can also be found in the gcc source code.

Unfortunately, this optimization is useless for non-constant divisors…

I’ve been implementing constant divisors manually this way, would indeed be nice if it was done automatically by the compiler.
It might be non-trivial though because the method puts some restrictions on the ranges of the numbers involved.

I’ve been using this approach, too. In my code I have non-constant divisors from small range (say 1 to 256), so I just buld table of ‘magic’ coefficients, put them to constant memory and use divisor as table index. Works much faster than integer division.