[This message is primarily for the NVIDIA folks doing development of the CUDA tools, but it may also interest others.]
The CUDA development tools currently do not optimize integer divisions by an integer constant in device code. This is not difficult to do, since a division by a constant can be replaced by a wide multiplication followed by a right shift. The gcc compiler has used this optimization for a long time. It would be quite useful if the nvcc compiler did the same for device code. Details of how to do this can be found in the following paper:
Torbjörn Granlund and Peter L. Montgomery,
“Division by invariant integers using multiplication,”
ACM SIGPLAN Notices, Volume 29, Issue 6, June 1994.
For those with access, the paper is available at [url="http://portal.acm.org/citation.cfm?id=773473.178249"]http://portal.acm.org/citation.cfm?id=773473.178249[/url]
Presumably, details of this optimization can also be found in the gcc source code.
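To make the idea concrete, here is a minimal sketch in plain C of the multiply-and-shift replacement for two fixed divisors, 10 and 3, on 32-bit unsigned operands. The function names are my own, and the constants are the rounded-up values of 2^35/10 and 2^33/3. This only illustrates the technique; it is not what gcc or nvcc actually emit.

```c
#include <assert.h>
#include <stdint.h>

/* n / 10 for any 32-bit unsigned n.
   0xCCCCCCCD is ceil(2^35 / 10); the full 64-bit product is shifted right by 35. */
uint32_t div10(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}

/* n / 3 for any 32-bit unsigned n.
   0xAAAAAAAB is ceil(2^33 / 3); the full 64-bit product is shifted right by 33. */
uint32_t div3(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xAAAAAAABu) >> 33);
}

/* Spot check against the ordinary division operator (hypothetical helper). */
void check_magic_division(void)
{
    const uint32_t samples[] = {
        0u, 1u, 2u, 3u, 9u, 10u, 11u, 99u, 100u,
        0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFEu, 0xFFFFFFFFu
    };
    const unsigned count = sizeof samples / sizeof samples[0];

    for (unsigned i = 0; i < count; ++i) {
        assert(div10(samples[i]) == samples[i] / 10u);
        assert(div3(samples[i]) == samples[i] / 3u);
    }
}
```

In device code, the upper 32 bits of the product could be obtained with the `__umulhi` intrinsic, leaving a shift by 3 (for 10) or by 1 (for 3). Some divisors, 7 for example, need a 33-bit multiplier and a slightly longer sequence, which the paper covers.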
Unfortunately, this optimization is useless for non-constant divisors…