I have a kernel that contains the following line and uses 49 registers according to the NVIDIA Visual Profiler.
colors[pixelIndex] = sin(1.0 * curTime);
Here “colors” is a char*, “pixelIndex” is an int, and “curTime” is a float. When I remove this line from the kernel and run it through the profiler, it reports that the kernel now uses 50 registers. What could cause the kernel to use an extra register when this line is removed? Could the type mismatches (a double literal multiplied by a float, with the result stored into a char) be leading the compiler to optimize more aggressively?
One more note: if, instead of removing the line, I change the “1.0” to “1”, the kernel also uses 50 registers.
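To make the type question concrete, here is a minimal host-side sketch of what I believe the two variants of the line evaluate to. It is plain C++ that should also compile inside a .cu file. The function names colorFromDouble and colorFromFloat are hypothetical and not part of my kernel, and the sketch says nothing about register allocation itself. It only shows that “1.0” makes the product a double while “1” keeps it a float.

```cpp
#include <cmath>
#include <type_traits>

// 1.0 is a double literal, so the float operand is promoted and the product is a double.
static_assert(std::is_same<decltype(1.0 * 0.0f), double>::value, "1.0 * float is double");
// 1 is an int literal, so it is converted to float and the product stays a float.
static_assert(std::is_same<decltype(1 * 0.0f), float>::value, "1 * float is float");
// 1.0f is a float literal, so no promotion happens at all.
static_assert(std::is_same<decltype(1.0f * 0.0f), float>::value, "1.0f * float is float");
// std::sin has a float overload, so a float argument gives a float result.
static_assert(std::is_same<decltype(std::sin(0.0f)), float>::value, "sin(float) is float");

// Hypothetical stand-in for the original line: double multiply, double sin, narrowed to char.
char colorFromDouble(float curTime) {
    return static_cast<char>(std::sin(1.0 * curTime));
}

// Hypothetical stand-in for the "1" variant: float multiply, float sin, narrowed to char.
char colorFromFloat(float curTime) {
    return static_cast<char>(std::sin(1 * curTime));
}
```

If that reading is right, the original line does its arithmetic in double precision and the “1” variant does it in single precision, so the compiler is probably generating quite different code for the two cases.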