I have several quite different kernels in my app, as most people do I guess. One likes lots of registers, another runs better with 12-16. I'm trying to figure out how to build different kernels with different register counts.
Where in the nvcc process does -maxrregcount come into play? In a compiler transition from .cu to … something?
Can each kernel be built separately and everything linked together at the end? Roughly like so:
nvcc kernel1.cu -maxrregcount=16 -o kernel1.something
nvcc kernel2.cu -o kernel2.something
nvcc cuda-host.cu -o cudahost.something
… and then linking them all up.
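To make that concrete, here is a rough sketch of the build I have in mind, not something I have verified. It assumes nvcc's -c flag to stop at object files. The .o names, the output name app, and the run helper with its DRY_RUN switch are placeholders I made up for illustration. It also assumes cuda-host.cu can reach the kernels in the other files, which may need host-side wrapper functions or relocatable device code (-rdc=true).

```shell
#!/bin/sh
# Sketch only: file names and the 16-register cap are example values.
set -e

# Hypothetical helper: prints each command, and runs it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "$@"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# Compile each translation unit to its own object file (-c = compile, no link).
run nvcc -c kernel1.cu -maxrregcount=16 -o kernel1.o   # kernel that prefers few registers
run nvcc -c kernel2.cu -o kernel2.o                    # register-hungry kernel, no cap
run nvcc -c cuda-host.cu -o cuda-host.o                # host code

# Link all the objects into one executable.
run nvcc kernel1.o kernel2.o cuda-host.o -o app
```

Run as-is it only prints the four commands; with DRY_RUN=0 it would actually invoke nvcc. The point is that -maxrregcount applies per nvcc invocation, so one file per kernel gives one register limit per kernel.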
Is there a #pragma directive to tell the compiler how many registers to use, or is -maxrregcount the only way?
It seems to me that FindCUDA.cmake could be useful for this, since it makes the build process more discrete and understandable … will have to look into it.
So, does anyone have any hints for me before I start digging?