Different maxregisters for different kernels?


I have several pretty different kernels in my app, as most people do I guess. One likes lots of registers, another one runs better with 12-16. I’m trying to figure out how to build different kernels w/ different register counts.

Where in the nvcc process does -maxrregcount come into play? In a compiler transition from .cu to … something?

Can each kernel be built separately and all the stuff linked in the end? Approx. like so:

nvcc kernel1.cu -maxrregcount=16 -o kernel1.something

nvcc kernel2.cu -o kernel2.something

nvcc cuda-host.cu -o cudahost.something

… and then linking them all up.

Is there any #pragma directive to instruct the compiler how many registers to use, or is the -maxrregcount the only way?

It seems to me that FindCUDA.cmake could be useful for this, since the build process is more discrete and understandable … will have to look into it.

So, does anyone have any hints for me before I start digging?


I’m also interested in the results of these questions. cuModuleLoad() caught my interest, and it might be a good place to start.

That’s pretty interesting - thanks for sharing.

Yes, you can compile each function separately and link them afterwards. It works just as it normally would.
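On the #pragma question: as far as I know there is no pragma for this, but the `__launch_bounds__` qualifier gives the compiler a per-kernel hint that indirectly limits registers, so kernels in the same file can end up with different budgets. By contrast, -maxrregcount is applied when ptxas turns PTX into machine code, and it covers every kernel in the file being compiled. Below is a minimal sketch; the kernel names and the values 256 and 4 are made up for illustration, and kernels this small will not actually show a difference in register use.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// No bounds given: the compiler chooses the register budget on its own.
__global__ void heavyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * data[i] + 1.0f;
}

// Promise at most 256 threads per block and ask for at least 4 resident
// blocks per multiprocessor. The compiler limits registers per thread so
// that this occupancy is reachable. (256 and 4 are illustrative values.)
__global__ void __launch_bounds__(256, 4) lightKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main()
{
    const int n = 1024;
    float host[n];
    float *dev = nullptr;

    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(dev, 0, n * sizeof(float));

    // Launch both kernels with 256 threads per block, matching the bound.
    heavyKernel<<<n / 256, 256>>>(dev, n);
    lightKernel<<<n / 256, 256>>>(dev, n);

    // The copy also synchronizes and reports any earlier launch error.
    cudaError_t err = cudaMemcpy(host, dev, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("host[0] = %f\n", host[0]);

    cudaFree(dev);
    return 0;
}
```

Building with `nvcc -Xptxas -v` prints the register count ptxas settled on for each kernel, which is a quick way to check whether the bounds had any effect.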
cuModuleLoad is part of the driver API, and you use it to load cubins.
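A rough sketch of the cubin route, in case it helps: each kernel file is compiled to its own cubin with whatever -maxrregcount it wants, and the host loads it at run time. The file names, the kernel name `kernel1`, and `-arch=sm_70` are assumptions for illustration; the architecture has to match your GPU, and the kernel needs `extern "C"` so its name is not mangled.

```cuda
// ---- kernel1.cu (hypothetical file), built on its own with:
//   nvcc -cubin -arch=sm_70 -maxrregcount=16 kernel1.cu -o kernel1.cubin
//
// extern "C" __global__ void kernel1(float *data, int n)
// {
//     int i = blockIdx.x * blockDim.x + threadIdx.x;
//     if (i < n) data[i] += 1.0f;
// }

// ---- host.cu (hypothetical file), built with:
//   nvcc host.cu -lcuda -o host
#include <cstdio>
#include <cstdlib>
#include <cuda.h>

// Abort with a readable message if a driver API call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        CUresult r_ = (call);                                        \
        if (r_ != CUDA_SUCCESS) {                                    \
            const char *msg_ = nullptr;                              \
            cuGetErrorString(r_, &msg_);                             \
            fprintf(stderr, "%s failed: %s\n", #call,                \
                    msg_ ? msg_ : "unknown error");                  \
            exit(1);                                                 \
        }                                                            \
    } while (0)

int main()
{
    const int n = 1024;
    float host[n] = {0};

    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction fn;
    CUdeviceptr dptr;

    // Initialize the driver API and make the primary context current.
    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));

    // Load the separately built cubin and look up the kernel by name.
    CHECK(cuModuleLoad(&mod, "kernel1.cubin"));
    CHECK(cuModuleGetFunction(&fn, mod, "kernel1"));

    CHECK(cuMemAlloc(&dptr, n * sizeof(float)));
    CHECK(cuMemcpyHtoD(dptr, host, n * sizeof(float)));

    // Kernel arguments are passed as an array of pointers to each value.
    int count = n;
    void *args[] = { &dptr, &count };
    CHECK(cuLaunchKernel(fn, n / 256, 1, 1,   // grid dimensions
                         256, 1, 1,           // block dimensions
                         0, nullptr, args, nullptr));
    CHECK(cuCtxSynchronize());

    CHECK(cuMemcpyDtoH(host, dptr, n * sizeof(float)));
    printf("host[0] = %f\n", host[0]);

    CHECK(cuMemFree(dptr));
    CHECK(cuModuleUnload(mod));
    CHECK(cuDevicePrimaryCtxRelease(dev));
    return 0;
}
```

A second kernel would get its own cubin, built without -maxrregcount or with a different value, and its own cuModuleLoad call. The price is that launches go through cuLaunchKernel rather than the `<<<...>>>` syntax.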