When compiling a kernel file, you have some control over how many registers the __global__ functions in it are allowed to use through the “-maxrregcount” option of nvcc. This, however, applies to ALL the __global__ functions in the file. Since we cannot compile multiple kernel files and link them together into a single cubin / object, we are stuck with a single maximum count for all the kernels. For example, I want to limit the register usage as follows:
__global__ A compiles to 17 registers, limit to 16
__global__ B compiles to 9 registers, limit to 8
The only “solution” I have found so far is to compile two different .cu files, each with a different -maxrregcount value, and use the driver API to load the resulting modules separately. However, if these functions share global data or something similar, I have to get handles to that data in both cubin files, keep the copies in sync, etc., which adds a lot of overhead.
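To make the workaround concrete, here is a rough host-side sketch of what I mean. It is only an illustration under assumptions: the file names kernel_a.cubin and kernel_b.cubin, the kernel names A and B, and the variable shared_data are hypothetical, and the kernels are assumed to be declared extern "C" so their names are not mangled.

```cuda
// Assumed build steps, one register limit per file:
//   nvcc -cubin -maxrregcount=16 -o kernel_a.cubin kernel_a.cu
//   nvcc -cubin -maxrregcount=8  -o kernel_b.cubin kernel_b.cu
// Each .cu file is assumed to define its own `__device__ float shared_data[256];`
#include <cuda.h>
#include <cstdio>
#include <cstdlib>

// Abort with a message if a driver API call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        CUresult err_ = (call);                                        \
        if (err_ != CUDA_SUCCESS) {                                    \
            std::fprintf(stderr, "%s failed (error %d)\n", #call,      \
                         static_cast<int>(err_));                      \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

int main() {
    CUdevice dev;
    CUcontext ctx;
    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));

    // One module per register limit (hypothetical file names).
    CUmodule modA, modB;
    CHECK(cuModuleLoad(&modA, "kernel_a.cubin"));
    CHECK(cuModuleLoad(&modB, "kernel_b.cubin"));

    // Kernel handles, one from each module.
    CUfunction fnA, fnB;
    CHECK(cuModuleGetFunction(&fnA, modA, "A"));
    CHECK(cuModuleGetFunction(&fnB, modB, "B"));

    // The "shared" global exists once per module, so it needs a handle
    // in each module and has to be filled twice.
    CUdeviceptr dataA, dataB;
    size_t bytesA = 0, bytesB = 0;
    CHECK(cuModuleGetGlobal(&dataA, &bytesA, modA, "shared_data"));
    CHECK(cuModuleGetGlobal(&dataB, &bytesB, modB, "shared_data"));

    static float host_data[256] = {0};
    size_t n = sizeof(host_data);
    if (n > bytesA || n > bytesB) {
        std::fprintf(stderr, "shared_data is smaller than expected\n");
        return EXIT_FAILURE;
    }
    CHECK(cuMemcpyHtoD(dataA, host_data, n));
    CHECK(cuMemcpyHtoD(dataB, host_data, n));

    // ... launch fnA and fnB here ...

    CHECK(cuModuleUnload(modA));
    CHECK(cuModuleUnload(modB));
    CHECK(cuDevicePrimaryCtxRelease(dev));
    return 0;
}
```

The duplicated cuModuleGetGlobal / cuMemcpyHtoD calls are exactly the overhead I would like to avoid: every piece of shared state has to be looked up and kept in sync once per module.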
Would it be possible to set maximum register counts per function, or is there a better way to achieve this?
Please correct me if I am wrong and this is already possible; I have not found anything like it so far.