How to set the max register number for each kernel?

Hi all,
I know there is a hardware limit on the maximum number of registers per thread, and that register usage can also limit the maximum number of resident blocks on one SM. The compiler flag -maxrregcount seems to apply globally to each CUDA source file. So if I have several kernels, some of which want more threads with fewer registers while others want more registers with fewer threads, I cannot find a single limit that suits both cases.

Is it possible to set the -maxrregcount limit separately for each kernel? Or should these kernels be split into separate source files and compiled separately with different flags?

Thanks for your replies~

Use launch bounds, which are specified per kernel:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#launch-bounds
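
As a rough sketch of what that looks like (the kernel names, bodies, and the numbers 256, 4 and 64 are made up for illustration, not taken from the original question), each kernel gets its own __launch_bounds__ qualifier. The compiler uses it as a hint when deciding how many registers to allow per thread. The helper at the bottom is a hypothetical convenience wrapper around cudaFuncGetAttributes for checking what the compiler actually chose.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that wants many threads with few registers.
// Promise: never launched with more than 256 threads per block.
// Request: at least 4 blocks resident per SM, which pushes the
// compiler to keep per-thread register usage low enough for that.
__global__ void __launch_bounds__(256, 4)
many_threads_kernel(float* out, const float* in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

// Hypothetical kernel that wants few threads with many registers.
// Promise: never launched with more than 64 threads per block, so
// the compiler is free to use more registers per thread.
__global__ void __launch_bounds__(64)
few_threads_kernel(float* out, const float* in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i] + 1.0f;
}

// Hypothetical helper: prints the register count and the thread limit
// the runtime reports for one kernel. Returns the register count,
// or -1 if the query fails (for example when no device is available).
int report_kernel(const char* name, const void* kernel)
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, kernel);
    if (err != cudaSuccess) {
        std::printf("%s: query failed: %s\n", name, cudaGetErrorString(err));
        return -1;
    }
    std::printf("%s: %d registers/thread, max %d threads/block\n",
                name, attr.numRegs, attr.maxThreadsPerBlock);
    return attr.numRegs;
}
```

Calling report_kernel("many_threads_kernel", (const void*)many_threads_kernel) from host code shows the per-kernel result at run time; compiling with nvcc -Xptxas -v prints the register usage of every kernel at build time. Note that launching a kernel with more threads per block than its launch bounds allow will fail.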

Thanks very much! That’s just what I’m looking for ~