live registers

I have a problem with the “live registers” feature of Nsight Compute.
Depending on the modifications I make to the source code of a big CUDA kernel (many instructions, many registers needed), for example commenting out part of the kernel, “live registers” may or may not be displayed.
I need this feature in order to reduce and control the number of registers used.

Is this a known problem? Is there a way to make it work in all cases?

Whether the Live Registers metric is available depends on the instructions used by the kernel. If it uses, for example, indirect branch instructions, the tool cannot compute the live registers from the static information available at that point. In that case, you will likely also see an error in the Output Messages tool window and a small red flag in the bottom-right corner of the UI.

Unfortunately, there is not much you can do about this. Whether these instructions are used depends on the compiler. We hope to handle this better in a future version of the tool.

Indeed, I get messages about indirect branch instructions for some other kernels (not the one I am working on, and not involved in the computation I am profiling), but I don’t know why.
How can I find out from the PTX (or some other way) what is wrong with these kernels, which should not contain any indirect branch instructions?

Apparently the BRX instructions I found are not tied to our source code; they seem to depend directly on the module size or on the size of the whole CUDA program. When I comment out the kernel code where the BRX instructions appear, they may show up in another kernel instead, until I have removed enough code.
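
For anyone who wants to check their own build for these instructions, here is a minimal sketch of one way to do it. It assumes the SASS has been dumped to text with "cuobjdump -sass", that each kernel is introduced by a "Function : name" line, and that each instruction line starts with an address comment such as /*0010*/. The function name count_indirect_branches and the choice of BRX and JMX as the indirect branch opcodes are my own assumptions, not something taken from Nsight Compute itself.

```python
import re
from collections import Counter

# Opcodes treated as indirect branches. This set is an assumption;
# extend it if your target architecture uses other opcodes.
INDIRECT_OPCODES = {"BRX", "JMX"}

# Matches the per-kernel header line, e.g. "Function : _Z6kernelPf".
FUNCTION_RE = re.compile(r"^\s*Function\s*:\s*(\S+)")

# Matches an instruction line: an address comment such as /*0010*/,
# an optional "{", an optional predicate such as @P0 or @!P0,
# and then the opcode (captured without any ".XYZ" modifiers).
INSTRUCTION_RE = re.compile(
    r"/\*[0-9a-fA-F]+\*/\s+(?:\{\s*)?(?:@!?U?P\w+\s+)?([A-Z][A-Z0-9_]*)"
)


def count_indirect_branches(sass_text):
    """Return {kernel_name: count} for kernels with indirect branches."""
    counts = Counter()
    current = None
    for line in sass_text.splitlines():
        header = FUNCTION_RE.match(line)
        if header:
            current = header.group(1)
            continue
        instr = INSTRUCTION_RE.search(line)
        if instr and current is not None and instr.group(1) in INDIRECT_OPCODES:
            counts[current] += 1
    return dict(counts)
```

To use it, dump the SASS to a file (for example "cuobjdump -sass my_app > my_app.sass", where my_app is a placeholder for your binary), read the file into a string, and pass it to count_indirect_branches. Kernels that do not appear in the result contain none of the listed opcodes. Running this after each change to the source makes it easier to see in which kernel the BRX instructions end up.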

As long as the kernels I had to comment out were not used in my computation, this didn’t bother me, but now that I am uncommenting new parts, it appears in the kernel I am working on.

We don’t know what the limits are or where they lie. Do we have to split the module, which is made up of many kernels, into smaller ones?