I’m compiling with -maxrregcount=16 and I get this error:
ptxas /tmp/tmpxft_0000251a_00000000-2_ep.ptx, line 0; fatal : (C9999) max reg limit too low
but I don’t get it with -maxrregcount=17. How can I find out which sections of the code need that many registers? I need to stay within 16.
Any ideas?
There really isn’t much to do beyond trial and error. Comment out blocks of code one at a time and watch how the register count changes. The one thing to be careful about is variable dependencies. The dead-code elimination in Open64 is very good at identifying code blocks and variables that don’t contribute to writes to shared or global memory, and it simply optimizes them away. So if the block you comment out feeds the value that is eventually stored, the code depending on it can disappear as well, and the register count you measure will be misleadingly low.
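A minimal sketch of that bisection approach, using a hypothetical kernel. Instead of commenting code out by hand, each block is wrapped in a preprocessor guard (DISABLE_BLOCK_A and DISABLE_BLOCK_B are names made up for this example), so every variant can be built with nvcc's -Xptxas -v option, which makes ptxas print per-kernel resource usage including a "Used N registers" line. When a block is disabled, it is replaced by a cheap stand-in that still depends on the input, and the result is always stored to global memory, so dead-code elimination cannot quietly remove the rest of the kernel. The host helper runBisect is likewise hypothetical and exists only so the kernel can be run and checked.

```cuda
// Hypothetical kernel for bisecting register pressure. Build variants with:
//   nvcc -c -Xptxas -v bisect.cu
//   nvcc -c -Xptxas -v -DDISABLE_BLOCK_A bisect.cu
//   nvcc -c -Xptxas -v -DDISABLE_BLOCK_B bisect.cu
// and compare the register counts that ptxas reports for each build.
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void bisectKernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float x = in[i];
    float acc = 0.0f;

#ifndef DISABLE_BLOCK_A
    // Block A: polynomial with several temporaries live at once.
    float x2 = x * x;
    float x3 = x2 * x;
    float x4 = x2 * x2;
    acc += 1.0f + 2.0f * x + 3.0f * x2 + 4.0f * x3 + 5.0f * x4;
#else
    // Stand-in that still depends on the input, so Block B is not
    // constant-folded away when Block A is disabled.
    acc += x;
#endif

#ifndef DISABLE_BLOCK_B
    // Block B: a small loop carrying its own state.
    float s = x;
    for (int k = 0; k < 8; ++k) {
        s = s * 0.5f + acc;
    }
    acc += s;
#endif

    // The result must reach global memory; otherwise dead-code elimination
    // may remove the blocks entirely and the register count is meaningless.
    out[i] = acc;
}

// Abort with a message if a CUDA runtime call failed.
static void check(cudaError_t e)
{
    if (e != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e));
        exit(1);
    }
}

// Hypothetical host helper: copies n inputs to the device, runs the kernel,
// and copies the n results back into hostOut.
void runBisect(const float* hostIn, float* hostOut, int n)
{
    assert(n > 0);
    size_t bytes = n * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;
    check(cudaMalloc(&dIn, bytes));
    check(cudaMalloc(&dOut, bytes));
    check(cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice));

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    bisectKernel<<<blocks, threads>>>(dIn, dOut, n);
    check(cudaGetLastError());

    // cudaMemcpy waits for the kernel to finish before copying.
    check(cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(dIn));
    check(cudaFree(dOut));
}
```

If disabling one block drops the count from 17 to 16 or below, that block is the one to restructure, for example by recomputing a temporary instead of keeping it live. Keep in mind that register counts from separate builds are not strictly additive, because the allocator can reuse registers between blocks.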