1>### Assertion failure at line 2433 of ../../be/cg/NVISA/cgtarget.cxx:
1>### Compiler Error in file C:\Users\Korax\AppData\Local\Temp/tmpxft_00000c50_00000000-9_cppIntegration.cpp3.i during Register Allocation phase:
1>### ran out of registers in float
This occurs when I try to compile code with lots of heavy-duty computation. The code is syntactically correct. Are there any tricks to work around this problem?
I don’t know what your code looks like, but perhaps if you are declaring a lot of variables in your kernel (float or otherwise), you could look through the code and find ways to ‘reuse’ a variable for other purposes. For example, if you are looping multiple times, don’t use a separate loop counter for each loop – just re-initialize the variable and re-use the counter for the next loop.
I don’t know how much this will help, but it’s a shot…it all depends on what exactly your code is doing. Perhaps you could also do some partial computations, store the results in shared memory, then have some code at the end of the kernel that operates on the partial results to get your final result. This would also have the advantage of letting you run more threads if it reduced your register usage a great deal.
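To make the shared-memory idea concrete, here is a minimal sketch rather than a drop-in fix. The kernel name `partialSums`, the helper `sumOfSquares`, the block size `THREADS`, and the squaring step that stands in for the heavy computation are all made up for illustration. Each thread parks one partial result in shared memory, and a short loop at the end of the kernel combines them. Whether this lowers register usage in a real kernel depends on the code, so it is worth measuring.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define THREADS 128  // hypothetical block size; must be a power of two for the loop below

// Stage 1: every thread computes one partial result and stores it in shared memory.
// Stage 2: the block combines the partial results with a tree reduction.
__global__ void partialSums(const float *in, float *blockOut, int n)
{
    __shared__ float partial[THREADS];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Placeholder for the real computation: square the input element.
    partial[threadIdx.x] = (i < n) ? in[i] * in[i] : 0.0f;
    __syncthreads();

    // Halve the number of active threads each step until one value is left.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        blockOut[blockIdx.x] = partial[0];
}

// Host helper (assumes a non-empty input): launches the kernel, then adds
// the per-block results on the CPU. Error checking is omitted for brevity.
float sumOfSquares(const std::vector<float> &h)
{
    int n = static_cast<int>(h.size());
    int blocks = (n + THREADS - 1) / THREADS;

    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, blocks * sizeof(float));
    cudaMemcpy(dIn, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    partialSums<<<blocks, THREADS>>>(dIn, dOut, n);

    std::vector<float> out(blocks);
    cudaMemcpy(out.data(), dOut, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    float total = 0.0f;
    for (float v : out) total += v;
    return total;
}
```

Calling `sumOfSquares` on a host vector runs the whole pipeline. The kernel must be launched with exactly `THREADS` threads per block, since the shared array is sized by that constant.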
There are indeed very many variables. It seems like when I introduce a whole new bunch of them, I get this problem. I will work on trying to minimize them. Are there any other tricks to avoid this lack of registers, perhaps something along the lines of thread/block balancing?
I’ve encountered this a couple of times, but usually when adding a #pragma unroll in front of a large for loop. In general the solution was not to use #pragma unroll in those places, but that might not apply to you.
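As a sketch of the opposite direction, CUDA also accepts a count after the pragma, and `#pragma unroll 1` asks the compiler not to unroll the loop at all. The kernel `accumulate`, the helper `runAccumulate`, and the arithmetic inside the loop are made up for illustration, and older toolkits may not honor the count.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: the loop body stands in for a large computation.
__global__ void accumulate(float *data, int n, int steps)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float acc = data[i];

    // Request that this loop is NOT unrolled, so the body is emitted once
    // instead of being replicated (replication can raise register pressure).
    #pragma unroll 1
    for (int k = 0; k < steps; ++k)
        acc = acc * 0.5f + 1.0f;

    data[i] = acc;
}

// Host helper: runs the kernel on a single value and returns the result.
// Error checking is omitted for brevity.
float runAccumulate(float x, int steps)
{
    float *d = nullptr;
    cudaMalloc(&d, sizeof(float));
    cudaMemcpy(d, &x, sizeof(float), cudaMemcpyHostToDevice);
    accumulate<<<1, 1>>>(d, 1, steps);
    cudaMemcpy(&x, d, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return x;
}
```

Comparing the register count of the kernel with and without the pragma shows whether unrolling is what pushes the kernel over the limit.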
There are some low-level tricks that you might be able to use, depending on your application. Do a search for assembly optimization and see if you can find anything that works for you (for example, the XOR swap trick that swaps two values without using an intermediate register).
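For reference, the XOR swap looks roughly like this. The function name `xorSwap` is made up for illustration, and it is an open question whether the trick saves a register after the compiler has done its own allocation, so treat it as something to try and measure. It works on integer types only; floats would first have to be reinterpreted as integer bit patterns.

```cuda
#include <cuda_runtime.h>

// Swap two unsigned integers without a temporary variable.
// Usable from both host and device code.
__host__ __device__ inline void xorSwap(unsigned int &a, unsigned int &b)
{
    // XOR-swapping a variable with itself would zero it, so guard against aliasing.
    if (&a == &b) return;

    a ^= b;  // a now holds a XOR b
    b ^= a;  // b now holds the original a
    a ^= b;  // a now holds the original b
}
```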
You might also try commenting out some of your code until it compiles, then running it through the profiler to see where all the registers are getting used.
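Once a trimmed-down version compiles, the register count per thread can also be read without the profiler. Compiling with `nvcc --ptxas-options=-v` prints it for each kernel, and the runtime call `cudaFuncGetAttributes` returns it in the `numRegs` field. The sketch below uses the runtime call; the kernel `myKernel` and the helper `registersPerThread` are placeholders for your own code.

```cuda
#include <cuda_runtime.h>

// Placeholder kernel; substitute the kernel you are investigating.
__global__ void myKernel(float *data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] = data[i] * 2.0f + 1.0f;
}

// Returns the number of registers each thread of myKernel uses,
// or -1 if the attributes could not be queried.
int registersPerThread()
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, myKernel);
    if (err != cudaSuccess) return -1;
    return attr.numRegs;
}
```

Re-enabling the commented-out code piece by piece and checking this number after each step shows which part of the kernel is responsible for the jump. nvcc also has a `-maxrregcount` option that caps registers per thread, at the cost of spilling to local memory.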