OK, I put the code that I “unroll” into an empty project, and it compiled just fine: all loops were totally unrolled, 124 registers were used, and no local memory. The PTX file was ~10000 lines. But when I put the unrolled code into my program, I again get error code 128 during compilation. It’s probably because I run out of registers (I recall a limit of 128 registers per thread: “GPU specific maximum of 128 registers”, according to nvcc_2.3.pdf). Well, the task now is to move some rarely used data from registers to local memory.
That’s so much fun!!!
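For anyone wondering what "moving data to local memory" can look like in practice, here is a minimal sketch, not the code from my project. It relies on the usual behaviour that a per-thread array indexed with a value known only at run time generally cannot be kept in registers, so the compiler typically places it in local memory. The names `lookup`, `run_lookup` and `TABLE_SIZE` are made up for illustration, and an optimizing compiler is free to do something smarter with a toy this small.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical table size (a power of two so the index can be masked).
#define TABLE_SIZE 32

// Each thread fills a private table and reads one entry chosen at run time.
// Because the read index is not a compile-time constant, the array is a
// candidate for local memory rather than registers.
__global__ void lookup(const int *idx, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float table[TABLE_SIZE];
    for (int k = 0; k < TABLE_SIZE; ++k)
        table[k] = 0.5f * k;

    // Mask keeps the index inside [0, TABLE_SIZE) even for negative input.
    out[i] = table[idx[i] & (TABLE_SIZE - 1)];
}

// Host helper (hypothetical): runs one thread and returns table[index].
// Returns -1.0f if any CUDA call fails.
float run_lookup(int index)
{
    assert(index >= 0 && index < TABLE_SIZE);

    int *d_idx = 0;
    float *d_out = 0;
    float result = -1.0f;

    if (cudaMalloc((void **)&d_idx, sizeof(int)) != cudaSuccess) return -1.0f;
    if (cudaMalloc((void **)&d_out, sizeof(float)) != cudaSuccess) {
        cudaFree(d_idx);
        return -1.0f;
    }

    cudaMemcpy(d_idx, &index, sizeof(int), cudaMemcpyHostToDevice);
    lookup<<<1, 1>>>(d_idx, d_out, 1);

    // The blocking copy also waits for the kernel to finish.
    float h_out = 0.0f;
    if (cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess)
        result = h_out;

    cudaFree(d_idx);
    cudaFree(d_out);
    return result;
}
```

Compiling with `nvcc --ptxas-options=-v` should report whether the kernel ended up using any lmem, which is the only reliable way to tell where the array really went.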
Edit: Hey, wait a minute, why can’t the compiler fall back to local memory instead of registers (as it did before I added unrolling) if there aren’t enough of them? I believe I read somewhere that this is exactly how it should behave.
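One way to test that expectation is to cap the register count and look at what ptxas reports. The sketch below is only an illustration under assumptions: the kernel `accumulate`, the helper `run_accumulate`, the size `N_TERMS` and the file name `spill_demo.cu` are all hypothetical, and a kernel this small may not spill at all. The flags `--ptxas-options=-v` and `-maxrregcount` are regular nvcc options.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical array size, chosen only for illustration.
#define N_TERMS 16

// Build (file name is an assumption):
//   nvcc --ptxas-options=-v spill_demo.cu
//       prints registers and lmem used per kernel
//   nvcc --ptxas-options=-v -maxrregcount=32 spill_demo.cu
//       caps registers per thread; values that do not fit may spill to lmem
__global__ void accumulate(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float x = in[i];
    float acc[N_TERMS];

    // After full unrolling every index is a compile-time constant,
    // so the array elements can live in registers.
    #pragma unroll
    for (int k = 0; k < N_TERMS; ++k)
        acc[k] = x * (k + 1);

    float sum = 0.0f;
    #pragma unroll
    for (int k = 0; k < N_TERMS; ++k)
        sum += acc[k];

    out[i] = sum;
}

// Host helper (hypothetical): fills the input with 1.0f, runs the kernel,
// and returns out[0]. Returns -1.0f if any CUDA call fails.
float run_accumulate(int n)
{
    assert(n > 0);

    std::vector<float> h_in(n, 1.0f), h_out(n, 0.0f);
    size_t bytes = n * sizeof(float);
    float *d_in = 0, *d_out = 0;

    if (cudaMalloc((void **)&d_in, bytes) != cudaSuccess) return -1.0f;
    if (cudaMalloc((void **)&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        return -1.0f;
    }

    cudaMemcpy(d_in, &h_in[0], bytes, cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    accumulate<<<blocks, threads>>>(d_in, d_out, n);

    // The blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&h_out[0], d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return (err == cudaSuccess) ? h_out[0] : -1.0f;
}
```

With an input of all ones, each thread sums 1 through 16, so `run_accumulate(1024)` should return 136.0f. Comparing the ptxas output with and without the cap shows whether the compiler really falls back to local memory for that kernel.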