be.exe error code 128

I’m manually unrolling a loop (15 iterations), and once I unroll more than 8 iterations I start getting:

1>nvopencc ERROR: C:\CUDA\bin64/…/open64/lib//be.exe returned non-zero status 128

Does anyone know what this 128 stands for, or can someone point me to where to look?
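For readers unfamiliar with the transformation, here is a minimal sketch of what manual unrolling does. It is written as plain C++17 host code rather than a CUDA kernel so it runs anywhere, and the loop body `step` and the constant `kIters` are hypothetical stand-ins, not the poster’s code. The rolled loop keeps one copy of the body; the unrolled version expands into 15 copies, each with its index fixed at compile time. That is why every extra unrolled iteration makes the generated code larger and can increase the number of values the compiler keeps live in registers.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical per-iteration work standing in for the real loop body.
// Unsigned arithmetic wraps on overflow, so every input is well defined.
constexpr unsigned step(unsigned acc, unsigned i) { return acc * 3u + i; }

constexpr std::size_t kIters = 15;  // trip count from the post

// Rolled form: one copy of the body, executed kIters times.
unsigned rolled(unsigned x) {
    for (std::size_t i = 0; i < kIters; ++i) {
        x = step(x, static_cast<unsigned>(i));
    }
    return x;
}

// Fully unrolled form: the comma fold expands to kIters copies of the body,
// each with its own compile-time index, in order 0, 1, ..., kIters - 1.
// This is the same expansion you get by writing the iterations out by hand.
template <std::size_t... I>
unsigned unrolled_impl(unsigned x, std::index_sequence<I...>) {
    ((x = step(x, static_cast<unsigned>(I))), ...);
    return x;
}

unsigned unrolled(unsigned x) {
    return unrolled_impl(x, std::make_index_sequence<kIters>{});
}
```

Both functions compute the same result; the difference is only in how much code the compiler has to generate and schedule.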

I suspect it is due to the size of the PTX file produced from the code. The largest PTX file I could produce was 3800 lines, and I believe there is some kind of limit. Is there one?

That is actually far below the limit mentioned in the CUDA FAQ: “The maximum kernel size is 2 million PTX instructions.”
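The FAQ limit is stated in PTX instructions, while the post measures lines; a PTX file also contains directives, labels, braces and comments. As a rough cross-check, the hypothetical helper `count_ptx_instructions` below counts only lines that end in `;` after stripping `//` comments and skipping `.`-prefixed directives such as `.reg`. This is a heuristic sketch under those assumptions, not a PTX parser. To use it on a real file, read the `.ptx` output into a `std::string` (for example via `std::ifstream` and `rdbuf()`) and pass it in.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Heuristic instruction count for PTX text (an assumption, not an official tool):
// an "instruction" is any non-directive line that ends in ';' once comments are removed.
std::size_t count_ptx_instructions(const std::string& ptx) {
    std::istringstream in(ptx);
    std::string line;
    std::size_t count = 0;
    while (std::getline(in, line)) {
        line = line.substr(0, line.find("//"));            // drop trailing or full-line comments
        const auto first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;          // blank or comment-only line
        const auto last = line.find_last_not_of(" \t\r");
        if (line[first] == '.') continue;                  // directive: .reg, .param, .loc, ...
        if (line[last] == ';') ++count;                    // instruction, incl. predicated "@%p1 bra ..."
    }
    return count;                                          // labels ("$L...:") and braces are not counted
}
```

Comparing this count, rather than the raw line count, against the FAQ figure gives a fairer sense of how far a kernel is from the stated maximum.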

OK, I moved the code I’m unrolling into an empty project, and it compiled just fine: all loops were fully unrolled, using 124 registers and no local memory. The PTX file was ~10000 lines. But when I put the same unrolled code back into my program, I again get error code 128 during compilation. Probably that’s because I run out of registers (nvcc_2.3.pdf mentions a “GPU specific maximum of 128 registers” per thread), so the task now is to move some rarely used data from registers to local memory.

That’s so much fun!!!

Edit: Hey, wait a minute: why can’t the compiler keep using local memory instead of registers (as it did before I added the unrolling) when there aren’t enough registers? I believe I read somewhere that this is exactly how it should behave.

registers are not yet virtualized