nvcc bug: trying to write to threadIdx...

I think I found a rather annoying bug in the CUDA compiler, but so far I haven't been able to isolate the problem.

Here’s what happens: when I compile a rather large CUDA program, I get the following error:

ptxas point_ops_tb.ptx, line 22507; error  : Special register argument not allowed for instruction 'shr'

ptxas point_ops_tb.ptx, line 22508; error  : Arguments mismatch for instruction 'ld'

ptxas point_ops_tb.ptx, line 22509; error  : Arguments mismatch for instruction 'shl'

ptxas point_ops_tb.ptx, line 22510; error  : Arguments mismatch for instruction 'or'

[...]

Now look at what was generated at those lines. The special registers %gridid and %nctaid are used as destination operands:

       shr.u32         %gridid, $r8178, 20;

        ld.shared.u32   %nctaid.x, [__cuda_s_P484+24];

        shl.b32         %nctaid.y, %nctaid.x, 8;

        or.b32  %nctaid.z, %gridid, %nctaid.y;

Any suggestions?

Robert

Edit: I forgot to mention that the build in device emulation mode works and outputs the expected results.

It seems the compiler is somehow running out of virtual registers and starts recycling special registers such as %gridid… yes, clearly a bug :)
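If it helps to narrow this down, here is a rough sketch of a script that scans a PTX file for instructions whose destination operand is a special register, like the four lines quoted above. The function name, the register list, and the set of skipped opcodes are my own assumptions for illustration; the pattern is a simple heuristic, not a real PTX parser.

```python
import re

# Read-only PTX special registers. Assumption: illustrative list, not exhaustive.
SPECIAL_REGISTERS = ("%tid", "%ntid", "%ctaid", "%nctaid", "%gridid", "%clock")

# Captures the opcode (without type suffixes) and the first operand of a line
# such as "shr.u32  %gridid, $r8178, 20;". Directives, labels, comments and
# predicated instructions do not match and are skipped.
INSTRUCTION = re.compile(r"^\s*([a-z]+)(?:\.\w+)*\s+([^,;\s]+)")

# Assumption: opcodes whose first operand is not a destination register.
NON_WRITING_OPCODES = {"st", "bra", "call", "ret", "exit", "bar"}


def find_special_register_writes(ptx_text):
    """Return (line_number, stripped_line) pairs that write to a special register."""
    hits = []
    for number, line in enumerate(ptx_text.splitlines(), start=1):
        match = INSTRUCTION.match(line)
        if match is None:
            continue
        opcode, dest = match.groups()
        if opcode in NON_WRITING_OPCODES:
            continue
        # Drop a component suffix such as ".x" before comparing.
        base = dest.split(".")[0]
        if base in SPECIAL_REGISTERS:
            hits.append((number, line.strip()))
    return hits
```

Calling `find_special_register_writes(open("point_ops_tb.ptx").read())` on the PTX file named in the error message should list the offending lines. Plain reads such as `mov.u32 $r1, %tid.x;` are not reported, because only the first operand is checked.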

Just to let you know: this bug seems to be fixed in the 1.1 beta. :-)
I can't tell yet whether it computes correct values, though…

The stats are quite impressive: from approx. 16000 CUDA-allocated registers, ptxas creates a 30-register kernel. Nice!