I think I've found a rather annoying bug in the CUDA compiler, but so far I haven't been able to isolate the problem.
Here’s what happens: when I compile a rather large CUDA program, ptxas fails with the following errors:
ptxas point_ops_tb.ptx, line 22507; error : Special register argument not allowed for instruction 'shr'
ptxas point_ops_tb.ptx, line 22508; error : Arguments mismatch for instruction 'ld'
ptxas point_ops_tb.ptx, line 22509; error : Arguments mismatch for instruction 'shl'
ptxas point_ops_tb.ptx, line 22510; error : Arguments mismatch for instruction 'or'
[...]
Now look at what was generated at those lines. The special registers %gridid and %nctaid are being used as destination operands:
shr.u32 %gridid, $r8178, 20;
ld.shared.u32 %nctaid.x, [__cuda_s_P484+24];
shl.b32 %nctaid.y, %nctaid.x, 8;
or.b32 %nctaid.z, %gridid, %nctaid.y;
Any suggestions?
Robert
Edit: I forgot to mention that a build in device emulation mode works and outputs the expected results.