For the binary instruction 0x2100e80c, cuobjdump's disassembly disagrees with that of decuda and nv50dis.
cuobjdump reports this instruction as “IADD32 R3, g [0x4], R0;”. decuda reports “add.half.b32 $r3, s[0x0010], $r0”, and nv50dis agrees with decuda: “add b32 $r3 b32 s[0x10] $r0”.
All three interpretations agree that the instruction adds r0 and a 32-bit integer in shared memory and writes the result to r3. However, cuobjdump reports the shared-memory address as 0x4, while decuda and nv50dis report 0x10.
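To make the disagreement concrete, the sketch below extracts the memory-operand offset from each tool's output line (the strings are copied from the report above) and compares them. The regular expression is an assumption tailored to these three lines only; it is not a general parser for any of the tools' output formats.

```python
import re

# Disassembly of 0x2100e80c as reported by each tool (copied from the text above).
OUTPUTS = {
    "cuobjdump": "IADD32 R3, g [0x4], R0;",
    "decuda": "add.half.b32 $r3, s[0x0010], $r0",
    "nv50dis": "add b32 $r3 b32 s[0x10] $r0",
}

# Matches a memory operand such as "g [0x4]" or "s[0x0010]": a one-letter
# space name, optional whitespace, then a bracketed hex offset.
MEM_OPERAND = re.compile(r"\b([gs])\s*\[\s*(0x[0-9a-fA-F]+)\s*\]")


def memory_offset(text: str) -> int:
    """Return the offset of the first memory operand in a disassembly line."""
    match = MEM_OPERAND.search(text)
    if match is None:
        raise ValueError(f"no memory operand in {text!r}")
    return int(match.group(2), 16)


offsets = {tool: memory_offset(line) for tool, line in OUTPUTS.items()}
print(offsets)  # {'cuobjdump': 4, 'decuda': 16, 'nv50dis': 16}
```

The decuda and nv50dis offsets coincide (0x0010 and 0x10 are the same value), so the only disagreement is cuobjdump's 0x4.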
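Moreover, the decuda README states that the first 16 bytes of shared memory are reserved for “%gridflags, %ntid.*, %nctaid.x|y, ctaid.x|y”. Byte offset 0x10 is therefore the first location past that reserved header, whereas byte offset 0x4 would fall inside it.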
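The sketch below classifies a shared-memory byte offset against that reserved header. The README, as quoted, only names the fields. The layout used here is an assumption: eight 16-bit slots stored in the order listed, which accounts for exactly 16 bytes. The `classify` helper and the per-field offsets are hypothetical and are not taken from decuda.

```python
# ASSUMED layout of the 16-byte reserved header: each field named in the
# decuda README occupies one 16-bit slot, in the order listed. Hypothetical.
RESERVED_FIELDS = [
    "%gridflags",
    "%ntid.x", "%ntid.y", "%ntid.z",
    "%nctaid.x", "%nctaid.y",
    "%ctaid.x", "%ctaid.y",
]
FIELD_SIZE = 2  # bytes per field (assumption)
RESERVED_BYTES = len(RESERVED_FIELDS) * FIELD_SIZE  # 16, matching the README


def classify(byte_offset: int) -> str:
    """Describe what a shared-memory byte offset falls on under the assumed layout."""
    if byte_offset < 0:
        raise ValueError("offset must be non-negative")
    if byte_offset < RESERVED_BYTES:
        return "reserved: " + RESERVED_FIELDS[byte_offset // FIELD_SIZE]
    return f"beyond reserved header (+{byte_offset - RESERVED_BYTES:#x})"


for tool, offset in [("cuobjdump", 0x4), ("decuda/nv50dis", 0x10)]:
    print(f"{tool}: s[{offset:#x}] -> {classify(offset)}")
# cuobjdump: s[0x4] -> reserved: %ntid.y
# decuda/nv50dis: s[0x10] -> beyond reserved header (+0x0)
```

Under this assumed layout, reading 0x4 as a byte offset would make the add read a launch parameter (here, %ntid.y) instead of the first non-reserved word. Note also that 0x10 is exactly 4 × 0x4. Before attributing the discrepancy to a bug, it may be worth ruling out whether cuobjdump prints this operand's offset in 32-bit words rather than bytes. The sketch does not settle that question.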
So I believe this is a bug in either cuobjdump or in both nv50dis and decuda, and the former seems more likely.