cuobjdump quirk: disagreement between various disassemblers

pyronordicman · October 9, 2010, 4:43am

Given the binary instruction 0x2100e80c, cuobjdump disagrees with decuda and nv50dis.

cuobjdump reports this instruction as “IADD32 R3, g [0x4], R0;”, while decuda reports “add.half.b32 $r3, s[0x0010], $r0”, and nv50dis agrees: “add b32 $r3 b32 s[0x10] $r0”.

All of the interpretations show an add into r3 from r0 and an integer in shared memory, but cuobjdump reports address 0x4, while decuda and nv50dis report 0x10.

Moreover, the decuda README reports that the first 16 bytes of shared memory are reserved for “%gridflags, %ntid.*, %nctaid.x|y, ctaid.x|y”.

So, I believe this is a bug either in cuobjdump or in both nv50dis and decuda, and the former seems more likely.

Sylvain_Collange · October 10, 2010, 8:29pm

I am not familiar with cuobjdump’s syntax, but it sounds like a cosmetic issue…
0x10 = 16 bytes = 4 words. Assuming cuobjdump reports shared memory addresses in 32-bit words, whereas decuda and nv50dis report them in bytes, its output is correct.

Actually, that would be closer to the way the address is encoded in the instruction word.

You are correct that the first 16 bytes of smem are reserved for CTA parameters. So s[0x10] (bytes) or g[4] (words) is the first user-addressable location.
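
To make the unit conversion concrete, here is a minimal Python sketch. It assumes, as guessed above, that cuobjdump counts shared memory in 32-bit words while decuda and nv50dis count in bytes. The function names and constants are made up for illustration and do not come from any of the three tools.

```python
WORD_SIZE = 4        # bytes per 32-bit word (assumed unit of cuobjdump's g[...])
RESERVED_BYTES = 16  # reserved region at the start of smem, per the decuda README


def byte_to_word(byte_addr):
    """Convert a byte address (decuda/nv50dis style) to a word index."""
    if byte_addr % WORD_SIZE != 0:
        raise ValueError("address is not word-aligned")
    return byte_addr // WORD_SIZE


def word_to_byte(word_index):
    """Convert a word index (assumed cuobjdump style) to a byte address."""
    return word_index * WORD_SIZE


def is_user_addressable(byte_addr):
    """True if the byte address lies past the reserved CTA-parameter region."""
    return byte_addr >= RESERVED_BYTES


# s[0x10] in bytes and g[0x4] in words name the same location.
print(hex(byte_to_word(0x10)))    # 0x4
print(hex(word_to_byte(0x4)))     # 0x10
print(is_user_addressable(0x10))  # True
```

Under that assumption the three disassemblers agree on the operand and differ only in the unit they print.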

Sarnath · October 11, 2010, 4:20am

Nice catch!
