Thanks for the useful info. I’m looking at a “password cracking” tool, here is the beginning of the kernel:
MOV R1, c[0x0][0x20] // LOAD ARG
MOV R2, c[0x0][0x190] // LOAD ARG
MOV R3, c[0x0][0x194] // LOAD ARG
LDG.E R2, [R2] // LOAD from ptr R2
ISETP.NE.AND P0, PT, R2, RZ, PT // if R2 != 0:
@P0 EXIT // then EXIT
MOV R30, c[0x0][0x174] // LOAD ARG
MOV R40, c[0x0][0x170] // LOAD ARG
MOV32I R6, 0x1 // R6 = 1
MOV R7, c[0x0][0x164] // LOAD ARG
SHF.L.W R2, R40, 0x1, R30 // ROTATE LEFT 1
LOP.XOR R6, R6, c[0x0][0x160] // XOR
SHF.L.W R0, R30, 0x1, R40 // ROTATE LEFT 1
LOP3.LUT R5, R7, RZ, R2, 0x96 // XOR 3
LOP3.LUT R12, R6, RZ, R0, 0x96 // XOR 3
SHF.L.W R11, R12, 0xc, R5 // ROTATE LEFT 12
SHF.L.W R12, R5, 0xc, R12 // ROTATE LEFT 12
MOV R17, c[0x0][0x17c] // FROM ARG
LOP3.LUT R18, R6, c[0x0][0x168], R0, 0x96 // XOR 3
LOP3.LUT R3, R7, c[0x0][0x16c], R2, 0x96 // XOR 3
LOP.XOR R0, R0, R6 // XOR1
LOP.XOR R61, R2, c[0x0][0x164] // XOR1
XMAD R10, R9, c[0x0][0x8], RZ // R10 = blockDim.x * R9
XMAD.MRG R15, R9, c[0x0][0x8].H1, RZ // R15 = (R9 * 64 + 0) + 64 << 16
MOV R8, c[0x0][0x178] // LOAD ARG
…(other computation)
(3) I got lost in the last 3 lines, I’m not sure where is $R9 coming from.
(4) I didn’t dump SASS code with cuobjdump since the tool using JIT compiler (apparently the tool owner also wanted to protect it, even nvprof couldn’t run the app). I had to use cuda-gdb to get these SASS code. is it a good way to extract sass? or it can cause missing some line of code.