How does the 'LEA' instruction work?


LEA R10, P2, R193.reuse, c[0x0][0x1f8], 0x2;

What operation does the LEA instruction perform?

Moving this to the “CUDA Programming and Performance” forum so that the CUDA team can take a look.

NVIDIA doesn’t document the details of the GPU machine instructions. I cannot find my reverse engineering notes for LEA right now. LEA is basically a left-shift-plus-add intended for 64-bit address computations. From memory (I will surely get something wrong here!), it is something like:

LEA d, a, b, c, s ===> d = ((a:c) << s) + b

where “:” denotes concatenation. To update a 64-bit pointer one would then use an LEA instruction followed by an LEA.HI.X instruction to complete the 64-bit addition, because such a pointer requires two 32-bit GPU registers. In your example, it seems the pointer’s low part would be in c[0x0][0x1f8], c is not shown because it is RZ = 0 (?), and the shift factor is 2. So my best guess is

R10 = (R193 << 2) + c[0x0][0x1f8]
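
To make the arithmetic concrete, here is a small Python model of that guess. It is only a sketch of the shift-and-add idea, not verified SASS semantics: the function names lea_lo, lea_hi and scaled_address are made up for illustration, and the way the carry is passed from the low half to the high half is an assumption about how a two-instruction 64-bit update could work.

```python
MASK32 = 0xFFFFFFFF  # GPU registers are 32 bits wide


def lea_lo(a, b, s):
    """Assumed low half: d = (a << s) + b, truncated to 32 bits.

    Returns (d, carry), where carry is the bit that overflowed out of
    the 32-bit addition.
    """
    total = ((a << s) & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


def lea_hi(a, b_hi, s, carry):
    """Assumed high half: the bits of a shifted out of the low word,
    plus the high word of the pointer, plus the carry from the low half.
    """
    shifted_out = (a << s) >> 32
    return (shifted_out + b_hi + carry) & MASK32


def scaled_address(base, index, s):
    """Compute base + (index << s) for a 64-bit base using two 32-bit steps."""
    lo, carry = lea_lo(index, base & MASK32, s)
    hi = lea_hi(index, base >> 32, s, carry)
    return (hi << 32) | lo


if __name__ == "__main__":
    # Hypothetical values: element index 8, 4-byte elements (shift of 2).
    print(hex(scaled_address(0x7F00FFFFFFF0, 8, 2)))
```

With a shift of 2 the index is scaled by 4, which is what one would expect for indexing an array of 32-bit elements. The example base address is chosen so that the low-half addition overflows, which shows why a second instruction is needed to finish the 64-bit sum.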

The P2 in your example should refer to a predicate register for use in predicated execution. cuobjdump --dump-sass normally doesn’t show predicates in that operand position, so I am not sure what to make of it. GPU machine instructions are architecture specific. What GPU architecture is the code shown in #0 for?