Where does RET.REL.NODEC R4 0x0 cause the program to jump to?

This is part of a SASS file. How can I determine which instruction in the file the statement RET.REL.NODEC R4 0x0 ; jumps to?

/*00a0*/ MOV R4, 0xc0 ;
/*00b0*/ CALL.REL.NOINC 0x160 ;
/*00c0*/ IMAD.MOV.U32 R7, RZ, RZ, R3 ;
/*00d0*/ BRA 0x120 ;
/*00e0*/ MUFU.RCP R7, R0 ;
/*00f0*/ FFMA R3, R0, R7, -1 ;
/*0100*/ FADD.FTZ R4, -R3, -RZ ;
/*0470*/ IMAD.MOV.U32 R5, RZ, RZ, 0x0 ;
/*0480*/ RET.REL.NODEC R4 0x0 ;
/*0490*/ BRA 0x490 ;

I’d like to ask another question as well. While analyzing the SASS file for the 1/x function, I noticed that the input R0 undergoes the following operations at the beginning.

LDG.E R0, desc[UR4][R4.64] ;
IADD3 R3, R0, 0x1800000, RZ ;
LOP3.LUT R3, R3, 0x7f800000, RZ, 0xc0, !PT ;
ISETP.GT.U32.AND P0, PT, R3, 0x1ffffff, PT ;

These operations seem to first add 3 to the exponent field of the input (0x1800000 is 3 << 23), then clear the mantissa and sign bits by ANDing with 0x7f800000, and finally compare the result to 0x1ffffff as an unsigned integer.

My understanding is that this is checking whether the input is a subnormal number. But if that’s the case, why use the IADD3 operation to add 3 to the exponent bits first?
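The three instructions can be mimicked on a raw binary32 bit pattern to see which inputs set P0. The following is a minimal sketch in Python, not NVIDIA's implementation. The helper names (float_to_bits, predicate_p0) are made up for illustration, and which code path P0 selects is not visible in the snippet, so the sketch only models the value of the predicate.

```python
import struct

def float_to_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def predicate_p0(bits: int) -> bool:
    """Model IADD3 / LOP3.LUT (0xc0 = AND) / ISETP.GT.U32 on a 32-bit pattern."""
    r3 = (bits + 0x1800000) & 0xFFFFFFFF  # IADD3: add 3 to the exponent field, wrapping at 32 bits
    r3 &= 0x7F800000                      # LOP3.LUT: keep only the 8 exponent bits
    return r3 > 0x1FFFFFF                 # ISETP.GT.U32: true when ((exponent + 3) mod 256) >= 4

# Print the biased exponent values (0..255) for which P0 would be clear.
p0_clear = [e for e in range(256) if not predicate_p0(e << 23)]
print(p0_clear)
```

Under this model P0 is set exactly when the biased exponent lies in 1..252. The excluded values are 0 (zeros and subnormals), 255 (infinities and NaNs), and also 253 and 254, which is what adding 3 before masking contributes. One possible reading: inputs with exponent field 253 or 254 have magnitude of at least 2^126, so their reciprocal is at most 2^-126 and is subnormal in all but the exact 2^126 case. That might be why they are filtered out together with subnormal inputs, but this is a guess based on the arithmetic alone.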

Unless there are special manipulations of the return address, a RET instruction directs control flow to the instruction following the corresponding CALL. There could be any number of call sites in the code that invoke the particular subroutine at whose end this RET is located.

This means that solely based on the snippet provided, we cannot tell where this RET returns to.

You will need to look at the full code, and then trace the control flow in the forward direction. Hypothetically, you may discover that there is a single subroutine in this code, which starts at address 0x160, and that there is only one CALL in the entire code that jumps there, and that this RET is in fact reachable from control flow reaching address 0x160. In that case the RET would transfer control to address 0x00c0 with the IMAD.MOV.U32 instruction.
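The forward trace described above can be partly automated. The following is a rough sketch in Python, assuming the listing uses cuobjdump-style /*addr*/ address comments and that the instruction following a CALL in the listing is its return point. The function names and the embedded LISTING excerpt are illustrative only, and the sketch does not check whether the RET is actually reachable from the subroutine entry.

```python
import re

# Excerpt of the listing from the question; a real trace needs the full dump.
LISTING = """\
/*00a0*/ MOV R4, 0xc0 ;
/*00b0*/ CALL.REL.NOINC 0x160 ;
/*00c0*/ IMAD.MOV.U32 R7, RZ, RZ, R3 ;
/*00d0*/ BRA 0x120 ;
"""

# address comment, opcode, operands, terminating semicolon
LINE_RE = re.compile(r"/\*([0-9a-fA-F]+)\*/\s+(\S+)\s*(.*?)\s*;")

def parse(listing):
    """Return a list of (address, opcode, operands) tuples, in listing order."""
    insns = []
    for line in listing.splitlines():
        m = LINE_RE.search(line)
        if m:
            insns.append((int(m.group(1), 16), m.group(2), m.group(3)))
    return insns

def candidate_return_addresses(listing, subroutine_addr):
    """Addresses of the instructions that directly follow each CALL to subroutine_addr."""
    insns = parse(listing)
    targets = []
    for i, (addr, opcode, operands) in enumerate(insns[:-1]):
        if opcode.startswith("CALL") and operands.lower() == hex(subroutine_addr):
            targets.append(insns[i + 1][0])
    return targets

print([hex(a) for a in candidate_return_addresses(LISTING, 0x160)])
```

If the full listing yields exactly one candidate, that address is where the RET would return to under the hypothetical scenario above. Several candidates mean the answer depends on which call site was taken.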


May I ask what R4 and 0x0 represent in this instruction RET.REL.NODEC R4 0x0;?

I assume R4 is the designated link register which holds the return address. SASS is not publicly documented to that level of detail, and I have not paid attention to these details as they are not important to my work. You can make educated guesses or spend time to reverse engineer the ABI, details of machine instructions, etc.

Alright, it seems this detail doesn't affect my understanding of the code as a whole, so I don't need to dwell on it, haha.

Do you want to clue us in as to why you are trying to decode CUDA's reciprocal function? Do you suspect a bug? Do you think it is inefficient?

The return address (0xc0) is probably stored here:

/*00a0*/ MOV R4, 0xc0 ;

The 0x0 in the return statement could be a relative offset (added to R4), I am not sure.
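To make that guess concrete, here is a toy model in Python. It assumes, without any documentation to back it up, that RET.REL.NODEC Rn, off transfers control to Rn + off, and that the subroutine leaves R4 unchanged by the time the RET executes. The function ret_target and the regs dictionary are purely illustrative.

```python
def ret_target(link_register: int, offset: int = 0x0) -> int:
    """Hypothetical semantics: the RET jumps to link_register + offset."""
    return (link_register + offset) & 0xFFFFFFFF

regs = {"R4": 0x0}

regs["R4"] = 0xC0   # /*00a0*/ MOV R4, 0xc0      (caller stores the return address)
pc = 0x160          # /*00b0*/ CALL.REL.NOINC 0x160
# ... subroutine body, assumed to hold 0xc0 in R4 again when it reaches the RET ...
pc = ret_target(regs["R4"], 0x0)  # /*0480*/ RET.REL.NODEC R4 0x0

print(hex(pc))
```

Under these assumptions control resumes at 0xc0, the IMAD.MOV.U32 that follows the CALL, which is consistent with the earlier reply.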