I’d like to ask another question as well. While analyzing the SASS for the 1/x function, I noticed that the input, loaded into R0, undergoes the following operations at the beginning:
```
LDG.E R0, desc[UR4][R4.64] ;
IADD3 R3, R0, 0x1800000, RZ ;
LOP3.LUT R3, R3, 0x7f800000, RZ, 0xc0, !PT ;
ISETP.GT.U32.AND P0, PT, R3, 0x1ffffff, PT ;
```
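These operations appear to first add 3 to the exponent field of the input (0x1800000 is 3 << 23), then clear the mantissa and sign bits (LOP3.LUT with lookup table 0xc0 is a bitwise AND, here with 0x7f800000), and finally test whether the result is greater than 0x1ffffff using an unsigned comparison.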
My understanding is that this is checking whether the input is a subnormal number. But if that’s the case, why use the IADD3 operation to add 3 to the exponent bits first?
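One way to see what the sequence computes is to emulate it on every biased exponent value. The Python sketch below is a bit-level model of the three instructions (IADD3, LOP3.LUT, ISETP.GT.U32) acting on raw 32-bit patterns; the helper names are hypothetical. Because the add only touches bits 23 and up, the mantissa never matters, and a carry out of the exponent field lands in the sign bit, which the mask discards. The result therefore depends only on the biased exponent.

```python
import struct

def f32_bits(x: float) -> int:
    """Round x to binary32 and return its raw 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def fast_path_check(r0: int) -> bool:
    """Model of the three SASS instructions applied to a 32-bit pattern r0."""
    r3 = (r0 + 0x1800000) & 0xFFFFFFFF  # IADD3: add 3 << 23 (wraps at 32 bits)
    r3 &= 0x7F800000                    # LOP3.LUT ..., 0xc0: a & b, keep exponent field only
    return r3 > 0x1FFFFFF               # ISETP.GT.U32: unsigned compare, result in P0

# Biased exponents (0..255) for which P0 ends up false.
failing = [e for e in range(256) if not fast_path_check(e << 23)]
print(failing)  # [0, 253, 254, 255]
```

In this model, the +3 makes biased exponents 253 to 255 wrap around to 0 to 2, so a single unsigned comparison rejects exponent 0 (zeros and subnormals) together with the three largest exponents (|x| ≥ 2^126, infinity, and NaN).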
Unless the return address is deliberately manipulated, a RET instruction transfers control to the instruction following the corresponding CALL. The code may contain any number of call sites that invoke the particular subroutine at whose end this RET is located.
This means that solely based on the snippet provided, we cannot tell where this RET returns to.
You will need to look at the full code, and then trace the control flow in the forward direction. Hypothetically, you may discover that there is a single subroutine in this code, which starts at address 0x160, and that there is only one CALL in the entire code that jumps there, and that this RET is in fact reachable from control flow reaching address 0x160. In that case the RET would transfer control to address 0x00c0 with the IMAD.MOV.U32 instruction.
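The forward trace described above can be sketched in Python, assuming the disassembly has already been parsed into a list of instructions with absolute addresses and branch/call targets. The `Insn` record, the helper functions, and the toy listing are all hypothetical. The sketch also assumes 16-byte (128-bit) instructions, so a CALL at 0x00b0 would return to 0x00c0.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

INSN_SIZE = 0x10  # assumption: 128-bit instruction encoding (sm_70 and later)

@dataclass
class Insn:
    addr: int
    opcode: str
    target: Optional[int] = None  # absolute target address for CALL/BRA, if any

def return_sites(code: List[Insn], sub_entry: int) -> List[int]:
    """Each CALL to sub_entry returns to the instruction right after it."""
    return [i.addr + INSN_SIZE for i in code
            if i.opcode.split(".")[0] == "CALL" and i.target == sub_entry]

def reachable_from(code: List[Insn], entry: int) -> Set[int]:
    """Forward walk from entry over fall-through and BRA edges.

    Every BRA is treated as possibly predicated, so both its target and the
    next instruction are followed (a conservative over-approximation).
    RET and EXIT end a path; a nested CALL is assumed to return.
    """
    by_addr = {i.addr: i for i in code}
    seen: Set[int] = set()
    work = [entry]
    while work:
        addr = work.pop()
        if addr in seen or addr not in by_addr:
            continue
        seen.add(addr)
        insn = by_addr[addr]
        op = insn.opcode.split(".")[0]
        if op in ("RET", "EXIT"):
            continue  # control leaves this path
        if op == "BRA" and insn.target is not None:
            work.append(insn.target)
        work.append(addr + INSN_SIZE)
    return seen

# Hypothetical toy listing; real input would come from cuobjdump or nvdisasm output.
listing = [
    Insn(0x00a0, "MOV"),
    Insn(0x00b0, "CALL.REL.NOINC", target=0x0160),
    Insn(0x00c0, "IMAD.MOV.U32"),
    Insn(0x00d0, "EXIT"),
    Insn(0x0160, "FFMA"),
    Insn(0x0170, "BRA", target=0x0190),
    Insn(0x0180, "FMUL"),
    Insn(0x0190, "RET.REL.NODEC"),
]

sites = return_sites(listing, 0x0160)
ret_reachable = 0x0190 in reachable_from(listing, 0x0160)
print([hex(a) for a in sites], ret_reachable)  # ['0xc0'] True
```

With a single call site and the RET reachable from the subroutine entry, the only possible return target in this toy listing is 0x00c0. With several call sites, each one contributes its own return address.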
I assume R4 is the designated link register that holds the return address, but SASS is not publicly documented to that level of detail, and I have not paid attention to these details because they are not important to my work. You can make educated guesses, or spend time reverse engineering the ABI, the details of the machine instructions, etc.