What do the predicate register operands mean in an IMAD instruction?

I’m reading SASS code in Nsight Compute and found an instruction like this:

  IMAD.WIDE.U32.X R72, P3, R35, R13, R58, P3 

I know the PTX instruction madc. It has only four operands (3 inputs and 1 output). But the SASS instruction above has 6 operands (4 general-purpose registers and 2 predicate registers). What are these predicate registers used for in the instruction?
I have read the CUDA Binary Utilities documentation but found no explanation of this instruction. Could you explain what it does?
Best Regards.

NVIDIA does not explain SASS instructions at this level of detail. They have maintained that stance for 15 years, so it is unlikely to change. If you really must know, you will need to spend some quality time reverse engineering the details by looking at lots of SASS with various IMAD flavors.

This is an IMAD with the .X suffix, so it is used in some sort of extended-precision computation. A wild guess is that the two predicate registers specify which predicates hold the carry-in and the carry-out. Passing the carry through designated predicate registers allows multiple chains of carry-based dependencies to be alive at the same time; a generalization of the x86-64 ADOX / ADCX scheme, which allows for two such dependency chains, if you will.
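
To make the guess concrete, here is a minimal C++ sketch of what the instruction might compute if that reading is right. Everything in it is an assumption and not documented NVIDIA behavior: the function name imad_wide_u32_x, the WideResult struct, the idea that the third source is a 64-bit register pair (R58:R59), and the idea that the trailing predicate is the carry-in while the predicate after the destination is the carry-out.

```cpp
#include <cstdint>

// Hypothetical model of the guessed semantics of
//   IMAD.WIDE.U32.X Rd, Pout, Ra, Rb, Rc, Pin
// This is NOT taken from NVIDIA documentation; it only illustrates the guess.
struct WideResult {
    std::uint64_t value;  // models the 64-bit destination pair (e.g. R72:R73)
    bool carry_out;       // models the destination predicate (e.g. P3)
};

constexpr WideResult imad_wide_u32_x(std::uint32_t a, std::uint32_t b,
                                     std::uint64_t c, bool carry_in) {
    // 32 x 32 -> 64-bit unsigned product; this step cannot overflow.
    const std::uint64_t product = static_cast<std::uint64_t>(a) * b;

    // Add the 64-bit addend and detect unsigned wrap-around.
    const std::uint64_t sum = product + c;
    const bool carry1 = sum < product;

    // Add the incoming carry bit and detect wrap-around again.
    const std::uint64_t total = sum + (carry_in ? 1u : 0u);
    const bool carry2 = total < sum;

    // At most one of the two additions can wrap, so OR-ing is sufficient.
    return WideResult{total, carry1 || carry2};
}
```

Under this model, two interleaved multi-word computations could each keep their carry in a different predicate (say one chain in P3 and another in P4), which is the point of the ADOX / ADCX comparison.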

As I said, a wild guess only.

I think your guess is reasonable. Thank you.
