SET instruction


I am trying to understand the destination register value in the following assembly instruction, as disassembled by decuda and cuobjdump:

Decuda: set.le.s32 $p0|$o127, s[0x0018], $r0;
cuobjdump: ISET.S32.C0 o [0x7f], g [0x6], R0, LE;

What is the predicate register value in the above instructions? I am wondering how it can be 127, since we only have 4 predicate registers per thread.

In the following instructions, though, $p0/C0 is used for conditional execution:

Decuda: @$ return ;
cuobjdump: RET C0.NE;



This is disassembly from sm_1x, correct? My memory is a bit hazy, as I haven’t looked at sm_1x code in a while. The condition code register is C0, and o[127] is, I think, a bit bucket (not entirely sure). So the instruction stores the result of the comparison g[0x6] <= R0 in condition code register 0, i.e. C0.
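A minimal Python sketch of how the two instructions could interact: the comparison lands in C0, and the later RET C0.NE is gated on it. This is a simplified model under stated assumptions, not NVIDIA's documented semantics. It assumes the condition code only needs to record whether the ISET result mask was zero, so that .NE means "the comparison held". The names `iset_le_s32`, `ret_taken_ne`, `MASK_TRUE` and `MASK_FALSE` are hypothetical.

```python
# Hypothetical model of ISET.S32.C0 ... LE followed by RET C0.NE (sm_1x).
# Assumption: the condition code only records whether the ISET result
# mask was zero; real hardware keeps more flags than this.

MASK_TRUE = 0xFFFFFFFF   # all 1s: comparison held
MASK_FALSE = 0x00000000  # all 0s: comparison failed


def iset_le_s32(a: int, b: int) -> tuple[int, bool]:
    """Signed a <= b. Returns (mask, zero_flag); zero_flag stands in
    for the condition code written to C0."""
    mask = MASK_TRUE if a <= b else MASK_FALSE
    return mask, mask == 0


def ret_taken_ne(zero_flag: bool) -> bool:
    """RET C0.NE: take the return when C0 reports 'not equal to zero'."""
    return not zero_flag


g6, r0 = 3, 5  # stand-ins for the operands g[0x6] and R0
_, c0_zero = iset_le_s32(g6, r0)
print("return taken:", ret_taken_ne(c0_zero))  # return taken: True
```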

Thanks for replying, njuffa.

Yes, the above disassembly is for the sm_1x architecture. What does “bit bucket” mean?

Also, decoding the instruction gives 127 as the destination register number, not 0 as would correspond to C0. Any thoughts on that?



ISET.S32.C0 o [0x7f], g [0x6], R0, LE

Data that is discarded is said to go into the “bit bucket”. Depending on the outcome of the comparison, ISET writes a mask of all 0s or all 1s into a destination register and, in addition, sets a condition code in a condition code register. The condition code register is indicated by a suffix, here .C0, meaning the result is recorded in condition code register C0. The destination register here is encoded as o[127], which tells the hardware that the mask is to be discarded rather than stored into an actual register. The advantage of providing a bit-bucket option is that one does not need to tie up a register temporarily for a mask that will never be used, because only the condition code is needed. An ISET with no mask stored is the closest equivalent to a CMP instruction in the x86 instruction set.
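To make the destination encoding concrete, here is a hypothetical Python sketch of an ISET whose destination field may name the bit bucket. It assumes, following the explanation above, that a destination index of 0x7f (o[127]) means "discard the mask" while any other index names a real register; the `BIT_BUCKET` constant, the `iset_le` helper and the dict-based register file are illustrative only.

```python
# Hypothetical sketch: an ISET whose destination may be the bit bucket.
# Assumption: destination index 0x7f (o[127]) means "discard the mask";
# any other index names a real register.

BIT_BUCKET = 0x7F


def iset_le(a: int, b: int, regs: dict[int, int], dest: int) -> bool:
    """Compute a <= b, store the all-0s/all-1s mask unless dest is the
    bit bucket, and return the condition-code outcome (True = held)."""
    mask = 0xFFFFFFFF if a <= b else 0
    if dest != BIT_BUCKET:
        regs[dest] = mask  # a real register is consumed for the mask
    return mask != 0       # the condition code is recorded either way


regs: dict[int, int] = {}
cc = iset_le(3, 5, regs, BIT_BUCKET)  # like ISET.S32.C0 o[0x7f], ...
print(cc, regs)  # True {}
cc = iset_le(3, 5, regs, 2)           # same compare, mask kept in R2
print(cc, regs)  # True {2: 4294967295}
```

The point of the design is visible in the first call: the condition outcome is still produced, but no register is written.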

The bit-bucket feature also exists on various RISC processors, where the zero register often serves as a bit bucket when used as a destination. For example, in the SPARC architecture, %g0 serves as both the zero register (on read) and a bit bucket (on write), and

cmp  reg_src1, reg_or_immediate

is a synthetic instruction that is actually encoded as

subcc reg_src1, reg_or_immediate, %g0
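A small Python model of the SPARC case, as a sketch rather than a faithful simulator: register 0 (%g0) always reads as 0 and silently drops writes, and `cmp` is expressed as `subcc` with %g0 as the destination. Only the N and Z integer condition codes are modeled (real `subcc` also sets V and C), and `reg_or_immediate` is simplified to an immediate; `RegFile`, `subcc` and `cmp` here are illustrative Python helpers, not a real API.

```python
# Minimal model of SPARC's %g0 (register 0): reads return 0 and writes
# are discarded. Only the N and Z integer condition codes are modeled;
# real subcc also sets V and C.

class RegFile:
    def __init__(self) -> None:
        self._r = [0] * 32  # r0..r31; index 0 is %g0

    def read(self, i: int) -> int:
        return 0 if i == 0 else self._r[i]

    def write(self, i: int, value: int) -> None:
        if i != 0:  # writes to %g0 go to the bit bucket
            self._r[i] = value & 0xFFFFFFFF


def subcc(rf: RegFile, rs1: int, src2: int, rd: int) -> dict[str, bool]:
    """rd = rs1 - src2 (src2 is an immediate here), setting N and Z."""
    result = (rf.read(rs1) - src2) & 0xFFFFFFFF
    rf.write(rd, result)
    return {"N": bool(result >> 31), "Z": result == 0}


def cmp(rf: RegFile, rs1: int, src2: int) -> dict[str, bool]:
    """Synthetic 'cmp rs1, src2' is just 'subcc rs1, src2, %g0'."""
    return subcc(rf, rs1, src2, rd=0)


rf = RegFile()
rf.write(8, 5)        # %o0 = 5 (register 8 is %o0)
print(cmp(rf, 8, 5))  # {'N': False, 'Z': True}
print(rf.read(0))     # 0: the difference was discarded
```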

njuffa: Thanks a lot. Is o[127] a specific encoding that tells the hardware the all-0s/all-1s mask is not required?