Branchless exchange based on a condition?

I often do something like this in a kernel:

uint nodeA = …;
uint nodeB = …;

if (some condition) {
    uint tmp = nodeA;
    nodeA = nodeB;
    nodeB = tmp;
}

Is there any way or trick to do this without a branch? (Let’s say ‘some condition’ is a bool variable.)
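One common trick is to turn the bool into an all-ones or all-zeros mask and apply a masked XOR swap. This is a minimal sketch, assuming `uint` is a 32-bit `unsigned int` as it is on CUDA platforms. The `HD` macro and the `cond_swap_xor` name are hypothetical, chosen for illustration; the macro only lets the same helper compile as plain host C++ or as CUDA `__host__ __device__` code.

```cpp
#include <cstdint>

// Hypothetical helper macro: host+device under nvcc, plain C++ otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Branchless conditional swap of two 32-bit values (assumes uint == uint32_t).
HD inline void cond_swap_xor(std::uint32_t& a, std::uint32_t& b, bool cond)
{
    // cond == true  -> mask = 0xFFFFFFFF
    // cond == false -> mask = 0x00000000
    std::uint32_t mask = 0u - static_cast<std::uint32_t>(cond);
    std::uint32_t diff = (a ^ b) & mask;  // a ^ b when swapping, 0 otherwise
    a ^= diff;  // a ^ (a ^ b) == b when swapping, unchanged otherwise
    b ^= diff;  // b ^ (a ^ b) == a when swapping, unchanged otherwise
}
```

In the kernel this would be called as `cond_swap_xor(nodeA, nodeB, cond);`. It is not necessarily faster than the plain `if`: it spends a few extra integer instructions, and the compiler may already have removed the branch on its own.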

The compiler is probably already making this branchless, since the GPU supports predicated execution. I’m no PTX expert, but one once told me that the PTX output from nvcc doesn’t show this: ptxas performs the transformation, and the predicated execution only shows up if you disassemble the cubin with decuda.
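For comparison, here is a sketch of the same swap written with ternary selects. This is the kind of form compilers typically lower to select or predicated instructions rather than a jump, but whether that happens is the compiler's choice, so treat it as likely rather than guaranteed. `cond_swap_select` and `HD` are again hypothetical names, and 32-bit unsigned values are assumed.

```cpp
#include <cstdint>

// Hypothetical helper macro: host+device under nvcc, plain C++ otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Conditional swap written as two selects instead of an if-block.
HD inline void cond_swap_select(std::uint32_t& a, std::uint32_t& b, bool cond)
{
    std::uint32_t oldA = a;   // keep the original a for the second select
    a = cond ? b : a;         // pick b when swapping, otherwise keep a
    b = cond ? oldA : b;      // pick the old a when swapping, otherwise keep b
}
```

To confirm what is actually generated, compile to a cubin and inspect the SASS; on current toolkits `cuobjdump -sass` fills the role decuda used to.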