Hi there,
I am currently experimenting with a sorting algorithm and was wondering whether there is a really fast way to swap the contents of two variables in shared memory.
// array to sort; assume the pointer is initialized to a location in shared memory
float *var;
// the classic swap, using a temporary register
float tmp = var[0];
var[0] = var[1];
var[1] = tmp;
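For reference, here is a minimal sketch of the plain swap as a helper. Both values are read into registers before either is overwritten, so the swap costs two shared-memory loads and two stores, and the "temporary" is simply the second register those loads need anyway. The name `swap_pair` is hypothetical. The `__host__`/`__device__` shim is an assumption added only so the sketch also compiles as plain C++ for testing; under nvcc those keywords are already defined.

```cpp
// Let the helper compile with nvcc (host + device) and with a plain C++
// compiler (host only, e.g. for testing).
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: swap var[0] and var[1]. In device code `var` may
// point into shared memory.
__host__ __device__ inline void swap_pair(float *var)
{
    float a = var[0];   // load first element into a register
    float b = var[1];   // load second element into a register
    var[0] = b;         // store both back in swapped order
    var[1] = a;
}
```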
For integers, there are XOR-based swap methods that work without a temporary register, but these do not map well to floats… unless one treats the float's bit pattern as an integer. And these hacks also need three instructions to complete.
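The float-as-integer XOR trick could be sketched as below. The bits are reinterpreted with `memcpy` so the sketch also compiles as plain C++; in device code the CUDA intrinsics `__float_as_int` and `__int_as_float` perform the same reinterpretation. The function names and the `__host__`/`__device__` shim are illustrative assumptions. Note that this does not save instructions: the values have to be in registers anyway, and a register swap through a temporary is usually free after register allocation because the compiler can just rename registers.

```cpp
#include <cstdint>
#include <cstring>

// Let the helpers compile with nvcc (host + device) and with a plain C++
// compiler (host only, e.g. for testing).
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Reinterpret a float's bits as a 32-bit unsigned integer (no conversion).
__host__ __device__ inline uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

// Reinterpret a 32-bit pattern as a float (no conversion).
__host__ __device__ inline float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

// Hypothetical helper: XOR swap performed on the bit patterns of two floats.
// Three XORs replace the three moves of the classic swap.
__host__ __device__ inline void xor_swap_floats(float &x, float &y)
{
    uint32_t a = float_bits(x);
    uint32_t b = float_bits(y);
    a ^= b;  // a = a ^ b
    b ^= a;  // b = original a
    a ^= b;  // a = original b
    x = bits_float(a);
    y = bits_float(b);
}
```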
For use in sorting, I need to compare the contents of both variables and swap them conditionally, i.e.
// hopefully the compiler uses predication here instead of a divergent branch
if (var[0] > var[1])
{
    float tmp = var[0];
    var[0] = var[1];
    var[1] = tmp;
}
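One branch-free way to write this conditional swap (the compare-exchange step of a sorting network) is with `fminf` and `fmaxf`. As far as I know these typically compile to single `min.f32`/`max.f32` instructions in PTX, so neither a branch nor predication should be needed. A sketch under that assumption; the helper name `compare_exchange` is hypothetical, and the `__host__`/`__device__` shim only lets it also compile as plain C++ for testing. One behavioral difference: if exactly one input is NaN, both `fminf` and `fmaxf` return the other value, so the NaN is replaced by a duplicate rather than kept.

```cpp
#include <math.h>

// Let the helper compile with nvcc (host + device) and with a plain C++
// compiler (host only, e.g. for testing).
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: branch-free compare-exchange of var[0] and var[1].
// Afterwards var[0] <= var[1], matching the if-based swap above for
// non-NaN inputs.
__host__ __device__ inline void compare_exchange(float *var)
{
    float a = var[0];       // load both values into registers
    float b = var[1];
    var[0] = fminf(a, b);   // smaller value goes first
    var[1] = fmaxf(a, b);   // larger value goes second
}
```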
But now a question for the CUDA experts: are there any PTX instructions that could be used to speed up this procedure? I would like the entire swap operation to take as few clock cycles as possible. Are there any built-in swap or conditional-swap primitives on the GPU? If not… this would go on my wish list!
Christian