Fastest way to swap floats and integers? also looking for conditional swaps

Hi there,

I am currently doing some experiments with a sort algorithm and I was wondering if there is a really quick way of swapping the contents of two variables in shared memory.

// array to sort, assume pointer is initialized with a location in shared mem
float *var;

// this swap method is a classic, using a temp register
float tmp = var[0];
var[0] = var[1];
var[1] = tmp;

For integers, there are swap methods using xor that can work without a temp register, but this does not map well to floats… unless one treats the float as an integer representation. And these hacks also need 3 instructions to complete.
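To make the xor idea concrete, here is a minimal sketch of what treating the floats as integers could look like in plain C. The function name `xor_swap_float` is made up for illustration, and `memcpy` is used here only to reinterpret the bits portably on the host; in a CUDA kernel one would presumably use the bit-cast intrinsics instead. It still takes three xor operations plus the loads and stores, so it is unlikely to beat the temp-register version.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: swap two floats by xor-ing their 32-bit patterns.
   The bits are copied into integer locals, so no float arithmetic is done
   on the values and NaN or denormal patterns are preserved exactly. */
static void xor_swap_float(float *a, float *b)
{
    uint32_t x, y;
    memcpy(&x, a, sizeof x);   /* reinterpret *a as raw bits */
    memcpy(&y, b, sizeof y);   /* reinterpret *b as raw bits */

    x ^= y;                    /* x = a ^ b            */
    y ^= x;                    /* y = b ^ (a ^ b) = a  */
    x ^= y;                    /* x = (a ^ b) ^ a = b  */

    memcpy(a, &x, sizeof x);   /* *a now holds the old *b */
    memcpy(b, &y, sizeof y);   /* *b now holds the old *a */
}
```

Because the xor steps run on local copies, calling it with both pointers equal leaves the value unchanged, which the in-place integer version of the trick does not guarantee.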

For applicability in sorting I need to compare the contents of both variables and swap conditionally, i.e.

// compiler uses predication hopefully instead of branch divergence
if (var[0] > var[1])
{
    float tmp = var[0];
    var[0] = var[1];
    var[1] = tmp;
}
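One way to avoid relying on the compiler to predicate the branch is to write the conditional swap as two selects, so that both stores always happen. The sketch below is plain C with a made-up function name, `compare_exchange`; whether it actually ends up faster than the `if` version on a given GPU is an assumption that would need measuring. In CUDA the two selects could presumably also be written with `fminf` and `fmaxf`.

```c
/* Hypothetical helper: order v[0] <= v[1] without a branch around the stores.
   Both values are read once, the smaller and larger are picked with selects,
   and both slots are written unconditionally. */
static void compare_exchange(float *v)
{
    float a = v[0];
    float b = v[1];

    float lo = (a < b) ? a : b;   /* smaller of the two */
    float hi = (a < b) ? b : a;   /* larger of the two  */

    v[0] = lo;
    v[1] = hi;
}
```

The trade-off is that this version always writes both locations, even when the pair is already in order, in exchange for every thread executing the same instruction sequence.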

Now my question to the CUDA experts: are there any PTX instructions that could be used to accelerate this procedure? I would like the entire swap operation to consume as few clock cycles as possible. Are there any built-in swap or conditional swap primitives in the GPU? If not… this would go on my wish list!

Christian

The atomic CAS (compare-and-swap) instruction can do that; look at the PTX PDF. You'll need compute capability 1.1 (SM11) for swapping in global memory and 1.2 (SM12) for swapping in shared memory. SM13 can swap 64-bit floats too.
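For anyone unfamiliar with what compare-and-swap does, here is a rough host-side sketch using C11 `<stdatomic.h>`, not GPU code. The wrapper name `cas_int` is made up; it is written to mirror the shape of CUDA's `atomicCAS(address, compare, val)`, which returns the old value. Note that CAS works on a single memory location: it stores `val` only if the location currently holds `compare`. It does not exchange the contents of two locations by itself.

```c
#include <stdatomic.h>

/* Hypothetical wrapper: if *address == compare, store val.
   Returns the value that was in *address before the call, whether or not
   the store happened. */
static int cas_int(atomic_int *address, int compare, int val)
{
    int old = compare;
    /* On success 'old' keeps the expected value, which equals the previous
       contents; on failure it is overwritten with the current contents. */
    atomic_compare_exchange_strong(address, &old, val);
    return old;
}
```

So a caller can tell whether the store took place by checking that the returned value equals `compare`.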

I have compute capability 1.1 (G92 chip), and atomic swapping in global memory might be a bit slow… I wonder if Atomic CAS allows for any read/write coalescing when accessing global memory?

Christian