I need to sort an array of ~1 million floats. The array is already on the GPU,
and the comparator is ‘>’.
Currently I’m using bitonic sort to sort 512-float subarrays, then a
‘merge’ sort to merge the subarrays over and over again
(I’m caching both reads and writes in shared memory).
The average time to execute all the kernels (plus reading the last element back to host memory to be sure the kernels have finished) is 0.37 s on my 8800 GT.
Can I do better? Are there better algorithms for sorting on the GPU?
Thanks for any ideas :)
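To make the scheme described above concrete, here is a minimal host-side C++ reference sketch of it, not the actual kernels: a bitonic network sorts each 512-float chunk in descending order (the ‘>’ comparator), then repeated pairwise merge passes double the sorted width until the whole array is sorted. The names `bitonic_sort_desc` and `sort_desc` are hypothetical, and the sketch assumes the chunk size is a power of two and the array length is a multiple of it (2^20 floats would satisfy both).

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Bitonic sorting network over n elements (n must be a power of two).
// Produces descending order, matching the '>' comparator.
inline void bitonic_sort_desc(float* a, std::size_t n) {
    for (std::size_t k = 2; k <= n; k <<= 1) {          // size of bitonic runs
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {  // compare distance
            for (std::size_t i = 0; i < n; ++i) {       // one "thread" per i
                std::size_t partner = i ^ j;
                if (partner > i) {
                    bool descending = (i & k) == 0;
                    bool out_of_order = descending ? (a[i] < a[partner])
                                                   : (a[i] > a[partner]);
                    if (out_of_order) std::swap(a[i], a[partner]);
                }
            }
        }
    }
}

// Assumption: data.size() is a multiple of chunk, and chunk is a power of two.
inline void sort_desc(std::vector<float>& data, std::size_t chunk = 512) {
    const std::size_t n = data.size();
    for (std::size_t base = 0; base < n; base += chunk)
        bitonic_sort_desc(data.data() + base, chunk);

    // Merge passes: each pass merges neighbouring sorted runs of length width.
    std::vector<float> tmp(n);
    for (std::size_t width = chunk; width < n; width *= 2) {
        for (std::size_t lo = 0; lo < n; lo += 2 * width) {
            std::size_t mid = std::min(lo + width, n);
            std::size_t hi = std::min(lo + 2 * width, n);
            std::merge(data.begin() + lo, data.begin() + mid,
                       data.begin() + mid, data.begin() + hi,
                       tmp.begin() + lo, std::greater<float>());
        }
        data.swap(tmp);  // ping-pong between the two buffers
    }
}
```

On the GPU the inner `i` loop would map to threads, and each merge pass would be a separate kernel launch; a reference like this is mainly useful for checking kernel output.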
Well, the CPU version takes about 0.1 second on one core of a Core 2 Quad CPU.
That’s why I’m looking for a more efficient GPU sort: the sorted data is subsequently used by the GPU in other kernels, and cudaMemcpy’ing back and forth
is not a wise decision here.
The (modified) radix sort from the particle example is probably not a solution here
because I’m sorting floats.
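One caveat on the float concern: a radix sort can in principle be applied to IEEE-754 floats by first mapping each float to an unsigned 32-bit key whose integer order matches the float order. The sketch below shows that mapping in plain C++. The function names are hypothetical, it assumes 32-bit IEEE-754 `float`, and NaNs are not treated specially. Whether the particle example's radix sort can be adapted this way is not something I have checked.

```cpp
#include <cstdint>
#include <cstring>

// Map a float to a key whose unsigned order equals ascending float order.
// Negative floats: flip all bits. Non-negative floats: flip only the sign bit.
inline std::uint32_t float_to_key(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // well-defined bit reinterpretation
    std::uint32_t mask = (bits & 0x80000000u) ? 0xFFFFFFFFu : 0x80000000u;
    return bits ^ mask;
}

// Inverse of float_to_key.
inline float key_to_float(std::uint32_t key) {
    std::uint32_t mask = (key & 0x80000000u) ? 0x80000000u : 0xFFFFFFFFu;
    std::uint32_t bits = key ^ mask;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// For the '>' comparator: inverting the key makes an ascending integer
// sort yield descending float order.
inline std::uint32_t float_to_key_desc(float f) {
    return ~float_to_key(f);
}
```

The keys would be computed in a small pre-pass kernel, sorted as unsigned integers, and converted back afterwards. Note that -0.0f maps to a key just below that of +0.0f.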