I need to sort an array of ~1 million floats. The array is already on the GPU, and the
comparator is ‘>’.
Currently I’m using bitonic sort to sort subarrays of 512 floats, then a
‘merge’ sort pass to merge the sorted subarrays, repeated until the whole array is sorted.
(I’m caching both reads and writes in shared memory.)
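The scheme described above can be sketched on the CPU roughly as follows. This is only a C++ reference sketch, not the actual kernels: the names `kTile`, `bitonic_sort_tile`, and `sort_descending` are made up for illustration, the comparator ‘>’ is assumed to mean descending order, and the array length is assumed to be a multiple of 512. On the GPU, the loop over `i` would correspond to one thread per element within a block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical tile size, matching the 512-float subarrays in the post.
constexpr std::size_t kTile = 512;

// Bitonic sorting network over one tile; n must be a power of two.
// The result is in descending order.
void bitonic_sort_tile(float* a, std::size_t n) {
    for (std::size_t k = 2; k <= n; k <<= 1) {          // size of the bitonic sequences
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {  // compare distance
            for (std::size_t i = 0; i < n; ++i) {       // one GPU thread per i
                const std::size_t p = i ^ j;            // partner index
                if (p > i) {
                    const bool descending = (i & k) == 0;
                    const bool out_of_order = descending ? (a[i] < a[p]) : (a[i] > a[p]);
                    if (out_of_order) std::swap(a[i], a[p]);
                }
            }
        }
    }
}

// Sort each tile with the bitonic network, then merge neighbouring sorted
// runs pairwise, doubling the run width on every pass.
void sort_descending(std::vector<float>& data) {
    const std::size_t n = data.size();
    assert(n % kTile == 0);  // assumption of this sketch

    for (std::size_t off = 0; off < n; off += kTile) {
        bitonic_sort_tile(data.data() + off, kTile);
    }

    std::vector<float> tmp(n);
    for (std::size_t width = kTile; width < n; width *= 2) {
        const float* src = data.data();
        float* dst = tmp.data();
        for (std::size_t lo = 0; lo < n; lo += 2 * width) {
            const std::size_t mid = std::min(lo + width, n);
            const std::size_t hi = std::min(lo + 2 * width, n);
            std::merge(src + lo, src + mid, src + mid, src + hi,
                       dst + lo, std::greater<float>());
        }
        data.swap(tmp);  // the merged pass becomes the input of the next pass
    }
}
```

Calling `sort_descending(v)` on a `std::vector<float>` whose size is a multiple of 512 sorts it in place. With 1M elements and 512-element tiles this takes 11 merge passes over the whole array, which is where most of the memory traffic goes.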
The average time to execute all the kernels (plus reading the last element back to host memory, to be sure the kernels have finished) is 0.37 s on my 8800 GT.
Can I do better? Are there better algorithms for sorting on the GPU?
Thanks for any ideas :)