pairwise parallel comparison

In “Gem 3”, Chapter 39, they discuss radix sort on CUDA, and the second part does a bitonic merge sort. Is the pairwise parallel comparison available in some CUDA library? If not, how can we implement it efficiently in CUDA?


No, it’s all done in CUDA, no library. The details on the latest CUDA radix sort are here:
http://mgarland.org/files/papers/nvr-2008-001.pdf

Code here:
http://gpgpu.org/developer/cudpp
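
In case it helps to see the idea in isolation, here is a rough sketch of the pairwise compare-exchange step as a plain bitonic sorting network. This is not the code from the paper or from CUDPP, just a minimal illustration. It assumes the array length is a power of two and a multiple of the block size, it works in global memory only (a tuned version would use shared memory for the small strides), and the names `bitonicStep`, `j`, and `k` are made up for this example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// One compare-exchange pass of a bitonic sorting network (illustrative name).
// j: distance between the two elements of a pair.
// k: size of the bitonic sequences being merged in this stage.
__global__ void bitonicStep(unsigned int *data, unsigned int j, unsigned int k)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int partner = i ^ j;          // the element this thread is paired with

    // Only the lower-indexed thread of each pair does the work.
    if (partner > i) {
        bool ascending = ((i & k) == 0);   // sort direction of this subsequence
        unsigned int a = data[i];
        unsigned int b = data[partner];
        if ((a > b) == ascending) {        // pair is out of order for this direction
            data[i] = b;
            data[partner] = a;
        }
    }
}

int main()
{
    const unsigned int n = 1u << 16;       // assumed: power of two
    const unsigned int threads = 256;      // assumed: divides n exactly
    const size_t bytes = n * sizeof(unsigned int);

    unsigned int *h = (unsigned int *)malloc(bytes);
    for (unsigned int i = 0; i < n; ++i) h[i] = (unsigned int)rand();

    unsigned int *d = NULL;
    cudaMalloc((void **)&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    // Each kernel launch is one parallel round of pairwise comparisons;
    // the launch boundary acts as the global synchronization between rounds.
    for (unsigned int k = 2; k <= n; k <<= 1)
        for (unsigned int j = k >> 1; j > 0; j >>= 1)
            bitonicStep<<<n / threads, threads>>>(d, j, k);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);

    // Check the result on the host.
    int sorted = 1;
    for (unsigned int i = 1; i < n; ++i)
        if (h[i - 1] > h[i]) { sorted = 0; break; }
    printf("%s\n", sorted ? "sorted" : "NOT sorted");

    cudaFree(d);
    free(h);
    return sorted ? 0 : 1;
}
```

Each thread handles one pair per pass, so every comparison in a round runs in parallel. The cost of this simple version is one kernel launch per round; once `j` is smaller than the block size, the remaining rounds of a stage could instead run inside a single kernel with `__syncthreads()` between them.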

Thx