# Partial sort an array in CUDA or Thrust?

I am working on a problem that looks like this:

``````typedef pair<float,int> pt;
int N = 5000; // matrix size NxN
// element at (row,col) = (i,j) is at position i * N + j
vector<pt> data;
// data.size() = N * (N- 1) / 2, representing all pairs of objects
sort(data.begin(), data.end());

// pair objects greedily
for (int i = 0; i < data.size(); i++)
{
int x = data[i].second / N, y = data[i].second % N;
if (selected[x] || selected[y]) continue;

selected[x] = true;
selected[y] = true;
// ...
}
``````

I found that in my data, the first 5% of the array contribute to ~98% of the pairs. So the best method is to partially sort and find the first 5%, then find the remaining unpaired objects and work on them separately.

So, is there any library that support partial sort_by_key in CUDA? `thrust::sort_by_key` always sort the whole array, which is not needed in my case.

An alternative is to find K-th smallest element (like `std::nth_element`), is there a similar function in any GPU library?

Thanks!

Edit: actually I don’t even need to find K largest elements, I only need to find the K-th largest element. Because I will process the rest on CPU anyway

You could sort chunks of size `k` and merge them in parallel, always keeping only the `k` smallest elements of the resulting range.

I’ve tried your suggestion and it’s actually slower.

``````int K = N / 10;
for (int t = 0; t < 10; t++) {
int starter  = t * K;
int ender = min(starter + K, N);
thrust::sort(thrust::device, d_ptr + starter, d_ptr + ender);
}
``````

With `N = 25 * 10^6`, `d_ptr*` as (equivalent of) `std::pair<float,int>`, sorting the whole array takes 15ms. While sorting by chunks alone takes ~20ms, so I didn’t add merging them.