I am working on a problem that looks like this:

```
typedef pair<float,int> pt;
int N = 5000; // matrix size NxN
// element at (row,col) = (i,j) is at position i * N + j
vector<pt> data;
// data.size() = N * (N- 1) / 2, representing all pairs of objects
sort(data.begin(), data.end());
// pair objects greedily
for (int i = 0; i < data.size(); i++)
{
int x = data[i].second / N, y = data[i].second % N;
if (selected[x] || selected[y]) continue;
selected[x] = true;
selected[y] = true;
// ...
}
```

I found that in my data, the first 5% of the array contribute to ~98% of the pairs. So the best method is to partially sort and find the first 5%, then find the remaining unpaired objects and work on them separately.

So, is there any library that support partial sort_by_key in CUDA? `thrust::sort_by_key`

always sort the whole array, which is not needed in my case.

An alternative is to find K-th smallest element (like `std::nth_element`

), is there a similar function in any GPU library?

Thanks!

Edit: actually I don’t even need to find K largest elements, I only need to find *the* K-th largest element. Because I will process the rest on CPU anyway