Is it possible to somehow fuse thrust::partition and thrust::sort?

Okay, I have 3 arrays of integers, pa, ta, la, each of size N. The values in each array will either be a value greater than or equal to 0 or it will be -1. We do have the guarantee that if 1 entry in an array is -1 then the entire array tuple will be full of -1’s. We’re operating on a zip_iterator in this instance.

What we want to do is, partition these 3 arrays so that all natural numbers are to the left and -1 is to the right. We want a partition as well because we want the location of where they split in the array.

But at the same time, it would be nice to sort the left partitioned side. Ideally I’d love to somehow fit all of this into one kernel but I’m fine using thrust::partition and then thrust::sort in a sequential order.

This a rough, untested attempt at doing what I want:

#include <thrust/execution_policy.h>
#include <thrust/sort.h>
#include <thrust/partition.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
#include <thrust/distance.h>

auto get_assoc_size(
  int* pa,
  int* ta,
  int* la,
  int const assoc_capacity) -> int
{
  using int_tuple = thrust::tuple<int, int, int>;
  
  int assoc_size = 0;

  auto begin = thrust::make_zip_iterator(thrust::make_tuple(pa, ta, la));

  decltype(begin) middle = thrust::partition(
    thrust::device,
    begin, begin + assoc_capacity,
    [] __device__ (int_tuple const& a) -> bool
    {
      return thrust::get<0>(a) != -1;
    });
    
  assoc_size = thrust::distance(begin, middle);
  
  thrust::sort(
    thrust::device,
    begin, begin + assoc_size,
    [] __device__ (
      int_tuple const& a,
      int_tuple const& b) -> bool
    {
      int const a_pa = thrust::get<0>(a);
      int const b_pa = thrust::get<0>(b);
      
      if (a_pa < b_pa) {
        return true;
      }
      
      if (a_pa > b_pa) {
        return false;
      }
      
      if (a_pa == b_pa) {
        int const a_ta = thrust::get<1>(a);
        int const b_ta = thrust::get<1>(b);
        
        return a_ta < b_ta;
      }
      
      return true;
    });
  
  return assoc_size;
}

I can certainly put the sorting and partitioning into a single operation, but I don’t know how to do that plus give you the partition point in a single operation. It might be possible, I just don’t have any ideas about how to do it at the moment.

However, it might be that putting the sorting+partitioning into a single operation followed by thrust::find might be cheaper than partition followed by sort. If you had a small number of -1 entries, it might be quicker. If you had a large number of -1 entries, it might not. It’s difficult for me to guess at what would be better without benchmarking.

sorting usually require multiple passes over data (N*log(N)), so partitioning+sorting should be faster unless you have too small amount of -1 values.

it’s easy to combine sorting+partitioning by treating values as unsigned. then use thrust::partition_point i.e. binary search to quickly find partition bounds

Huh, so how would you guys go about merging the partition and the sort? I get the idea of using unsigned conversions though because I got that tip from this forum before so I know how to sort really well but how can I partition + sort at the same time?

sort can take an ordering functor (or lambda if you prefer)

pass it a functor that orders things the way you want

$ cat t1236.cu
#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>

typedef int mytype;

struct my_sort_functor
{
  __host__ __device__
  bool operator()(mytype t1, mytype t2){
    if (t1 == -1) return false;
    if (t2 == -1) return true;
    return (t1<t2);
    }
};

int main(){

  mytype data[] = {3, -1, 1, -1, 2, -1};
  int ds = sizeof(data)/sizeof(mytype);
  thrust::device_vector<mytype> d(data, data+ds);
  thrust::sort(d.begin(), d.end(), my_sort_functor());
  thrust::copy(d.begin(), d.end(), std::ostream_iterator<mytype>(std::cout, ","));
  std::cout << std::endl;
  return 0;
}
$ nvcc -o t1236 t1236.cu
$ ./t1236
1,2,3,-1,-1,-1,
$

since you need those -1 = FFFF at the end of array, just sort it as unsigned numbers

instead of partition (prior to sort), you can compact the array, filtering out -1 tuples, then fill remaining part with -1 tuples. it looks like more efficient approach than partitioning

txbob, your Thrust-fu is always a sight to behold. I really like your thrust::copy into std::ostream_iterator. That’s brilliant lol. I would’ve never thought to combine Thrust and the STL in such a way. I always learn so much from you.

Bulat, I’m going to research your method a little bit. I wouldn’t have thought of stream compaction and filling. It would probably be significantly faster to remove every -1 tuple entry then sort that remaining part and then fill it. Partitioning is “too smart” for this algorithm so I think the “dumb” approach (i.e. just blindly filling with a pre-determined value) is perfect for this problem.

Edit: I also do recommend experimenting with device-side lambdas. They’re actually pretty pleasant to work with. Granted, I know that was a toy example but writing your functor inline with your function call is just too good.

Edit edit:
This appears to be a working implementation

#include <thrust/execution_policy.h>
#include <thrust/sort.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
#include <thrust/distance.h>
#include <thrust/remove.h>
#include <thrust/fill.h>

auto get_assoc_size(
  int* pa,
  int* ta,
  int* la,
  int const assoc_capacity) -> int
{
  using int_tuple = thrust::tuple<int, int, int>;
  
  int assoc_size = 0;

  auto begin = thrust::make_zip_iterator(thrust::make_tuple(pa, ta, la));

  decltype(begin) new_last = thrust::remove_if(
    thrust::device,
    begin, begin + assoc_capacity,
    [] __device__ (int_tuple const& a) -> bool
    {
      return thrust::get<0>(a) == -1;
    });
    
  assoc_size = thrust::distance(begin, new_last);
  
  thrust::sort(
    thrust::device,
    begin, begin + assoc_size,
    [] __device__ (
      int_tuple const& a,
      int_tuple const& b) -> bool
    {
      int const a_pa = thrust::get<0>(a);
      int const b_pa = thrust::get<0>(b);
      
      if (a_pa < b_pa) {
        return true;
      }
      
      if (a_pa > b_pa) {
        return false;
      }
      
      if (a_pa == b_pa) {
        int const a_ta = thrust::get<1>(a);
        int const b_ta = thrust::get<1>(b);
        
        return a_ta < b_ta;
      }
      
      return true;
    });
    
  thrust::fill(
    thrust::device,
    new_last, new_last + (assoc_capacity - assoc_size),
    thrust::make_tuple(-1, -1, -1));
  
  return assoc_size;
}