I need to sort a fairly large array of int2 (n > (1<<22)) and am getting some weird errors. There seems to be no default comparison function for that type, so I have tried both a comparison object and an operator definition:
struct compare_int2{
__host__ __device__ bool operator()(int2 a,int2 b){return (a.x!=b.x) ? (a.x<b.x):(a.y<b.y);}
};
//
cudaMalloc((void **)&D_dist,bound*sizeof(int2));
thrust::device_ptr<int2> D_beg=thrust::device_pointer_cast(D_dist);
compare_int2 cmp;
/*call some kernel which fills int2 array*/
//everything seems fine until here
thrust::sort(D_beg,D_beg+bound,cmp);//sort in device the now populated int2 array
//copy back that sorted array to host
I am getting a thrust::system::system_error when it gets to the sort() statement.
I do not want to use thrust::device_vector, and I need to use a pair of ints, either as std::pair or int2.
I am probably missing something obvious, because I have no issues sorting int, float, double, etc.
Can someone at least tell me whether there is an obvious error or something deeper? The error message is not specific.
When I comment out the sort call and run, the kernel produces no errors, so either thrust::sort() has a problem with the int2 type or I am defining or passing the comparator functor incorrectly.
I can sort the int2 type on the CPU using std::sort(), and oddly it is faster with int2 than when I do the same sort using std::pair<int,int>.
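A CPU sort like the one described above can also serve as a reference for checking the device result. The following is only a minimal sketch: Int2, CompareInt2, sorted_reference and same_pairs are hypothetical names, Int2 is a stand-in for CUDA's int2 so the code builds without CUDA headers, and the HOST_DEVICE macro is an assumption that lets the same functor compile under nvcc or a plain C++ compiler.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Under nvcc the functor is callable on host and device; with a plain
// C++ compiler the qualifiers expand to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical stand-in for CUDA's int2 (same two int members).
struct Int2 { int x; int y; };

// Same ordering as compare_int2 in the question: by x first, then by y.
// operator() is const so the functor can be used through const references.
struct CompareInt2 {
    HOST_DEVICE bool operator()(const Int2& a, const Int2& b) const {
        return (a.x != b.x) ? (a.x < b.x) : (a.y < b.y);
    }
};

// Sort a host copy of the data with std::sort and return it.
inline std::vector<Int2> sorted_reference(std::vector<Int2> v) {
    std::sort(v.begin(), v.end(), CompareInt2());
    return v;
}

// True if both arrays hold the same pairs in the same order.
inline bool same_pairs(const std::vector<Int2>& a, const std::vector<Int2>& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i].x != b[i].x || a[i].y != b[i].y) return false;
    return true;
}
```

The idea would be to copy the unsorted array to the host, pass it through sorted_reference, and compare that with the array copied back after the device sort using same_pairs.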
OK, after some testing I found that it has to do with the size of the int2 array. When the size is less than or equal to 8192 it works, and when it is greater than that it crashes and gives me a system error related to memory.
Also, this seems to apply ONLY to the int2 type, because I also tried sorting a large array of plain ints and that had no bugs.
I do need to sort a large array of pairs on the device as fast as possible. Is there another way?
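One possible alternative, sketched here under assumptions rather than taken from the thread, is to pack each pair into a single 64-bit key so that it sorts like a primitive type, which the post reports works without problems for large arrays. The names pack_key, unpack_x, unpack_y and sort_keys are hypothetical, the code assumes a 32-bit two's-complement int, and std::sort stands in for the device sort so the sketch runs on the host.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Pack (x, y) into one 64-bit key whose unsigned order matches the
// lexicographic order of the signed pair. Flipping the sign bit maps
// INT_MIN..INT_MAX onto 0..UINT_MAX while preserving order.
// Assumption: int is 32 bits wide.
HOST_DEVICE inline std::uint64_t pack_key(int x, int y) {
    const std::uint32_t hi = static_cast<std::uint32_t>(x) ^ 0x80000000u;
    const std::uint32_t lo = static_cast<std::uint32_t>(y) ^ 0x80000000u;
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Recover x from the upper 32 bits by undoing the sign-bit flip.
HOST_DEVICE inline int unpack_x(std::uint64_t key) {
    return static_cast<int>(static_cast<std::uint32_t>(key >> 32) ^ 0x80000000u);
}

// Recover y from the lower 32 bits by undoing the sign-bit flip.
HOST_DEVICE inline int unpack_y(std::uint64_t key) {
    return static_cast<int>(static_cast<std::uint32_t>(key) ^ 0x80000000u);
}

// Host stand-in for the device sort: plain ascending order on the keys,
// with no custom comparator needed.
inline void sort_keys(std::vector<std::uint64_t>& keys) {
    std::sort(keys.begin(), keys.end());
}
```

On the device, the kernel that fills the array would write pack_key(x, y) into an array of 64-bit integers, and the sort call would take that array without a comparator. Whether this is faster than sorting int2 with a functor would have to be measured.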
Over on the Google Thrust users group, ‘txbob’ helped me figure out this error.
The problem was that I was compiling my Thrust code in debug mode with -G, which is not allowed. I had gotten away with this before, even with large data sets and calls to max_element and sort, but when I started using structs it all went to hell…
What is even more embarrassing is that all my timing tests were run with -G on, which was slowing down the code quite a bit. So all my CUDA projects actually run about 4-6 times faster than I had originally thought.
Now I need to update all my timing tables… Hope some others out there learn from my mistake.