Is there a CUDA library that can sort each row of a matrix simultaneously?
If I use Thrust's sort functions, I have to call thrust::sort once per row, one call after another, so a matrix with 10,000 rows needs 10,000 sequential calls. This is not efficient.
The following ArrayFire snippet shows the kind of operation I am looking for:
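#include <arrayfire.h>
using namespace af;
int main() {
    array a = randu(4, 3); // random matrix of 4 rows and 3 columns
    array b = sort(a, 0);  // sort along dimension 0: each column is sorted
    array c = sort(a, 1);  // sort along dimension 1: each row is sorted
    af_print(a);
    af_print(b);
    af_print(c);
    return 0;
}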