Sort a 2D array concurrently?

I have a 2D array (about 500 x 10000) and want to sort it by rows. My plan is to assign each row to a block and use radix sort as the sorting algorithm. But I think a "global radix sort" may be better. I read the global radix sort in CUDPP but could not understand it. So I have two questions: Can anyone help me understand the global radix sort? Is it possible to slightly modify the implementation in CUDPP to meet my requirement? Thanks.

Sorry for the silly question, but do you mean sorting each row by itself, or sorting the rows according to a certain column?

I want to sort each row by itself, like this:

a[2][3] = [3 1 5][6 9 8]

after sort:

a[2][3] = [1 3 5][6 8 9]
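
To pin down the result I am after, here is a minimal CPU sketch of "sort each row by itself". It is only a reference to check a GPU version against, not a CUDA or CUDPP implementation. The name sort_rows, the int element type, and the row-major layout are my assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of CUDPP): sorts every row of a
// row-major rows x cols matrix independently, in ascending order.
// Assumes int elements; the real data type may differ.
void sort_rows(std::vector<int>& data, std::size_t rows, std::size_t cols) {
    // The flat buffer must hold exactly rows * cols elements.
    assert(data.size() == rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        // Row r occupies the half-open range [r * cols, (r + 1) * cols).
        auto first = data.begin() + static_cast<std::ptrdiff_t>(r * cols);
        std::sort(first, first + static_cast<std::ptrdiff_t>(cols));
    }
}
```

Each row is sorted without looking at any other row, so the rows are independent units of work. That independence is why one block per row looked like a natural mapping to me.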