I have a matrix of size MxN (width x height).
I have to sort the values of each column independently.
Currently, I launch one thread per column, and each thread sorts its own column.
But this is not very efficient. Do you have any suggestions for optimizing it?
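
For reference, here is a minimal sketch of the one-thread-per-column approach described above. Several details are assumptions, not something stated in the question: CPU threads via `std::thread`, `float` elements, row-major storage (element at row `r`, column `c` is `data[r * width + c]`), and the hypothetical function name `sort_columns_thread_per_column`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Sorts every column of a matrix in ascending order, one thread per column.
// Assumption: row-major layout, so column elements are `width` apart in memory.
void sort_columns_thread_per_column(std::vector<float>& data,
                                    std::size_t width, std::size_t height) {
    assert(data.size() == width * height);

    std::vector<std::thread> workers;
    workers.reserve(width);

    for (std::size_t c = 0; c < width; ++c) {
        workers.emplace_back([&data, width, height, c] {
            // Gather the strided column into contiguous storage.
            std::vector<float> column(height);
            for (std::size_t r = 0; r < height; ++r) {
                column[r] = data[r * width + c];
            }

            std::sort(column.begin(), column.end());

            // Scatter the sorted values back. Each thread only touches
            // its own column, so no two threads write the same element.
            for (std::size_t r = 0; r < height; ++r) {
                data[r * width + c] = column[r];
            }
        });
    }

    // Wait for all columns to finish.
    for (auto& worker : workers) {
        worker.join();
    }
}
```

In this sketch each thread reads and writes memory with a stride of `width`, and the number of threads grows with the number of columns, both of which may contribute to the poor efficiency.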