Hi,

I'm seeing some weird behavior in my CUDA code.

I have a matrix with N rows and M columns.

I’d like to sort each column independently.

So I launch a horizontal (1D) grid with one thread per column.

Each thread uses a comb sort algorithm.
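
To make the setup concrete, here is a minimal sketch of what I described above, not my exact code. It assumes the matrix is stored row-major as floats and that each thread runs a standard comb sort (shrink factor about 1.3). The kernel name sortColumns, the sizes N and M, and the block size of 128 are placeholders for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread sorts one column of a row-major
// nRows x nCols matrix in place, using comb sort.
__global__ void sortColumns(float *data, int nRows, int nCols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col >= nCols) return;           // guard the partially filled last block

    int gap = nRows;
    bool swapped = true;
    while (gap > 1 || swapped) {
        gap = (gap * 10) / 13;          // shrink the gap by a factor of ~1.3
        if (gap < 1) gap = 1;
        swapped = false;
        for (int i = 0; i + gap < nRows; ++i) {
            float a = data[i * nCols + col];
            float b = data[(i + gap) * nCols + col];
            if (a > b) {                // out of order: swap the two elements
                data[i * nCols + col] = b;
                data[(i + gap) * nCols + col] = a;
                swapped = true;
            }
        }
    }
}

int main()
{
    const int N = 1000;                 // rows (assumed value)
    const int M = 250;                  // columns (assumed value, the one being varied)
    const size_t bytes = (size_t)N * M * sizeof(float);

    float *h = (float *)malloc(bytes);
    for (int i = 0; i < N * M; ++i) h[i] = (float)rand() / RAND_MAX;

    float *d = NULL;
    if (cudaMalloc(&d, bytes) != cudaSuccess) { printf("cudaMalloc failed\n"); return 1; }
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    // Horizontal (1D) grid: enough blocks to cover all M columns.
    const int threads = 128;
    const int blocks = (M + threads - 1) / threads;
    sortColumns<<<blocks, threads>>>(d, N, M);

    // This copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) { printf("CUDA error: %s\n", cudaGetErrorString(err)); return 1; }

    // Check that every column is in ascending order.
    int ok = 1;
    for (int c = 0; c < M && ok; ++c)
        for (int r = 1; r < N; ++r)
            if (h[(r - 1) * M + c] > h[r * M + c]) { ok = 0; break; }
    printf("columns sorted: %s\n", ok ? "yes" : "no");

    cudaFree(d);
    free(h);
    return ok ? 0 : 1;
}
```

With this layout, thread `col` reads element `i * nCols + col`, so neighbouring threads touch neighbouring addresses within a row, and the start of each row moves by M floats.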

My problem is that when I increase the dimension M, the time spent sorting the matrix changes, but in a strange way. Take a look at the attached figure. When M is a multiple of 8 or 16, the computation is 2 times faster!

Thanks for your help.

Vince