Thrust sort problem

I want to sort each row of a large matrix on the GPU using the Thrust library.

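```cpp
#include <thrust/device_ptr.h>
#include <thrust/sort.h>

// n rows and m columns; the matrix values are already assigned correctly.
cudaMalloc((void**)&d_distMatrix, sizeof(float) * n * m);

for (i = 0; i < n; i++) {
    // Wrap the raw pointer to row i so Thrust runs the sort on the device.
    // The (size_t) cast keeps i * m from overflowing int for very large matrices.
    thrust::device_ptr<float> dev_ptr(d_distMatrix + (size_t)i * m);
    thrust::sort(dev_ptr, dev_ptr + m);
}
```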
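The loop above launches one `thrust::sort` per row. A commonly used alternative sorts every row at once with two stable sorts: first stable-sort all elements by value while carrying each element's row index along, then stable-sort by row index. Because the second sort is stable, the values inside each row stay in ascending order. The sketch below is a host-only C++ illustration of that ordering logic (with `std::stable_sort` standing in for the GPU sort); `sortRowsTwoPass` is a hypothetical name, not part of Thrust.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: sorts each of the n rows (m elements each) of a
// row-major matrix with two global stable sorts instead of n per-row sorts.
void sortRowsTwoPass(std::vector<float>& data, std::size_t n, std::size_t m) {
    assert(data.size() == n * m);
    typedef std::pair<float, std::size_t> Elem;  // (value, row index)

    std::vector<Elem> elems(n * m);
    for (std::size_t k = 0; k < n * m; ++k)
        elems[k] = Elem(data[k], k / m);

    // Pass 1: order every element by value, carrying its row index along.
    std::stable_sort(elems.begin(), elems.end(),
                     [](const Elem& a, const Elem& b) { return a.first < b.first; });

    // Pass 2: order by row index; stability keeps each row's values ascending.
    std::stable_sort(elems.begin(), elems.end(),
                     [](const Elem& a, const Elem& b) { return a.second < b.second; });

    for (std::size_t k = 0; k < n * m; ++k)
        data[k] = elems[k].first;
}
```

On the device, the same two passes would be `thrust::stable_sort_by_key(vals, vals + n*m, rows)` followed by `thrust::stable_sort_by_key(rows, rows + n*m, vals)`, where `rows` is an assumed extra device array holding each element's row index. Whether this beats the per-row loop for a given n and m is something to measure.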

When m is small, the results are correct. But when m is larger (more than about 10000), the sorted rows contain garbage values!
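To pin down which rows come back wrong, one option is to copy the matrix to the host before the loop and again after it (e.g. with `cudaMemcpy`), then compare each row against `std::sort` on the CPU. A minimal sketch, assuming both copies are row-major `std::vector<float>` buffers of n rows by m columns and contain no NaNs; `firstBadRow` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical debugging helper. `before` is the matrix copied to the host
// before sorting, `gpuResult` is the matrix copied back after the Thrust loop
// (both row-major, n rows of m floats, assumed NaN-free). Returns the index of
// the first row whose GPU result differs from a CPU std::sort of that row, or
// -1 if every row matches.
long firstBadRow(const std::vector<float>& before, const std::vector<float>& gpuResult,
                 std::size_t n, std::size_t m) {
    assert(before.size() == n * m && gpuResult.size() == n * m);
    std::vector<float> expected(m);
    for (std::size_t i = 0; i < n; ++i) {
        const std::ptrdiff_t off = static_cast<std::ptrdiff_t>(i * m);
        // CPU reference: sort a copy of row i.
        std::copy(before.begin() + off, before.begin() + off + static_cast<std::ptrdiff_t>(m),
                  expected.begin());
        std::sort(expected.begin(), expected.end());
        if (!std::equal(expected.begin(), expected.end(), gpuResult.begin() + off))
            return static_cast<long>(i);
    }
    return -1;
}
```

Printing the returned row index alongside m can show whether failures begin at the first row or only after some offset into the matrix.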

My device has enough memory (6 GB), and no error message is reported.

Does anyone know what is wrong with my code?

Thanks a lot~

Did you manage to solve it? I have the same problem.