Weird performance issue with CUDA's Thrust

Hi,
I’m running a simple CUDA test with Thrust and see big jumps in the timings: they range from 0.2 ms to 1.5 ms. This also happens when I use Thrust’s cache allocation mechanism.
The profiler seems to indicate that the time is being wasted on the CPU, between the copy_if and the sort.
Any ideas would be greatly appreciated.

Elapsed time: 0.291712ms
Elapsed time: 0.289984ms
Elapsed time: 0.325888ms
Elapsed time: 0.308384ms
Elapsed time: 0.317024ms
Elapsed time: 0.280864ms
Elapsed time: 0.287104ms
Elapsed time: 0.287264ms
Elapsed time: 1.27184ms
Elapsed time: 0.310432ms
Elapsed time: 0.295648ms
Elapsed time: 0.288224ms
Elapsed time: 0.284416ms
Elapsed time: 0.810144ms
Elapsed time: 0.291456ms
Elapsed time: 0.262528ms
Elapsed time: 0.308128ms

test.cu (2.3 KB)
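
For readers who cannot open the attachment, a test of this kind might look roughly like the sketch below. This is not the attached test.cu: the problem size, the `is_even` predicate, the iteration count, and the use of CUDA events for timing are all assumptions made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>

// Hypothetical predicate (not from the original test): keep even values.
struct is_even {
    __host__ __device__ bool operator()(int x) const { return (x % 2) == 0; }
};

int main() {
    const int n = 1 << 20;  // assumed problem size
    thrust::device_vector<int> in(n), out(n);
    // Fill with n, n-1, ..., 1 so that the sort has real work to do.
    thrust::sequence(in.begin(), in.end(), n, -1);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    for (int iter = 0; iter < 20; ++iter) {
        cudaEventRecord(start);

        // copy_if returns the end of the compacted range to the host,
        // so the host has to wait for the result before launching the sort.
        auto out_end = thrust::copy_if(thrust::device, in.begin(), in.end(),
                                       out.begin(), is_even());
        thrust::sort(thrust::device, out.begin(), out_end);

        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        std::printf("Elapsed time: %gms\n", ms);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```

Because the events bracket both calls, any host-side delay between copy_if returning and the sort being launched is included in the measured time, which is one way an occasional CPU stall could show up as a spike.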

Hi Eyalhir74,

Between which two conditions do you see the big jump in timing?

Hi,
Please see some nvprof output. The timing jumps from time to time while I run copy_if + sort in a loop.


It looks like the delay is on the CPU side. Please use Nsight Systems instead of nvprof; it can show you a lot more of what happens on the CPU side.
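
To make the gap between copy_if and sort easier to find in the Nsight Systems timeline, each call can be wrapped in an NVTX range. The following is only a sketch under assumptions: the range names, the `is_even` predicate, and the problem size are made up for illustration, and it assumes the header-only NVTX 3 that ships with recent CUDA toolkits.

```cuda
#include <cuda_runtime.h>
#include <nvtx3/nvToolsExt.h>  // header-only NVTX 3 (assumed to be available)
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>

// Hypothetical predicate (not from the original test): keep even values.
struct is_even {
    __host__ __device__ bool operator()(int x) const { return (x % 2) == 0; }
};

int main() {
    const int n = 1 << 20;  // assumed problem size
    thrust::device_vector<int> in(n), out(n);
    thrust::sequence(in.begin(), in.end(), n, -1);

    for (int iter = 0; iter < 20; ++iter) {
        nvtxRangePushA("iteration");  // outer range: one loop iteration

        nvtxRangePushA("copy_if");
        auto out_end = thrust::copy_if(thrust::device, in.begin(), in.end(),
                                       out.begin(), is_even());
        nvtxRangePop();  // closes "copy_if"

        nvtxRangePushA("sort");
        thrust::sort(thrust::device, out.begin(), out_end);
        cudaDeviceSynchronize();  // keep the GPU work inside the "sort" range
        nvtxRangePop();  // closes "sort"

        nvtxRangePop();  // closes "iteration"
    }
    return 0;
}
```

Running the binary under something like `nsys profile -o report ./test` (the output and binary names are placeholders) should then show the named ranges on the CPU timeline next to the CUDA API calls, so a slow iteration can be compared against a fast one. Depending on the platform, linking may need `-ldl`.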