Hi,
I’m running a simple CUDA test with Thrust and see big jumps in the timings: they range from 0.2 ms to 1.5 ms. It also happens when I use Thrust’s cache allocation mechanism.
The profiler seems to indicate that the time is being wasted on the CPU between the copy_if and the sort.
Any idea would be greatly appreciated.
Elapsed time: 0.291712ms
Elapsed time: 0.289984ms
Elapsed time: 0.325888ms
Elapsed time: 0.308384ms
Elapsed time: 0.317024ms
Elapsed time: 0.280864ms
Elapsed time: 0.287104ms
Elapsed time: 0.287264ms
Elapsed time: 1.27184ms
Elapsed time: 0.310432ms
Elapsed time: 0.295648ms
Elapsed time: 0.288224ms
Elapsed time: 0.284416ms
Elapsed time: 0.810144ms
Elapsed time: 0.291456ms
Elapsed time: 0.262528ms
Elapsed time: 0.308128ms
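To put a number on the jumps, here is a minimal host-side sketch in plain C++. It is not part of test.cu: it only takes the timings listed above and counts how many are slower than a multiple of the median. The names `kElapsedMs`, `median` and `count_spikes`, and the 2x threshold, are my own illustrative choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Timings copied from the run above, in milliseconds.
static const std::vector<double> kElapsedMs = {
    0.291712, 0.289984, 0.325888, 0.308384, 0.317024, 0.280864,
    0.287104, 0.287264, 1.27184,  0.310432, 0.295648, 0.288224,
    0.284416, 0.810144, 0.291456, 0.262528, 0.308128};

// Median of a non-empty sample set. Takes a copy so the caller's order is kept.
double median(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    return (n % 2 == 1) ? samples[n / 2]
                        : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
}

// Count samples slower than `factor` times the median.
// The default factor of 2.0 is an arbitrary threshold chosen for illustration.
std::size_t count_spikes(const std::vector<double>& samples,
                         double factor = 2.0) {
    const double limit = factor * median(samples);
    return static_cast<std::size_t>(
        std::count_if(samples.begin(), samples.end(),
                      [limit](double ms) { return ms > limit; }));
}
```

Calling `median(kElapsedMs)` and `count_spikes(kElapsedMs)` on the 17 samples gives a median of 0.291712 ms and two runs above twice that (1.27184 ms and 0.810144 ms), so the typical iteration is stable and the problem is limited to occasional spikes.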
test.cu (2.3 KB)