I’ve run into a problem that is killing my program’s execution speed. My GPU seems to go idle for approximately 15 to 16 milliseconds at a time once a single program has run 500 iterations. For the first 500 iterations, each kernel runs in under 1 ms. After that, two or three iterations run in sub-millisecond time, then a single iteration takes about 15 to 16 ms.
The timings look like this, in case that is confusing:
First 500 iterations: 0 (all values in milliseconds)
501st: 0
502nd: 0
503rd: 15
504th: 0
505th: 0
506th: 0
507th: 16
and so on until my program ends. I have confirmed that launching a kernel with nothing in it also produces the same amount of idle time, although the slow iterations are spread much farther apart. The only explanation I can think of is another process or thread preempting my program while it’s running on the CPU. Has anyone else run into this problem?
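One way to narrow this down is to time every launch twice: once on the GPU with CUDA events and once on the host with a high-resolution clock. The sketch below is only an illustration under stated assumptions: `emptyKernel` and `timeLaunches` are hypothetical names, the empty kernel stands in for the real one, and the launch configuration is arbitrary.

```cuda
#include <chrono>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: it does nothing,
// like the empty-kernel test described above.
__global__ void emptyKernel() {}

// Times `iters` launches one at a time.
// gpu_ms[i]  : GPU-side time between two CUDA events around launch i.
// host_ms[i] : wall-clock time the CPU spent on launch i, including the wait.
// Returns false if a CUDA call reports an error.
bool timeLaunches(int iters, std::vector<float>& gpu_ms,
                  std::vector<double>& host_ms) {
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess) return false;
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        return false;
    }

    gpu_ms.assign(iters, 0.0f);
    host_ms.assign(iters, 0.0);
    bool ok = true;
    for (int i = 0; i < iters && ok; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        cudaEventRecord(start, 0);
        emptyKernel<<<1, 1>>>();          // assumed launch configuration
        cudaEventRecord(stop, 0);
        // Block the host until the kernel and the stop event have completed.
        ok = (cudaEventSynchronize(stop) == cudaSuccess);
        auto t1 = std::chrono::steady_clock::now();
        if (ok) {
            ok = (cudaEventElapsedTime(&gpu_ms[i], start, stop) == cudaSuccess);
        }
        host_ms[i] =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ok;
}
```

Calling `timeLaunches(1000, gpu_ms, host_ms)` and printing the iterations where `host_ms[i]` is far larger than `gpu_ms[i]` should show which side the gap is on. If the event times stay small while the host times spike, the delay is probably on the CPU side (scheduling or the driver) rather than in the kernel. Readings that are only ever 0 or 15 to 16 ms can also come from a coarse host timer, so it is worth ruling that out as well.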
Noticed it too.
When running my code in the CUDA profiler, I get these “white spots” where, for no apparent reason, the GPU is idling, although my host code isn’t doing anything special.