Kernel call overhead: is this overhead, or am I blocking on the CPU?


I’m running an algorithm consisting of several kernels that are called in a loop. Combining them into one big kernel might be possible, but it would be ugly, so before investing any time there I’d like to check ;)

The thing is: the Visual Profiler gives me the following numbers:
Kernel1 - 93 calls - 5680 usec (63%)
Kernel2 - 93 calls - 1051 usec (11%)
Kernel3 - 47 calls - 885 usec (10%)
and 4 others, accounting for the remaining few %.
In total that would be around 10 ms for all of them, which should yield 100 executions per second. The problem is that there is a delay in between the kernel calls: the time width plot shows huge white gaps between the individual kernels. In practice I achieve 10 executions per second, i.e. roughly 100 ms per iteration, so about 90 ms of each iteration is spent outside the kernels. There is nothing happening on the CPU between kernel calls… is there a way to reduce this idle time/overhead?

Kind regards

Edit: I’m attaching a screenshot of the time width plot for better illustration. I’m referring to the huge white blocks between kernel calls, where not even the CPU is doing anything (gray blocks at the bottom).
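
One way to tell plain launch overhead apart from the host blocking somewhere is to time a loop of empty kernels and compare the result with the gaps in the plot. The following is only a minimal sketch, not the code from this thread: `emptyKernel`, `perLaunchUs` and the launch count of 1000 are made up for illustration, and error checking is left out for brevity.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a real kernel. It does no work, so whatever
// time is measured is launch overhead rather than computation.
__global__ void emptyKernel() {}

// Average cost per launch in microseconds, given a total in milliseconds.
static double perLaunchUs(double totalMs, int launches) {
    return totalMs * 1000.0 / launches;
}

// Milliseconds elapsed between two host time points.
static double elapsedMs(std::chrono::steady_clock::time_point a,
                        std::chrono::steady_clock::time_point b) {
    return std::chrono::duration<double, std::milli>(b - a).count();
}

int main() {
    const int launches = 1000;  // assumed count, large enough to average out noise

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up: the very first launch also pays for context setup.
    emptyKernel<<<1, 1>>>();
    cudaDeviceSynchronize();

    auto wallStart = std::chrono::steady_clock::now();
    cudaEventRecord(start);
    for (int i = 0; i < launches; ++i) {
        // Launches are asynchronous: the call returns once the work is queued.
        emptyKernel<<<1, 1>>>();
    }
    cudaEventRecord(stop);
    auto wallQueued = std::chrono::steady_clock::now();  // host finished queuing

    cudaEventSynchronize(stop);  // wait until the GPU has drained the queue
    auto wallDone = std::chrono::steady_clock::now();

    float gpuMs = 0.0f;
    cudaEventElapsedTime(&gpuMs, start, stop);  // time between the events on the GPU

    std::printf("host queuing: %.2f us per launch\n",
                perLaunchUs(elapsedMs(wallStart, wallQueued), launches));
    std::printf("host total:   %.2f us per launch\n",
                perLaunchUs(elapsedMs(wallStart, wallDone), launches));
    std::printf("GPU events:   %.2f us per launch\n",
                perLaunchUs(gpuMs, launches));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```

If the per-launch figure printed here is far smaller than the gaps in the plot, launch overhead alone probably does not explain them. In that case it may be worth checking the loop for calls that make the host wait for the GPU, such as a synchronous `cudaMemcpy` or a `cudaFree`, and wrapping the real loop in the same host timers to see which call the time goes to.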


Did you by chance find a solution to your problem? I have the same situation now.