I have a question about the profiler.
My code looks like this:
So I would expect this code to run strictly sequentially (there are no CPU instructions between the CUDA calls). But the attached profiling output shows a gap between the kernel1 invocation (blue strip) and the following memcpy (red strip), and the same gap repeats after every kernel invocation.
What could be causing these GPU idle periods?