I have an application that runs in 5 milliseconds. This is fine, of course, but I’d like to shorten it, since timeline profiling shows large empty gaps.
Since I can’t upload a picture here, I’ll describe the timeline in words:
First, 7 kernels are launched, with the usual 20 µs between them.
Then there is a huge empty gap of 1 millisecond, in the middle of which sits a tiny cudaMalloc of 20 µs.
After that, the application resumes with more kernels, and the pattern repeats a few more times.
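For illustration, the pattern can be sketched roughly as below. This is not the actual application: the kernel `dummyKernel`, the buffer size, the launch configuration, the repeat count, and the explicitly created stream are all placeholders assumed for the sketch. Only the shape (a burst of 7 kernels, then a cudaMalloc, repeated) comes from the timeline described above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real work (hypothetical).
__global__ void dummyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

int main()
{
    const int n = 1 << 16;           // placeholder problem size
    const int repeats = 4;           // "the pattern repeats a few more times"
    const int kernelsPerBurst = 7;   // 7 kernels per burst, as in the timeline

    // Assumption: a single user-created stream plays the role of "stream 1".
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    float *work = nullptr;
    cudaMalloc(&work, n * sizeof(float));
    cudaMemsetAsync(work, 0, n * sizeof(float), stream);

    float *scratch[repeats] = {nullptr};

    for (int r = 0; r < repeats; ++r) {
        // Burst of back-to-back kernel launches on the same stream.
        for (int k = 0; k < kernelsPerBurst; ++k)
            dummyKernel<<<(n + 255) / 256, 256, 0, stream>>>(work, n);

        // Allocation issued between two bursts; this is the small
        // cudaMalloc that shows up in the middle of the gap.
        cudaMalloc(&scratch[r], n * sizeof(float));
    }

    // Only here so the sketch exits cleanly; the scenario described
    // has no explicit synchronization.
    cudaStreamSynchronize(stream);
    printf("last error: %s\n", cudaGetErrorString(cudaGetLastError()));

    for (int r = 0; r < repeats; ++r) cudaFree(scratch[r]);
    cudaFree(work);
    cudaStreamDestroy(stream);
    return 0;
}
```

Running a sketch like this under the same timeline profiler should show whether the gap appears around the cudaMalloc call in a minimal setting too.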
How can I remove that gap?
PS: I work on stream 1 and never synchronize anything.