I am using several streams that overlap with memory copies. My question is about the profiler: almost every time I run it, I get different output. There are strange gaps between the memory copies and/or between the streams, as you can see in this image: http://goo.gl/pxjn10
I also tried calling cudaStreamQuery(0), but it made no difference.
Do you have any idea why this is happening?
GeForce GT 750M
CUDA Driver Version: 6.5.18
Nsight Eclipse Edition Version: 6.0.0