Unexplained gap in profiling multi-GPU timeline

I profiled an application that uses streams and two GPUs with NVVP and noticed a gap in one of the timelines that I cannot explain (the arrow points to the gap):


This is a rather complicated process involving a combination of host-to-device copies, custom kernels, multiple calls to the cuFFT library, and device-to-host copies. All of it uses streams, and the problem is split evenly between the two GPUs.
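To make the "split evenly between the two GPUs" part concrete, here is a minimal sketch, assuming a 1-D problem of `n` elements, of how the work might be divided across GPUs and then across several streams per GPU. `Chunk` and `planChunks` are hypothetical names chosen for illustration and are not taken from the application itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One slice of the input assigned to one (GPU, stream) pair.
// Hypothetical helper type for illustration; not from the original application.
struct Chunk {
    int device;          // GPU index, e.g. passed to cudaSetDevice()
    int stream;          // stream index on that GPU
    std::size_t offset;  // index of the first element in the slice
    std::size_t count;   // number of elements in the slice
};

// Splits n elements evenly across numDevices GPUs, then splits each GPU's
// share evenly across streamsPerDevice streams. Remainders go to the first
// slices, so the GPU shares differ by at most one element, as do the stream
// slices within each GPU.
std::vector<Chunk> planChunks(std::size_t n, int numDevices, int streamsPerDevice) {
    assert(numDevices > 0 && streamsPerDevice > 0);
    const std::size_t nd = static_cast<std::size_t>(numDevices);
    const std::size_t ns = static_cast<std::size_t>(streamsPerDevice);

    std::vector<Chunk> chunks;
    std::size_t offset = 0;
    for (std::size_t d = 0; d < nd; ++d) {
        // This GPU's share: base size plus one if it absorbs part of the remainder.
        const std::size_t share = n / nd + (d < n % nd ? 1 : 0);
        for (std::size_t s = 0; s < ns; ++s) {
            // This stream's slice of the GPU's share, split the same way.
            const std::size_t count = share / ns + (s < share % ns ? 1 : 0);
            chunks.push_back({static_cast<int>(d), static_cast<int>(s), offset, count});
            offset += count;
        }
    }
    assert(offset == n);  // every element is assigned to exactly one slice
    return chunks;
}
```

In a pipeline like the one described, each chunk would then be issued after `cudaSetDevice(c.device)` into its own stream: a host-to-device `cudaMemcpyAsync`, the custom kernels launched on that stream, `cufftSetStream` before the cuFFT call, and a device-to-host `cudaMemcpyAsync`. Copies only overlap with compute when the host buffers are pinned (for example, allocated with `cudaMallocHost`).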

While I am happy with the overall overlap between compute and bidirectional copies, I wonder about that gap on GPU #0 (which happens to be the GPU connected to the display).

Any ideas or ways to find out?

CUDA 8.0
Windows 8.1 x64
NVVP (claims to be using CUDA 9.0 but I compile against CUDA 8.0)
2x GTX 1080 Ti


I also see such gaps in the profiler's output: https://pasteboard.co/JxCDCLk.jpg
I don't know whether that means the GPU is idle during the gaps. If so, why?