I profiled an application that uses streams and 2 GPUs with NVVP and noticed an unexplained gap in one of the timelines (the arrow points to the gap).
This is a rather complicated process involving a combination of host-to-device copies, custom kernels, multiple calls to the cuFFT library, and device-to-host copies. Everything runs in streams, and the problem is split evenly between the two GPUs.
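A minimal sketch of the kind of per-GPU pipeline described above, not the actual application. It assumes one stream per GPU, pinned host buffers, and a single in-place cuFFT transform. The `scale` kernel, the buffer size `n`, and the `CUDA_CHECK` / `CUFFT_CHECK` macros are placeholders invented for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper macros: abort with a message if a call fails.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "CUDA error: %s (line %d)\n",         \
                         cudaGetErrorString(err_), __LINE__);          \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

#define CUFFT_CHECK(call)                                              \
    do {                                                               \
        cufftResult res_ = (call);                                     \
        if (res_ != CUFFT_SUCCESS) {                                   \
            std::fprintf(stderr, "cuFFT error %d (line %d)\n",         \
                         static_cast<int>(res_), __LINE__);            \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Placeholder standing in for the application's custom kernels.
__global__ void scale(cufftComplex* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i].x *= factor;
        data[i].y *= factor;
    }
}

int main() {
    int deviceCount = 0;
    CUDA_CHECK(cudaGetDeviceCount(&deviceCount));
    const int gpus = deviceCount < 2 ? deviceCount : 2;  // use up to 2 GPUs
    if (gpus == 0) { std::fprintf(stderr, "No CUDA device found\n"); return 1; }

    const int n = 1 << 20;                        // elements per GPU (assumed)
    const size_t bytes = n * sizeof(cufftComplex);
    cudaStream_t stream[2];
    cufftHandle plan[2];
    cufftComplex* host[2];
    cufftComplex* dev[2];

    // Per-GPU setup: stream, pinned host buffer, device buffer, FFT plan.
    for (int g = 0; g < gpus; ++g) {
        CUDA_CHECK(cudaSetDevice(g));
        CUDA_CHECK(cudaStreamCreate(&stream[g]));
        // Pinned host memory is required for copies to overlap with compute.
        CUDA_CHECK(cudaMallocHost((void**)&host[g], bytes));
        CUDA_CHECK(cudaMalloc((void**)&dev[g], bytes));
        CUFFT_CHECK(cufftPlan1d(&plan[g], n, CUFFT_C2C, 1));
        CUFFT_CHECK(cufftSetStream(plan[g], stream[g]));  // FFT runs in our stream
        for (int i = 0; i < n; ++i) { host[g][i].x = 1.0f; host[g][i].y = 0.0f; }
    }

    // Enqueue the whole pipeline on each GPU without blocking the host.
    for (int g = 0; g < gpus; ++g) {
        CUDA_CHECK(cudaSetDevice(g));
        CUDA_CHECK(cudaMemcpyAsync(dev[g], host[g], bytes,
                                   cudaMemcpyHostToDevice, stream[g]));
        scale<<<(n + 255) / 256, 256, 0, stream[g]>>>(dev[g], n, 0.5f);
        CUDA_CHECK(cudaGetLastError());
        CUFFT_CHECK(cufftExecC2C(plan[g], dev[g], dev[g], CUFFT_FORWARD));
        CUDA_CHECK(cudaMemcpyAsync(host[g], dev[g], bytes,
                                   cudaMemcpyDeviceToHost, stream[g]));
    }

    // Wait for each GPU to finish, then release its resources.
    for (int g = 0; g < gpus; ++g) {
        CUDA_CHECK(cudaSetDevice(g));
        CUDA_CHECK(cudaStreamSynchronize(stream[g]));
        std::printf("GPU %d finished\n", g);
        CUFFT_CHECK(cufftDestroy(plan[g]));
        CUDA_CHECK(cudaFree(dev[g]));
        CUDA_CHECK(cudaFreeHost(host[g]));
        CUDA_CHECK(cudaStreamDestroy(stream[g]));
    }
    return 0;
}
```

This should build with something like `nvcc pipeline.cu -lcufft` (the file name is arbitrary). The real application uses more streams and several FFT calls per GPU, so this only shows the overall shape.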
While I am happy with the overall amount of overlap between compute and bi-directional copies, I wonder about that gap for GPU #0 (which happens to be the GPU connected to the display).
Any ideas about what might cause it, or ways to find out?
Windows 8.1 x64
NVVP (claims to be using CUDA 9.0 but I compile against CUDA 8.0)
2x GTX 1080 Ti