I recently updated to driver version 416.34 and noticed that a cudaDeviceSynchronize call sometimes hangs. Sometimes it recovers; sometimes it appears to hang indefinitely. Using several tools (Nsight, GPU-Z, Task Manager) I can confirm that the GPU is not being utilized at all during this time. I’ve reproduced this on a 1080Ti, a 1070Ti, and a Quadro, all with WDDM drivers. This only seems to happen on Windows: I cannot reproduce the problem on Linux with a 2080 or a 1070, and the Quadro card on Windows with TCC drivers also works fine. I tried reverting to driver version 411.63 and still see the same problem. Has anyone else noticed this behavior? This seems like a bug in the driver.
I was told that cudaMalloc and cudaFree cause a device synchronization to take place, so I tried replacing my cudaDeviceSynchronize with a cudaMalloc followed immediately by a cudaFree. Surprisingly, that fixed the problem. So, what are cudaMalloc and cudaFree doing (or not doing) under the hood?
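For reference, the swap looks roughly like the sketch below. This is not my actual code: the function names are made up for this post and the 1-byte allocation size is arbitrary. Whether the malloc/free pair really synchronizes the whole device is only what I was told, not something I have verified.

```cuda
#include <cuda_runtime.h>

// Original approach: block until all previously issued device work finishes.
// This is the call that intermittently hangs for me under WDDM.
// (syncOriginal is a hypothetical name used only in this post.)
cudaError_t syncOriginal() {
    return cudaDeviceSynchronize();
}

// Workaround: a throwaway allocation immediately followed by a free.
// Assumption: the pair implicitly waits for outstanding device work.
// (syncViaMallocFree is a hypothetical name; the 1-byte size is arbitrary.)
cudaError_t syncViaMallocFree() {
    void *scratch = nullptr;
    cudaError_t err = cudaMalloc(&scratch, 1);
    if (err != cudaSuccess) {
        return err;  // allocation failed, nothing to free
    }
    return cudaFree(scratch);
}
```

In my code I simply call the second function at the point where I used to call cudaDeviceSynchronize and check the returned cudaError_t the same way.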