cudaDeviceSynchronize sometimes hangs with driver versions after 411.63

I updated to 416.34 recently and noticed that a cudaDeviceSynchronize call sometimes hangs. Sometimes it recovers; sometimes it appears to hang indefinitely. Using several tools (Nsight, GPU-Z, Task Manager), I can confirm that the GPU is not being utilized at all during this time. I’ve reproduced this on a 1080Ti, a 1070Ti, and a Quadro with WDDM drivers. This only seems to happen on Windows: I cannot reproduce the problem on Linux with a 2080 or a 1070, and the Quadro card on Windows with TCC drivers also works fine. I tried reverting to driver version 411.63 and still see the same problem. Has anyone else noticed this behavior? This seems like a bug in the driver.
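For anyone trying to tell "slow" apart from "hung", one option is to poll with a timeout instead of blocking in cudaDeviceSynchronize. This is only a rough sketch, not code from my application: the helper name waitForDevice and the WaitResult enum are made up for illustration. It records an event in the legacy default stream, so it only covers work in that stream and in streams that synchronize with it. Polling may also change the very behavior being diagnosed.

```cuda
#include <cuda_runtime.h>
#include <chrono>
#include <thread>

// Hypothetical result type, introduced only for this sketch.
enum class WaitResult { Completed, TimedOut, Failed };

// Hypothetical helper: waits for previously issued work by polling an event,
// and gives up after `timeout` instead of blocking forever.
WaitResult waitForDevice(std::chrono::milliseconds timeout) {
    cudaEvent_t done;
    if (cudaEventCreateWithFlags(&done, cudaEventDisableTiming) != cudaSuccess) {
        return WaitResult::Failed;
    }
    // Record the event in the legacy default stream (stream 0).
    if (cudaEventRecord(done, 0) != cudaSuccess) {
        cudaEventDestroy(done);
        return WaitResult::Failed;
    }

    const auto deadline = std::chrono::steady_clock::now() + timeout;
    WaitResult result = WaitResult::TimedOut;
    while (true) {
        cudaError_t status = cudaEventQuery(done);
        if (status == cudaSuccess) {            // event reached: work finished
            result = WaitResult::Completed;
            break;
        }
        if (status != cudaErrorNotReady) {      // a real error, not "still running"
            result = WaitResult::Failed;
            break;
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            break;                              // still pending after the timeout
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    cudaEventDestroy(done);
    return result;
}
```

Calling waitForDevice(std::chrono::milliseconds(5000)) after a kernel launch and logging a TimedOut result would at least show where the stall happens without freezing the whole process.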

I was told that cudaMalloc and cudaFree cause a device synchronization to take place, so I tried replacing my cudaDeviceSynchronize call with a cudaMalloc followed immediately by a cudaFree. Surprisingly, that fixed the problem. So, what are cudaMalloc and cudaFree doing (or not doing) under the hood?
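Roughly, the workaround looks like the sketch below. The helper name syncViaMallocFree and the 1-byte allocation size are placeholders for illustration. That this pair synchronizes the device to the same degree as cudaDeviceSynchronize is an assumption based on what I was told, not something I have verified.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for cudaDeviceSynchronize(): allocate a tiny scratch
// buffer and free it right away, relying on the (assumed) implicit
// synchronization performed by these calls.
cudaError_t syncViaMallocFree() {
    void* scratch = nullptr;
    cudaError_t err = cudaMalloc(&scratch, 1);  // 1 byte; the size is arbitrary
    if (err != cudaSuccess) {
        return err;                             // nothing was allocated
    }
    return cudaFree(scratch);                   // report any error from the free
}
```

In the application, each cudaDeviceSynchronize() call is replaced by syncViaMallocFree(), with the return value checked the same way as before.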

In my original post, I said that the problem was reproducible in Linux until I upgraded to CUDA 10. That was a mistake. The problem was never reproducible in Linux.

I am using Quadro P2000 / Win 7 Pro 64-bit / CUDA 8 / WDDM 411.63 and I have not experienced this issue so far. From your description it sounds like a possible livelock scenario?

I have no idea what kind of synchronization cudaMalloc/cudaFree perform under the hood, but it is not difficult to imagine that the amount of synchronization they require may be less than the full hammer provided by cudaDeviceSynchronize.

Thanks for the response, njuffa. I’m going to continue trying to tease out the problem. Until your post, I had only tried to reproduce the problem on Windows 10, but I can now confirm that it is reproducible on Windows 7 as well.

Since you seem to have a reliable reproducer across a variety of platforms, it may be time to consider filing a bug with NVIDIA.