Launch latency with Sleep()


I was playing with the addWithCuda program (the sample code that is generated when you create a new CUDA project in Visual Studio). I modified it a bit so that it looks like this:

      memcpy vector A HtoD
      memcpy vector B HtoD
      launch kernel<<<...>>>()
      Sleep()
      memcpy result DtoH

Then I looked at it in Nsight Systems. The launch latency of the kernel was about 30 ms, which is more or less the duration of the Sleep. I find this behaviour weird: I would expect the CPU to launch the kernel and then go to sleep, with the GPU working on the kernel while the CPU is sleeping.

So I would like to know a bit more about this mechanism: how it works, and why it works that way.

My system specs, in case this is platform specific:
Windows 10 x64
GeForce RTX 3090 / Quadro P2000
CUDA 11.3 / 10.2

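You’re running into WDDM command batching. Under the WDDM driver model, the driver may queue GPU commands such as kernel launches and submit them to the GPU in batches rather than immediately, so your kernel can sit in that queue while the CPU sleeps and only start once a later call (presumably the blocking memcpy of the result) forces the batch out. You cannot switch your RTX 3090 out of WDDM mode, but it may be possible to switch your Quadro P2000 to TCC mode (using nvidia-smi). In that case this observation should mostly disappear.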
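
If you have to stay in WDDM mode, a commonly cited workaround is to issue a cheap non-blocking call such as `cudaStreamQuery(0)` right after the launch, which tends to push the pending batch to the GPU. Here is a minimal sketch of that idea, modelled loosely on the addWithCuda sample. The `CHECK` macro, the array contents and the portable `sleep_for` standing in for `Sleep(30)` are my own assumptions, and the flush is an observed side effect rather than documented, guaranteed behaviour, so verify it in Nsight Systems on your machine.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the sample): abort on any CUDA error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Element-wise add, one thread per element, as in the addWithCuda sample.
__global__ void addKernel(int *c, const int *a, const int *b)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

int main()
{
    const int n = 5;
    const int a[n] = {1, 2, 3, 4, 5};
    const int b[n] = {10, 20, 30, 40, 50};
    int c[n] = {0};
    const size_t bytes = n * sizeof(int);

    int *dev_a = nullptr, *dev_b = nullptr, *dev_c = nullptr;
    CHECK(cudaMalloc(&dev_a, bytes));
    CHECK(cudaMalloc(&dev_b, bytes));
    CHECK(cudaMalloc(&dev_c, bytes));

    CHECK(cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice));

    // The launch is asynchronous; under WDDM it may only be queued here.
    addKernel<<<1, n>>>(dev_c, dev_a, dev_b);
    CHECK(cudaGetLastError());

    // Non-blocking status query on the default stream. The return value
    // (cudaSuccess or cudaErrorNotReady) is ignored on purpose: the call is
    // only here as a nudge to submit the queued work. Assumption: this
    // flushes the WDDM batch, which is commonly observed but not guaranteed.
    cudaStreamQuery(0);

    // Stand-in for the Windows Sleep(30) from the question.
    std::this_thread::sleep_for(std::chrono::milliseconds(30));

    // Blocking copy: waits for the kernel to finish before copying.
    CHECK(cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost));

    printf("{%d,%d,%d,%d,%d}\n", c[0], c[1], c[2], c[3], c[4]);
    // prints {11,22,33,44,55}

    CHECK(cudaFree(dev_a));
    CHECK(cudaFree(dev_b));
    CHECK(cudaFree(dev_c));
    return 0;
}
```

To see whether it helps, compare the Nsight Systems timeline with and without the `cudaStreamQuery(0)` line: with it, the kernel should start shortly after the launch call instead of after the sleep. For the TCC route on the Quadro, the switch is normally `nvidia-smi -i <gpu index> -dm 1` from an administrator prompt followed by a reboot, but check `nvidia-smi -h` for your driver version.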
