Launch latency with Sleep()


I was playing with the addWithCuda program (the sample code generated when you create a new CUDA project in Visual Studio). I modified it a bit so that it looks like this:

      memcpy vector A HtoD
      memcpy vector B HtoD
      launch kernel<<<...>>>()
      Sleep(...)                // roughly 30 ms
      memcpy result DtoH
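A minimal runnable sketch of those steps follows. It assumes an element-wise addKernel similar to the one in the Visual Studio template and uses std::this_thread::sleep_for as a portable stand-in for the Windows Sleep() call. The helper name addWithSleep and the 30 ms value are illustrative, not taken from the original code.

```cuda
#include <cstdio>
#include <chrono>
#include <thread>
#include <cuda_runtime.h>

// Element-wise add, modeled on the addKernel from the Visual Studio template.
__global__ void addKernel(int *c, const int *a, const int *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Hypothetical helper: copies a and b to the device, launches the kernel,
// sleeps on the host for sleepMs milliseconds, then copies the result back.
// Returns the last error reported by the CUDA runtime.
cudaError_t addWithSleep(int *c, const int *a, const int *b, int n, int sleepMs)
{
    int *dA = nullptr, *dB = nullptr, *dC = nullptr;
    size_t bytes = n * sizeof(int);
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);

    cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice);    // memcpy vector A HtoD
    cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice);    // memcpy vector B HtoD
    addKernel<<<(n + 255) / 256, 256>>>(dC, dA, dB, n);  // launch returns immediately
    std::this_thread::sleep_for(std::chrono::milliseconds(sleepMs));  // host sleeps
    cudaMemcpy(c, dC, bytes, cudaMemcpyDeviceToHost);    // memcpy result DtoH (blocks)

    cudaError_t err = cudaGetLastError();
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}

int main()
{
    const int n = 5;
    int a[n] = {1, 2, 3, 4, 5};
    int b[n] = {10, 20, 30, 40, 50};
    int c[n] = {0};
    cudaError_t err = addWithSleep(c, a, b, n, 30);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("{%d, %d, %d, %d, %d}\n", c[0], c[1], c[2], c[3], c[4]);  // prints {11, 22, 33, 44, 55}
    return 0;
}
```

Only the final cudaMemcpy blocks the host; the kernel launch itself is asynchronous. Profiling this in Nsight Systems shows whether the kernel starts right after the launch call or only after the sleep.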

Then I looked at it in Nsight Systems. The kernel's launch latency was about 30 ms, which is roughly the duration of the Sleep. I find this behaviour odd: I would expect the CPU to launch the kernel and then go to sleep, with the GPU executing the kernel while the CPU is sleeping.

So I would like to know a bit more about this mechanism: how does it work, and why does it work like that?

My system specs, in case this is platform-specific:
Windows 10 x64
GeForce RTX 3090 / Quadro P2000
CUDA 11.3 / 10.2

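You're running into WDDM command batching. Under WDDM, the CUDA driver queues work such as kernel launches and submits it to the GPU in batches, so a launch can sit in the queue until something flushes it, for example a blocking call like the DtoH copy at the end. You cannot switch your RTX 3090 out of WDDM mode, but it may be possible to switch your Quadro P2000 to TCC mode (using nvidia-smi). In that case, this behavior should mostly disappear.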
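The batching can be observed directly. A commonly reported workaround under WDDM is to call cudaStreamQuery right after the launch, which tends to make the driver submit the pending batch; this is field experience, not a documented guarantee. The sketch below records one event right after the kernel launch and another after the host sleep, then reads the GPU-side gap between them with cudaEventElapsedTime. If the first event reaches the GPU before the sleep, the gap should be close to the sleep time; if the driver holds it in a batch until the second record and the synchronize, the gap can be much smaller. The names scaleKernel and gpuGapAcrossSleep are hypothetical, and the 30 ms sleep is taken from the question.

```cuda
#include <cstdio>
#include <chrono>
#include <thread>
#include <cuda_runtime.h>

// Trivial kernel so there is some work in the stream.
__global__ void scaleKernel(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = 2.0f * x[i] + 1.0f;
}

// Hypothetical helper: returns the GPU-side time (ms) between an event recorded
// right after the kernel launch (before the host sleeps) and one recorded after
// the sleep. Roughly sleepMs if the first event was submitted before the sleep.
float gpuGapAcrossSleep(float *dX, int n, int sleepMs, bool flush)
{
    cudaEvent_t before, after;
    cudaEventCreate(&before);
    cudaEventCreate(&after);

    scaleKernel<<<(n + 255) / 256, 256>>>(dX, n);
    cudaEventRecord(before, 0);
    if (flush) {
        // Return value (cudaSuccess or cudaErrorNotReady) is ignored; the call is
        // made only to prompt the driver to submit queued work (assumed WDDM workaround).
        cudaStreamQuery(0);
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(sleepMs));
    cudaEventRecord(after, 0);
    cudaEventSynchronize(after);  // blocking call: flushes anything still queued

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, before, after);
    cudaEventDestroy(before);
    cudaEventDestroy(after);
    return ms;
}

int main()
{
    const int n = 1 << 20;
    float *dX = nullptr;
    cudaMalloc(&dX, n * sizeof(float));
    cudaMemset(dX, 0, n * sizeof(float));
    gpuGapAcrossSleep(dX, n, 0, false);  // warm-up run
    std::printf("without flush: %.2f ms\n", gpuGapAcrossSleep(dX, n, 30, false));
    std::printf("with flush   : %.2f ms\n", gpuGapAcrossSleep(dX, n, 30, true));
    cudaFree(dX);
    return 0;
}
```

On a TCC device (or on Linux) both lines would be expected to report roughly 30 ms; on a WDDM device the first line may report much less, because the kernel and both events reach the GPU together after the sleep.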


I have a Quadro P2000 running in TCC mode alongside a Quadro RTX 4000 running in WDDM mode. The obvious requirement is that the Quadro P2000 is not driving a display. I only use Quadro GPUs, so I cannot say whether mixing Quadro GPUs and consumer GPUs would cause an issue, but I cannot think of a reason why it should.

C:\Users\Norbert\My Programs>nvidia-smi
Wed Sep 29 16:53:56 2021
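+-----------------------------------------------------------------------------+
| NVIDIA-SMI 462.31       Driver Version: 462.31       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P2000        TCC  | 00000000:17:00.0 Off |                  N/A |
| 91%   81C    P0    61W /  75W |    360MiB /  5053MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 4000    WDDM  | 00000000:65:00.0  On |                  N/A |
| 70%   85C    P0   102W / 125W |   1045MiB /  8192MiB |     95%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+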
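To check the driver model from inside a CUDA program instead of nvidia-smi, the runtime exposes cudaDeviceProp::tccDriver, which is nonzero when the device uses the TCC driver. The minimal sketch below prints each device with its PCI bus number in hex so it can be matched against the Bus-Id column of the nvidia-smi output above; the helper name isTccDevice is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns 1 if device `dev` uses the TCC driver model,
// 0 if it does not (on Windows that means WDDM), -1 if its properties cannot be read.
int isTccDevice(int dev)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return -1;
    return prop.tccDriver ? 1 : 0;
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA devices found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        // pciBusID is printed in hex to match the Bus-Id column of nvidia-smi.
        std::printf("%d  bus %02x  %-20s %s\n", dev, prop.pciBusID, prop.name,
                    isTccDevice(dev) == 1 ? "TCC" : "not TCC (WDDM on Windows)");
    }
    return 0;
}
```

By default CUDA enumerates devices fastest-first, so its indices may not match nvidia-smi's; setting the environment variable CUDA_DEVICE_ORDER=PCI_BUS_ID makes them line up.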