I was playing with the addWithCuda program (the sample code that's generated when you create a new CUDA project in Visual Studio). I modified it a bit so that it looks like this:
addWithCuda(...) {
    // cudaMalloc device buffers
    for (int i = 0; i < 100; i++) {
        cudaMemcpy(dev_a, a, size, cudaMemcpyHostToDevice);  // vector A HtoD
        cudaMemcpy(dev_b, b, size, cudaMemcpyHostToDevice);  // vector B HtoD
        addKernel<<<1, size>>>(dev_c, dev_a, dev_b);         // launch kernel
        Sleep(30);
        cudaMemcpy(c, dev_c, size, cudaMemcpyDeviceToHost);  // result DtoH
    }
}
Then I looked at it in Nsight Systems. The launch latency of the kernel was about 30 ms, which is more or less the Sleep duration. I find this behaviour odd: I would expect the CPU to launch the kernel and then go to sleep, with the GPU working on the kernel while the CPU sleeps.
So I would like to understand this mechanism a bit better: how does it work, and why does it behave this way?
My system specs, in case this is platform-specific:
Windows 10 x64
GeForce RTX 3090 / Quadro P2000
CUDA 11.3 / 10.2
You're running into WDDM command batching. Under WDDM, the CUDA driver queues work into a command buffer and only submits that buffer to the GPU when it fills up or when something forces a flush (such as the blocking DtoH copy after your Sleep), so the kernel does not actually reach the GPU until then. You cannot switch your RTX 3090 out of WDDM mode, but it may be possible to switch your Quadro P2000 to TCC mode (using nvidia-smi). In that case this observation should mostly disappear.
I have a Quadro P2000 running in TCC mode alongside a Quadro RTX 4000 running in WDDM mode. The obvious requirement is that the Quadro P2000 is not driving a display. I only use Quadro GPUs, so I cannot say whether mixing Quadro GPUs and consumer GPUs would cause an issue, but I cannot think of a reason why it should.
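For reference, the switch is done with nvidia-smi from an administrator shell (the GPU index below is an assumption; check it with the query first, and note that a reboot is required and the GPU must not be driving a display):

```shell
# Query the current driver model (WDDM or TCC) of each GPU.
nvidia-smi --query-gpu=index,name,driver_model.current --format=csv

# Switch GPU 1 (assumed here to be the Quadro P2000) to TCC.
# 1 = TCC, 0 = WDDM; takes effect after a reboot.
nvidia-smi -i 1 -dm 1
```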