Two Quadro P4000. deviceProp.cooperativeLaunch only true if both in TCC. Windows 10, CUDA 9.2

virtamario · July 16, 2019, 3:07pm

Hi,

my computer has two Quadro P4000 cards installed, so far both in WDDM mode. I successfully use the second one for CUDA computation by calling cudaSetDevice(1). Nsight Visual Studio Edition also confirms that CUDA only runs on the second device.
Now I would like to use grid synchronization, for which, as far as I understand, the device needs to be in TCC (which is also reported to reduce latency, which my program would profit from). I switched the second device to TCC by running

C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi -g 1 -dm 1

and rebooting. I then compiled the conjugateGradientMultiBlockCG example using Visual Studio 2017 and added

devID = 1;
checkCudaErrors(cudaSetDevice(devID));

after line 396 to use device 1. However, I get the following error:
Selected GPU (1) does not support Cooperative Kernel Launch, Waiving the run
which is caused by

!deviceProp.cooperativeLaunch
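
For reference, the same capability can be queried in isolation, outside the sample. The following is a minimal sketch, assuming a CUDA 9.x or later runtime and the hard-coded device index 1 from the post (the index is an assumption and may not point at the TCC card); it reads the attribute that corresponds to deviceProp.cooperativeLaunch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Device index taken from the post; this is an assumption and may need
    // to be changed if the CUDA enumeration order differs from nvidia-smi.
    const int devID = 1;

    // Query the single attribute instead of the whole cudaDeviceProp struct.
    int supportsCoop = 0;
    cudaError_t err = cudaDeviceGetAttribute(
        &supportsCoop, cudaDevAttrCooperativeLaunch, devID);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaDeviceGetAttribute failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    std::printf("Device %d cooperative launch supported: %s\n",
                devID, supportsCoop ? "yes" : "no");
    return 0;
}
```

Built with nvcc, this should print "no" for any index at which the sample prints the waive message above.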

Out of curiosity, I also tried putting both GPUs in TCC mode and using my onboard GPU for graphics. In that case, conjugateGradientMultiBlockCG runs successfully out of the box on the first GPU, and on the second GPU after adding the two lines mentioned above. Since I also need graphical output from one of the GPUs, having both in TCC is not feasible.

Is this expected behavior? Is there a way to enable TCC on only one GPU and have it support Cooperative Kernel Launch?

Also, I noticed that with TCC enabled on the second GPU, performance is worse than with both in WDDM. In Nsight, it looked as if something else was running on the second GPU and breaking up the tight packing of my kernels there. Could that be caused by DirectX being used on the first GPU in WDDM?

virtamario · July 16, 2019, 3:13pm

Some more details:

I noticed that nvidia-smi reports CUDA 10.2 and an infoROM corruption:

C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi
Tue Jul 16 16:39:49 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 431.02       Driver Version: 431.02       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P4000       WDDM  | 00000000:01:00.0  On |                  N/A |
| 47%   45C    P0    29W / 105W |    428MiB /  8192MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro P4000        TCC  | 00000000:03:00.0 Off |                  N/A |
| 47%   40C    P8     5W / 105W |      0MiB /  8117MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1504    C+G   Insufficient Permissions                   N/A      |
|    0      4772    C+G   ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A      |
|    0      8800    C+G   ...11411.0_x64__8wekyb3d8bbwe\Video.UI.exe N/A      |
|    0      9200    C+G   ...48.51.0_x64__kzf8qxf38zg5c\SkypeApp.exe N/A      |
|    0      9572    C+G   C:\Windows\explorer.exe                    N/A      |
|    0     10436    C+G   ....451.0_x64__8wekyb3d8bbwe\YourPhone.exe N/A      |
|    0     10468    C+G   ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A      |
|    0     13640    C+G   ...0076.0_x64__8wekyb3d8bbwe\onenoteim.exe N/A      |
|    0     13868    C+G   ...4.0_x64__8wekyb3d8bbwe\WinStore.App.exe N/A      |
|    0     14304    C+G   ...hell.Experiences.TextInput.InputApp.exe N/A      |
+-----------------------------------------------------------------------------+
WARNING: infoROM is corrupted at gpu 0000:01:00.0

However, I am pretty sure I uninstalled CUDA 10 using a Windows System Restore point.
For example, the Visual Studio project configuration reports a CUDA C/C++ > Command Line of

# Driver API (NVCC Compilation Type is .cubin, .gpu, or .ptx)
set CUDAFE_FLAGS=--sdk_dir "C:\Program Files (x86)\Windows Kits\10\"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64" -x cu -rdc=true -I./ -I../../common/inc   -G   --keep-dir x64\Debug -maxrregcount=0  --machine 64 --compile -cudart static  -o x64/Debug/%(Filename)%(Extension).obj "%(FullPath)"

# Runtime API (NVCC Compilation Type is hybrid object or .c file)
set CUDAFE_FLAGS=--sdk_dir "C:\Program Files (x86)\Windows Kits\10\"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64" -x cu -rdc=true -I./ -I../../common/inc   -G   --keep-dir x64\Debug -maxrregcount=0  --machine 64 --compile -cudart static  -g   -DWIN32 -Xcompiler "/EHsc  /nologo  /FS /Zi  /MTd " -o x64/Debug/%(Filename)%(Extension).obj "%(FullPath)"

Robert_Crovella · July 16, 2019, 3:15pm

You might need to set devID to zero. Changing from WDDM to TCC may cause a modification of the CUDA enumeration order.

I wouldn’t be surprised if your TCC device is at devID 0 now, and your WDDM device has moved to 1. (note this is not necessarily the same as the order in nvidia-smi, which is PCI enumeration order)

You can confirm CUDA enumeration order using deviceQuery sample code.
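
If building deviceQuery is inconvenient, a much smaller program can show the same mapping. This is only a sketch, assuming a CUDA 9.x or later runtime; it prints each CUDA device index together with its PCI address (to match against the Bus-Id column in nvidia-smi), its driver model, and the cooperativeLaunch flag.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // Walk the devices in CUDA enumeration order, which is the order
    // that cudaSetDevice() uses.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                         dev, cudaGetErrorString(err));
            return 1;
        }

        // tccDriver is 1 only when the device runs under the TCC driver;
        // 0 means WDDM on Windows (or a non-Windows platform).
        std::printf("CUDA device %d: %s | PCI %04x:%02x:%02x | %s | "
                    "cooperativeLaunch=%d\n",
                    dev, prop.name,
                    prop.pciDomainID, prop.pciBusID, prop.pciDeviceID,
                    prop.tccDriver ? "TCC" : "not TCC",
                    prop.cooperativeLaunch);
    }
    return 0;
}
```

The index printed next to the TCC card is the value to pass to cudaSetDevice. If that index turns out to be 0, setting devID to 0 in the sample should select the TCC device.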
