My computer has two Quadro P4000 cards installed, so far both in WDDM mode. I successfully use the second one for compute with CUDA by calling cudaSetDevice(1), and Nsight Visual Studio Edition confirms that CUDA only runs on the second device.
Now I would like to use grid synchronization, for which, as far as I understand, the device needs to be in TCC mode (TCC is also reported to reduce latency, which my program would benefit from). I switched the second device to TCC by running
C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi -g 1 -dm 1
and rebooting. I then compiled the conjugateGradientMultiBlockCG sample with Visual Studio 2017 and added
devID = 1; checkCudaErrors(cudaSetDevice(devID));
after line 396 to select device 1 (the second GPU). However, I get the following error:
Selected GPU (1) does not support Cooperative Kernel Launch, Waiving the run
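To see which CUDA ordinal actually reports cooperative launch support, a small enumeration program can help. The sketch below is not part of the sample; it only uses the CUDA runtime calls cudaGetDeviceCount, cudaGetDeviceProperties and cudaDeviceGetAttribute. One assumption worth checking with it: the index given to nvidia-smi follows PCI bus order, while the CUDA runtime's default ordering is not guaranteed to match it, so CUDA device 1 is not necessarily the card that was switched to TCC. Printing the PCI bus ID next to each ordinal makes the mapping visible.

```cuda
// Sketch: list every CUDA device with its PCI bus ID, driver model,
// and whether it reports cooperative launch support.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "device %d: cudaGetDeviceProperties failed: %s\n",
                         dev, cudaGetErrorString(err));
            continue;
        }

        // Same capability the sample tests before printing
        // "does not support Cooperative Kernel Launch".
        int coop = 0;
        err = cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "device %d: cudaDeviceGetAttribute failed: %s\n",
                         dev, cudaGetErrorString(err));
            continue;
        }

        std::printf("CUDA device %d: %s | PCI bus %d | driver model: %s | "
                    "cooperative launch: %s\n",
                    dev, prop.name, prop.pciBusID,
                    prop.tccDriver ? "TCC" : "not TCC (WDDM on Windows)",
                    coop ? "yes" : "no");
    }
    return 0;
}
```

If the TCC card shows up under ordinal 0 here, then cudaSetDevice(1) would be selecting the WDDM card, which would explain the message above.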
Out of curiosity, I also tried putting both GPUs in TCC mode and using my onboard GPU for graphics. In that case, conjugateGradientMultiBlockCG runs successfully out of the box on the first GPU, and on the second GPU after adding the two statements mentioned above. Since I also need graphical output from one of the GPUs, having both in TCC is not feasible.
Is this expected behavior? Is there a way to enable TCC on only one GPU and have it support Cooperative Kernel Launch?
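To rule out anything specific to the sample, a standalone grid synchronization test could look like the sketch below. It is a minimal illustration, not the sample's code: the kernel name twoPhase, the CHECK macro and the command-line device argument are made up for this example. It writes an array in one phase, calls grid.sync(), then has every thread read an element written by a different part of the grid, launching with cudaLaunchCooperativeKernel and a grid sized to fit on the device all at once. Depending on the toolkit version, grid synchronization may require compiling with relocatable device code, for example nvcc -rdc=true -arch=sm_61 for the P4000.

```cuda
// Sketch: minimal cooperative launch test on a device chosen via argv[1].
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Hypothetical helper: abort with a message on any CUDA runtime error.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t e_ = (call);                                        \
        if (e_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s failed: %s\n", #call,              \
                         cudaGetErrorString(e_));                       \
            std::exit(1);                                               \
        }                                                               \
    } while (0)

__global__ void twoPhase(int *a, int *b, int n) {
    cg::grid_group grid = cg::this_grid();
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;

    // Phase 1: every thread fills its share of a.
    for (int i = tid; i < n; i += stride) a[i] = i;

    // Wait until all blocks in the grid have finished phase 1.
    grid.sync();

    // Phase 2: read the mirrored element, usually written by another block.
    for (int i = tid; i < n; i += stride) b[i] = a[n - 1 - i];
}

int main(int argc, char **argv) {
    int dev = (argc > 1) ? std::atoi(argv[1]) : 0;
    CHECK(cudaSetDevice(dev));

    int coop = 0;
    CHECK(cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, dev));
    if (!coop) {
        std::printf("Device %d does not report cooperative launch support\n", dev);
        return 2;
    }

    const int n = 1 << 20;
    const int threads = 256;

    // A cooperative grid must be fully resident, so size it from occupancy.
    int blocksPerSm = 0, smCount = 0;
    CHECK(cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, twoPhase,
                                                        threads, 0));
    CHECK(cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, dev));
    dim3 grid(blocksPerSm * smCount), block(threads);

    int *a = nullptr, *b = nullptr;
    CHECK(cudaMalloc(&a, n * sizeof(int)));
    CHECK(cudaMalloc(&b, n * sizeof(int)));

    int nArg = n;
    void *args[] = { &a, &b, &nArg };
    CHECK(cudaLaunchCooperativeKernel((void *)twoPhase, grid, block, args, 0, 0));
    CHECK(cudaDeviceSynchronize());

    std::vector<int> h(n);
    CHECK(cudaMemcpy(h.data(), b, n * sizeof(int), cudaMemcpyDeviceToHost));

    int errors = 0;
    for (int i = 0; i < n; ++i)
        if (h[i] != n - 1 - i) ++errors;
    std::printf("Device %d: %d mismatches\n", dev, errors);

    CHECK(cudaFree(a));
    CHECK(cudaFree(b));
    return errors ? 1 : 0;
}
```

Running this once per ordinal (0 and 1) in the mixed WDDM/TCC configuration would show whether the cooperative launch itself fails or only the capability check in the sample.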
Also, I noticed that with TCC enabled on the second GPU, performance is worse than with both GPUs in WDDM. In Nsight it looks as if something else is running on the second GPU and breaking up the tight packing of my kernels there. Could that be caused by DirectX being used on the first GPU in WDDM?