I tried to run the 6_Advanced/reductionMultiBlockCG sample on my GTX 1080 and got this message: "Selected GPU does not support Cooperative Kernel Launch". I thought any GPU with the Pascal or Volta architecture supports Cooperative Groups. Does grid synchronization work only on Volta GPUs?
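The sample presumably decides to print that message from the device's cooperative launch capability (the same deviceProp.cooperativeLaunch field mentioned later in this thread). A minimal sketch of an equivalent check, assuming the CUDA runtime API; supportsCooperativeLaunch is a hypothetical helper, and the sample itself may select its device differently:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if the device reports support for
// cudaLaunchCooperativeKernel (same value as cudaDeviceProp::cooperativeLaunch).
bool supportsCooperativeLaunch(int device) {
    int value = 0;
    if (cudaDeviceGetAttribute(&value, cudaDevAttrCooperativeLaunch, device) != cudaSuccess) {
        return false;  // treat a failed query (bad index, no driver) as unsupported
    }
    return value != 0;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA device found\n");
        return 0;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        std::printf("Device %d (%s, sm_%d%d): cooperativeLaunch = %d\n",
                    dev, prop.name, prop.major, prop.minor,
                    supportsCooperativeLaunch(dev) ? 1 : 0);
    }
    return 0;
}
```

Running this on both machines shows whether the difference comes from what the driver reports rather than from the sample.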
Were you able to run the Cooperative Groups-based reduction kernel on the GTX 1080 (Windows machine)? I'm able to run it on a Titan X (Linux machine). The 1080 and the Titan X are the same generation (sm_61), so I'm not sure why the 1080 displays "Selected GPU does not support Cooperative Kernel Launch".
It works on a Linux machine with a GTX 1080. I'm not sure why it says it's not supported on the Windows machine. Has anyone seen a similar issue on Windows?
Just a guess: The WDDM driver on Windows forces the CUDA driver into kernel batching to mitigate the very high launch latencies caused by WDDM, and this interferes with cooperative kernel launch.
The way to check this hypothesis would be to switch to the TCC driver (which operates the GPU as a "3D controller" rather than a "VGA" device), but the TCC driver is not supported on consumer GPUs like the GTX 1080.
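A quick way to see which driver model each device is using, and whether it reports cooperative launch support, is to query the cudaDevAttrTccDriver and cudaDevAttrCooperativeLaunch attributes side by side. This is a sketch assuming the CUDA runtime API (driverModelLabel is a hypothetical helper); it only reports what the driver says and does not by itself confirm the batching hypothesis.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: label for the value of cudaDevAttrTccDriver.
// On Linux this attribute is 0, so "not TCC" does not imply WDDM there.
const char* driverModelLabel(int tcc) {
    return tcc ? "TCC" : "not TCC (WDDM on Windows)";
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA device found\n");
        return 0;
    }
    for (int dev = 0; dev < count; ++dev) {
        int tcc = 0, coop = 0;
        cudaDeviceGetAttribute(&tcc, cudaDevAttrTccDriver, dev);
        cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, dev);
        std::printf("Device %d: driver model = %s, cooperativeLaunch = %d\n",
                    dev, driverModelLabel(tcc), coop);
    }
    return 0;
}
```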
Does that mean cooperative kernel launch is not supported on Windows? The CUDA toolkit documentation doesn't mention that anywhere.
It's currently an omission in the docs. I expect future CUDA docs to state the following:
You need to be running on Linux (without MPS), or on current versions of Windows with the device in TCC mode, in order to use cooperative kernel launch.
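For anyone who wants to exercise grid synchronization directly rather than through the sample, here is a minimal sketch of a cooperative launch. The kernel twoPhaseSum and the helper maxCooperativeBlocks are hypothetical names for illustration. The program checks cudaDevAttrCooperativeLaunch first, sizes the grid with the occupancy calculator so that every block can be co-resident (which cudaLaunchCooperativeKernel requires), and uses grid.sync() between the two phases. It assumes a compute capability 6.0+ build target (e.g. nvcc -arch=sm_61 for the GTX 1080); older CUDA toolkits may also require -rdc=true for grid.sync().

```cuda
#include <cstdio>
#include <cooperative_groups.h>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

// Hypothetical helper: max blocks that can be resident at once on the device.
int maxCooperativeBlocks(int blocksPerSm, int smCount) {
    return blocksPerSm * smCount;
}

// Hypothetical kernel: phase 1 counts threads per block into partials;
// after grid.sync(), the first thread of the grid sums all partials.
__global__ void twoPhaseSum(int* partials, int* total) {
    cg::grid_group grid = cg::this_grid();
    if (threadIdx.x == 0) partials[blockIdx.x] = 0;
    __syncthreads();
    atomicAdd(&partials[blockIdx.x], 1);
    grid.sync();  // all blocks have finished phase 1
    if (grid.thread_rank() == 0) {
        int sum = 0;
        for (unsigned b = 0; b < gridDim.x; ++b) sum += partials[b];
        *total = sum;
    }
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA device found\n");
        return 0;
    }
    const int device = 0;
    int coop = 0, smCount = 0, blocksPerSm = 0;
    cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, device);
    if (!coop) {
        std::printf("Device %d does not support cooperative launch\n", device);
        return 0;
    }
    cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, device);
    const int threads = 128;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, twoPhaseSum, threads, 0);
    const int blocks = maxCooperativeBlocks(blocksPerSm, smCount);

    int* partials = nullptr;
    int* total = nullptr;
    cudaMalloc(&partials, blocks * sizeof(int));
    cudaMalloc(&total, sizeof(int));

    void* args[] = {&partials, &total};
    cudaError_t err = cudaLaunchCooperativeKernel((void*)twoPhaseSum, blocks, threads, args, 0, 0);
    int host = 0;
    if (err == cudaSuccess) err = cudaMemcpy(&host, total, sizeof(int), cudaMemcpyDeviceToHost);
    std::printf("launch: %s, total = %d (expected %d)\n",
                cudaGetErrorString(err), host, blocks * threads);

    cudaFree(partials);
    cudaFree(total);
    return 0;
}
```

On a device whose cooperative launch attribute is 0, the program stops before launching instead of failing inside cudaLaunchCooperativeKernel.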
I'm trying to run the samples on a Linux machine with a GTX 770 (compute capability 3.0), but deviceProp.cooperativeLaunch returns 0. Is cooperative launch not supported on this older card, or is there some setup I need to perform on Linux (nvidia-smi --compute-mode, etc.)?
Thank you all for the help.