cudaLaunchCooperativeKernelMultiDevice fails with invalid device ordinal

armagetron · November 13, 2017, 9:20am

I have adapted the vectorAdd example from the CUDA SDK to try different kernel launch semantics. See code.

Everything works except for the last call to

cudaLaunchCooperativeKernelMultiDevice(launchParams, numDevices)

I get the error: invalid device ordinal. My system has two GPUs installed, and as far as I understand the launch should succeed.

Can someone please give me a hint about what is wrong with my code?

Robert_Crovella · November 13, 2017, 8:15pm

What are the two GPUs, specifically?

armagetron · November 14, 2017, 7:44am

The GPUs are two Tesla P100-SXM2-16GB.

I managed to get the code working. The problem was that the streams need to be created per device (each one after a call to cudaSetDevice for its device).

In general, it would be nice to have an example in the SDK samples showing how to use this function.

Robert_Crovella · November 14, 2017, 7:51am

Yes, streams (and events) are per-device entities. This is covered in the programming guide as well as CUDA multi-GPU sample code e.g. simpleMultiGPU.
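
For readers who hit the same error, here is a minimal sketch of the fix described above: each stream is created while its own device is current, and that stream goes into the matching cudaLaunchParams entry. This is not the original poster's code. The grid-stride vectorAdd kernel, the CHECK macro, the buffer names, and the grid and block sizes are assumptions chosen for illustration. It also assumes that all devices are identical and report cudaDevAttrCooperativeMultiDeviceLaunch, and that the toolkit still provides this API (it is marked deprecated as of CUDA 11.3).

```cuda
#include <array>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(e));                           \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Grid-stride loop so a small grid (all blocks co-resident) covers n elements.
__global__ void vectorAdd(const float *a, const float *b, float *c, int n)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        c[i] = a[i] + b[i];
}

int main()
{
    int numDevices = 0;
    CHECK(cudaGetDeviceCount(&numDevices));

    int n = 1 << 16;                 // elements per device (assumed size)
    const dim3 grid(8), block(128);  // small on purpose: a cooperative launch
                                     // needs all blocks resident at once

    std::vector<cudaStream_t> streams(numDevices);
    std::vector<float *> dA(numDevices), dB(numDevices), dC(numDevices);
    std::vector<std::array<void *, 4>> args(numDevices);
    std::vector<cudaLaunchParams> params(numDevices);

    for (int dev = 0; dev < numDevices; ++dev) {
        // Make the device current BEFORE creating its stream and buffers.
        CHECK(cudaSetDevice(dev));

        int supported = 0;
        CHECK(cudaDeviceGetAttribute(
            &supported, cudaDevAttrCooperativeMultiDeviceLaunch, dev));
        if (!supported) {
            printf("device %d: cooperative multi-device launch unsupported\n", dev);
            return 0;
        }

        // The stream belongs to device `dev`; the NULL stream is not allowed here.
        CHECK(cudaStreamCreate(&streams[dev]));

        CHECK(cudaMalloc(&dA[dev], n * sizeof(float)));
        CHECK(cudaMalloc(&dB[dev], n * sizeof(float)));
        CHECK(cudaMalloc(&dC[dev], n * sizeof(float)));
        CHECK(cudaMemset(dA[dev], 0, n * sizeof(float)));
        CHECK(cudaMemset(dB[dev], 0, n * sizeof(float)));

        // Kernel arguments are passed as an array of pointers to each argument.
        args[dev] = {&dA[dev], &dB[dev], &dC[dev], &n};

        params[dev].func      = (void *)vectorAdd;
        params[dev].gridDim   = grid;
        params[dev].blockDim  = block;
        params[dev].args      = args[dev].data();
        params[dev].sharedMem = 0;
        params[dev].stream    = streams[dev];
    }

    CHECK(cudaLaunchCooperativeKernelMultiDevice(
        params.data(), (unsigned int)numDevices, 0));

    for (int dev = 0; dev < numDevices; ++dev) {
        CHECK(cudaSetDevice(dev));
        CHECK(cudaStreamSynchronize(streams[dev]));
        CHECK(cudaFree(dA[dev]));
        CHECK(cudaFree(dB[dev]));
        CHECK(cudaFree(dC[dev]));
        CHECK(cudaStreamDestroy(streams[dev]));
    }
    printf("launched on %d device(s)\n", numDevices);
    return 0;
}
```

If all streams are instead created in a loop without calling cudaSetDevice first, they all belong to the device that happened to be current (device 0 by default), so the launch parameters for the second GPU carry a stream from the wrong device. That mismatch is consistent with the invalid device ordinal error reported in this thread.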