Help understanding the changes in the CUDA runtime API cudaSetDeviceFlags from v10.2 to v11.0

Hi, from v10.2 to v11.0, the documentation of the CUDA runtime API cudaSetDeviceFlags was changed:

v10.2

Records flags as the flags to use when initializing the current device. If no device has been made current to the calling thread, then flags will be applied to the initialization of any device initialized by the calling host thread, unless that device has had its initialization flags set explicitly by this or any host thread.

If the current device has been set and that device has already been initialized then this call will fail with the error cudaErrorSetOnActiveProcess. In this case it is necessary to reset device using cudaDeviceReset() before the device’s initialization flags may be set.

v11.0

Records flags as the flags for the current device. If the current device has been set and that device has already been initialized, the previous flags are overwritten. If the current device has not been initialized, it is initialized with the provided flags. If no device has been made current to the calling thread, a default device is selected and initialized with the provided flags.

My project uses only one host thread and one GPU, but it needs to run on multi-GPU clusters, some of which unfortunately still use v10.2. It seems that the behavior of this API has changed. In v11.0, after this call returns, an initialized device is always current to the calling thread. In v10.2, the call guarantees neither that any device is initialized nor that any device is made current to the calling thread. Please correct me if I am wrong.

Another observation is that in v11.0, setting the flags of a current, initialized device no longer fails with cudaErrorSetOnActiveProcess, whereas in v10.2 it does (I have seen this error on the clusters running v10.2). I think this is why, up to v10.2, some developers suggested setting the flags before calling cudaSetDevice. However, two things still confuse me:

(1) Nothing in the documentation implies that cudaSetDevice initializes the device in addition to making it current to the calling thread, so my understanding is that setting the flags of a current but uninitialized device is still safe in v10.2.

(2) In v11.0, since the device I want to use may differ from the default device, I should call cudaSetDevice before cudaSetDeviceFlags so that the unused default device stays uninitialized. That is the opposite of the order suggested above for v10.2.

What would be the best way to code this so that it works with both v10.2 and v11.0? I hope these questions can be clarified.

Thanks.

One possible approach:

You can conditionally select your code path based on the CUDA runtime version, which can be queried at runtime with cudaRuntimeGetVersion().
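If the two versions really do need different code paths, the branch can be driven by the value that cudaRuntimeGetVersion() writes, which the runtime encodes as 1000 * major + 10 * minor (10020 for 10.2, 11000 for 11.0). Below is a small, CUDA-independent sketch of that decision. The names CudaVersion, decodeCudaVersion, and setFlagsOverwritesActiveDevice are hypothetical. Treating every major version from 11 onward as having the v11.0 semantics is also an assumption.

```cpp
// Major/minor parts of a CUDA runtime version number.
struct CudaVersion {
    int major;
    int minor;
};

// Splits a version encoded as 1000 * major + 10 * minor,
// e.g. 10020 -> {10, 2} and 11000 -> {11, 0}.
CudaVersion decodeCudaVersion(int encoded) {
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Hypothetical policy helper: true when cudaSetDeviceFlags is assumed to
// follow the v11.0 documentation (flags of an initialized device are
// overwritten), false for the v10.2 documentation (the call fails with
// cudaErrorSetOnActiveProcess on an initialized device).
bool setFlagsOverwritesActiveDevice(int encodedRuntimeVersion) {
    return decodeCudaVersion(encodedRuntimeVersion).major >= 11;
}
```

In a real program, the argument would come from int v = 0; cudaRuntimeGetVersion(&v); after checking that the call returned cudaSuccess.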