What happens with cudaMemcpyAsync if multi-GPU code runs on just 1 GPU?

I’ve got code designed for multi-GPU use, and it uses cudaMemcpyAsync for asynchronous data transfer between devices. But what happens if I run this code with just 1 GPU, given that at compile time the compiler does not know how many GPUs will be used (this is supplied at run time through a config file)?

Does the compiler account for this and place a branch in the compiled code, so that at run time, once the number of devices is known, no call to cudaMemcpyAsync is made if only 1 GPU is present?

No. The compiler doesn’t consider anything about the currently installed GPUs, or whether any GPUs will be available at all when the code is run.

There’s no general answer to this question. You would have to study the code call by call to determine how it would behave.

For example, a call like this:

```
cudaSetDevice(1);
```

would fail (return an error) on a machine with only a single GPU, and the currently selected GPU for subsequent CUDA API operations would still be device 0.
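To make such code robust on a single-GPU machine, one common pattern is to query the device count at run time and branch on it, rather than expecting the compiler to do anything. The sketch below is illustrative (the config-file value `requestedGpus` is a hypothetical stand-in for however your code reads its configuration), not code from the original question:

```
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    // Hypothetical value normally read from the run-time config file
    int requestedGpus = 2;

    if (requestedGpus > deviceCount) {
        fprintf(stderr,
                "Config requests %d GPUs but only %d present; "
                "falling back to %d GPU(s).\n",
                requestedGpus, deviceCount, deviceCount);
        requestedGpus = deviceCount;
    }

    if (requestedGpus > 1) {
        // Multi-GPU path: cudaSetDevice(...) on each device,
        // device-to-device transfers via cudaMemcpyAsync / cudaMemcpyPeerAsync
    } else {
        // Single-GPU path: skip inter-device transfers entirely
    }
    return 0;
}
```

The key point is that this branch lives in your source code: if you don’t write it, the compiled program will issue every CUDA API call regardless of how many GPUs are present, and calls targeting absent devices will simply return errors.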