CUDA memory management in multi-GPU programming

I am developing a multi-GPU CUDA program and have some questions about memory management.

  1. Is there Unified Memory that can be shared between CUDA-enabled GPUs?
    As far as I know, the recently introduced Unified Memory addressing is for sharing between a GPU and the CPU.
  2. How can I make a local variable in a CUDA kernel, or a global variable, into unified memory between the CPU and GPU, without using cudaMallocManaged?
  3. When several GPUs exist, which GPU holds a local variable in a CUDA kernel, or a global variable?
    And also, how can I place them on a specific GPU?
    Thanks in advance.
  1. Yes. If you read the programming guide section on Unified Memory, there is a whole section on behavior in a multi-GPU environment.
  2. I assume by saying “not using cudaMallocManaged” you mean “I don’t want to use Unified Memory”. If you then want a variable to be “unified memory between cpu and gpu” (your question is a bit confusing), you might also want to look at using cudaHostAlloc. If you’re simply asking how to create a variable in UM without calling cudaMallocManaged, the other option is a static declaration with __device__ __managed__.
  3. If you declare a statically allocated __device__ variable, then a separate instance of that variable will be created on each GPU that has been initialized in the CUDA context. The one you actually access will be determined by:
  • in device code, where the kernel is executing
  • in host code, the most recent cudaSetDevice() call prior to cudaMemcpyToSymbol()/cudaMemcpyFromSymbol()

Otherwise, if you dynamically allocate via cudaMalloc, the allocation will be created on whichever GPU was selected by the most recent call to cudaSetDevice(). Local variables are local to the execution of whatever kernel they are in, so they live on whatever device that kernel is executing on.
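A minimal sketch of the static case described above (variable name and values are illustrative, not from the original post): a statically declared __device__ variable gets a separate instance on each visible GPU, and cudaSetDevice() chooses which instance a subsequent symbol copy touches.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int dValue;  // one separate copy of this symbol per visible GPU

int main() {
    int nDevices = 0;
    cudaGetDeviceCount(&nDevices);

    // Write a different value into each GPU's copy of dValue.
    for (int d = 0; d < nDevices; ++d) {
        cudaSetDevice(d);  // selects which GPU's copy the next symbol op targets
        int v = 100 + d;
        cudaMemcpyToSymbol(dValue, &v, sizeof(v));
    }

    // Read the copies back: each GPU holds its own independent value.
    for (int d = 0; d < nDevices; ++d) {
        cudaSetDevice(d);
        int v = 0;
        cudaMemcpyFromSymbol(&v, dValue, sizeof(v));
        printf("GPU %d: dValue = %d\n", d, v);
    }
    return 0;
}
```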

You’ll find answers to questions like these in the CUDA Programming Guide.
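As a sketch of the two alternatives mentioned in answer 2 (variable names are illustrative): a __device__ __managed__ declaration gives you Unified Memory without any cudaMallocManaged call, and cudaHostAlloc gives you pinned, mapped host memory when you don't want UM at all.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Static Unified Memory: no cudaMallocManaged call needed.
__device__ __managed__ int mCounter = 0;

__global__ void bump() { atomicAdd(&mCounter, 1); }

int main() {
    bump<<<1, 32>>>();
    cudaDeviceSynchronize();
    printf("mCounter = %d\n", mCounter);  // host code reads the same variable directly

    // Alternative without UM: pinned, mapped host memory via cudaHostAlloc.
    // Kernels can reach it through the mapped pointer (with UVA, often the
    // same pointer; otherwise via cudaHostGetDevicePointer).
    int *hBuf = nullptr;
    cudaHostAlloc(&hBuf, sizeof(int), cudaHostAllocMapped);
    *hBuf = 42;
    cudaFreeHost(hBuf);
    return 0;
}
```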

Thank you txbob.
For example, suppose the following is declared:

__device__ int nGlobalVariable;
__device__ int nGlobalArray[3] = {1, 2, 3};
In this case, which GPU do nGlobalVariable and nGlobalArray reside on?
And also, how can I make them reside in a specific GPU’s memory?

They exist on every GPU that was present and visible when the CUDA context was initialized (a separate copy on each GPU).

If you want them to exist on only a single GPU, do not allocate them statically; instead use cudaSetDevice() to select the GPU you want, then use cudaMalloc().
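That suggestion could look like the following sketch (the device id and array contents are illustrative, reusing the poster's example data):

```cuda
#include <cuda_runtime.h>

int main() {
    // Place the allocation on a specific GPU (GPU 1 here, as an example).
    cudaSetDevice(1);  // subsequent cudaMalloc calls target GPU 1
    int *dArray = nullptr;
    cudaMalloc(&dArray, 3 * sizeof(int));

    // Initialize it from the host, mirroring nGlobalArray[3] = {1, 2, 3}.
    int h[3] = {1, 2, 3};
    cudaMemcpy(dArray, h, sizeof(h), cudaMemcpyHostToDevice);

    // ... launch kernels on device 1 that use dArray ...

    cudaFree(dArray);
    return 0;
}
```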

Oh, thank you very much. That’s exactly what I wanted to know.