Multiple GPUs and unified memory

Hello,

I am wondering how unified memory works on multi-GPU systems, both with and without NVLink. I have provided a simple code example below. In this case, I have 4 GPUs and I bind one OpenMP thread to each GPU, so I expect OpenMP to make the kernel submissions to the different GPUs concurrent. Each kernel does some computation on a different offset into the same arrays “a” and “b”.

My questions are:

  • Is this code valid on both kinds of systems?
  • Will the CUDA runtime do P2P GPU transfers for array “a”, even though it is only read here?
  • Will the CUDA runtime do P2P GPU transfers for array “b”?

Thanks in advance

__global__ void kernel(int *a, int *b, int n) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
  if (tid < n)
    b[tid] = a[tid];
}

int main(int argc, char const *argv[]) {
  int *a, *b, n = 65536;
  cudaMallocManaged(&a, n * sizeof(int));  // pass the address of the pointer
  cudaMallocManaged(&b, n * sizeof(int));

  int num_of_gpu = 4;
  int chunk = n / num_of_gpu;  // elements per GPU

  // one OpenMP thread per GPU; each thread launches on its own device
  #pragma omp parallel for
  for (int i = 0; i < num_of_gpu; ++i) {
    cudaSetDevice(i);
    kernel<<< (chunk + 255) / 256, 256 >>>(&a[chunk * i], &b[chunk * i], chunk);
    cudaDeviceSynchronize();  // wait for this device's kernel to finish
  }

  cudaFree(a);
  cudaFree(b);
  return 0;
}
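
For reference, whether a direct P2P path even exists between two devices can be queried at runtime. A minimal standalone sketch using the standard peer-access query (it assumes nothing beyond the devices visible to the runtime):

#include <cstdio>

int main() {
  int num_gpus = 0;
  cudaGetDeviceCount(&num_gpus);

  // check every ordered pair of devices for direct peer access
  for (int i = 0; i < num_gpus; ++i) {
    for (int j = 0; j < num_gpus; ++j) {
      if (i == j) continue;
      int can_access = 0;
      cudaDeviceCanAccessPeer(&can_access, i, j);
      printf("GPU %d -> GPU %d : P2P %s\n", i, j,
             can_access ? "available" : "not available");
    }
  }
  return 0;
}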

You may want to read the documentation:

[url]http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-advanced-topics-hd[/url]

There’s not enough information in your question to answer it. NVLink doesn’t make code that was previously valid become “invalid”.
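
To illustrate the kind of control those sections describe, here is a sketch using the standard managed-memory hint APIs (cudaMemAdvise and cudaMemPrefetchAsync), reusing the names from your code and assuming a Linux system with Pascal or newer GPUs. These hints are optional tuning, not a correctness requirement:

// "a" is only read by the kernels: marking it read-mostly lets each GPU
// keep its own read-only copy instead of migrating pages back and forth
// (the device argument is ignored for this advice)
cudaMemAdvise(a, n * sizeof(int), cudaMemAdviseSetReadMostly, 0);

int chunk = n / num_of_gpu;
for (int i = 0; i < num_of_gpu; ++i) {
  // prefer that each GPU's slice of "b" lives on the GPU that writes it
  cudaMemAdvise(&b[chunk * i], chunk * sizeof(int),
                cudaMemAdviseSetPreferredLocation, i);
  // optionally move each slice of both arrays into place up front
  cudaMemPrefetchAsync(&a[chunk * i], chunk * sizeof(int), i);
  cudaMemPrefetchAsync(&b[chunk * i], chunk * sizeof(int), i);
}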


How to allocate unified memory that can use all available GPU memory?
I have found that unified memory only considers one GPU when calling cudaMallocManaged, even when there are multiple GPUs on the system.

It can’t be done. There is no facility in CUDA to have a single pointer allocation, part of which is associated with one GPU and part with another.

Having said that, you can allocate unified memory that is larger than the memory available on a single GPU. That requires an oversubscription-ready system (a Pascal or newer GPU, and Linux) with sufficient host memory, and in a multi-GPU setup there are additional issues to consider. These issues depend on the system configuration (whether or not the GPUs are on the same fabric). You can get insight by reading the managed memory section of the programming guide. I’ve already provided the links above.
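
For illustration, a minimal oversubscription sketch, assuming a single Pascal or newer GPU on Linux with enough free host memory (the 1.5x factor is arbitrary):

#include <cstdio>

__global__ void touch(char *p, size_t n) {
  // grid-stride loop so every page of the allocation gets faulted in
  for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += (size_t)gridDim.x * blockDim.x)
    p[i] = 1;
}

int main() {
  size_t free_b = 0, total_b = 0;
  cudaMemGetInfo(&free_b, &total_b);

  // deliberately allocate more managed memory than the GPU has
  size_t n = total_b + total_b / 2;
  char *p = nullptr;
  if (cudaMallocManaged(&p, n) != cudaSuccess) {
    printf("allocation failed\n");
    return 1;
  }

  touch<<<256, 256>>>(p, n);  // pages migrate/evict on demand
  cudaDeviceSynchronize();
  cudaFree(p);
  return 0;
}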