I am wondering how unified memory works on multi-GPU systems, both NVLink-based and non-NVLink-based. I have provided a simple code example below. In this case I have 4 GPUs, and I am binding one thread to each GPU using OpenMP; I assume OpenMP makes the kernel submissions to the different GPUs concurrent. Each kernel does some computation on a different address range of the same arrays "a" and "b".
My questions are:
Is this code valid on both kinds of systems?
Will the CUDA runtime do P2P GPU transfers for array "a", even though it is only read here?
Will the CUDA runtime do P2P GPU transfers for array "b"?
Thanks in advance
#include <cuda_runtime.h>

__global__ void kernel(int *a, int *b, int n) {
    // grid-stride loop so a single-block launch still covers all n elements
    for (int tid = blockIdx.x * blockDim.x + threadIdx.x; tid < n; tid += gridDim.x * blockDim.x)
        b[tid] = a[tid];
}

int main(int argc, char const *argv[]) {
    int *a, *b, n = 65536;
    cudaMallocManaged(&a, n * sizeof(int));   // pass the address of the pointer
    cudaMallocManaged(&b, n * sizeof(int));
    int num_of_gpu = 4;
    #pragma omp parallel for
    for (int i = 0; i < num_of_gpu; ++i) {
        cudaSetDevice(i);
        // each thread/GPU works on its own quarter of a and b
        kernel<<<1, 1024>>>(&a[(n / num_of_gpu) * i], &b[(n / num_of_gpu) * i], n / num_of_gpu);
        cudaDeviceSynchronize();
    }
    cudaFree(a);
    cudaFree(b);
    return 0;
}
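One way to see how a given system will behave with this pattern is to query the relevant device attributes at runtime. Below is a minimal sketch (not part of the original question) using standard CUDA runtime calls; it simply reports, for each GPU, whether concurrent managed access is supported and whether each GPU pair has peer (NVLink or PCIe P2P) access:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    for (int i = 0; i < ndev; ++i) {
        int concurrent = 0;
        // non-zero where GPUs can access managed memory concurrently with the CPU
        // (typically Pascal or newer on Linux); zero on Windows and pre-Pascal GPUs
        cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, i);
        printf("GPU %d concurrentManagedAccess = %d\n", i, concurrent);
        for (int j = 0; j < ndev; ++j) {
            if (i == j) continue;
            int canPeer = 0;
            // non-zero if GPU i can directly access GPU j's memory (NVLink or PCIe P2P)
            cudaDeviceCanAccessPeer(&canPeer, i, j);
            printf("  GPU %d -> GPU %d peer access: %d\n", i, j, canPeer);
        }
    }
    return 0;
}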
How to allocate unified memory that can use all available GPU memory?
I have found that unified memory only considers one GPU when calling cudaMallocManaged, even when there are multiple GPUs in the system.
It can’t be done. There is no facility in CUDA to create a single allocation (one pointer) where part of it is associated with one GPU and part with another.
Having said that, you can allocate unified memory that is larger than the memory available on a single GPU. That requires an oversubscription-ready system (a Pascal or newer GPU, and Linux) with sufficient host memory, and in a multi-GPU setup there are additional issues to consider. These issues depend on the system configuration (whether or not the GPUs are on the same fabric). You can get insight by reading the managed memory section of the programming guide. I’ve already provided the links above.
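For illustration, here is a minimal sketch of oversubscribing a single managed allocation, assuming an oversubscription-ready system (Pascal or newer GPU, Linux) with enough host memory. The ~1.5x sizing relative to GPU 0 and the touch kernel are arbitrary choices for the example, not part of the answer above:

#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(float *p, size_t n) {
    // grid-stride loop: touching a page faults it into GPU memory on demand
    for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += size_t(gridDim.x) * blockDim.x)
        p[i] = 1.0f;
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // assumption: allocate ~1.5x the memory of GPU 0; requires enough host RAM
    // and an oversubscription-ready system (Pascal or newer, Linux)
    size_t bytes = prop.totalGlobalMem + prop.totalGlobalMem / 2;
    size_t n = bytes / sizeof(float);

    float *buf = nullptr;
    if (cudaMallocManaged(&buf, n * sizeof(float)) != cudaSuccess) {
        printf("allocation failed\n");
        return 1;
    }

    cudaSetDevice(0);
    touch<<<1024, 256>>>(buf, n);   // pages are migrated/evicted as the kernel runs
    cudaDeviceSynchronize();

    cudaFree(buf);
    return 0;
}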