Multi-GPU Unified Memory and Communication

Hi Mat,

I am planning to build a more powerful GPU server. I am debating between a multi-GPU system (with cheaper GPUs) and a single powerful GPU (more expensive).

Before I venture out on that, I am wondering about the following:

  1. whether Unified Memory (UM), when using multiple devices with compute capability (CC) > 6.x, treats the multiple GPUs like a single GPU.
  2. whether GPU-GPU data communication is automatically handled by the compiler, so that I don’t have to make any changes to my single-GPU OpenACC code.

Would appreciate your input on the above. As always, if there is any literature on this, please feel free to point me to it.

Cheers,
Jyoti

Hi Jyoti,

Multiple GPUs are treated separately, so you need to use MPI with each rank assigned to a particular GPU. There are other methods to support multi-GPU, but I find MPI the easiest, and it also lets you scale across multiple systems in the future.

CUDA-aware MPI, which does GPU-direct communication, is enabled by default with the MPI versions we ship with the compilers. However, you need to pass device pointers to the MPI calls by using an OpenACC “host_data” region. Passing UM pointers will work, but MPI won’t recognize these as device pointers, so it won’t use GPU direct. Hence, if using MPI, I recommend you manually manage your data via data regions.
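To give you an idea, here’s a minimal sketch (not from any particular post) of what that pattern looks like in Fortran: a device-resident buffer managed by an explicit data region, with its device address handed to MPI through “host_data” so a CUDA-aware MPI can do the transfer GPU-to-GPU. The buffer name, size, and the ring-style Sendrecv are just illustrative, and it assumes device assignment is done first as in the device_assign.F90 example further down.

      program ring_exchange
      use openacc
      use mpi
      implicit none
      integer, parameter :: n = 1024
      real(8) :: buf(n)
      integer :: i, ierr, my_rank, nranks, next, prev
      integer :: stat(MPI_STATUS_SIZE)

      call MPI_Init(ierr)
      call MPI_Comm_rank (MPI_COMM_WORLD, my_rank, ierr)
      call MPI_Comm_size (MPI_COMM_WORLD, nranks, ierr)
!
! ****** Device assignment (acc_set_device_num) as in device_assign.F90
! ****** would go here.
!
      next = mod(my_rank+1, nranks)
      prev = mod(my_rank-1+nranks, nranks)
!
! ****** Manually managed data region: buf lives on the device here.
!
!$acc data copy(buf)
!
! ****** Fill the buffer on the device.
!
!$acc parallel loop
      do i = 1, n
        buf(i) = real(my_rank, 8)
      enddo
!
! ****** host_data exposes the device address of buf to the MPI call,
! ****** so a CUDA-aware MPI can move the data GPU-to-GPU (GPU direct).
!
!$acc host_data use_device(buf)
      call MPI_Sendrecv_replace (buf, n, MPI_REAL8, next, 0, &
                                 prev, 0, MPI_COMM_WORLD, stat, ierr)
!$acc end host_data
!$acc end data

      print *, "RANK: ", my_rank, " received buf(1) = ", buf(1)
      call MPI_Finalize(ierr)
      end program ring_exchange

Without the host_data region, MPI would be handed the host copy of buf and you’d pay for extra host-device transfers instead of the direct GPU-to-GPU path.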

See the following post for the code I use for device assignment, as well as links to some training:

-Mat

Wonderful! Thanks, Mat!

Oh, there’s also a class you can take which is derived from Jiri’s talks: https://developer.nvidia.com/openacc-advanced-course

Also, since the link I pointed you to uses C/C++, here’s the equivalent Fortran version to do device assignment.

% cat device_assign.F90
!#######################################################################
      program device_assign
!
!-----------------------------------------------------------------------
!
      use openacc
      use mpi
      implicit none
      integer :: dev, devNum, local_rank, local_comm
      integer(acc_device_kind) :: devtype
      integer :: ierr, my_rank
!
!-----------------------------------------------------------------------
!
! ****** Initialize MPI.
!
      call MPI_Init(ierr)
!
! ****** Get the index (rank) of the local processor in
! ****** communicator MPI_COMM_WORLD in variable my_rank.
!
      call MPI_Comm_rank (MPI_COMM_WORLD,my_rank,ierr)
!
! ****** Set the Accelerator device number based on local rank.
!
      call MPI_Comm_split_type (MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                                MPI_INFO_NULL, local_comm, ierr)
      call MPI_Comm_rank (local_comm, local_rank, ierr)
      devtype = acc_get_device_type()
      devNum = acc_get_num_devices(devtype)
      dev = mod(local_rank, devNum)
      call acc_set_device_num(dev, devtype)
      print *, "RANK: ", my_rank, " Using device: ", dev

      call MPI_Finalize(ierr)
      end program device_assign
% mpif90 -acc device_assign.F90; mpirun -np 4 a.out
 RANK:             0  Using device:             0
 RANK:             2  Using device:             0
 RANK:             1  Using device:             1
 RANK:             3  Using device:             1
