Hi,
I am now studying CUDA-aware MPI to further speed up my models. I have a few basic questions.
- With OpenACC or stdpar, the same variable name can refer to both a host copy and a device copy. How does MPI know which one to send, the host copy or the device copy? And do the rules differ between -gpu=managed and -gpu=nomanaged? For example, in the code below, does MPI_Send() send the device copy of A? (See also the host_data sketch after the code.)
integer, parameter :: N = 10, steps = 180
integer, allocatable :: A(:), B(:)
integer :: rank, ierr, i
allocate(A(N), B(N))
...
!$ACC ENTER DATA COPYIN(A,B)
if (rank == 0) then
   ! On GPU 0
   !$acc parallel loop present(A)
   do i = 1, N
      A(i) = A(i) + 1
   end do
   ! Send Array A to GPU 1
   call MPI_Send(A, N, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierr)
   ! Receive Array B from GPU 1
   call MPI_Recv(B, N, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
else if (rank == 1) then
   ! On GPU 1
   do concurrent (i = 1:N)
      B(i) = B(i) - 1
   end do
   ! Receive Array A from GPU 0
   call MPI_Recv(A, N, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
   ! Send Array B to GPU 0
   call MPI_Send(B, N, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, ierr)
end if
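For context, the explicit pattern I have seen in CUDA-aware MPI examples wraps the MPI calls in host_data, so that the device address is passed to MPI; I am not sure whether this is required with -gpu=nomanaged or whether the runtime resolves it automatically. A minimal sketch of that pattern:

! host_data use_device makes MPI_Send receive the device address of A
! (sketch assuming OpenACC, with A already present on the device).
!$acc host_data use_device(A)
call MPI_Send(A, N, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierr)
!$acc end host_data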
- Is there a simple program I can use to test whether my MPI installation supports CUDA-aware MPI? And which Open MPI version is the better choice, 4.1.8 or 5.0.7?
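For the first question, here is a minimal check I would try, assuming Open MPI: MPIX_Query_cuda_support() is an Open MPI extension (declared in mpi-ext.h for C), bound here through ISO_C_BINDING; other MPI implementations may not provide this symbol.

program check_cuda_aware
   use mpi
   use iso_c_binding, only: c_int
   implicit none
   ! Open MPI extension; assumed available because we link against libmpi.
   interface
      function MPIX_Query_cuda_support() bind(C, name="MPIX_Query_cuda_support") result(flag)
         import :: c_int
         integer(c_int) :: flag
      end function
   end interface
   integer :: ierr
   call MPI_Init(ierr)
   if (MPIX_Query_cuda_support() == 1) then
      print *, 'This MPI reports CUDA-aware support at run time.'
   else
      print *, 'No CUDA-aware support reported.'
   end if
   call MPI_Finalize(ierr)
end program check_cuda_aware

At install time, ompi_info --parsable --all | grep mpi_built_with_cuda_support:value also shows whether Open MPI was compiled with CUDA support.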
Thanks a lot!