Hi Adrian,
I typically delay using any OpenACC constructs until after I call MPI_Init, so it's unclear why this isn't working correctly for you. That said, I do use the following boilerplate code to set the device number so that each rank uses a different device. Setting the device number is optional, but without it every rank would use the same default device.
I can provide the source if required.
That would be helpful in understanding the issue.
Here’s an example of what I typically do when using MPI+OpenACC. I’m using a system with 4 V100s.
% cat test_mpi_acc.f90
PROGRAM test
  use mpi
  use openacc
  implicit none
  integer :: rank, world_size
  integer :: dev, devNum, local_rank, local_comm
  integer :: devtype, ierr
  integer, dimension(:), allocatable :: Arr
  integer :: asize, i

  ! Initialize MPI before touching any OpenACC runtime routines or constructs
  call MPI_INIT(ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, world_size, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

  ! Split into per-node communicators to get each rank's node-local rank
  call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
       MPI_INFO_NULL, local_comm, ierr)
  call MPI_Comm_rank(local_comm, local_rank, ierr)

  ! Assign devices round-robin by local rank
  devtype = acc_get_device_type()
  devNum = acc_get_num_devices(devtype)
  dev = mod(local_rank, devNum)
  call acc_set_device_num(dev, devtype)

  ! Read the device number back to confirm the assignment
  dev = acc_get_device_num(devtype)
  print *, "Rank ", rank, " using Device ", dev, " out of ", devNum

  asize = 1024
  allocate(Arr(asize))

  ! Fill the array on the device and copy it back to the host
  !$acc kernels loop copyout(Arr)
  do i = 1, asize
     Arr(i) = i + rank
  enddo

  print *, "Rank ", rank, " A(10)=", Arr(10)
  deallocate(Arr)
  call MPI_FINALIZE(ierr)
END PROGRAM
% mpif90 -V21.2 -acc -fast test_mpi_acc.f90
% mpirun -np 4 a.out
Rank 3 using Device 3 out of 4
Rank 0 using Device 0 out of 4
Rank 2 using Device 2 out of 4
Rank 1 using Device 1 out of 4
Rank 0 A(10)= 10
Rank 3 A(10)= 13
Rank 2 A(10)= 12
Rank 1 A(10)= 11
% mpirun -np 8 a.out
Rank 0 using Device 0 out of 4
Rank 1 using Device 1 out of 4
Rank 3 using Device 3 out of 4
Rank 6 using Device 2 out of 4
Rank 5 using Device 1 out of 4
Rank 7 using Device 3 out of 4
Rank 2 using Device 2 out of 4
Rank 4 using Device 0 out of 4
Rank 0 A(10)= 10
Rank 1 A(10)= 11
Rank 3 A(10)= 13
Rank 6 A(10)= 16
Rank 5 A(10)= 15
Rank 7 A(10)= 17
Rank 4 A(10)= 14
Rank 2 A(10)= 12
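As a side note, the -np 8 run above wraps around because of the mod(local_rank, devNum) line: with 4 devices, local ranks 4 through 7 land on devices 0 through 3 again, so two ranks share each GPU. Here is a minimal sketch of just that arithmetic, with no MPI or OpenACC needed to run it. The module name, the device_for_rank helper, and the hard-coded 8 ranks and 4 devices are hypothetical and only for illustration; it also assumes all ranks are on one node, so the local rank equals the world rank.

```fortran
module device_map
  implicit none
contains
  ! Hypothetical helper: the same round-robin expression used in the
  ! example program, dev = mod(local_rank, devNum).
  pure integer function device_for_rank(local_rank, num_devices) result(dev)
    integer, intent(in) :: local_rank, num_devices
    dev = mod(local_rank, num_devices)
  end function device_for_rank
end module device_map

program show_device_map
  use device_map
  implicit none
  integer, parameter :: num_ranks = 8    ! assumed: matches "mpirun -np 8"
  integer, parameter :: num_devices = 4  ! assumed: the 4 V100s on the node
  integer :: r

  ! Print the device each node-local rank would select
  do r = 0, num_ranks - 1
     print '(a,i0,a,i0)', 'local rank ', r, ' -> device ', &
          device_for_rank(r, num_devices)
  end do
end program show_device_map
```

This matches the output above, where ranks 2 and 6 both report Device 2. Sharing a device between ranks works, but the ranks then compete for that GPU's memory and compute.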
-Mat