Failure when using OpenACC after MPI_Init

Hi Adrian,

I typically delay using any OpenACC constructs until after I call MPI_Init, so it’s unclear why this isn’t working for you. I do use the following boilerplate code to set the device number so that each rank uses a different device. Setting the device number is optional, but without it every rank would use the same default device.

> I can provide the source if required.

That would be helpful in understanding the issue.

Here’s an example of what I typically do when using MPI+OpenACC. I’m using a system with 4 V100s.

% cat test_mpi_acc.f90
      PROGRAM test
      use mpi
      use openacc
      implicit none

      integer :: rank, world_size
      integer :: dev, devNum, local_rank, local_comm
      integer :: devtype, ierr
      integer, dimension(:), allocatable :: Arr
      integer :: asize, i

      call MPI_INIT(ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, world_size, ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

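      ! Split MPI_COMM_WORLD into per-node communicators so that
      ! local_rank is this rank's index among the ranks on its node
      call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
           MPI_INFO_NULL, local_comm,ierr)
      call MPI_Comm_rank(local_comm, local_rank,ierr)
      ! Assign devices round-robin among the ranks on each node
      devtype = acc_get_device_type()
      devNum = acc_get_num_devices(devtype)
      dev = mod(local_rank,devNum)
      call acc_set_device_num(dev, devtype)
      ! Query the device number actually in use
      dev = acc_get_device_num(devtype)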
      print *, "Rank ", rank, " using Device ", dev, " out of ", devNum
      asize = 1024
      allocate(Arr(asize))
!$acc kernels loop copyout(Arr)
      do i=1,asize
         Arr(i) = i+rank
      enddo
      print *, "Rank ", rank, " A(10)=", Arr(10)
      deallocate(Arr)

      call MPI_FINALIZE(ierr)
      END PROGRAM

% mpif90 -V21.2 -acc -fast test_mpi_acc.f90
% mpirun -np 4 a.out
 Rank             3  using Device             3  out of             4
 Rank             0  using Device             0  out of             4
 Rank             2  using Device             2  out of             4
 Rank             1  using Device             1  out of             4
 Rank             0  A(10)=           10
 Rank             3  A(10)=           13
 Rank             2  A(10)=           12
 Rank             1  A(10)=           11
% mpirun -np 8 a.out
 Rank             0  using Device             0  out of             4
 Rank             1  using Device             1  out of             4
 Rank             3  using Device             3  out of             4
 Rank             6  using Device             2  out of             4
 Rank             5  using Device             1  out of             4
 Rank             7  using Device             3  out of             4
 Rank             2  using Device             2  out of             4
 Rank             4  using Device             0  out of             4
 Rank             0  A(10)=           10
 Rank             1  A(10)=           11
 Rank             3  A(10)=           13
 Rank             6  A(10)=           16
 Rank             5  A(10)=           15
 Rank             7  A(10)=           17
 Rank             4  A(10)=           14
 Rank             2  A(10)=           12
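
The device assignments in the 8-rank run follow from the mod(local_rank,devNum) line: once there are more ranks on a node than devices, the ranks wrap around and share devices. Here is a minimal sketch of just that mapping, with no MPI or OpenACC calls. The program name device_map and the nranks parameter are only for illustration, and it assumes all 8 ranks sit on a single node with 4 devices, as on my system.

```fortran
      program device_map
      implicit none
      ! Illustrative values only: 8 ranks on one node with 4 devices
      integer, parameter :: nranks = 8, devNum = 4
      integer :: local_rank, dev

      do local_rank = 0, nranks-1
         ! Same round-robin mapping used in test_mpi_acc.f90
         dev = mod(local_rank, devNum)
         print *, "Local rank ", local_rank, " -> Device ", dev
      enddo
      end program device_map
```

With these values, local ranks 4 through 7 map back onto devices 0 through 3, which matches the sharing shown in the 8-rank output above.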

-Mat