MPI calls and handling of redundant array slicing in nvfortran

I’m working on a Fortran code that uses MPI. It is written in a style that often states array bounds explicitly, as ARRAY(LowerBound:UpperBound), even in MPI calls. I realise that the MPI Fortran bindings might not handle array slices properly (MPI_SUBARRAYS_SUPPORTED = F); however, the code works fine with gfortran and Cray Fortran but fails with nvfortran. I believe it works with the other compilers because the slice is just the whole array, and they recognise this and pass the whole array directly. This also seems to work with nvfortran when the array is 1D, but in the 2D case it fails.
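
For reference, both of these constants can be checked directly from the mpi module; here is a minimal check, assuming an MPI-3.0 or newer library (which defines them as compile-time constants):

PROGRAM CHECK_MPI_F_SUPPORT
   USE MPI
   IMPLICIT NONE
   ! LOGICAL compile-time constants defined by MPI-3.0 and later
   PRINT *, "MPI_SUBARRAYS_SUPPORTED        = ", MPI_SUBARRAYS_SUPPORTED
   PRINT *, "MPI_ASYNC_PROTECTS_NONBLOCKING = ", MPI_ASYNC_PROTECTS_NONBLOCKING
END PROGRAM CHECK_MPI_F_SUPPORT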

I have a simplified reproduction. MPI_WAITALL reports that the non-blocking receive has completed, but the receive array has not been overwritten with the sent values:

MODULE MPIINFO
   USE MPI
   IMPLICIT NONE

   INTEGER:: N !THE RANK OF THIS PROCESS (DIFFERENT ON EACH PROCESSOR)
   INTEGER:: ISIZE !THE TOTAL NUMBER OF RANKS (SIZE OF MPI_COMM_WORLD)

   INTEGER::IERROR, provided
   INTEGER:: STATUSES(MPI_STATUS_SIZE,2)

   INTEGER::EXCHANGE_PROCID

   INTEGER:: DIM1, DIM2

   INTEGER, ALLOCATABLE, DIMENSION(:, :)::SOL_SEND   !ARRAY TO HOLD THE VALUES TO BE SENT
   INTEGER, ALLOCATABLE, DIMENSION(:, :)::SOL_RECV   !ARRAY TO HOLD THE VALUES TO BE RECEIVED

CONTAINS

   SUBROUTINE EXCHANGE_SOL()
      IMPLICIT NONE
      INTEGER:: T
      integer:: n_requests
      integer, dimension(2) :: requests

      n_requests = 0


      requests(:) = 0

      n_requests = n_requests + 1

      PRINT *, "Rank", N, " Sending 2D Array to ", EXCHANGE_PROCID
      PRINT *,""
      PRINT *, N, " Contents of send buffer: ", SOL_SEND(1:DIM1, 1:DIM2)
      PRINT *,""

      CALL MPI_ISEND( &
         SOL_SEND(1:DIM1, 1:DIM2), &
         DIM1*DIM2, MPI_INTEGER, &
         EXCHANGE_PROCID, 0, &
         MPI_COMM_WORLD, requests(n_requests), ierror &
         )
      
      PRINT *, "Rank", N, " Send errors: ", ierror

      n_requests = n_requests + 1

      CALL MPI_IRECV( &
         SOL_RECV(1:DIM1, 1:DIM2), &
         DIM1*DIM2, MPI_INTEGER, &
         EXCHANGE_PROCID, 0, &
         MPI_COMM_WORLD, requests(n_requests), ierror &
         )

      PRINT *, "Rank", N, " Receive errors: ", ierror

      CALL MPI_WAITALL(n_requests, requests, STATUSES, ierror)

      PRINT *, "Rank", N, " Wait errors: ", ierror

      PRINT *, "Rank", N, " Contents of received buffer", SOL_RECV(1:DIM1, 1:DIM2)

   END SUBROUTINE EXCHANGE_SOL

END MODULE

PROGRAM MPI_2D_SLICED_NONBLOCKING_TEST
   USE MPIINFO
   IMPLICIT NONE

   INTEGER:: J, K

   DIM1 = 2
   DIM2 = 2

   CALL MPI_INIT_THREAD(MPI_THREAD_FUNNELED, PROVIDED, IERROR)
   CALL MPI_COMM_SIZE(MPI_COMM_WORLD, ISIZE, IERROR)
   CALL MPI_COMM_RANK(MPI_COMM_WORLD, N, IERROR)

   ALLOCATE (SOL_SEND(DIM1, DIM2))
   ALLOCATE (SOL_RECV(DIM1, DIM2))

   IF (N .eq. 0) THEN

      EXCHANGE_PROCID = 1

   END IF

   IF (N .eq. 1) THEN

      EXCHANGE_PROCID = 0

   END IF

   DO J = 1, DIM1
      DO K = 1, DIM2
         SOL_RECV(J, K) = -J - K
         SOL_SEND(J, K) = J + K
      END do
   END DO

   CALL EXCHANGE_SOL()

   CALL MPI_BARRIER(MPI_COMM_WORLD, IERROR)
   CALL MPI_FINALIZE(IERROR)

END PROGRAM MPI_2D_SLICED_NONBLOCKING_TEST
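
Both versions are built with the MPI compiler wrapper for the respective compiler and run on two ranks; the wrapper name depends on the MPI installation, and the file name here is just illustrative:

mpifort repro.f90 -o repro
mpirun -np 2 ./repro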

Output from the nvfortran-compiled code (run on two MPI processes):

Rank 0 Sending 2D Array to 1

Rank 0 Contents of send buffer: 2 3 3 4

Rank 0 Send errors: 0
Rank 0 Receive errors: 0
Rank 0 Wait errors: 0
Rank 0 Contents of received buffer -2 -3 -3 -4
Rank 1 Sending 2D Array to 0

Rank 1 Contents of send buffer: 2 3 3 4

Rank 1 Send errors: 0
Rank 1 Receive errors: 0
Rank 1 Wait errors: 0
Rank 1 Contents of received buffer -2 -3 -3 -4

Expected output from the gfortran-compiled code (run on two MPI processes):

Rank 0 Sending 2D Array to 1

Rank 0 Contents of send buffer: 2 3 3 4

Rank 1 Sending 2D Array to 0

Rank 1 Contents of send buffer: 2 3 3 4

Rank 0 Send errors: 0
Rank 0 Receive errors: 0
Rank 1 Send errors: 0
Rank 1 Receive errors: 0
Rank 0 Wait errors: 0
Rank 0 Contents of received buffer 2 3 3 4
Rank 1 Wait errors: 0
Rank 1 Contents of received buffer 2 3 3 4

If I remove the LowerBound:UpperBound slices from the MPI calls, it works. Obviously, I could go through the code and remove every unnecessary slice from its MPI calls; it is just a slight annoyance that this works with the other compilers but not with nvfortran.
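
Concretely, removing the slices just means passing the whole arrays to the same calls:

      CALL MPI_ISEND( &
         SOL_SEND, &
         DIM1*DIM2, MPI_INTEGER, &
         EXCHANGE_PROCID, 0, &
         MPI_COMM_WORLD, requests(n_requests), ierror &
         )
...
      CALL MPI_IRECV( &
         SOL_RECV, &
         DIM1*DIM2, MPI_INTEGER, &
         EXCHANGE_PROCID, 0, &
         MPI_COMM_WORLD, requests(n_requests), ierror &
         )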

I believe the core issue here is that nvfortran doesn’t fully support the ASYNCHRONOUS attribute. This in turn causes the compiler to pass the array slice through a copy-in/copy-out temporary, since its size is not known at compile time, so the non-blocking receive completes into the temporary rather than into SOL_RECV.

It’s a known limitation in nvfortran; however, our next-generation Fortran compiler, based on flang, will not have this issue.

Another possible workaround is to use a contiguous pointer, something like:

...
   integer, pointer, contiguous :: ptr(:,:)
...
      ptr => sol_recv(1:DIM1, 1:DIM2)
      CALL MPI_IRECV( &
!         SOL_RECV(1:DIM1, 1:DIM2), &
         ptr, &
         DIM1*DIM2, MPI_INTEGER, &
         EXCHANGE_PROCID, 0, &
         MPI_COMM_WORLD, requests(n_requests), ierror &
         )
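
For the pointer association to be legal, SOL_RECV (and SOL_SEND, if the send side is changed the same way) would also need the TARGET attribute, e.g.:

   integer, allocatable, target, dimension(:, :) :: sol_recv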

Just in case it’s an easy fix, I filed a bug report, TPR #36194, and will see if we can get this working with nvfortran so you don’t have to wait for our new flang-based compiler.

Thanks.

I’ll try to modify the code to work around this limitation.

Unfortunately, I can’t use the pointer workaround because the receive array is a component of a derived-type array and so can’t be given the TARGET attribute.

I’ll try to remove the slicing.
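
A minimal sketch of what that looks like for the derived-type case; the type and component names here are hypothetical, the point is only that the whole component is passed with no (1:DIM1, 1:DIM2) slice:

      ! DT is a hypothetical array of a derived type whose SOL_RECV component
      ! holds the receive buffer for each block
      CALL MPI_IRECV( &
         DT(I)%SOL_RECV, &
         DIM1*DIM2, MPI_INTEGER, &
         EXCHANGE_PROCID, 0, &
         MPI_COMM_WORLD, requests(n_requests), ierror &
         )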