MPI calls and handling of redundant array slicing in nvfortran

I’m working on a Fortran code that uses MPI. It is written in a style that often states array bounds explicitly, as ARRAY(LowerBound:UpperBound), even in MPI calls. I realise that the MPI Fortran bindings might not handle array slices properly (MPI_SUBARRAYS_SUPPORTED = F); however, the code works fine with gfortran and Cray Fortran but fails with nvfortran. I believe it works with the other compilers because the slice is just the whole array, and they recognise this and pass the whole array directly. This also seems to work with nvfortran when the array is 1D, but in the 2D case it fails.
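
For reference, both of these constants can be checked directly from the mpi module; here is a minimal check, assuming an MPI-3.0 or newer library (which defines them as compile-time constants):

PROGRAM CHECK_MPI_F_SUPPORT
   USE MPI
   IMPLICIT NONE
   ! LOGICAL compile-time constants defined by MPI-3.0 and later
   PRINT *, "MPI_SUBARRAYS_SUPPORTED        = ", MPI_SUBARRAYS_SUPPORTED
   PRINT *, "MPI_ASYNC_PROTECTS_NONBLOCKING = ", MPI_ASYNC_PROTECTS_NONBLOCKING
END PROGRAM CHECK_MPI_F_SUPPORT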

I have a simplified reproduction. MPI_WAITALL reports that the non-blocking receive has completed, but the receive array has not been overwritten with the sent values:

MODULE MPIINFO
   USE MPI
   IMPLICIT NONE

   INTEGER:: N !THE RANK OF THIS PROCESS (DIFFERENT ON EACH PROCESSOR)
   INTEGER:: ISIZE !THE TOTAL NUMBER OF RANKS (SIZE OF MPI_COMM_WORLD)

   INTEGER::IERROR, provided
   INTEGER:: STATUSES(MPI_STATUS_SIZE,2)

   INTEGER::EXCHANGE_PROCID

   INTEGER:: DIM1, DIM2

   INTEGER, ALLOCATABLE, DIMENSION(:, :)::SOL_SEND   !ARRAY TO HOLD THE VALUES TO BE SENT
   INTEGER, ALLOCATABLE, DIMENSION(:, :)::SOL_RECV   !ARRAY TO HOLD THE VALUES TO BE RECEIVED

CONTAINS

   SUBROUTINE EXCHANGE_SOL()
      IMPLICIT NONE
      INTEGER:: T
      integer:: n_requests
      integer, dimension(2) :: requests

      n_requests = 0


      requests(:) = 0

      n_requests = n_requests + 1

      PRINT *, "Rank", N, " Sending 2D Array to ", EXCHANGE_PROCID
      PRINT *,""
      PRINT *, N, " Contents of send buffer: ", SOL_SEND(1:DIM1, 1:DIM2)
      PRINT *,""

      CALL MPI_ISEND( &
         SOL_SEND(1:DIM1, 1:DIM2), &
         DIM1*DIM2, MPI_INTEGER, &
         EXCHANGE_PROCID, 0, &
         MPI_COMM_WORLD, requests(n_requests), ierror &
         )
      
      PRINT *, "Rank", N, " Send errors: ", ierror

      n_requests = n_requests + 1

      CALL MPI_IRECV( &
         SOL_RECV(1:DIM1, 1:DIM2), &
         DIM1*DIM2, MPI_INTEGER, &
         EXCHANGE_PROCID, 0, &
         MPI_COMM_WORLD, requests(n_requests), ierror &
         )

      PRINT *, "Rank", N, " Receive errors: ", ierror

      CALL MPI_WAITALL(n_requests, requests, STATUSES, ierror)

      PRINT *, "Rank", N, " Wait errors: ", ierror

      PRINT *, "Rank", N, " Contents of received buffer", SOL_RECV(1:DIM1, 1:DIM2)

   END SUBROUTINE EXCHANGE_SOL

END MODULE

PROGRAM MPI_2D_SLICED_NONBLOCKING_TEST
   USE MPIINFO
   IMPLICIT NONE

   INTEGER:: J, K

   DIM1 = 2
   DIM2 = 2

   CALL MPI_INIT_THREAD(MPI_THREAD_FUNNELED, PROVIDED, IERROR)
   CALL MPI_COMM_SIZE(MPI_COMM_WORLD, ISIZE, IERROR)
   CALL MPI_COMM_RANK(MPI_COMM_WORLD, N, IERROR)

   ALLOCATE (SOL_SEND(DIM1, DIM2))
   ALLOCATE (SOL_RECV(DIM1, DIM2))

   IF (N .eq. 0) THEN

      EXCHANGE_PROCID = 1

   END IF

   IF (N .eq. 1) THEN

      EXCHANGE_PROCID = 0

   END IF

   DO J = 1, DIM1
      DO K = 1, DIM2
         SOL_RECV(J, K) = -J - K
         SOL_SEND(J, K) = J + K
      END do
   END DO

   CALL EXCHANGE_SOL()

   CALL MPI_BARRIER(MPI_COMM_WORLD, IERROR)
   CALL MPI_FINALIZE(IERROR)

END PROGRAM MPI_2D_SLICED_NONBLOCKING_TEST
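
Both versions are built with the MPI compiler wrapper for the respective compiler and run on two ranks; the wrapper name depends on the MPI installation, and the file name here is just illustrative:

mpifort repro.f90 -o repro
mpirun -np 2 ./repro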

Output from the nvfortran-compiled code (run on two MPI processes):

Rank 0 Sending 2D Array to 1

Rank 0 Contents of send buffer: 2 3 3 4

Rank 0 Send errors: 0
Rank 0 Receive errors: 0
Rank 0 Wait errors: 0
Rank 0 Contents of received buffer -2 -3 -3 -4
Rank 1 Sending 2D Array to 0

Rank 1 Contents of send buffer: 2 3 3 4

Rank 1 Send errors: 0
Rank 1 Receive errors: 0
Rank 1 Wait errors: 0
Rank 1 Contents of received buffer -2 -3 -3 -4

Expected output from the gfortran-compiled code (run on two MPI processes):

Rank 0 Sending 2D Array to 1

Rank 0 Contents of send buffer: 2 3 3 4

Rank 1 Sending 2D Array to 0

Rank 1 Contents of send buffer: 2 3 3 4

Rank 0 Send errors: 0
Rank 0 Receive errors: 0
Rank 1 Send errors: 0
Rank 1 Receive errors: 0
Rank 0 Wait errors: 0
Rank 0 Contents of received buffer 2 3 3 4
Rank 1 Wait errors: 0
Rank 1 Contents of received buffer 2 3 3 4

If I remove the LowerBound:UpperBound slices from the MPI calls, it works. Obviously, I could go through the code and remove every unnecessary slice from its MPI calls; it is just a slight annoyance that this works with the other compilers but not with nvfortran.
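
Concretely, removing the slices just means passing the whole arrays to the same calls:

      CALL MPI_ISEND( &
         SOL_SEND, &
         DIM1*DIM2, MPI_INTEGER, &
         EXCHANGE_PROCID, 0, &
         MPI_COMM_WORLD, requests(n_requests), ierror &
         )
...
      CALL MPI_IRECV( &
         SOL_RECV, &
         DIM1*DIM2, MPI_INTEGER, &
         EXCHANGE_PROCID, 0, &
         MPI_COMM_WORLD, requests(n_requests), ierror &
         )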

I believe the core issue here is that nvfortran doesn’t fully support the ASYNCHRONOUS attribute. This in turn causes the compiler to pass the array slice through a copy-in/copy-out temporary, since its size is not known at compile time, so the non-blocking receive completes into the temporary rather than into SOL_RECV.

It’s a known limitation in nvfortran; however, our next-generation Fortran compiler, based on flang, will not have this issue.

Another possible workaround is to use a contiguous pointer, something like:

...
   integer, pointer, contiguous :: ptr(:,:)
...
      ptr => sol_recv(1:DIM1, 1:DIM2)
      CALL MPI_IRECV( &
!         SOL_RECV(1:DIM1, 1:DIM2), &
         ptr, &
         DIM1*DIM2, MPI_INTEGER, &
         EXCHANGE_PROCID, 0, &
         MPI_COMM_WORLD, requests(n_requests), ierror &
         )
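
For the pointer association to be legal, SOL_RECV (and SOL_SEND, if the send side is changed the same way) would also need the TARGET attribute, e.g.:

   integer, allocatable, target, dimension(:, :) :: sol_recv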

Just in case it’s an easy fix, I filed a bug report, TPR #36194, and will see if we can get this working with nvfortran so you don’t have to wait for our new flang-based compiler.

Thanks.

I’ll try to modify the code to work around this limitation.

Unfortunately, I can’t use the pointer workaround because the receive array is a component of a derived-type array and so can’t be given the TARGET attribute.

I’ll try to remove the slicing.
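
A minimal sketch of what that looks like for the derived-type case; the type and component names here are hypothetical, the point is only that the whole component is passed with no (1:DIM1, 1:DIM2) slice:

      ! DT is a hypothetical array of a derived type whose SOL_RECV component
      ! holds the receive buffer for each block
      CALL MPI_IRECV( &
         DT(I)%SOL_RECV, &
         DIM1*DIM2, MPI_INTEGER, &
         EXCHANGE_PROCID, 0, &
         MPI_COMM_WORLD, requests(n_requests), ierror &
         )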