MPI f08 alignment warnings (21.3)

At the linking stage of our Fortran application, when using the use mpi_f08 Fortran bindings, I see the following messages with HPC SDK 21.3 (Power9).

Is this something I should be concerned about, or can I safely ignore these warnings?

Cheers, Thomas

/usr/bin/ld: Warning: alignment 4 of symbol `ompi_f08_mpi_double_precision' in /m100/prod/opt/compilers/hpc-sdk/2021/binary/Linux_ppc64le/21.3/comm_libs/openmpi/openmpi-3.1.5/lib/libmpi_usempif08.so is smaller than 8 in obj/opt_acc/control_all.o
/usr/bin/ld: Warning: alignment 4 of symbol `ompi_f08_mpi_integer' in /m100/prod/opt/compilers/hpc-sdk/2021/binary/Linux_ppc64le/21.3/comm_libs/openmpi/openmpi-3.1.5/lib/libmpi_usempif08.so is smaller than 8 in obj/opt_acc/control_all.o
/usr/bin/ld: Warning: alignment 4 of symbol `ompi_f08_mpi_ub' in /m100/prod/opt/compilers/hpc-sdk/2021/binary/Linux_ppc64le/21.3/comm_libs/openmpi/openmpi-3.1.5/lib/libmpi_usempif08.so is smaller than 8 in obj/opt_acc/pputil.o
/usr/bin/ld: Warning: alignment 4 of symbol `ompi_f08_mpi_min' in /m100/prod/opt/compilers/hpc-sdk/2021/binary/Linux_ppc64le/21.3/comm_libs/openmpi/openmpi-3.1.5/lib/libmpi_usempif08.so is smaller than 8 in obj/opt_acc/pputil.o
/usr/bin/ld: Warning: alignment 4 of symbol `ompi_f08_mpi_real8' in /m100/prod/opt/compilers/hpc-sdk/2021/binary/Linux_ppc64le/21.3/comm_libs/openmpi/openmpi-3.1.5/lib/libmpi_usempif08.so is smaller than 8 in obj/opt_acc/control_all.o
/usr/bin/ld: Warning: alignment 4 of symbol `ompi_f08_mpi_comm_null' in /m100/prod/opt/compilers/hpc-sdk/2021/binary/Linux_ppc64le/21.3/comm_libs/openmpi/openmpi-3.1.5/lib/libmpi_usempif08.so is smaller than 8 in obj/opt_acc/solver/interfaceSolverInOrb5.o
/usr/bin/ld: Warning: alignment 4 of symbol `ompi_f08_mpi_integer8' in /m100/prod/opt/compilers/hpc-sdk/2021/binary/Linux_ppc64le/21.3/comm_libs/openmpi/openmpi-3.1.5/lib/libmpi_usempif08.so is smaller than 8 in obj/opt_acc/parmove.o

I tried to make a reproducer.

[thayward@login01 mpi_test]$ cat test_mpi.F90
program test_mpi
  use mpi_f08
  implicit none
  integer :: ierr, rank
  integer :: ierror, count
  real, dimension(1) :: a1   ! default real, promoted to 8 bytes by -Mr8

  call MPI_INIT(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierror)
  a1(1) = real(rank)
  count = 1
  ! reference: MPI_Bcast(buffer, count, datatype, root, comm, ierror)
  call MPI_Bcast(a1, count, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierror)

  print *, rank, ierr, ierror, a1
  call MPI_Finalize(ierror)
end program test_mpi


[thayward@login01 mpi_test]$  mpif90 -Mr8 test_mpi.F90 && mpirun -n 4 ./a.out
/usr/bin/ld: Warning: alignment 4 of symbol `ompi_f08_mpi_double_precision' in /m100/prod/opt/compilers/hpc-sdk/2021/binary/Linux_ppc64le/21.3/comm_libs/openmpi/openmpi-3.1.5/lib/libmpi_usempif08.so is smaller than 8 in /tmp/nvfortranlAPbHB7Batdt.o
            0            0            0    0.000000000000000
            2            0            0    0.000000000000000
            1            0            0    0.000000000000000
            3            0            0    0.000000000000000


[thayward@login01 mpi_test]$ which mpif90; mpif90 -V
/cineca/prod/opt/compilers/hpc-sdk/2021/binary/Linux_ppc64le/2021/comm_libs/mpi/bin/mpif90

nvfortran 21.3-0 linuxpower target on Linuxpower
NVIDIA Compilers and Tools
Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.

[thayward@login01 mpi_test]$ which mpirun; mpirun -V
/cineca/prod/opt/compilers/hpc-sdk/2021/binary/Linux_ppc64le/2021/comm_libs/mpi/bin/mpirun
mpirun (Open MPI) 3.1.5

Report bugs to http://www.open-mpi.org/community/help/

If you use OpenMPI 4 instead of 3, the warning will go away.

export PATH=/cineca/prod/opt/compilers/hpc-sdk/2021/binary/Linux_ppc64le/2021/comm_libs/openmpi4/openmpi-4.0.5/bin:$PATH

I would stay away from -Mr8; there are better ways to set different precisions.
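
For example, a minimal sketch (not from the original application) that requests 8-byte reals through an explicit kind from iso_fortran_env and matches them with MPI_REAL8, so no compiler-wide promotion flag is needed:

program test_mpi_kinds
  use mpi_f08
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  integer :: ierr, rank
  real(real64), dimension(1) :: a1   ! explicit 8-byte reals, no -Mr8 required

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  a1(1) = real(rank, kind=real64)
  ! MPI_REAL8 (or MPI_DOUBLE_PRECISION for double precision variables)
  ! matches the declared kind explicitly instead of relying on -Mr8
  call MPI_Bcast(a1, 1, MPI_REAL8, 0, MPI_COMM_WORLD, ierr)
  print *, rank, a1
  call MPI_Finalize(ierr)
end program test_mpi_kinds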

I also encountered this issue. After following your suggestion to switch from Open MPI 3 to Open MPI 4 by updating the PATH and LD_LIBRARY_PATH, I noticed a significant performance drop. Open MPI 4 is much slower than Open MPI 3, even though I am using the same code and compiler flags.

Here are the details of my environment:
CUDA 11.8
NVHPC 22.11
Compiler Command: mpif90 -O3 -cpp -tp=zen3 -gpu=ptxinfo -g -Wall -Minfo

Any suggestions or insights would be appreciated.

Thank you!

Hi cxs,

Sorry, but I did not see any performance difference when I moved to OpenMPI 4 years ago, nor have I seen any other reports.

Have you tried profiling your code or done other investigation to determine where the performance difference is coming from?

-Mat

I used Nsight Systems to profile these two cases, and I found an interesting result. For the kernels, the time costs are almost the same. However, for data transfers (implemented in Open MPI), Open MPI 4 takes significantly longer — more than 5x compared to Open MPI 3.

Additionally, I noticed that in Open MPI 4, the transferred data appears as pageable, while in Open MPI 3, it is pinned. I understand that data is typically pinned during the send-receive process, but I’m not sure what changed in the Open MPI 4 code to cause this behavior.
Do you have insights into why this might be happening or how to resolve it?
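
For context, a minimal sketch (names and sizes are illustrative, not from my application) of how a host buffer can be explicitly page-locked with nvfortran's CUDA Fortran pinned attribute (compile with -cuda); Nsight Systems reports copies from such a buffer as pinned rather than pageable. I am not suggesting this is what Open MPI does internally, just illustrating the distinction that shows up in the profile:

program pinned_buffer_demo
  use mpi_f08
  implicit none
  ! "pinned" is a CUDA Fortran attribute (requires compiling with -cuda)
  real(8), allocatable, pinned :: buf(:)
  logical :: got_pinned
  integer :: ierr, rank

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! the pinned= specifier reports whether the page-locked allocation succeeded
  allocate(buf(1024*1024), pinned=got_pinned)
  if (rank == 0) print *, 'buffer pinned? ', got_pinned

  buf = real(rank, kind=8)
  call MPI_Bcast(buf, size(buf), MPI_REAL8, 0, MPI_COMM_WORLD, ierr)

  deallocate(buf)
  call MPI_Finalize(ierr)
end program pinned_buffer_demo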