How to build OpenMPI with nvhpc/24.1

Hi,

I am having some trouble compiling and running my code with nvhpc/24.1 and OpenMPI.

  1. With the provided OpenMPI version nvhpc-openmpi3/24.1, my code compiles but is not compatible with the Slurm setup of the cluster (Slurm is 20.11.7-1):
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

I must launch with srun in the batch file, as using mpirun does not set some Slurm variables required for accessing the GPUs.

  2. Since we use OpenMPI 4.1.4 with the GNU compiler, I've built this version using nvhpc-nompi/24.1 and the same setup used for GNU:
export cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/24.1/cuda
../configure --with-hwloc --enable-mpirun-prefix-by-default \
  --prefix=$dest --with-pmi --enable-mpi1-compatibility \
  --with-ucx=$dest --enable-mpi-cxx --with-slurm \
  --enable-pmix-timing --with-pmix --without-verbs \
  --with-cuda=$cuda

These options do not seem very different from the ones returned by ompi_info for nvhpc-openmpi3/24.1.

But we are using wrappers in front of the MPI calls, as shown in the small test case, and with nvhpc-nompi/24.1 + OpenMPI 4.1.4 I cannot compile successfully:
bash-4.4$ mpifort --show
nvfortran -I/opt/nvidia/openmpi-legi/4.1.4/include -I/opt/nvidia/openmpi-legi/4.1.4/lib -L/opt/nvidia/openmpi-legi/4.1.4/lib -rpath /opt/nvidia/openmpi-legi/4.1.4/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
bash-4.4$ mpifort -c comm.f90
NVFORTRAN-S-0155-Could not resolve generic procedure mpi_bcast (comm.f90: 22)
0 inform, 0 warnings, 1 severes, 0 fatal for my_bcast_character
NVFORTRAN-S-0155-Could not resolve generic procedure mpi_bcast (comm.f90: 44)
0 inform, 0 warnings, 1 severes, 0 fatal for my_bcast_logical_scalar

This code compiles successfully with:

  • nvhpc-openmpi3/24.1
  • OpenMPI 4.1.4 and GNU OG13 branch
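
For reference, the wrappers are along these lines (a simplified, hypothetical reconstruction; the actual code is in the attached comm.f90):

module comm
  use mpi                ! the generic mpi_bcast interface comes from the "use mpi" module
  implicit none
contains
  ! Broadcast a character string; with the OpenMPI 4.1.4 build described above,
  ! nvfortran reports NVFORTRAN-S-0155 ("Could not resolve generic procedure mpi_bcast") here.
  subroutine my_bcast_character(buf, root)
    character(len=*), intent(inout) :: buf
    integer, intent(in) :: root
    integer :: ierr
    call mpi_bcast(buf, len(buf), MPI_CHARACTER, root, MPI_COMM_WORLD, ierr)
  end subroutine my_bcast_character

  ! Broadcast a scalar logical; same error is reported for this wrapper.
  subroutine my_bcast_logical_scalar(flag, root)
    logical, intent(inout) :: flag
    integer, intent(in) :: root
    integer :: ierr
    call mpi_bcast(flag, 1, MPI_LOGICAL, root, MPI_COMM_WORLD, ierr)
  end subroutine my_bcast_logical_scalar
end module comm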

Maybe there are special things to do when building OpenMPI with the NVIDIA HPC SDK?

comm.f90.txt (1.9 KB)

Thanks for your advice.

Patrick


Hi Patrick,

This is a known issue with recent versions of Open MPI and the NVIDIA HPC compilers.

As a workaround, try adding -Mstandard to the FCFLAGS variable when you invoke the ./configure script of Open MPI.

Hope this helps.

+chris


Thanks Chris, this option solves the problem.
This is now my basic compiler setup to build OpenMPI 4.1.4 with nvhpc-nompi/24.1:

CC=nvc++ CXX=nvc++ FC=nvfortran \
CFLAGS=-fPIC CXXFLAGS=-fPIC FCFLAGS="-Mstandard -fPIC" \
  ../configure .....

and it works!
(Now I have to check whether GPU-to-GPU communication works too, but it should, as it did when using include "mpif.h" as a workaround with the previous OpenMPI build.)

Patrick

Hi,
I have an additional question about this setup of OpenMPI with NVIDIA GPUs. I'm running some tests on a node with 2 PCIe 4.0 A100 GPUs, using the osu-micro-benchmarks-3.8 tests (I was unable to compile the latest version with nvhpc).
I slightly modified osu_bw.c to take the Slurm resources into account (to be sure the 2 processes are each offloaded to a distinct GPU); a sketch of the idea follows below. The maximum bandwidth reached is 16 GB/s for osu_bw, which is about half the PCIe 4.0 x16 bandwidth.
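
Roughly, the change amounts to something like this (shown here as a CUDA Fortran sketch rather than the actual C edit to osu_bw.c; the reliance on SLURM_LOCALID is illustrative):

program bind_gpu_by_local_rank
  use mpi
  use cudafor
  implicit none
  integer :: ierr, istat, rank, local_id, ndev
  character(len=16) :: env

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! srun sets SLURM_LOCALID to the node-local rank of each task.
  call get_environment_variable("SLURM_LOCALID", env)
  read(env, *) local_id

  ! Map each local rank to its own GPU so the two processes use distinct A100s.
  istat = cudaGetDeviceCount(ndev)
  istat = cudaSetDevice(mod(local_id, ndev))

  ! ... benchmark / communication code would go here ...

  call MPI_Finalize(ierr)
end program bind_gpu_by_local_rank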

I've read about GPUDirect (Benchmark Tests - NVIDIA Docs), but it seems to be related to "GPU-Node-Node-GPU" communications. Should I use it for intranode communications too? And modify my OpenMPI setup?

# OSU MPI-CUDA Bandwidth Test v3.8
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size      Bandwidth (MB/s)
1                       0.09
2                       0.18
4                       0.37
8                       0.73
16                      1.52
32                      2.80
64                      5.49
128                    10.75
256                    21.71
512                    43.51
1024                   84.30
2048                  172.98
4096                  330.46
8192                  578.57
16384                3013.24
32768                5584.44
65536                8690.47
131072              11354.22
262144              13467.84
524288              14828.03
1048576             15621.00
2097152             16045.29
4194304             16266.04

Yes, CUDA-aware MPI, which uses GPUDirect communication, can be used between GPUs on the same node.

And modify my OpenMPI setup?

CUDA-aware MPI works by passing device pointers to the MPI calls, so it's more of a program issue, assuming your OpenMPI was built with CUDA-aware MPI enabled (which it looks like yours was).
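
For example, "passing device pointers" looks roughly like this in CUDA Fortran (a minimal illustration, not taken from the OSU sources):

program cuda_aware_pingpong
  use mpi
  use cudafor
  implicit none
  integer, parameter :: n = 1048576
  real(8), device, allocatable :: d_buf(:)   ! buffer lives in GPU memory
  integer :: rank, ierr
  integer :: status(MPI_STATUS_SIZE)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  allocate(d_buf(n))
  d_buf = real(rank, 8)

  ! With a CUDA-aware Open MPI, the device array is handed directly to MPI;
  ! no explicit copy to a host staging buffer is needed.
  if (rank == 0) then
     call MPI_Send(d_buf, n, MPI_DOUBLE_PRECISION, 1, 0, MPI_COMM_WORLD, ierr)
  else if (rank == 1) then
     call MPI_Recv(d_buf, n, MPI_DOUBLE_PRECISION, 0, 0, MPI_COMM_WORLD, status, ierr)
  end if

  deallocate(d_buf)
  call MPI_Finalize(ierr)
end program cuda_aware_pingpong

It would be built with something like mpifort -cuda; the only essential point is that d_buf is device memory.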

I haven't used the OSU benchmarks for quite a while myself, but I assume they've been updated to use CUDA-aware MPI, in which case you shouldn't need to do anything extra.
