Nvfortran with MPI - NVHPC version 23.3

I have written a simple Fortran code (with just MPI_INIT and MPI_FINALIZE) and compiled it with NVHPC 23.3. However, when I try to run it, it fails with a segmentation fault, which suggests a bug in the runtime system. The same code built with NVHPC 22.11 does not crash at runtime. Is there a workaround for this? Also, has the Open MPI that comes bundled with NVHPC 23.3 been configured to work with SLURM? e.g. does it recognise the host file and other environment variables that SLURM creates?
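
For reference, the code is essentially just this (modulo the exact module/include and program name):

program simple
   use mpi
   implicit none
   integer :: ierr
   call MPI_Init(ierr)
   call MPI_Finalize(ierr)
end program simple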

What CUDA driver do you have installed, and what CUDA version is on your paths?
Also, what compilation line do you use?

I’m used to using SLURM built with PMIx support. I use mpicc & mpifort to build my apps and then run them using
srun --mpi=pmix …
which manages the processes across nodes.
This should work with the NVHPC/HPCX version of MPI in Linux_x86_64/23.5/comm_libs/hpcx.
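
In practice that is just something like the following (the source file name and the node/task counts are only placeholders, and mpifort here is the one from the hpcx tree above):

$ mpifort simple.f90 -o simple.exe
$ srun --mpi=pmix -N 2 --ntasks-per-node=4 ./simple.exe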

I haven't tried testing with 23.5 yet.

Hi, the CUDA driver is 12.0 and the CUDA toolkit version is 12.0. Using the Open MPI in 23.3/comm_libs/hpcx/hpcx-2.14/ompi/{bin,lib} now gives me the following error:

$ mpirun -n 1 ./simple.exe 
[1685136685.073832] [indigo59:228375:0]    ucp_context.c:1470 UCX  WARN  UCP version is incompatible, required: 1.15, actual: 1.8 (release 0 /usr/lib/gcc/x86_64-redhat-linux/4.8.5//../../../../lib64/libucp.so.0)

I am using RHEL7, and it seems to ship an old version of libucp.so (I have no idea what this library does). Do I need to move to RHEL8 (or 9)?

Regards,

You may be picking up stale copies of these libs on your paths somewhere.
We may be able to untangle this by adjusting your paths, but I strongly recommend moving from RHEL/CentOS 7 up to RHEL/CentOS/Rocky 8 in order to clear away as much junk as possible before we do this.
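
As a quick check of what is actually being picked up, something along these lines usually tells the story (ucx_info ships with the HPC-X tree):

$ echo $LD_LIBRARY_PATH        # look for stray system lib64 or old UCX entries
$ ldconfig -p | grep libucp    # where the system linker resolves libucp.so
$ ucx_info -v                  # version of the UCX actually found on your PATH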

Also, don't install RPMs for MPI or UCX: they will (a) not be current, (b) not have been built with GPU support, and (c) potentially cause collisions with newer libraries.
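
On an existing system it is worth checking whether such RPMs are already installed, e.g.:

$ rpm -qa | grep -Ei 'ucx|openmpi'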