How to build OpenMPI with nvhpc/24.1

Hi,
I have an additional question about this setup of OpenMPI with NVIDIA GPUs. I’m running some tests on a node with two PCIe Gen4 A100 GPUs, using the osu-micro-benchmarks-3.8 tests (I was unable to compile the latest version with nvhpc).
I slightly modified osu_bw.c to take the Slurm resources into account (to make sure the two processes are offloaded to distinct GPUs). The maximum bandwidth reached by osu_bw is 16 GB/s, which is only half the bandwidth of a PCIe Gen4 x16 link.
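
For reference, the change boils down to selecting the CUDA device from the node-local task id that Slurm exports, before any buffer allocation. A minimal sketch of what I did (the helper name select_gpu_from_slurm is mine, not part of the benchmark):

/* Sketch of my change to osu_bw.c: bind each MPI rank to a distinct GPU
 * using the node-local task id exported by Slurm (SLURM_LOCALID is 0 and 1
 * for the two tasks on the node). Called before the first CUDA allocation. */
#include <stdlib.h>
#include <cuda_runtime.h>

static void select_gpu_from_slurm(void)
{
    const char *localid = getenv("SLURM_LOCALID");
    int dev = localid ? atoi(localid) : 0;  /* fall back to GPU 0 outside Slurm */
    int ndev = 0;

    cudaGetDeviceCount(&ndev);
    if (ndev > 0)
        cudaSetDevice(dev % ndev);          /* rank 0 -> GPU 0, rank 1 -> GPU 1 */
}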

I’ve read about GPUDirect (Benchmark Tests - NVIDIA Docs), but it seems to be aimed at “GPU-Node-Node-GPU” communication. Should I also use it for intra-node communication, and modify my OpenMPI setup accordingly?
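
In case it is relevant, this is the kind of check I use to confirm that the build is CUDA-aware at run time (MPIX_Query_cuda_support() is an Open MPI extension declared in mpi-ext.h):

/* Runtime check that the Open MPI build is CUDA-aware. */
#include <stdio.h>
#include <mpi.h>
#if defined(OPEN_MPI) && OPEN_MPI
#include <mpi-ext.h>   /* Open MPI extensions; defines MPIX_CUDA_AWARE_SUPPORT */
#endif

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
#if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
    printf("CUDA-aware support at run time: %s\n",
           MPIX_Query_cuda_support() ? "yes" : "no");
#else
    printf("This Open MPI build was not compiled with CUDA-aware support\n");
#endif
    MPI_Finalize();
    return 0;
}

The full osu_bw output: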

# OSU MPI-CUDA Bandwidth Test v3.8
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size      Bandwidth (MB/s)
1                       0.09
2                       0.18
4                       0.37
8                       0.73
16                      1.52
32                      2.80
64                      5.49
128                    10.75
256                    21.71
512                    43.51
1024                   84.30
2048                  172.98
4096                  330.46
8192                  578.57
16384                3013.24
32768                5584.44
65536                8690.47
131072              11354.22
262144              13467.84
524288              14828.03
1048576             15621.00
2097152             16045.29
4194304             16266.04