I’m seeing strange behaviour when GPU awareness is enabled in MPI.
The code I’m using is written in Fortran and accelerated with OpenACC.
If I use the Open MPI build that ships with the NVIDIA HPC SDK, everything works fine.
However, when I use IBM Spectrum MPI with GPU awareness enabled, multi-node performance is really bad (20 to 30 times slower).
I also noticed that when the -gpu flag is used, all MPI calls appear to run in GPU-aware mode, even where the host_data use_device directive is absent.
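
For reference, here is a minimal sketch of the two patterns I mean. It is not the actual application code: the program name, the buffer names (sendbuf, recvbuf), the size n, the use_gpu_path switch and the ring exchange are all made up for illustration. With host_data use_device the MPI call receives device addresses (the GPU-aware path); without it, the data is staged through the host first.

```fortran
program gpu_aware_sketch
  use mpi
  implicit none
  integer, parameter :: n = 1024            ! hypothetical buffer size
  double precision   :: sendbuf(n), recvbuf(n)
  logical, parameter :: use_gpu_path = .true.  ! hypothetical switch between the two paths
  integer :: ierr, rank, nprocs, dest, src

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Simple ring exchange: send to the next rank, receive from the previous one.
  dest = mod(rank + 1, nprocs)
  src  = mod(rank - 1 + nprocs, nprocs)

  sendbuf = dble(rank)
  recvbuf = 0.0d0

  !$acc data copyin(sendbuf) copy(recvbuf)
  if (use_gpu_path) then
    ! GPU-aware path: inside host_data, sendbuf/recvbuf refer to device addresses.
    !$acc host_data use_device(sendbuf, recvbuf)
    call MPI_Sendrecv(sendbuf, n, MPI_DOUBLE_PRECISION, dest, 0, &
                      recvbuf, n, MPI_DOUBLE_PRECISION, src,  0, &
                      MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
    !$acc end host_data
  else
    ! Host-staged path: bring the data to the host, communicate, push it back.
    !$acc update host(sendbuf)
    call MPI_Sendrecv(sendbuf, n, MPI_DOUBLE_PRECISION, dest, 0, &
                      recvbuf, n, MPI_DOUBLE_PRECISION, src,  0, &
                      MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
    !$acc update device(recvbuf)
  end if
  !$acc end data

  call MPI_Finalize(ierr)
end program gpu_aware_sketch
```

My expectation was that only calls wrapped like the first branch would use the GPU-aware path, and that calls like the second branch would keep using host buffers.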