Can't compile with OpenMPI 4.1.4, "broken function"


I have been testing our MPI+OpenACC codes on the NCSA Delta-GPU system.

The system was using OpenMPI 4.1.2, and with that module my code compiles fine and runs (although it has inefficient MPI communication across nodes - a topic for another thread).

They recently added an OpenMPI 4.1.4 module, but when I try to use it I get the following compiler error:

Intrinsic has incorrect return type!
i64 (i32, i64)* @llvm.nvvm.match.any.sync.i64
LLVM ERROR: Broken function found, compilation aborted!
NVFORTRAN-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (mas_sed_expmac.f: 67540)
NVFORTRAN/x86-64 Linux 22.2-0: compilation aborted
make: *** [Makefile:57: mas.o] Error 2

The code line it mentions is simply assigning a logical variable to another logical variable.
However, right after that, an MPI call is made with the logical:

logical :: field_present
logical, external :: rs_array_present

call MPI_Bcast (field_present, 1, MPI_LOGICAL, iproc0, comm_all, ierr)

rs_array_present is a logical function defined elsewhere in the code.

Is it possible that the MPI_LOGICAL type is not the same as the Fortran logical type?
Is this an OpenMPI bug?

– Ron

Hi Ron,

I wouldn’t think that this has anything to do with OpenMPI, but maybe it’s due to the environment being used?

The only time I’ve seen the “LLVM ERROR: Broken function found, compilation aborted!” is when libnvvm (the back-end LLVM based device code generator) changed from one LLVM version to another and some of the LLVM intrinsics changed.

My best guess is that when using this module, some environment variable is being set so that the compiler is picking up a different CUDA than the one we’re expecting.

Do you know if “CUDA_HOME” or “LD_LIBRARY_PATH” is getting set in this module, and if so, what they are being set to?
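A quick way to check from a login shell is something like the following (a sketch; the values will of course differ per system):

```shell
# Show what the current environment points at
echo "CUDA_HOME=${CUDA_HOME:-<unset>}"
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"
# And where the compiler driver itself resolves from
command -v nvfortran || true
```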



Here is what I see:

DELTA-GPU: /u/sumseq> module list

Currently Loaded Modules:
  1) cue-login-env/1.0   3) default      5) ucx/1.11.2    7) openmpi/4.1.2   9) subversion/1.13.0
  2) modtree/gpu         4) nvhpc/22.2   6) cuda/11.6.1   8) git/2.31.1     10) libtirpc/1.2.6
DELTA-GPU: /u/sumseq> module load openmpi/4.1.4
The following dependent module(s) are not currently loaded: cuda/11.6.1 (required by: ucx/1.11.2)

The following have been reloaded with a version change:
  1) cuda/11.6.1 => cuda/11.7.0     2) openmpi/4.1.2 => openmpi/4.1.4
DELTA-GPU: /u/sumseq> module show cuda/11.7.0
whatis("Name : cuda")
whatis("Version : 11.7.0")
whatis("Target : zen3")
whatis("Short description : CUDA is a parallel computing platform and programming model invented by NVIDIA. It 
enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GP
help([[CUDA is a parallel computing platform and programming model invented by
NVIDIA. It enables dramatic increases in computing performance by
harnessing the power of the graphics processing unit (GPU). Note: This
package does not currently install the drivers necessary to run CUDA.
These will need to be installed manually. See: for details.]])
DELTA-GPU: /u/sumseq> module show openmpi/4.1.4
whatis("Name : openmpi")
whatis("Version : 4.1.4")
whatis("Target : zen3")
whatis("Short description : An open source Message Passing Interface implementation.")
whatis("Configure options : --enable-shared --disable-silent-rules --disable-builtin-atomics --with-pmi=/usr --
enable-static --enable-mpi1-compatibility --with-ofi=/sw/external/libraries/libfabric-1.14.0 --without-fca --wi
thout-verbs --without-psm --without-xpmem --without-psm2 --without-cma --with-knem=/opt/knem- --wi
th-ucx=/sw/spack/delta-2022-03/apps/ucx/1.12.1-nvhpc-22.2-i5zpucv --without-mxm --without-hcoll --without-cray-
xpmem --without-alps --without-tm --without-sge --with-slurm --without-lsf --without-loadleveler --disable-memc
hecker --with-lustre=/usr --with-pmix=/usr/local --with-zlib=/sw/spack/delta-2022-03/apps/zlib/1.2.11-nvhpc-22.
2-q7ooed6 --with-hwloc=/usr --disable-java --disable-mpi-java --with-gpfs=no --enable-dlopen --with-cuda=/sw/sp
ack/delta-2022-03/apps/cuda/11.7.0-nvhpc-22.2-eiijfgu --enable-wrapper-rpath --disable-wrapper-runpath --disabl
e-mpi-cxx --disable-cxx-exceptions --with-wrapper-ldflags=-Wl,-rpath,/sw/spack/delta-2022-03/apps/nvhpc/22.2-gc
help([[An open source Message Passing Interface implementation. The Open MPI
Project is an open source Message Passing Interface implementation that
is developed and maintained by a consortium of academic, research, and
industry partners. Open MPI is therefore able to combine the expertise,
technologies, and resources from all across the High Performance
Computing community in order to build the best MPI library available.
Open MPI offers advantages for system and software vendors, application
developers and computer science researchers.]])

It looks like it is setting CUDA to 11.7, but that should be OK, I think… Maybe the NV compiler is still linking to the old CUDA?

– Ron

Hi Ron,

Yes, I think that libnvvm changed between CUDA 11.6 and 11.7. Since NVHPC 22.2 only supports up to CUDA 11.6, when using CUDA 11.7 the generated device LLVM is not in the correct format.

To fix, unset “CUDA_HOME” in the environment, update the compiler to 22.5 (which supports CUDA 11.7), or add the flag “-gpu=nonvvm”, which falls back to our older CUDA C device code generator.
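As a sketch, the first and third options look like this from the shell (the source file name is taken from the error message above):

```shell
# Option 1: make sure nvfortran 22.2 doesn't pick up CUDA 11.7 through CUDA_HOME
unset CUDA_HOME

# Option 3: keep the environment as-is, but bypass libnvvm and use the
# older CUDA C device code generator when compiling:
#   nvfortran -acc -gpu=nonvvm -c mas_sed_expmac.f
```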

Note that in 22.7, we decided to stop using CUDA_HOME for users who want to use a different CUDA. Instead, we changed to using NVHPC_CUDA_HOME for this. Hopefully this will avoid these types of issues where CUDA_HOME is being set unintentionally.
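For 22.7 and later, that would look something like the following (the CUDA path here is just the one from the module output above, as an example):

```shell
# Point NVHPC 22.7+ at a specific CUDA toolkit explicitly;
# an inherited CUDA_HOME will no longer redirect the compiler.
export NVHPC_CUDA_HOME=/sw/spack/delta-2022-03/apps/cuda/11.7.0-nvhpc-22.2-eiijfgu
```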



After I unset CUDA_HOME, the code compiles!
I assume CUDA_HOME should also be unset when running the code?

– Ron


That probably doesn’t matter. LD_LIBRARY_PATH might, though, if it picks up the wrong CUDA runtime libs.
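One way to check which CUDA runtime the executable would actually load (a sketch; “mas” is the executable name implied by the Makefile target above):

```shell
# List any CUDA-related shared libraries resolved for the binary at load time
ldd ./mas | grep -iE 'cuda|nvvm' || echo "no CUDA libs in dynamic deps"
```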