Nvcc undeclared builtins reporting failure -- configure test method for PMIx?

This issue has also been reported at the OpenMPI forums. It’s unclear whether the cause is nvcc in NVHPC 23.1 and 23.11 or the method the OpenMPI 5.x configure uses to get the compiler to report undeclared builtins.


Background information

The OMPI 5.0.0 and 5.0.1 configure test for PMIx no longer works with NVHPC; the failure appears to be in the compiler check for undeclared builtins. The associated stub compiles and runs but does not produce output, which configure interprets as an inability to get the compiler to report undeclared builtins. That halts configure, since pmix is required as of OMPI 5.x. GCC through 13.2.0 does not have this issue. The OMPI 4.x pmix test stub differs (orted vs prrte?) and does not have this issue for GCC 12.2.0 / 13.2.0 or NVHPC 23.1 / 23.11. I am currently unable (policy) to try OMPI 5.0.2, or an external pmix/hwloc/libevent recent enough for pmix>=4.2. Can anyone please verify a working OMPI 5.x build with NVHPC 23.11 (CUDA 12.3) or 23.1 (CUDA 12.0)?

Versions

  • NVHPC 23.1, Cuda compilation tools, release 12.0, V12.0.76, Build cuda_12.0.r12.0/compiler.31968024_0
  • NVHPC 23.11, Cuda compilation tools, release 12.3, V12.3.52, Build cuda_12.3.r12.3/compiler.33281558_0

OMPI build

Source build using NVHPC 23.1 (CUDA 12.0) and 23.11 (CUDA 12.3), CC=nvcc, FC=nvfortran, CXX=nvc++
Configure is scripted; the script block is:

            ...<archive copy/unroll, builddir create, set $distro and $basearch, etc>...
        module purge
        module load nvhpc/23.11
            ...<setup ./pbs-config for --with-tm>...
        export CFLAGS=''
        export FCFLAGS=''
        ../configure \
            --prefix=/opt/soft/$distro/$basearch/openmpi/5.0.1/nvhpc/23.11 \
            --x-includes=/usr/include \
            --x-libraries=/usr/lib64 \
            --enable-branch-probabilities \
            --enable-dependency-tracking \
            --enable-mpi-ext=all \
            --with-pmix=internal \
            --enable-pmix-timing \
            --with-package-string="Open MPI 5.0.1 with NVHPC 23.11" \
            --with-ident-string="NRLDC CCS" \
            --enable-ipv6 \
            --enable-heterogeneous \
            --enable-hwloc-pci \
            --with-hwloc=internal \
            --with-ofi \
            --with-verbs \
            --with-tm="$PBS_EXEC" \
            --enable-sparse-groups \
            --enable-peruse \
            --enable-mpi-fortran=all \
            CC=nvcc \
            FC=nvfortran \
            CXX=nvc++
            # --enable-mpi-cxx \ # (C++ bindings no longer supported)
            # --enable-mpi-cxx-seek \ # (C++ bindings no longer supported)

System

  • Operating system/version: Rocky Linux 8.8 aarch64
  • Computer hardware: HPE Apollo80 gen 2 (Fujitsu A64FX armv8 FX700)
  • Network type: 10Gb ether, 100Gb ether, HDR infiniband

Details

Configure fails at PMIx checking:

checking for nvcc options needed to detect all undeclared functions... cannot detect
configure: error: in `/tmp/openmpi-nvhpc/openmpi-5.0.1/build/3rd-party/openpmix':
configure: error: cannot make nvcc report undeclared builtins

The outer config.log for this failure is:

configure:5795: *** Configuring PMIx
configure:63782: ===== configuring 3rd-party/openpmix =====
configure:63971: running /bin/sh ../../../3rd-party/openpmix/configure --disable-option-checking '--prefix=/opt/soft/el8/aarch64/openmpi/5.0.1/nvhpc/23.11' --without-tests-examples --enable-pmix-binaries --disable-pmix-backward-compatibility --disable-visibility --disable-hwloc-lib-checks --with-hwloc-extra-libs="/tmp/openmpi-nvhpc/openmpi-5.0.1/build/3rd-party/hwloc-2.7.1/hwloc/libhwloc.la" '--x-includes=/usr/include' '--x-libraries=/usr/lib64' '--enable-branch-probabilities' '--enable-dependency-tracking' '--enable-mpi-ext=all' '--enable-pmix-timing' '--with-package-string=Open MPI 5.0.1 with NVHPC 23.11' '--with-ident-string=NRLDC CCS' '--enable-ipv6' '--enable-heterogeneous' '--enable-hwloc-pci' '--with-ofi' '--with-verbs' '--with-tm=' '--enable-sparse-groups' '--enable-peruse' '--enable-mpi-fortran=all' 'CC=nvcc' 'CFLAGS=' 'CPPFLAGS=-I/tmp/openmpi-nvhpc/openmpi-5.0.1/build/3rd-party/hwloc-2.7.1/include -I/tmp/openmpi-nvhpc/openmpi-5.0.1/3rd-party/hwloc-2.7.1/include' 'CXX=nvc++' 'FC=nvfortran' 'FCFLAGS=' 'CPP=cpp' 'PKG_CONFIG_PATH=/opt/soft/el8/aarch64/ucx/1.13.1/lib/pkgconfig:/opt/soft/el8/aarch64/openssl/1.1.1s/lib/pkgconfig' --cache-file=/dev/null --srcdir=../../../3rd-party/openpmix
configure:63991: ===== done with 3rd-party/openpmix configure =====
configure:65532: error: Could not find viable pmix build.

The inner config.log (<build_dir>/3rd-party/openpmix/config.log) for this failure is:

...snip snip...
| /* end confdefs.h.  */
| #include <float.h>
| #include <limits.h>
| #include <stdarg.h>
| #include <stddef.h>
| extern void ac_decl (int, char *);
|
| int
| main (void)
| {
| (void) ac_decl (0, (char *) 0);
|   (void) ac_decl;
|
|   ;
|   return 0;
| }
configure:18028: result: cannot detect
configure:18032: error: in `/tmp/openmpi-nvhpc/openmpi-5.0.1/build/3rd-party/openpmix':
configure:18034: error: cannot make nvcc report undeclared builtins

The stub compiles and runs but does not produce output under nvcc in NVHPC 23.1 or 23.11.
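One way to see what a configure check actually ran is to scan the inner config.log for commands that were logged with a nonzero exit status. The following is a minimal sketch, assuming the usual autoconf log layout in which a `configure:<line>: <command>` entry is later followed by a `configure:<line>: $? = <status>` entry. The function name `failed_commands` is made up for illustration.

```sh
#!/bin/sh
# List every command that configure logged with a nonzero exit status.
# Usage: failed_commands path/to/config.log
failed_commands() {
    awk '
        # A status line closes the most recently logged command.
        /^configure:[0-9]+: [$][?] = [0-9]+$/ {
            if ($NF != 0 && cmd != "") print cmd " (exit " $NF ")"
            cmd = ""
            next
        }
        # Skip configure bookkeeping lines; they are not commands.
        /^configure:[0-9]+: (checking|result:|error:)/ { next }
        # Anything else with the configure:<line>: prefix is treated as a command.
        /^configure:[0-9]+: / { cmd = $0 }
    ' "$1"
}
```

For example, `failed_commands 3rd-party/openpmix/config.log` run from the build directory would print each failing compile line together with its exit status. The compiler's own error text sits between the command and the status line in the log, so the printed line is a pointer to where to look, not the full diagnosis.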

I’ll ask our folks here who build the OpenMPI we ship for advice.

Though is there a reason why you’re using “nvcc” which is the CUDA C++ Compiler and not “nvc” which is our C compiler?

I used ‘nvcc’ as advised in an earlier (2022?) thread with NVIDIA about building openmpi 4.x; I’ll see if I can find it. That worked well (and still does) for ompi 4.x series builds. I’ll try switching it back to nvc and see what it does.

The issue here is that Open MPI 5.x is passing -Wno-unused-parameter to nvcc, which does not accept the flag:

configure:18011: nvcc -c -DNDEBUG  -Wno-unused-parameter -fno-builtin  conftest.c >&5
nvcc fatal   : Unknown option '-Wno-unused-parameter'
configure:18011: $? = 1

You will need to either work with the Open MPI developers to fix the PMIx ./configure script to avoid passing this flag to nvcc, or modify the ./configure script yourself to do the same.
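Before editing any configure script, it can help to confirm which flags a given compiler driver rejects. The sketch below compiles a trivial C file with one candidate flag and reports success or failure from the exit status alone. The function name `cc_accepts_flag` is hypothetical, and the check assumes the driver exits nonzero on an unknown option, as nvcc does in the log above.

```sh
#!/bin/sh
# Return 0 if compiler $1 accepts flag $2 when compiling a trivial C file.
# A missing compiler is indistinguishable from a rejected flag here.
cc_accepts_flag() {
    cc=$1
    flag=$2
    workdir=$(mktemp -d) || return 2
    printf 'int main(void) { return 0; }\n' > "$workdir/conftest.c"
    # Compile in a subshell so the caller's working directory is untouched.
    if (cd "$workdir" && "$cc" -c "$flag" conftest.c) >/dev/null 2>&1; then
        status=0
    else
        status=1
    fi
    rm -rf "$workdir"
    return "$status"
}

# Example (not run here, requires the NVHPC module to be loaded):
#   for f in -Wno-unused-parameter -fno-builtin; do
#       if cc_accepts_flag nvcc "$f"; then echo "$f accepted"; else echo "$f rejected"; fi
#   done
```

Running the same loop with `nvc` in place of `nvcc` would show whether the two drivers differ on these flags, which this thread does not establish either way.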

Building ompi 4.1.4 with nvhpc 23.1, CC=nvc: compiles but fails ‘make check’ with

"../../../test/datatype/partial.c", line 89: warning: argument of type "opal_datatype_t **" is incompatible with parameter of type "ompi_datatype_t **" [incompatible_param]
      ompi_datatype_create_contiguous(CONT_COUNT, base, &vector);
                                                        ^

"../../../test/datatype/partial.c", line 93: warning: argument of type "opal_datatype_t *" is incompatible with parameter of type "const ompi_datatype_t *" [incompatible_param]
      ompi_datatype_dump(vector);
                         ^

"../../../test/datatype/partial.c", line 97: warning: argument of type "ompi_datatype_t *" is incompatible with parameter of type "const opal_datatype_t *" [incompatible_param]
      opal_datatype_type_extent(base, &base_extent);

and

  CCLD     reduce_local
make[3]: Leaving directory '/tmp/openmpi-nvhpc/openmpi-4.1.4/build/test/datatype'
make  check-TESTS
make[3]: Entering directory '/tmp/openmpi-nvhpc/openmpi-4.1.4/build/test/datatype'
make[4]: Entering directory '/tmp/openmpi-nvhpc/openmpi-4.1.4/build/test/datatype'
../../../config/test-driver: line 107: 2190625 Segmentation fault      (core dumped) "$@" > $log_file 2>&1
FAIL: opal_datatype_test
PASS: unpack_hetero
PASS: checksum
PASS: position
FAIL: position_noncontig
PASS: ddt_test
PASS: ddt_raw
../../../config/test-driver: line 107: 2190801 Segmentation fault      (core dumped) "$@" > $log_file 2>&1
FAIL: ddt_raw2
../../../config/test-driver: line 107: 2190826 Segmentation fault      (core dumped) "$@" > $log_file 2>&1
FAIL: unpack_ooo
PASS: ddt_pack
PASS: external32

before hanging indefinitely at test/datatype/.libs/lt-large_data. strace -fv shows no calls; the process is in the run state with no wchan.
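The state and wait channel of a hung test can be read straight from /proc on Linux. This is a minimal sketch, assuming a Linux /proc filesystem; the function name `proc_snapshot` is made up for illustration, and the wait channel may be unreadable or empty depending on kernel configuration and permissions.

```sh
#!/bin/sh
# Print the scheduler state and kernel wait channel for one PID.
# Usage: proc_snapshot <pid>
proc_snapshot() {
    pid=$1
    if [ ! -r "/proc/$pid/status" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # State is a single letter: R (running), S (sleeping), D (disk sleep), ...
    state=$(awk '/^State:/ {print $2}' "/proc/$pid/status")
    # wchan names the kernel function the task sleeps in; 0 or empty if none.
    wchan=$(cat "/proc/$pid/wchan" 2>/dev/null)
    printf 'pid=%s state=%s wchan=%s\n' "$pid" "$state" "${wchan:-0}"
}
```

A process that stays in state R with no wait channel and no system calls under strace is spinning in user space, which matches the hang described above.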

The same build process (literally cut/pasted commands) but with CC=nvcc compiles and checks without issue.

Building ompi 5.0.1 with nvhpc 23.1, CC=nvc: build fails with

/usr/bin/ld: .libs/session_get_nth_pset_f08.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `_mpi_f08_types_8_' which may bind externally can not be used when making a shared object; recompile with -fPIC
.libs/session_get_nth_pset_f08.o: In function `mpi_session_get_nth_pset_f08_':
/tmp/openmpi-nvhpc/openmpi-5.0.1/build/ompi/mpi/fortran/use-mpi-f08/../../../../../ompi/mpi/fortran/use-mpi-f08/session_get_nth_pset_f08.F90:26:(.text+0x8): dangerous relocation: unsupported relocation
/usr/bin/ld: .libs/session_get_num_psets_f08.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `_mpi_f08_types_8_' which may bind externally can not be used when making a shared object; recompile with -fPIC
.libs/session_get_num_psets_f08.o: In function `mpi_session_get_num_psets_f08_':
/tmp/openmpi-nvhpc/openmpi-5.0.1/build/ompi/mpi/fortran/use-mpi-f08/../../../../../ompi/mpi/fortran/use-mpi-f08/session_get_num_psets_f08.F90:24:(.text+0x8): dangerous relocation: unsupported relocation
/usr/bin/ld: profile/.libs/libmpi_usempif08_pmpi.a(psession_get_nth_pset_f08.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `_mpi_f08_types_8_' which may bind externally can not be used when making a shared object; recompile with -fPIC
profile/.libs/libmpi_usempif08_pmpi.a(psession_get_nth_pset_f08.o): In function `pmpi_session_get_nth_pset_f08_':
/tmp/openmpi-nvhpc/openmpi-5.0.1/build/ompi/mpi/fortran/use-mpi-f08/profile/psession_get_nth_pset_f08.F90:26:(.text+0x8): dangerous relocation: unsupported relocation
/usr/bin/ld: profile/.libs/libmpi_usempif08_pmpi.a(psession_get_num_psets_f08.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `_mpi_f08_types_8_' which may bind externally can not be used when making a shared object; recompile with -fPIC
profile/.libs/libmpi_usempif08_pmpi.a(psession_get_num_psets_f08.o): In function `pmpi_session_get_num_psets_f08_':
/tmp/openmpi-nvhpc/openmpi-5.0.1/build/ompi/mpi/fortran/use-mpi-f08/profile/psession_get_num_psets_f08.F90:24:(.text+0x8): dangerous relocation: unsupported relocation
make[3]: *** [Makefile:2627: libmpi_usempif08.la] Error 2
make[3]: Leaving directory '/tmp/openmpi-nvhpc/openmpi-5.0.1/build/ompi/mpi/fortran/use-mpi-f08'
make[2]: *** [Makefile:2669: all-recursive] Error 1
make[2]: Leaving directory '/tmp/openmpi-nvhpc/openmpi-5.0.1/build/ompi/mpi/fortran/use-mpi-f08'
make[1]: *** [Makefile:2799: all-recursive] Error 1
make[1]: Leaving directory '/tmp/openmpi-nvhpc/openmpi-5.0.1/build/ompi'
make: *** [Makefile:1533: all-recursive] Error 1

The same build process (literally cut/pasted commands) but with CC=nvcc fails in the way described in the OP.

Building ompi 5.x with nvhpc 23.x, wrt -Wno-unused-parameter: complaints found in config logs for openpmix but also for romio341:

23:55:42 olagarde@compute104.godzilla:/tmp/openmpi-nvhpc/openmpi-5.0.1/build $ find .. -type f -exec grep -Hi no-unused-parameter {} \;
../3rd-party/openpmix/config/pmix.m4:    CFLAGS="$CFLAGS -Wno-unused-parameter"
../3rd-party/openpmix/config/pmix.m4:    CFLAGS="$CFLAGS -Wno-unused-parameter"
../3rd-party/openpmix/configure:    CFLAGS="$CFLAGS -Wno-unused-parameter"
../3rd-party/openpmix/configure:    CFLAGS="$CFLAGS -Wno-unused-parameter"
../3rd-party/romio341/confdb/aclocal_cc.m4:    #   -Wno-unused-parameter -- For portability, some parameters go unused
../3rd-party/romio341/confdb/aclocal_cc.m4:        -Wno-unused-parameter
../3rd-party/romio341/mpl/confdb/aclocal_cc.m4:    #   -Wno-unused-parameter -- For portability, some parameters go unused
../3rd-party/romio341/mpl/confdb/aclocal_cc.m4:        -Wno-unused-parameter

Will follow up with OpenMPI forums.

Here are two combinations that appear to work, at least as far as build, check, and simplistic hybrid mpi/mp batch jobs go (homebrew Jacobian matrix calc and stock HYCOM as benchmarks):

  • -fPIC is getting inserted for 4.x/nvcc so CFLAGS='', FCFLAGS=''
  • -fPIC is getting skipped for 5.x/nvc so CFLAGS='-fPIC', FCFLAGS='-fPIC'
  • 4.x works with nvcc but not nvc
  • 5.x works with nvc but not nvcc
  • want_picky_compiler stays off explicitly
  • nvhpc 23.x for both cases
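The two combinations above can be captured in a small helper for a scripted configure. This is only a sketch of the findings in this thread: the function name `select_nvhpc_toolchain` is made up, the FC and CXX values are taken from the original configure script, and the want_picky_compiler setting is not handled.

```sh
#!/bin/sh
# Set CC, CFLAGS, FCFLAGS, FC and CXX for an Open MPI version, following
# the combinations reported to work with nvhpc 23.x in this thread.
# Usage: select_nvhpc_toolchain 5.0.1
select_nvhpc_toolchain() {
    case $1 in
        4.*) CC=nvcc; CFLAGS='';      FCFLAGS='' ;;        # -fPIC inserted automatically
        5.*) CC=nvc;  CFLAGS='-fPIC'; FCFLAGS='-fPIC' ;;   # -fPIC must be forced
        *)   echo "no known-good combination for Open MPI $1" >&2
             return 1 ;;
    esac
    FC=nvfortran
    CXX=nvc++
}

# Example (not run here):
#   select_nvhpc_toolchain 5.0.1 &&
#       ../configure CC="$CC" FC="$FC" CXX="$CXX" CFLAGS="$CFLAGS" FCFLAGS="$FCFLAGS"
```

Keeping the choice in one function means the rest of the build script stays identical between the 4.x and 5.x series.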

There are several hundred instances of things like

"../../../test/datatype/ddt_pack.c", line 250: warning: transfer of control bypasses initialization of: [branch_past_initialization]
            variable "type" (declared at line 413)
      if (ret != 0) goto cleanup;

and

"../../../test/datatype/ddt_raw2.c", line 234: warning: integer conversion resulted in a change of sign [integer_sign_change]
          { .loop = { { 16, 0}, 2, 3, -1, 16} },

for the ompi 5.x and nvhpc 23.x (nvc, forced -fPIC) builds that don’t occur elsewhere. This can mask edge condition errors but (a) AFAICT there aren’t any errors and the successful tests are true positives; (b) these only occur in the testcases so … meh?