HPC SDK does not work with Autoconf/Automake-based projects

The latest release of the NVIDIA HPC SDK (23.11), and quite possibly previous versions, does not work with code that uses Autoconf/Automake.

After loading the module file (so that PATH, CC and other environment variables are set correctly), running a project’s ./configure seems to work correctly – but then running make fails.

Steps to reproduce

  1. Make sure the HPC SDK module is loaded (module list should include …/modulefiles/nvhpc/23.11)
  2. Download https://ftp.zap.org.au/pub/trader/unix/trader-7.18.tar.xz and unpack it.
  3. In the “trader-7.18” directory, run ./configure – this will seem to work correctly.
  4. Run make – it will fail with the following error message:
make[4]: Entering directory '/data/tmp/src/trader-7.18/lib'
/opt/nvhpc-23.11/Linux_x86_64/23.11/compilers/bin/nvc  -I. -I..     struct __va_list_tag { unsigned int gp_offset; unsigned int fp_offset; char *overflow_arg_area; char *reg_save_area; }; typedef struct __va_list_tag __pgi_va_list[1]; -Wno-cast-qual -Wno-conversion -Wno-float-equal -Wno-sign-compare -Wno-undef -Wno-unused-function -Wno-unused-parameter -Wno-float-conversion -Wimplicit-fallthrough -Wno-pedantic -Wno-sign-conversion -Wno-type-limits -Wno-unsuffixed-float-constants -g -O2 -MT libgnu_a-c-ctype.o -MD -MP -MF .deps/libgnu_a-c-ctype.Tpo -c -o libgnu_a-c-ctype.o `test -f 'c-ctype.c' || echo './'`c-ctype.c
/bin/bash: -c: line 1: syntax error near unexpected token `}'
/bin/bash: -c: line 1: `/opt/nvhpc-23.11/Linux_x86_64/23.11/compilers/bin/nvc  -I. -I..     struct __va_list_tag { unsigned int gp_offset; unsigned int fp_offset; char *overflow_arg_area; char *reg_save_area; }; typedef struct __va_list_tag __pgi_va_list[1]; -Wno-cast-qual -Wno-conversion -Wno-float-equal -Wno-sign-compare -Wno-undef -Wno-unused-function -Wno-unused-parameter -Wno-float-conversion -Wimplicit-fallthrough -Wno-pedantic -Wno-sign-conversion -Wno-type-limits -Wno-unsuffixed-float-constants -g -O2 -MT libgnu_a-c-ctype.o -MD -MP -MF .deps/libgnu_a-c-ctype.Tpo -c -o libgnu_a-c-ctype.o `test -f 'c-ctype.c' || echo './'`c-ctype.c'
make[4]: *** [Makefile:1672: libgnu_a-c-ctype.o] Error 2
make[4]: Leaving directory '/data/tmp/src/trader-7.18/lib'

Cause of the error

When running ./configure, the variable GL_CFLAG_ALLOW_WARNINGS is set to an incorrect value. The reason is that it invokes nvc -E (the C preprocessor) to check for values of __GNUC__, etc – the code is on line 18005 of trader-7.18/configure.

But when nvc -E is called, it automatically includes …/Linux_x86_64/23.11/compilers/include/_cplus_macros.h, which includes …/Linux_x86_64/23.11/compilers/include/_cplus_preinclude.h, which crucially outputs a struct __va_list_tag block – which gets included into the GL_CFLAG_ALLOW_WARNINGS variable.
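To make the failure mode concrete, here is a minimal sketch that needs no NVIDIA tools. It fakes the preprocessor output with a here-document and assumes that configure builds the variable by dropping the `#` line markers and keeping everything else; the exact filtering in the real configure may differ. The file contents and the shortened struct are illustrative only, not real nvc output.

```shell
# Simulated preprocessor output (NOT real nvc output): two line markers,
# a shortened preincluded struct, and one flag that survived the #if tests.
tmpdir=$(mktemp -d)
cat > "$tmpdir/conftest.out" <<'EOF'
# 1 "conftest.c"
struct __va_list_tag { unsigned int gp_offset; }; typedef struct __va_list_tag __pgi_va_list[1];
# 5 "conftest.c"
-Wno-cast-qual
EOF

# Assumed filtering step: drop the line markers, join the rest onto one line.
flags=$(grep -v '^#' "$tmpdir/conftest.out" | tr '\n' ' ')

# The struct text ends up next to the real flag, just as in the make error.
echo "GL_CFLAG_ALLOW_WARNINGS=$flags"
rm -rf "$tmpdir"
```

Anything the preprocessor emits that is not a line marker is treated as a compiler flag, so the braces and semicolons later reach the shell through the Makefile and trigger the bash syntax error.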

Work-around

None yet known. Perhaps there is a CPPFLAGS setting to prohibit the output of the struct __va_list_tag block? This would not be ideal, as the same CPPFLAGS setting would need to work for actual compilation.

Requested solution

Ideally, the HPC SDK nvc -E should mimic the behaviour of both gcc -E and clang -E: do not generate any superfluous output.

nvc is really nvc++ in “C” mode, which is why you’re getting the C++ preinclude header files. There’s a flag to disable this, “--no_preincludes”, but adding it to either CFLAGS or CPPFLAGS will cause compilation errors, since those headers are required for compilation.

I didn’t see a way to pass the flag so that it’s only used with “-E”, so I hand-edited your configure to apply this change:

$ diff -u configure.org configure
--- configure.org       2024-01-05 08:34:12.707778720 -0800
+++ configure   2024-01-05 08:34:43.707749022 -0800
@@ -18030,7 +18030,7 @@
       -Wno-unsuffixed-float-constants
       #endif
 EOF
-    gl_command="$CC $CFLAGS $CPPFLAGS -E conftest.c > conftest.out"
+    gl_command="$CC $CFLAGS $CPPFLAGS -E --no_preincludes conftest.c > conftest.out"
     if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$gl_command\""; } >&5
   (eval $gl_command) 2>&5
   ac_status=$?

This fixes the original error, but because the configure script is GNU-centric, it still adds gcc-specific flags that nvc rejects:

nvc  -I. -I..     -Wno-cast-qual -Wno-conversion -Wno-float-equal -Wno-sign-compare -Wno-undef -Wno-unused-function -Wno-unused-parameter -Wno-float-conversion -Wimplicit-fallthrough -Wno-pedantic -Wno-sign-conversion -Wno-type-limits -Wno-unsuffixed-float-constants -g -O2 -MT libgnu_a-c-ctype.o -MD -MP -MF .deps/libgnu_a-c-ctype.Tpo -c -o libgnu_a-c-ctype.o `test -f 'c-ctype.c' || echo './'`c-ctype.c
nvc-Error-Unknown switch: -Wno-conversion
nvc-Error-Unknown switch: -Wno-float-conversion
nvc-Error-Unknown switch: -Wimplicit-fallthrough
nvc-Error-Unknown switch: -Wno-pedantic
nvc-Error-Unknown switch: -Wno-sign-conversion
nvc-Error-Unknown switch: -Wno-type-limits
nvc-Error-Unknown switch: -Wno-unsuffixed-float-constants

To fix this, add the flag “-noswitcherror” so the compiler gives a warning, not an error, when it sees unknown flags.

export CC=nvc
export CFLAGS="-fast -noswitcherror"
./configure

After this, I’m able to build the project.
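For anyone who wants to script the hand edit rather than apply it manually, the following is a rough sketch only. The function name `patch_configure` is made up for illustration, and the sed pattern assumes the generated configure contains the `gl_command` line exactly as shown in the diff above; other projects or gnulib versions may word it differently. The demo runs against a stand-in file, not a real configure.

```shell
# Hypothetical helper: add --no_preincludes to the "-E conftest.c" command
# in a generated configure script, keeping a backup as <file>.org.
patch_configure() {
    cp "$1" "$1.org" &&
    sed 's/\$CPPFLAGS -E conftest\.c/$CPPFLAGS -E --no_preincludes conftest.c/' \
        "$1.org" > "$1"
}

# Demo on a one-line stand-in for configure.
demo=$(mktemp -d)
cat > "$demo/configure" <<'EOF'
    gl_command="$CC $CFLAGS $CPPFLAGS -E conftest.c > conftest.out"
EOF
patch_configure "$demo/configure"

# Show the patched line, then clean up.
patched_line=$(grep -e '--no_preincludes' "$demo/configure")
echo "$patched_line"
rm -rf "$demo"
```

On a real tree you would run `patch_configure ./configure` and then configure with CC=nvc and CFLAGS including -noswitcherror as above. The edit is lost whenever configure is regenerated.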

Hope this helps,
Mat

Thanks for that, Mat. Given that configure and the various Makefile / Makefile.am files are autogenerated, and that this problem is not just with my project but with all projects using Automake/Autoconf, I suspect there are two ultimate solutions and one workaround:

  1. Get NVIDIA to fix the output of nvc -E (as I suggest above) – even in C++ mode, many projects do not expect superfluous output from this preprocessor mode (i.e., nvc -E should behave like gcc -E and clang -E). With that fixed, setting CFLAGS to -noswitcherror is easy enough.

  2. Fix Automake and Autoconf to recognise the NVIDIA HPC SDK compiler. I’ve already helped do a little of that with ax_compiler_vendor.m4 and ax_cflags_warn_all.m4, but the underlying assumption that all/most compilers mimic the GNU Compiler Collection or LLVM/Clang is quite deep, meaning it would take a lot of work to do this – then every project that uses Automake/Autoconf would need to be regenerated, or

  3. Individually patch the generated configure script, similar to what you did above, for each project that wants to support nvc/nvc++.

I’d prefer option (1), of course!

I added an issue report, TPR #34973, and asked engineering to take a look.

Though given “__va_list_tag” is how variadics are defined, I don’t think we can simply get rid of it.