How to preprocess source file for a bug submission? (C++/CUDA/OpenMP)

Dear Support team,

I have filed issues with reproducers before and am generally aware of the preprocessing flags. But with code that mixes C++, OpenACC and CUDA header includes, I sometimes end up with weird compilation errors when compiling the preprocessed source file, and then I wonder what the exact recipe is.

So my intention with this issue is to ask this naive question to the (compiler) experts and settle the confusion once and for all :) : give me a recipe to create a preprocessed reproducer, especially when the code is a mix of C++, OpenACC/OpenMP and CUDA.

I can add more examples later; let’s begin with this one:

I have simple code like the below:

#include <Random123/philox.h>

#ifdef __CUDACC__
#define g_k_qualifiers __device__ __constant__
#else
#define g_k_qualifiers
#endif

g_k_qualifiers philox4x32_key_t g_k{{0}};

and that ends up in a compiler error:

$ nvc++ foo.cpp -c -I/home/external/Random123/include -DR123_USE_SSE=0 -cuda
NVC++-F-0000-Internal compiler error. size of unknown type       0  (foo.cpp)
NVC++/x86-64 Linux 23.11-0: compilation aborted

So my next step is to preprocess the file:

nvc++ foo.cpp -c -I/home/external/Random123/include -DR123_USE_SSE=0 -cuda -E -o bar.cpp

And then I thought I am ready to send this as a reproducer. But when checking this myself, I see:

$ nvc++ -cuda  bar.cpp
"/usr/include/x86_64-linux-gnu/bits/types.h", line 155: error: invalid redeclaration of type name "__fsid_t" (declared at line 155)
  typedef struct { int __val[2]; } __fsid_t;
                                   ^

"/usr/include/ctype.h", line 48: error: "_ISupper" has already been declared in the current scope
    _ISupper = ((0) < 8 ? ((1 << (0)) << 8) : ((1 << (0)) >> 8)),
    ^

"/usr/include/ctype.h", line 49: error: "_ISlower" has already been declared in the current scope
    _ISlower = ((1) < 8 ? ((1 << (1)) << 8) : ((1 << (1)) >> 8)),
    ^

"/usr/include/ctype.h", line 50: error: "_ISalpha" has already been declared in the current scope
    _ISalpha = ((2) < 8 ? ((1 << (2)) << 8) : ((1 << (2)) >> 8)),
    ^
....
"/opt/nvidia/hpc_sdk/Linux_x86_64/23.11/cuda/12.3/include/vector_types.h", line 300: error: invalid redeclaration of type name "ulonglong1" (declared at line 404)
  struct __attribute__((device_builtin)) ulonglong1
                                         ^

"/opt/nvidia/hpc_sdk/Linux_x86_64/23.11/cuda/12.3/include/vector_types.h", line 305: error: invalid redeclaration of type name "longlong2" (declared at line 405)
  struct __attribute__((device_builtin)) __attribute__((aligned(16))) longlong2
                                                                      ^

Error limit reached. Use -fmax-errors=N to change the limit, N=0 for unlimited.
100 errors detected in the compilati

Then I look at the available CLI flags from nvc++ --help:

$ nvc++ --help | grep include
-idirafter<incdir>  Add a directory to the end of the include file search path, after the standard include directories, and mark it as a system include directory
--include_directory<incdir>
                    Add directory to include file search path
-iquote<incdir>     Add a directory to the beginning of the include file search path, and use it only when processing includes with quotes
--[no_]implicit_include
-include<name>      compatibility: File to include at start of compilation
--no_preincludes    Ignore all preincluded files: used for compiling preprocessed files
--preinclude<name>  File to include at start of compilation
-cuda               Add CUDA include paths. Link with the CUDA runtime libraries. Please refer to -gpu for target specific options

I am able to remove some errors by mixing some of the flags above, but then other errors appear. Hence this confusion.

So, it would be great if you could provide some general instructions about the most common CLI flags to create a preprocessed reproducer and avoid the above type of errors when mixing C++ / CUDA / OpenMP / OpenACC.

Thanks in advance!

Hi pramod,

This is an internal compiler error due to the initializer of the device variable so there’s not a command line option here to fix it, except to remove “-cuda” and compile for the host.

Note that nvc++ has limited support for CUDA and is not meant as a replacement for nvcc. So while I’m not sure this is something we support, the compiler shouldn’t be giving an ICE. Hence I’ve filed a problem report, TPR #35057, and sent it to engineering for review.

-Mat

So while I’m not sure this is something we support, the compiler shouldn’t be giving an ICE. Hence I’ve filed a problem report, TPR #35057, and sent to engineering for review.

thanks @MatColgrove!

For the compiler issue itself, this was identified in neuronsimulator/nrn/issues/2563. A minimal reproducer without a dependency on the Random123 library is:

#include <cstdint>

struct r123array2x32 {
  uint32_t v[2];
};

__device__ __constant__ r123array2x32 g_k{}; // Or, g_k{{0}}

This code was working fine with NVHPC 23.1 but gives an ICE in at least 23.7, 23.9 and 23.11, so I assumed this is a regression. Thank you for reporting this to the compiler dev team! (By the way, I can avoid the error by initialising as g_k{{0, 0}}.)

This is an internal compiler error due to the initializer of the device variable so there’s not a command line option here to fix it, except to remove “-cuda” and compile for the host.

Oh! Maybe I failed to clearly explain my aim here!

My original goal was not to highlight/report an error (even though it’s great that you already created an internal ticket!) but to understand which CLI flags I can use to preprocess a source file when I have to provide a standalone reproducer for a compiler bug.

As an example:

  • In the first post I provided a small reproducer (which I created manually!) that depends on the Random123 library. This library is open source, so the compiler team can download it and reproduce the error.
  • But the actual project file is here; we can see that nrnran123.cpp includes a lot of project-specific headers. So I cannot just attach nrnran123.cpp as it is.
  • If I have to submit this error with a reproducer, I would typically add the -E option to the compilation command and then attach the generated, preprocessed source code. This way, the compiler dev team gets code without external dependencies.
  • But in the above example,
    • I highlighted how I first generated the preprocessed file with -E -o bar.cpp
    • when I tried to compile bar.cpp, I ended up with the compilation errors.

So I wanted to know which nvc++ flags I can use to generate bar.cpp so that a compiler dev is then able to compile it.

This is what I could quickly gather:

  1. In the example below, when the -cuda CLI flag is present, we get the following error when compiling the preprocessed code:
# original error
$ nvc++ -g -O2 -acc --c++17 -cuda -c -I/home/pramod/nrn/nrn/src -I/home/pramod/nrn/nrn/build_gpu/generated -isystem /home/pramod/nrn/nrn/external/CLI11/include -I/home/pramod/nrn/nrn/external/Random123/include /home/pramod/nrn/nrn/src/coreneuron/utils/randoms/nrnran123.cpp
NVC++-F-0000-Internal compiler error. size of unknown type       0  (/home/pramod/nrn/nrn/src/coreneuron/utils/randoms/nrnran123.cpp)
NVC++/x86-64 Linux 23.11-0: compilation aborted

# create preprocessed code
$ nvc++ -g -O2 -acc --c++17 -cuda -E -o bar.cpp -c -I/home/pramod/nrn/nrn/src -I/home/pramod/nrn/nrn/build_gpu/generated -isystem /home/pramod/nrn/nrn/external/CLI11/include -I/home/pramod/nrn/nrn/external/Random123/include /home/pramod/nrn/nrn/src/coreneuron/utils/randoms/nrnran123.cpp

# try to compile preprocessed code
$ nvc++ -g -O2 -acc --c++17 -cuda -c bar.cpp
"/usr/include/x86_64-linux-gnu/bits/types.h", line 155: error: invalid redeclaration of type name "__fsid_t" (declared at line 155)
  typedef struct { int __val[2]; } __fsid_t;
                                   ^

"/usr/include/ctype.h", line 48: error: "_ISupper" has already been declared in the current scope
    _ISupper = ((0) < 8 ? ((1 << (0)) << 8) : ((1 << (0)) >> 8)),
...


"/opt/nvidia/hpc_sdk/Linux_x86_64/23.11/cuda/12.3/include/vector_types.h", line 295: error: invalid redeclaration of type name "longlong1" (declared at line 403)
  struct __attribute__((device_builtin)) longlong1
                                         ^

"/opt/nvidia/hpc_sdk/Linux_x86_64/23.11/cuda/12.3/include/vector_types.h", line 300: error: invalid redeclaration of type name "ulonglong1" (declared at line 404)
  struct __attribute__((device_builtin)) ulonglong1

Error limit reached. Use -fmax-errors=N to change the limit, N=0 for unlimited.
100 errors detected in the compilation of "bar.cpp".
Compilation terminated.
  2. If I don’t add -cuda when compiling the above bar.cpp, then we get:
$ nvc++ -g -O2 -acc --c++17  -c bar.cpp
"/opt/nvidia/hpc_sdk/Linux_x86_64/23.11/cuda/12.3/include/crt/math_functions.h", line 11226: error: function "std::abs(float)" (declared at line 75 of "/usr/include/c++/11/bits/std_abs.h") was previously declared constexpr
  extern __attribute__((host)) __attribute__((device)) __attribute__((cudart_builtin)) float     abs(float);
                                                                                                 ^

"/opt/nvidia/hpc_sdk/Linux_x86_64/23.11/cuda/12.3/include/crt/math_functions.h", line 11227: error: function "std::abs(double)" (declared at line 71 of "/usr/include/c++/11/bits/std_abs.h") was previously declared constexpr
  extern __attribute__((host)) __attribute__((device)) __attribute__((cudart_builtin)) double    abs(double);
  3. In the example below (from the previously reported error), with -mp=gpu, we get the following error with the preprocessed code:
# original error
$ /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/compilers/bin/nvc++ -g  -O2   --c++17 -gpu=cuda11.7,lineinfo,cc70,cc80 -mp=gpu -Mautoinline -DCORENEURON_CUDA_PROFILING -DCORENEURON_ENABLE_GPU -DCORENEURON_BUILD -DHAVE_MALLOC_H -DEIGEN_DONT_PARALLELIZE -DEIGEN_DONT_VECTORIZE=1 -DNRNMPI=1 -DLAYOUT=0 -DDISABLE_HOC_EXP -DENABLE_SPLAYTREE_QUEUING -fPIC -I/home/pramod/nrn/nrn/build_gpu/include/coreneuron/utils/randoms  -I/home/pramod/nrn/nrn/build_gpu/include  -I/opt/nvidia/hpc_sdk/Linux_x86_64/23.11/comm_libs/12.3/hpcx/hpcx-2.16/ompi/include -I/opt/nvidia/hpc_sdk/Linux_x86_64/23.11/comm_libs/12.3/hpcx/hpcx-2.16/ompi/include/openmpi -I/opt/nvidia/hpc_sdk/Linux_x86_64/23.11/comm_libs/12.3/hpcx/hpcx-2.16/ompi/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include -I/opt/nvidia/hpc_sdk/Linux_x86_64/23.11/comm_libs/12.3/hpcx/hpcx-2.16/ompi/include/openmpi/opal/mca/event/libevent2022/libevent -I/opt/nvidia/hpc_sdk/Linux_x86_64/23.11/comm_libs/12.3/hpcx/hpcx-2.16/ompi/include/openmpi/opal/mca/event/libevent2022/libevent/include -c x86_64/corenrn/mod2c/hh.cpp -o x86_64/corenrn/build/hh.o -DNRN_PRCELLSTATE=0 --diag_suppress=111,550 --diag_suppress=161,177 -DR123_USE_SSE=0
"x86_64/corenrn/mod2c/hh.cpp", line 42: internal error: assertion failed: lower_expr: bad kind (lower_il.cpp, line 17583 in lower_expr_full)

              functor(cacum_Instance* inst) : inst{inst} {}
...

# add -E -o bar.cpp for preprocessing

$ /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/compilers/bin/nvc++ -E -o bar.cpp <same-above-command>


# an error while compiling the preprocessed code
$ /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/compilers/bin/nvc++ -g  -O2   --c++17 -cuda -gpu=cuda11.7,lineinfo,cc70,cc80 -mp=gpu -Mautoinline bar.cpp
"/usr/include/c++/11/bits/cpp_type_traits.h", line 73: error: invalid redeclaration of type name "std::__true_type" (declared at line 73)
    struct __true_type { };
           ^

"/usr/include/c++/11/bits/cpp_type_traits.h", line 74: error: invalid redeclaration of type name "std::__false_type" (declared at line 74)
    struct __false_type { };
...

"/usr/include/math.h", line 942: error: expected an identifier
      FP_ZERO =
      ^

"/usr/include/math.h", line 945: error: expected an identifier
      FP_SUBNORMAL =
      ^

Error limit reached.
100 errors detected in the compilation of "bar.cpp".
Compilation terminated.

Hope that gives additional context. Thanks in advance!

“-E” works, though I prefer “-P”, which tells the compiler to stop after preprocessing and write the output to a file with the same name as the original but with a “.i” suffix.

To compile the preprocessed file, add the flag “--no_preincludes”; otherwise all the implicitly added header files get added again and you get these redefinition errors.
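
Putting the two steps together, here is a minimal sketch based on the foo.cpp example from the first post. The run helper and the DRY_RUN, SRC, OUT, MODE and FLAGS variables are hypothetical conveniences for illustration, not part of nvc++. It also assumes that mode flags such as -cuda (or -acc, -mp=gpu, --c++17) should be repeated when recompiling, since they select how the file is compiled, while the -I and -D flags are already baked into the preprocessed output. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch of the recipe: preprocess with -E, then recompile with --no_preincludes.
# All names below are placeholders taken from the example in this thread.
SRC=foo.cpp
OUT=bar.cpp
MODE="-cuda"
FLAGS="-DR123_USE_SSE=0 -I/home/external/Random123/include"
DRY_RUN=${DRY_RUN:-1}

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: preprocess only, with the same flags as the failing build.
# $MODE and $FLAGS are left unquoted on purpose so they split into words.
run nvc++ $MODE $FLAGS -E -o "$OUT" "$SRC"

# Step 2: recompile the preprocessed file. --no_preincludes keeps the
# implicitly included headers from being added a second time.
run nvc++ $MODE --no_preincludes -c "$OUT"
```

If step 2 fails with the same error as the original build (here, the ICE) rather than with redeclaration errors, bar.cpp should be usable as a standalone reproducer.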