Issue with locally defined classes in OpenMP offload region (since NVHPC 22.5)

Hello!

We have the following code pattern in the DSL-generated code of the NEURON simulator:

struct cacum_Instance  {
    double* cai;
};

void nrn_state_cacum(cacum_Instance* inst, int start, int end, int gpu) {
    #pragma omp target teams distribute parallel for if(gpu)
    #pragma acc parallel loop if(gpu)
    for (int id = start; id < end; id++) {
        struct functor {
            cacum_Instance* inst;
            functor(cacum_Instance* inst) : inst{inst} {}
        };
        inst->cai[id] += 1.0;
    }
}

Until NVHPC 22.3, the OpenMP offload version was compiling and working fine. With 22.5 and 22.7 we now get the following error:

$ nvc++  -g  -O2  --c++17  -c cacumm_prep.cpp -mp=gpu -Minfo=accel
"cacumm_prep.cpp", line 11: internal error: assertion failed: lower_expr: bad kind (lower_il.cpp, line 17583 in lower_expr_full)

              functor(cacum_Instance* inst) : inst{inst} {}
                                                         ^

1 catastrophic error detected in the compilation of "cacumm_prep.cpp".
Compilation aborted.
nvc++-Fatal-/gpfs/bbp.cscs.ch/ssd/apps/bsd/pulls/1654/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.7-ltthct/Linux_x86_64/22.7/compilers/bin/tools/cpp1 TERMINATED by signal 6
Arguments to /gpfs/bbp.cscs.ch/ssd/apps/bsd/pulls/1654/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.7-ltthct/Linux_x86_64/22.7/compilers/bin/tools/cpp1
/gpfs/bbp.cscs.ch/ssd/apps/bsd/pulls/1654/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.7-ltthct/Linux_x86_64/22.7/compilers/bin/tools/cpp1 --llalign -Dunix -D__unix -D__unix__ -Dlinux -D__linux -D__linux__ -D__NO_MATH_INLINES -D__LP64__ -D__x86_64 -D__x86_64__ -D__LONG_MAX__=9223372036854775807L '-D__SIZE_TYPE__=unsigned long int' '-D__PTRDIFF_TYPE__=long int' -D__amd64 -D__amd64__ -D__k8 -D__k8__ -D__MMX__ -D__SSE__ -D__SSE2__ -D__SSE3__ -D__SSSE3__ -D__ABM__ -D__SSE4_1__ -D__SSE4_2__ -D__AVX__ -D__AVX2__ -D__AVX512F__ -D__AVX512CD__ -D__AVX512VL__ -D__AVX512BW__ -D__AVX512DQ__ -D__F16C__ -D__FMA__ -D__XSAVE__ -D__XSAVEOPT__ -D__XSAVEC__ -D__XSAVES__ -D__POPCNT__ -D__AES__ -D__PCLMUL__ -D__CLFLUSHOPT__ -D__FSGSBASE__ -D__RDRND__ -D__BMI__ -D__BMI2__ -D__LZCNT__ -D__FXSR__ -D__RTM__ -D__PKU__ -D__PGI -D__NVCOMPILER -D_GNU_SOURCE -D_PGCG_SOURCE --c++17 -I- -I/gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/python-3.9.7-yj5alh/include/python3.9 --sys_include /gpfs/bbp.cscs.ch/ssd/apps/bsd/pulls/1654/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.7-ltthct/Linux_x86_64/22.7/compilers/include --sys_include /gpfs/bbp.cscs.ch/ssd/apps/bsd/pulls/1654/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.7-ltthct/Linux_x86_64/22.7/cuda/11.7/include --sys_include /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-ehzq5x/include/c++/11.2.0 --sys_include /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-ehzq5x/include/c++/11.2.0/x86_64-pc-linux-gnu --sys_include /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-ehzq5x/include/c++/11.2.0/backward --sys_include /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-ehzq5x/lib/gcc/x86_64-pc-linux-gnu/11.2.0/include --sys_include /usr/local/include --sys_include /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-ehzq5x/include --sys_include /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-ehzq5x/lib/gcc/x86_64-pc-linux-gnu/11.2.0/include-fixed --sys_include /usr/include -D__PGLLVM__ -D__NVCOMPILER_LLVM__ -D__extension__= -D_OPENMP=202011 -DCUDA_VERSION=11070 -DPGI_TESLA_TARGET -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1 -D_PGI_HX --preinclude _cplus_preinclude.h --preinclude_macros _cplus_macros.h --gnu_version=110200 -D__pgnu_vsn=110200 -g --dwarf2 --target_gpu --mp -D_OPENMP=202011 -D_NVHPC_RDC -q -o /tmp/nvc++z7iclJMg-z3g.il cacumm_prep.cpp

If I remove the if(gpu) condition from the offload loop, or move the class definition outside the loop, then the code compiles fine.
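
For reference, here is a sketch of the second workaround, with the class definition hoisted out of the loop body (cacum_Instance as above; only the location of the functor changes):

// Hoisted out of the offload loop; the functor definition itself is unchanged.
struct functor {
    cacum_Instance* inst;
    functor(cacum_Instance* inst) : inst{inst} {}
};

void nrn_state_cacum(cacum_Instance* inst, int start, int end, int gpu) {
    #pragma omp target teams distribute parallel for if(gpu)
    #pragma acc parallel loop if(gpu)
    for (int id = start; id < end; id++) {
        inst->cai[id] += 1.0;
    }
}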

Could you please take a look?

NOTE: The OpenACC version compiles fine without any changes, so the issue seems to be limited to the latest NVHPC releases (at least 22.5 & 22.7) and OpenMP offload.

Thanks for the report, Pramod. I was able to reproduce the regression here and have filed a problem report, TPR #32434. No idea what change in the front-end C++ compiler caused this, but we’ll have engineering investigate.

-Mat

Thank you for confirming the issue, Mat!

I assume this also appears with the upcoming 22.9 release?

If there is any workaround the engineering team suggests, then we will certainly be interested!

Yes, it’s still in 22.9 as well.

Do you need to replicate the case where the offload is conditional at execution?

I was thinking that you could use a metadirective with a user condition, but this triggers a parse error when the functor is included. The syntax for the functor is similar to a metadirective, so my guess is that it somehow parses incorrectly. I filed TPR #32437 and will let the compiler engineers determine the true cause.

My only other thought is to duplicate the loop in an if statement, putting the target teams pragma without the “if(gpu)” clause in the true branch. Not ideal, but hopefully a short-term workaround.
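
Roughly, the duplicated version would look like this (an untested sketch; both branches keep the same loop body, only the true branch carries the target construct, and the OpenACC directive is omitted for brevity):

if (gpu) {
    #pragma omp target teams distribute parallel for
    for (int id = start; id < end; id++) {
        struct functor {
            cacum_Instance* inst;
            functor(cacum_Instance* inst) : inst{inst} {}
        };
        inst->cai[id] += 1.0;
    }
} else {
    for (int id = start; id < end; id++) {
        struct functor {
            cacum_Instance* inst;
            functor(cacum_Instance* inst) : inst{inst} {}
        };
        inst->cai[id] += 1.0;
    }
}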

FYI, here’s what the code would look like using a metadirective (assuming the functor line didn’t cause an error):

void nrn_state_cacum(cacum_Instance* inst, int start, int end, int gpu) {
    #pragma omp metadirective \
        when( user={condition(gpu)}: target teams distribute parallel for)  \
        default( parallel for )
    for (int id = start; id < end; id++) {
        struct functor {
            cacum_Instance* inst;
#ifndef ERROR
            functor(cacum_Instance* inst) : inst{inst} {}
#endif
        };
        inst->cai[id] += 1.0;
    }
}

Do you need to replicate the case where the offload is conditional at execution?

I would, yes, because we would like to use the same binary for CPU or GPU execution. Of course, we could build separate modules/binaries, but we have been trying to have a single binary. This helps with binary distribution.

My only other thought is to duplicate the loop in an if statement, putting the target teams pragma without the “if(gpu)” clause in the true branch. Not ideal, but hopefully a short-term workaround.

OK, thanks! We will most likely try this as a temporary alternative.

FYI, here’s what the code would look like using a metadirective (assuming the functor line didn’t cause an error):

I haven’t used metadirectives until now. Thanks for the example!

Dear @MatColgrove,

We still see this issue in the latest v22.11 release. Internally (in TPR#32437), is there any feedback or plan to fix this “soonish”? (Currently we are stuck on 22.3.)

It’s true that we have discussed the workarounds, but implementing such a workaround is non-trivial in our DSL-to-C++ code generation pipeline; hence the question.

Thank you!

There are two reports here. TPR #32434 is assigned to an engineer but looks to be lower on his priority list. I’ve pinged him for status.

As for the issue with the metadirective workaround code, #32437, this has not been assigned, so I doubt it will be resolved any time soon.

I would, yes, because we would like to use the same binary for CPU or GPU execution.

Is this an “either/or”? I.e., when the binary is run, does it run on either the CPU or the GPU, but not both in the same run?

If this is the case, then you can remove the problematic “if(gpu)” clause and use the environment variable “OMP_TARGET_OFFLOAD=[MANDATORY|DISABLED]” to control whether the code is offloaded.

The caveat with “DISABLED” is that the loop will still be parallelized across multicore CPUs. If you do want it to run serially, also set “OMP_NUM_THREADS=1”.

Also note that if there’s no GPU on the system, the binary will use the host fallback code. Use of “OMP_TARGET_OFFLOAD” is only really needed if running on a system with a GPU but you want it to run on the host.
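
As a concrete sketch of that setup (the source keeps the unconditional OpenMP construct, and the device-vs-host choice is made purely through the environment; ./a.out is just a placeholder binary name):

// Sketch: offload construct without the "if(gpu)" clause; device vs. host
// execution is then selected at run time via OMP_TARGET_OFFLOAD.
#pragma omp target teams distribute parallel for
for (int id = start; id < end; id++) {
    inst->cai[id] += 1.0;
}

// Example runs (./a.out is a placeholder for the actual binary):
//   OMP_TARGET_OFFLOAD=MANDATORY ./a.out                    # force GPU offload
//   OMP_TARGET_OFFLOAD=DISABLED OMP_NUM_THREADS=1 ./a.out   # serial host run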

Hi Pramod,

FYI, the original issue, TPR #32434, was fixed in our 23.3 release.

-Mat
