nvc++ OpenMP error inside llc

I’m trying to migrate an OpenMP codebase to OpenMP target offloading, and naturally compiling it with nvc++ for the first time as well.

I’ve now managed to build my code with OpenMP completely turned off. However, if I turn it on, no matter whether it’s on GPU or CPU and seemingly independent of optimization settings, I end up with output similar to the following:

/domus/h1/nettel/nvhpc/Linux_x86_64/21.3/compilers/share/llvm/bin/llc /scratch/nvc++g6igsB3eWw9b.ll -march=x86-64 -mcpu=native -O0 -fast-isel=0 -non-global-value-max-name-size=4294967295 -x86-cmov-converter=0 --frame-pointer=none -o /scratch/nvc++26igI1YMkJdo.s
/domus/h1/nettel/nvhpc/Linux_x86_64/21.3/compilers/share/llvm/bin/llc: error: 
/domus/h1/nettel/nvhpc/Linux_x86_64/21.3/compilers/share/llvm/bin/llc: 
/scratch/nvc++g6igsB3eWw9b.ll:9237:22: error: use of undefined value '%.F0063.addr'
        %3 = load i32, i32* %.F0063.addr, align 4, !dbg !5681
                            ^
nvc++-Fatal-llc completed with exit code 1

Unlinking /scratch/nvc++M6igYCJYIcXM.il
Unlinking /scratch/nvc++w6igcd-rfoMK.s
Unlinking /scratch/nvc++g6igsB3eWw9b.ll
Unlinking /scratch/nvc++26igI1YMkJdo.s
Unlinking /scratch/nvc++M6igYHjWaNhU.llvm

I haven’t found an easy flag to make it keep the temporary files from that pass; -Mkeepasm is not really appropriate for this.

Again, I get this independently of what -mp target I choose (gpu or multicore). Any hints on how to troubleshoot this, or what would be needed to report it as a bug?

This looks like a code generation bug. Are you able to provide a reproducing example that exhibits the error? If it’s small, you can post it inline or provide a link. If the code is not something you want to post publicly, feel free to direct message me and we can arrange a way for you to send us the example.

I realized that an array reduction could be the culprit, and indeed it compiles if I drop the reduction there. From that, I realized I had a macro problem: I was in fact still using the old OpenMP pragma from my GCC build, rather than a proper OpenMP target pragma. Hence, I guess the bug might not be very relevant.

Anyway, when I tried to reduce it to a sharable minimal case, I started getting a different error, so it’s not a perfect repro, but I get an internal compiler error instead. If it’s still relevant for you and that’s not enough, we’ll have to find a way to share the full code privately.

(The full code does not look completely like this, if it did I realize I could do the reduction in a number of more reasonable ways.)

bash-4.2$ /home/nettel/nvhpc/Linux_x86_64/2021/compilers/bin/nvc++ repro.cpp -mp=gpu -gpu=cc75
NVC++-S-0053-Illegal use of void type (repro.cpp: 21)
NVC++-S-0053-Illegal use of void type (repro.cpp: 21)
NVC++-F-0000-Internal compiler error. ll_abi_complete_arg_info: void function argument       0  (repro.cpp: 21)
NVC++/x86-64 Linux 21.3-0: compilation aborted

repro.cpp (752 Bytes)

Thanks Carl. We have a few known issues with array reduction support in OpenMP (we just added it recently), including this one. We have a fix being tested in our development compiler which, assuming testing goes well, will be available in a future release.


Thanks for the information. I managed to work around the problem somewhat by using a Thrust reduce by key for that specific loop.
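For reference, the idea behind that workaround: in a row-major num_haps × num_haps table, every element of row i can be given the key i, so reducing consecutive runs of equal keys yields the per-row sums; on the GPU that maps onto thrust::reduce_by_key with an index-derived key iterator. A minimal CPU sketch of the same scheme (names and shape are mine, not the actual code):

```cpp
#include <cassert>
#include <vector>

// CPU sketch of the reduce-by-key scheme: element idx of a row-major
// num_haps x num_haps table gets the key idx / num_haps (its row index),
// and reducing runs of equal keys produces one sum per row.
std::vector<int> row_sums(const std::vector<int>& table, int num_haps) {
    std::vector<int> sums(num_haps, 0);
    for (int idx = 0; idx < num_haps * num_haps; ++idx) {
        int key = idx / num_haps;  // all elements of row i share key i
        sums[key] += table[idx];
    }
    return sums;
}
```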

However, I can also note that trying to do the same thing with just atomics sometimes gave errors, but minor rearrangements fixed those. (Obviously, performance is not good when you hammer the same addresses with atomics.)
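For context, the atomic variant looked roughly like this (shape assumed, not the exact code): each thread handles one (i, j) pair and atomically accumulates into the shared row and column sums, which is why updates to the same address serialize and performance suffers.

```cpp
#include <cassert>
#include <vector>

// Sketch of the atomic-update variant. Every (i, j) pair updates sums0[i]
// (row sums) and sums1[i] (column sums) under "#pragma omp atomic", so
// concurrent updates to the same element are serialized.
void sums_with_atomics(const int* table, int num_haps, int* sums0, int* sums1) {
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: table[0:num_haps*num_haps]) \
        map(tofrom: sums0[0:num_haps], sums1[0:num_haps])
    for (int i = 0; i < num_haps; ++i) {
        for (int j = 0; j < num_haps; ++j) {
            #pragma omp atomic
            sums0[i] += table[i * num_haps + j];
            #pragma omp atomic
            sums1[i] += table[j * num_haps + i];
        }
    }
}
```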

I finally just wanted to try another naïve approach.
#pragma omp target teams distribute
for (int i = 0; i < num_haps; i++)
{
    #pragma omp parallel for reduction(+ : sums0[i])
    for (int j = 0; j < num_haps; j++)
    {
        sums0[i] += table[i * num_haps + j];
    }
    #pragma omp parallel for reduction(+ : sums1[i])
    for (int j = 0; j < num_haps; j++)
    {
        sums1[i] += table[j * num_haps + i];
    }
}

Even with some slight variations, this just resulted in:
nvc++-Fatal-/domus/h1/nettel/nvhpc/Linux_x86_64/21.3/compilers/bin/tools/cpp1 TERMINATED by signal 11

I have another issue with a lambda capture as well now, but I will try to rule out that it somehow references a stack-allocated pointer before posting.

I already got a 5x speedup with the Thrust + OpenMP target approach over my CPU version on a T4 (and arguably very old CPU cores), so using nvc++ has been a blast, when cheating using managed memory allocation… From the docs I gather this is really only supported for OpenACC.

21.5 should be out soon, so you can try again once it’s available. Our OpenMP target offload support is very new, so we’re still working through issues. I don’t know if 21.5 will fix your issue, but there are a lot of fixes going in, so it may.

Though, you can also try using the OpenMP “loop” construct instead:

#pragma omp target teams loop
for (int i = 0; i < num_haps; i++)
{
    #pragma omp loop reduction(+ : sums0[i])
    for (int j = 0; j < num_haps; j++)
    {
        sums0[i] += table[i * num_haps + j];
    }
    #pragma omp loop reduction(+ : sums1[i])
    for (int j = 0; j < num_haps; j++)
    {
        sums1[i] += table[j * num_haps + i];
    }
}


FYI, we’ve also started supporting C++ standard language parallelism (built on top of Thrust) if you want to give it a try. See: Accelerating Standard C++ with GPUs Using stdpar | NVIDIA Developer Blog
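For example, the row-sum loop above could be written with the standard parallel algorithms roughly like this (a sketch, not tested against your code; with nvc++ -stdpar the parallel policies offload to the GPU, while libstdc++ CPU builds additionally need TBB):

```cpp
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

// stdpar sketch of the row-sum computation: the outer transform runs one
// task per row in parallel, and each task does a sequential per-row reduce.
std::vector<int> row_sums_stdpar(const std::vector<int>& table, int num_haps) {
    std::vector<int> rows(num_haps);
    std::iota(rows.begin(), rows.end(), 0);  // row indices 0..num_haps-1
    std::vector<int> sums(num_haps);
    std::transform(std::execution::par, rows.begin(), rows.end(), sums.begin(),
                   [&](int i) {
                       // sequential reduction over one row; rows run in parallel
                       return std::reduce(table.begin() + i * num_haps,
                                          table.begin() + (i + 1) * num_haps);
                   });
    return sums;
}
```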

when cheating using managed memory allocation… From the docs I gather this is really only supported for OpenACC.

No, it’s supported for OpenMP as well, and the default for C++ stdpar. It’s one of the reasons why we added the “-gpu” flag (as opposed to the older OpenACC “-ta” flag), so we could put the common GPU options for all models under a single flag. Since OpenMP target offload and stdpar are so new, our docs are just a bit behind.
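For example, combining the flags already used in this thread with the managed-memory sub-option:

```sh
# OpenMP target offload with CUDA managed memory; cc75 matches the T4
# mentioned above. The same -gpu sub-options apply to -acc and -stdpar.
nvc++ -mp=gpu -gpu=cc75,managed repro.cpp -o repro
```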