Code fails to compile with -Minline

Hello. I am working on some code at https://github.com/AndStorm/QUESTION.git (latest commit; the build directory is the one containing nbody.cpp). The code works properly on the CPU with both gcc and PGI 19.10. It also compiles for the GPU (a GeForce 650 Ti, compute capability 3.0, in an Intel Core i7 machine) with the following command line:

cmake . -DCMAKE_C_COMPILER=pgcc -DCMAKE_CXX_COMPILER=pgc++
-DCMAKE_C_FLAGS="-acc -Minfo=acc -mcmodel=medium -ta=tesla:cc30 -Mcuda=cuda10.1"
-DCMAKE_CXX_FLAGS="-acc -Minfo=acc -mcmodel=medium -ta=tesla:cc30 -Mcuda=cuda10.1"
-DCMAKE_CXX_STANDARD=17 -DACC=ON -DCUDA=ON

The problem is that, without the -Minline compile option, the code does not work properly on the GPU with PGI 19.10 + OpenACC. But when I add -Minline to the compile line, the compilation fails with (ERROR.dat):

/opt/pgi/linux86-64-llvm/19.10/share/llvm/bin/opt: /tmp/pgc++2-XcIvZiTGvW.ll:1646:103: error: use of undefined value ‘@__sti___70__home_70_gaa_NFbuild_script_CHECK_GPU_CURRENT_WORK_TEMP_COPY_nbody_cpp_bc1207be’
@llvm.global_ctors = appending global [4 x { i32, void ()* }][{ i32, void ()* } { i32 65535, void ()* @__sti___70__home_70_gaa_NFbuild_script_CHECK_GPU_CURRENT_WORK_TEMP_COPY_nbody_cpp_bc1207be }, { i32, void ()* } { i32 65535, void ()* @…acc_data_constructor_1 }, { i32, void ()* } { i32 65535, void ()* @…acc_cuda_funcreg_constructor_1 }, { i32, void ()* } { i32 65535, void ()* @Mcuda_compiled }]
^
CMakeFiles/Test.dir/build.make:62: recipe for target ‘CMakeFiles/Test.dir/nbody.cpp.o’ failed
make[2]: *** [CMakeFiles/Test.dir/nbody.cpp.o] Error 2
CMakeFiles/Makefile2:72: recipe for target ‘CMakeFiles/Test.dir/all’ failed
make[1]: *** [CMakeFiles/Test.dir/all] Error 2
Makefile:83: recipe for target ‘all’ failed
make: *** [all] Error 2

I spent the whole day looking for a way to fix the error, but did not find one.
Please help me get this code to work.

Hi Andrey,

I was able to recreate the issue with PGI 19.10 and 20.1, and it does appear to be a compiler code generation issue. However, it looks like we already have a fix in place, which will be available in the next release.

We were planning on releasing next week in conjunction with GTC, but due to COVID-19 disruptions, we’re having to postpone the release a bit. It may be a few more weeks.

-Mat

Hi, Mat.
Thank you for the answer. Of course, I understand that the COVID-19 disruptions are postponing the PGI release.
But this code is part of my current work, which I have to report on in the coming weeks. I am afraid I will not have time to wait for the next PGI release (a few more weeks, as you wrote). So I am in trouble.

Maybe there is a workaround to get this code to work (for example, changing the architecture of the code or the data transfers from CPU to GPU so that it compiles for the GPU with -Minline)?

Would the code compile with -Minline if I delete #pragma acc declare create(…) on line 12 of body/include/T3AllocateData.h, add #pragma acc data copyin(particles) on line 45 of nbody.cpp, and pass the particles array as a function parameter (adding one more parameter, Particle * particles) to GetFS(…) in body/include/T3Process.h and in tpt/include/T3InelasticddImpl.h?

If you see a way to get this code to work on the GPU with PGI 19.10 + OpenACC, perhaps by changing its architecture or how data is allocated on the GPU, please tell me. It is very important to me.

Thank you very much.
Andrey.

Would the code compile with -Minline if I delete #pragma acc declare create(…) on line 12 of body/include/T3AllocateData.h, add #pragma acc data copyin(particles) on line 45 of nbody.cpp, and pass the particles array as a function parameter (adding one more parameter, Particle * particles) to GetFS(…) in body/include/T3Process.h and in tpt/include/T3InelasticddImpl.h?

Possibly? Although I’m not a compiler engineer and can’t really read the intermediate LLVM code, I do believe the issue has to do with “particle” in some way, since “llvm.global_ctors” is the global constructor list used to create global classes and structs. I don’t know what the missing symbol refers to, though. Possibly the compiler is inadvertently removing this symbol because, after inlining, it no longer sees that it’s used. In the 20.3 pre-release, this symbol is not removed.
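
To illustrate what “llvm.global_ctors” is for, here is a minimal sketch in plain C++. It is not taken from the repository: the Particle struct, the array size, and the counter are hypothetical stand-ins. A global object with a non-trivial constructor has to be constructed before main() runs, so the compiler emits a per-file initializer function and registers it in the global constructor list. The “__sti__…” name in the error log looks like such an initializer, but that reading is an assumption.

```cpp
// Hypothetical stand-in for a global array of objects with a
// non-trivial constructor (not the real Particle from the repository).
int g_constructed = 0;  // statically zero-initialized before any constructor runs

struct Particle {
    double e;
    Particle() : e(0.0) { ++g_constructed; }  // side effect forces dynamic initialization
};

// Each element is constructed before main() by a compiler-generated
// initializer function that is registered as a global constructor.
Particle particles[4];

// Reports how many Particle constructors have run so far.
int constructed_before_main() {
    return g_constructed;
}
```

If that initializer function were dropped while its entry in the constructor list stayed, the result would be an “undefined value” reference like the one in the log.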

Of course, this is just a guess as to what’s going on. I tried adding “__attribute__((noinline))” to the Particle and T3LorentzVector constructors so they don’t get inlined, but it didn’t change anything.
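
For reference, this is roughly how the GNU-style noinline attribute is attached to a constructor. The Vec4 class below is a hypothetical stand-in, not the real T3LorentzVector, and as noted the attribute did not change the outcome here.

```cpp
// Hypothetical four-vector class used only to show attribute placement.
struct Vec4 {
    double x, y, z, e;

    // GNU-style attribute asking the compiler not to inline this constructor.
    __attribute__((noinline)) Vec4(double x_, double y_, double z_, double e_);
};

// Out-of-line definition, so there is a real function the call sites refer to.
Vec4::Vec4(double x_, double y_, double z_, double e_)
    : x(x_), y(y_), z(z_), e(e_) {}

// Small helper that constructs a Vec4 and reads one field back.
double energy_of(double x, double y, double z, double e) {
    Vec4 v(x, y, z, e);
    return v.e;
}
```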

Note, the problem I see with removing “particles” from the “declare” directive is that you access “particles” directly from subroutines. If any of these routines are offloaded to the GPU, then “declare” is the only method to create a globally accessible variable. If this is the case, then you’d also need to pass “particles” as an argument to these routines so the global reference isn’t necessary.
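
A rough sketch of the two variants, under assumptions: the Particle struct, the array size, and all function names below are hypothetical and much simpler than the real code. Variant A keeps the global array and makes it visible on the device with “declare create”. Variant B passes the pointer to the device routine and uses a data region instead. Without OpenACC enabled, the pragmas are ignored and the code runs on the host.

```cpp
// Hypothetical stand-in for the Particle type (not the real one).
struct Particle {
    double x;
    double e;
};

// Variant A: global array with a device copy created by "declare".
// Device routines may then refer to g_particles directly.
const int N = 8;
Particle g_particles[N];
#pragma acc declare create(g_particles)

#pragma acc routine seq
double energy_global(int i) {
    return 2.0 * g_particles[i].e;  // direct global reference
}

double sum_with_global() {
    double sum = 0.0;
    // Refresh the device copy after the host has filled the array.
    #pragma acc update device(g_particles)
    #pragma acc parallel loop reduction(+:sum)
    for (int i = 0; i < N; ++i) {
        sum += energy_global(i);
    }
    return sum;
}

// Variant B: no global reference; the routine receives the pointer.
#pragma acc routine seq
double energy_arg(const Particle* particles, int i) {
    return 2.0 * particles[i].e;
}

double sum_with_argument(const Particle* particles, int n) {
    double sum = 0.0;
    // The data region copies the array to the device for this scope only.
    #pragma acc data copyin(particles[0:n])
    {
        #pragma acc parallel loop reduction(+:sum) present(particles[0:n])
        for (int i = 0; i < n; ++i) {
            sum += energy_arg(particles, i);
        }
    }
    return sum;
}
```

In variant B, every offloaded routine that touches the array needs the extra parameter, which is why the change has to be carried through all callers.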

-Mat