segmentation fault in "__c_mcopy4" when using :man

Hi,

I have a code that works perfectly fine with PGI and OpenACC using the target “-ta=tesla:cc70”.
I am now trying to port it to unified memory, but I get a segmentation fault inside a PGI runtime function.
This is how I execute the code:

module load   pgi/18.7 cuda/9.2
mpirun -n 1  cuda-memcheck P-Gadget3/P-Gadget3 param.par

These are my compiling options:

mpipgic++ -acc -ta=tesla:managed --c++11  -g  -O2 -mp -Minfo  -Mlarge_arrays ...

This is the output from my “cuda-gdb”:

05:07:26 b8p190ai@p10login1:~/testo/Gennaio2019_pragma/Box4mr/1Node/1MPI/1openMP_1GPU_um_2 cuda-gdb P-Gadget3/P-Gadget3  core.152478 
NVIDIA (R) CUDA Debugger
9.2 release
Portions Copyright (C) 2007-2018 NVIDIA Corporation
GNU gdb (GDB) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "ppc64le-elf-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from P-Gadget3/P-Gadget3...done.

warning: exec file is newer than core file.
[New LWP 152478]
[New LWP 152506]
[New LWP 152479]
[New LWP 152505]
[New LWP 152480]

warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Core was generated by `P-Gadget3/P-Gadget3 param.par '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00002000011c7da4 in __c_mcopy4 () from /gpfs/gpfs_gl4_16mb/pgi/18.7/linuxpower/18.7/lib/libpgc.so
[Current thread is 1 (Thread 0x200001830000 (LWP 152478))]
(cuda-gdb) info threads
  Id   Target Id         Frame 
* 1    Thread 0x200001830000 (LWP 152478) 0x00002000011c7da4 in __c_mcopy4 () from /gpfs/gpfs_gl4_16mb/pgi/18.7/linuxpower/18.7/lib/libpgc.so
  2    Thread 0x20001a36f050 (LWP 152506) 0x0000200001467ad8 in poll () from /lib64/libc.so.6
  3    Thread 0x200001d0f050 (LWP 152479) 0x0000200001467ad8 in poll () from /lib64/libc.so.6
  4    Thread 0x200019f5f050 (LWP 152505) 0x000020000147b31c in accept4 () from /lib64/libc.so.6
  5    Thread 0x20000227f050 (LWP 152480) 0x0000200001479398 in epoll_wait () from /lib64/libc.so.6
(cuda-gdb) bt
#0  0x00002000011c7da4 in __c_mcopy4 () from /gpfs/gpfs_gl4_16mb/pgi/18.7/linuxpower/18.7/lib/libpgc.so
#1  0x0000200000cf7314 in __pgi_uacc_fill_buffer () from /gpfs/gpfs_gl4_16mb/pgi/18.7/linuxpower/18.7/lib/libaccg2mp.so
#2  0x0000200000ba9958 in __pgi_uacc_cuda_dataup1 () from /gpfs/gpfs_gl4_16mb/pgi/18.7/linuxpower/18.7/lib/libaccnmp.so
#3  0x0000200000b62cd0 in __pgi_uacc_dataup1 () from /gpfs/gpfs_gl4_16mb/pgi/18.7/linuxpower/18.7/lib/libaccgmp.so
#4  0x0000200000b6351c in __pgi_uacc_dataupx () from /gpfs/gpfs_gl4_16mb/pgi/18.7/linuxpower/18.7/lib/libaccgmp.so
#5  0x0000200000b61310 in __pgi_uacc_dataonb () from /gpfs/gpfs_gl4_16mb/pgi/18.7/linuxpower/18.7/lib/libaccgmp.so
#6  0x0000000010008048 in density () at Hydro/density.c:806
#7  0x000000001003ae68 in init () at CodeBase/init.c:1151
#8  0x000000001002e1d0 in begrun () at CodeBase/begrun.c:274
#9  0x00000000100cb4bc in main (argc=<optimized out>, argv=0x7fffd8ee14c8) at CodeBase/main.c:286
(cuda-gdb) f 6
#6  0x0000000010008048 in density () at Hydro/density.c:806
806               for(i=0;i<NDensityActivePart;i++){
(cuda-gdb) l
801     #ifdef ACC_DENSITY
802           ACC_PRAGMA_DATA(   ACC_COPYIN( DensityActiveParticleList[0:NDensityActivePart], LiteDIn[0:NumPart],WalkLiteP[0:NumPart],WalkLiteSPH[0:NumPart] )    ACC_COPY(LiteDOut[0:NDensityActivePart]) if(DoPrimaryOnGPU))
803           { //this parenthesys is contains the density data region. It is closed at the end of the iterations.
804             if(DoPrimaryOnGPU){
805     #pragma acc parallel loop
806               for(i=0;i<NDensityActivePart;i++){
807                 int MiniNGBList[ACC_GPU_NGB];
808                 int p_target=DensityActiveParticleList[i];
809                 int startnode = All.MaxPart;
810                 int endnode = startnode;
(cuda-gdb)

As you can see, the “#pragma” statement in line 805 is pretty innocent. Additionally, the code works perfectly well without the “:managed” option in the target flag.

What puzzles me is that the error comes from a pgi routine “__c_mcopy4”, so it looks like there is a copy being made somehow?

How can I investigate further what’s happening and what is causing the error?

Thanks.

Hi AntonioR,

It’s a bit difficult to say exactly what’s wrong without having the code myself, but if you can send a reproducing example to PGI Customer Service (trs@pgroup.com), I’d be happy to take a look.

Though from the info shown, I do have a few questions, comments, and possible theories.

First, what are the compiler feedback messages for this section of code? (i.e. add the flag “-Minfo=accel”)

Also, for “ACC_PRAGMA_DATA”, is this creating a structured (i.e. #pragma acc data) or unstructured (i.e. #pragma acc enter data) data region?
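
To make the distinction concrete, here is a minimal sketch of the two forms in plain OpenACC C. The function names and the array "x" are made up for illustration and are not taken from P-Gadget3. When built without "-acc" the pragmas are ignored and both functions simply run on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Structured data region: x is on the device only inside the braces. */
double sum_structured(const double *x, int n)
{
    assert(x != NULL && n >= 0);
    double s = 0.0;
    #pragma acc data copyin(x[0:n])
    {
        #pragma acc parallel loop reduction(+:s)
        for (int i = 0; i < n; i++)
            s += x[i];
    }
    return s;
}

/* Unstructured data region: the device lifetime of x is delimited by
   separate enter data / exit data directives, not by a block. */
double sum_unstructured(const double *x, int n)
{
    assert(x != NULL && n >= 0);
    double s = 0.0;
    #pragma acc enter data copyin(x[0:n])
    #pragma acc parallel loop present(x[0:n]) reduction(+:s)
    for (int i = 0; i < n; i++)
        s += x[i];
    #pragma acc exit data delete(x[0:n])
    return s;
}
```

The comment in the listing about the brace being closed at the end of the iterations suggests the structured form, but only the macro definition can confirm that.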

A “__c_mcopy4” call around a parallel loop implies that the compiler is implicitly copying something to the device: the mcopy call copies data from virtual memory to a pinned buffer that is used to perform the data transfer. The segv implies that the address being used is bad, or that there’s an out-of-bounds or alignment issue. This doesn’t explain why it would work without unified memory, but it may just be luck that it works without it (i.e. the memory layout may be slightly different so as not to trigger the segv).
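
One cheap way to test the bad-address theory is to read both ends of each array section on the host just before the data region, so that an invalid pointer or extent has a chance to fault in your own code rather than inside the runtime. This is only a sketch: "probe_section" is a hypothetical helper, not a PGI or OpenACC routine, and an out-of-bounds read is undefined behaviour that is not guaranteed to fault.

```c
#include <assert.h>
#include <stddef.h>

/* Reads the first and last byte of a host array section.
   Returns -1 for a NULL pointer or an empty section, otherwise the sum
   of those two bytes (the value only exists to keep the reads alive). */
int probe_section(const void *base, size_t count, size_t elem_size)
{
    assert(elem_size > 0);
    if (base == NULL || count == 0)
        return -1;

    /* volatile keeps the compiler from optimising the reads away */
    const volatile unsigned char *p = (const volatile unsigned char *)base;
    unsigned char first = p[0];
    unsigned char last = p[count * elem_size - 1];
    return (int)first + (int)last;
}
```

For the data region in the listing, a call would look like probe_section(LiteDIn, NumPart, sizeof *LiteDIn), repeated for each array named in the data clauses.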

The compiler info messages will tell you if the compiler is implicitly copying an array or structure. In particular, I’m interested in the “All” structure or class object (I’m not sure which it is). If anything is getting implicitly copied, you might try making this copy explicit by adding it to the above data region. Or, if it’s being manually managed at some point higher in the program, try adding the variable to a “present” clause. “present” tells the compiler to check whether the variable is already on the device and, more importantly here, not to generate the implicit data copy. I’ve seen cases where aggregate data structures with dynamic data (e.g. a struct with pointers) cause runtime errors unless they are in a present clause. That situation wouldn’t cause memcpy to segv, though, so it may not matter in this case.
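
As a rough sketch of what I mean, assuming “All” is a plain global struct: the struct layout, the initial value, and the function name below are invented for illustration; only the names “All” and “MaxPart” come from your listing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real global parameter structure. */
struct global_params {
    int MaxPart;
};

static struct global_params All = { 1000 };

/* Writes out[i] = All.MaxPart + i, naming every variable in a data clause
   so that nothing is left for the compiler to copy implicitly. */
void fill_startnodes(int *out, int n)
{
    assert(out != NULL && n >= 0);

    /* Explicit copyin of All takes the place of an implicit copy. */
    #pragma acc data copyin(All) copyout(out[0:n])
    {
        /* present: verify the data is already on the device, copy nothing. */
        #pragma acc parallel loop present(All, out[0:n])
        for (int i = 0; i < n; i++)
            out[i] = All.MaxPart + i;
    }
}
```

If “All” is already placed on the device somewhere higher up in the program, drop the copyin and keep only the present clause on the compute construct.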

Another thing you can do is set the environment variable “PGI_ACC_DEBUG=1”. This will have the OpenACC runtime dump debugging info. It probably won’t show why the segv is happening, but it might give us clues as to which variable is triggering it.

-Mat