Segmentation fault in pthread_mutex_lock ()

Hi,

I get a segmentation fault in pthread_mutex_lock() when trying to run my OpenACC code on the GPU. On the CPU it runs fine. The segmentation fault happens right at the first "acc data copy" directive. Do you have any idea what might be wrong? I can provide further information or access to the code if necessary.

Thank you and regards,
Thomas

Thread 1 "ftg_vdiff_up_te" received signal SIGSEGV, Segmentation fault.
0x00002aaabcce7714 in pthread_mutex_lock () from /lib64/libpthread.so.0
Missing separate debuginfos, use: zypper install libjasper1-debuginfo-1.900.14-195.3.1.x86_64 libjpeg62-debuginfo-62.1.0-30.1.x86_64 libjpeg8-debuginfo-8.0.2-30.3.x86_64 liblzma5-debuginfo-5.0.5-4.852.x86_64 libnuma1-debuginfo-2.0.9-9.1.x86_64 libpython2_7-1_0-debuginfo-2.7.13-27.1.x86_64 libxml2-2-debuginfo-2.9.4-46.3.2.x86_64 libz1-debuginfo-1.2.8-11.1.x86_64
(gdb) bt
#0 0x00002aaabcce7714 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1 0x00002aaaacdaff88 in ?? ()
from /usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../lib64/libcuda.so.1
#2 0x00002aaaace66471 in ?? ()
from /usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../lib64/libcuda.so.1
#3 0x00002aaaace665e5 in ?? ()
from /usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../lib64/libcuda.so.1
#4 0x00002aaaacdb5eb4 in ?? ()
from /usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../lib64/libcuda.so.1
#5 0x00002aaaacdb7707 in ?? ()
from /usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../lib64/libcuda.so.1
#6 0x00002aaaacd8a266 in ?? ()
from /usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../lib64/libcuda.so.1
#7 0x00002aaaacdd79ed in cuInit ()
from /usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../lib64/libcuda.so.1
#8 0x00002aaaac9a9dd5 in ?? ()
from /apps/common/UES/pgi/17.10/linux86-64/2017/cuda/8.0/lib64/libcudart.so.8.0
#9 0x00002aaaac9a9e31 in ?? ()
from /apps/common/UES/pgi/17.10/linux86-64/2017/cuda/8.0/lib64/libcudart.so.8.0
#10 0x00002aaabcce3c13 in __pthread_once_slow () from /lib64/libpthread.so.0
#11 0x00002aaaac9dc919 in ?? ()
from /apps/common/UES/pgi/17.10/linux86-64/2017/cuda/8.0/lib64/libcudart.so.8.0
#12 0x00002aaaac9a600a in ?? ()
from /apps/common/UES/pgi/17.10/linux86-64/2017/cuda/8.0/lib64/libcudart.so.8.0
#13 0x00002aaaac9a9ceb in ?? ()
from /apps/common/UES/pgi/17.10/linux86-64/2017/cuda/8.0/lib64/libcudart.so.8.0
#14 0x00002aaaac9cbd2a in cudaFree ()
from /apps/common/UES/pgi/17.10/linux86-64/2017/cuda/8.0/lib64/libcudart.so.8.0
#15 0x00002aaaae61aeaf in __pgi_uacc_cuda_initdev ()
from /apps/common/UES/pgi/17.10/linux86-64/17.10/lib/libaccncmp.so
#16 0x00002aaaae402eaa in __pgi_uacc_enumerate ()
from /apps/common/UES/pgi/17.10/linux86-64/17.10/lib/libaccgmp.so
#17 0x00002aaaae4033c3 in __pgi_uacc_initialize ()
from /apps/common/UES/pgi/17.10/linux86-64/17.10/lib/libaccgmp.so
#18 0x00002aaaae3f9e3b in __pgi_uacc_dataenterstart ()
from /apps/common/UES/pgi/17.10/linux86-64/17.10/lib/libaccgmp.so
#19 0x000000000041f0fd in mo_vdiff_upward_sweep::vdiff_up (
kproma=&lt;optimized out&gt;, kbdim=&lt;optimized out&gt;, klev=&lt;optimized out&gt;,
klevm1=&lt;optimized out&gt;, ktrac=&lt;optimized out&gt;, ksfc_type=&lt;optimized out&gt;,
idx_wtr=&lt;optimized out&gt;, pdtime=&lt;optimized out&gt;, pfrc=..., pcfm_tile=...,
aa=..., pcptgz=..., pum1=..., pvm1=..., ptm1=..., pmair=..., pmdry=...,
pqm1=..., pxlm1=..., pxim1=..., pxtm1=..., pgeom1=..., pztkevn=...,
bb=..., pzthvvar=..., pxvar=..., pz0m_tile=..., pkedisp=..., pute_vdf=...,
pvte_vdf=..., pq_vdf=..., pqte_vdf=..., pxlte_vdf=..., pxite_vdf=...,
pxtte_vdf=..., pz0m=..., pthvvar=..., ptke=..., psh_vdiff=...,
pqv_vdiff=...) at ../../../src/mo_vdiff_upward_sweep.f90:141
#20 0x000000000040d629 in ftg_test_vdiff_up ()


==24456== Invalid read of size 4
==24456== at 0x16E4E714: pthread_mutex_lock (in /lib64/libpthread-2.22.so)
==24456== by 0x6F16F87: ??? (in /usr/lib64/libcuda.so.375.74)
==24456== by 0x6FCD470: ??? (in /usr/lib64/libcuda.so.375.74)
==24456== by 0x6FCD5E4: ??? (in /usr/lib64/libcuda.so.375.74)
==24456== by 0x6F1CEB3: ??? (in /usr/lib64/libcuda.so.375.74)
==24456== by 0x6F1E706: ??? (in /usr/lib64/libcuda.so.375.74)
==24456== by 0x6EF1265: ??? (in /usr/lib64/libcuda.so.375.74)
==24456== by 0x6F3E9EC: cuInit (in /usr/lib64/libcuda.so.375.74)
==24456== by 0x6B10DD4: ??? (in /apps/common/UES/pgi/17.10/linux86-64/2017/cuda/8.0/lib64/libcudart.so.8.0.44)
==24456== by 0x6B10E30: ??? (in /apps/common/UES/pgi/17.10/linux86-64/2017/cuda/8.0/lib64/libcudart.so.8.0.44)
==24456== by 0x16E4AC12: __pthread_once_slow (in /lib64/libpthread-2.22.so)
==24456== by 0x6B43918: ??? (in /apps/common/UES/pgi/17.10/linux86-64/2017/cuda/8.0/lib64/libcudart.so.8.0.44)
==24456== Address 0x3038 is not stack'd, malloc'd or (recently) free'd
==24456==
==24456==
==24456== Process terminating with default action of signal 11 (SIGSEGV)
==24456== Access not within mapped region at address 0x3038
==24456== at 0x16E4E714: pthread_mutex_lock (in /lib64/libpthread-2.22.so)
==24456== by 0x6F16F87: ??? (in /usr/lib64/libcuda.so.375.74)
==24456== by 0x6FCD470: ??? (in /usr/lib64/libcuda.so.375.74)
==24456== by 0x6FCD5E4: ??? (in /usr/lib64/libcuda.so.375.74)
==24456== by 0x6F1CEB3: ??? (in /usr/lib64/libcuda.so.375.74)
==24456== by 0x6F1E706: ??? (in /usr/lib64/libcuda.so.375.74)
==24456== by 0x6EF1265: ??? (in /usr/lib64/libcuda.so.375.74)
==24456== by 0x6F3E9EC: cuInit (in /usr/lib64/libcuda.so.375.74)
==24456== by 0x6B10DD4: ??? (in /apps/common/UES/pgi/17.10/linux86-64/2017/cuda/8.0/lib64/libcudart.so.8.0.44)
==24456== by 0x6B10E30: ??? (in /apps/common/UES/pgi/17.10/linux86-64/2017/cuda/8.0/lib64/libcudart.so.8.0.44)
==24456== by 0x16E4AC12: __pthread_once_slow (in /lib64/libpthread-2.22.so)
==24456== by 0x6B43918: ??? (in /apps/common/UES/pgi/17.10/linux86-64/2017/cuda/8.0/lib64/libcudart.so.8.0.44)

Hi Thomas,

It looks like the error occurs while the runtime is trying to initialize your device. Unfortunately, I haven't seen this error before, so I'm not sure what's causing it; my guess would be some type of installation or configuration issue with your device.

What CUDA driver and device do you have installed? (You can check this by running "nvidia-smi" or "pgaccelinfo".)

Are you able to run a simple CUDA C program?
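If you don't have a CUDA C example handy, a rough equivalent of that check (can the driver library be loaded and cuInit called?) can be sketched in a few lines of Python using ctypes. This is just an illustrative diagnostic, not a PGI tool; it assumes the driver is installed as libcuda.so.1, the library name shown in your backtrace:

```python
import ctypes

def try_cuinit():
    """Try to load the CUDA driver library and call cuInit(0), which is
    roughly what the OpenACC runtime does on first device use."""
    try:
        # libcuda.so.1 is the driver library appearing in the backtrace
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError as exc:
        return "driver library not loadable: %s" % exc
    rc = libcuda.cuInit(0)  # cuInit returns CUDA_SUCCESS (0) on success
    if rc == 0:
        return "cuInit OK"
    return "cuInit failed with error %d" % rc

if __name__ == "__main__":
    print(try_cuinit())
```

If this also crashes inside cuInit, the problem is with the driver installation or device configuration rather than with the OpenACC program.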

-Mat

I compile and link with "-ta=nvidia:cc60,cuda8.0 -Mcuda", which works for other simple OpenACC programs.

When using the Cray compiler I don't get any error.

This is the output of pgaccelinfo:

CUDA Driver Version: 8000
NVRM version: NVIDIA UNIX x86_64 Kernel Module 375.74 Wed Jun 14 01:39:39 PDT 2017

Device Number: 0
Device Name: Tesla P100-PCIE-16GB
Device Revision Number: 6.0
Global Memory Size: 17066885120
Number of Multiprocessors: 56
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1328 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: exclusive-process
Concurrent Kernels: Yes
ECC Enabled: Yes
Memory Clock Rate: 715 MHz
Memory Bus Width: 4096 bits
L2 Cache Size: 4194304 bytes
Max Threads Per SMP: 2048
Async Engines: 2
Unified Addressing: Yes
Managed Memory: Yes
PGI Compiler Option: -ta=tesla:cc60

Thank you!

Great, thanks! So the error must be something to do with the program itself and not the device.

Can you try adding an "acc_init(acc_get_device_type())" call at the beginning of the main program? This moves the device initialization to early in the run, so we can see whether the problem is really the device initialization or the first kernel launch.

Also, would I be able to get a reproducing example of the code that I can try? If so, please either post it or send it to PGI Customer Service (trs@pgroup.com) and ask them to forward it to me.

-Mat

Calling acc_get_device_type() right at the beginning already results in a segmentation fault.

I'll see if I can provide you with an example code.