Crash in libGLX_nvidia.so.0 with driver version 384.98

In a complex Java application that mixes OpenCL 1.1 and OpenGL 4, running on RHEL 6u5 (64-bit),
I’ve experienced a crash with driver versions 384.98 and 375.20 on GTX 970 and GTX 750 cards.
I have no problem with 352.55.

With driver 384.98, the crash is in libGLX_nvidia.so.0 at address 0xbe726.

The surrounding code, from objdump -S:

  be71e:  00
  be71f:  0f 95 c2                setne  %dl
  be722:  84 d2                   test   %dl,%dl
  be724:  75 ea                   jne    be710 <vk_icdGetInstanceProcAddr+0x13930>
  be726:  49 8b 84 24 68 09 00    mov    0x968(%r12),%rax
  be72d:  00

The gdb stack trace is:
#0 0x0000003aed432925 in raise () from /lib64/libc.so.6
#1 0x0000003aed434105 in abort () from /lib64/libc.so.6
#2 0x00007fb63f44f605 in os::abort(bool) () from /mw/MW_jre_1.8.0_92/lib/amd64/server/libjvm.so
#3 0x00007fb63f5eea63 in VMError::report_and_die() () from /mw/MW_jre_1.8.0_92/lib/amd64/server/libjvm.so
#4 0x00007fb63f454e2f in JVM_handle_linux_signal () from /mw/MW_jre_1.8.0_92/lib/amd64/server/libjvm.so
#5 0x00007fb63f44b5c3 in signalHandler(int, siginfo*, void*) () from /mw/MW_jre_1.8.0_92/lib/amd64/server/libjvm.so
#6 <signal handler called>
#7 0x00007fb6118d0726 in ?? () from /usr/lib64/libGLX_nvidia.so.0
#8 0x00007fb611869249 in ?? () from /usr/lib64/libGLX_nvidia.so.0
#9 0x00007fb60b23a4cf in ?? () from /usr/lib64/libnvidia-glcore.so.384.98
#10 0x00007fb60b23a8e0 in ?? () from /usr/lib64/libnvidia-glcore.so.384.98
#11 0x00007fb60b27d860 in ?? () from /usr/lib64/libnvidia-glcore.so.384.98
#12 0x00007fb60b238389 in ?? () from /usr/lib64/libnvidia-glcore.so.384.98
#13 0x00007fb6009b125e in ?? () from /usr/lib64/libnvidia-opencl.so.1
#14 0x00007fb6008bf291 in ?? () from /usr/lib64/libnvidia-opencl.so.1
#15 0x00007fb6008a3905 in ?? () from /usr/lib64/libnvidia-opencl.so.1
#16 0x00007fb6008a3ce7 in ?? () from /usr/lib64/libnvidia-opencl.so.1
#17 0x00007fb6008c517d in ?? () from /usr/lib64/libnvidia-opencl.so.1
#18 0x00007fb6008c628e in ?? () from /usr/lib64/libnvidia-opencl.so.1
#19 0x00007fb6008c6df0 in ?? () from /usr/lib64/libnvidia-opencl.so.1
#20 0x00007fb6009dfe18 in ?? () from /usr/lib64/libnvidia-opencl.so.1
#21 0x0000003aed8079d1 in start_thread () from /lib64/libpthread.so.0
#22 0x0000003aed4e8b6d in clone () from /lib64/libc.so.6
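As a sanity check, the address in frame #7 (0x00007fb6118d0726) and the objdump offset 0xbe726 should differ by the library’s load base. A minimal C sketch of that arithmetic follows. It assumes the usual shared-object layout, where the first loadable segment starts at virtual address 0. Because the report does not give the load base, the sketch derives one from the two numbers it does give. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset inside the library = runtime PC - load base.
   Assumes the library's first PT_LOAD segment has p_vaddr == 0,
   which is the usual layout for shared objects. */
static uint64_t module_offset(uint64_t pc, uint64_t load_base) {
    return pc - load_base;
}

/* Inverse: recover the load base from a runtime PC and its known offset. */
static uint64_t implied_load_base(uint64_t pc, uint64_t offset) {
    return pc - offset;
}

/* Shared objects are mapped on page boundaries (4 KiB pages assumed),
   so a plausible load base has its low 12 bits clear. */
static bool is_page_aligned(uint64_t addr) {
    return (addr & 0xfffu) == 0;
}
```

Here `implied_load_base(0x7fb6118d0726, 0xbe726)` gives 0x7fb611812000, which is page-aligned. That is consistent with frame #7 being the `mov 0x968(%r12),%rax` instruction at be726. On a live process, the actual base is the start of the library's first mapping in /proc/<pid>/maps.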

Thanks for whatever support you are able to provide.

I have attached the report generated by the nvidia-bug-report.sh script.

nvidia-bug-report.log.gz (257 KB)

Your xorg.conf is a mess. It has lots of leftovers from an ATI setup and several sections added for the NVIDIA card. Try running without it, or use a minimal one like:

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 970"
    BusID          "PCI:1:0:0"
EndSection

That said, I don’t think this is why your application crashes the driver. Try to put together a minimal test case.

Thanks for your answer.

Our application crashes inside clEnqueueAcquireGLObjects.

Under what conditions can this call crash?

I’ve checked that:
- before the call, the texture exists and is complete
- I’ve called glFinish/glFlush
- the GL context is current on the calling thread

The same code works on AMD GPUs and on NVIDIA GPUs with the older driver (352.55).
We plan to migrate our system from AMD to NVIDIA, but this problem is blocking us.
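The checks listed above correspond to the ordering rules of the cl_khr_gl_sharing extension in OpenCL 1.1. GL work that touches a shared object must be complete before clEnqueueAcquireGLObjects. glFinish is the portable way to guarantee this, because glFlush alone does not wait for completion. OpenCL may use the object only between the acquire and clEnqueueReleaseGLObjects. OpenCL work must then be complete, via clFinish or by waiting on the release event, before GL uses the object again. The sketch below models these rules as a small state machine, so that a recorded call sequence can be checked offline. It makes no OpenCL or OpenGL calls. The step names and `check_sequence` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step names standing in for real API calls on ONE shared object. */
typedef enum {
    STEP_GL_USE,      /* a GL command that reads or writes the shared texture */
    STEP_GL_FINISH,   /* glFinish() on the context that owns the texture */
    STEP_CL_ACQUIRE,  /* clEnqueueAcquireGLObjects() */
    STEP_CL_KERNEL,   /* a kernel that reads or writes the cl_mem */
    STEP_CL_RELEASE,  /* clEnqueueReleaseGLObjects() */
    STEP_CL_FINISH    /* clFinish(), or clWaitForEvents() on the release event */
} step_t;

/* Returns the index of the first step that breaks an ordering rule,
   or -1 if the whole sequence is valid. */
static int check_sequence(const step_t *steps, size_t n) {
    bool gl_pending = false;  /* GL work on the object not yet known complete */
    bool cl_pending = false;  /* CL work on the object not yet known complete */
    bool acquired = false;    /* object currently owned by OpenCL */

    for (size_t i = 0; i < n; i++) {
        switch (steps[i]) {
        case STEP_GL_USE:
            /* GL touching the object while CL owns it, or before released
               CL work has finished, is treated as a violation here. */
            if (acquired || cl_pending) return (int)i;
            gl_pending = true;
            break;
        case STEP_GL_FINISH:
            gl_pending = false;
            break;
        case STEP_CL_ACQUIRE:
            /* Pending GL work must be complete; no double acquire. */
            if (acquired || gl_pending) return (int)i;
            acquired = true;
            cl_pending = true;
            break;
        case STEP_CL_KERNEL:
            /* CL may only use the object between acquire and release. */
            if (!acquired) return (int)i;
            cl_pending = true;
            break;
        case STEP_CL_RELEASE:
            if (!acquired) return (int)i;
            acquired = false;
            cl_pending = true;  /* release is enqueued, not yet complete */
            break;
        case STEP_CL_FINISH:
            cl_pending = false;
            break;
        }
    }
    return -1;
}
```

For example, a sequence that releases the texture and then draws with it in GL without an intervening clFinish is flagged at the draw step. This only checks application-side ordering. It cannot rule out a driver bug.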

Looking at the history of CL/GL interop on NVIDIA, I don’t know if that’s a stable idea.
CUDA first, OpenCL…