possible .so unload bug

I have encountered a segfault that may be caused by CUDA’s cleanup code running when a shared object is unloaded.

I’m building CUDA calls into a .so that I then load through the Java Native Interface (JNI) and call from a Java program. It works fine, but upon unloading (when the Java program shuts down) I get a segfault. The segfault occurs whenever CUDA code is in the library, regardless of whether it is actually called: I can take a regular, working Java native library, link it with a CUDA object (without calling any CUDA functions or changing my own source), and get the segfault. Identical code works fine on Windows; this appears to be a Linux-only problem.

I suspect that CUDA is freeing something improperly when it unloads. Has anyone seen anything like this? I would be interested to see if anyone else can replicate this problem. Many thanks for your help.
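For anyone who wants to try to replicate it, a minimal version of my setup looks roughly like the following sketch. The class name GpuTest, the function name, and the build flags are placeholders/assumptions, not my actual code; libGPU.so is the library name from the crash log. In the failing case the native function is never even invoked from Java; loading the library and letting the JVM exit is enough.

```cuda
// repro.cu -- minimal JNI library containing CUDA runtime calls.
// Build sketch (assuming the CUDA toolkit and JDK headers are installed):
//   nvcc -shared -Xcompiler -fPIC \
//        -I"$JAVA_HOME/include" -I"$JAVA_HOME/include/linux" \
//        repro.cu -o libGPU.so
#include <jni.h>
#include <cuda_runtime.h>

// JNI entry point for a static native method `touchCuda` on a Java
// class `GpuTest` (placeholder names). Merely linking this object into
// the .so reproduces the crash; calling it is optional.
extern "C" JNIEXPORT void JNICALL
Java_GpuTest_touchCuda(JNIEnv *env, jclass cls)
{
    void *p = 0;
    cudaMalloc(&p, 16);  // any trivial runtime call
    cudaFree(p);
}
```

The Java side just declares the native method, calls System.loadLibrary(“GPU”), and returns from main; the segfault then happens during JVM shutdown.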

Details follow:

My system is CentOS 4.4. Java version is:

java version "1.6.0"
Java(TM) SE Runtime Environment (build 1.6.0-b105)
Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)

Here is (part of) the error log file:

An unexpected error has been detected by Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00000031, pid=7498, tid=3086899424

Java VM: Java HotSpot(TM) Server VM (1.6.0-b105 mixed mode)

Problematic frame:

C 0x00000031

If you would like to submit a bug report, please visit:


--------------- T H R E A D ---------------

Current thread is native thread

siginfo:si_signo=11, si_errno=0, si_code=1, si_addr=0x00000031

EAX=0x0830dd60, EBX=0xbfffd37c, ECX=0x08166d88, EDX=0x08166d88
ESP=0xbfffd31c, EBP=0xbfffd3a8, ESI=0x90d0f0a0, EDI=0x8fb97700
EIP=0x00000031, CR2=0x00000031, EFLAGS=0x00210296

Top of Stack: (sp=0xbfffd31c)
0xbfffd31c: 8f6fd8d8 08166d88 bfffd37c bfffd37c
0xbfffd32c: 00000000 8f71c70c 08166d88 b7fef7d8
0xbfffd33c: 00000000 00000000 00000000 00000000
0xbfffd34c: 00000000 00000000 00000000 00000000
0xbfffd35c: 008a9fd4 8fb97700 00000001 bfffd3a0
0xbfffd36c: 0089fc66 8fb978ac 8fb82b30 0830b590
0xbfffd37c: 0830b270 00000000 90d1067c 009d8ff4
0xbfffd38c: 00000000 009da380 bfffd3b8 008dc7ce

Instructions: (pc=0x00000031)
[error occurred during error reporting, step 100, id 0xb]

Stack: [0xbffb0000,0xc0000000), sp=0xbfffd31c, free space=308k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C 0x00000031
C [libGPU.so+0x98ad]
C [libGPU.so+0x4607] _init+0x9f
C [libGPU.so+0xa0b6] _fini+0x16
C [ld-linux.so.2+0xc907]
C [libc.so.6+0x2a527] exit+0x77
C [libc.so.6+0x14ded] __libc_start_main+0xdd
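Note that the _fini frames in this trace are the normal ELF unload path: exit() hands control to ld-linux, which runs each loaded library’s destructors. A minimal sketch of code that executes at exactly that point (my guess about a stale cleanup pointer is speculation, not something the log proves):

```cuda
// unload_marker.cu -- sketch: a destructor compiled into the .so runs
// during the same _fini phase where the crash occurs
// (exit -> ld-linux -> libGPU.so _fini).
#include <cstdio>

__attribute__((destructor))
static void on_unload(void)
{
    // Executes while libGPU.so is being unloaded at JVM shutdown.
    // If CUDA’s cleanup jumps through an already-freed function pointer
    // here, the result would be a bogus pc like the 0x00000031 above.
    std::fprintf(stderr, "libGPU.so destructor running\n");
}
```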

Are you using CUBLAS and/or CUFFT? We have identified a known bug with the cleanup in these libraries that exhibits very similar behavior in Python.

No, I’m not using either of those. I’m only linking against libcuda and libcudart.