__cudaUnregisterFatBinary: segmentation fault on a Linux box

I’m writing a Python extension that uses some CUDA functions, on a 32-bit Linux box. The module runs fine without any CUDA functions, but when I enable them it executes correctly and then ends with a segmentation fault.

To check what is happening I used the Python debugger and gdb. This is the backtrace I got at the end of the execution:

backtrace
#0 0xb7ac701d in ?? () from /usr/local/share/cuda/lib/libcudart.so
#1 0x00000001 in ?? ()
#2 0x000000d9 in ?? ()
#3 0xbff3e6a4 in ?? ()
#4 0xbff3e6e8 in ?? ()
#5 0xb7ab67a6 in __cudaUnregisterFatBinary () from /usr/local/share/cuda/lib/libcudart.so
Backtrace stopped: frame did not save the PC

Can you help me? :'(

I get the same backtrace when compiling with the -deviceemu flag.

If you need the code to reproduce it, I can attach it, it’s free :)

I’d be interested in getting a copy of your code and helping you.

Here it is

Read the INSTALL file to compile.

I’ll be waiting for any questions.

pydock.18.tar.gz (360 KB)


I tried other ways of initializing with a context, but that doesn't work either.

I wrote a simple test case to compile with nvcc and load from Python. I feel it’s a compilation problem, but I can’t understand why. The simple code is attached.

If you compile and test it, you get a segmentation fault :'(

$ make Module.so
nvcc -c -o Module.cu_o Module.cu -deviceemu -I/usr/include/python2.4/
gcc -shared -rdynamic -o Module.so Module.cu_o -deviceemu -L/usr/local/share/cuda/lib -lcuda -lcudart -ltlshook -lcufftemu -lpython2.4
$ python
Python 2.4.4 (#2, Apr 26 2007, 00:02:45)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Module
Segmentation fault

But if you try to build it directly with nvcc, you can't generate a dynamic shared object.

$ make Module.so_
nvcc -c -o Module.so_ Module.cu -deviceemu -I/usr/include/python2.4/ -Xcompiler -shared,-rdynamic -L/usr/local/share/cuda/lib -lcuda -lcudart -ltlshook -lcufftemu -lpython2.4
$ file Module.so_
Module.so_: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped

but the output of file should be:

$ file Module.so
Module.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), not stripped
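
For reference, the "relocatable" versus "shared object" wording printed by file comes from the e_type field of the ELF header (1 for a relocatable object, 3 for a shared object). Below is a minimal Python sketch that reads that field, assuming a well-formed header. The function names elf_type and elf_type_of_file are made up for illustration and are not part of CUDA or of any tool used in this thread.

```python
import struct

# e_type values defined by the ELF specification.
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}


def elf_type(header):
    """Classify an ELF file from the first 18 bytes of its header."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # The byte at index 5 (EI_DATA) gives the byte order:
    # 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field located right after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return ELF_TYPES.get(e_type, "unknown")


def elf_type_of_file(path):
    """Hypothetical convenience wrapper: classify the file at `path`."""
    with open(path, "rb") as f:
        return elf_type(f.read(18))
```

Calling elf_type_of_file("Module.so_") on the file above would be expected to report "relocatable". That is consistent with the -c option on the nvcc command line, which asks the compiler to stop after producing an object file instead of linking.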


Any ideas?
I will continue with my code.

Simple.tar.gz (627 Bytes)

:D ,

I know what's happening. It's caused by the library destructor functions defined in cuda/include/crt/host_runtime.h

__attribute__((destructor)) static void __cudaUnregisterBinary(void)

That function tries to clean up all the resources used by the shared object, but Python has already cleaned up those resources by the time it runs.

I can use my Python module without a segfault if I link with the gcc option "-nostartfiles", but I don't know whether that function also cleans up other resources used on the device.

But if I use cufft or another library, I get the same segfault, because I can't link those libraries with that option.

Is it possible to remove those destructor functions from the libraries and instead add a function to the API that the user calls to clean up the resources?

It might help if you call cudaThreadExit() explicitly.
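
One way to try this suggestion from Python is sketched below. It assumes the runtime is reachable as libcudart.so and is loaded through ctypes, and it relies on the fact that atexit handlers run during interpreter shutdown, before the process-level library destructors. The helper name register_cuda_thread_exit is hypothetical. Later CUDA releases deprecate cudaThreadExit() in favour of cudaDeviceReset(), so the symbol lookup is guarded.

```python
import atexit
import ctypes


def register_cuda_thread_exit(libname="libcudart.so"):
    """Try to arrange for cudaThreadExit() to be called at interpreter exit.

    Returns True if the hook was registered, False if the runtime library
    or the symbol could not be found.
    """
    try:
        libcudart = ctypes.CDLL(libname)
    except OSError:
        return False  # CUDA runtime not found on this machine
    # getattr with a default swallows the AttributeError that ctypes
    # raises when the symbol is missing from the library.
    cuda_thread_exit = getattr(libcudart, "cudaThreadExit", None)
    if cuda_thread_exit is None:
        return False
    cuda_thread_exit.restype = ctypes.c_int  # cudaError_t is an enum
    cuda_thread_exit.argtypes = []
    # atexit handlers run before the process calls the shared libraries'
    # destructors, so the CUDA cleanup happens while Python is still alive.
    atexit.register(cuda_thread_exit)
    return True
```

This is only a sketch of the idea. Whether an explicit call actually avoids the crash depends on the CUDA release in use.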


Hi Peter,

Thanks, but it doesn't work. :( Any other ideas?

I suspect CUDA keeps a list of the segments allocated by cudaMalloc, but Python captures those allocations and frees them at the end of the execution, before it unloads the libraries and modules. I will write to the Python developers to check.


We are working on the problem, stay tuned!!!

We are in the process of wrapping all the CUDA libraries in Python using ctypes and are seeing the same issue, but only with cublas and cufft. This problem wasn’t there in the 0.8 release. Here is a simple Python script (no compilation needed) that demonstrates the problem:

############ Begin Python script #############

"""A simple python script showing segfaults on exit.

This script uses the ctypes package to dynamically load the CUDA
libraries. We have been using this approach successfully since the
0.8 release of CUDA and it has worked extremely well. This script
will run out of the box with Python 2.5 (ctypes comes with Python 2.5)
and will also run on Python 2.4 if ctypes is downloaded and installed.

I don't think we had this problem with CUDA 0.8.

Run the script by doing:

$ python test_segfault.py
"""

from ctypes import *

# Uncomment a particular library to see its effect. It is important
# to note that even though these segfault on exit, they can be
# used without problems in the meantime.

# Either of these causes a segfault on exit.
#libcublas = cdll.LoadLibrary('libcublas.so')
#libcufft = cdll.LoadLibrary('libcufft.so')

# This does not segfault on exit.
libcudart = cdll.LoadLibrary('libcudart.so')

############ End Python script #############
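
Because the crash only happens at interpreter exit, it can be easier to check from a parent process than from the script itself. Here is a minimal sketch, assuming a POSIX system and a modern Python 3 (not the Python 2.4/2.5 of the script above). The helper name run_and_report is made up for illustration.

```python
import signal
import subprocess
import sys


def run_and_report(snippet):
    """Run a Python snippet in a child interpreter and describe how it ended."""
    result = subprocess.run([sys.executable, "-c", snippet])
    code = result.returncode
    if code < 0:
        # On POSIX a negative return code means the child was
        # terminated by the signal with that (positive) number.
        return "killed by " + signal.Signals(-code).name
    return "exited with status %d" % code
```

For example, run_and_report("from ctypes import cdll; cdll.LoadLibrary('libcublas.so')") would be expected to report a SIGSEGV on an affected installation, and a plain exit status otherwise.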

I am getting this problem as well, when using CUDA inside a shared library. No matter if I actually do any CUDA stuff, when the application closes I get:

Caught SIGSEGV accessing address 0x61
#0 0x00002b7dfafe8d4f in waitpid () from /lib/libpthread.so.0
#1 0x00002b7dfad4d3f0 in g_on_error_stack_trace ()
#2 0x0000000000402bac in ?? ()
#4 0x0000000000000061 in ?? ()
#5 0x00002aaaab51523b in __cudaUnregisterFatBinary ()
#6 0x00002aaaab73ee62 in __do_global_dtors_aux ()
#7 0x00007fffb0c94390 in ?? ()
#8 0x00002aaaab78dc01 in _fini () from /home/wladimir/lib/libcudawavelet.so
#9 0x0000000000000000 in ?? ()

As am I. My cuda code is linked into a shared library being loaded by java running on 64 bit Suse Linux 10.2.

I’m using CUDA in a shared library as well. Here is what I got:

#0 0x00000031 in ?? ()
#1 0xb523def5 in __cudaUnregisterFatBinary () from /usr/local/cuda/lib/libcudart.so
#2 0xb5442a70 in __cudaUnregisterBinary () from /home/.../linux/lib/libmilx-CudaStuff.so
#3 0xb543e430 in __do_global_dtors_aux () from /home/.../linux/lib/libmilx-CudaStuff.so
#4 0xb544688c in _fini () from /home/.../linux/lib/libmilx-CudaStuff.so
#5 0xb7f374ce in _dl_rtld_di_serinfo () from /lib/ld-linux.so.2
#6 0xb72a9299 in exit () from /lib/tls/i686/cmov/libc.so.6
#7 0xb72928d4 in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
#8 0x080517e1 in _start ()

I’m not sure I got a segmentation fault with CUDA 0.8. But I might be wrong. Apparently it’s a known bug. So we just have to wait I reckon.

Has anyone managed to resolve this issue? Just want to check before I spend cycles debugging…

The bug has been fixed in CUDA 1.1, which should be released in November.

Great! Thank you.

Thanks from me too… your fast reply saved me a couple of hours for sure.

Hi, I am having the same issue with CUDA 10.2. Any pointers to a solution?
CUDA 10.0 works just fine, though.