Loading a dynamic library that uses CUDA at runtime causes a segfault

I am trying to compile a dynamic library with CUDA code, and load it at runtime.

The code seems to run fine, but it crashes when I call dlclose(lib_handle);

The .so works fine when it is linked directly to the executable. Only dynamic loading of the library is a problem.

The .cu file looks like this:

extern "C" int do_gpu_work()
{
    // Allocating and freeing device memory is enough to reproduce
    // the crash at dlclose()
    float* d_A;
    size_t size = 1024 * sizeof(float);

    cudaMalloc((void**)&d_A, size);
    cudaFree(d_A);
    return 0;
}

The main program looks like this:

#include <dlfcn.h>

int main()
{
    void *lib_handle;
    int (*fn)();

    lib_handle = dlopen("/usr/local/lib/libcudasotest.so.0.0.0", RTLD_LAZY);
    fn = (int (*)())dlsym(lib_handle, "do_gpu_work");
    fn();

    dlclose(lib_handle); // segfault with this call
    return 0;
}

I have found the same problem but have narrowed it down a little. The problem only appears for me when compiling to 32bit Linux. I have tested using the 195.17 driver and CUDA 3.0 Toolkit. When running the same code (equivalent to the original post) on Mac or 64bit Linux it does not cause a Segmentation Fault (SIGSEGV). However, if the application is compiled for 32bit Linux or as a 32bit binary on 64bit Linux, the program will fault after the return of main if the dynamic library is unloaded.

Is there anyone from NVIDIA who can comment on what may be happening here? Simply not calling dlclose() is not a viable workaround.

Using valgrind I can see a suspicious ioctl warning about uninitialized memory on the first call to a CUDA API function that causes the creation of a CUDA context. Then there is a call into libcuda.so after it has been unloaded. I'm not sure if the two are related.

********************* The ioctl warning ****************************

==22055== Syscall param ioctl(generic) points to uninitialised byte(s)

==22055== at 0xA4B869: ioctl (in /lib/libc-2.5.so)

==22055== by 0x4291BE2: (within /usr/lib/libcuda.so.195.17)

==22055== by 0x4274C4B: (within /usr/lib/libcuda.so.195.17)

==22055== by 0x4248CC8: (within /usr/lib/libcuda.so.195.17)

==22055== by 0x4241196: (within /usr/lib/libcuda.so.195.17)

==22055== by 0x42E65B0: cuCtxCreate (in /usr/lib/libcuda.so.195.17)

==22055== by 0x416DA19: (within /usr/local/cuda/lib/libcudart.so.3.0.8)

==22055== by 0x416E56B: (within /usr/local/cuda/lib/libcudart.so.3.0.8)

==22055== by 0x41504A8: cudaGetSymbolAddress (in /usr/local/cuda/lib/libcudart.so.3.0.8)

==22055== by 0x400BD69: cudaError cudaGetSymbolAddress(void**, int const&) (cuda_runtime.h:311)

==22055== by 0x400BCD8: simengine_runmodel (cudalibtest.cu:40)

==22055== by 0x804A13B: main (main.c:25)

**************************** Unloading of shared libraries followed by segfault **********************************

--22055-- Discarding syms at 0x400A000-0x400F000 in /tmp/cudalibtest.so due to munmap()

--22055-- Discarding syms at 0x4136000-0x417B000 in /usr/local/cuda/lib/libcudart.so.3.0.8 due to munmap()

--22055-- Discarding syms at 0x1C5000-0x2B0000 in /usr/lib/libstdc++.so.6.0.8 due to munmap()

--22055-- Discarding syms at 0x417B000-0x6C3B000 in /usr/lib/libcuda.so.195.17 due to munmap()

--22055-- Discarding syms at 0xAC8000-0xAEF000 in /lib/libm-2.5.so due to munmap()

--22055-- Discarding syms at 0xDD3000-0xDDF000 in /lib/libgcc_s-4.1.2-20080825.so.1 due to munmap()


==22055== Jump to the invalid address stated on the next line

==22055== at 0x4251930: ??? <----------------------------------------- NOTE: This address is in the range for libcuda.so, the CUDA driver, above!!!

==22055== by 0x997E93: (below main) (in /lib/libc-2.5.so)

==22055== Address 0x4251930 is not stack'd, malloc'd or (recently) free'd


==22055== Process terminating with default action of signal 11 (SIGSEGV)

==22055== Access not within mapped region at address 0x4251930

==22055== at 0x4251930: ???

==22055== by 0x997E93: (below main) (in /lib/libc-2.5.so)

I have the same problem here.

I'm using CUDA in a dynamic library (opened by Scilab), and when I close Scilab I get a segfault in the exit function.
But if I call exit(); from my dynamic library, I get no error message (though that's not a viable way to do things…).

This problem only occurs in 32-bit builds. 64-bit builds exit without problems.

I have the same problem with Ubuntu 12.04 (64 bit) and CUDA 4.1. Has anyone fixed this issue?


I had a similar problem and finally managed to narrow it down to a linking error of mine. Here are the command lines I used to create the faulty library:

nvcc -c -arch=xxx -fpic *.cu

g++ -shared *.o -o libgpucode.so -L$CUDA_INSTALL_PATH/lib64 -lcudart

And the dynamic library was working perfectly. Only at unloading time did I experience crashes, with a cryptic message such as "pure virtual function called". A colleague of mine finally identified the cause as the lack of an explicit link to libcuda.so. Adding it to the above link command like this:

g++ -shared *.o -o libgpucode.so -L$CUDA_INSTALL_PATH/lib64 -lcudart -lcuda

it solved the issue entirely.
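For anyone wanting to check whether their library has the same missing dependency, the DT_NEEDED entries of the .so can be inspected directly. A quick sketch (using /bin/ls as a stand-in binary so the commands run anywhere; with the libgpucode.so from above you would expect libcuda.so.1 to appear in the output once -lcuda is added):

```shell
# List the shared libraries the dynamic linker will resolve for a binary:
ldd /bin/ls

# Or read the DT_NEEDED entries straight from the ELF dynamic section:
readelf -d /bin/ls | grep NEEDED
```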

I’m not sure whether it applies to your own problem, but just in case…

Thanks Gilles for your answer. I did try your suggestion, but sadly it did not change anything for me. I don't experience any error message about a pure virtual function being called, though, so it might not be the same issue.

I also implemented the simple code given in the first message above. And it does not segfault for me. I guess that it confirms the fact that it does work in 64 bit. I think the issue I have is closest to lebsack’s. The valgrind outputs are very similar.

lebsack> Any chance that you eventually managed to fix it? Do you happen to use Qt in your application?

I ask because valgrind gives me errors during the creation of a QApplication object. I have also analysed the memory with the TotalView software, and it finds the following error: "Allocator returned a misaligned block: heap may be corrupted". I even submitted a bug to Qt (bug report). If I remove the QApplication creation, the segmentation fault disappears. But I could just be lucky, or it's really unrelated.

Also, to go even further than the valgrind output, the TotalView debugger tells me that it seems to crash after the unload of libcuda, in the call to the function clGetExtensionFunctionAddress (located in libcuda.so). It's an OpenCL function and I don't use OpenCL. Any idea what could be calling it?

Just to let you know, we found a "fix" to our problem (see the valgrind output above). We added a call to a CUDA driver API function in our main: cuInit(0), even though we use the CUDA runtime API in the rest of our code. What we think happens is that our main executable now depends on libcuda, so libcuda is not unloaded from memory and the crash does not occur. Without this explicit dependency, libcuda is unloaded too early and the program crashes at the end of main (maybe in a static deallocation). At least it fixed our issue in debug; in release we still have a segfault.
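For reference, the pinning workaround described above amounts to something like the following (a sketch only: it assumes the CUDA driver API header and linking with -lcuda, and of course needs an installed driver to run):

```c
// Sketch: force an explicit dependency of the executable on libcuda,
// so the driver library is never unloaded before process exit.
#include <cuda.h> // CUDA driver API; link with -lcuda

int main(void)
{
    // cuInit(0) in main pins libcuda for the process lifetime, even if
    // the rest of the code only uses the runtime API via a dlopen'ed .so.
    cuInit(0);

    /* ... dlopen() / use / dlclose() the CUDA-using library as before ... */
    return 0;
}
```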

Maybe our build system is not good enough and we are doing something we shouldn't, but it's very difficult to pinpoint. Has anyone already seen this type of problem?