I am running into a ton of weird issues with 5.5 that do not show up with 5.0, even when combining the 5.5 driver with the 5.0 toolkit. I have a bunch of internal tests that compile and link against the CUDA runtime and CUBLAS (legacy or v2 API, it doesn't matter which). They all use pinned memory but never initialise the device and, in particular, never call CUBLAS at all: my code decides at runtime whether a given array is malloc'ed on the CPU, placed in pinned host memory, or allocated on the device.
Basically, I'm testing whether I can run all my CPU kernels on arrays allocated through cudaHostAlloc(…,default), which should be perfectly legitimate.
Problem is: all these tests die horribly at shutdown, with stack traces originating from within CUBLAS, even though I never even initialise a CUDA/CUBLAS context! Example (this is the full trace; it happens after my valgrind-clean application calls 'exit'):
- /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x2b3d7efd44a0]
- /lib64/ld-linux-x86-64.so.2(+0x15181) [0x2b3d7a316181]
- /lib64/ld-linux-x86-64.so.2(+0xf176) [0x2b3d7a310176]
- /lib/x86_64-linux-gnu/libdl.so.2(+0x152f) [0x2b3d7f65f52f]
- /lib/x86_64-linux-gnu/libdl.so.2(dlclose+0x1f) [0x2b3d7f65f00f]
- /sfw/cuda/5.5/lib64/libcublas.so.5.5(+0x1c0e6c) [0x2b3d7a933e6c]
- /sfw/cuda/5.5/lib64/libcublas.so.5.5(+0x1c0f75) [0x2b3d7a933f75]
- /lib/x86_64-linux-gnu/libc.so.6(__cxa_finalize+0x9d) [0x2b3d7efd9d1d]
- /sfw/cuda/5.5/lib64/libcublas.so.5.5(+0x1dec6) [0x2b3d7a790ec6]
Unfortunately, all my attempts to boil this down to a simple reproducer have failed. So my question is twofold: has anyone seen this before? And is there a way to link statically against CUDA/CUBLAS, given that the issue apparently lies in the dynamic loader's .so unloading mechanism (dlclose/_dl_close) rather than in my code? (I'm afraid static linking is unsupported.)
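In case it helps others narrow this down, here is roughly how I inspect the library load/unload behaviour. The binary name ./app is a placeholder; ldd and glibc's LD_DEBUG facility are standard, and LD_DEBUG=files logs every load, init and fini, including the dlclose that libcublas's destructor apparently issues at exit:

```shell
# Which libcublas does the binary actually resolve to?
ldd ./app | grep -i cublas

# Log library loads/unloads; the fini lines show the unload order at
# shutdown, which is where the crash happens.
LD_DEBUG=files ./app 2> ld-debug.log
grep "calling fini" ld-debug.log
```

Comparing the fini order between the debug and optimised builds might show whether libcublas is torn down at a different point.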
To increase the fun, this only happens in debug builds: it does not happen in optimised builds, and it especially does not happen when running under gdb or ddt. Sigh. None of the CUDA/CUBLAS-related functionality that is built into my lib uses global variables (which are occasionally "invisible" in gdb anyway).
Thanks in advance for any shots in the dark, I'm honestly lost…
PS: a few more traces obtained by connecting to the running app with gdb:
(gdb) masterslave () at /home/user/buijssen/nobackup/feastobj/5/feast/fb2/src_coproc/object/pc-ivybridge-linux64-gcc-blas-optNO/masterslave.f90:89
89        call par_postexit
(gdb) cont
91      end program masterslave
(gdb) cont
__libc_start_main (main=0x6b1b84 <main>, argc=2, ubp_av=0x7fffba6785f8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffba6785e8) at libc-start.c:258
258     libc-start.c: No such file or directory.
(gdb) cont
Program received signal SIGSEGV, Segmentation fault.
_dl_close (_map=0x600000002) at dl-close.c:757
757     dl-close.c: No such file or directory.
Program received signal SIGSEGV, Segmentation fault.
_dl_close (_map=0x600000002) at dl-close.c:757
757     dl-close.c: No such file or directory.
(gdb) bt
#0  _dl_close (_map=0x600000002) at dl-close.c:757
#1  0x00002b3226c77176 in _dl_catch_error (objname=0xf0d3020, errstring=0xf0d3028, mallocedp=0xf0d3018, operate=0x2b322bbccfe0 <dlclose_doit>, args=0x600000002) at dl-error.c:178
#2  0x00002b322bbcd52f in _dlerror_run (operate=0x2b322bbccfe0 <dlclose_doit>, args=0x600000002) at dlerror.c:164
#3  0x00002b322bbcd00f in __dlclose (handle=0x600000002) at dlclose.c:48
#4  0x00002b322729be6c in ?? () from /sfw/cuda/5.5/lib64/libcublas.so.5.5
#5  0x00002b322729bf75 in ?? () from /sfw/cuda/5.5/lib64/libcublas.so.5.5
#6  0x00002b322b631d1d in __cxa_finalize (d=0x2b322a57f7d0) at cxa_finalize.c:56
#7  0x00002b32270f8ec6 in ?? () from /sfw/cuda/5.5/lib64/libcublas.so.5.5
#8  0x0000000000000018 in ?? ()
#9  0x00007fffb540e880 in ?? ()
#10 0x00007fffb540e9b0 in ?? ()
#11 0x00002b32272c3ca1 in ?? () from /sfw/cuda/5.5/lib64/libcublas.so.5.5
#12 0x00007fffb540e9b0 in ?? ()
#13 0x00002b3226c7792d in _dl_fini () at dl-fini.c:259
Backtrace stopped: previous frame inner to this frame (corrupt stack?)