cleanup bug in cufft 5.5, can segfault

I am seeing segfaults in the Linux cuFFT library shipped with CUDA 5.5, but not with 5.0. My use case is a program that links against libcufft but never actually calls into it. The segfault then occurs after main() returns, as part of the libcufft teardown.

Although an actual segfault is hard to trigger in a small example, the underlying problem does show up in valgrind, as a conditional jump that depends on an uninitialised value. First, the code:

// cufft-bug.cc

#include <cufft.h>

// force g++ to link cufft: we reference cufft, but don't use it.
void *test = (void*)cufftPlan1d;

int main() {
}

Build and run under valgrind, adjusting the paths for your system:

CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
CUDA_DRIVER_LIB_DIR=/usr/lib/nvidia-current

# Note: -lcudart is required in CUDA 5.0, but not in CUDA 5.5
g++ cufft-bug.cc -o cufft-bug -I$CUDA_TOOLKIT_ROOT_DIR/include \
-L$CUDA_DRIVER_LIB_DIR -L$CUDA_TOOLKIT_ROOT_DIR/lib64 \
-lcuda -lcufft -lcudart &&

LD_LIBRARY_PATH=$CUDA_TOOLKIT_ROOT_DIR/lib64:$CUDA_DRIVER_LIB_DIR \
valgrind --track-origins=yes ./cufft-bug

With CUDA 5.5, this produces the following for me:

==10777== Conditional jump or move depends on uninitialised value(s)
==10777==    at 0x51940D8: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777==    by 0x5194204: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777==    by 0x9992D1C: __cxa_finalize (cxa_finalize.c:56)
==10777==    by 0x4E9EEB5: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777==    by 0x51BBF30: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777==    by 0x9992900: __run_exit_handlers (exit.c:78)
==10777==    by 0x9992984: exit (exit.c:100)
==10777==    by 0x9978773: (below main) (libc-start.c:258)
==10777==  Uninitialised value was created by a heap allocation
==10777==    at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10777==    by 0x5192DDA: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777==    by 0xA2273FF: pthread_once (pthread_once.S:104)
==10777==    by 0x51BA908: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777==    by 0x5192D94: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777==    by 0x51BBF15: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777==    by 0x4E9E602: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)

The problem seems to lie in the initialisation/destruction order of static variables, because the error is raised from __cxa_finalize, that is, while the exit handlers registered through __cxa_atexit are being run.
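To illustrate the kind of ordering hazard I suspect, here is a toy model, not cuFFT or glibc code. Exit handlers registered through atexit/__cxa_atexit run in reverse order of registration, so a cleanup routine can end up running after the state it reads has already been torn down. All names below (ExitHandlerList, simulate_teardown, the "table" and "cache") are hypothetical and exist only for this sketch:

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the runtime's exit-handler list. Handlers run in
// reverse order of registration, as atexit/__cxa_atexit handlers do.
class ExitHandlerList {
public:
    void register_handler(std::function<void()> fn) {
        handlers_.push_back(std::move(fn));
    }

    // Pop and run handlers from the back, i.e. last registered first.
    void run_all() {
        while (!handlers_.empty()) {
            std::function<void()> fn = std::move(handlers_.back());
            handlers_.pop_back();
            fn();
        }
    }

private:
    std::vector<std::function<void()>> handlers_;
};

// Simulates two "static objects": a resource table, and a cache whose
// cleanup reads that table. Returns a log of what happened at teardown.
std::vector<std::string> simulate_teardown(bool table_constructed_first) {
    std::vector<std::string> log;
    bool table_alive = true;
    ExitHandlerList handlers;

    auto destroy_table = [&] {
        table_alive = false;
        log.push_back("table destroyed");
    };
    auto destroy_cache = [&] {
        // The real-world analogue of the "false" branch is a read of
        // freed or never-initialised memory.
        log.push_back(table_alive ? "cache cleanup: table valid"
                                  : "cache cleanup: table already destroyed");
    };

    // Registration order mirrors construction order.
    if (table_constructed_first) {
        handlers.register_handler(destroy_table);
        handlers.register_handler(destroy_cache);
    } else {
        handlers.register_handler(destroy_cache);
        handlers.register_handler(destroy_table);
    }

    handlers.run_all();
    return log;
}
```

In this model, simulate_teardown(true) logs the cache cleanup before the table is destroyed, while simulate_teardown(false) destroys the table first and the cache cleanup then sees dead state. Whether libcufft does anything like this internally is a guess on my part; the valgrind trace only shows that the finalizer branches on a value that was never initialised.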

My setup:

  • OS: Ubuntu 12.04 (Linux cbt005 3.2.0-48-generic #74-Ubuntu SMP Thu Jun 6 19:43:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux)
  • CUDA Toolkit: 5.5
  • SDK: none
  • Host compiler: gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
  • System: Dual Xeon E5-2660 @ 2.2 GHz, 256 GB RAM, Dell T620, 2x K10, Intel X79 chipset

Note that I omitted the cuInit(0) call, as it does not seem to be required to trigger the bug. It can be added, but doing so makes the valgrind run significantly slower.

(Also, programs ought to be able to run correctly without calling any CUDA functions, of course.)

Please file a bug report using the form linked from the registered developer website. Thank you for your help.