I get segfaults in the Linux cuFFT library with CUDA 5.5, but not with 5.0. My program links against libcufft but never actually calls into it. The segfault occurs after main() returns, as part of the libcufft teardown.
Although an actual segfault is hard to trigger in a small example, valgrind does report a memory error (a branch on an uninitialised value) in one. First, the code:
// cufft-bug.cc
#include <cufft.h>
// force g++ to link cufft: we reference cufft, but don't use it.
void *test = (void*)cufftPlan1d;
int main() {
}
Build and run under valgrind, adjusting the paths to match your installation:
CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
CUDA_DRIVER_LIB_DIR=/usr/lib/nvidia-current
# Note: -lcudart is required in CUDA 5.0, but not in CUDA 5.5
g++ cufft-bug.cc -o cufft-bug -I$CUDA_TOOLKIT_ROOT_DIR/include \
-L$CUDA_DRIVER_LIB_DIR -L$CUDA_TOOLKIT_ROOT_DIR/lib64 \
-lcuda -lcufft -lcudart &&
LD_LIBRARY_PATH=$CUDA_TOOLKIT_ROOT_DIR/lib64:$CUDA_DRIVER_LIB_DIR \
valgrind --track-origins=yes ./cufft-bug
With CUDA 5.5, this produces the following output for me:
==10777== Conditional jump or move depends on uninitialised value(s)
==10777== at 0x51940D8: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777== by 0x5194204: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777== by 0x9992D1C: __cxa_finalize (cxa_finalize.c:56)
==10777== by 0x4E9EEB5: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777== by 0x51BBF30: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777== by 0x9992900: __run_exit_handlers (exit.c:78)
==10777== by 0x9992984: exit (exit.c:100)
==10777== by 0x9978773: (below main) (libc-start.c:258)
==10777== Uninitialised value was created by a heap allocation
==10777== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10777== by 0x5192DDA: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777== by 0xA2273FF: pthread_once (pthread_once.S:104)
==10777== by 0x51BA908: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777== by 0x5192D94: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777== by 0x51BBF15: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777== by 0x4E9E602: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
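The problem seems to lie in the initialisation/destruction order of static variables: the error is triggered from an exit handler (__cxa_finalize, called from __run_exit_handlers), i.e. in teardown code registered through the __cxa_atexit mechanism. According to the second half of the trace, the value it branches on was allocated with malloc during library initialisation (under pthread_once) and apparently never written.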
My setup:
- OS: Ubuntu 12.04 (Linux cbt005 3.2.0-48-generic #74-Ubuntu SMP Thu Jun 6 19:43:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux)
- CUDA Toolkit: 5.5
- SDK: none
- Host compiler: gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
- System: Dual Xeon E5-2660 @ 2.2 GHz, 256 GB RAM, Dell T620, 2x K10, Intel X79 chipset