cleanup bug in cufft 5.5, can segfault

jjdmol · July 24, 2013, 8:51pm

I am experiencing segfaults in the Linux cuFFT library with CUDA 5.5, but not with 5.0. My use case is linking against libcufft without actually using it. The segfault then occurs after main() has returned, during the libcufft teardown.

Although an actual segfault is hard to trigger in a small example, the underlying invalid memory use (a read of uninitialised heap memory) does show up in valgrind. First, the code:

// cufft-bug.cc

#include <cufft.h>

// force g++ to link cufft: we reference cufft, but don't use it.
void *test = (void*)cufftPlan1d;

int main() {
}

Build and run under valgrind, changing the paths to match your installation:

CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
CUDA_DRIVER_LIB_DIR=/usr/lib/nvidia-current

# Note: -lcudart is required in CUDA 5.0, but not in CUDA 5.5
g++ cufft-bug.cc -o cufft-bug -I$CUDA_TOOLKIT_ROOT_DIR/include \
-L$CUDA_DRIVER_LIB_DIR -L$CUDA_TOOLKIT_ROOT_DIR/lib64 \
-lcuda -lcufft -lcudart &&

LD_LIBRARY_PATH=$CUDA_TOOLKIT_ROOT_DIR/lib64:$CUDA_DRIVER_LIB_DIR \
valgrind --track-origins=yes ./cufft-bug

With CUDA 5.5, this produces the following for me:

==10777== Conditional jump or move depends on uninitialised value(s)
==10777==    at 0x51940D8: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777==    by 0x5194204: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777==    by 0x9992D1C: __cxa_finalize (cxa_finalize.c:56)
==10777==    by 0x4E9EEB5: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777==    by 0x51BBF30: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777==    by 0x9992900: __run_exit_handlers (exit.c:78)
==10777==    by 0x9992984: exit (exit.c:100)
==10777==    by 0x9978773: (below main) (libc-start.c:258)
==10777==  Uninitialised value was created by a heap allocation
==10777==    at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10777==    by 0x5192DDA: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777==    by 0xA2273FF: pthread_once (pthread_once.S:104)
==10777==    by 0x51BA908: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777==    by 0x5192D94: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777==    by 0x51BBF15: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)
==10777==    by 0x4E9E602: ??? (in /usr/local/cuda-5.5/targets/x86_64-linux/lib/libcufft.so.5.5.11)

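The problem seems to be in the initialisation/destruction order of static variables: the error is triggered from __cxa_finalize, i.e. while the exit-time destructors registered through the __cxa_atexit mechanism are running, and the uninitialised value comes from a heap allocation made under pthread_once during initialisation.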

My setup:

OS: Ubuntu 12.04 (Linux cbt005 3.2.0-48-generic #74-Ubuntu SMP Thu Jun 6 19:43:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux)
CUDA Toolkit: 5.5
SDK: none
Host compiler: gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
System: Dual Xeon E5-2660 @ 2.2 GHz, 256 GB RAM, Dell T620, 2x K10, Intel X79 chipset

jjdmol · July 24, 2013, 9:08pm

Note that I omitted the cuInit(0) call, as it does not seem to be required to trigger the bug. It can be added, but doing so significantly increases the run time under valgrind, which I found annoying.

(Also, programs ought to be able to run correctly without calling any CUDA functions, of course.)

njuffa · July 24, 2013, 11:48pm

Please file a bug report using the form linked from the registered developer website. Thank you for your help.
