We’ve experienced cuda-gdb hanging indefinitely and narrowed it down to the attached reproducible example.
cuda_gdb_test.zip (2.8 MB)
The example just calls `cudaFree(0)`. No functions in the other source files are actually called; merely their presence in the compiled binary causes the problem.
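For reference, a minimal sketch of what the repro’s entry point looks like (the real `test.cpp` is in the attached zip; this version is illustrative, matching the “Starting…” and “Calling cudaFree” lines in the log below):

```cuda
// Illustrative sketch of the repro's entry point (the actual test.cpp
// is in the attachment). cudaFree(0) only triggers lazy CUDA context
// initialization -- no kernel is launched and no memory is freed.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    printf("Starting...\n");
    printf("Calling cudaFree\n");
    cudaError_t err = cudaFree(0);  // first CUDA API call; hangs under cuda-gdb
    printf("cudaFree returned %d\n", (int)err);
    return 0;
}
```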
The application runs fine on its own, but when cuda-gdb is attached the
first cuda API call hangs. This happens consistently and doesn’t appear
to be just a performance issue (it has been left running for over 12 hours).
Log and backtrace:
> cuda-gdb -ex=r test
Reading symbols from test...
Starting program:
/home/mayurp/workspace/cuda_gdb_test/test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Starting...
Calling cudaFree
[Detaching after fork from child process 68616]
[New Thread 0x7fffef394700 (LWP 68620)]
[New Thread 0x7fffeeb93700 (LWP 68621)]
[New Thread 0x7fffee290700 (LWP 68622)]
^C
Thread 1 "test" received signal SIGINT, Interrupt.
0x00007ffff79c5cbd in sendmsg () from /lib64/libpthread.so.0
(cuda-gdb) bt
#0 0x00007ffff79c5cbd in sendmsg () from /lib64/libpthread.so.0
#1 0x00007ffff5791ea0 in ?? () from /lib64/libcuda.so.1
#2 0x00007ffff57969ff in ?? () from /lib64/libcuda.so.1
#3 0x00007ffff5797764 in ?? () from /lib64/libcuda.so.1
#4 0x00007ffff5797976 in ?? () from /lib64/libcuda.so.1
#5 0x00007ffff558e5eb in ?? () from /lib64/libcuda.so.1
#6 0x00007ffff558ebad in ?? () from /lib64/libcuda.so.1
#7 0x00007ffff562defa in ?? () from /lib64/libcuda.so.1
#8 0x00007ffff562e60b in ?? () from /lib64/libcuda.so.1
#9 0x0000000000442e54 in __cudart570 ()
#10 0x00000000004328ce in __cudart615 ()
#11 0x00000000004495e4 in __cudart544 ()
#12 0x000000000044dd4a in __cudart789 ()
#13 0x000000000044dfc4 in __cudart779 ()
#14 0x00000000004403ff in __cudart953 ()
#15 0x0000000000423cbe in __cudart494 ()
#16 0x0000000000463193 in cudaFree ()
#17 0x000000000040450e in main (argc=1, argv=0x7fffffffc7c8) at test.cpp:19
The issue appears to be related to the number of CUDA device functions in the compiled binary (checked using `cuobjdump -res-usage`). The hang goes away when either:
- Using `__forceinline__` for the functions in `material.h`
- Removing the first two functions from `material.cpp`
Notes
- The code in this example is a subset of redner, an open-source project: GitHub - BachiLi/redner: Differentiable rendering without approximation.
- This particular example hangs when compiled with debug symbols (`nvcc` flags `-g -G`); however, with a larger project (the full redner source) the hang also occurs when compiling with only `-lineinfo`. I think this can be explained by less inlining and therefore more functions in the compiled debug binary.
- The hang happens both when launching the executable with cuda-gdb and when attaching after a CUDA API call has been made.
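The first workaround above can be sketched as follows (function name and body are hypothetical; the real functions live in `material.h`):

```cuda
// Hypothetical example of the workaround: marking the __device__
// functions in material.h with __forceinline__ makes the compiler
// inline them at every call site, so no standalone device function
// entry remains in the compiled binary and the hang disappears.
__device__ __forceinline__ float shade(float n_dot_l) {
    return n_dot_l > 0.f ? n_dot_l : 0.f;
}
```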
Environment
I’ve reproduced the issue with multiple CUDA Toolkit and GCC versions on different machines:
- Driver Version: 510.47.04
- CentOS Linux release 7.9.2009
- GPUs (on different machines)
- RTX 6000
- A6000
- CUDA Toolkits:
- 11.2.2
- 11.6
- 11.6.2
- 11.8 (not compatible with the driver version, but just an additional data point)
- GCC versions:
- 4.8.5 20150623 (Red Hat 4.8.5-44)
- 6.3.1 20170216 (Red Hat 6.3.1-3)
- 9.3.1 20200408 (Red Hat 9.3.1-2)
There’s a precompiled binary `test` in the attachment in case it’s useful.