I’ve been seeing cuda-gdb hang indefinitely and have narrowed it down to the attached reproducer.
cuda_gdb_test.zip (2.8 MB)
The example just calls cudaFree(0). No functions in the other source files are actually called; their mere presence in the compiled binary causes the problem.
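For readers without the attachment, the entry point has roughly the following shape. This is a minimal sketch, not the actual test.cpp from the zip: the two printed strings are taken from the log further down, while the error check and the final message are assumptions added for illustration. On its own this file is not expected to hang; the problem only shows up when the other source files from the attachment are compiled into the same binary.

```cuda
// Sketch of the reproducer's entry point (not the file from the attachment).
#include <cstdio>
#include <cuda_runtime.h>

int main(int argc, char** argv) {
    std::printf("Starting...\n");
    std::printf("Calling cudaFree\n");

    // cudaFree(0) is a common idiom for forcing CUDA context creation.
    // Under cuda-gdb, this is the call that never returns in the report.
    cudaError_t err = cudaFree(0);
    if (err != cudaSuccess) {
        // Assumption: this error handling is illustrative only.
        std::fprintf(stderr, "cudaFree failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("cudaFree returned\n");
    return 0;
}
```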
The application runs fine on its own, but when cuda-gdb is attached the first CUDA API call hangs. This happens consistently and doesn’t appear to be just a performance issue (I have left it running for over 12 hours).
Log and backtrace:

    > cuda-gdb -ex=r test
    Reading symbols from test...
    Starting program: /home/mayurp/workspace/cuda_gdb_test/test
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    Starting...
    Calling cudaFree
    [Detaching after fork from child process 68616]
    [New Thread 0x7fffef394700 (LWP 68620)]
    [New Thread 0x7fffeeb93700 (LWP 68621)]
    [New Thread 0x7fffee290700 (LWP 68622)]
    ^C
    Thread 1 "test" received signal SIGINT, Interrupt.
    0x00007ffff79c5cbd in sendmsg () from /lib64/libpthread.so.0
    (cuda-gdb) bt
    #0  0x00007ffff79c5cbd in sendmsg () from /lib64/libpthread.so.0
    #1  0x00007ffff5791ea0 in ?? () from /lib64/libcuda.so.1
    #2  0x00007ffff57969ff in ?? () from /lib64/libcuda.so.1
    #3  0x00007ffff5797764 in ?? () from /lib64/libcuda.so.1
    #4  0x00007ffff5797976 in ?? () from /lib64/libcuda.so.1
    #5  0x00007ffff558e5eb in ?? () from /lib64/libcuda.so.1
    #6  0x00007ffff558ebad in ?? () from /lib64/libcuda.so.1
    #7  0x00007ffff562defa in ?? () from /lib64/libcuda.so.1
    #8  0x00007ffff562e60b in ?? () from /lib64/libcuda.so.1
    #9  0x0000000000442e54 in __cudart570 ()
    #10 0x00000000004328ce in __cudart615 ()
    #11 0x00000000004495e4 in __cudart544 ()
    #12 0x000000000044dd4a in __cudart789 ()
    #13 0x000000000044dfc4 in __cudart779 ()
    #14 0x00000000004403ff in __cudart953 ()
    #15 0x0000000000423cbe in __cudart494 ()
    #16 0x0000000000463193 in cudaFree ()
    #17 0x000000000040450e in main (argc=1, argv=0x7fffffffc7c8) at test.cpp:19

The issue appears to be related to the number of CUDA device functions in the compiled binary (checked using cuobjdump -res-usage). The hang goes away with either of the following:
- Using __forceinline__ for the functions in
- Removing the first 2 functions from
Other observations:
- The code in this example is a subset of redner, an open source project (GitHub: BachiLi/redner, "Differentiable rendering without approximation").
- This particular example hangs when compiled with debug symbols (-g -G); however, with a larger project (the full redner source), the hang also occurs when compiling with only -lineinfo. I think this is explained by there being less inlining, and therefore more functions, in the compiled debug binary.
- The hang happens both when launching the executable under cuda-gdb and when attaching after a CUDA API call has been made.
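To illustrate the __forceinline__ workaround mentioned above, here is a minimal sketch of what the change looks like on a device function. The function and kernel names (scale_and_offset, apply) are hypothetical and not taken from redner. The assumption, consistent with the inlining explanation above, is that a debug build otherwise keeps each such helper as a separate device function, and that the qualifier reduces the function count reported by cuobjdump -res-usage.

```cuda
// Sketch only: hypothetical helper and kernel, not code from redner.
#include <cstdio>
#include <cuda_runtime.h>

// __forceinline__ asks the compiler to inline this helper into its callers
// instead of emitting it as a standalone device function.
__device__ __forceinline__ float scale_and_offset(float x) {
    return 2.0f * x + 1.0f;
}

__global__ void apply(float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = scale_and_offset(static_cast<float>(i));
    }
}

int main() {
    const int n = 4;
    float* d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) return 1;

    apply<<<1, n>>>(d_out, n);

    float h_out[n];
    // cudaMemcpy on the default stream waits for the kernel to finish.
    if (cudaMemcpy(h_out, d_out, n * sizeof(float),
                   cudaMemcpyDeviceToHost) != cudaSuccess) return 1;

    for (int i = 0; i < n; ++i) {
        std::printf("%d -> %f\n", i, h_out[i]);
    }
    cudaFree(d_out);
    return 0;
}
```

Comparing the cuobjdump -res-usage output of builds with and without the qualifier should show whether the number of device functions actually changes for a given set of compiler flags.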
I’ve reproduced the issue with multiple CUDA toolkit and GCC versions on different machines:
- Driver Version: 510.47.04
- CentOS Linux release 7.9.2009
- GPUs (on different machines):
  - RTX 6000
- CUDA Toolkits:
  - 11.8 (not compatible with the driver version, but just an additional data point)
- GCC versions:
  - 4.8.5 20150623 (Red Hat 4.8.5-44)
  - 6.3.1 20170216 (Red Hat 6.3.1-3)
  - 9.3.1 20200408 (Red Hat 9.3.1-2)
There’s a precompiled binary, test, in the attachment in case this is useful.