cuda-gdb hangs indefinitely on first CUDA API call

We’ve experienced cuda-gdb hanging indefinitely and narrowed it down to the attached reproducer.

cuda_gdb_test.zip (2.8 MB)

The example just calls cudaFree(0). No functions in the other source files are actually called. Merely their presence in the compiled binary causes the problem.
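
For reference, the hot path is essentially the following (a minimal sketch; the actual test.cpp in the attachment may differ, and it is also linked against the other source files whose mere presence triggers the hang):

#include <cstdio>
#include <cuda_runtime.h>

int main(int argc, char** argv)
{
    printf("Starting...\n");
    printf("Calling cudaFree\n");
    // The first CUDA runtime API call in the process; it forces context
    // initialization and is where the hang occurs under cuda-gdb.
    cudaError_t err = cudaFree(0);
    printf("cudaFree returned %d\n", static_cast<int>(err));
    return 0;
}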

The application runs fine on its own, but when cuda-gdb is attached, the first CUDA API call hangs. This happens consistently and doesn’t appear to be merely a performance issue: it has been left running for over 12 hours.

Log and backtrace:

> cuda-gdb -ex=r test
Reading symbols from test...
Starting program: /home/mayurp/workspace/cuda_gdb_test/test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Starting...
Calling cudaFree
[Detaching after fork from child process 68616]
[New Thread 0x7fffef394700 (LWP 68620)]
[New Thread 0x7fffeeb93700 (LWP 68621)]
[New Thread 0x7fffee290700 (LWP 68622)]
^C
Thread 1 "test" received signal SIGINT, Interrupt.
0x00007ffff79c5cbd in sendmsg () from /lib64/libpthread.so.0
(cuda-gdb) bt
#0  0x00007ffff79c5cbd in sendmsg () from /lib64/libpthread.so.0
#1  0x00007ffff5791ea0 in ?? () from /lib64/libcuda.so.1
#2  0x00007ffff57969ff in ?? () from /lib64/libcuda.so.1
#3  0x00007ffff5797764 in ?? () from /lib64/libcuda.so.1
#4  0x00007ffff5797976 in ?? () from /lib64/libcuda.so.1
#5  0x00007ffff558e5eb in ?? () from /lib64/libcuda.so.1
#6  0x00007ffff558ebad in ?? () from /lib64/libcuda.so.1
#7  0x00007ffff562defa in ?? () from /lib64/libcuda.so.1
#8  0x00007ffff562e60b in ?? () from /lib64/libcuda.so.1
#9  0x0000000000442e54 in __cudart570 ()
#10 0x00000000004328ce in __cudart615 ()
#11 0x00000000004495e4 in __cudart544 ()
#12 0x000000000044dd4a in __cudart789 ()
#13 0x000000000044dfc4 in __cudart779 ()
#14 0x00000000004403ff in __cudart953 ()
#15 0x0000000000423cbe in __cudart494 ()
#16 0x0000000000463193 in cudaFree ()
#17 0x000000000040450e in main (argc=1, argv=0x7fffffffc7c8) at test.cpp:19

The issue appears to be related to the number of CUDA device functions in the compiled binary (checked using cuobjdump -res-usage). The hang goes away when either:

  • Using __forceinline__ for the functions in material.h (see the sketch after this list)
  • Removing the first 2 functions from material.cpp
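
For illustration, the first workaround amounts to the change below (a sketch with a made-up function name, not the actual redner code in material.h; the before/after show the same function, only one definition exists at a time):

// Before: a plain __device__ function. Each such function appears as a
// separate device function in the debug binary (visible with
// cuobjdump -res-usage).
__device__ float shade(float albedo, float n_dot_l)
{
    return albedo * n_dot_l;
}

// After: __forceinline__ forces the definition to be inlined into its
// callers, so it no longer exists as a standalone device function, and
// the hang goes away.
__forceinline__ __device__ float shade(float albedo, float n_dot_l)
{
    return albedo * n_dot_l;
}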

Notes

  • The code in this example is a subset of redner, an open source project for differentiable rendering without approximation: https://github.com/BachiLi/redner
  • This particular example hangs when compiled with debug symbols (nvcc flags -g -G); however, with a larger project (the full redner source) the hang also occurs when compiling with only -lineinfo. I think this is explained by reduced inlining, and therefore more functions, in the compiled debug binary.
  • This hang happens both when launching the executable with cuda-gdb and when attaching after a CUDA API call has been made.

Environment

I’ve reproduced the issue with multiple CUDA toolkit and GCC versions on different machines:

  • Driver Version: 510.47.04
  • CentOS Linux release 7.9.2009
  • GPUs (on different machines)
    • RTX 6000
    • A6000
  • CUDA Toolkits:
    • 11.2.2
    • 11.6
    • 11.6.2
    • 11.8 (not compatible with the driver version, but just an additional data point)
  • GCC versions:
    • 4.8.5 20150623 (Red Hat 4.8.5-44)
    • 6.3.1 20170216 (Red Hat 6.3.1-3)
    • 9.3.1 20200408 (Red Hat 9.3.1-2)

There’s a precompiled binary (test) in the attachment in case it is useful.

Hi @mayurp
Thank you for the report! Could you try using a recent NVIDIA GPU driver (one that supports CUDA 12.0+)?

This looks like a known issue which was fixed in CUDA 12.0.

Hi @AKravets. Thanks for the quick reply.

My organization is unlikely to move to CUDA 12 in the near term. Is there a workaround that would allow cuda-gdb to work with the 510.47.04 driver, or perhaps with newer drivers that still support CUDA 11.x?

Is there a workaround that would allow cuda-gdb to work with the 510.47.04 driver

Unfortunately, no.

or perhaps with newer drivers that still support CUDA 11.x?

CUDA applications compiled for CUDA 11.x should work with a CUDA 12 driver, so you can use, e.g., the 11.8 binary with the 12.1 driver.

I have upgraded the driver to version 530.30.0 and no longer see the hang in the reproducer.

However, I still see an issue with cuda-gdb becoming unresponsive when debugging my full application.
There is a bug in the application that causes it to hang. Once the hang happens I attach cuda-gdb; however, I am unable to use Ctrl-C to suspend the application:

> cuda-gdb -p 32749
Attaching to process 32749
[New LWP 336]
[New LWP 337]
[New LWP 338]
...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007ffe505fb6c2 in clock_gettime ()
[New Thread 0x7f666378cc00 (LWP 10004)]
[New Thread 0x7f6240e9b700 (LWP 10005)]
[Detaching after fork from child process 10006]
[Thread 0x7f666378cc00 (LWP 10004) exited]
^C^C

Before the hang in the application occurs, cuda-gdb seems to work as expected: I can set kernel breakpoints, use Ctrl-C to suspend the program, and get a backtrace.
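
For example, a working session before the hang looks roughly like this (the kernel name is made up for illustration; only the commands are shown):

> cuda-gdb -p <pid>
(cuda-gdb) break render_kernel
(cuda-gdb) continue
^C
(cuda-gdb) bt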

It seems like there’s something specific about the application hang that causes the issue with cuda-gdb. I’ve tried rebuilding the app with newer toolkit versions (up to 12.1) and using newer versions of cuda-gdb.

I see the same problem when launching the app with cuda-gdb.
I also sometimes (but not always) see the log line below in cuda-gdb when pressing Ctrl-C:

Thread 1 "python3" received signal SIGINT, Interrupt.

The application consists of a Python wrapper that calls into a native shared library.

Unfortunately I don’t have any code I can share at the moment, but I am wondering if there are still any known issues that might explain what I’m seeing.

Thanks

Hi @mayurp thank you for the update!
To further investigate the hang with 530 could you collect additional logs?

  • Add the NVLOG_CONFIG_FILE environment variable pointing to the attached nvlog.config file, e.g. NVLOG_CONFIG_FILE=${HOME}/nvlog.config (see the example after this list).
  • Run the debugging session.
  • You should see the /tmp/debugger.log file created - could you share it with us?
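
For example, a full collection run might look like this (a sketch; the binary name is illustrative):

> export NVLOG_CONFIG_FILE=${HOME}/nvlog.config
> cuda-gdb -ex=r ./app
... reproduce the hang, then quit cuda-gdb ...
> ls -l /tmp/debugger.log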

Hi @AKravets,

Here you go. These are the logs when using CUDA toolkit 11.8 to build and debug the app.

debugger_11.8.log (1.5 MB)

Thanks,

Mayur

Hi @mayurp
Thank you for the logs! We are investigating the issue.

Hi @mayurp, sorry for the long delay.
We suspect that the attach problem might be caused by the application hang (i.e. cuda-gdb makes some CUDA calls when attaching to the application, so if CUDA is in a locked state we cannot attach).

If you still have the setup, could you please try the following:

  • When the application hangs, attach to it using normal GDB
  • Collect backtraces:
thread apply all bt
  • If possible, share the output of the command above (see the sketch below).
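
One way to capture that output to a file (a sketch; the PID and filename are illustrative):

> gdb -p 32749
(gdb) set logging file backtraces.txt
(gdb) set logging on
(gdb) thread apply all bt
(gdb) set logging off
(gdb) detach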