cuda-gdb hangs: is there a way to find out why?

I compiled the executable megdnn_test from a CMake-managed project. I have added -g -G to the nvcc flags and read the cuda-gdb docs, but cuda-gdb hangs.
Is there a way to find out why it hangs, or to debug the hang itself? Should I provide more information about this issue?

(base) tangke@jy-apu-engine-test-ba220200319:~/MegBrain/build/dnn/test$ cuda-gdb megdnn_test
NVIDIA (R) CUDA Debugger
11.1 release
Portions Copyright (C) 2007-2020 NVIDIA Corporation
GNU gdb (GDB) 8.3.1
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
On the next run, the CUDA kernel launches will be blocking.
--Type <RET> for more, q to quit, c to continue without paging--
Reading symbols from megdnn_test...
(cuda-gdb) r --gtest_filter="CUDA.REGION*FORWARD*"
Starting program: /home/tangke/MegBrain/build/dnn/test/megdnn_test --gtest_filter="CUDA.REGION*FORWARD*"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Note: Google Test filter = CUDA.REGION*FORWARD*
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from CUDA
[ RUN      ] CUDA.REGION_RESTRICTED_CONV_FORWARD_LARGE_FILTER
[Detaching after fork from child process 59976]
[New Thread 0x7fffaa8c3700 (LWP 59986)]
[New Thread 0x7fffaa0c2700 (LWP 59987)]
[Context Create of context 0x55557076be70 on Device 0]
[New Thread 0x7fffa97bf700 (LWP 59988)]

Hi @ijpq,
Thank you for your report!

I see that you are using an old version of the CUDA Toolkit (11.1). Could you try running the same test with the latest CUDA 12.0: https://developer.nvidia.com/cuda-downloads

Also, could you share the output of the nvidia-smi command on your machine?

This project does not support the CUDA 12.0 build toolchain yet. Is there any other way to analyze the problem?

The nvidia-smi output (with the following environment variables set: export CUDA_DEVICE_ORDER='PCI_BUS_ID' export CUDA_VISIBLE_DEVICES='4'):

Or should I carefully check CUDA toolkit version compatibility first? Is that a crucial point?

Hi!

I see that you are using CUDA Driver 11.7.

Binaries built for older CUDA versions can still be run and debugged on a newer CUDA driver, so you can:

  • Build your application using CUDA 11.1 (or the latest compilation toolchain your project supports)
  • Run and debug using the CUDA 12.0 driver and toolkit.

Since the CUDA 11.1 Toolkit is no longer supported, we will not be releasing any updates to it, so we would not be able to fix the issue you encountered in that toolkit version.
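
The version advice above can be turned into a quick check. The following Python sketch is only an illustration: it parses the toolkit version from a cuda-gdb banner like the one quoted in this thread and the "CUDA Version" field from an nvidia-smi header, then compares the two. The regular expressions, the function names, and the sample header string are assumptions made for this example, not part of any NVIDIA tool.

```python
import re

# Assumed patterns: the first matches the "11.1 release" line of the
# cuda-gdb banner quoted in this thread, the second matches the
# "CUDA Version: X.Y" field printed in the nvidia-smi header.
DEBUGGER_PATTERN = r"(\d+)\.(\d+) release"
DRIVER_PATTERN = r"CUDA Version:\s*(\d+)\.(\d+)"


def parse_version(text, pattern):
    """Return (major, minor) from the first match of pattern, or None."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_debugger_vs_driver(gdb_banner, smi_header):
    """Compare the debugger's toolkit version with the driver's CUDA version."""
    debugger = parse_version(gdb_banner, DEBUGGER_PATTERN)
    driver = parse_version(smi_header, DRIVER_PATTERN)
    if debugger is None or driver is None:
        return "unknown"
    if debugger < driver:
        return "debugger-older"
    if debugger > driver:
        return "debugger-newer"
    return "match"


# Banner lines copied from the session in this thread.
gdb_banner = "NVIDIA (R) CUDA Debugger\n11.1 release\nGNU gdb (GDB) 8.3.1\n"
# Hypothetical header fragment; only the 11.7 value comes from the thread.
smi_header = "CUDA Version: 11.7"

print(check_debugger_vs_driver(gdb_banner, smi_header))  # debugger-older
```

A result of "debugger-older" corresponds to the situation reported here: a cuda-gdb 11.1 running against a driver that reports CUDA 11.7.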

Sorry for the late reply.
I tried debugging with cuda-gdb 11.7, and the problem is solved!

It takes approximately 3 minutes before any output appears, but it no longer hangs. Thanks a lot.
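
Because a start-up delay of several minutes can look like a hang, it may help to run the command under a generous deadline before concluding that it is stuck. The sketch below is one way to do that in Python, using only the standard library. The helper name run_with_deadline is hypothetical, and the demo uses a short Python sleep as a stand-in command rather than a real debugger invocation.

```python
import subprocess
import sys
import time


def run_with_deadline(command, deadline_seconds):
    """Run command and report whether it finished before the deadline.

    Returns a (status, elapsed_seconds) tuple where status is
    "finished" or "timeout".
    """
    start = time.monotonic()
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # if it is still running when the timeout elapses.
        subprocess.run(command, timeout=deadline_seconds)
    except subprocess.TimeoutExpired:
        return "timeout", time.monotonic() - start
    return "finished", time.monotonic() - start


# Stand-in for a slow but healthy program: sleeps briefly, then exits.
slow_command = [sys.executable, "-c", "import time; time.sleep(0.2)"]

status, elapsed = run_with_deadline(slow_command, deadline_seconds=5)
print(status, round(elapsed, 1))
```

For the case in this thread, the command would be the debugger invocation and the deadline something well above three minutes; a "timeout" status after that long is a stronger sign of a real hang than a few silent minutes.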

It is important to keep the driver and toolkit versions consistent, and I had overlooked that key point.

Glad it worked for you! Thank you for letting us know!