On a Linux server, when I try to attach cuda-gdb to a running process, I get an error after a long run of lines about reading symbols from .so files.
cuda-gdb does seem to attach to the target process, but it loses all of its CUDA functionality and behaves like vanilla gdb. I can't inspect anything related to CUDA, though I can still look at CPU-side things such as threads, as the following figure shows.
This error only appears when attaching to a running process. If I launch the process under cuda-gdb,
cuda-gdb python myscript.py
then it just works and I can inspect things like CUDA contexts.
Info about my server:
- OS: Ubuntu 20.04
- Kernel: Linux 5.4.0-169-generic
- Driver version: 545.23.08
- Device: NVIDIA H100
If any additional information is needed for diagnosis, please let me know. Thanks in advance!
Hi @odyssey471
To help us identify the issue, could you provide a few more details about your environment:
- Please share the complete cuda-gdb output when attempting to attach.
- Please re-run the attach with additional logging enabled:
  - Set the NVLOG_CONFIG_FILE variable to point to the nvlog.config file (attached, 539 Bytes). E.g.: NVLOG_CONFIG_FILE=${HOME}/nvlog.config
  - Start the application.
  - Run the cuda-gdb attach command.
  - Wait for it to fail.
  - You should see the /tmp/debugger.log file created. Could you share it with us?
Hi @AKravets, thanks for the detailed instructions!
I tried to grab the additional logging like this:
python myscript.py &
NVLOG_CONFIG_FILE=./nvlog.config cuda-gdb -p <PID>
After cuda-gdb failed, I typed quit, but no /tmp/debugger.log was created.
Did I make a mistake somewhere?
Hi @odyssey471,
Please also provide the same environment variable to the process you are debugging:
export NVLOG_CONFIG_FILE=`pwd`/nvlog.config
python myscript.py &
cuda-gdb -p <PID>
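If it is unclear whether the variable actually reached the target process, one way to check might be to read the process environment from /proc. This is only a sketch: it assumes Linux with /proc mounted and permission to read the target's environ file, and nvlog_config_of is a hypothetical helper name, not part of cuda-gdb.

```shell
# Hypothetical helper: print the NVLOG_CONFIG_FILE value that a running
# process was started with, or nothing if the variable was not set.
# Assumes Linux: /proc/<pid>/environ holds NUL-separated KEY=VALUE pairs
# captured when the process was exec'd.
nvlog_config_of() {
    tr '\0' '\n' < "/proc/$1/environ" | sed -n 's/^NVLOG_CONFIG_FILE=//p'
}
```

Calling nvlog_config_of <PID> after starting the script should print the path to nvlog.config; empty output would suggest the process was started without the variable.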
Hi, @AKravets
I tried this and still nothing happened. I also tried adding UseStdout to nvlog.config, and got no additional output either.
Ok, no problem. Could you just share the full log of the cuda-gdb -p <PID> command?
OK, I will try to provide a version with sensitive information removed, to comply with my company's data privacy policy.
Hi @AKravets
By using a newer driver (570.172.08) and a newer cuda-gdb (12.0; the previous version was 10.1 from the Ubuntu 20.04 apt sources), I solved this issue.
Thanks for your help!
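For anyone hitting the same symptom, a quick comparison of installed versions against the ones mentioned in this thread might look like the sketch below. It assumes GNU sort with the -V option; version_ge is a hypothetical helper, and the version numbers are only the ones reported above (10.1 failed, 12.0 worked), not an official compatibility rule.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to
# version $2. Relies on GNU sort -V (version sort) to order the two strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Versions taken from this thread. On a real machine they could be read from
# `cuda-gdb --version` and
# `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.
version_ge "12.0" "10.1" && echo "cuda-gdb 12.0 is at least 10.1"
version_ge "545.23.08" "570.172.08" || echo "driver 545.23.08 is older than 570.172.08"
```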