Attach failed due to the internal driver error 0x414300000002e with cuda-gdb

On a Linux server, when I try to attach to a running process with cuda-gdb, I get this error after many lines reporting that symbols are being read from .so files.

cuda-gdb still appears to attach to the target process, but it loses all of its CUDA functionality and behaves like vanilla gdb. I can't inspect anything related to CUDA, but I can still look at CPU-side things like threads, as shown in the following figure.

The error only appears when attaching to a running process. If I launch the process under cuda-gdb,

cuda-gdb python myscript.py

then it just works and I can inspect things like CUDA contexts.

Info about my server:

  • OS: Ubuntu 20.04
  • Kernel: Linux 5.4.0-169-generic
  • Driver version: 545.23.08
  • Device: NVIDIA H100

If any additional information is needed for diagnosis, please let me know. Thanks in advance!

Hi @odyssey471
To help us identify the issue, could you provide a few more details about your environment:

  • Please share the complete cuda-gdb output when attempting to attach.
  • Please re-run the attach with additional logging enabled:
    • Set the NVLOG_CONFIG_FILE environment variable to point to the nvlog.config file (attached). E.g.: NVLOG_CONFIG_FILE=${HOME}/nvlog.config
      nvlog.config (539 Bytes)
    • Start the application.
    • Run the cuda-gdb attach command.
    • Wait for it to fail.
    • You should see the /tmp/debugger.log file created - could you share it with us?

Hi @AKravets, thanks for the detailed instructions!

I tried to grab the additional logging like this:

python myscript.py &
NVLOG_CONFIG_FILE=./nvlog.config cuda-gdb -p <PID>

After cuda-gdb failed, I typed quit, but no /tmp/debugger.log had been created.

Did I make a mistake in these steps?

Hi @odyssey471,

Please also provide the same environment variable to the process you are debugging:

export NVLOG_CONFIG_FILE=`pwd`/nvlog.config
python myscript.py &
cuda-gdb -p <PID>
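An exported variable is inherited by processes started afterwards from the same shell, which is why the export has to come before launching the script. As a rough sanity check, and assuming a Linux system where /proc is available, the target's environment can be inspected directly. The helper name has_env_var below is hypothetical and only for illustration:

```shell
# Hypothetical helper: succeeds if process <PID> was started with
# variable <NAME> in its environment.
# Usage: has_env_var <PID> <NAME>
has_env_var() {
    pid="$1"
    name="$2"
    # /proc/<PID>/environ holds the process's initial environment as
    # NUL-separated NAME=value entries (Linux only).
    tr '\0' '\n' < "/proc/${pid}/environ" | grep -q "^${name}="
}
```

For example, running `has_env_var <PID> NVLOG_CONFIG_FILE && echo "variable is set"` after starting the script shows whether the debugged process actually received the setting. Note that this file reflects the environment at process start, so it will not show variables the program sets for itself later.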

Hi, @AKravets

I tried this and still no log file was produced. I also tried adding UseStdout to nvlog.config, and got no additional output either.

Ok, no problem. Could you just share the full log of the cuda-gdb -p <PID> command?

OK, I will try to provide a version with the sensitive information removed, to comply with my company's data privacy policy.

Hi @AKravets

I solved this issue by using a newer driver (570.172.08) and a newer cuda-gdb (12.0; the previous version was 10.1 from the Ubuntu 20.04 apt sources).
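For anyone comparing their own setup, the sketch below prints the installed driver and cuda-gdb versions side by side. It assumes nvidia-smi and cuda-gdb are on PATH; the report_version wrapper is a hypothetical helper that only guards against a missing tool:

```shell
# Hypothetical helper: run "<tool> <args...>" if the tool is on PATH,
# otherwise print a short notice instead of failing.
report_version() {
    tool="$1"
    shift
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool" "$@"
    else
        echo "$tool: not found in PATH"
    fi
}

# Driver version as reported by the NVIDIA driver utilities.
report_version nvidia-smi --query-gpu=driver_version --format=csv,noheader
# Version banner of the cuda-gdb binary that would be used for attaching.
report_version cuda-gdb --version
```

If the cuda-gdb found on PATH is a distribution package much older than the driver, as in my case, that mismatch is worth ruling out first.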

Thanks for your help!