cuda-gdb crashes upon attaching to a crashed process

CUDA Exception: Warp Illegal Address

cuda-gdb has received a SIGSEGV and will attempt to get its own backtrace.

cuda-gdb| segv_handler() +0x4a
…pthread.so.0| ???
cuda-gdb| fputs_maybe_filtered() +0xa0
cuda-gdb| fputs_styled() +0xa3
cuda-gdb| cli_ui_out::do_field_string() +0xc8
cuda-gdb| ui_out::field_string() +0x4b
cuda-gdb| print_exception_origin() +0x288
cuda-gdb| cuda_nat_linux<amd64_linux_nat_target>::wait() +0x34d
cuda-gdb| thread_db_target::wait() +0x3e
cuda-gdb| target_wait() +0x34
cuda-gdb| do_target_wait() +0x6d0
cuda-gdb| wait_for_inferior() +0xd6
cuda-gdb| cuda_nat_attach() +0x2be
cuda-gdb| attach_post_wait() +0x34
cuda-gdb| do_all_inferior_continuations() +0x35
cuda-gdb| inferior_event_handler() +0x30
cuda-gdb| fetch_inferior_event() +0x426
cuda-gdb| gdb_wait_for_event() +0x4e5
cuda-gdb| gdb_do_one_event() +0x47
cuda-gdb| wait_sync_command_done() +0x1c
cuda-gdb| catch_command_errors() +0x3f
cuda-gdb| gdb_main() +0xd49
cuda-gdb| main() +0x25
…64/libc.so.6| __libc_start_main() +0xf5
cuda-gdb| _start() +0x29

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla PG500-216     On   | 00000000:3B:00.0 Off |                    0 |
| N/A   46C    P0    44W / 250W |  22979MiB / 32510MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla PG500-216     On   | 00000000:61:00.0 Off |                    0 |
| N/A   45C    P0    39W / 250W |  22979MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla PG500-216     On   | 00000000:86:00.0 Off |                    0 |
| N/A   45C    P0    37W / 250W |  22979MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla PG500-216     On   | 00000000:DB:00.0 Off |                    0 |
| N/A   45C    P0    38W / 250W |  22985MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    194822      C   …202211041153/jre/bin/java      22963MiB |
|    1   N/A  N/A    194822      C   …202211041153/jre/bin/java      22963MiB |
|    2   N/A  N/A    194822      C   …202211041153/jre/bin/java      22963MiB |
|    3   N/A  N/A    194822      C   …202211041153/jre/bin/java      22969MiB |
+-----------------------------------------------------------------------------+

cuda-gdb -v
NVIDIA (R) CUDA Debugger
11.2 release
Portions Copyright (C) 2007-2020 NVIDIA Corporation
GNU gdb (GDB) 8.3.1
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

I think this only happens with optimized code: when I compile with -G (device debug, which disables optimizations), I no longer seem to see the crash.

Hi @jacek.tomaka
I see that your driver reports CUDA 11.4, but your cuda-gdb comes from CUDA Toolkit 11.2. Could you try running the same scenario with cuda-gdb 11.4 (CUDA Toolkit 11.4 Downloads | NVIDIA Developer)?
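For anyone who wants to check for this kind of mismatch automatically, here is a minimal Python sketch. It only parses the two pieces of text already posted in this thread (the nvidia-smi header line and the cuda-gdb -v banner). The function names are made up for illustration, and a version difference is only a hint worth ruling out, not a confirmed cause of the crash.

```python
import re
from typing import Tuple

# Sample text copied from the outputs posted earlier in this thread.
SMI_HEADER = "| NVIDIA-SMI 470.103.01 Driver Version: 470.103.01 CUDA Version: 11.4 |"
GDB_BANNER = "NVIDIA (R) CUDA Debugger\n11.2 release\nGNU gdb (GDB) 8.3.1\n"


def driver_cuda_version(smi_text: str) -> Tuple[int, int]:
    """Extract (major, minor) from the 'CUDA Version' field of nvidia-smi output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    if match is None:
        raise ValueError("no 'CUDA Version' field found")
    return int(match.group(1)), int(match.group(2))


def debugger_release(banner: str) -> Tuple[int, int]:
    """Extract (major, minor) from the 'X.Y release' line of `cuda-gdb -v`."""
    match = re.search(r"^(\d+)\.(\d+) release", banner, re.MULTILINE)
    if match is None:
        raise ValueError("no 'X.Y release' line found")
    return int(match.group(1)), int(match.group(2))


def versions_differ(smi_text: str, banner: str) -> bool:
    """True when the driver's CUDA version and the debugger release disagree."""
    return driver_cuda_version(smi_text) != debugger_release(banner)


if __name__ == "__main__":
    print("driver CUDA version:", driver_cuda_version(SMI_HEADER))
    print("cuda-gdb release:   ", debugger_release(GDB_BANNER))
    print("mismatch:", versions_differ(SMI_HEADER, GDB_BANNER))
```

Note that the "CUDA Version" shown by nvidia-smi is the highest CUDA version the installed driver supports, so it can legitimately be newer than the toolkit that cuda-gdb shipped with.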

Also, could you provide more details about the process you are trying to attach to:

  • Is it crashing in CPU code or in GPU code?
  • Could you paste the full cuda-gdb attach log?
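One way to capture the full attach log is to run the debugger non-interactively and redirect everything it prints to a file. The Python sketch below assumes cuda-gdb is on PATH and relies on the standard GDB options -batch, -p and -ex. The helper names, the default log file name, and the choice of commands to run after attaching (bt, info cuda kernels) are illustrative assumptions, not an official procedure.

```python
import subprocess
from typing import List


def build_attach_command(pid: int, debugger: str = "cuda-gdb") -> List[str]:
    """Build a non-interactive attach command line for the given PID."""
    return [
        debugger,
        "-batch",            # run the commands below, then exit
        "-p", str(pid),      # attach to the running process
        "-ex", "bt",         # host backtrace of the selected thread
        "-ex", "info cuda kernels",  # list active CUDA kernels, if any
    ]


def capture_attach_log(pid: int, log_path: str = "cuda-gdb-attach.log") -> int:
    """Run the attach command, writing stdout and stderr to log_path.

    Returns the debugger's exit code; the log is kept even if the
    debugger itself crashes during the attach.
    """
    with open(log_path, "w") as log:
        result = subprocess.run(
            build_attach_command(pid),
            stdout=log,
            stderr=subprocess.STDOUT,
            check=False,
        )
    return result.returncode
```

For example, capture_attach_log(194822) would target the java process shown in the nvidia-smi output above. Attaching pauses the target while the debugger is connected, so this is best run against a process that is already hung or crashed.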