How can I add symbol information when debugging a GPU core dump with cuda-gdb?

I have a device core dump file with no symbols in it, because -G was not used when compiling.
I want to add symbol information to cuda-gdb so that I can track down the bug (it cannot be reproduced easily).
I found that cuda-gdb has a --symbols option. I recompiled the program with -G, ran cuda-gdb with --symbols=my-new-program, and loaded the GPU core dump with the command:

(cuda-gdb) target cudacore core_xxxx.nvcudmp

cuda-gdb hung when I tried to print global memory using the symbols, for example:

(cuda-gdb) p (xxx::some_type *) 0x7f648670d600

Any idea?
The version of cuda-gdb is 11.4.

Hi @heibaidaolx123,
Thank you for your report! To help us identify the issue, could you clarify a few things:

  • Can you print the same global memory without adding the symbols?
p/x 0x7f648670d600
  • What GPU are you using? Could you paste the nvidia-smi output?
  • How did you obtain the address?

@AKravets

  • Can you print the same global memory without adding the symbols?
(cuda-gdb) p/x 0x7f648670d600
$1 = 0x7f648670d600
  • What GPU are you using? Could you paste the nvidia-smi output?
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A40          On   | 00000000:01:00.0 Off |                    0 |
|  0%   45C    P0   118W / 300W |  28439MiB / 45634MiB |     74%      Default |
|                               |                      |                  N/A |
  • How did you obtain the address?

I got the address from the CPU core dump.

Hi @heibaidaolx123,
Unfortunately, the use case you are describing is not supported. When the program is recompiled with the -G flag, the compiler also disables optimization passes, so the generated binary differs from the original even apart from the debug information. As a result, symbols from a binary compiled with -G cannot be used with a binary compiled without it. For reference, the nvcc help text for this flag:

--device-debug                             (-G)                              
        Generate debug information for device code. Turns off all optimizations.
        Don't use for profiling; use -lineinfo instead.

You would have to generate a core dump from a program built with -G.
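
A rough sketch of that workflow, in case it helps. It assumes a hypothetical source file my_program.cu and output name my_program_debug (neither comes from this thread) and uses the CUDA_ENABLE_COREDUMP_ON_EXCEPTION environment variable to turn on GPU core dumps. The script only builds and runs when nvcc and the source file are actually present.

```shell
#!/bin/sh
# Sketch only: SRC and BIN are hypothetical names, not taken from the thread.
SRC=my_program.cu
BIN=my_program_debug

# Ask the CUDA driver to write a GPU core dump when a device exception occurs.
export CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1

if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # -g adds host debug info; -G adds device debug info and turns off
    # device code optimizations.
    nvcc -g -G -o "$BIN" "$SRC"
    # Run the debug build; if a kernel faults, a GPU core dump
    # (*.nvcudmp) should be written.
    "./$BIN"
else
    echo "nvcc or $SRC not found; skipping build and run"
fi

# Afterwards, open the new dump together with the matching debug binary:
#   cuda-gdb ./my_program_debug
#   (cuda-gdb) target cudacore core_xxxx.nvcudmp
```

Because -G changes the generated device code, the fault may show up differently (or take longer to appear) in the debug build than in the optimized one.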