cuda-gdb run fails, but gdb run succeeds

Hi,
Here is the stack trace of the “Segmentation fault” I get:

#0  0x00007fff5b0e7f64 in ?? () from /usr/lib/x86_64-linux-gnu/libcudadebugger.so.1
#1  0x00007fff5b0e8c26 in InitializeInjection () from /usr/lib/x86_64-linux-gnu/libcudadebugger.so.1
#2  0x00007fffb2b7efde in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007fffb28db91c in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007fffb2994210 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#5  0x00007fffb29ef408 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#6  0x00007fffef0813f8 in __cudart514 () from /home/ckt1010/adlab/output/lib/libailice_v3.so.5.5
#7  0x00007fffef081488 in __cudart1325 () from /home/ckt1010/adlab/output/lib/libailice_v3.so.5.5
#8  0x00007fffb02654df in __pthread_once_slow () from /usr/lib/x86_64-linux-gnu/libpthread.so.0
#9  0x00007fffef0cb379 in __cudart1597 () from /home/ckt1010/adlab/output/lib/libailice_v3.so.5.5
#10 0x00007fffef078237 in __cudart512 () from /home/ckt1010/adlab/output/lib/libailice_v3.so.5.5
#11 0x00007fffef0a4fb1 in cudaFree () from /home/ckt1010/adlab/output/lib/libailice_v3.so.5.5
#12 0x00007fffef011d2a in ailice_v3::initializeAiliceV3(ailice_v3::AiliceV3InitArgs const&) ()
   from /home/ckt1010/adlab/output/lib/libailice_v3.so.5.5
#13 0x0000555555d63fb9 in adlab::prediction::main (argc=1, argv=0x7fffffffcca8)
    at modules/prediction/framework/main.cc:346
#14 0x0000555555d6538e in main (argc=1, argv=0x7fffffffcca8) at modules/prediction/framework/main.cc:425

But when I use gdb or run the program directly, it works fine.
CUDA version: 11.4
Thanks!
BR/Tim

Moved to cuda-gdb forum.

Hi @ckt1010
Thank you for your report! To help us identify the issue, could you provide a few more details about your environment:

  • Output of the nvidia-smi command
  • Output of the cuda-gdb --version command
  • Re-run the debugging scenario with additional logging enabled:
    • Set the NVLOG_CONFIG_FILE environment variable to point to the nvlog.config file (attached). E.g.: NVLOG_CONFIG_FILE=${HOME}/nvlog.config
      nvlog.config (539 Bytes)

    • Run the debugging session.

    • You should see the /tmp/debugger.log file created - could you share it with us?
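The logging steps above could be scripted roughly as follows. This is only a sketch: it assumes the attached nvlog.config was saved to the home directory, and `run_with_debugger_log` is a hypothetical helper name, not part of cuda-gdb.

```shell
# Point the debugger at the logging config (path as suggested in the reply above).
export NVLOG_CONFIG_FILE="${HOME}/nvlog.config"

# Hypothetical helper: run a program under cuda-gdb, then check for the log file.
run_with_debugger_log() {
    # --args passes everything after it to the debugged program.
    cuda-gdb --args "$@"
    if [ -f /tmp/debugger.log ]; then
        echo "log written: /tmp/debugger.log"
    else
        echo "no /tmp/debugger.log found" >&2
        return 1
    fi
}

# Usage (the binary path is a placeholder):
#   run_with_debugger_log ./your_app
```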

Hi AKravets,
Thanks for the reply.
nvidia-smi:

Tue Oct 24 09:46:34 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   53C    P8    37W / 300W |   2822MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

cuda-gdb --version:

NVIDIA (R) CUDA Debugger
11.4 release
Portions Copyright (C) 2007-2021 NVIDIA Corporation
GNU gdb (GDB) 10.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

/tmp/debugger.log

09:58:01:709|inf|50|1196349|dbg_cuda            |                                                            - InitializeInjection

Your cuda-gdb (distributed as part of the CUDA toolkit) seems to be too old for your GPU driver (you are using the 11.4 toolkit with a 12.0 driver). Could you try installing the CUDA 12.0 toolkit: https://developer.nvidia.com/cuda-12-0-0-download-archive ?
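For anyone wanting to spot this kind of mismatch quickly, here is a small sketch that pulls the two version numbers out of the command outputs shown earlier in the thread. The function names are made up for illustration, and comparing only the major version is a simplifying assumption, not NVIDIA's official compatibility rule.

```shell
# Extract the CUDA version (e.g. "12.0") from the nvidia-smi header on stdin.
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Extract the release (e.g. "11.4") from `cuda-gdb --version` output on stdin.
debugger_release() {
    sed -n 's/^\([0-9][0-9]*\.[0-9][0-9]*\) release.*/\1/p' | head -n 1
}

# Succeed if the debugger's major version is lower than the driver's.
# Assumption: a major-version gap is what matters here.
debugger_older_than_driver() {
    dbg_major=${1%%.*}
    drv_major=${2%%.*}
    [ "$dbg_major" -lt "$drv_major" ]
}

# Usage on a machine with both tools installed:
#   drv=$(nvidia-smi | driver_cuda_version)
#   dbg=$(cuda-gdb --version | debugger_release)
#   debugger_older_than_driver "$dbg" "$drv" && echo "cuda-gdb $dbg is older than driver CUDA $drv"
```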

Hi AKravets,
But I need to use libtorch 1.10, which is tied to CUDA 11.4, so I can’t upgrade CUDA. Is there any other way to fix it?

Hi! You just need to use the cuda-gdb binary from the 12.0 toolkit. You should be able to install it next to the existing 11.4 toolkit. This link might help: Installing additional CUDA versions
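A minimal sketch of what that might look like once the 12.0 toolkit is installed side by side. The path /usr/local/cuda-12.0 is an assumption (the usual default location on Linux) and `debug_with_newer_cuda_gdb` is a hypothetical helper name; the application keeps building and linking against 11.4, only the debugger binary changes.

```shell
# Assumed install location of the 12.0 toolkit's debugger; override if yours differs.
CUDA_GDB_12="${CUDA_GDB_12:-/usr/local/cuda-12.0/bin/cuda-gdb}"

# Hypothetical helper: debug a program with the newer cuda-gdb binary.
debug_with_newer_cuda_gdb() {
    if [ ! -x "$CUDA_GDB_12" ]; then
        echo "cuda-gdb not found at $CUDA_GDB_12" >&2
        return 1
    fi
    # Call the binary by full path so the 11.4 cuda-gdb on PATH is not picked up.
    "$CUDA_GDB_12" --args "$@"
}

# Usage (the binary path is a placeholder):
#   debug_with_newer_cuda_gdb ./your_app
```

Calling the debugger by its full path leaves PATH and LD_LIBRARY_PATH untouched, so the rest of the environment still points at the 11.4 toolkit.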
