Single stepping until exit from function, which has no line number information

andriy.roshchenko · March 8, 2024, 6:25pm

I try to debug two kernels in the same source file. One kernel I can successfully enter and print variables’ values. Another kernel reports: Single stepping until exit from function _ZN3gpu7kernels27modify_impl_inverses_kernelILi8EdjjEEvPPT0_PKS2_PKT2_S9_T1_S7_S7_, which has no line number information.

I can reproduce this problem on two servers with Ubuntu 22.04. My colleague has similar setup with me on similar machine with Ubuntu 22.04 but the same build debugs with no problems for them.
On the server with RH8 everything works.

andriy@server:/sandbox$ cuda-gdb --args   ../out/build/sim-sr2-debug/tests/test_drs_scaling  --gtest_filter=DRSScaling.TestDRSxMXSPR024
NVIDIA (R) CUDA Debugger
CUDA Toolkit 12.3 release
Portions Copyright (C) 2007-2023 NVIDIA Corporation
GNU gdb (GDB) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ../out/build/sim-sr2-debug/tests/test_drs_scaling...
(cuda-gdb) b aim_device_kernels.cuh:29
Breakpoint 1 at 0x107730: file /cmg/andriy/repos/gpusol.lib/src/../include/aim_device_kernels.cuh, line 36.
(cuda-gdb) b aim_device_kernels.cuh:196
Breakpoint 2 at 0x10e6e2: aim_device_kernels.cuh:196. (6 locations)
(cuda-gdb) run
Starting program: /cmg/andriy/repos/gpusol.lib/out/build/sim-sr2-debug/tests/test_drs_scaling --gtest_filter=DRSScaling.TestDRSxMXSPR024
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Running main() from /cmg/andriy/repos/gpusol.lib/external/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = DRSScaling.TestDRSxMXSPR024
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from DRSScaling
[ RUN      ] DRSScaling.TestDRSxMXSPR024
[New Thread 0x7fffb1f7b000 (LWP 315462)]
[New Thread 0x7fffb0c96000 (LWP 315463)]
[Detaching after fork from child process 315464]
AMGX version 2.4.0
Built on Feb 27 2024, 16:40:57
Compiled with CUDA Runtime 12.3, using CUDA driver 12.3

Reading system from file: /cmg/andriy/repos/gpusol.lib/sandbox/../matrices/IMEX/mxspr024[it=4 ncyc=1].hdf
[New Thread 0x7fffaafde000 (LWP 315476)]
Device ID: 0 -- Compute Capability : 9.0
                Number of SMs: 132; Warp Size: 32
Device ID: 0 -- Compute Capability : 9.0
                Number of SMs: 132; Warp Size: 32
[Switching focus to CUDA kernel 0, grid 9, block (0,0,0), thread (0,0,0), device 0, sm 128, warp 0, lane 0]

Thread 1 "test_drs_scalin" hit Breakpoint 1, gpu::kernels::fill_eqtrow<unsigned int, unsigned int><<<(3696,1,1),(32,1,1)>>> (rhs_offsets=0x7ffd83e02a00, num_rows=123, eqtrow=0x7ffd83e04c00) at /cmg/andriy/repos/gpusol.lib/src/../include/aim_device_kernels.cuh:29
29	        auto stride = blockDim.x * gridDim.x;
(cuda-gdb) n
31	        for (auto isol = index; isol < num_rows; isol += stride) {
(cuda-gdb) continue
Continuing.
[Switching focus to CUDA kernel 1, grid 19, block (0,0,0), thread (0,0,0), device 0, sm 128, warp 0, lane 0]

Thread 1 "test_drs_scalin" hit Breakpoint 1, gpu::kernels::fill_eqtrow<unsigned int, unsigned int><<<(3696,1,1),(32,1,1)>>> (rhs_offsets=0x7ffd83e16e00, num_rows=123, eqtrow=0x7ffd83e19000) at /cmg/andriy/repos/gpusol.lib/src/../include/aim_device_kernels.cuh:29
29	        auto stride = blockDim.x * gridDim.x;
(cuda-gdb) continue
Continuing.

Total number of unknowns = 607

SPMV Diag Kernel: MAX_BLOCKS = 4 NUM_BLOCKS = 38 THREADS_PER_BLOCK = 128
SPMV Diag Kernel: EQUATIONS_PER_BLOCK = 16 THREADS_PER_EQUATION = 8
SPMV Diag Kernel: Launched blocks of size 128. Theoretical occupancy: 0.250000
SPMV Kernel: ROWS_PER_BLOCK = 25 THREADS_PER_ROW = 5
SPMV Kernel: Max 9 blocks of size 128. Theoretical occupancy: 0.562500.
SPMV Kernel: Actual 5 blocks of size 128. Actual occupancy: 0.312500


PRE ABF: |b| =   2.3027702480588856E+05
PRE ABF: |b-A*x_sim| =   2.7582258389031086E+01
PRE ABF: |b-A*x_sim| / |b| =   1.1977859455271963E-04


DRS SCALING:=======Ranges of diagonal sizes=======
DRS SCALING: Range[1] of size 5 starts at block 0
DRS SCALING: Range[2] of size 1 starts at block 121
DRS SCALING: Total blocks = 123
[Switching focus to CUDA kernel 2, grid 45, block (0,0,0), thread (0,0,0), device 0, sm 128, warp 0, lane 0]

Thread 1 "test_drs_scalin" hit Breakpoint 2, 0x00007ffd3f6beaf0 in void gpu::kernels::modify_impl_inverses_kernel<8, double, unsigned int, unsigned int>(double**, double const*, unsigned int const*, unsigned int const*, unsigned int, unsigned int, unsigned int)<<<(4,1,1),(256,1,1)>>> ()
(cuda-gdb) n
Single stepping until exit from function _ZN3gpu7kernels27modify_impl_inverses_kernelILi8EdjjEEvPPT0_PKS2_PKT2_S9_T1_S7_S7_,
which has no line number information.
[Switching focus to CUDA kernel 3, grid 45, block (0,0,0), thread (32,0,0), device 0, sm 128, warp 1, lane 0]

Thread 1 "test_drs_scalin" hit Breakpoint 2, 0x00007ffd3f6beaf0 in void gpu::kernels::modify_impl_inverses_kernel<8, double, unsigned int, unsigned int>(double**, double const*, unsigned int const*, unsigned int const*, unsigned int, unsigned int, unsigned int)<<<(4,1,1),(256,1,1)>>> ()
(cuda-gdb)

What can be the issue?
Is there something wrong with my environment?

Appreciate any suggestions.

AKravets · March 11, 2024, 8:28am

Hi @andriy.roshchenko,
Thank you for reporting the issue! Could you please share additional information to help us identify the problem?

nvidia-smi command output (ideally on both machines - where debugging is working and where it’s not)
Additional logs from the debugger (ideally for both machines as well):
- Add NVLOG_CONFIG_FILE variable pointing the nvlog.config file (attached). E.g.: NVLOG_CONFIG_FILE=${HOME}/nvlog.config
  nvlog.config (539 Bytes)
- Run the debugging session.
- You should see the /tmp/debugger.log file created - could you share it with us?

andriy.roshchenko · March 12, 2024, 9:51pm

Hi @AKravets,

I am able to reproduce both states - when I am able to debug all kernels and when I can debug only some of them - on the same machine and the same code base.

In order to be able to debug all kernels I must enable these two snippets of code in the main .cu file:

#if defined(_DEBUG) && 1
    amgx::thrust::device_vector<aim_matrix::rhs_offset_type> rhs_offsets((aimm.RHSOffsets()), 
                       (aimm.RHSOffsets()+10));
    printf("%s (%d):\n", __FUNCTION__, __LINE__);
    for (size_t i = 0; i < 5; i+=2)
    {
        int a = rhs_offsets[i], b = rhs_offsets[i+1];
        printf("%d %d\n", a, b);
    }
#endif

#if defined(_DEBUG) && 1
    rhs_offsets.assign((aimm.RHSOffsets()), 
                       (aimm.RHSOffsets()+10));
    printf("%s (%d):\n", __FUNCTION__, __LINE__);
    for (size_t i = 0; i < 5; i+=2)
    {
        int a = rhs_offsets[i], b = rhs_offsets[i+1];
        printf("%d %d\n", a, b);
    }
#endif

Note. This code has nothing to do with the kernels I try to debug. It is located in a completely different unit of translation and is CPU code mostly.

nvidia-smi command output:
nvidia-smi.log (1.7 KB)
Successful debug attempt
debugger-OK.log (3.8 MB)
console-OK.log (3.9 KB)
Failed debug attempt
debugger.log (4.0 MB)
console.log (5.3 KB)

AKravets · March 13, 2024, 10:54am

@andriy.roshchenko
Thank you very much for the logs. We are investigating the issue.

user72837 · March 13, 2024, 12:42pm

i meet the same problem. Move my kernel to a single compile unit helps, but still cannot debug when the kernel is long.

user72837 · March 13, 2024, 12:50pm

One observation is when the kernel goes too long (although most lines are comments) the debugger just doesn’t work.

Topic		Replies	Views
Cuda-gdb doesn't break and/or step into Kernels CUDA Programming and Performance	26	54262	August 1, 2011
Cuda-GDB doesn't hit breakpoints inside kernel/ if the kernel is in a library and then linked to the executable CUDA-GDB vscode , cuda-gdb	9	3401	April 13, 2023
Cannot debug inside GPU CUDA Programming and Performance	4	1178	September 8, 2010
Can you debug kernel code in gdb? CUDA Programming and Performance	0	2425	December 17, 2007
Debugging CUDA How to step into a kernel function CUDA Programming and Performance	5	16164	June 4, 2021
[BUG REPORT] CUDA-GDB refuses to step into GPU Kernel CUDA-GDB	2	160	October 28, 2025
Break points ignored and does not step into cuda Kernels. CUDA-GDB	2	1372	August 7, 2017
cuda-gdb problem with function call in kernel? CUDA Programming and Performance	0	2112	December 6, 2011
Heratic behaviour of cuda-gdb Nsight Visual Studio Code Edition cuda , kernel , cuda-gdb	2	661	January 22, 2024
cuda-gdb CUDA Programming and Performance	2	8245	January 7, 2010

Single stepping until exit from function, which has no line number information

Related topics