Cuda-gdb script execution

I would like to write a script for cuda-gdb for automated debugging. The script I have is of the following form:

set pagination off
set logging file SOME_OUTPUT_FILE
set logging off

set cuda break_on_launch application

break SOME_FILE_PATH:SOME_LINE_NUMBER
    cuda kernel 0 block (0,0,0) thread (0,0,0)
    DO_SOMETHING
end

run

Essentially, I am trying to run something at each breakpoint. However, I got "Invalid coordinates. CUDA focus unchanged." when I ran cuda-gdb -x SOME_SCRIPT SOME_APP. What is the correct way to achieve this?

Hi @Ziqi

You need to use the commands gdb command to have certain actions performed on each breakpoint hit (see Break Commands (Debugging with GDB)). E.g., your script might look as follows:

...
break SOME_FILE_PATH:SOME_LINE_NUMBER
commands 1
cuda kernel 0 block (0,0,0) thread (0,0,0)
DO_SOMETHING
end
...
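Putting that fix into the original script, the complete command file might look like this (a sketch; the SOME_* placeholders stand in for your actual values, and "set logging on" is assumed here since the goal appears to be capturing output to the log file):

```gdb
set pagination off
set logging file SOME_OUTPUT_FILE
set logging on

set cuda break_on_launch application

# Breakpoint 1; the commands list below is attached to it by number
break SOME_FILE_PATH:SOME_LINE_NUMBER
commands 1
cuda kernel 0 block (0,0,0) thread (0,0,0)
DO_SOMETHING
end

run
```

Note that "commands" with no argument attaches to the most recently set breakpoint, so a bare "commands" would also work here. Also, with break_on_launch enabled the debugger will additionally stop at each kernel entry; those stops are not hits of breakpoint 1 and so should not run its command list.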

I compiled my code as shown in the following message:

Compiling ../ArrayMPA/src/mpa_utility.cu to make object file ../ArrayMPA/Debug/mpa_utility.x86-64.o.
-------------------------------------------------------------------
nvcc -x cu --compiler-options "-fPIC" --expt-relaxed-constexpr --gpu-architecture=sm_86 -Xcudafe --display_error_number -DPREFETCH=0 -DUSE_DCBX=0 -DX86 -D_LITTLE_ENDIAN_=1 -D__LITTLE_ENDIAN__=1  -DCPLUSPLUS_2011   -DDEBUG=2  -DDETECTION_DEBUG -DCHECKCUDAERROR -DGOOGLE_CUDA=1 -D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES -DUNIT_TEST=1  -g -G -O0  -I../../../../../../../../../../../../../usr/local/include  -I../ArrayMPA  -I../ArrayMPA/include  -I../../../../../../../../../../../../../usr/local/cuda/targets/x86_64-linux/include -I/usr/local/lib/python3.8/dist-packages/tensorflow/include -D_GLIBCXX_USE_CXX11_ABI=1 -DEIGEN_MAX_ALIGN_BYTES=64 -I ../../../../scripts/../3dparty/libpng -I ../../../../scripts/../3dparty/zlib -I../../../ops/CommonUtils//../..//shared -I../../../ops/CommonUtils//../..//ops/CommonUtils -I../../../ops/CommonUtils//../..//ops/CommonUtils/Png --keep --keep-dir ../ArrayMPA/Debug -c ../ArrayMPA/src/mpa_utility.cu -o ../ArrayMPA/Debug/mpa_utility.x86-64.o

Again I saw the following <optimized out> message:

Thread 1 "TestMPA.x86-64" hit Breakpoint 1, generateCandidates_allCaAtOnce_idxCaFrameMerged_sharedMem_warpConvergence_knl<<<(8,8,48),(32,2,1)>>> (pTestFrames=0x702000000, numFrame=8, frameHeight=1520, frameWidth=896, pCareAreas=0x7320ca800, pPitchPerCA=0x7320cc800, pMaxNumCandidatesPerCA=0x7320cd000, pHamSincPerCA=0x732000000, numCA=6, truncNum=7, subCaHeight=64, pCandidatesPerPixel=0x73420cb40, pNumCandidatesPerPixel=0x7320ce200) at ../ArrayMPA/src/mpa_utility.cu:1374
1374            float pitch = pPitchPerCA[idxCA];
[Switching focus to CUDA kernel 0, grid 6, block (1,0,0), thread (1,0,0), device 0, sm 0, warp 0, lane 1]
1374            float pitch = pPitchPerCA[idxCA];
$1 = <optimized out>

What is weird is that this <optimized out> message appears randomly for the same executable; e.g., it didn't show up in my first run but did in the second. My cuda-gdb script was written as follows:

set pagination off
set logging file gdb.output
set logging off

set breakpoint pending on

break /home/zfan/sandbox/Virgo-Algo-Container-3.0/Blazer/MercuryImageComputer/KT/leaf/Virgo/src/ops/RefGen/ArrayMPA/src/mpa_utility.cu:1374
commands 1
cuda kernel 0 block (1,0,0) thread (1,0,0)
print maxNumCandidates
end
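If a variable keeps showing up as <optimized out>, one generic way to narrow it down (a sketch using standard gdb commands, not something suggested in this thread) is to dump all locals at the breakpoint and also break a few lines later, where the variable is more likely to be live. Note that line 1380 below is a made-up example line number:

```gdb
break mpa_utility.cu:1374
commands 1
cuda kernel 0 block (1,0,0) thread (1,0,0)
# Show every local the debugger can see at this point
info locals
print maxNumCandidates
end

# Second breakpoint further into the kernel (1380 is hypothetical);
# if the variable prints here but not at 1374, it simply was not
# live yet at the earlier line.
break mpa_utility.cu:1380
commands 2
print maxNumCandidates
end
```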

and I ran it with

sudo cuda-gdb -x ../mpa_debug_script --args ./TestMPA.x86-64 ../debug_data frame_info.txt 1

When I ran nvcc -V, I got

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

Further, when I ran cuda-gdb --version, I got

NVIDIA (R) CUDA Debugger
11.7 release
Portions Copyright (C) 2007-2022 NVIDIA Corporation
GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Is it a known bug in nvcc or cuda-gdb in CUDA 11.7?

OK, so it looks like the scripting is working for you now, but there is an issue with <optimized out> being displayed for certain variables.

Without having the repro (or at least the relevant kernel code) it's hard to tell what exactly might have gone wrong here. So:

Is there a container that provides the newest CUDA version together with a matching TensorFlow? We need not only CUDA but also a compatible TensorFlow version for our product.

As for the kernel, I cannot provide it because of corporate IP. Unfortunately, in most cases (student coursework aside), source code cannot be released on this forum. I guess NVIDIA may need a way to reproduce such issues and investigate independently of customers/developers.