I’m following the “Getting Started with the CUDA Debugger” guide in the NVIDIA Nsight VSCE documentation and have set up VS Code with the Nsight plugin. I’m on the most recent versions of VS Code (Version: 1.79.2 (Universal)) and Nsight (v2023.2.32964508). I was able to build the matrixMul sample and launch the cuda-gdb debugger.
However, VS Code shows the following error message: “A Unwinder should return gdb.UnwindInfo instance.” This error shows up throughout the call stack view. The locals don’t contain any values, with many variables showing “<optimized out>”, and when I hover over variables in the CUDA kernel, I only get a blank black box. I have the -g -G nvcc flags set, since the tutorial builds with make dbg=1, and I’ve included the nvcc commands below as well.
Here is the output from the cuda-gdb terminal:
NVIDIA (R) CUDA Debugger
11.8 release
Portions Copyright (C) 2007-2022 NVIDIA Corporation
GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff529a000 (LWP 1350557)]
[Detaching after fork from child process 1350558]
[New Thread 0x7fffe941a000 (LWP 1350581)]
[New Thread 0x7fffe8c19000 (LWP 1350582)]
Thread 1 "matrixMul" hit Breakpoint 3,
Thread 1 "matrixMul" hit Breakpoint 1, MatrixMulCUDA<32><<<(20,10,1),(32,32,1)>>> (C=, A=, B=, wA=, wB=) at matrixMul.cu:70
70 int aBegin = wA * BLOCK_SIZE * by;
cuda block (0, 0, 0) thread (0, 0, 0)
CUDA focus unchanged.
cuda block (0, 0, 0) thread (0, 0, 0)
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
70 int aBegin = wA * BLOCK_SIZE * by;
cuda block (0, 0, 0) thread (0, 0, 0)
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
70 int aBegin = wA * BLOCK_SIZE * by;
cuda block (0, 0, 0) thread (0, 0, 0)
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
70 int aBegin = wA * BLOCK_SIZE * by;
cuda block (0, 0, 0) thread (0, 0, 0)
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
70 int aBegin = wA * BLOCK_SIZE * by;
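To separate a VS Code front-end problem from a cuda-gdb problem, the same breakpoint can be inspected directly from cuda-gdb on the server. This is only a sketch, assuming the binary is ./matrixMul in the current directory and reusing the matrixMul.cu:70 breakpoint from the log above; it uses standard gdb options (-q, -batch, -ex) plus the cuda focus command that already appears in the log:

```shell
# Run cuda-gdb non-interactively on the server (no VS Code involved).
# "set breakpoint pending on" lets the device-side breakpoint be created
# before the kernel's code is loaded; -batch would otherwise decline it.
cuda-gdb -q -batch \
  -ex 'set breakpoint pending on' \
  -ex 'break matrixMul.cu:70' \
  -ex 'run' \
  -ex 'cuda block (0,0,0) thread (0,0,0)' \
  -ex 'backtrace' \
  -ex 'info args' \
  -ex 'info locals' \
  ./matrixMul
```

If backtrace, info args and info locals print sensible values here, the debug information is likely fine and the issue sits between the Nsight extension and cuda-gdb; if they also show “<optimized out>”, the build is the more likely culprit.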
Here are the nvcc build commands:
nvcc -ccbin g++ -I../../../Common -m64 -g -G -Xcompiler -O0 -Xptxas -O0 -lineinfo -O0 --threads 0 --std=c++11 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90,code=compute_90 -o matrixMul.o -c matrixMul.cu
nvcc -ccbin g++ -m64 -g -G -Xcompiler -O0 -Xptxas -O0 -lineinfo -O0 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90,code=compute_90 -o matrixMul matrixMul.o
I added -lineinfo after seeing it mentioned in a Stack Overflow post, but it appears that -G overrides it.
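For comparison, a trimmed-down debug build could also be tried. This is a sketch, assuming the A100s (sm_80) are the only GPUs that matter on this server and keeping the include path from the commands above. Per the nvcc documentation, -G already turns off device-code optimization, so -Xptxas -O0 and -lineinfo are dropped:

```shell
# Single-step compile + link of the sample for sm_80 only.
# -g: host debug info, -G: device debug info (disables device optimizations),
# -O0: no host-code optimization.
nvcc -ccbin g++ -I../../../Common -m64 \
  -g -G -O0 --std=c++11 \
  -gencode arch=compute_80,code=sm_80 \
  -o matrixMul matrixMul.cu
```

Building for a single architecture keeps the fat binary small and removes the other targets as a variable while checking whether the locals still show “<optimized out>”.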
Here is a screenshot of the local variables (I’m only able to add one screenshot).
I’m connected via the Remote - SSH plugin to an Ubuntu 20.04.5 server with 8x A100 GPUs.