Debugging device code does not work

Q_2 · April 22, 2013, 5:10pm

Hello,

I’m programming a shared library (with several source files) containing several cuda kernels. I’d like to debug the device code, but it’s not working and I can’t figure out why…

I’m compiling the source files that contain CUDA code with the -g -G options. After that I create a shared library from all the object files and move it to a different folder (I use the “directory” command in gdb to make sure that gdb knows where the source files are).
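
A minimal sketch of such a build, for reference. The file names (`kernels.cu`, `helpers.cu`, `libmykernels.so`) and the install path are hypothetical placeholders, not taken from the actual project, and `-Xcompiler -fPIC` is an assumption about how the position-independent code for the shared library is produced. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of the build described above. All file and directory names are
# hypothetical placeholders, not taken from the original project.
# By default the commands are only printed; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

build_debug_lib() {
    # -g: host debug info, -G: device debug info.
    # -Xcompiler -fPIC: position-independent host code (assumed here).
    for src in kernels.cu helpers.cu; do
        run nvcc -g -G -Xcompiler -fPIC -c "$src" -o "${src%.cu}.o"
    done
    # Link the object files into one shared library.
    run nvcc -shared kernels.o helpers.o -o libmykernels.so
    # Move the library to a different folder, as described in the post.
    run mv libmykernels.so /path/to/install/lib/
}

build_debug_lib
```

Inside the debugger, `directory /path/to/sources` (again a placeholder path) then tells gdb where the source files are.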

I’m able to debug host code, but gdb does not stop in kernel code.
Does anyone know what I’m doing wrong? I tried googling, but I didn’t find a solution and I’m getting more and more frustrated…

Thanks in advance for any help!

Q_2 · April 23, 2013, 9:04am

Some additional information:

I have 2 graphics cards; I'm using one of them for my displays and the other solely for CUDA stuff (I'm able to debug simple programs)
I'm using CUDA 5.0 on Ubuntu 12.04
My kernel is using some inline functions

When I try to step through a Kernel, I get the following output from cuda-gdb:
first step:

namesp::__wrapper__device_stub_myKernel<namesp::someclass> (__cuda_0=@0x7fffa0fbb018, 
    __cuda_1=@0x7fffa0fbb010, __cuda_2=@0x7fffa0fbb008, __cuda_3=@0x7fffa0fbb004, __cuda_4=@0x7fffa0fbb000, 
    __cuda_5=@0x7fffa0fbaffc, __cuda_6=@0x7fffa0fbb040, __cuda_7=@0x7fffa0fbb048, __cuda_8=@0x7fffa0fbb050, 
    __cuda_9=@0x7fffa0fbaff8, __cuda_10=@0x7fffa0fbb058, __cuda_11=@0x7fffa0fbaff4, __cuda_12=@0x7fffa0fbaff0, 
    __cuda_13=@0x7fffa0fbafec, __cuda_14=@0x7fffa0fbafe8, __cuda_15=@0x7fffa0fbb060, __cuda_16=@0x7fffa0fbb068)
    at mySource.cudafe1.stub.c:662
662	template<> __specialization_static void __wrapper__device_stub_myKernel< ::namesp::someclass>(  _ZN10namesp12someclassE *&__cuda_0, _ZN10namesp14someclassMSE *&__cuda_1,uint32_t *&__cuda_2,const uint32_t &__cuda_3,const uint32_t &__cuda_4,const uint32_t &__cuda_5,const uint32_t &__cuda_6,unsigned *&__cuda_7,const int &__cuda_8,const float &__cuda_9,const int &__cuda_10,const float &__cuda_11,const float &__cuda_12,const float &__cuda_13,const float &__cuda_14,float *&__cuda_15,const uint32_t &__cuda_16){__device_stub__ZN10namesp19myKernelINS_12someclassEEEvPT_PNS_1someclassMSEPjjjjjS6_ififfffPfj( __cuda_0,__cuda_1,__cuda_2,__cuda_3,__cuda_4,__cuda_5,__cuda_6,__cuda_7,__cuda_8,__cuda_9,__cuda_10,__cuda_11,__cuda_12,__cuda_13,__cuda_14,__cuda_15,__cuda_16);}}

second step:

__device_stub__ZN10namesp19myKernelINS_12someclassEEEvPT_PNS_14someclassMSEPjjjjjS6_ififfffPfj (
    __par0=0x5008c0000, __par1=0x5007f0000, __par2=0x5006c0000, __par3=3, __par4=10, __par5=512, __par6=7680, 
    __par7=0x5006c3000, __par8=73, __par9=0.300000012, __par10=1392, __par11=1319.91235, __par12=1323.49597, 
    __par13=714.734375, __par14=487.081604, __par15=0x501720000, __par16=128) at mySource.cudafe1.stub.c:660
660	static void __device_stub__ZN10namesp19myKernelINS_12someclassEEEvPT_PNS_14someclassMSEPjjjjjS6_ififfffPfj( _ZN10namesp12someclassE *__par0,  _ZN10namesp14someclassMSE *__par1, uint32_t *__par2, const uint32_t __par3, const uint32_t __par4, const uint32_t __par5, const uint32_t __par6, unsigned *__par7, const int __par8, const float __par9, const int __par10, const float __par11, const float __par12, const float __par13, const float __par14, float *__par15, const uint32_t __par16){__cudaSetupArgSimple(__par0, 0UL);__cudaSetupArgSimple(__par1, 8UL);__cudaSetupArgSimple(__par2, 16UL);__cudaSetupArgSimple(__par3, 24UL);__cudaSetupArgSimple(__par4, 28UL);__cudaSetupArgSimple(__par5, 32UL);__cudaSetupArgSimple(__par6, 36UL);__cudaSetupArgSimple(__par7, 40UL);__cudaSetupArgSimple(__par8, 48UL);__cudaSetupArgSimple(__par9, 52UL);__cudaSetupArgSimple(__par10, 56UL);__cudaSetupArgSimple(__par11, 60UL);__cudaSetupArgSimple(__par12, 64UL);__cudaSetupArgSimple(__par13, 68UL);__cudaSetupArgSimple(__par14, 72UL);__cudaSetupArgSimple(__par15, 80UL);__cudaSetupArgSimple(__par16, 88UL);__cudaLaunch(((char *)((void ( *)( _ZN10namesp12someclassE *,  _ZN10namesp14someclassMSE *, uint32_t *, const uint32_t, const uint32_t, const uint32_t, const uint32_t, unsigned *, const int, const float, const int, const float, const float, const float, const float, float *, const uint32_t))namesp::myKernel<namesp::someclass> )));}namespace namesp{

When I step again, I’m back in my host code. During the steps, cuda-gdb tells me: “Focus not set on any active CUDA kernel.” (when using the “cuda kernel” command in gdb)

As you can see, I’m using C++ templates, but I already tried the same without templates and debugging didn’t work either.

I also tried reducing the block-size, but I still could not debug.

Any suggestions?

vacaloca · April 23, 2013, 11:03pm

Haven’t used cuda-gdb, so pardon my ignorance. Perhaps the problem resides in the fact that the code is in a shared library. If you can, try and see if you can make some of it as a standalone executable and see if you can debug it that way. Before you even do that, try cuda-gdb with a simple ‘hello world’ type example and see if that works and go from there. Hope it helps. :)

Q_2 · April 24, 2013, 8:14am

Thanks for your comment!
I forgot to mention that I successfully debugged a kernel in a shared library already (I wrote a small test program and a shared library for this purpose), so debugging a shared library shouldn’t be a general problem.
I’m loading the symbols with the “sharedlibrary”-command and in my test program/shared library this works fine.

Q_2 · July 10, 2013, 3:11pm

I think I figured out what the problem is:

I’m using ‘set auto-solib-add off’ in my .cuda-gdbinit file.

cuda-gdb can’t debug kernels with auto-solib-add off on start-up of the executable.
When I set auto-solib-add on after start-up, debugging of kernels still does not work, even when loading all symbols of all shared libraries.
In this case (when loading the shared symbols of all shared libraries) I get the following error when continuing:

The CUDA driver has hit an internal error.
Error code: 0x19007a00000001c
Further execution or debugging is unreliable.

I tried “set stop-on-solib-events 1” to make sure there is no shared library which is loaded on start-up and unloaded directly afterwards.

cuda-gdb apparently does not work with auto-solib-add off, which is pretty annoying in my case: I don’t want cuda-gdb to load the symbols of ALL shared libraries, because that slows down the execution of my application dramatically (it uses ~100 shared libraries).

I hope NVIDIA will fix this problem in the near future.

geoffg · July 10, 2013, 10:14pm

Thanks for posting the issue.

To be able to use ‘set auto-solib-add off’ with currently released versions of cuda-gdb, you can do the following:

(1) Compile your application to explicitly link against libcuda. Example: nvcc -g -G myapp.cu -o myapp -lcuda
(2) Set a breakpoint at main (‘break main’), then type ‘run’.
(3) Type ‘sharedlibrary libcuda’
(4) Proceed with the debug session. You should now be able to break into kernels and use the debugger as normal.
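
Steps (2) and (3) above can also be replayed from a command file instead of being typed each time. The following is only a sketch: the file name `workaround.gdb` is a hypothetical choice, and the `set auto-solib-add off` line simply mirrors the setting discussed in this thread.

```shell
#!/bin/sh
# Write a cuda-gdb command file that replays the manual workaround.
# The file name workaround.gdb is a hypothetical choice.
cat > workaround.gdb <<'EOF'
set auto-solib-add off
break main
run
sharedlibrary libcuda
EOF
```

It could then be used as `cuda-gdb -x workaround.gdb ./myapp`, with `myapp` built as in step (1). Symbols for any library that contains the kernels of interest would still have to be loaded with a further `sharedlibrary` command before setting kernel breakpoints.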

We will look into an automatic solution (to avoid the manual workaround above) for a future release of cuda-gdb.

Q_2 · July 11, 2013, 3:15pm

Thanks a lot for your reply, geoffg.

I tried your workaround and it works perfectly fine! Thanks for this very useful hint.
An automatic solution would be better, but the workaround is much better than nothing!

Another feature that would be great to have is Python support in cuda-gdb.
