How to debug kernel code writing with PTX

I know the cuda-gdb give us the chance to debug kernel code lively, it can break a source line, or print a symbol value in kernel according to the XXX_kernel.cu file.

But if there is no XXX_kernel.cu file, all kernel code is compiled from a PTX (file or string), can cuda-gdb still accomplish this task? For example, when cuda-gdb suspend a device, can it print to the terminal which line of PTX file the current device PC point to?

Here is the case:

root@lab-desktop:/home/lab/cudaGdb/obj# /home/lab/cudaGdb/obj/gdb/gdb /home/lab/NVIDIA_GPU_Computing_SDK/C/bin/linux/debug/matrixMul
NVIDIA ® CUDA Debugger
3.1 beta release

This GDB was configured as “i686-pc-linux-gnu”…
Using host libthread_db library “/lib/tls/i686/cmov/libthread_db.so.1”.
(cuda-gdb) b matrixMul
Breakpoint 1 at 0x804b32e: file ./matrixMul_kernel.cu, line 41.
(cuda-gdb) r
Starting program: /home/lab/NVIDIA_GPU_Computing_SDK/C/bin/linux/debug/matrixMul
[Thread debugging using libthread_db enabled]
[New process 8266]
[ matrixMul ]
/home/lab/NVIDIA_GPU_Computing_SDK/C/bin/linux/debug/matrixMul Starting…
[New Thread -1210976560 (LWP 8266)]
Device 0: “GeForce 9500 GT” with Compute 1.1 capability
Using Matrix Sizes: A(32 x 64), B(32 x 64), C(32 x 64)
[Launch of CUDA Kernel 0 on Device 0]
[Switching to CUDA Kernel 0 (<<<(0,0),(0,0,0)>>>)]

Breakpoint 1, matrixMul <<<(2,4),(16,16,1)>>> (C=0x104000, A=0x100000,
B=0x102000, wA=32, wB=32) at matrixMul_kernel.cu:43
43 int bx = blockIdx.x;
(cuda-gdb) si
0x08e95d28 43 int bx = blockIdx.x;
(cuda-gdb) p bx
warning: Variable is not live at this point. Returning garbage value.
$1 = 0
(cuda-gdb) l
38 ////////////////////////////////////////////////////////////////////////////////
39 global void
40 matrixMul( float* C, float* A, float* B, int wA, int wB)
41 {
42 // Block index
43 int bx = blockIdx.x;
44 int by = blockIdx.y;
45
46 // Thread index
47 int tx = threadIdx.x;
(cuda-gdb) p tx
warning: Variable is not live at this point. Returning garbage value.
$2 = 0
(cuda-gdb) n
44 int by = blockIdx.y;
(cuda-gdb) n
47 int tx = threadIdx.x;
(cuda-gdb) p by
$3 = 0
(cuda-gdb)

Everything is good, but if I change matrixMul with matrixMulDrv…

root@lab-desktop:/home/lab/cudaGdb/obj# /home/lab/cudaGdb/obj/gdb/gdb /home/lab/NVIDIA_GPU_Computing_SDK/C/bin/linux/debug/matrixMulDrv
NVIDIA ® CUDA Debugger
3.1 beta release

(cuda-gdb) b matrixMul
Function “matrixMul” not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (matrixMul) pending.
(cuda-gdb) r
Starting program: /home/lab/NVIDIA_GPU_Computing_SDK/C/bin/linux/debug/matrixMulDrv
[Thread debugging using libthread_db enabled]
[New process 8669]
[ matrixMulDrv ]

Device 0: “GeForce 9500 GT” with Compute 1.1 capability
findModulePath found file at </home/lab/NVIDIA_GPU_Computing_SDK/C/bin/linux/debug/…/…/…/src/matrixMulDrv/data/matrixMul_kernel.ptx>
initCUDA loading module: </home/lab/NVIDIA_GPU_Computing_SDK/C/bin/linux/debug/…/…/…/src/matrixMulDrv/data/matrixMul_kernel.ptx>
Breakpoint 2 at 0x8233d90
Pending breakpoint “matrixMul” resolved
[New Thread -1219905840 (LWP 8669)]
PTX JIT log:
: Considering profile ‘compute_11’ for gpu=‘sm_11’ in ‘cuModuleLoadDataEx_4’
;
[Launch of CUDA Kernel 0 on Device 0]

[Switching to CUDA Kernel 0 (<<<(0,0),(0,0,0)>>>)]

Breakpoint 2, 0x08233d90 in matrixMul <<<(8,5),(16,16,1)>>> ()
(cuda-gdb) l
66
67 static char sSDKsample = “matrixMulDrv”;
68
69 ////////////////////////////////////////////////////////////////////////////////
70 // Program main
71 ////////////////////////////////////////////////////////////////////////////////
72 int
73 main(int argc, char
* argv)
74 {
75 printf("[ %s ]\n", sSDKsample);
(cuda-gdb) si
0x08233d98 in matrixMul <<<(8,5),(16,16,1)>>> ()
(cuda-gdb)
0x08233da0 in matrixMul <<<(8,5),(16,16,1)>>> ()
(cuda-gdb)
0x08233da8 in matrixMul <<<(8,5),(16,16,1)>>> ()
(cuda-gdb)
0x08233db0 in matrixMul <<<(8,5),(16,16,1)>>> ()
(cuda-gdb)
0x08233db8 in matrixMul <<<(8,5),(16,16,1)>>> ()
(cuda-gdb)
0x08233dc0 in matrixMul <<<(8,5),(16,16,1)>>> ()
(cuda-gdb)
0x08233dc8 in matrixMul <<<(8,5),(16,16,1)>>> ()
(cuda-gdb)
0x08233dd0 in matrixMul <<<(8,5),(16,16,1)>>> ()
(cuda-gdb)
0x08233dd8 in matrixMul <<<(8,5),(16,16,1)>>> ()
(cuda-gdb)
(cuda-gdb) p tx
No symbol “tx” in current context.
(cuda-gdb) p r1
No symbol “r1” in current context.
(cuda-gdb) p r0
No symbol “r0” in current context.
(cuda-gdb) p __cudaparm_matrixMul_A
No symbol “__cudaparm_matrixMul_A” in current context.
(cuda-gdb) p __cuda___cuda_local_var_22536_39_Bs20
No symbol “__cuda___cuda_local_var_22536_39_Bs20” in current context.
(cuda-gdb) l
76
77 runTest(argc, argv);
78
79 cutilExit(argc, argv);
80 }
81
82 ////////////////////////////////////////////////////////////////////////////////
83 //! Run a simple test for CUDA
84 ////////////////////////////////////////////////////////////////////////////////
85 void
(cuda-gdb)

No debug information about PTX can be found in cuda-gdb. The same thing happened to official example: ptxjit matrixMulDrv matrixMulDynlinkJIT vectorAddDrv.

So all I want to know is whether there exist a way to debug the device code (kernel) in ptx format? Since the ptxas support the “-g” option?

What I usually do when debugging with driver API is to compile the source to a cubin file (with debug information) and then load the cubin instead of the ptx file. This works reasonably well, although the line numbers are sometimes a bit off (I suspect thats because of some include files, which might change the line numbers.)

cuda-gdb does not support source level PTX debugging yet.