It's possible to use cuda-gdb from python, assuming you only need to debug the C/C++ portion. I don't know of a debugger that can step from debugging python into debugging CUDA C++.
To debug a CUDA C/C++ library function called from python, the following is one possibility, inspired by this article:
- For this walkthrough, I will use the t383.py and t383.cu files verbatim from this answer. I'll be using CUDA 10, python 2.7.5, on CentOS 7.
- Compile your CUDA C/C++ library using the -G and -g switches, as you would for ordinary debugging:
$ nvcc -Xcompiler -fPIC -std=c++11 -shared -arch=sm_60 -G -g -o t383.so t383.cu -DFIX
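(For context: the python script calls into this shared library via ctypes. The general loading-and-calling pattern looks like the sketch below, shown with the standard C library purely for illustration; the actual loading code in t383.py may differ.)

```python
import ctypes
import ctypes.util

# Load a shared library by path or name. For the library built above you
# would use ctypes.CDLL("./t383.so"); here we load the standard C library
# instead, purely for illustration, since it exists on any Linux system.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the function signature before calling across the C boundary
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-5))  # prints 5
```

Calls made this way execute inside the interpreter's own process, which is why attaching cuda-gdb to the interpreter's PID (next steps) lets you break inside the library.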
- We'll need two terminal sessions for this; I will refer to them as session 1 and session 2. In session 1, start your python interpreter:
$ python
- In session 2, find the process ID associated with your python interpreter (replace USER with your actual username):
$ ps -ef | grep USER
USER 23221 22694 0 23:55 pts/0 00:00:00 python
In the above example, 23221 is the process ID for the python interpreter (see man ps for help).
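Alternatively, rather than grepping the ps output, you can ask the interpreter for its own process ID directly from within session 1:

```python
import os

# The interpreter's own process ID; this is the number to pass to cuda-gdb -p
print(os.getpid())
```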
- In session 2, start cuda-gdb and attach to that process ID:
$ cuda-gdb -p 23221
... (lots of spew here)
- In session 2, at the (cuda-gdb) prompt, set a breakpoint at a desired location in your CUDA C/C++ library. For this example, we will set a breakpoint at one of the first lines of kernel code, line 70 in the t383.cu file. If you haven't yet loaded the library (we haven't, in this walkthrough), cuda-gdb will point this out and ask whether you want to make the breakpoint pending on a future library load. Answer y to this. (Alternatively, before starting this cuda-gdb session, you could have run your python script once from within the interpreter, as we will do in step 7 below; that would load the library's symbol table and avoid this prompt.) After the breakpoint is set, we issue the continue command in cuda-gdb so that the python interpreter starts running again:
(cuda-gdb) break t383.cu:70
No symbol table is loaded. Use the "file" command.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (t383.cu:70) pending.
- In session 1, run your python script:
>>> execfile("t383.py")
1, 1, 1, 1, 1,
1, 1, 1, 1,
1, 1, 1, 1,
- Our python interpreter has now halted (and is unresponsive), because in session 2 we see that the breakpoint has been hit:
[New Thread 0x7fdb0ffff700 (LWP 23589)]
[New Thread 0x7fdb0f7fe700 (LWP 23590)]
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
Thread 1 "python" hit Breakpoint 1, update_water_flow<<<(1,1,1),(1024,1,1)>>> (
d_updated_water_flow_map=0x7fdb00800800, SIZE_X=4, SIZE_Y=4) at t383.cu:70
70 int col = index % SIZE_X;
and we see that the breakpoint is at line 70 of our library (kernel) code, just as expected. Ordinary C/C++ cuda-gdb debugging can proceed at this point within session 2, as long as you stay within the library function.
When you are finished debugging (you may need to remove any breakpoints you set), you can type continue again in session 2 to return control to the python interpreter in session 1 and allow your application to finish.