Debugging OptiX code with CUDA 11.6 cuda-gdb locks desktop GUI

I saw the notice about cuda-gdb supporting OptiX code debugging, so I gave it a try on Fedora 35 with driver 510.39.01, CUDA 11.6, and an RTX 3060 GPU.

The debugger seems to work well. I put an assert in a miss program, and when the assert triggered, the debugger showed code reasonably close to the assert.
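
For context, the test was roughly along these lines; this is just a minimal sketch with made-up names (__miss__ms, Params, the payload write), not my actual code:

```cuda
// Minimal sketch of an assert in an OptiX miss program; program and
// parameter names here are made up for illustration.
#include <optix.h>
#include <cassert>

struct Params { unsigned int width, height; };
extern "C" __constant__ Params params;

extern "C" __global__ void __miss__ms()
{
    const uint3 idx = optixGetLaunchIndex();
    // When this fires, cuda-gdb stops and shows source near this line
    // (assuming the module was built with full debug info).
    assert(idx.x < params.width && idx.y < params.height);
    optixSetPayload_0(0u); // e.g. write a black background into the payload
}
```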

One thing I encountered is that when I run my program, my desktop GUI freezes for 5-10 seconds at a time, so I can’t do anything; it unfreezes and runs a bit, then freezes again, repeating until the assert triggers, after which the desktop behaves normally. I also noticed that nvidia-smi -l shows the GPU at 100% load until the assert triggers.

I am using the same GPU to run my program and my Linux KDE desktop GUI, since I have only one GPU in the system.

I also tried issuing the set cuda software_preemption on command, even though the cuda-gdb reference says it isn’t required for the RTX 3060; it had no effect.
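
For reference, this is what I tried; the cuda-gdb manual also documents an environment variable that is supposed to be equivalent (the program name below is a placeholder):

```
# inside the debugger:
(cuda-gdb) set cuda software_preemption on

# or, per the cuda-gdb manual, via the environment before launching:
$ CUDA_DEBUGGER_SOFTWARE_PREEMPTION=1 cuda-gdb ./myprogram
```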

Is using the same GPU for the desktop and the OptiX program a limitation? Do I need to do something like ssh into my Fedora system and run cuda-gdb through the ssh session?

Hi @drwootton1,

I haven’t had a chance to try to reproduce this yet, but wanted to mention a couple of things anyway.

Using a remote debug setup will indeed make the lockup experience less painful, at some cost in convenience. If you have two machines or a spare GPU, another option is to run your display on a different GPU than your debug session.

I remember some short stalls when using cuda-gdb, but perhaps not as long as 5-10 seconds, and nobody on my team recalls seeing stalls that large; I will try to confirm or deny next week. My understanding is that when breaking and stepping, cuda-gdb copies GPU memory to the host so that it can display memory, registers, and instructions for all threads, so it was never particularly fast. The stall time may depend on what your program is doing and how much active memory there is.

One thing you can try is debugging an OptiX SDK sample to see whether the stall is similar. Another is running cuda-gdb on one of the CUDA SDK samples. These tests would at least tell you whether the stall is specific to your program and/or to OptiX programs in general.
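
For example, something along these lines with the optixTriangle sample (the breakpoint name matches the miss program in the sample's device code; this assumes the sample was built with debug flags, and the path is a placeholder):

```
$ cuda-gdb ./bin/optixTriangle
(cuda-gdb) break __miss__ms
(cuda-gdb) run
```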


David.

I tried the unmodified optixTriangle OptiX 7.4 sample and saw the same long pauses.
I then set up a VNC session from a laptop; the local display on the machine with the RTX 3060 still had long hangs where it locked up.

optixTriangle displayed its image in the VNC session, and the VNC session did not lock up. The cuda-gdb session ran quite slowly, and it took a while to reach the breakpoint I set in the miss program. So this does work a little better.

I get the same hang behavior running a CUDA program that does not use OptiX at all. I did not see this behavior when using cuda-gdb to debug OptiX programs before updating to CUDA 11.6.
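
The plain CUDA test was nothing special; roughly the following (names are made up), built with nvcc -G and run under cuda-gdb:

```cuda
// Trivial standalone CUDA test (no OptiX) that shows the same desktop
// stalls when run under cuda-gdb; names are made up for illustration.
#include <cassert>
#include <cstdio>

__global__ void busyKernel(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Enough per-thread work to keep the GPU visibly busy under the debugger.
        int acc = 0;
        for (int k = 0; k < 1 << 12; ++k)
            acc += (i ^ k);
        assert(acc != 1);   // breakpoint/assert target for cuda-gdb
        out[i] = acc;
    }
}

int main()
{
    const int n = 1 << 20;
    int *out = nullptr;
    cudaMalloc(&out, n * sizeof(int));
    busyKernel<<<(n + 255) / 256, 256>>>(out, n);
    cudaDeviceSynchronize();
    cudaFree(out);
    printf("done\n");
    return 0;
}
```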

So this might be due to the new cuda-gdb in CUDA 11.6, the new 510.39.01 driver, or possibly the updated Linux kernel (5.15.16-200.fc35.x86_64).

I also don’t seem to see this consistently with plain CUDA code: I thought I saw it yesterday, then it went away, and it came back today.