Debugging Dynamic Parallelism and preemption mode

I’m trying to write some code that uses dynamic parallelism. The only settings I’ve changed are: “Generate Relocatable Device Code” set to “Yes”, “Code Generation” set to “compute_35,sm_35”, and “cudadevrt.lib” added as an additional dependency. All of these were necessary just to get code with a device-side kernel launch to build.
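For reference, a minimal sketch of the kind of code those three settings are needed for. The file name and kernel names are hypothetical, and the nvcc line in the comment is the command-line equivalent of the project settings as far as I know (newer toolkits may no longer accept sm_35, in which case a different -arch value would be needed).

```cuda
// cdp_minimal.cu (hypothetical file name)
// Command-line equivalent of the three project settings:
//   nvcc -arch=sm_35 -rdc=true cdp_minimal.cu -o cdp_minimal -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel, launched from the device.
__global__ void childKernel(int parentThread)
{
    printf("child block %d launched by parent thread %d\n",
           blockIdx.x, parentThread);
}

// Parent kernel: each thread performs a device-side launch. This is the
// part that requires relocatable device code and linking cudadevrt.
__global__ void parentKernel()
{
    childKernel<<<2, 1>>>(threadIdx.x);
}

int main()
{
    parentKernel<<<1, 2>>>();

    // Catch launch-time errors first, then wait for parent and children.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Without -rdc=true (or without cudadevrt) the device-side launch in parentKernel fails to build, which matches what the settings above were working around.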

Unfortunately, this seems to break the Nsight debugger. I now get the following message from Nsight:

CUDA Dynamic Parallelism debugging is not supported in preemption mode. Breakpoints will be disabled

I’m not sure what exactly this means. I tried with Aero disabled and got the same result. I was under the impression that debugging was supposed to still work with dynamic parallelism. Is this not the case? Is there some other setting to change to get both dynamic parallelism and Nsight debugging to work?

Based on another thread describing how debugging works, I think the problem here is that you are trying to debug the kernel on the same GPU that is running your display. This used to be impossible, but NVIDIA found a way to allow some level of preemption of the GPU so that it can halt your program in the debugger and still redraw the display. It sounds like this trick (since the card doesn’t really support preemption the way a CPU does) doesn’t yet work for dynamic parallelism.

The most direct solution, I think, is to get a second cheap GPU to use exclusively for your display, but hopefully someone knowledgeable about CUDA on Windows will chime in here…

Nsight Visual Studio Edition single GPU debugging mode (“preemption mode”) is not yet supported for applications using CUDA Dynamic Parallelism. Nsight VSE can debug CDP applications using remote debugging or local debugging where the CDP capable device is headless or configured to use the Tesla Compute Cluster (TCC) driver.

Hey Greg, thanks for the response. Just to clarify: can a second GPU (not the same as the CUDA GPU) be added to the local machine to drive the display and allow for debugging of dynamic parallelism?


You can use any GPU to drive the display. The CC 3.5 device must be headless or configured to use the TCC driver.
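To see which device in a machine meets the condition above, one could query the CUDA runtime. This is only a sketch: it reports compute capability and whether the device is using the TCC driver. Whether a device is headless is not something these attributes expose, so that still has to be checked by hand.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        int major = 0, minor = 0, tcc = 0;

        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess ||
            cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, dev) != cudaSuccess ||
            cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, dev) != cudaSuccess ||
            cudaDeviceGetAttribute(&tcc, cudaDevAttrTccDriver, dev) != cudaSuccess) {
            fprintf(stderr, "could not query device %d\n", dev);
            continue;
        }

        // Dynamic parallelism requires compute capability 3.5 or higher.
        bool cdpCapable = (major > 3) || (major == 3 && minor >= 5);

        printf("device %d: %s, CC %d.%d, CDP capable: %s, TCC driver: %s\n",
               dev, prop.name, major, minor,
               cdpCapable ? "yes" : "no",
               tcc ? "yes" : "no");
    }
    return 0;
}
```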


Any idea when/if debugging dynamic parallelism will be supported in “preemption mode”? The Nsight page brags about debugging dynamic parallelism with absolutely no caveats.

Thanks for the hint Greg,

I have the setup mentioned by randallr: a headless Titan (no monitor hooked up to it), with my screen driven by a small AMD Radeon video card. And yet, when trying to debug a Dynamic Parallelism example, I still get the error.

I ensured I have the latest display driver (not the Tesla one). Do I understand correctly that I should be able to debug Dynamic Parallelism examples?

Also, when my computation takes more than a few seconds, Windows (7) restarts my video driver. From what I understand, this is expected Windows behavior. Is that correct?
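One way to check whether a given device is subject to a run-time limit on kernels is to read the kernel execution timeout attribute from the CUDA runtime. A small sketch, assuming device 0 is the one running the long computation; on Windows I would expect this limit to correspond to the driver-restart behavior described here, but treat that mapping as an assumption.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int dev = 0;  // assumption: device 0 runs the long kernel
    int timeoutEnabled = 0;

    cudaError_t err = cudaDeviceGetAttribute(
        &timeoutEnabled, cudaDevAttrKernelExecTimeout, dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetAttribute failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // Nonzero means the driver enforces a run-time limit on kernels
    // for this device.
    printf("device %d: kernel run-time limit %s\n",
           dev, timeoutEnabled ? "enabled" : "disabled");
    return 0;
}
```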

Do I need to install the TCC driver to be able to debug and code against the Titan in this setup?



Hi Anton, I have the same setup and the same issue. Were you able to resolve yours?

Kind regards,

To resolve the issue of Windows 7 restarting the video driver, you need to disable TDR. See:

Also, if you’re attempting to debug code that uses dynamic parallelism, make sure you’re using the latest CUDA 5.5 and the Parallel Nsight that comes with it. There’s no concept of a TCC driver for anything other than a K20 or K20X, so if you have a Titan or a GK208-based NVIDIA card that supports CC 3.5 and the same preemption error is still present, you might need a different (NVIDIA) GPU to drive the display.

At one point I had a mix of an ATI GPU (driving the display) and an NVIDIA GPU (for CUDA), and with one specific piece of software at the time (Jacket) they did not play nicely together, so the mix itself might be the issue.