Skip GPU routines in Nsight debugging

I am writing software that runs several GPU kernels one after another, as follows:

kernel1<<<blocks, threads>>>();
kernel2<<<blocks, threads>>>();
kernel3<<<blocks, threads>>>();
kernel4<<<blocks, threads>>>();

I would like to use CUDA debugging on kernel3. However, when I start the debugger, it takes a very long time to finish running kernel1 and kernel2 before it reaches kernel3.

Is there any way to make the debugger skip ‘looking into the details’ of kernel1 and kernel2, so that it reaches kernel3 sooner?


Hi Simon Tong,
If you set a breakpoint only in kernel3, the debugger should stop only at kernel3. But if kernel3 depends on the results of kernel1 and kernel2, you may need to save a snapshot of those results once and then load it directly through a stub function:

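// Stage is assumed to be a std::string, so == compares contents.
if (Stage == "Stage1") {
    // One-off run: execute the slow kernels; snapshot their results afterwards.
    kernel1<<<blocks, threads>>>();
    kernel2<<<blocks, threads>>>();
} else if (Stage == "Stage2") {
    // Debug run: load the saved results instead of re-running kernel1/kernel2.
    load_result();
    kernel3<<<blocks, threads>>>();
    kernel4<<<blocks, threads>>>();
} else {
    // ...
}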


Thanks Victor,

In other words, we can handle this in code: skip kernel1 and kernel2 when debugging kernel3, and use the saved output of kernel1 and kernel2 instead.

It depends on your requirements. You can also use a conditional breakpoint or a tracepoint to avoid stopping in kernel1 and kernel2, but neither of these shortens the time it takes to run kernel1 and kernel2 to completion.
So handling it in code is the better option in your case.
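
For anyone following the snapshot idea from this thread, here is a minimal sketch of what the save and load steps might look like. It assumes the output of kernel1/kernel2 lives in a single device buffer of floats; the helper names save_snapshot and load_snapshot, and the file name in the usage note, are hypothetical and not part of CUDA or Nsight. Only cudaMemcpy and the C standard I/O calls are real library functions.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: copy n floats from a device buffer to the host
// and write them to a binary file. Returns true on success.
bool save_snapshot(const char* path, const float* d_buf, size_t n) {
    std::vector<float> host(n);
    // cudaMemcpy on the default stream waits for earlier kernels
    // launched on that stream to finish.
    if (cudaMemcpy(host.data(), d_buf, n * sizeof(float),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        return false;
    }
    FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    size_t written = std::fwrite(host.data(), sizeof(float), n, f);
    std::fclose(f);
    return written == n;
}

// Hypothetical helper: read n floats from a binary file and copy them
// into an already allocated device buffer. Returns true on success.
bool load_snapshot(const char* path, float* d_buf, size_t n) {
    std::vector<float> host(n);
    FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    size_t got = std::fread(host.data(), sizeof(float), n, f);
    std::fclose(f);
    if (got != n) return false;
    return cudaMemcpy(d_buf, host.data(), n * sizeof(float),
                      cudaMemcpyHostToDevice) == cudaSuccess;
}
```

In the one-off run you would call something like save_snapshot("kernel2_out.bin", d_data, n) after kernel2; in debug runs you would call load_snapshot("kernel2_out.bin", d_data, n) before launching kernel3. If the kernels share several buffers, each one would need its own snapshot.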