Nsight profile and cudaGraphInstantiateFlagDeviceLaunch

I am trying to profile an application that makes use of a CUDA graph with self tail launch (i.e. a persistent graph that relaunches itself and loops forever).

I instantiate the graph with cudaGraphInstantiate(&graph_exec, run_graph, cudaGraphInstantiateFlagDeviceLaunch);
And then relaunch it inside a kernel:

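__global__ void looper(executor_data *data) {
    // Handle of the executable graph this kernel is currently running in
    cudaGraphExec_t g = cudaGetCurrentGraphExec();
    if (g) {
        // Enqueue the same graph to run again once the current launch completes
        cudaError_t ret = cudaGraphLaunch(g, cudaStreamGraphTailLaunch);
    }
}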
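
For reference, the pattern described above can be reduced to a minimal sketch like the one below. It is an illustration under stated assumptions, not the original application: the `run_count` counter, the `max_relaunches` bound, the `CHECK` macro, and the `run_self_relaunching_graph` helper are all hypothetical names added so the loop terminates and can be inspected. It also assumes CUDA 12 or later (for the three-argument `cudaGraphInstantiate`) and that the file is built with relocatable device code (for example `nvcc -rdc=true`), which is an assumption about the build rather than something stated in this thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: bail out of the enclosing function with -1 on any CUDA error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            return -1;                                                \
        }                                                             \
    } while (0)

// Hypothetical counter of how many times the kernel has executed.
__device__ int run_count = 0;

__global__ void looper(int max_relaunches) {
    // Handle of the executable graph this kernel is currently running in.
    cudaGraphExec_t g = cudaGetCurrentGraphExec();
    if (g && run_count < max_relaunches) {
        // Tail launch: the graph runs again after the current launch completes.
        cudaGraphLaunch(g, cudaStreamGraphTailLaunch);
    }
    run_count++;
}

// Builds a one-kernel graph, launches it once from the host, and returns
// how many times the kernel ran (or -1 on error).
int run_self_relaunching_graph(int max_relaunches) {
    cudaStream_t stream;
    cudaGraph_t graph;
    cudaGraphExec_t graph_exec;
    int zero = 0;
    int runs = -1;

    CHECK(cudaMemcpyToSymbol(run_count, &zero, sizeof(int)));
    CHECK(cudaStreamCreate(&stream));

    // Record the kernel launch into a graph via stream capture.
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    looper<<<1, 1, 0, stream>>>(max_relaunches);
    CHECK(cudaStreamEndCapture(stream, &graph));

    // The device-launch flag is what allows cudaGraphLaunch from device code.
    CHECK(cudaGraphInstantiate(&graph_exec, graph,
                               cudaGraphInstantiateFlagDeviceLaunch));

    // Make the graph resident on the device before the first launch
    // (a host-side launch would also upload it implicitly).
    CHECK(cudaGraphUpload(graph_exec, stream));
    CHECK(cudaGraphLaunch(graph_exec, stream));
    CHECK(cudaStreamSynchronize(stream));

    CHECK(cudaMemcpyFromSymbol(&runs, run_count, sizeof(int)));

    CHECK(cudaGraphExecDestroy(graph_exec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    return runs;
}
```

Calling `run_self_relaunching_graph(5)` from `main` should run the kernel six times: one host launch plus five tail relaunches. A bounded loop like this makes a small reproducer to try both standalone and under the profiler.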

When I run my application under the profiler, it never gets past the cudaGraphInstantiate call, and there is no clear error output.
This works fine when run as a standalone application.

Is it supposed to work or is it a limitation of Nsight Systems?

Unfortunately, you have hit a deficiency in CUPTI, which is what Nsys uses under the covers to get CUDA graph information. CUPTI currently cannot handle graphs launched from the GPU side. The CUPTI team is working on a fix, and we will update the CUPTI that ships with Nsys as soon as it is available.
