I am trying to profile an application that makes use of a CUDA Graph with self tail launch (i.e. I have a persistent graph that loops forever).
I instantiate the graph with cudaGraphInstantiate(&graph_exec, run_graph, cudaGraphInstantiateFlagDeviceLaunch);
I then relaunch it from inside a kernel:
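__global__ void looper(executor_data *data) {
    // Handle of the graph this kernel is currently running in (null if not launched as part of a graph).
    auto g = cudaGetCurrentGraphExec();
    if (g) {
        // Tail launch: re-enqueue the same graph to run once the current launch completes.
        cudaError_t ret = cudaGraphLaunch(g, cudaStreamGraphTailLaunch);
    }
}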
When I run my application under the profiler, it never gets past the cudaGraphInstantiate
call, and there is no clear error output.
It works fine when run as a standalone application.
Is this supposed to work, or is it a limitation of Nsight Systems?