Call graph view in Nsight Compute

Is it possible to see kernel call graphs (which kernel calls another kernel) in Nsight Compute? I have seen some pictures on the web, but I cannot find any option for that in “nv-nsight-cu-cli --help”.
Any idea?

Are you referring to the “CUDA Task Graph Profiling” image on https://developer.nvidia.com/nsight-compute? That image shows the current CUDA graph being exported to SVG/Graphviz format. This is possible in the UI from the Resources tool window. It is not available in the CLI, because the CLI does not track graph state.

Yes, I mean that.
Do you mean that I can do a simple profile, e.g. “nv-nsight-cu-cli -o outputfile ./app”, then open outputfile in the Nsight Compute UI and go to the Resources tool window?

No, Resources tracking is only available in the Interactive Profile activity, which can be launched from the UI. In this activity, you can interactively step through your application between CUDA API calls and kernel launches, similar to a debugger. Whenever you are suspended, the Resources window shows the current state of all tracked resources, including CUDA graphs.

https://docs.nvidia.com/nsight-compute/NsightCompute/index.html#quick-start-interactive
https://docs.nvidia.com/nsight-compute/NsightCompute/index.html#tool-window-resources

I did that. Please see the picture below:
https://pasteboard.co/JpZ6p9L.png

After some steps, I exported an SVG file, but it is a very small file of about 500 bytes:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
 "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 2.39.20160612.1140 (20161225.0304)
 -->
<!-- Title: dot Pages: 1 -->
<svg width="8pt" height="8pt"
 viewBox="0.00 0.00 8.00 8.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 4)">
<title>dot</title>
<polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-4 4,-4 4,4 -4,4"/>
</g>
</svg>

Opening it in a browser shows a blank page. What else should I do?

Your application doesn’t appear to use CUDA graphs, at least not for the kernels you are showing. This is indicated by the “Graphs” view of the Resources window being empty and by your kernel not being launched from “cuGraphLaunch” or a similar API call. Instead, it appears to be a regular kernel launch.
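
The exported SVG points the same way. The Python below is a minimal sketch. It assumes the export follows Graphviz's usual SVG structure, where each node is drawn inside a “<g class="node">” group and each edge inside a “<g class="edge">” group, and it counts those groups. For the file posted above, both counts are zero: the file contains only the graph title and a blank background polygon.

```python
import xml.etree.ElementTree as ET

# Graphviz writes its SVG elements in the standard SVG namespace.
SVG_NS = "{http://www.w3.org/2000/svg}"


def count_graph_elements(svg_text: str) -> tuple[int, int]:
    """Return (nodes, edges) found in a Graphviz-generated SVG.

    Assumes the usual Graphviz layout, where each node is drawn inside
    <g class="node"> and each edge inside <g class="edge">.
    """
    root = ET.fromstring(svg_text)
    nodes = edges = 0
    for group in root.iter(f"{SVG_NS}g"):
        kind = group.get("class")
        if kind == "node":
            nodes += 1
        elif kind == "edge":
            edges += 1
    return nodes, edges
```

A result of (0, 0) only shows that the exported graph was empty. Whether the application uses graphs at all still has to be read from the API calls.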

For a CUDA graphs launch (https://developer.nvidia.com/blog/cuda-graphs/), the API Stream window would show something similar to:

12,cuGraphInstantiate_v2,,CUDA_SUCCESS(0),"(0x7fff1ea595c0{0x1159068}, 0x11541e8, 0x0, , 0)",,,,
13,cuGraphLaunch,,,"(0x1159068, 0x1140280)",,,,
14,MEMSET N:0->0[SRC] G:0->0[SRC],,cudaSuccess(0),"dst: 0x7f543bc00000, pitch: 0, val: 0, elementSize: 1, dim: 14 x 1",,,,
15,KERNEL N:1->1[SRC] G:0->0[SRC],hello_world,,"grid: 1 x 1 x 1, block: 1 x 1 x 1, sharedMemBytes: 0",,,,
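
Given rows from the API Stream window as comma-separated text, like the excerpt above, a quick scan for graph-related API names shows whether graphs are involved at all. The Python below is a minimal sketch. The list of name prefixes (driver and runtime graph and stream-capture calls) is an assumption for illustration, not an official list. Every field is checked, so the result does not depend on the exact column layout.

```python
import csv
import io

# Name prefixes taken as evidence of CUDA graph usage (driver and runtime API).
# This list is an assumption for illustration, not an exhaustive catalogue.
GRAPH_API_PREFIXES = (
    "cuGraph",
    "cudaGraph",
    "cuStreamBeginCapture",
    "cudaStreamBeginCapture",
)


def graph_api_calls(api_stream_text: str) -> list[str]:
    """Return every field in the pasted rows that names a graph-related call.

    All fields are checked, so the result does not depend on which column
    holds the API name.
    """
    found = []
    for row in csv.reader(io.StringIO(api_stream_text)):
        for field in row:
            name = field.strip()
            if name.startswith(GRAPH_API_PREFIXES):
                found.append(name)
    return found
```

For the four rows above, the function reports “cuGraphInstantiate_v2” and “cuGraphLaunch”. For a trace made only of ordinary kernel launches, it returns an empty list.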

OK. So that means my code must use a specific API for the graph view to show anything. In fact, I am using a benchmark, so I am not able to modify it.
At least I got the point now. Thank you.

Correct. CUDA graphs is a special API to make kernels and memcpys known to the driver as a pre-defined graph, which can then be executed multiple times for better performance. This is independent of traditional CUDA kernel launches.
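
The toy pure-Python model below makes the “pre-defined graph, executed multiple times” idea concrete. It is not the CUDA API. The class and method names are hypothetical and were chosen only to mirror the instantiate-then-launch pattern visible in the API Stream excerpt (cuGraphInstantiate followed by cuGraphLaunch). The dependency order is resolved once, when the graph is instantiated, and every launch then replays it.

```python
from graphlib import TopologicalSorter
from typing import Callable, Iterable


class ToyGraph:
    """Hypothetical stand-in for a CUDA graph: named work items plus dependencies."""

    def __init__(self) -> None:
        self._work: dict[str, Callable[[], None]] = {}
        self._deps: dict[str, set[str]] = {}

    def add_node(self, name: str, work: Callable[[], None],
                 depends_on: Iterable[str] = ()) -> None:
        self._work[name] = work
        self._deps[name] = set(depends_on)

    def instantiate(self) -> "ToyGraphExec":
        # Resolve the execution order once, analogous to instantiating a graph.
        unknown = set().union(*self._deps.values()) - set(self._work)
        if unknown:
            raise ValueError(f"unknown dependencies: {sorted(unknown)}")
        order = list(TopologicalSorter(self._deps).static_order())
        return ToyGraphExec(order, [self._work[n] for n in order])


class ToyGraphExec:
    """Executable form: replays the precomputed order on every launch."""

    def __init__(self, order: list[str], steps: list[Callable[[], None]]) -> None:
        self.order = order
        self._steps = steps

    def launch(self) -> None:
        for step in self._steps:
            step()
```

Building a two-node graph (a memset, then hello_world depending on it) and launching it twice runs memset, hello_world, memset, hello_world. With real CUDA graphs, the driver does this bookkeeping, not the application.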

As for your notion of “call graphs” (“which kernel calls another kernel”), I am not certain what you are referring to. Normally, kernels are launched from the CPU using e.g. <<<>>> or an API call, and they don’t call each other, except when using CUDA Dynamic Parallelism (https://developer.nvidia.com/blog/cuda-dynamic-parallelism-api-principles/). Showing individual child kernels from CDP launches is not supported; the complete tree is collapsed into one kernel when reported in Nsight Compute.

Maybe I should use a better term… I meant the kernel call order. For example, multiple kernels exist in different paths of an if…else, so I may see kernel1->kernel4->kernel6->kernel2->…

To see the ordered timeline of API calls and/or kernels in the UI or CLI, it’s best to use Nsight Systems: https://developer.nvidia.com/nsight-systems

@felix_dt
When I work with Nsight Systems, I cannot see the kernel order. As you can see in the picture (https://pasteboard.co/JxBcTv0.jpg), the timeline does show the kernel calls, but some of them are shown in parallel, maybe because of small time differences.

I think the serialized trace view you are looking for is only available on the command line via additional scripts shipped with Nsight Systems (you can collect the report from the UI or the CLI, but the script output will be on the command line).

The script you are looking for should be https://docs.nvidia.com/nsight-systems/UserGuide/index.html#gputrace
More info can also be found in https://docs.nvidia.com/nsight-systems/UserGuide/index.html#existing-report-scripts
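
Once the gputrace report is available as CSV, turning it into a kernel1->kernel4->… sequence is mostly a sort by start timestamp. The Python below is a minimal sketch. The header names “Start (ns)” and “Name”, and the “[CUDA” prefix assumed for memcpy/memset rows, may not match every Nsight Systems version, so the column names are parameters and the prefix is marked in the code.

```python
import csv
import io


def kernel_sequence(csv_text: str, start_col: str = "Start (ns)",
                    name_col: str = "Name", kernels_only: bool = True) -> list[str]:
    """Order GPU trace rows by start time and return their names.

    start_col and name_col are assumed column headers; adjust them to match
    the CSV your Nsight Systems version actually writes.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # sort() is stable, so rows with equal start times keep their report order.
    rows.sort(key=lambda row: float(row[start_col]))
    names = [row[name_col] for row in rows]
    if kernels_only:
        # Assumption: memcpy/memset rows are labeled like "[CUDA memcpy HtoD]".
        names = [n for n in names if not n.startswith("[CUDA")]
    return names
```

Joining the result with "->" gives the kernel1->kernel4->… form. Kernels that genuinely overlap (for example, on different streams) are still listed by start time. That is what the serialized view means; it does not imply that they ran one after another.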

Right. That is what I am looking for.