Call graph view in Nsight Compute

mahmood.nt · September 7, 2020, 12:49pm

Is it possible to see kernel call graphs (which kernel calls another kernel) in Nsight Compute? I have seen some pictures on the web, but I cannot find any option for that in “nv-nsight-cu-cli --help”.
Any ideas?

felix_dt · September 7, 2020, 12:57pm

Are you referring to the “CUDA Task Graph Profiling” image on https://developer.nvidia.com/nsight-compute? That image shows the feature for exporting the current CUDA graph to SVG/Graphviz format. This is possible in the UI from the Resources tool window. It’s not available in the CLI, because the CLI doesn’t track the graph state.

mahmood.nt · September 7, 2020, 1:04pm

Yes, I mean that.
Do you mean that I can do a simple profiling run, e.g. “nv-nsight-cu-cli -o outputfile ./app”, then open outputfile in the Nsight Compute UI and go to the Resources tool window?

felix_dt · September 7, 2020, 1:08pm

No, Resources tracking is only available in the Interactive Profile activity, which can be launched from the UI. In this activity, you can interactively step through your application between CUDA API calls and kernel launches, similar to a debugger. Whenever you are suspended, the Resources window shows the current state of all tracked resources, including CUDA graphs.

mahmood.nt · September 7, 2020, 1:47pm

I did that. Please see the picture below:
https://pasteboard.co/JpZ6p9L.png

After some steps, I exported an SVG file, but it is a very small file of about 500 bytes.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
 "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 2.39.20160612.1140 (20161225.0304)
 -->
<!-- Title: dot Pages: 1 -->
<svg width="8pt" height="8pt"
 viewBox="0.00 0.00 8.00 8.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 4)">
<title>dot</title>
<polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-4 4,-4 4,4 -4,4"/>
</g>
</svg>

Opening that in a browser shows a blank page. What else should I do?

felix_dt · September 7, 2020, 1:59pm

Your application doesn’t appear to use CUDA graphs, at least not for the kernels you are showing. This is indicated by the “Graphs” resources window being empty and by your kernel not being launched from a “cuGraphLaunch” or similar API call. Instead, it appears to be a regular kernel launch.

For a CUDA graphs launch (https://developer.nvidia.com/blog/cuda-graphs/), the API Stream window would show something similar to:

,12,cuGraphInstantiate_v2,,CUDA_SUCCESS(0),"(0x7fff1ea595c0{0x1159068}, 0x11541e8, 0x0, , 0)",,,,
13,cuGraphLaunch,,,"(0x1159068, 0x1140280)",,,,
14,MEMSET N:0->0[SRC] G:0->0[SRC],,cudaSuccess(0),"dst: 0x7f543bc00000, pitch: 0, val: 0, elementSize: 1, dim: 14 x 1",,,,
15,KERNEL N:1->1[SRC] G:0->0[SRC],hello_world,,"grid: 1 x 1 x 1, block: 1 x 1 x 1, sharedMemBytes: 0",,,,
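For comparison, here is a minimal sketch of host code that could produce a trace like the one above, built with the CUDA runtime's explicit graph API (CUDA 11.4 or newer is assumed for cudaGraphInstantiateWithFlags). The kernel name hello_world, the 14 x 1 one-byte memset, and the 1 x 1 x 1 launch are read off the trace; the string the kernel writes and the helper runHelloGraph are hypothetical. The trace lists driver API names (cuGraphInstantiate_v2, cuGraphLaunch), while this sketch uses the equivalent runtime calls, which is an assumption about how the original sample was written.

```cuda
#include <cstdio>
#include <cstdlib>
#include <string>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    std::fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(e_)); \
    std::exit(1); } } while (0)

// Hypothetical kernel named after the trace: writes a 14-byte greeting.
__global__ void hello_world(char* buf) {
    const char msg[] = "Hello, world!";      // 13 characters + '\0' = 14 bytes
    for (int i = 0; i < 14; ++i) buf[i] = msg[i];
}

std::string runHelloGraph() {
    const size_t n = 14;
    char* d_buf = nullptr;
    CHECK(cudaMalloc(&d_buf, n));

    cudaGraph_t graph;
    CHECK(cudaGraphCreate(&graph, 0));

    // Node 0: memset of 14 one-byte elements (cf. "dim: 14 x 1, elementSize: 1").
    cudaMemsetParams mp = {};
    mp.dst = d_buf;
    mp.pitch = 0;
    mp.value = 0;
    mp.elementSize = 1;
    mp.width = n;
    mp.height = 1;
    cudaGraphNode_t memsetNode;
    CHECK(cudaGraphAddMemsetNode(&memsetNode, graph, nullptr, 0, &mp));

    // Node 1: the kernel, with the memset node as its only dependency.
    void* args[] = { &d_buf };
    cudaKernelNodeParams kp = {};
    kp.func = (void*)hello_world;
    kp.gridDim = dim3(1);
    kp.blockDim = dim3(1);
    kp.sharedMemBytes = 0;
    kp.kernelParams = args;
    kp.extra = nullptr;
    cudaGraphNode_t kernelNode;
    CHECK(cudaGraphAddKernelNode(&kernelNode, graph, &memsetNode, 1, &kp));

    // Instantiate the graph, then launch it like a single unit of work.
    cudaGraphExec_t exec;
    CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));
    CHECK(cudaGraphLaunch(exec, stream));
    CHECK(cudaStreamSynchronize(stream));

    char host[14];
    CHECK(cudaMemcpy(host, d_buf, n, cudaMemcpyDeviceToHost));

    CHECK(cudaGraphExecDestroy(exec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_buf));
    return std::string(host);                 // host[13] is the terminating '\0'
}

int main() {
    std::printf("%s\n", runHelloGraph().c_str());
    return 0;
}
```

An application written this way has a graph that the Resources window can track and export; an application that only uses <<<>>> launches has none.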

mahmood.nt · September 7, 2020, 2:03pm

OK. So that means my code must use an API that makes the graph visible to the tool. In fact, I am using a benchmark, so I cannot modify it.
At least I get the point now. Thank you.

felix_dt · September 7, 2020, 2:10pm

Correct. CUDA graphs is a special API to make kernels and memcpys known to the driver as a pre-defined graph, which can then be executed multiple times for better performance. This is independent of traditional CUDA kernel launches.
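To illustrate the "executed multiple times" part, here is a minimal sketch using stream capture, one common way to turn ordinary kernel launches into a CUDA graph. The kernel addOne and the helper runCapturedGraph are hypothetical, and CUDA 11.4 or newer is assumed for cudaGraphInstantiateWithFlags. Two launches are recorded once, and the resulting executable graph is replayed launches times, so the counter should end at 2 x launches.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    std::fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(e_)); \
    std::exit(1); } } while (0)

// Hypothetical kernel: a single thread increments a counter.
__global__ void addOne(int* counter) {
    *counter += 1;
}

int runCapturedGraph(int launches) {
    int* d_counter = nullptr;
    CHECK(cudaMalloc(&d_counter, sizeof(int)));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));          // capture is not allowed on the legacy NULL stream
    CHECK(cudaMemsetAsync(d_counter, 0, sizeof(int), stream));

    // While capturing, launches are recorded into a graph instead of executing.
    cudaGraph_t graph;
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    addOne<<<1, 1, 0, stream>>>(d_counter);
    addOne<<<1, 1, 0, stream>>>(d_counter);
    CHECK(cudaStreamEndCapture(stream, &graph));

    // Instantiate once, then replay the same executable graph many times.
    cudaGraphExec_t exec;
    CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));
    for (int i = 0; i < launches; ++i) {
        CHECK(cudaGraphLaunch(exec, stream));
    }
    CHECK(cudaStreamSynchronize(stream));

    int result = 0;
    CHECK(cudaMemcpy(&result, d_counter, sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaGraphExecDestroy(exec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_counter));
    return result;
}

int main() {
    std::printf("counter = %d\n", runCapturedGraph(10));   // 2 kernels per graph x 10 launches
    return 0;
}
```

The performance benefit comes from paying the setup cost (capture and instantiation) once and then submitting the whole graph with a single cudaGraphLaunch per iteration.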

As for your notion of “call graphs” (“which kernel calls another kernel”), I am not certain what you are referring to. Normally, kernels are launched from the CPU using e.g. <<<>>> or an API call, and they don’t call each other, except when using CUDA Dynamic Parallelism (https://developer.nvidia.com/blog/cuda-dynamic-parallelism-api-principles/). Showing individual child kernels from CDP launches is not supported; the complete tree is collapsed into one kernel when reported in Nsight Compute.
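For reference, a minimal sketch of what Dynamic Parallelism looks like, with hypothetical kernels parent and child: thread 0 of each parent block launches one child grid from the device. It has to be built with relocatable device code, e.g. `nvcc -rdc=true cdp.cu -lcudadevrt`, on a GPU that supports CDP. Per the reply above, Nsight Compute (at the time of this thread) would report this as a single parent kernel rather than one entry per child launch. The sketch deliberately avoids device-side cudaDeviceSynchronize, which is deprecated in recent CUDA versions; the host-side synchronize is enough because a parent grid only completes after all of its child grids have completed.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    std::fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(e_)); \
    std::exit(1); } } while (0)

// Child grid: each thread records the index of the parent block that launched it.
__global__ void child(int* out, int parentBlock) {
    out[parentBlock * blockDim.x + threadIdx.x] = parentBlock;
}

// Parent grid: thread 0 of every block launches one child grid from the device.
__global__ void parent(int* out, int childThreads) {
    if (threadIdx.x == 0) {
        child<<<1, childThreads>>>(out, blockIdx.x);
    }
}

// Hypothetical helper: runs the parent/child tree and returns the sum of out[].
int runCdp(int parentBlocks, int childThreads) {
    const int n = parentBlocks * childThreads;
    int* d_out = nullptr;
    CHECK(cudaMalloc(&d_out, n * sizeof(int)));
    CHECK(cudaMemset(d_out, 0, n * sizeof(int)));

    parent<<<parentBlocks, 32>>>(d_out, childThreads);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());           // waits for the parent and all of its children

    std::vector<int> host(n);
    CHECK(cudaMemcpy(host.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_out));

    int sum = 0;
    for (int v : host) sum += v;
    return sum;
}

int main() {
    // 4 parent blocks x 8 child threads: 8 * (0 + 1 + 2 + 3) = 48.
    std::printf("sum = %d\n", runCdp(4, 8));
    return 0;
}
```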

mahmood.nt · September 7, 2020, 2:31pm

Maybe I should use a better word… I meant the order of kernel calls. For example, multiple kernels exist in different paths of if…else branches, so I might see kernel1->kernel4->kernel6->kernel2->…
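A sketch of why such an order is a run-time property rather than a static graph: with hypothetical kernels named after the example above, which kernels run is decided by host control flow, and kernels launched into one stream execute in the order they are issued. Each kernel appends its ID to a device-side log purely to make that order observable here; this is an illustration under those assumptions, not something the Nsight tools require, and a benchmark that cannot be modified would need a tracing tool instead.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    std::fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(e_)); \
    std::exit(1); } } while (0)

// One distinct kernel per ID (kernel<1>, kernel<4>, ...); each appends its ID to a log.
template <int ID>
__global__ void kernel(int* log, int* pos) {
    log[atomicAdd(pos, 1)] = ID;
}

// Hypothetical host code: the branch taken decides which kernels are launched.
std::vector<int> runPath(bool condition) {
    int* d_log = nullptr;
    int* d_pos = nullptr;
    CHECK(cudaMalloc(&d_log, 16 * sizeof(int)));
    CHECK(cudaMalloc(&d_pos, sizeof(int)));
    CHECK(cudaMemset(d_pos, 0, sizeof(int)));

    // All launches go to the same (default) stream, so they run in issue order.
    kernel<1><<<1, 1>>>(d_log, d_pos);
    if (condition) {
        kernel<4><<<1, 1>>>(d_log, d_pos);
        kernel<6><<<1, 1>>>(d_log, d_pos);
    } else {
        kernel<3><<<1, 1>>>(d_log, d_pos);
    }
    kernel<2><<<1, 1>>>(d_log, d_pos);
    CHECK(cudaGetLastError());

    // cudaMemcpy on the default stream waits for the kernels issued before it.
    int count = 0;
    CHECK(cudaMemcpy(&count, d_pos, sizeof(int), cudaMemcpyDeviceToHost));
    std::vector<int> order(count);
    CHECK(cudaMemcpy(order.data(), d_log, count * sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_log));
    CHECK(cudaFree(d_pos));
    return order;
}

int main() {
    for (bool c : {true, false}) {
        std::printf("condition=%d:", c);
        for (int id : runPath(c)) std::printf(" kernel%d", id);
        std::printf("\n");
    }
    return 0;
}
```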

felix_dt · September 7, 2020, 2:35pm

To see the ordered timeline of API calls and/or kernels, in the UI or the CLI, it’s best to use Nsight Systems (NVIDIA Nsight Systems | NVIDIA Developer).

mahmood.nt · October 27, 2020, 3:12pm

@felix_dt
When I use Nsight Systems, I cannot see the kernel order. As you can see in the picture (Pasteboard - Uploaded Image), although the timeline shows the kernel calls, some of them are shown in parallel, maybe because of small time differences.

felix_dt · October 28, 2020, 9:32am

I think the serialized trace view you are looking for is only available on the command line via additional scripts shipped with Nsight Systems (you can collect the report from the UI or the CLI, but the script output will be on the command line).

The script you are looking for should be User Guide :: Nsight Systems Documentation
More info can also be found in User Guide :: Nsight Systems Documentation

mahmood.nt · October 28, 2020, 2:19pm

Right. That is what I am looking for.
