CUPTI Tracing CUDA Graph Problems

I'm currently developing a profiler based on the CUPTI Callback API and the CUPTI Range Profiling API. My TensorRT engine file includes a CUDA graph. Is there any way to track the engine's CUDA graph data using these two APIs?

Hi,
CUPTI does support graph profiling, though with a few limitations:

  • Only kernel nodes are profiled.
  • Conditional nodes within the graph are skipped.
  • Nodes that launch device graphs are not supported.
  • Graph profiling is restricted to kernel replay mode; application replay and user replay modes are not supported.
  • To enable graph node profiling, CUPTI breaks the graph chaining.

If you’re comfortable with aggregated profiling data for the entire graph, you can use user range mode. In this mode, you subscribe to the graph launch APIs through the Callback API, call cuptiRangeProfilerPushRange() in the entry callback, and call cuptiRangeProfilerPopRange() in the exit callback, so that the whole graph launch falls inside one range.
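To make the entry/exit pattern concrete, here is a minimal sketch of the control flow only. It does not call CUPTI: `RangeRecorder`, `on_api_callback`, and `simulate_api_call` are hypothetical stand-ins, and the API name is compared as a plain string for illustration. In a real tool the callback would be registered through the CUPTI Callback API, and the range profiler push and pop calls would go where `push()` and `pop()` are called.

```cpp
#include <string>
#include <vector>

// Stand-in for CUPTI's callback site (CUPTI_API_ENTER / CUPTI_API_EXIT).
enum class CallbackSite { Enter, Exit };

// Hypothetical recorder standing in for the range profiler. A real tool would
// issue the CUPTI range push/pop calls where push() and pop() are called.
struct RangeRecorder {
    std::vector<std::string> open;    // stack of currently open ranges
    std::vector<std::string> events;  // ordered log of what happened

    void push(const std::string& name) {
        open.push_back(name);
        events.push_back("push:" + name);
    }

    void pop() {
        if (open.empty()) return;  // ignore an unmatched exit callback
        events.push_back("pop:" + open.back());
        open.pop_back();
    }
};

// Callback body: invoked once before and once after each intercepted API call.
// Only graph launches open a range; every other API is ignored.
void on_api_callback(RangeRecorder& rec, const std::string& api_name,
                     CallbackSite site) {
    if (api_name != "cudaGraphLaunch") return;
    if (site == CallbackSite::Enter) {
        rec.push("graph_launch");
    } else {
        rec.pop();
    }
}

// Simulates the runtime invoking the callback around a single API call.
void simulate_api_call(RangeRecorder& rec, const std::string& api_name) {
    on_api_callback(rec, api_name, CallbackSite::Enter);
    rec.events.push_back("run:" + api_name);
    on_api_callback(rec, api_name, CallbackSite::Exit);
}
```

Calling `simulate_api_call(rec, "cudaGraphLaunch")` on a fresh `RangeRecorder` logs a push, the launch, and a pop in that order, which is the bracketing that yields one aggregated range per graph launch. Other API calls pass through without opening a range.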

Please note that Range Profiling APIs only report kernel-level profiling data. If you need additional graph-related metrics, consider using the CUPTI Activity APIs.
