Hi everyone,
I’m working with CUPTI (CUDA Profiling Tools Interface) and have run into a challenge with kernel profiling. With the Activity API, I can retrieve detailed information about profiled kernels, including their names and start/end timestamps. However, when I use the Profiling API, the kernels are identified by unique IDs instead of names.
My goal is to correlate the performance counters obtained from the Profiling API with the kernel names from the Activity API.
Is there a recommended way to combine these two pieces of information? Specifically, can I reliably map the unique kernel IDs from the Profiling API back to the names provided by the Activity API? Any insights or examples on how to achieve this would be greatly appreciated!
Thanks in advance for your help!
CUPTI profiling works at the context level, so in a multi-context application, enabling profiling for a single context profiles only the kernels launched in that context. For example:
kernelA<<<...>>>() <- ctx1
kernelB<<<...>>>() <- ctx1
kernelC<<<...>>>() <- ctx2
kernelD<<<...>>>() <- ctx1
When profiling is enabled for ctx1, we get profiling data for kernelA (range index 0), kernelB (range index 1), and kernelD (range index 2). kernelC is not profiled because it is launched in ctx2.
To correlate the range data with kernels, you can use the approach from the callback_profiling sample shipped in the CUPTI package (see the ProfilingCallbackHandler function). It uses the CUPTI Callback API to capture the kernel launch sequence, each kernel's name, and the context it is launched in. Maintain a table of that launch sequence, and you can then map each range index to its kernel launch.
Thank you for your response. I am now trying to combine the Activity API with the Profiling API to collect both activity data and low-level performance counters. Following the pattern in the callback_profiling sample, I enable profiling for each context when it is created, using the Callback API. I also enable some activity collection, such as concurrent kernel collection, by calling cuptiActivityEnable(CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL).
The problem is that after I call cuptiProfilerEnableProfiling(&enableProfilingParams), the Activity API seems to stop working. I tried calling cuptiActivityFlushAll to retrieve the records in the buffer, but nothing happens (the registered callback function is not invoked). However, if I don’t enable profiling with cuptiProfilerEnableProfiling(&enableProfilingParams), the activity buffer works as expected.
I am wondering if the Activity API and Profiling API are not designed to work together, or if there might be a bug in my code. Thanks in advance!