I understand that with the newer Turing cards (e.g. Tesla T4, RTX 2060/70/80), the Event/Metric APIs are being deprecated in favor of the Profiling API. As a result, the event_sampling CUPTI sample program no longer works; if you attempt to run it, you get an error message like this:
$ ./event_sampling
...
event_sampling.cu:86:Error CUPTI_ERROR_LEGACY_PROFILER_NOT_SUPPORTED for CUPTI API function 'cuptiSetEventCollectionMode'.
From what I can see, there are two new CUPTI sample programs that show how to do range-based profiling (including autorange_profiling). However, range profiling does not cover the same use case as event_sampling. In particular, event_sampling showcases the following features that I'm interested in and that I don't believe the Profiling API provides:
- Time-periodic sampling: GPU hardware counters are sampled at a fixed time interval (e.g. every 50 milliseconds), NOT after each kernel invocation, and the counter values aggregate across multiple kernel runs.
- No serialization of kernels: Concurrently executing kernels are not serialized while hardware counters are being sampled. This is provided by the call to cuptiSetEventCollectionMode(context, CUPTI_EVENT_COLLECTION_MODE_CONTINUOUS), as documented in section 1.5 (CUPTI Event API) of the CUPTI documentation.
- Transparent to profiled code: A separate thread can enable hardware event counter sampling without adding Begin()/End() annotations around the measured code region, as the Profiling API's range profiling feature requires.
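To make the requested behavior concrete, here is a minimal, hypothetical C++ sketch of the time-periodic, annotation-free sampling pattern described above. It is not CUPTI code: `PeriodicSampler` and its `Reader` callback are names invented for this illustration, and the reader is only a stand-in for whatever actually reads the hardware counter (with the legacy Event API, that would be a call such as `cuptiEventGroupReadEvent` after enabling continuous collection mode). It shows a background thread sampling on a fixed wall-clock schedule while the measured code stays untouched.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper (not a CUPTI API): samples a cumulative counter on a
// fixed wall-clock schedule from a background thread.
class PeriodicSampler {
public:
    // Hypothetical reader: returns the current (cumulative) counter value.
    using Reader = std::function<std::uint64_t()>;

    PeriodicSampler(Reader reader, std::chrono::milliseconds period)
        : reader_(std::move(reader)), period_(period) {}

    ~PeriodicSampler() { stop(); }

    // Launch the sampling thread; the measured code is never annotated.
    void start() {
        assert(!worker_.joinable() && "start() called twice");
        running_ = true;
        worker_ = std::thread([this] {
            auto next = std::chrono::steady_clock::now();
            while (running_) {
                std::uint64_t value = reader_();  // one sample per period
                {
                    std::lock_guard<std::mutex> lock(mutex_);
                    samples_.push_back(value);
                }
                next += period_;                      // time-based, not per kernel
                std::this_thread::sleep_until(next);  // avoids drift from read time
            }
        });
    }

    // Stop sampling; may wait up to one period for the thread to wake up.
    void stop() {
        running_ = false;
        if (worker_.joinable()) worker_.join();
    }

    // Copy of all samples collected so far.
    std::vector<std::uint64_t> samples() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return samples_;
    }

private:
    Reader reader_;
    std::chrono::milliseconds period_;
    std::atomic<bool> running_{false};
    std::thread worker_;
    mutable std::mutex mutex_;
    std::vector<std::uint64_t> samples_;
};
```

Because the sampler only calls the reader, the code being measured needs no Begin()/End() calls; the open question is whether anything in the Profiling API can serve as that reader on Turing without serializing kernels.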
Does the new Profiling API provide equivalent functionality that satisfies the features above?
Thanks in advance!