CUDA Profiling Tools Interface (CUPTI) for CUDA Toolkit 11.7 is now available for download in the NVIDIA Registered Developer Program.
The NVIDIA® CUDA Profiling Tools Interface (CUPTI) is a dynamic library that enables the creation of profiling and tracing tools that target CUDA applications. CUPTI provides a set of APIs targeted at ISVs creating profilers and other performance optimization tools:
- the Activity API,
- the Callback API,
- the Event API,
- the Metric API,
- the Profiler API,
- the PC Sampling API, and
- the Checkpoint API.
Using these CUPTI APIs, independent software developers can create profiling tools that provide low and deterministic profiling overhead on the target system, while giving insight into the CPU and GPU behavior of CUDA applications.
CUDA Profiling Tools Interface (CUPTI) for CUDA Toolkit 11.7 includes these improvements and updates:
CUPTI has made the following changes as part of the CUDA Toolkit 11.7 release:
- A new activity kind CUPTI_ACTIVITY_KIND_GRAPH_TRACE and activity record CUpti_ActivityGraphTrace are introduced to represent the execution of a graph as a whole, without exposing the execution of its individual nodes. This is intended to reduce the overhead of tracing each node separately. This activity can only be enabled with driver version 515 and above.
- A new API cuptiActivityEnableAndDump is added to provide a snapshot of certain activities, such as device, context, stream, NVLink and PCIe, at any point during the profiling session.
- Added sample cupti_correlation to show correlation between CUDA APIs and corresponding GPU activities.
- Added sample cupti_trace_injection to show how to build an injection library using the activity and callback APIs which can be used to trace any CUDA application.
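As a rough illustration of what a tool can do with graph-level trace records, the sketch below sums the execution time recorded for one graph across several launches. It is a minimal sketch under stated assumptions: GraphTraceRecord is a hypothetical stand-in that models only a graph identifier and start/end timestamps in nanoseconds, not the real CUpti_ActivityGraphTrace layout, and both helper functions are illustrative names rather than CUPTI APIs. A real tool would include the CUPTI headers and read these values from the records delivered through the Activity API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a graph-level trace record. The real record
   type, CUpti_ActivityGraphTrace, is declared in the CUPTI headers; this
   struct models only the fields the sketch needs and is NOT its layout. */
typedef struct {
    uint32_t graph_id;  /* identifier of the launched graph (assumed field) */
    uint64_t start_ns;  /* graph execution start timestamp, in nanoseconds */
    uint64_t end_ns;    /* graph execution end timestamp, in nanoseconds */
} GraphTraceRecord;

/* Duration of one graph execution; returns 0 if the timestamps are inverted. */
uint64_t graph_duration_ns(const GraphTraceRecord *r)
{
    return (r->end_ns >= r->start_ns) ? (r->end_ns - r->start_ns) : 0;
}

/* Total execution time over all records that belong to the given graph. */
uint64_t total_time_for_graph(const GraphTraceRecord *recs, size_t count,
                              uint32_t graph_id)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; ++i) {
        if (recs[i].graph_id == graph_id) {
            total += graph_duration_ns(&recs[i]);
        }
    }
    return total;
}
```

Because each graph launch yields a single record rather than one record per node, a summary like this touches far fewer records than per-node tracing would, which is the overhead reduction the new activity kind is aiming for.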
CUPTI has made the following fixes as part of the CUDA Toolkit 11.7 release:
- Fixed corruption in the function name for PC Sampling API records.
- Fixed incorrect timestamps for GPU activities when the user calls the API cuptiActivityRegisterTimestampCallback in the late CUPTI attach scenario.
- Fixed incomplete records for device-to-device memory copies in the late CUPTI attach scenario. This issue manifests mainly when the application has a mix of CUDA graph launches and normal kernel launches.
For more information on CUPTI for CUDA Toolkit 11.7, including features, requirements, documentation and support, please visit the CUPTI Overview page.
To download this version, get it as part of the CUDA Toolkit 11.7.