CUDA Profiling Tools Interface (CUPTI) for CUDA Toolkit 11.7 is now available

CUDA Profiling Tools Interface (CUPTI) for CUDA Toolkit 11.7 is now available for download in the NVIDIA Registered Developer Program.

The NVIDIA® CUDA Profiling Tools Interface (CUPTI) is a dynamic library that enables the creation of profiling and tracing tools that target CUDA applications. CUPTI provides a set of APIs targeted at ISVs creating profilers and other performance optimization tools:

  • the Activity API,
  • the Callback API,
  • the Event API,
  • the Metric API,
  • the Profiler API,
  • the PC Sampling API, and
  • the Checkpoint API

Using these CUPTI APIs, independent software developers can create profiling tools that incur low, deterministic profiling overhead on the target system while giving insight into the CPU and GPU behavior of CUDA applications.

CUPTI for CUDA Toolkit 11.7 includes these improvements and updates:

New Features

    CUPTI has made the following changes as part of the CUDA Toolkit 11.7 release:
    • A new activity kind, CUPTI_ACTIVITY_KIND_GRAPH_TRACE, and a new activity record, CUpti_ActivityGraphTrace, are introduced to represent the execution of a graph as a whole, without visibility into the execution of its individual nodes. This is intended to reduce the overhead of tracing each node separately. This activity kind can be enabled only with driver version 515 and above.
    • A new API, cuptiActivityEnableAndDump, is added to provide a snapshot of certain activities, such as device, context, stream, NVLink, and PCIe, at any point during the profiling session.
    • Added the sample cupti_correlation, which shows the correlation between CUDA APIs and the corresponding GPU activities.
    • Added the sample cupti_trace_injection, which shows how to build an injection library using the Activity and Callback APIs. The resulting library can be used to trace any CUDA application.
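
    The correlation idea behind the cupti_correlation sample can be sketched outside CUPTI. CUPTI activity records for a CUDA API call and for the GPU work it launches carry the same correlation ID, so a tool can join the two record streams on that ID. The Python sketch below is not the sample's code and does not call CUPTI. The ApiRecord and GpuRecord types, their field names, and the helper functions are hypothetical stand-ins for records that a tool has already collected, and the latency helper assumes both timestamps are on the same clock.

    ```python
    from dataclasses import dataclass
    from typing import Dict, List, Tuple


    @dataclass
    class ApiRecord:
        """Hypothetical stand-in for a CUDA API activity record."""
        correlation_id: int
        name: str
        start_ns: int
        end_ns: int


    @dataclass
    class GpuRecord:
        """Hypothetical stand-in for a GPU activity record (kernel, memcpy, ...)."""
        correlation_id: int
        name: str
        start_ns: int
        end_ns: int


    def correlate(api_records: List[ApiRecord],
                  gpu_records: List[GpuRecord]) -> List[Tuple[ApiRecord, GpuRecord]]:
        """Pair each GPU record with the API record that shares its correlation ID."""
        by_id: Dict[int, ApiRecord] = {rec.correlation_id: rec for rec in api_records}
        pairs = []
        for gpu in gpu_records:
            api = by_id.get(gpu.correlation_id)
            if api is not None:  # skip GPU records whose API record was not captured
                pairs.append((api, gpu))
        return pairs


    def launch_latency_ns(api: ApiRecord, gpu: GpuRecord) -> int:
        """Time from the start of the API call to the start of the GPU work."""
        return gpu.start_ns - api.start_ns


    # Illustrative, made-up records; real values would come from a trace.
    api_records = [
        ApiRecord(1, "cudaLaunchKernel", 100, 120),
        ApiRecord(2, "cudaMemcpyAsync", 130, 140),
    ]
    gpu_records = [
        GpuRecord(1, "vecAdd", 150, 400),
        GpuRecord(3, "unmatched", 500, 600),
    ]

    for api, gpu in correlate(api_records, gpu_records):
        print(api.name, "->", gpu.name, launch_latency_ns(api, gpu), "ns")
    ```

    Indexing the API records in a dictionary keeps the join linear in the number of records. GPU records without a matching API record are skipped, which is one possible policy for late-attach traces where some API calls were never captured.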

Resolved Issues

    CUPTI has made the following fixes as part of the CUDA Toolkit 11.7 release:
    • Fixed corruption in the function name for PC Sampling API records.
    • Fixed incorrect timestamps for GPU activities when the user calls the API cuptiActivityRegisterTimestampCallback in the late CUPTI attach scenario.
    • Fixed incomplete records for device-to-device memory copies in the late CUPTI attach scenario. This issue manifests mainly when the application has a mix of CUDA graph launches and normal kernel launches.

Requirements

For more information on CUPTI for CUDA Toolkit 11.7, including features, requirements, documentation, and support, please visit the CUPTI Overview page.

To download this version, get it as part of CUDA Toolkit 11.7.
