CUPTI Sample tutorial wrong

While reading and learning from the CUPTI sample code, I ran into this question:
The “callback_profiling” sample is intended to show how to use the CUPTI Callback API to profile CUDA Runtime/Driver/Resources API data, yet its code appears to have nothing to do with the Callback API. After I compiled and ran the sample, I actually got this:

Compute Capability of Device: 8.6
Launching kernel: blocks 196, thread/block 256

Range Name    Metric Name              Metric Value
0             sm__ctas_launched.sum    196

So this is really weird. Is this sample code misnamed?

There is also a tutorial based on the CUPTI sample code on GitHub: eunomia-bpf/cupti-tutorial (Tutorials for NVIDIA CUPTI samples).

In the callback_profiling section, according to its readme.md, the output after compiling and running is supposed to be this:

=== CUPTI Callback Profiling Results ===

CUDA Runtime API Calls:
  cudaMalloc: 3 calls, total: 145μs, avg: 48.3μs
  cudaMemcpy: 6 calls, total: 2.1ms, avg: 350μs
  cudaLaunchKernel: 100 calls, total: 5.2ms, avg: 52μs
  cudaDeviceSynchronize: 1 call, total: 15.3ms, avg: 15.3ms

CUDA Driver API Calls:
  cuCtxCreate: 1 call, total: 125μs, avg: 125μs
  cuModuleLoad: 1 call, total: 2.3ms, avg: 2.3ms
  cuLaunchKernel: 100 calls, total: 4.8ms, avg: 48μs

Performance Metrics:
  GPU Utilization: 78.5%
  Memory Bandwidth: 245.2 GB/s
  Cache Hit Rate: 92.3%

Total Profiling Overhead: 0.8ms (0.5% of total execution time)

This really confused me, since there are few detailed tutorials about CUPTI on the internet. If the GitHub tutorial is wrong or NVIDIA’s sample code is misnamed, I think it should be corrected; otherwise it will continue to confuse other new learners of CUPTI.

Thanks for your help, and I look forward to your reply!

The callback_profiling sample uses both the CUPTI Callback APIs and the legacy CUPTI Profiler APIs. The term “profiling” in this context refers to analysing kernel workloads using performance metrics.

You’re correct that the sample output shown in the GitHub repository for callback_profiling is inaccurate.

Please note: This repository is not officially maintained by NVIDIA.

For an example of using the CUPTI Callback APIs to capture timestamps for various runtime API calls, please refer to the CUPTI callback_timestamp sample.
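To illustrate the idea of timing API calls from enter/exit callbacks, here is a minimal sketch of the bookkeeping only. It does not call CUPTI and is not the code of the callback_timestamp sample: `Site`, `ApiStats`, `Timeline`, and `onCallback` are hypothetical names invented for this example. In a real program, the callback would be registered with `cuptiSubscribe` and `cuptiEnableDomain`, the enter/exit distinction would come from the callback site (`CUPTI_API_ENTER` / `CUPTI_API_EXIT`), and timestamps could be read with `cuptiGetTimestamp`. The sketch also assumes calls to the same function do not nest or overlap across threads.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the two callback sites (API enter and API exit).
enum class Site { Enter, Exit };

// Per-function bookkeeping.
struct ApiStats {
    uint64_t calls = 0;    // number of completed calls
    uint64_t totalNs = 0;  // summed duration of completed calls
    uint64_t enterNs = 0;  // timestamp saved at the most recent Enter
};

struct Timeline {
    std::map<std::string, ApiStats> stats;

    // Invoked once when a traced API call starts and once when it returns.
    void onCallback(const std::string& name, Site site, uint64_t nowNs) {
        ApiStats& s = stats[name];
        if (site == Site::Enter) {
            s.enterNs = nowNs;               // remember when the call began
        } else {
            s.calls += 1;                    // the call has completed
            s.totalNs += nowNs - s.enterNs;  // accumulate its duration
        }
    }

    // Average duration in nanoseconds; 0 if the function was never seen.
    double averageNs(const std::string& name) const {
        auto it = stats.find(name);
        if (it == stats.end() || it->second.calls == 0) return 0.0;
        return static_cast<double>(it->second.totalNs) / it->second.calls;
    }
};
```

Feeding it an Enter at 100 ns and an Exit at 150 ns for the same function name records one call with a total of 50 ns; per-function call counts, totals, and averages then fall out of the `stats` map.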