While reading through and learning from the CUPTI sample code, I ran into a question:
The “callback_profiling” sample is intended to show how to use the CUPTI Callback API to profile CUDA Runtime/Driver/Resource API data, but its code appears to have nothing to do with the Callback API. After I compiled and ran this sample, this is the output I actually got:
Compute Capability of Device: 8.6
Launching kernel: blocks 196, thread/block 256
Range Name Metric Name Metric Value
0 sm__ctas_launched.sum 196
So this is really weird. Is this sample wrongly named?
There is also a tutorial based on the CUPTI sample code on GitHub: eunomia-bpf/cupti-tutorial (Tutorials for NVIDIA CUPTI samples).
In its callback_profiling section, according to the readme.md, the output after compiling and running the sample is supposed to be this:
=== CUPTI Callback Profiling Results ===
CUDA Runtime API Calls:
cudaMalloc: 3 calls, total: 145μs, avg: 48.3μs
cudaMemcpy: 6 calls, total: 2.1ms, avg: 350μs
cudaLaunchKernel: 100 calls, total: 5.2ms, avg: 52μs
cudaDeviceSynchronize: 1 call, total: 15.3ms, avg: 15.3ms
CUDA Driver API Calls:
cuCtxCreate: 1 call, total: 125μs, avg: 125μs
cuModuleLoad: 1 call, total: 2.3ms, avg: 2.3ms
cuLaunchKernel: 100 calls, total: 4.8ms, avg: 48μs
Performance Metrics:
GPU Utilization: 78.5%
Memory Bandwidth: 245.2 GB/s
Cache Hit Rate: 92.3%
Total Profiling Overhead: 0.8ms (0.5% of total execution time)
This really confused me, since there are few detailed tutorials about CUPTI on the internet. If the GitHub tutorial is wrong or NVIDIA’s sample code is wrongly named, I think it should be corrected; otherwise it will continue to confuse other new learners of CUPTI.
Thanks for your help, and I look forward to your reply!