There is an NVIDIA researcher whose CUPTI code I have been hacking on. You can find it here.
It works for 2 metrics with CUDA 10 and driver 416.34 under Windows 7 64-bit.
However, when I use any of these metrics, it fails with an error such as:

Metric value retrieval failed for metric warp_execution_efficiency.
I used the CUPTI callback_metric sample as a guide. If I run that sample with any of the metrics above, it works.
I eventually discovered that the newer sample code references CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000,
which did not exist 4-5 years ago when the researcher wrote his tool.
It would be great to have an up-to-date example that works with all the metrics.