How to use NVPROF on code compiled with NVRTC?

bz1 · October 21, 2018, 3:53am

Hi,
I have been using NVPROF to collect all 113 performance counters from my kernels
that run on a TitanV. I was never able to get CUPTI to give me all the counters the way NVPROF does.

Now I am using NVRTC (with JITIFY) to compile my custom kernels on the fly. How can I get NVPROF
to give me the same 113 performance counters for this NVRTC/JITIFY case?
–Bob

rbischof · October 26, 2018, 12:21pm

Sorry you are having trouble with this.
nvprof works well with NVRTC kernels here, using the nvprof that ships with CUDA 10.0.
What toolkit are you using?
Is it possible for you to provide a minimal reproducer?

Also, it is possible for CUPTI to collect all counters. Unfortunately, there is no example of this in the sample code provided with the toolkit. We’ll get you some example code soon.

bz1 · October 26, 2018, 5:32pm

I was using CUDA 9.x under Win7/64.
I will switch to CUDA 10 today.
It would be magnificent if CUPTI could retrieve all of the counters that NVPROF returns.

bz1 · October 29, 2018, 1:02am

There is an nvidia researcher whose cupti code I was hacking on. You can find it here.

It works with these 2 metrics with CUDA 10 and display driver 416.34 under Win7/64:

"inst_per_warp",
"branch_efficiency",

However, it fails when you use any of these metrics:

"warp_execution_efficiency",
"warp_nonpred_execution_efficiency",
"inst_replay_overhead",

The error is, for example:

Metric value retrieval failed for metric warp_execution_efficiency.

I was using the CUPTI callback_metric example to guide me. If you use any of the above metrics with it,
the sample app works.

I eventually discovered that the new sample code refers to CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000,
which didn't exist 4-5 years ago when the researcher created his tool.

It would be great to get an example that is up to date for all the metrics.

bz1 · October 30, 2018, 5:08pm

rbischof – were you able to hack out the example code for all the counters?

Sanjiv.Satoor · November 1, 2018, 7:03am

There is no single CUPTI API call that returns all supported events. You can enumerate them with the following calls:

CUptiResult cuptiDeviceGetNumEventDomains ( CUdevice device, uint32_t* numDomains ): Get the number of domains for a device.
CUptiResult cuptiDeviceEnumEventDomains ( CUdevice device, size_t* arraySizeBytes, CUpti_EventDomainID* domainArray ): Get the event domains for a device.
CUptiResult cuptiEventDomainGetNumEvents ( CUpti_EventDomainID eventDomain, uint32_t* numEvents ): Get number of events in a domain.
CUptiResult cuptiEventDomainEnumEvents ( CUpti_EventDomainID eventDomain, size_t* arraySizeBytes, CUpti_EventID* eventArray ): Get the events in a domain.

Refer to the CUPTI documentation: https://docs.nvidia.com/cuda/cupti/group__CUPTI__EVENT__API.html

Let us know if you need any additional information.

bz1 · November 1, 2018, 7:29am

ssatoor,
Thanks, but I was well aware of the API you described. My responses showed how I was
hacking away at the example code. The callback must be significantly more complex when
there is more than 1 event or metric being monitored. On 10/26, rbischof mentioned that he would
send an example soon. I guess he gave up?
–bz

rbischof · November 1, 2018, 11:26pm

Sorry for the delay (which unfortunately continues). We haven’t forgotten about you and are working to get this sample code to you.

bz1 · November 2, 2018, 3:08am

Cool. Thanks for the feedback.

Sanjiv.Satoor · November 16, 2018, 8:45am

We have updated the CUPTI sample code on GitHub: https://github.com/srvm/cupti_profiler.
It includes fixes for the issues reported on this post.
It also includes query and collection of all the supported metrics.

To profile only specific metrics, comment out the following line in examples/demo.cu:
#define PROFILE_ALL_EVENTS_METRICS 1

Thanks for your feedback.
