How to use CUPTI to get average instruction execution time?

bz1 · March 14, 2018, 5:16am

I would like to get the average instruction execution time. I think I need to use CUPTI to do this (if it is even possible).

I compiled and ran four of the CUPTI examples (callback_metric, callback_timestamp, pc_sampling, sass_source_map).

I also read through the CUPTI.pdf and I looked through the cupti.h, cupti_events.h, cupti_metrics.h.

The sass_source_map came closest to what I needed. I was able to correlate the SASS instructions
(using nvdisasm) back to the source code (I happened to need that). I can now see the number
of times that each instruction is executed … but I also need the average duration too.

Any ideas how to do this?

–Bob

Device Name: TITAN V
SOURCE_LOCATOR SrcLctrId 2, File C:/Projects/cupti_sass2src/cupti_sass2src/kernel.cu Line 1
FUCTION functionId 1, moduleId 9, name _Z9transposePfPKf
INSTRUCTION_EXECUTION srcLctr 2, corr 202, functionId 1, pc 0
notPredOffthread_inst_executed 0, thread_inst_executed 15872, inst_executed 496

INSTRUCTION_EXECUTION srcLctr 2, corr 202, functionId 1, pc 10
notPredOffthread_inst_executed 15872, thread_inst_executed 15872, inst_executed 496

SOURCE_LOCATOR SrcLctrId 3, File C:/Projects/cupti_sass2src/cupti_sass2src/kernel.cu Line 14
INSTRUCTION_EXECUTION srcLctr 3, corr 202, functionId 1, pc 20
notPredOffthread_inst_executed 15872, thread_inst_executed 15872, inst_executed 496

INSTRUCTION_EXECUTION srcLctr 3, corr 202, functionId 1, pc 30
notPredOffthread_inst_executed 15872, thread_inst_executed 15872, inst_executed 496

SOURCE_LOCATOR SrcLctrId 4, File C:/Projects/cupti_sass2src/cupti_sass2src/kernel.cu Line 15
INSTRUCTION_EXECUTION srcLctr 4, corr 202, functionId 1, pc 40
notPredOffthread_inst_executed 15872, thread_inst_executed 15872, inst_executed 496
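
For anyone who wants to post-process output like the listing above, here is a minimal sketch that parses the INSTRUCTION_EXECUTION records into per-pc counts. The function name `parse_records` and the dictionary keys are hypothetical names chosen for illustration, and the regular expressions assume the exact line layout shown in this post. The pc is kept as the printed string, since the output does not say whether it is decimal or hexadecimal.

```python
import re

# Two records copied from the sass_source_map output shown in this post.
SAMPLE = """\
INSTRUCTION_EXECUTION srcLctr 2, corr 202, functionId 1, pc 0
notPredOffthread_inst_executed 0, thread_inst_executed 15872, inst_executed 496

INSTRUCTION_EXECUTION srcLctr 2, corr 202, functionId 1, pc 10
notPredOffthread_inst_executed 15872, thread_inst_executed 15872, inst_executed 496
"""

# Assumption: every record is a header line followed by one counts line.
HEADER = re.compile(
    r"INSTRUCTION_EXECUTION srcLctr (\d+), corr (\d+), functionId (\d+), pc ([0-9a-fA-F]+)"
)
COUNTS = re.compile(
    r"notPredOffthread_inst_executed (\d+), thread_inst_executed (\d+), inst_executed (\d+)"
)


def parse_records(text):
    """Return one dict per INSTRUCTION_EXECUTION record found in text."""
    records = []
    current = None
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            # Start a new record; pc stays a string exactly as printed.
            current = {"src_locator": int(header.group(1)), "pc": header.group(4)}
            continue
        counts = COUNTS.search(line)
        if counts and current is not None:
            current["not_pred_off_threads"] = int(counts.group(1))
            current["threads"] = int(counts.group(2))
            current["warps"] = int(counts.group(3))
            records.append(current)
            current = None
    return records


records = parse_records(SAMPLE)
for rec in records:
    # Thread-level count divided by the inst_executed count for the same pc.
    print(rec["pc"], rec["threads"] / rec["warps"])
```

These records only contain execution counts, so the script can report how often each instruction ran but not how long it took.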

BulatZiganshin · March 14, 2018, 9:52pm

What do you mean by execution time: latency or throughput?

bz1 · March 15, 2018, 1:13am

Well, I would have accepted the average number of clock cycles to execute the instruction.
I assume that would include any latency.

Sanjiv.Satoor · March 15, 2018, 4:53pm

bz1 wrote: "I would like to get the average instruction execution time."

Are you looking for the average per instruction or the average for a kernel across all instructions?

bz1 · March 15, 2018, 9:43pm

The average per instruction.

Do you have an idea that would get me the data I want?

bz1 · March 16, 2018, 1:40am

Hello … NVIDIA … Could someone please answer my question?

Sanjiv.Satoor · March 16, 2018, 8:12am

We do not support any metric for average execution time per instruction.

But you can look at the PC sampling feature which gives the number of samples for each instruction with various stall reasons. Using this information you can pinpoint portions of your kernel that are introducing latencies and the reason for the latency.

This is supported on GPU devices with compute capability 5.2 and higher (excluding mobile devices).

For CUPTI, refer to the CUPTI section of the CUDA Toolkit Documentation; for Visual Profiler, refer to the Profiler section of the CUDA Toolkit Documentation.
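
One way the PC sampling counts might be combined with the per-instruction execution counts from sass_source_map is sketched below. This is an assumption on my part, not a CUPTI metric: it divides the number of samples seen at each pc by the number of times that pc was executed and normalizes the result, which gives a rough relative weight per execution rather than a cycle count. The function name `relative_cost_per_execution` is hypothetical, and the sample counts are made-up placeholders, not measured data.

```python
def relative_cost_per_execution(samples_by_pc, executions_by_pc):
    """Samples per execution for each pc, normalized so the values sum to 1."""
    raw = {}
    for pc, executions in executions_by_pc.items():
        if executions > 0:
            # A pc that was never sampled contributes zero.
            raw[pc] = samples_by_pc.get(pc, 0) / executions
    total = sum(raw.values())
    if total == 0:
        return {pc: 0.0 for pc in raw}
    return {pc: value / total for pc, value in raw.items()}


# inst_executed values as reported earlier in this thread.
executions = {"0": 496, "10": 496, "20": 496}
# Hypothetical sample counts, for illustration only.
samples = {"0": 10, "10": 30, "20": 60}

weights = relative_cost_per_execution(samples, executions)
for pc, weight in weights.items():
    print(pc, round(weight, 3))
```

Because PC sampling is statistical, the weights are only as good as the number of samples collected, and they include time spent stalled at an instruction, not just its issue or execution latency.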

bz1 · March 20, 2018, 6:03pm

I’ve already explored the callback_metric, callback_timestamp, sass_source_map and pc_sampling examples.
I wish people would stop trying to predict what I want to do with the data. I’m not interested in using the pc_sampling data to identify reasons for latency. Is there any sort of surrogate for the average execution time per instruction using CUPTI? I realize there is no direct metric for what I am looking for. I was hoping CUPTI would help me derive it indirectly (if need be).
