I would like to measure the average execution time of each instruction. I think I need to use CUPTI to do this (if it is even possible).
I compiled and ran four of the CUPTI samples (callback_metric, callback_timestamp, pc_sampling, sass_source_map).
I also read through CUPTI.pdf and looked through cupti.h, cupti_events.h, and cupti_metrics.h.
The sass_source_map sample came closest to what I need. With it, I was able to correlate the SASS instructions
(using nvdisasm) back to the source code, which I also happened to need. I can now see the number
of times that each instruction is executed, but I also need the average duration of each instruction.
Any ideas on how to do this?
–Bob
Device Name: TITAN V
SOURCE_LOCATOR SrcLctrId 2, File C:/Projects/cupti_sass2src/cupti_sass2src/kernel.cu Line 1
FUCTION functionId 1, moduleId 9, name _Z9transposePfPKf
INSTRUCTION_EXECUTION srcLctr 2, corr 202, functionId 1, pc 0
notPredOffthread_inst_executed 0, thread_inst_executed 15872, inst_executed 496
INSTRUCTION_EXECUTION srcLctr 2, corr 202, functionId 1, pc 10
notPredOffthread_inst_executed 15872, thread_inst_executed 15872, inst_executed 496
SOURCE_LOCATOR SrcLctrId 3, File C:/Projects/cupti_sass2src/cupti_sass2src/kernel.cu Line 14
INSTRUCTION_EXECUTION srcLctr 3, corr 202, functionId 1, pc 20
notPredOffthread_inst_executed 15872, thread_inst_executed 15872, inst_executed 496
INSTRUCTION_EXECUTION srcLctr 3, corr 202, functionId 1, pc 30
notPredOffthread_inst_executed 15872, thread_inst_executed 15872, inst_executed 496
SOURCE_LOCATOR SrcLctrId 4, File C:/Projects/cupti_sass2src/cupti_sass2src/kernel.cu Line 15
INSTRUCTION_EXECUTION srcLctr 4, corr 202, functionId 1, pc 40
notPredOffthread_inst_executed 15872, thread_inst_executed 15872, inst_executed 496
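
For anyone who wants to post-process this output, here is a minimal sketch that parses the records above and sums the execution counts per source line. It assumes the text format is exactly as pasted (one SOURCE_LOCATOR line, then pairs of INSTRUCTION_EXECUTION and counter lines). The names `parse_records`, `per_line_totals`, and `threads_per_warp_inst` are hypothetical helpers made up for this example and are not part of CUPTI. Note that it only works with the counts shown; it does not produce any timing information.

```python
import re
from collections import defaultdict

# Output copied from the post above.
LOG = """\
SOURCE_LOCATOR SrcLctrId 2, File C:/Projects/cupti_sass2src/cupti_sass2src/kernel.cu Line 1
INSTRUCTION_EXECUTION srcLctr 2, corr 202, functionId 1, pc 0
notPredOffthread_inst_executed 0, thread_inst_executed 15872, inst_executed 496
INSTRUCTION_EXECUTION srcLctr 2, corr 202, functionId 1, pc 10
notPredOffthread_inst_executed 15872, thread_inst_executed 15872, inst_executed 496
SOURCE_LOCATOR SrcLctrId 3, File C:/Projects/cupti_sass2src/cupti_sass2src/kernel.cu Line 14
INSTRUCTION_EXECUTION srcLctr 3, corr 202, functionId 1, pc 20
notPredOffthread_inst_executed 15872, thread_inst_executed 15872, inst_executed 496
INSTRUCTION_EXECUTION srcLctr 3, corr 202, functionId 1, pc 30
notPredOffthread_inst_executed 15872, thread_inst_executed 15872, inst_executed 496
SOURCE_LOCATOR SrcLctrId 4, File C:/Projects/cupti_sass2src/cupti_sass2src/kernel.cu Line 15
INSTRUCTION_EXECUTION srcLctr 4, corr 202, functionId 1, pc 40
notPredOffthread_inst_executed 15872, thread_inst_executed 15872, inst_executed 496
"""

# Patterns assume the exact text layout shown in the post.
LOCATOR_RE = re.compile(r"SOURCE_LOCATOR SrcLctrId (\d+), File (.+) Line (\d+)$")
INSTR_RE = re.compile(
    r"INSTRUCTION_EXECUTION srcLctr (\d+), corr \d+, functionId \d+, pc (\w+)$"
)
COUNTS_RE = re.compile(
    r"notPredOffthread_inst_executed (\d+), "
    r"thread_inst_executed (\d+), inst_executed (\d+)$"
)


def parse_records(text):
    """Return one dict per instruction, joined with its source location."""
    locators = {}   # SrcLctrId -> (file, line number)
    records = []
    pending = None  # (srcLctr, pc) waiting for its counter line
    for raw in text.splitlines():
        line = raw.strip()
        if m := LOCATOR_RE.match(line):
            locators[int(m.group(1))] = (m.group(2), int(m.group(3)))
        elif m := INSTR_RE.match(line):
            pending = (int(m.group(1)), m.group(2))
        elif (m := COUNTS_RE.match(line)) and pending is not None:
            loc_id, pc = pending
            file_name, line_no = locators.get(loc_id, ("<unknown>", 0))
            records.append({
                "file": file_name,
                "line": line_no,
                "pc": pc,  # kept as printed; no base is assumed
                "not_pred_off": int(m.group(1)),
                "thread_inst_executed": int(m.group(2)),
                "inst_executed": int(m.group(3)),
            })
            pending = None
    return records


def per_line_totals(records):
    """Sum inst_executed over all instructions attributed to a source line."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["line"]] += rec["inst_executed"]
    return dict(totals)


def threads_per_warp_inst(rec):
    """Ratio of thread-level to warp-level executions for one instruction."""
    if rec["inst_executed"] == 0:
        return 0.0
    return rec["thread_inst_executed"] / rec["inst_executed"]


records = parse_records(LOG)
for line_no, total in sorted(per_line_totals(records).items()):
    print(f"kernel.cu line {line_no}: inst_executed total {total}")
```

The ratio helper is included because the counters shown allow it: dividing thread_inst_executed by inst_executed gives the number of threads counted per instruction execution, which is a quick check on how the counts relate to each other. Keying the totals by line number alone is a simplification that only works because every record here comes from the same file.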