CUPTI automatic callbacks


I want to build my own tool to analyze the performance of GPU kernels automatically. The tool should be attachable to many applications. Since I would rather not modify each application's source code, I am wondering whether there is a runtime solution, such as LD_PRELOAD?


What do you mean by analyzing the performance automatically? Do you want to collect timing information, hardware performance counters/metrics, or both?

You can write a GPU performance analysis tool based on the CUPTI interface. Users can then inject the CUPTI-based shared library into the target application using LD_PRELOAD, so the application itself does not need to be modified or rebuilt. Please refer to the post "CUPTI activity API and child processes" (#8, by mjain).
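To illustrate the injection mechanism only, here is a minimal sketch of a shared library that initializes itself when the dynamic loader maps it. It assumes GCC or Clang on Linux (for the constructor/destructor attributes). The names `tool_init`, `tool_fini`, `tool_is_initialized`, and `tool_elapsed_seconds` are hypothetical and not part of CUPTI. The sketch deliberately contains no CUPTI calls; a real tool would do its CUPTI setup (for example subscribing with `cuptiSubscribe` and enabling a callback domain with `cuptiEnableDomain`) at the point marked in the constructor.

```c
/* inject.c: hypothetical skeleton of a profiling library loaded via LD_PRELOAD.
 * Assumption: GCC/Clang on Linux. No CUPTI calls are made in this sketch. */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

static int g_tool_initialized = 0;      /* set once the constructor has run */
static struct timespec g_load_time;     /* time at which the library was loaded */

/* Seconds elapsed since the library was loaded. */
double tool_elapsed_seconds(void)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - g_load_time.tv_sec)
         + (double)(now.tv_nsec - g_load_time.tv_nsec) / 1e9;
}

/* Lets other code check that the injection hook ran. */
int tool_is_initialized(void)
{
    return g_tool_initialized;
}

/* Runs when the library is loaded, before the application's main(). */
__attribute__((constructor))
static void tool_init(void)
{
    clock_gettime(CLOCK_MONOTONIC, &g_load_time);
    g_tool_initialized = 1;
    /* Assumption: a real tool would set up CUPTI here, e.g. subscribe to
     * callbacks or enable activity records. Omitted in this sketch. */
    fprintf(stderr, "[tool] injected into process\n");
}

/* Runs at normal process exit; a real tool would flush its records here. */
__attribute__((destructor))
static void tool_fini(void)
{
    fprintf(stderr, "[tool] process ran for %.3f s\n", tool_elapsed_seconds());
}
```

Assuming the file is saved as `inject.c`, it can be built with `gcc -shared -fPIC inject.c -o libinject.so` and attached with `LD_PRELOAD=./libinject.so ./your_app`, where `your_app` stands for any unmodified target application. Once CUPTI calls are added, the library also has to be linked against `libcupti` from the CUDA toolkit.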