Run-time GPU profiling

Hi,

I am using nvprof to profile the GPU workload. However, I am wondering if there is a way to monitor the GPU workload at run time in more detail.
For example, I run two vision applications on a Tegra board and want a separate monitoring task to be launched that captures traces at run time. The captured data would show how many SMs, blocks, and threads are being used by each kernel. I am thinking of using the CUPTI APIs, but it would be nice if there were a way to get run-time profiling information without modifying the application code.

Thanks,

Dear ibaek,
Please use Tegra System Profiler to get a timeline view of kernel launch calls, kernel overlap, and other kernel information.

Hi,

Thank you for your suggestion. Yes, Tegra System Profiler is an awesome tool. However, it also collects data first and shows the results offline, after the run. I am looking for a methodology, for both Tegra-based boards and x86-based GPUs, to get GPU trace information at run time. For example, two vision applications process an image simultaneously on one GPU device, and a separate C/C++ monitoring module/task captures and shows statistics for the GPU executions in real time. It doesn't have to provide visualization. I need to know which APIs have to be called to capture the data and what data structure the captured data has. Is CUPTI the only way?
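To make the question concrete, here is the kind of CUPTI Activity API flow I have in mind (a sketch only, untested; the kernel record struct name, e.g. CUpti_ActivityKernel4, varies across CUDA versions, so check the headers of your toolkit):

```cpp
// Sketch: run-time kernel tracing with the CUPTI Activity API.
// Build against the CUDA toolkit, e.g.:
//   g++ monitor.cpp -I$CUDA_HOME/extras/CUPTI/include -lcupti
#include <cupti.h>
#include <cstdio>
#include <cstdlib>

// CUPTI calls this when it needs a buffer to fill with activity records.
static void CUPTIAPI bufferRequested(uint8_t **buffer, size_t *size,
                                     size_t *maxNumRecords) {
  *size = 1024 * 1024;                 // 1 MiB per buffer
  *buffer = (uint8_t *)malloc(*size);
  *maxNumRecords = 0;                  // 0 = as many records as fit
}

// CUPTI calls this when a buffer is full or flushed: walk the records.
static void CUPTIAPI bufferCompleted(CUcontext ctx, uint32_t streamId,
                                     uint8_t *buffer, size_t size,
                                     size_t validSize) {
  CUpti_Activity *record = nullptr;
  while (cuptiActivityGetNextRecord(buffer, validSize, &record) ==
         CUPTI_SUCCESS) {
    if (record->kind == CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL) {
      // Struct name is CUDA-version dependent.
      auto *k = (CUpti_ActivityKernel4 *)record;
      printf("kernel=%s grid=(%d,%d,%d) block=(%d,%d,%d) stream=%u\n",
             k->name, k->gridX, k->gridY, k->gridZ,
             k->blockX, k->blockY, k->blockZ, k->streamId);
    }
  }
  free(buffer);
}

void startMonitoring() {
  cuptiActivityEnable(CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL);
  cuptiActivityRegisterCallbacks(bufferRequested, bufferCompleted);
}

// Called periodically from a monitoring thread to force delivery of
// the records collected so far, giving near-real-time statistics.
void pollMonitoring() {
  cuptiActivityFlushAll(0);
}
```

As I understand it, this code would have to run inside each application's process, but it could live in a shared library injected at load time (CUPTI documents an injection mechanism via the CUDA_INJECTION64_PATH environment variable), which would avoid modifying the application code itself.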

Thanks,

Dear ibaek,
Yes, you can try CUPTI. Also, you can use tegrastats to monitor GPU utilization, similar to nvidia-smi.
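For example (exact flags depend on your JetPack/driver release; check `--help` locally):

```shell
# On a Tegra/Jetson board: print system stats, including GPU (GR3D)
# load, every second.
sudo tegrastats --interval 1000

# On an x86 machine with a discrete GPU: poll utilization once per
# second with nvidia-smi.
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1
```

Note these tools report coarse utilization only; per-kernel grid/block details still require CUPTI.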