Understanding Profiling gpu-trace output of Inference - TFTRT

im.anupk83 · January 29, 2019, 10:19am

Hi, I am using the TF-TRT image classification example (TFTRT/example/Image classification) from the Git repository. I am running inference with ResNet-50 on ImageNet: 2 iterations, 1 warm-up, batch size 4. I am profiling with nvprof and loading the timeline trace into a visualizer to understand the GPU execution and to see when inference actually happens.
The output shows many kernel calls during the first 7~8 s (what exactly is happening here?). At the very end I see two instances of similar kernel calls, which I assume is where inference happens. Can someone explain what is happening during those first 7~8 seconds?
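A minimal sketch for keeping the start-up phase out of the nvprof timeline. nvprof's `--profile-from-start off` option defers collection until the application calls `cudaProfilerStart()`, so wrapping only the measured iterations should yield a trace without the first 7~8 s. This assumes the CUDA runtime can be located with `ctypes.util.find_library("cudart")`; on some systems only a versioned file such as `libcudart.so.10.0` exists, and you would pass its path to `ctypes.CDLL` directly. Whether the calls take effect also depends on the CUDA context TensorFlow uses, so treat this as an assumption to verify. `load_cudart` and `profiled_region` are hypothetical helper names.

```python
import contextlib
import ctypes
import ctypes.util


def load_cudart():
    """Locate the CUDA runtime library; returns None if it cannot be found."""
    path = ctypes.util.find_library("cudart")
    return ctypes.CDLL(path) if path else None


@contextlib.contextmanager
def profiled_region(cudart):
    """Turn profiling on only inside this block.

    Intended for nvprof runs started with --profile-from-start off.
    Does nothing when cudart is None, so the script still runs without CUDA.
    """
    if cudart is not None:
        cudart.cudaProfilerStart()
    try:
        yield
    finally:
        # Stop even if the profiled code raises, so the trace is closed cleanly.
        if cudart is not None:
            cudart.cudaProfilerStop()
```

Usage, with `run_inference` standing in for whatever runs one batch in your script (hypothetical): call it once for the warm-up, then run `with profiled_region(load_cudart()): run_inference()`, and launch with `nvprof --profile-from-start off -o infer.nvvp python your_script.py`.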

NVES · January 29, 2019, 5:10pm

Hello,

For TensorRT, each layer may launch one or more kernels to perform its operations. The exact kernels launched depend on the optimized network and the hardware present. Depending on the builder's choices, there may be many additional operations that reorder data, interspersed with the layer computations. Some of these reformat operations may be implemented as device-to-device memory copies, others as custom kernels.

As a result, using nvprof to map kernel names back to layers in the original network can be complicated. When interpreting profiler results, it is recommended to start with the IProfiler interface to get per-layer timing information, and only then use nvprof to get per-kernel timing information.
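For a plain TensorRT engine, the IProfiler interface could be used roughly as follows. This is a sketch assuming the TensorRT Python API, where a subclass of `trt.IProfiler` overrides `report_layer_time(layer_name, ms)` and is attached through the execution context's `profiler` property. `LayerTimer` and its `report` method are hypothetical names, and the import fallback exists only so the aggregation logic can run without TensorRT installed.

```python
from collections import defaultdict

try:
    import tensorrt as trt
    _ProfilerBase = trt.IProfiler
except ImportError:
    # Fallback so the aggregation logic can be exercised without TensorRT.
    _ProfilerBase = object


class LayerTimer(_ProfilerBase):
    """Accumulates the per-layer times TensorRT reports after each execution."""

    def __init__(self):
        super().__init__()  # the TensorRT base class must be initialized
        self.total_ms = defaultdict(float)
        self.calls = defaultdict(int)

    def report_layer_time(self, layer_name, ms):
        # Called by TensorRT once per layer for every profiled execution.
        self.total_ms[layer_name] += ms
        self.calls[layer_name] += 1

    def report(self, top=10):
        """Return (layer name, average ms) pairs, slowest layer first."""
        rows = [(name, self.total_ms[name] / self.calls[name]) for name in self.total_ms]
        rows.sort(key=lambda row: row[1], reverse=True)
        return rows[:top]
```

To use it, keep a reference (`timer = LayerTimer()`), assign `context.profiler = timer` before a synchronous call such as `context.execute_v2(bindings)`, then inspect `timer.report()`. Averaging per layer keeps the numbers comparable when the engine runs several iterations.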

One way to limit the scope of nvprof is to:
1. Structure the application to build and then serialize the engines in a first phase.
2. Load the serialized engines and run inference in a second phase.
3. Run nvprof on the second phase only.
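Applied to TF-TRT, the phases might look like the script below. It is a sketch assuming the TensorFlow 2.x `TrtGraphConverterV2` API (the 2019 examples used the older TF 1.x conversion functions), a ResNet-50 SavedModel at a hypothetical path, and 224x224x3 float inputs; batch size 4 and 2 iterations follow the original post. Calling `converter.build()` creates the TensorRT engines during conversion, so they are saved with the model instead of being built at the first inference call. Running the script with no argument prints the nvprof command for the second phase.

```python
"""Two-phase TF-TRT workflow so that nvprof only sees inference (sketch)."""
import argparse
import sys

# Hypothetical paths; point these at your own SavedModel directories.
SAVED_MODEL_DIR = "resnet50_saved_model"
TRT_MODEL_DIR = "resnet50_trt_saved_model"
# Batch size 4 as in the original post; the 224x224x3 input shape is assumed.
BATCH, HEIGHT, WIDTH, CHANNELS = 4, 224, 224, 3


def random_batch():
    import numpy as np
    return np.random.rand(BATCH, HEIGHT, WIDTH, CHANNELS).astype(np.float32)


def build_phase():
    """Phase 1: convert, build the TensorRT engines ahead of time, and save."""
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    converter = trt.TrtGraphConverterV2(input_saved_model_dir=SAVED_MODEL_DIR)
    converter.convert()

    def input_fn():
        # One representative batch is enough to trigger engine building.
        yield (random_batch(),)

    converter.build(input_fn=input_fn)  # engines are built here, not at first inference
    converter.save(TRT_MODEL_DIR)       # engines are serialized into the SavedModel


def infer_phase(iterations=2):
    """Phase 2: load the converted model and run only inference."""
    import tensorflow as tf

    model = tf.saved_model.load(TRT_MODEL_DIR)
    infer = model.signatures["serving_default"]
    # Signature functions take keyword arguments; look up the input name.
    input_name = next(iter(infer.structured_input_signature[1]))
    batch = tf.constant(random_batch())
    for _ in range(iterations):
        infer(**{input_name: batch})


def nvprof_command(script, trace_file="infer.nvvp"):
    """Command line that profiles phase 2 only and exports a timeline file."""
    return ["nvprof", "--export-profile", trace_file, sys.executable, script, "infer"]


def main(argv):
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("phase", nargs="?", choices=["build", "infer"])
    args = parser.parse_args(argv)
    if args.phase == "build":
        build_phase()
    elif args.phase == "infer":
        infer_phase()
    else:
        # No phase given: print how to profile the second phase on its own.
        print(" ".join(nvprof_command(sys.argv[0])))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run the `build` phase once without the profiler, then profile only the `infer` phase; the exported file can be opened in the visual profiler as before.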

im.anupk83 · January 30, 2019, 3:42am

Hi NVES, thanks for the reply. So, at a high level, is my assumption that inference happens at the end wrong?
I have seen the IProfiler interface mentioned above in the nvprof documentation, but I cannot see how to use it, and I could not find documentation on adding IProfiler to the TF-TRT example code, which is basically TensorFlow code. Can you point me to how to implement IProfiler and where in the TF-TRT code it should go?

Thanks in advance.

NVES · January 30, 2019, 3:58pm

I think for TF-TRT you can use TensorBoard or the TensorFlow timeline instead.
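For the TensorFlow timeline route, a minimal sketch assuming a TF1-style session (available as `tf.compat.v1` in TensorFlow 2.x) and the `tensorflow.python.client.timeline` module, which is internal rather than a public API. The trace is written in Chrome trace format (viewable at `chrome://tracing`). `summarize_trace` totals the durations, in microseconds, of complete (`"ph": "X"`) events by their `name` field, which in TF timelines is typically the op type, so a `TRTEngineOp` total can be compared with the native TensorFlow ops around it. `write_timeline`, `summarize_trace`, and `summarize_file` are hypothetical helper names.

```python
import json
from collections import defaultdict


def write_timeline(sess, fetches, feed_dict, path="timeline.json"):
    """Run one fully traced step in a TF1-style session and save a Chrome trace."""
    import tensorflow as tf
    from tensorflow.python.client import timeline

    options = tf.compat.v1.RunOptions(trace_level=tf.compat.v1.RunOptions.FULL_TRACE)
    metadata = tf.compat.v1.RunMetadata()
    sess.run(fetches, feed_dict=feed_dict, options=options, run_metadata=metadata)
    with open(path, "w") as f:
        f.write(timeline.Timeline(metadata.step_stats).generate_chrome_trace_format())
    return path


def summarize_trace(trace, top=10):
    """Sum the durations of complete events per name, largest total first."""
    totals = defaultdict(float)
    for event in trace.get("traceEvents", []):
        if event.get("ph") == "X":  # "X" marks a complete event with a duration
            totals[event["name"]] += event.get("dur", 0)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]


def summarize_file(path, top=10):
    """Load a Chrome trace JSON file and summarize it."""
    with open(path) as f:
        return summarize_trace(json.load(f), top)
```

Call `write_timeline` on an iteration after the warm-up so that one-time setup does not dominate the totals.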
