Hi, after running inference with TensorRT, I get the following per-layer timing output from the TensorRT profiler:
Conv_0 + Relu_1: 0.08096ms
MaxPool_2: 0.017728ms
Conv_3 + Relu_4: 0.019552ms
Conv_5 + Relu_6: 0.050528ms
Conv_7: 0.0408ms
Conv_8 + Add_9 + Relu_10: 0.046816ms
Conv_11 + Relu_12: 0.043648ms
Conv_13 + Relu_14: 0.049952ms
Conv_15 + Add_16 + Relu_17: 0.046208ms
Conv_18 + Relu_19: 0.043552ms
Conv_20 + Relu_21: 0.05072ms
Conv_22 + Add_23 + Relu_24: 0.046048ms
Conv_22 + Add_23 + Relu_24 output reformatter 0: 0.030784ms
Conv_25 + Relu_26: 0.063904ms
Conv_27 + Relu_28: 0.073728ms
Conv_29: 0.036672ms
Conv_30 + Add_31 + Relu_32: 0.069056ms
Conv_33 + Relu_34: 0.038528ms
Conv_35 + Relu_36: 0.0736ms
Conv_37 + Add_38 + Relu_39: 0.042816ms
Conv_40 + Relu_41: 0.038368ms
Conv_42 + Relu_43: 0.073536ms
Conv_44 + Add_45 + Relu_46: 0.04272ms
Conv_47 + Relu_48: 0.038208ms
Conv_49 + Relu_50: 0.074272ms
Conv_51 + Add_52 + Relu_53: 0.04304ms
Conv_54 + Relu_55: 0.061504ms
Conv_56 + Relu_57: 0.134048ms
Conv_58: 0.061344ms
Conv_59 + Add_60 + Relu_61: 0.108416ms
Conv_59 + Add_60 + Relu_61 output reformatter 0: 0.011712ms
Conv_62 + Relu_63: 0.05712ms
Conv_64 + Relu_65: 0.076192ms
Conv_66 + Add_67 + Relu_68: 0.060896ms
Conv_69 + Relu_70: 0.056576ms
Conv_71 + Relu_72: 0.076096ms
Conv_73 + Add_74 + Relu_75: 0.061312ms
Conv_76 + Relu_77: 0.057568ms
Conv_78 + Relu_79: 0.076704ms
Conv_80 + Add_81 + Relu_82: 0.061728ms
Conv_83 + Relu_84: 0.056448ms
Conv_85 + Relu_86: 0.076192ms
Conv_87 + Add_88 + Relu_89: 0.06032ms
Conv_90 + Relu_91: 0.057344ms
Conv_92 + Relu_93: 0.076192ms
Conv_94 + Add_95 + Relu_96: 0.061024ms
Conv_97 + Relu_98: 0.093664ms
Conv_99 + Relu_100 input reformatter 0: 0.00848ms
Conv_99 + Relu_100: 0.256128ms
Conv_101 input reformatter 0: 0.006944ms
Conv_101: 0.057824ms
Conv_102 + Add_103 + Relu_104: 0.208928ms
Conv_105 + Relu_106: 0.095936ms
Conv_107 + Relu_108: 0.103552ms
Conv_109 + Add_110 + Relu_111: 0.10384ms
Conv_112 + Relu_113: 0.09488ms
Conv_114 + Relu_115: 0.1032ms
Conv_116 + Add_117 + Relu_118: 0.10224ms
GlobalAveragePool_119: 0.008736ms
Gemm_121: 0.036608ms
The output comes from the code snippet below:
import tensorrt as trt
import pycuda.driver as cuda

def do_inference(engine, pics_1, h_input_1, d_input_1, h_output, d_output, stream, batch_size):
    """
    Run inference with the TensorRT engine.
    Args:
        engine     : deserialized TensorRT engine (ICudaEngine)
        pics_1     : input images for the model
        h_input_1  : input buffer on the host
        d_input_1  : input buffer on the device
        h_output   : output buffer on the host
        d_output   : output buffer on the device
        stream     : CUDA stream
        batch_size : batch size for execution
    """
    load_images_to_buffer(pics_1, h_input_1)
    with engine.create_execution_context() as context:
        # Copy input data from host to device.
        cuda.memcpy_htod_async(d_input_1, h_input_1, stream)
        # execute() is synchronous but does not wait on `stream`,
        # so make sure the copy has finished before running.
        stream.synchronize()
        # Attach the built-in profiler; it prints the per-layer times above.
        context.profiler = trt.Profiler()
        context.execute(batch_size=batch_size, bindings=[int(d_input_1), int(d_output)])
        # Transfer predictions back from the GPU.
        cuda.memcpy_dtoh_async(h_output, d_output, stream)
        # Wait for the copy to finish before reading the host buffer.
        stream.synchronize()
    # Return the host output.
    return h_output
output = do_inference(
    engine=resnet_trt,
    pics_1=img,
    h_input_1=host_input,
    d_input_1=device_input,
    h_output=host_output,
    d_output=device_output,
    stream=stream,
    batch_size=1,
)
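To aggregate the per-layer numbers instead of just printing them, I believe one can subclass trt.IProfiler (a minimal sketch; the LayerTimer class and its field names are my own invention, not part of my script):

class LayerTimer(trt.IProfiler):
    """Accumulate per-layer times (in ms) across executions."""
    def __init__(self):
        trt.IProfiler.__init__(self)
        self.layer_times = {}

    def report_layer_time(self, layer_name, ms):
        # TensorRT calls this once per layer per execution.
        self.layer_times[layer_name] = self.layer_times.get(layer_name, 0.0) + ms

timer = LayerTimer()
# Use it in place of trt.Profiler() inside do_inference:
# context.profiler = timer
# After execute(), the sum of the recorded layer times is:
# total_ms = sum(timer.layer_times.values())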
Does the time represent the real inference time of every layer? What is the difference between the inference time I get from NVIDIA Nsight Systems and the times above? Which one is the correct inference time?
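To sanity-check both numbers, I can also bracket the execute() call with CUDA events (a minimal sketch assuming the context and device buffers from my snippet above are in scope; this event timing is not in my original script):

# Time one synchronous execute() on the default stream with CUDA events.
start, end = cuda.Event(), cuda.Event()
start.record()  # records on the default stream, which execute() also uses
context.execute(batch_size=1, bindings=[int(d_input_1), int(d_output)])
end.record()
end.synchronize()
print("End-to-end GPU time: %.3f ms" % start.time_till(end))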
(You can get the whole source code from my GitHub if you need it. I use the pretrained ResNet-50 model from PyTorch, export it to ONNX, and finally build a TensorRT engine and run inference.)