Hi, after running inference with TensorRT, I get the following per-layer timing output from the TensorRT profiler:
Conv_0 + Relu_1: 0.08096ms
MaxPool_2: 0.017728ms
Conv_3 + Relu_4: 0.019552ms
Conv_5 + Relu_6: 0.050528ms
Conv_7: 0.0408ms
Conv_8 + Add_9 + Relu_10: 0.046816ms
Conv_11 + Relu_12: 0.043648ms
Conv_13 + Relu_14: 0.049952ms
Conv_15 + Add_16 + Relu_17: 0.046208ms
Conv_18 + Relu_19: 0.043552ms
Conv_20 + Relu_21: 0.05072ms
Conv_22 + Add_23 + Relu_24: 0.046048ms
Conv_22 + Add_23 + Relu_24 output reformatter 0: 0.030784ms
Conv_25 + Relu_26: 0.063904ms
Conv_27 + Relu_28: 0.073728ms
Conv_29: 0.036672ms
Conv_30 + Add_31 + Relu_32: 0.069056ms
Conv_33 + Relu_34: 0.038528ms
Conv_35 + Relu_36: 0.0736ms
Conv_37 + Add_38 + Relu_39: 0.042816ms
Conv_40 + Relu_41: 0.038368ms
Conv_42 + Relu_43: 0.073536ms
Conv_44 + Add_45 + Relu_46: 0.04272ms
Conv_47 + Relu_48: 0.038208ms
Conv_49 + Relu_50: 0.074272ms
Conv_51 + Add_52 + Relu_53: 0.04304ms
Conv_54 + Relu_55: 0.061504ms
Conv_56 + Relu_57: 0.134048ms
Conv_58: 0.061344ms
Conv_59 + Add_60 + Relu_61: 0.108416ms
Conv_59 + Add_60 + Relu_61 output reformatter 0: 0.011712ms
Conv_62 + Relu_63: 0.05712ms
Conv_64 + Relu_65: 0.076192ms
Conv_66 + Add_67 + Relu_68: 0.060896ms
Conv_69 + Relu_70: 0.056576ms
Conv_71 + Relu_72: 0.076096ms
Conv_73 + Add_74 + Relu_75: 0.061312ms
Conv_76 + Relu_77: 0.057568ms
Conv_78 + Relu_79: 0.076704ms
Conv_80 + Add_81 + Relu_82: 0.061728ms
Conv_83 + Relu_84: 0.056448ms
Conv_85 + Relu_86: 0.076192ms
Conv_87 + Add_88 + Relu_89: 0.06032ms
Conv_90 + Relu_91: 0.057344ms
Conv_92 + Relu_93: 0.076192ms
Conv_94 + Add_95 + Relu_96: 0.061024ms
Conv_97 + Relu_98: 0.093664ms
Conv_99 + Relu_100 input reformatter 0: 0.00848ms
Conv_99 + Relu_100: 0.256128ms
Conv_101 input reformatter 0: 0.006944ms
Conv_101: 0.057824ms
Conv_102 + Add_103 + Relu_104: 0.208928ms
Conv_105 + Relu_106: 0.095936ms
Conv_107 + Relu_108: 0.103552ms
Conv_109 + Add_110 + Relu_111: 0.10384ms
Conv_112 + Relu_113: 0.09488ms
Conv_114 + Relu_115: 0.1032ms
Conv_116 + Add_117 + Relu_118: 0.10224ms
GlobalAveragePool_119: 0.008736ms
Gemm_121: 0.036608ms
The output comes from the code snippet below:
import tensorrt as trt
import pycuda.driver as cuda

def do_inference(engine, pics_1, h_input_1, d_input_1, h_output, d_output, stream, batch_size):
    """
    Run inference with the TensorRT engine.
    Args:
        engine     : deserialized TensorRT engine (ICudaEngine)
        pics_1     : input images for the model
        h_input_1  : input buffer on the host
        d_input_1  : input buffer on the device
        h_output   : output buffer on the host
        d_output   : output buffer on the device
        stream     : CUDA stream
        batch_size : batch size for execution
    """
    load_images_to_buffer(pics_1, h_input_1)
    with engine.create_execution_context() as context:
        # Copy input data from host to device.
        cuda.memcpy_htod_async(d_input_1, h_input_1, stream)
        # execute() is synchronous but does not wait on `stream`,
        # so make sure the copy has finished before running.
        stream.synchronize()
        # Attach the built-in profiler; it prints the per-layer times above.
        context.profiler = trt.Profiler()
        context.execute(batch_size=batch_size, bindings=[int(d_input_1), int(d_output)])
        # Transfer predictions back from the GPU.
        cuda.memcpy_dtoh_async(h_output, d_output, stream)
        # Wait for the copy to finish before reading the host buffer.
        stream.synchronize()
    # Return the host output.
    return h_output
output = do_inference(
    engine=resnet_trt,
    pics_1=img,
    h_input_1=host_input,
    d_input_1=device_input,
    h_output=host_output,
    d_output=device_output,
    stream=stream,
    batch_size=1,
)
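To aggregate the per-layer numbers instead of just printing them, I believe one can subclass trt.IProfiler (a minimal sketch; the LayerTimer class and its field names are my own invention, not part of my script):

class LayerTimer(trt.IProfiler):
    """Accumulate per-layer times (in ms) across executions."""
    def __init__(self):
        trt.IProfiler.__init__(self)
        self.layer_times = {}

    def report_layer_time(self, layer_name, ms):
        # TensorRT calls this once per layer per execution.
        self.layer_times[layer_name] = self.layer_times.get(layer_name, 0.0) + ms

timer = LayerTimer()
# Use it in place of trt.Profiler() inside do_inference:
# context.profiler = timer
# After execute(), the sum of the recorded layer times is:
# total_ms = sum(timer.layer_times.values())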
Does the time represent the real inference time of every layer? What is the difference between the inference time I get from NVIDIA Nsight Systems and the times above? Which one is the correct inference time?
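To sanity-check both numbers, I can also bracket the execute() call with CUDA events (a minimal sketch assuming the context and device buffers from my snippet above are in scope; this event timing is not in my original script):

# Time one synchronous execute() on the default stream with CUDA events.
start, end = cuda.Event(), cuda.Event()
start.record()  # records on the default stream, which execute() also uses
context.execute(batch_size=1, bindings=[int(d_input_1), int(d_output)])
end.record()
end.synchronize()
print("End-to-end GPU time: %.3f ms" % start.time_till(end))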
(You can get the whole source code from my GitHub if you need it. I use the pretrained ResNet-50 model from PyTorch, export it to ONNX, and finally build a TensorRT engine and run inference.)