Thank you for your reply.
This is the ONNX model (backbone + neck):
yolov5s-6.0-qat.onnx (27.7 MB)
and this is the ONNX model with backbone + neck + 3x detection head (YoloLayer_TRT plugin):
yolov5s-6.0-qat-yolo-op.onnx (27.7 MB)
And this is the log from building the TRT engine:
root@d741691190a8:/workspace/tensorrt/bin# ./trtexec --onnx=/root/workspace/onnx/yolov5s-6.0-qat-yolo-op.onnx --workspace=10240 --int8 --saveEngine=/root/yolov5s-6.0-qat-int8.engine --plugins=/root/workspace/plugins/YoloLayer_TRT_v6.0/build/libyolo.so
&&&& RUNNING TensorRT.trtexec [TensorRT v8003] # ./trtexec --onnx=/root/workspace/onnx/yolov5s-6.0-qat.onnx --workspace=10240 --int8 --saveEngine=/root/yolov5s-6.0-qat-int8.engine --plugins=/root/workspace/plugins/YoloLayer_TRT_v6.0/build/libyolo.so
[11/15/2021-10:37:08] [I] === Model Options ===
[11/15/2021-10:37:08] [I] Format: ONNX
[11/15/2021-10:37:08] [I] Model: /root/workspace/onnx/yolov5s-6.0-qat-yolo-op.onnx
[11/15/2021-10:37:08] [I] Output:
[11/15/2021-10:37:08] [I] === Build Options ===
[11/15/2021-10:37:08] [I] Max batch: explicit
[11/15/2021-10:37:08] [I] Workspace: 10240 MiB
[11/15/2021-10:37:08] [I] minTiming: 1
[11/15/2021-10:37:08] [I] avgTiming: 8
[11/15/2021-10:37:08] [I] Precision: FP32+INT8
[11/15/2021-10:37:08] [I] Calibration: Dynamic
[11/15/2021-10:37:08] [I] Refit: Disabled
[11/15/2021-10:37:08] [I] Sparsity: Disabled
[11/15/2021-10:37:08] [I] Safe mode: Disabled
[11/15/2021-10:37:08] [I] Restricted mode: Disabled
[11/15/2021-10:37:08] [I] Save engine: /root/yolov5s-6.0-qat-int8.engine
[11/15/2021-10:37:08] [I] Load engine:
[11/15/2021-10:37:08] [I] NVTX verbosity: 0
[11/15/2021-10:37:08] [I] Tactic sources: Using default tactic sources
[11/15/2021-10:37:08] [I] timingCacheMode: local
[11/15/2021-10:37:08] [I] timingCacheFile:
[11/15/2021-10:37:08] [I] Input(s)s format: fp32:CHW
[11/15/2021-10:37:08] [I] Output(s)s format: fp32:CHW
[11/15/2021-10:37:08] [I] Input build shapes: model
[11/15/2021-10:37:08] [I] Input calibration shapes: model
[11/15/2021-10:37:08] [I] === System Options ===
[11/15/2021-10:37:08] [I] Device: 0
[11/15/2021-10:37:08] [I] DLACore:
[11/15/2021-10:37:08] [I] Plugins: /root/workspace/plugins/YoloLayer_TRT_v6.0/build/libyolo.so
[11/15/2021-10:37:08] [I] === Inference Options ===
[11/15/2021-10:37:08] [I] Batch: Explicit
[11/15/2021-10:37:08] [I] Input inference shapes: model
[11/15/2021-10:37:08] [I] Iterations: 10
[11/15/2021-10:37:08] [I] Duration: 3s (+ 200ms warm up)
[11/15/2021-10:37:08] [I] Sleep time: 0ms
[11/15/2021-10:37:08] [I] Streams: 1
[11/15/2021-10:37:08] [I] ExposeDMA: Disabled
[11/15/2021-10:37:08] [I] Data transfers: Enabled
[11/15/2021-10:37:08] [I] Spin-wait: Disabled
[11/15/2021-10:37:08] [I] Multithreading: Disabled
[11/15/2021-10:37:08] [I] CUDA Graph: Disabled
[11/15/2021-10:37:08] [I] Separate profiling: Disabled
[11/15/2021-10:37:08] [I] Time Deserialize: Disabled
[11/15/2021-10:37:08] [I] Time Refit: Disabled
[11/15/2021-10:37:08] [I] Skip inference: Disabled
[11/15/2021-10:37:08] [I] Inputs:
[11/15/2021-10:37:08] [I] === Reporting Options ===
[11/15/2021-10:37:08] [I] Verbose: Disabled
[11/15/2021-10:37:08] [I] Averages: 10 inferences
[11/15/2021-10:37:08] [I] Percentile: 99
[11/15/2021-10:37:08] [I] Dump refittable layers:Disabled
[11/15/2021-10:37:08] [I] Dump output: Disabled
[11/15/2021-10:37:08] [I] Profile: Disabled
[11/15/2021-10:37:08] [I] Export timing to JSON file:
[11/15/2021-10:37:08] [I] Export output to JSON file:
[11/15/2021-10:37:08] [I] Export profile to JSON file:
[11/15/2021-10:37:08] [I]
[11/15/2021-10:37:08] [I] === Device Information ===
[11/15/2021-10:37:08] [I] Selected Device: Tesla T4
[11/15/2021-10:37:08] [I] Compute Capability: 7.5
[11/15/2021-10:37:08] [I] SMs: 40
[11/15/2021-10:37:08] [I] Compute Clock Rate: 1.59 GHz
[11/15/2021-10:37:08] [I] Device Global Memory: 15109 MiB
[11/15/2021-10:37:08] [I] Shared Memory per SM: 64 KiB
[11/15/2021-10:37:08] [I] Memory Bus Width: 256 bits (ECC enabled)
[11/15/2021-10:37:08] [I] Memory Clock Rate: 5.001 GHz
[11/15/2021-10:37:08] [I]
[11/15/2021-10:37:08] [I] TensorRT version: 8003
[11/15/2021-10:37:08] [I] Loading supplied plugin library: /root/workspace/plugins/YoloLayer_TRT_v6.0/build/libyolo.so
[11/15/2021-10:37:08] [I] [TRT] [MemUsageChange] Init CUDA: CPU +328, GPU +0, now: CPU 335, GPU 1083 (MiB)
[11/15/2021-10:37:08] [I] Start parsing network model
[11/15/2021-10:37:08] [I] [TRT] ----------------------------------------------------------------
[11/15/2021-10:37:08] [I] [TRT] Input filename: /root/workspace/onnx/yolov5s-6.0-qat-yolo-op.onnx
[11/15/2021-10:37:08] [I] [TRT] ONNX IR version: 0.0.7
[11/15/2021-10:37:08] [I] [TRT] Opset version: 12
[11/15/2021-10:37:08] [I] [TRT] Producer name: pytorch
[11/15/2021-10:37:08] [I] [TRT] Producer version: 1.10
[11/15/2021-10:37:08] [I] [TRT] Domain:
[11/15/2021-10:37:08] [I] [TRT] Model version: 0
[11/15/2021-10:37:08] [I] [TRT] Doc string:
[11/15/2021-10:37:08] [I] [TRT] ----------------------------------------------------------------
[11/15/2021-10:37:09] [I] [TRT] No importer registered for op: YoloLayer_TRT. Attempting to import as plugin.
[11/15/2021-10:37:09] [I] [TRT] Searching for plugin: YoloLayer_TRT, plugin_version: 1, plugin_namespace:
[11/15/2021-10:37:09] [I] [TRT] Successfully created plugin: YoloLayer_TRT
[11/15/2021-10:37:09] [I] Finish parsing network model
[11/15/2021-10:37:09] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 367, GPU 1085 (MiB)
[11/15/2021-10:37:09] [I] FP32 and INT8 precisions have been specified - more performance might be enabled by additionally specifying --fp16 or --best
[11/15/2021-10:37:09] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 367 MiB, GPU 1091 MiB
[11/15/2021-10:37:09] [W] [TRT] Calibrator won't be used in explicit precision mode. Use quantization aware training to generate network with Quantize/Dequantize nodes.
[11/15/2021-10:37:11] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +496, GPU +212, now: CPU 891, GPU 1303 (MiB)
[11/15/2021-10:37:11] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +169, GPU +204, now: CPU 1060, GPU 1507 (MiB)
[11/15/2021-10:37:11] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[11/15/2021-10:38:51] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[11/15/2021-10:38:53] [I] [TRT] Total Host Persistent Memory: 131200
[11/15/2021-10:38:53] [I] [TRT] Total Device Persistent Memory: 9769472
[11/15/2021-10:38:53] [I] [TRT] Total Scratch Memory: 0
[11/15/2021-10:38:53] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 34 MiB, GPU 4 MiB
[11/15/2021-10:38:53] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1078, GPU 1529 (MiB)
[11/15/2021-10:38:53] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 1078, GPU 1539 (MiB)
[11/15/2021-10:38:53] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1078, GPU 1523 (MiB)
[11/15/2021-10:38:53] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1078, GPU 1507 (MiB)
[11/15/2021-10:38:53] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 1050 MiB, GPU 1507 MiB
[11/15/2021-10:38:53] [I] [TRT] Loaded engine size: 18 MB
[11/15/2021-10:38:53] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 1054 MiB, GPU 1495 MiB
[11/15/2021-10:38:54] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1063, GPU 1515 (MiB)
[11/15/2021-10:38:54] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1063, GPU 1523 (MiB)
[11/15/2021-10:38:54] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1063, GPU 1507 (MiB)
[11/15/2021-10:38:54] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1063 MiB, GPU 1507 MiB
[11/15/2021-10:38:54] [I] Engine built in 105.872 sec.
[11/15/2021-10:38:54] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1013 MiB, GPU 1501 MiB
[11/15/2021-10:38:54] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1013, GPU 1509 (MiB)
[11/15/2021-10:38:54] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1013, GPU 1517 (MiB)
[11/15/2021-10:38:54] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 1013 MiB, GPU 1541 MiB
[11/15/2021-10:38:54] [I] Created input binding for inputs.1 with dimensions 1x3x640x640
[11/15/2021-10:38:54] [I] Created output binding for output with dimensions 1x6001x1x1
[11/15/2021-10:38:54] [I] Starting inference
[11/15/2021-10:38:57] [I] Warmup completed 61 queries over 200 ms
[11/15/2021-10:38:57] [I] Timing trace has 1685 queries over 3.00539 s
[11/15/2021-10:38:57] [I]
[11/15/2021-10:38:57] [I] === Trace details ===
[11/15/2021-10:38:57] [I] Trace averages of 10 runs:
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 2.46257 ms - Host latency: 2.88812 ms (end to end 4.68096 ms, enqueue 0.908342 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 2.47009 ms - Host latency: 2.89539 ms (end to end 4.70007 ms, enqueue 0.904158 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 2.45307 ms - Host latency: 2.87979 ms (end to end 4.67335 ms, enqueue 0.906158 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 2.46927 ms - Host latency: 2.89522 ms (end to end 4.68937 ms, enqueue 0.922183 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 2.39604 ms - Host latency: 2.81891 ms (end to end 4.27952 ms, enqueue 0.926205 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 2.47107 ms - Host latency: 2.90086 ms (end to end 4.70127 ms, enqueue 0.913327 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 2.45857 ms - Host latency: 2.88517 ms (end to end 4.67025 ms, enqueue 0.924789 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 2.43994 ms - Host latency: 2.86543 ms (end to end 4.44682 ms, enqueue 0.915842 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.75262 ms - Host latency: 2.17934 ms (end to end 3.32601 ms, enqueue 0.896097 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.70791 ms - Host latency: 2.13787 ms (end to end 3.18519 ms, enqueue 0.808716 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.70157 ms - Host latency: 2.12982 ms (end to end 3.16102 ms, enqueue 0.800952 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72601 ms - Host latency: 2.14966 ms (end to end 3.23792 ms, enqueue 0.754947 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.71359 ms - Host latency: 2.14054 ms (end to end 3.18638 ms, enqueue 0.813907 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72821 ms - Host latency: 2.15646 ms (end to end 3.21668 ms, enqueue 0.799905 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.71794 ms - Host latency: 2.14929 ms (end to end 3.19819 ms, enqueue 0.804492 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72185 ms - Host latency: 2.15359 ms (end to end 3.19664 ms, enqueue 0.791705 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73903 ms - Host latency: 2.17161 ms (end to end 3.2324 ms, enqueue 0.828564 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.7116 ms - Host latency: 2.13887 ms (end to end 3.21057 ms, enqueue 0.761945 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.71721 ms - Host latency: 2.1476 ms (end to end 3.21354 ms, enqueue 0.78523 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73155 ms - Host latency: 2.16293 ms (end to end 3.25398 ms, enqueue 0.76814 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72757 ms - Host latency: 2.16324 ms (end to end 3.23115 ms, enqueue 0.80824 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73378 ms - Host latency: 2.16926 ms (end to end 3.22949 ms, enqueue 0.767999 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72326 ms - Host latency: 2.16119 ms (end to end 3.22219 ms, enqueue 0.796423 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.70734 ms - Host latency: 2.14218 ms (end to end 3.23209 ms, enqueue 0.808447 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.71291 ms - Host latency: 2.14811 ms (end to end 3.26492 ms, enqueue 0.760291 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73419 ms - Host latency: 2.17388 ms (end to end 3.25743 ms, enqueue 0.788818 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74154 ms - Host latency: 2.18034 ms (end to end 3.25358 ms, enqueue 0.785052 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73814 ms - Host latency: 2.18118 ms (end to end 3.24785 ms, enqueue 0.827289 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73383 ms - Host latency: 2.17649 ms (end to end 3.22078 ms, enqueue 0.808478 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73724 ms - Host latency: 2.17607 ms (end to end 3.24731 ms, enqueue 0.791174 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73226 ms - Host latency: 2.17265 ms (end to end 3.22481 ms, enqueue 0.811847 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72568 ms - Host latency: 2.16622 ms (end to end 3.1043 ms, enqueue 0.809778 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72687 ms - Host latency: 2.16203 ms (end to end 3.19713 ms, enqueue 0.794348 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.70698 ms - Host latency: 2.15123 ms (end to end 3.18903 ms, enqueue 0.783514 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.7009 ms - Host latency: 2.13716 ms (end to end 3.18851 ms, enqueue 0.771063 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.71765 ms - Host latency: 2.15759 ms (end to end 3.20422 ms, enqueue 0.764685 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73049 ms - Host latency: 2.17151 ms (end to end 3.2319 ms, enqueue 0.809314 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72497 ms - Host latency: 2.16329 ms (end to end 3.22203 ms, enqueue 0.781958 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74196 ms - Host latency: 2.17874 ms (end to end 3.25093 ms, enqueue 0.782599 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74083 ms - Host latency: 2.17872 ms (end to end 3.25724 ms, enqueue 0.793689 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.71788 ms - Host latency: 2.15322 ms (end to end 3.2288 ms, enqueue 0.777338 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.7041 ms - Host latency: 2.13721 ms (end to end 3.22368 ms, enqueue 0.776337 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.70598 ms - Host latency: 2.13944 ms (end to end 3.2342 ms, enqueue 0.772162 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.7194 ms - Host latency: 2.15726 ms (end to end 3.21677 ms, enqueue 0.799792 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.71606 ms - Host latency: 2.15634 ms (end to end 3.19388 ms, enqueue 0.781934 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72277 ms - Host latency: 2.15841 ms (end to end 3.213 ms, enqueue 0.791626 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.71848 ms - Host latency: 2.15723 ms (end to end 3.20485 ms, enqueue 0.809058 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74012 ms - Host latency: 2.17684 ms (end to end 3.24817 ms, enqueue 0.805762 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73181 ms - Host latency: 2.17139 ms (end to end 3.14062 ms, enqueue 0.802454 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72764 ms - Host latency: 2.16294 ms (end to end 3.19863 ms, enqueue 0.776746 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74413 ms - Host latency: 2.18441 ms (end to end 3.26802 ms, enqueue 0.798413 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.70585 ms - Host latency: 2.14016 ms (end to end 3.03787 ms, enqueue 0.78728 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.7248 ms - Host latency: 2.1608 ms (end to end 3.22479 ms, enqueue 0.783289 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.721 ms - Host latency: 2.1605 ms (end to end 3.20961 ms, enqueue 0.805579 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.71895 ms - Host latency: 2.15479 ms (end to end 3.09076 ms, enqueue 0.762048 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72096 ms - Host latency: 2.15941 ms (end to end 3.213 ms, enqueue 0.812122 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73877 ms - Host latency: 2.17378 ms (end to end 3.25221 ms, enqueue 0.725769 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.71537 ms - Host latency: 2.15378 ms (end to end 3.19872 ms, enqueue 0.80658 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73531 ms - Host latency: 2.17061 ms (end to end 3.22424 ms, enqueue 0.776575 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.78342 ms - Host latency: 2.22501 ms (end to end 3.35361 ms, enqueue 0.787805 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.77834 ms - Host latency: 2.21953 ms (end to end 3.325 ms, enqueue 0.817163 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.78391 ms - Host latency: 2.21887 ms (end to end 3.33635 ms, enqueue 0.739929 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.75836 ms - Host latency: 2.19734 ms (end to end 3.28196 ms, enqueue 0.795691 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.75923 ms - Host latency: 2.19419 ms (end to end 3.30419 ms, enqueue 0.780859 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74238 ms - Host latency: 2.18334 ms (end to end 3.25237 ms, enqueue 0.780078 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.7377 ms - Host latency: 2.17584 ms (end to end 3.24358 ms, enqueue 0.813245 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74385 ms - Host latency: 2.18247 ms (end to end 3.25457 ms, enqueue 0.786707 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74905 ms - Host latency: 2.18354 ms (end to end 3.26614 ms, enqueue 0.762488 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.75084 ms - Host latency: 2.1896 ms (end to end 3.25221 ms, enqueue 0.788098 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72946 ms - Host latency: 2.1696 ms (end to end 3.23962 ms, enqueue 0.814478 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73198 ms - Host latency: 2.17209 ms (end to end 3.2285 ms, enqueue 0.78291 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73817 ms - Host latency: 2.17472 ms (end to end 3.24117 ms, enqueue 0.782507 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.76046 ms - Host latency: 2.19462 ms (end to end 3.30406 ms, enqueue 0.756299 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74453 ms - Host latency: 2.18051 ms (end to end 3.25906 ms, enqueue 0.795117 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.75411 ms - Host latency: 2.19076 ms (end to end 3.29264 ms, enqueue 0.782178 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72218 ms - Host latency: 2.15886 ms (end to end 3.22402 ms, enqueue 0.777478 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.71416 ms - Host latency: 2.1495 ms (end to end 3.21195 ms, enqueue 0.774658 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74603 ms - Host latency: 2.18376 ms (end to end 3.27072 ms, enqueue 0.774426 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73496 ms - Host latency: 2.17218 ms (end to end 3.23713 ms, enqueue 0.760938 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73494 ms - Host latency: 2.17576 ms (end to end 3.24827 ms, enqueue 0.796875 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73944 ms - Host latency: 2.18206 ms (end to end 3.24739 ms, enqueue 0.818616 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73345 ms - Host latency: 2.17292 ms (end to end 3.22999 ms, enqueue 0.805249 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.7668 ms - Host latency: 2.20939 ms (end to end 3.28459 ms, enqueue 0.816943 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.7912 ms - Host latency: 2.23387 ms (end to end 3.34916 ms, enqueue 0.79812 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.77559 ms - Host latency: 2.21742 ms (end to end 3.32604 ms, enqueue 0.826233 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.77119 ms - Host latency: 2.21051 ms (end to end 3.30624 ms, enqueue 0.791223 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.76114 ms - Host latency: 2.19991 ms (end to end 3.29858 ms, enqueue 0.790479 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.75033 ms - Host latency: 2.18738 ms (end to end 3.25836 ms, enqueue 0.800879 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.70786 ms - Host latency: 2.14562 ms (end to end 3.21227 ms, enqueue 0.776501 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.70946 ms - Host latency: 2.15382 ms (end to end 3.18446 ms, enqueue 0.801501 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72653 ms - Host latency: 2.16436 ms (end to end 3.22648 ms, enqueue 0.77179 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74288 ms - Host latency: 2.18427 ms (end to end 3.24706 ms, enqueue 0.795288 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73682 ms - Host latency: 2.17751 ms (end to end 3.23599 ms, enqueue 0.818542 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72874 ms - Host latency: 2.16743 ms (end to end 3.22871 ms, enqueue 0.787537 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.71444 ms - Host latency: 2.15433 ms (end to end 3.18534 ms, enqueue 0.813879 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.71544 ms - Host latency: 2.15676 ms (end to end 3.18699 ms, enqueue 0.834766 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72136 ms - Host latency: 2.16114 ms (end to end 3.18674 ms, enqueue 0.788281 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74988 ms - Host latency: 2.18982 ms (end to end 3.26364 ms, enqueue 0.790784 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74419 ms - Host latency: 2.18225 ms (end to end 3.26066 ms, enqueue 0.814709 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.71771 ms - Host latency: 2.15283 ms (end to end 3.2293 ms, enqueue 0.768884 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72081 ms - Host latency: 2.15642 ms (end to end 3.24896 ms, enqueue 0.777563 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.70541 ms - Host latency: 2.13822 ms (end to end 3.20719 ms, enqueue 0.742102 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.70753 ms - Host latency: 2.14305 ms (end to end 3.22402 ms, enqueue 0.76969 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73838 ms - Host latency: 2.17627 ms (end to end 3.26338 ms, enqueue 0.785083 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.7406 ms - Host latency: 2.18411 ms (end to end 3.25 ms, enqueue 0.795068 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74661 ms - Host latency: 2.18625 ms (end to end 3.26609 ms, enqueue 0.776318 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.7292 ms - Host latency: 2.17002 ms (end to end 3.21924 ms, enqueue 0.830469 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.7363 ms - Host latency: 2.17627 ms (end to end 3.23652 ms, enqueue 0.773511 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.71931 ms - Host latency: 2.15735 ms (end to end 3.21343 ms, enqueue 0.79646 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73037 ms - Host latency: 2.17222 ms (end to end 3.2408 ms, enqueue 0.787695 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.75481 ms - Host latency: 2.1916 ms (end to end 3.28757 ms, enqueue 0.74873 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.75125 ms - Host latency: 2.19097 ms (end to end 3.25464 ms, enqueue 0.811353 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.76099 ms - Host latency: 2.19775 ms (end to end 3.30613 ms, enqueue 0.753589 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.75195 ms - Host latency: 2.1896 ms (end to end 3.28555 ms, enqueue 0.803369 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.75168 ms - Host latency: 2.18955 ms (end to end 3.27085 ms, enqueue 0.778052 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73992 ms - Host latency: 2.1772 ms (end to end 3.24905 ms, enqueue 0.794873 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74214 ms - Host latency: 2.18108 ms (end to end 3.25603 ms, enqueue 0.768677 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.76194 ms - Host latency: 2.20027 ms (end to end 3.28638 ms, enqueue 0.798462 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.75933 ms - Host latency: 2.20205 ms (end to end 3.27732 ms, enqueue 0.800757 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.76143 ms - Host latency: 2.19922 ms (end to end 3.2865 ms, enqueue 0.770215 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.75527 ms - Host latency: 2.19399 ms (end to end 3.27375 ms, enqueue 0.791772 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73728 ms - Host latency: 2.18071 ms (end to end 3.23457 ms, enqueue 0.800806 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74785 ms - Host latency: 2.18401 ms (end to end 3.26929 ms, enqueue 0.76062 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73442 ms - Host latency: 2.17095 ms (end to end 3.2355 ms, enqueue 0.797681 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.70359 ms - Host latency: 2.14182 ms (end to end 3.19155 ms, enqueue 0.771143 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72373 ms - Host latency: 2.16013 ms (end to end 3.22339 ms, enqueue 0.755078 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.75598 ms - Host latency: 2.19402 ms (end to end 3.28696 ms, enqueue 0.796362 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.71399 ms - Host latency: 2.14856 ms (end to end 3.22329 ms, enqueue 0.78064 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.70833 ms - Host latency: 2.14067 ms (end to end 3.05164 ms, enqueue 0.75398 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.75129 ms - Host latency: 2.18652 ms (end to end 3.27588 ms, enqueue 0.743359 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72737 ms - Host latency: 2.16621 ms (end to end 3.10933 ms, enqueue 0.787988 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74927 ms - Host latency: 2.18555 ms (end to end 3.29324 ms, enqueue 0.764258 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72827 ms - Host latency: 2.16946 ms (end to end 3.23789 ms, enqueue 0.812793 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73057 ms - Host latency: 2.16917 ms (end to end 3.23389 ms, enqueue 0.784375 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.68862 ms - Host latency: 2.12178 ms (end to end 3.19524 ms, enqueue 0.75105 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72349 ms - Host latency: 2.15669 ms (end to end 3.23311 ms, enqueue 0.770264 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.75339 ms - Host latency: 2.19128 ms (end to end 3.31284 ms, enqueue 0.782617 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.75596 ms - Host latency: 2.19648 ms (end to end 3.28557 ms, enqueue 0.824048 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.76411 ms - Host latency: 2.20298 ms (end to end 3.2925 ms, enqueue 0.788916 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.77209 ms - Host latency: 2.20862 ms (end to end 3.3147 ms, enqueue 0.792188 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74829 ms - Host latency: 2.18196 ms (end to end 3.18284 ms, enqueue 0.792749 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.76697 ms - Host latency: 2.20659 ms (end to end 3.30898 ms, enqueue 0.791064 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74839 ms - Host latency: 2.18662 ms (end to end 3.25935 ms, enqueue 0.800342 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.76477 ms - Host latency: 2.20327 ms (end to end 3.299 ms, enqueue 0.768286 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.75398 ms - Host latency: 2.19626 ms (end to end 3.27725 ms, enqueue 0.824731 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74929 ms - Host latency: 2.1887 ms (end to end 3.2613 ms, enqueue 0.794849 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.7519 ms - Host latency: 2.18904 ms (end to end 3.27183 ms, enqueue 0.768384 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74189 ms - Host latency: 2.18342 ms (end to end 3.23733 ms, enqueue 0.786353 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73767 ms - Host latency: 2.17776 ms (end to end 3.25283 ms, enqueue 0.788623 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74531 ms - Host latency: 2.18618 ms (end to end 3.25496 ms, enqueue 0.79104 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74377 ms - Host latency: 2.18049 ms (end to end 3.25278 ms, enqueue 0.775903 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72844 ms - Host latency: 2.16897 ms (end to end 3.22505 ms, enqueue 0.804248 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73003 ms - Host latency: 2.17207 ms (end to end 3.22673 ms, enqueue 0.798242 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73367 ms - Host latency: 2.17188 ms (end to end 3.24482 ms, enqueue 0.767603 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.7269 ms - Host latency: 2.16604 ms (end to end 3.22195 ms, enqueue 0.804297 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73933 ms - Host latency: 2.17673 ms (end to end 3.26089 ms, enqueue 0.775415 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.7394 ms - Host latency: 2.17527 ms (end to end 3.25562 ms, enqueue 0.780859 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.72424 ms - Host latency: 2.15996 ms (end to end 3.24014 ms, enqueue 0.766113 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.7238 ms - Host latency: 2.15823 ms (end to end 3.25098 ms, enqueue 0.774194 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74309 ms - Host latency: 2.18008 ms (end to end 3.26423 ms, enqueue 0.783545 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.70872 ms - Host latency: 2.1468 ms (end to end 3.20657 ms, enqueue 0.772803 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.6877 ms - Host latency: 2.12146 ms (end to end 3.1748 ms, enqueue 0.771533 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.70217 ms - Host latency: 2.13796 ms (end to end 3.19954 ms, enqueue 0.778247 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.7468 ms - Host latency: 2.18518 ms (end to end 3.27385 ms, enqueue 0.783008 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.74905 ms - Host latency: 2.1905 ms (end to end 3.27144 ms, enqueue 0.78606 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.76587 ms - Host latency: 2.20437 ms (end to end 3.30181 ms, enqueue 0.807739 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.75076 ms - Host latency: 2.18816 ms (end to end 3.26775 ms, enqueue 0.781592 ms)
[11/15/2021-10:38:57] [I] Average on 10 runs - GPU latency: 1.73657 ms - Host latency: 2.17939 ms (end to end 3.23716 ms, enqueue 0.788892 ms)
[11/15/2021-10:38:57] [I]
[11/15/2021-10:38:57] [I] === Performance summary ===
[11/15/2021-10:38:57] [I] Throughput: 560.659 qps
[11/15/2021-10:38:57] [I] Latency: min = 2.0842 ms, max = 2.94159 ms, mean = 2.20648 ms, median = 2.17383 ms, percentile(99%) = 2.90884 ms
[11/15/2021-10:38:57] [I] End-to-End Host Latency: min = 2.12744 ms, max = 4.814 ms, mean = 3.30478 ms, median = 3.24341 ms, percentile(99%) = 4.7103 ms
[11/15/2021-10:38:57] [I] Enqueue Time: min = 0.670349 ms, max = 1.08557 ms, mean = 0.79436 ms, median = 0.784424 ms, percentile(99%) = 0.959076 ms
[11/15/2021-10:38:57] [I] H2D Latency: min = 0.407349 ms, max = 0.458923 ms, mean = 0.429892 ms, median = 0.428467 ms, percentile(99%) = 0.446045 ms
[11/15/2021-10:38:57] [I] GPU Compute Time: min = 1.65759 ms, max = 2.50467 ms, mean = 1.76951 ms, median = 1.73621 ms, percentile(99%) = 2.47987 ms
[11/15/2021-10:38:57] [I] D2H Latency: min = 0.00439453 ms, max = 0.0147705 ms, mean = 0.0070766 ms, median = 0.00689697 ms, percentile(99%) = 0.0098877 ms
[11/15/2021-10:38:57] [I] Total Host Walltime: 3.00539 s
[11/15/2021-10:38:57] [I] Total GPU Compute Time: 2.98163 s
[11/15/2021-10:38:57] [I] Explanations of the performance metrics are printed in the verbose logs.
[11/15/2021-10:38:57] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8003] # ./trtexec --onnx=/root/workspace/onnx/yolov5s-6.0-qat-yolo-op.onnx --workspace=10240 --int8 --saveEngine=/root/yolov5s-6.0-qat-int8.engine --plugins=/root/workspace/plugins/YoloLayer_TRT_v6.0/build/libyolo.so
[11/15/2021-10:38:57] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1013, GPU 1515 (MiB)
My problem is that when I run inference with this engine, it runs without errors, but the output is empty. When I use the pretrained model without fine-tuning instead, it works and the inference result is correct.
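To make "empty" concrete, here is a minimal sketch of how the 1x6001x1x1 output buffer could be inspected. It assumes the YoloLayer_TRT plugin uses the common layout of one leading float holding the detection count, followed by up to 1000 detections of 6 floats each (box center x, center y, width, height, confidence, class id), which gives 1 + 1000 * 6 = 6001. That layout, the constants, and the function name `decode_yolo_output` are assumptions for illustration and are not confirmed by the log above.

```python
from typing import List, Sequence, Tuple

# Assumed layout (not confirmed by the build log):
#   buf[0]      -> number of detections written by the plugin
#   buf[1:]     -> MAX_BOXES records of DET_SIZE floats each
DET_SIZE = 6       # assumed: cx, cy, w, h, confidence, class id
MAX_BOXES = 1000   # assumed cap, so 1 + 1000 * 6 = 6001 floats

Detection = Tuple[float, float, float, float, float, int]


def decode_yolo_output(
    buf: Sequence[float], conf_thresh: float = 0.0
) -> Tuple[int, List[Detection]]:
    """Split a flat output buffer into (reported count, kept detections)."""
    expected = 1 + MAX_BOXES * DET_SIZE
    if len(buf) != expected:
        raise ValueError(f"expected {expected} floats, got {len(buf)}")

    # The plugin is assumed to store the count as a float in slot 0.
    count = min(int(buf[0]), MAX_BOXES)

    detections: List[Detection] = []
    for i in range(count):
        start = 1 + i * DET_SIZE
        cx, cy, w, h, conf, cls = buf[start:start + DET_SIZE]
        if conf >= conf_thresh:
            detections.append((cx, cy, w, h, conf, int(cls)))
    return count, detections
```

If the reported count is 0, the plugin itself found nothing above its internal threshold. If the count is non-zero but every confidence is very low, the problem is more likely in the scores coming out of the quantized network than in the post-processing.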