YOLO v4 inference with TensorRT after training with TLT 3.0

Hi NVES,

Here is the output from trtexec. Does it look OK?

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=model.engine --batch=1 --verbose
[04/08/2021-21:39:13] [I] === Model Options ===
[04/08/2021-21:39:13] [I] Format: *
[04/08/2021-21:39:13] [I] Model: 
[04/08/2021-21:39:13] [I] Output:
[04/08/2021-21:39:13] [I] === Build Options ===
[04/08/2021-21:39:13] [I] Max batch: 1
[04/08/2021-21:39:13] [I] Workspace: 16 MB
[04/08/2021-21:39:13] [I] minTiming: 1
[04/08/2021-21:39:13] [I] avgTiming: 8
[04/08/2021-21:39:13] [I] Precision: FP32
[04/08/2021-21:39:13] [I] Calibration: 
[04/08/2021-21:39:13] [I] Safe mode: Disabled
[04/08/2021-21:39:13] [I] Save engine: 
[04/08/2021-21:39:13] [I] Load engine: model.engine
[04/08/2021-21:39:13] [I] Builder Cache: Enabled
[04/08/2021-21:39:13] [I] NVTX verbosity: 0
[04/08/2021-21:39:13] [I] Inputs format: fp32:CHW
[04/08/2021-21:39:13] [I] Outputs format: fp32:CHW
[04/08/2021-21:39:13] [I] Input build shapes: model
[04/08/2021-21:39:13] [I] Input calibration shapes: model
[04/08/2021-21:39:13] [I] === System Options ===
[04/08/2021-21:39:13] [I] Device: 0
[04/08/2021-21:39:13] [I] DLACore: 
[04/08/2021-21:39:13] [I] Plugins:
[04/08/2021-21:39:13] [I] === Inference Options ===
[04/08/2021-21:39:13] [I] Batch: 1
[04/08/2021-21:39:13] [I] Input inference shapes: model
[04/08/2021-21:39:13] [I] Iterations: 10
[04/08/2021-21:39:13] [I] Duration: 3s (+ 200ms warm up)
[04/08/2021-21:39:13] [I] Sleep time: 0ms
[04/08/2021-21:39:13] [I] Streams: 1
[04/08/2021-21:39:13] [I] ExposeDMA: Disabled
[04/08/2021-21:39:13] [I] Spin-wait: Disabled
[04/08/2021-21:39:13] [I] Multithreading: Disabled
[04/08/2021-21:39:13] [I] CUDA Graph: Disabled
[04/08/2021-21:39:13] [I] Skip inference: Disabled
[04/08/2021-21:39:13] [I] Inputs:
[04/08/2021-21:39:13] [I] === Reporting Options ===
[04/08/2021-21:39:13] [I] Verbose: Enabled
[04/08/2021-21:39:13] [I] Averages: 10 inferences
[04/08/2021-21:39:13] [I] Percentile: 99
[04/08/2021-21:39:13] [I] Dump output: Disabled
[04/08/2021-21:39:13] [I] Profile: Disabled
[04/08/2021-21:39:13] [I] Export timing to JSON file: 
[04/08/2021-21:39:13] [I] Export output to JSON file: 
[04/08/2021-21:39:13] [I] Export profile to JSON file: 
[04/08/2021-21:39:13] [I] 
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::BatchTilePlugin_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::CoordConvAC version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::CropAndResize version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::FlattenConcat_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::GenerateDetection_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::GridAnchorRect_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::MultilevelProposeROI_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::NMS_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::Normalize_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::PriorBox_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::Proposal version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::Region_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::Reorg_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::RPROI_TRT version 1
[04/08/2021-21:39:13] [V] [TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[04/08/2021-21:39:14] [V] [TRT] Deserialize required 855265 microseconds.
[04/08/2021-21:39:14] [I] Starting inference threads
[04/08/2021-21:39:17] [I] Warmup completed 20 queries over 200 ms
[04/08/2021-21:39:17] [I] Timing trace has 316 queries over 3.01907 s
[04/08/2021-21:39:17] [I] Trace averages of 10 runs:
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.60522 ms - Host latency: 11.1722 ms (end to end 18.4379 ms, enqueue 2.38806 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.60532 ms - Host latency: 11.1729 ms (end to end 18.4686 ms, enqueue 2.37536 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.51675 ms - Host latency: 11.0845 ms (end to end 18.7923 ms, enqueue 2.38184 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.53139 ms - Host latency: 11.0985 ms (end to end 18.5644 ms, enqueue 2.37105 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.52454 ms - Host latency: 11.0943 ms (end to end 18.7998 ms, enqueue 2.37631 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.54829 ms - Host latency: 11.1189 ms (end to end 18.608 ms, enqueue 2.39402 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.50683 ms - Host latency: 11.0739 ms (end to end 18.7834 ms, enqueue 2.37681 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.52607 ms - Host latency: 11.0938 ms (end to end 18.7242 ms, enqueue 2.38451 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.5155 ms - Host latency: 11.0833 ms (end to end 18.7939 ms, enqueue 2.41888 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.53549 ms - Host latency: 11.1026 ms (end to end 18.7437 ms, enqueue 2.38383 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.64969 ms - Host latency: 11.2172 ms (end to end 18.9746 ms, enqueue 2.41281 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.56407 ms - Host latency: 11.1328 ms (end to end 18.2136 ms, enqueue 2.38313 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.51993 ms - Host latency: 11.0866 ms (end to end 18.7584 ms, enqueue 2.38447 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.53993 ms - Host latency: 11.1071 ms (end to end 18.7939 ms, enqueue 2.37832 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.53353 ms - Host latency: 11.1016 ms (end to end 18.6047 ms, enqueue 2.375 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.52291 ms - Host latency: 11.0902 ms (end to end 18.8013 ms, enqueue 2.3837 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.53189 ms - Host latency: 11.1006 ms (end to end 18.6207 ms, enqueue 2.3755 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.52517 ms - Host latency: 11.0925 ms (end to end 18.6595 ms, enqueue 2.37561 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.53262 ms - Host latency: 11.1011 ms (end to end 18.5324 ms, enqueue 2.38126 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.5319 ms - Host latency: 11.0992 ms (end to end 18.8119 ms, enqueue 2.377 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.52864 ms - Host latency: 11.0968 ms (end to end 18.7304 ms, enqueue 2.37883 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.52002 ms - Host latency: 11.0883 ms (end to end 18.7924 ms, enqueue 2.37947 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.53667 ms - Host latency: 11.1041 ms (end to end 18.6309 ms, enqueue 2.37495 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.52307 ms - Host latency: 11.093 ms (end to end 18.7872 ms, enqueue 2.37766 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.53164 ms - Host latency: 11.0994 ms (end to end 18.7299 ms, enqueue 2.38091 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.52068 ms - Host latency: 11.0883 ms (end to end 18.7975 ms, enqueue 2.37576 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.53113 ms - Host latency: 11.0993 ms (end to end 18.5183 ms, enqueue 2.37327 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.52207 ms - Host latency: 11.0889 ms (end to end 18.7955 ms, enqueue 2.38027 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.58308 ms - Host latency: 11.1534 ms (end to end 18.0532 ms, enqueue 2.37944 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.50659 ms - Host latency: 11.0748 ms (end to end 18.7812 ms, enqueue 2.37905 ms)
[04/08/2021-21:39:17] [I] Average on 10 runs - GPU latency: 9.52559 ms - Host latency: 11.0926 ms (end to end 18.68 ms, enqueue 2.37747 ms)
[04/08/2021-21:39:17] [I] Host Latency
[04/08/2021-21:39:17] [I] min: 11.0056 ms (end to end 13.7264 ms)
[04/08/2021-21:39:17] [I] max: 12.3127 ms (end to end 20.0497 ms)
[04/08/2021-21:39:17] [I] mean: 11.1058 ms (end to end 18.672 ms)
[04/08/2021-21:39:17] [I] median: 11.0933 ms (end to end 18.7994 ms)
[04/08/2021-21:39:17] [I] percentile: 11.3425 ms at 99% (end to end 19.0951 ms at 99%)
[04/08/2021-21:39:17] [I] throughput: 104.668 qps
[04/08/2021-21:39:17] [I] walltime: 3.01907 s
[04/08/2021-21:39:17] [I] Enqueue Time
[04/08/2021-21:39:17] [I] min: 2.3114 ms
[04/08/2021-21:39:17] [I] max: 2.74396 ms
[04/08/2021-21:39:17] [I] median: 2.38004 ms
[04/08/2021-21:39:17] [I] GPU Compute
[04/08/2021-21:39:17] [I] min: 9.44214 ms
[04/08/2021-21:39:17] [I] max: 10.7449 ms
[04/08/2021-21:39:17] [I] mean: 9.53782 ms
[04/08/2021-21:39:17] [I] median: 9.52533 ms
[04/08/2021-21:39:17] [I] percentile: 9.77197 ms at 99%
[04/08/2021-21:39:17] [I] total compute time: 3.01395 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=model.engine --batch=1 --verbose
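
As a quick sanity check on the log above, here is a minimal Python sketch that pulls out the summary lines and recomputes throughput as queries divided by walltime. The `summarize` helper and the `LOG` string are my own illustration, not part of trtexec or TLT; the three log lines are copied from the output above. Note that a PASSED run only shows that the engine deserializes and executes; by default trtexec feeds random input, so it says nothing about detection accuracy.

```python
import re

# Summary lines copied verbatim from the trtexec output above.
LOG = """\
[04/08/2021-21:39:17] [I] Timing trace has 316 queries over 3.01907 s
[04/08/2021-21:39:17] [I] throughput: 104.668 qps
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=model.engine --batch=1 --verbose
"""


def summarize(log: str) -> dict:
    """Hypothetical helper: extract pass/fail and throughput from a trtexec log."""
    trace = re.search(r"Timing trace has (\d+) queries over ([\d.]+) s", log)
    qps = re.search(r"throughput: ([\d.]+) qps", log)
    if trace is None or qps is None:
        raise ValueError("log does not contain the expected summary lines")
    queries, seconds = int(trace.group(1)), float(trace.group(2))
    return {
        "passed": "&&&& PASSED" in log,
        "reported_qps": float(qps.group(1)),
        # Throughput recomputed from the timing trace: queries / walltime.
        "recomputed_qps": queries / seconds,
    }


summary = summarize(LOG)
print(summary)
```

The recomputed value should agree with the reported throughput to within rounding, which is a cheap way to confirm the log was not truncated or mixed up between runs.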

I’ll be posting the whole script in a separate reply.