@anqliu, as I shared before, the model is successfully converted to a TensorRT engine by nvinfer (please check the polygraphy output in the 1st post). The problem is that nvinfer stopped being able to load the TensorRT engines built from the CustomVision ONNX models when we upgraded the base DeepStream docker images from 6.1.1 to 6.2.

Converting the model with trtexec and pointing DeepStream at the resulting engine via model-engine-file, as you suggested, fails with the same error. The trtexec build output:
root@ds6.2:/var/lib/models# trtexec --onnx=f2fc6fde-cfa3-4948-85a5-667a95d6b281.onnx --saveEngine=f2fc6fde-cfa3-4948-85a5-667a95d6b281.trtexec.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # trtexec --onnx=f2fc6fde-cfa3-4948-85a5-667a95d6b281.onnx --saveEngine=f2fc6fde-cfa3-4948-85a5-667a95d6b281.trtexec.engine
[10/17/2023-21:53:04] [I] === Model Options ===
[10/17/2023-21:53:04] [I] Format: ONNX
[10/17/2023-21:53:04] [I] Model: f2fc6fde-cfa3-4948-85a5-667a95d6b281.onnx
[10/17/2023-21:53:04] [I] Output:
[10/17/2023-21:53:04] [I] === Build Options ===
[10/17/2023-21:53:04] [I] Max batch: explicit batch
[10/17/2023-21:53:04] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[10/17/2023-21:53:04] [I] minTiming: 1
[10/17/2023-21:53:04] [I] avgTiming: 8
[10/17/2023-21:53:04] [I] Precision: FP32
[10/17/2023-21:53:04] [I] LayerPrecisions:
[10/17/2023-21:53:04] [I] Calibration:
[10/17/2023-21:53:04] [I] Refit: Disabled
[10/17/2023-21:53:04] [I] Sparsity: Disabled
[10/17/2023-21:53:04] [I] Safe mode: Disabled
[10/17/2023-21:53:04] [I] DirectIO mode: Disabled
[10/17/2023-21:53:04] [I] Restricted mode: Disabled
[10/17/2023-21:53:04] [I] Build only: Disabled
[10/17/2023-21:53:04] [I] Save engine: f2fc6fde-cfa3-4948-85a5-667a95d6b281.trtexec.engine
[10/17/2023-21:53:04] [I] Load engine:
[10/17/2023-21:53:04] [I] Profiling verbosity: 0
[10/17/2023-21:53:04] [I] Tactic sources: Using default tactic sources
[10/17/2023-21:53:04] [I] timingCacheMode: local
[10/17/2023-21:53:04] [I] timingCacheFile:
[10/17/2023-21:53:04] [I] Heuristic: Disabled
[10/17/2023-21:53:04] [I] Preview Features: Use default preview flags.
[10/17/2023-21:53:04] [I] Input(s)s format: fp32:CHW
[10/17/2023-21:53:04] [I] Output(s)s format: fp32:CHW
[10/17/2023-21:53:04] [I] Input build shapes: model
[10/17/2023-21:53:04] [I] Input calibration shapes: model
[10/17/2023-21:53:04] [I] === System Options ===
[10/17/2023-21:53:04] [I] Device: 0
[10/17/2023-21:53:04] [I] DLACore:
[10/17/2023-21:53:04] [I] Plugins:
[10/17/2023-21:53:04] [I] === Inference Options ===
[10/17/2023-21:53:04] [I] Batch: Explicit
[10/17/2023-21:53:04] [I] Input inference shapes: model
[10/17/2023-21:53:04] [I] Iterations: 10
[10/17/2023-21:53:04] [I] Duration: 3s (+ 200ms warm up)
[10/17/2023-21:53:04] [I] Sleep time: 0ms
[10/17/2023-21:53:04] [I] Idle time: 0ms
[10/17/2023-21:53:04] [I] Streams: 1
[10/17/2023-21:53:04] [I] ExposeDMA: Disabled
[10/17/2023-21:53:04] [I] Data transfers: Enabled
[10/17/2023-21:53:04] [I] Spin-wait: Disabled
[10/17/2023-21:53:04] [I] Multithreading: Disabled
[10/17/2023-21:53:04] [I] CUDA Graph: Disabled
[10/17/2023-21:53:04] [I] Separate profiling: Disabled
[10/17/2023-21:53:04] [I] Time Deserialize: Disabled
[10/17/2023-21:53:04] [I] Time Refit: Disabled
[10/17/2023-21:53:04] [I] NVTX verbosity: 0
[10/17/2023-21:53:04] [I] Persistent Cache Ratio: 0
[10/17/2023-21:53:04] [I] Inputs:
[10/17/2023-21:53:04] [I] === Reporting Options ===
[10/17/2023-21:53:04] [I] Verbose: Disabled
[10/17/2023-21:53:04] [I] Averages: 10 inferences
[10/17/2023-21:53:04] [I] Percentiles: 90,95,99
[10/17/2023-21:53:04] [I] Dump refittable layers:Disabled
[10/17/2023-21:53:04] [I] Dump output: Disabled
[10/17/2023-21:53:04] [I] Profile: Disabled
[10/17/2023-21:53:04] [I] Export timing to JSON file:
[10/17/2023-21:53:04] [I] Export output to JSON file:
[10/17/2023-21:53:04] [I] Export profile to JSON file:
[10/17/2023-21:53:04] [I]
[10/17/2023-21:53:04] [I] === Device Information ===
[10/17/2023-21:53:04] [I] Selected Device: NVIDIA GeForce GTX 1060 6GB
[10/17/2023-21:53:04] [I] Compute Capability: 6.1
[10/17/2023-21:53:04] [I] SMs: 10
[10/17/2023-21:53:04] [I] Compute Clock Rate: 1.7335 GHz
[10/17/2023-21:53:04] [I] Device Global Memory: 6064 MiB
[10/17/2023-21:53:04] [I] Shared Memory per SM: 96 KiB
[10/17/2023-21:53:04] [I] Memory Bus Width: 192 bits (ECC disabled)
[10/17/2023-21:53:04] [I] Memory Clock Rate: 4.004 GHz
[10/17/2023-21:53:04] [I]
[10/17/2023-21:53:04] [I] TensorRT version: 8.5.2
[10/17/2023-21:53:04] [I] [TRT] [MemUsageChange] Init CUDA: CPU +9, GPU +0, now: CPU 22, GPU 1128 (MiB)
[10/17/2023-21:53:05] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +122, GPU +22, now: CPU 199, GPU 1151 (MiB)
[10/17/2023-21:53:05] [I] Start parsing network model
[10/17/2023-21:53:05] [I] [TRT] ----------------------------------------------------------------
[10/17/2023-21:53:05] [I] [TRT] Input filename: f2fc6fde-cfa3-4948-85a5-667a95d6b281.onnx
[10/17/2023-21:53:05] [I] [TRT] ONNX IR version: 0.0.4
[10/17/2023-21:53:05] [I] [TRT] Opset version: 10
[10/17/2023-21:53:05] [I] [TRT] Producer name: customvision
[10/17/2023-21:53:05] [I] [TRT] Producer version:
[10/17/2023-21:53:05] [I] [TRT] Domain:
[10/17/2023-21:53:05] [I] [TRT] Model version: 0
[10/17/2023-21:53:05] [I] [TRT] Doc string:
[10/17/2023-21:53:05] [I] [TRT] ----------------------------------------------------------------
[10/17/2023-21:53:05] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/17/2023-21:53:05] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[10/17/2023-21:53:05] [I] Finish parsing network model
[10/17/2023-21:53:05] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +12, now: CPU 218, GPU 1162 (MiB)
[10/17/2023-21:53:05] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 219, GPU 1172 (MiB)
[10/17/2023-21:53:05] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[10/17/2023-21:55:28] [I] [TRT] Total Activation Memory: 6422669824
[10/17/2023-21:55:28] [I] [TRT] Detected 1 inputs and 3 output network tensors.
[10/17/2023-21:55:28] [I] [TRT] Total Host Persistent Memory: 178960
[10/17/2023-21:55:28] [I] [TRT] Total Device Persistent Memory: 863744
[10/17/2023-21:55:28] [I] [TRT] Total Scratch Memory: 102720
[10/17/2023-21:55:28] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 1 MiB, GPU 905 MiB
[10/17/2023-21:55:28] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 129 steps to complete.
[10/17/2023-21:55:28] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 4.53997ms to assign 8 blocks to 129 nodes requiring 14770688 bytes.
[10/17/2023-21:55:28] [I] [TRT] Total Activation Memory: 14770688
[10/17/2023-21:55:28] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 393, GPU 1185 (MiB)
[10/17/2023-21:55:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +11, now: CPU 0, GPU 11 (MiB)
[10/17/2023-21:55:28] [I] Engine built in 144.201 sec.
[10/17/2023-21:55:28] [I] [TRT] Loaded engine size: 11 MiB
[10/17/2023-21:55:28] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 270, GPU 1156 (MiB)
[10/17/2023-21:55:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +11, now: CPU 0, GPU 11 (MiB)
[10/17/2023-21:55:28] [I] Engine deserialized in 0.00559827 sec.
[10/17/2023-21:55:28] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 271, GPU 1156 (MiB)
[10/17/2023-21:55:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +15, now: CPU 0, GPU 26 (MiB)
[10/17/2023-21:55:28] [I] Setting persistentCacheLimit to 0 bytes.
[10/17/2023-21:55:28] [I] Using random values for input image_tensor
[10/17/2023-21:55:28] [I] Created input binding for image_tensor with dimensions 1x3x320x320
[10/17/2023-21:55:28] [I] Using random values for output detected_boxes
[10/17/2023-21:55:28] [I] Created output binding for detected_boxes with dimensions 1x-1x4
[10/17/2023-21:55:28] [I] Using random values for output detected_classes
[10/17/2023-21:55:28] [I] Created output binding for detected_classes with dimensions 1x-1
[10/17/2023-21:55:28] [I] Using random values for output detected_scores
[10/17/2023-21:55:28] [I] Created output binding for detected_scores with dimensions 1x-1
[10/17/2023-21:55:28] [I] Starting inference
[10/17/2023-21:55:31] [I] Warmup completed 79 queries over 200 ms
[10/17/2023-21:55:31] [I] Timing trace has 1229 queries over 3.0043 s
[10/17/2023-21:55:31] [I]
[10/17/2023-21:55:31] [I] === Trace details ===
[10/17/2023-21:55:31] [I] Trace averages of 10 runs:
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31973 ms - Host latency: 2.42259 ms (enqueue 2.41859 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.32186 ms - Host latency: 2.42467 ms (enqueue 2.42161 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.3131 ms - Host latency: 2.41595 ms (enqueue 2.41131 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31678 ms - Host latency: 2.41966 ms (enqueue 2.41548 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31763 ms - Host latency: 2.42049 ms (enqueue 2.41635 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31451 ms - Host latency: 2.41735 ms (enqueue 2.41342 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31463 ms - Host latency: 2.41749 ms (enqueue 2.41418 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31701 ms - Host latency: 2.41986 ms (enqueue 2.41613 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31414 ms - Host latency: 2.41703 ms (enqueue 2.41413 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31829 ms - Host latency: 2.42116 ms (enqueue 2.41721 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.3135 ms - Host latency: 2.41638 ms (enqueue 2.41246 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.32056 ms - Host latency: 2.42345 ms (enqueue 2.41922 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31795 ms - Host latency: 2.42081 ms (enqueue 2.41721 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31518 ms - Host latency: 2.41808 ms (enqueue 2.414 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31739 ms - Host latency: 2.42026 ms (enqueue 2.41559 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.47253 ms - Host latency: 2.57542 ms (enqueue 2.57201 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.49191 ms - Host latency: 2.59456 ms (enqueue 2.58906 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.46044 ms - Host latency: 2.56313 ms (enqueue 2.56777 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31829 ms - Host latency: 2.42118 ms (enqueue 2.41783 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.7308 ms - Host latency: 2.83388 ms (enqueue 2.89098 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.45786 ms - Host latency: 2.56044 ms (enqueue 2.55618 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.32646 ms - Host latency: 2.42935 ms (enqueue 2.42617 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31478 ms - Host latency: 2.41767 ms (enqueue 2.41423 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31757 ms - Host latency: 2.42041 ms (enqueue 2.41624 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31522 ms - Host latency: 2.4181 ms (enqueue 2.41348 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31631 ms - Host latency: 2.41916 ms (enqueue 2.4152 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31368 ms - Host latency: 2.41663 ms (enqueue 2.41191 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31979 ms - Host latency: 2.42268 ms (enqueue 2.41824 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31368 ms - Host latency: 2.41667 ms (enqueue 2.4126 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31882 ms - Host latency: 2.42197 ms (enqueue 2.41763 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31761 ms - Host latency: 2.42047 ms (enqueue 2.41586 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31475 ms - Host latency: 2.41752 ms (enqueue 2.41432 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31881 ms - Host latency: 2.42152 ms (enqueue 2.41708 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.3171 ms - Host latency: 2.42001 ms (enqueue 2.41503 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31754 ms - Host latency: 2.42045 ms (enqueue 2.4153 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31796 ms - Host latency: 2.42086 ms (enqueue 2.41637 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31564 ms - Host latency: 2.41842 ms (enqueue 2.41448 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31343 ms - Host latency: 2.41617 ms (enqueue 2.41155 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31865 ms - Host latency: 2.42137 ms (enqueue 2.41847 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31715 ms - Host latency: 2.41998 ms (enqueue 2.4165 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.50616 ms - Host latency: 2.60905 ms (enqueue 2.60515 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31615 ms - Host latency: 2.41895 ms (enqueue 2.41543 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31505 ms - Host latency: 2.41794 ms (enqueue 2.41404 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31499 ms - Host latency: 2.4178 ms (enqueue 2.41389 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.3187 ms - Host latency: 2.42153 ms (enqueue 2.41807 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31826 ms - Host latency: 2.42114 ms (enqueue 2.41752 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31298 ms - Host latency: 2.41581 ms (enqueue 2.41277 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31648 ms - Host latency: 2.41921 ms (enqueue 2.4149 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31952 ms - Host latency: 2.42233 ms (enqueue 2.41819 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.3187 ms - Host latency: 2.42147 ms (enqueue 2.41781 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31492 ms - Host latency: 2.41787 ms (enqueue 2.41442 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31595 ms - Host latency: 2.41881 ms (enqueue 2.41509 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31737 ms - Host latency: 2.42006 ms (enqueue 2.41654 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31732 ms - Host latency: 2.42002 ms (enqueue 2.41702 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.3139 ms - Host latency: 2.41681 ms (enqueue 2.41322 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31633 ms - Host latency: 2.41903 ms (enqueue 2.41432 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31753 ms - Host latency: 2.42031 ms (enqueue 2.41647 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31284 ms - Host latency: 2.41566 ms (enqueue 2.41185 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31868 ms - Host latency: 2.42159 ms (enqueue 2.41747 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31611 ms - Host latency: 2.41891 ms (enqueue 2.4155 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31678 ms - Host latency: 2.41949 ms (enqueue 2.41525 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31664 ms - Host latency: 2.41965 ms (enqueue 2.41554 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31506 ms - Host latency: 2.41786 ms (enqueue 2.41386 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31393 ms - Host latency: 2.4168 ms (enqueue 2.41304 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.43423 ms - Host latency: 2.5371 ms (enqueue 2.53331 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.38359 ms - Host latency: 2.4864 ms (enqueue 2.48402 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31864 ms - Host latency: 2.42151 ms (enqueue 2.4179 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31628 ms - Host latency: 2.41902 ms (enqueue 2.41582 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31569 ms - Host latency: 2.41862 ms (enqueue 2.41484 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31718 ms - Host latency: 2.41997 ms (enqueue 2.41484 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31787 ms - Host latency: 2.42064 ms (enqueue 2.41671 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31422 ms - Host latency: 2.41704 ms (enqueue 2.41296 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31641 ms - Host latency: 2.41924 ms (enqueue 2.41511 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31768 ms - Host latency: 2.42047 ms (enqueue 2.41626 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.3171 ms - Host latency: 2.41989 ms (enqueue 2.41616 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31873 ms - Host latency: 2.42158 ms (enqueue 2.41798 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31558 ms - Host latency: 2.41829 ms (enqueue 2.41484 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31929 ms - Host latency: 2.42217 ms (enqueue 2.41897 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31741 ms - Host latency: 2.42014 ms (enqueue 2.41704 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31316 ms - Host latency: 2.41592 ms (enqueue 2.41228 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31641 ms - Host latency: 2.41914 ms (enqueue 2.41565 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31589 ms - Host latency: 2.4186 ms (enqueue 2.41499 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31604 ms - Host latency: 2.4188 ms (enqueue 2.41543 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31494 ms - Host latency: 2.41777 ms (enqueue 2.41418 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31682 ms - Host latency: 2.41965 ms (enqueue 2.41621 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31916 ms - Host latency: 2.422 ms (enqueue 2.41804 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31489 ms - Host latency: 2.41763 ms (enqueue 2.41438 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31648 ms - Host latency: 2.41921 ms (enqueue 2.41594 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31616 ms - Host latency: 2.41895 ms (enqueue 2.41465 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.40559 ms - Host latency: 2.50862 ms (enqueue 2.49448 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.32893 ms - Host latency: 2.43162 ms (enqueue 2.42742 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31572 ms - Host latency: 2.41863 ms (enqueue 2.41384 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.32039 ms - Host latency: 2.42319 ms (enqueue 2.41858 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.3176 ms - Host latency: 2.42031 ms (enqueue 2.41619 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31453 ms - Host latency: 2.41738 ms (enqueue 2.41306 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.3187 ms - Host latency: 2.42146 ms (enqueue 2.41787 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31863 ms - Host latency: 2.42163 ms (enqueue 2.41765 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31965 ms - Host latency: 2.42263 ms (enqueue 2.4186 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31572 ms - Host latency: 2.41863 ms (enqueue 2.41423 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31626 ms - Host latency: 2.41897 ms (enqueue 2.41545 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31804 ms - Host latency: 2.42097 ms (enqueue 2.41646 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31738 ms - Host latency: 2.42034 ms (enqueue 2.41672 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31699 ms - Host latency: 2.4198 ms (enqueue 2.41638 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31255 ms - Host latency: 2.41577 ms (enqueue 2.41123 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31013 ms - Host latency: 2.41296 ms (enqueue 2.40928 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31846 ms - Host latency: 2.42109 ms (enqueue 2.41726 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31631 ms - Host latency: 2.41912 ms (enqueue 2.41611 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31655 ms - Host latency: 2.41934 ms (enqueue 2.41616 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31331 ms - Host latency: 2.41609 ms (enqueue 2.41235 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31274 ms - Host latency: 2.4156 ms (enqueue 2.41135 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.32024 ms - Host latency: 2.42302 ms (enqueue 2.41902 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.3176 ms - Host latency: 2.42031 ms (enqueue 2.4166 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31653 ms - Host latency: 2.41936 ms (enqueue 2.41516 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31455 ms - Host latency: 2.41721 ms (enqueue 2.41426 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.41228 ms - Host latency: 2.51489 ms (enqueue 2.51589 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31638 ms - Host latency: 2.41921 ms (enqueue 2.41531 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31597 ms - Host latency: 2.41873 ms (enqueue 2.41497 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31682 ms - Host latency: 2.41963 ms (enqueue 2.41614 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31777 ms - Host latency: 2.42063 ms (enqueue 2.41709 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31672 ms - Host latency: 2.41951 ms (enqueue 2.41514 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31475 ms - Host latency: 2.41755 ms (enqueue 2.41377 ms)
[10/17/2023-21:55:31] [I] Average on 10 runs - GPU latency: 2.31409 ms - Host latency: 2.41675 ms (enqueue 2.41296 ms)
[10/17/2023-21:55:31] [I]
[10/17/2023-21:55:31] [I] === Performance summary ===
[10/17/2023-21:55:31] [I] Throughput: 409.08 qps
[10/17/2023-21:55:31] [I] Latency: min = 2.40259 ms, max = 4.48938 ms, mean = 2.43246 ms, median = 2.41895 ms, percentile(90%) = 2.43066 ms, percentile(95%) = 2.43835 ms, percentile(99%) = 2.83197 ms
[10/17/2023-21:55:31] [I] Enqueue Time: min = 2.39722 ms, max = 4.48743 ms, mean = 2.42911 ms, median = 2.41504 ms, percentile(90%) = 2.42761 ms, percentile(95%) = 2.43509 ms, percentile(99%) = 2.82538 ms
[10/17/2023-21:55:31] [I] H2D Latency: min = 0.0973511 ms, max = 0.0991821 ms, mean = 0.0982423 ms, median = 0.0982666 ms, percentile(90%) = 0.0983887 ms, percentile(95%) = 0.0985107 ms, percentile(99%) = 0.0986328 ms
[10/17/2023-21:55:31] [I] GPU Compute Time: min = 2.2998 ms, max = 4.38562 ms, mean = 2.32963 ms, median = 2.31616 ms, percentile(90%) = 2.32764 ms, percentile(95%) = 2.33569 ms, percentile(99%) = 2.72888 ms
[10/17/2023-21:55:31] [I] D2H Latency: min = 0.00408936 ms, max = 0.00805664 ms, mean = 0.00458358 ms, median = 0.0045166 ms, percentile(90%) = 0.00476074 ms, percentile(95%) = 0.00488281 ms, percentile(99%) = 0.00488281 ms
[10/17/2023-21:55:31] [I] Total Host Walltime: 3.0043 s
[10/17/2023-21:55:31] [I] Total GPU Compute Time: 2.86312 s
[10/17/2023-21:55:31] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[10/17/2023-21:55:31] [W] If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[10/17/2023-21:55:31] [W] * GPU compute time is unstable, with coefficient of variance = 4.44933%.
[10/17/2023-21:55:31] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[10/17/2023-21:55:31] [I] Explanations of the performance metrics are printed in the verbose logs.
[10/17/2023-21:55:31] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8502] # trtexec --onnx=f2fc6fde-cfa3-4948-85a5-667a95d6b281.onnx --saveEngine=f2fc6fde-cfa3-4948-85a5-667a95d6b281.trtexec.engine
TRT model info:
root@ds6.2:/var/lib/models# polygraphy inspect model f2fc6fde-cfa3-4948-85a5-667a95d6b281.trtexec.engine
[I] Loading bytes from /var/lib/models/f2fc6fde-cfa3-4948-85a5-667a95d6b281.trtexec.engine
[I] ==== TensorRT Engine ====
Name: Unnamed Network 0 | Explicit Batch Engine
---- 1 Engine Input(s) ----
{image_tensor [dtype=float32, shape=(1, 3, 320, 320)]}
---- 3 Engine Output(s) ----
{detected_boxes [dtype=float32, shape=(1, -1, 4)],
detected_classes [dtype=int32, shape=(1, -1)],
detected_scores [dtype=float32, shape=(1, -1)]}
---- Memory ----
Device Memory: 14770688 bytes
---- 1 Profile(s) (4 Tensor(s) Each) ----
- Profile: 0
Tensor: image_tensor (Input), Index: 0 | Shapes: min=(1, 3, 320, 320), opt=(1, 3, 320, 320), max=(1, 3, 320, 320)
Tensor: detected_boxes (Output), Index: 1 | Shape: (1, -1, 4)
Tensor: detected_classes (Output), Index: 2 | Shape: (1, -1)
Tensor: detected_scores (Output), Index: 3 | Shape: (1, -1)
---- 140 Layer(s) ----
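For completeness, the same binding information can be read with the TensorRT Python API inside the container. Below is a minimal sketch (assuming the tensorrt Python bindings are available in the 6.2 image; the engine path is the one used above) that prints the explicit-batch flag and the per-binding shapes and volumes. Note the non-positive volumes for the wildcard outputs, which is exactly what the assertion shown further down trips over.

import tensorrt as trt

# Engine built by trtexec above; path inside the DeepStream 6.2 container.
ENGINE_PATH = "/var/lib/models/f2fc6fde-cfa3-4948-85a5-667a95d6b281.trtexec.engine"

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open(ENGINE_PATH, "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# Built from an explicit-batch ONNX network, so this reports False.
print("implicit batch dimension:", engine.has_implicit_batch_dimension)

for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    kind = "input" if engine.binding_is_input(i) else "output"
    # A shape containing -1 has no positive element count; trt.volume() just
    # multiplies the dims, so the wildcard outputs come out negative here.
    print(f"{kind:6s} {engine.get_binding_name(i)}: shape={tuple(shape)} "
          f"volume={trt.volume(shape)}")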
The nvinfer config, hard-coded to load the previously generated TensorRT engine:
root@ds6.2:/var/lib/models# cat f2fc6fde-cfa3-4948-85a5-667a95d6b281.91c0db8e-e0c8-4b89-8ef8-52d3298ea32c.config
[property]
model-engine-file=f2fc6fde-cfa3-4948-85a5-667a95d6b281.trtexec.engine
labelfile-path=f2fc6fde-cfa3-4948-85a5-667a95d6b281.labels
num-detected-classes=1
net-scale-factor=1
model-color-format=0
network-mode=0
infer-dims=3;320;320
output-blob-names=detected_boxes;detected_classes;detected_scores
cluster-mode=4
network-type=0
parse-bbox-func-name=DisableParsing
custom-lib-path=/opt/lib/objectdetector.so
[class-attrs-all]
pre-cluster-threshold=0.1
The output of the DeepStream deployment:
2023-10-17T22:07:45.342680Z INFO run{deployment_id=91c0db8e-e0c8-4b89-8ef8-52d3298ea32c}:run_pipeline_inner: gst_runner::gstreamer_log: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1909> [UID = 1]: deserialized trt engine from :/var/lib/models/f2fc6fde-cfa3-4948-85a5-667a95d6b281.trtexec.engine gst_level=INFO category=nvinfer object=model_inference1
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:512 ImplicitTrtBackend initialize failed because bindings has wildcard dims
2023-10-17T22:07:45.352089Z INFO run{deployment_id=91c0db8e-e0c8-4b89-8ef8-52d3298ea32c}:run_pipeline_inner: gst_runner::gstreamer_log: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2012> [UID = 1]: Use deserialized engine model: /var/lib/models/f2fc6fde-cfa3-4948-85a5-667a95d6b281.trtexec.engine gst_level=INFO category=nvinfer object=model_inference1
gst-runner: nvdsinfer_context_impl.cpp:1421: NvDsInferStatus nvdsinfer::NvDsInferContextImpl::allocateBuffers(): Assertion `bindingDims.numElements > 0' failed.
So the engine is an explicit-batch engine whose output bindings have wildcard (-1) dims, yet nvinfer initializes the ImplicitTrtBackend and then hits the numElements > 0 assertion in allocateBuffers(). Isn't it a bug that DeepStream tries to load a TensorRT explicit-batch engine with the ImplicitTrtBackend?
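In case it helps anyone trying to reproduce this outside our gst-runner deployment, the same config file should be loadable by a plain nvinfer element along these lines (a hedged, untested sketch; sample.h264 and the nvstreammux resolution are placeholders):

gst-launch-1.0 filesrc location=sample.h264 ! h264parse ! nvv4l2decoder ! mux.sink_0 \
    nvstreammux name=mux batch-size=1 width=1280 height=720 ! \
    nvinfer config-file-path=/var/lib/models/f2fc6fde-cfa3-4948-85a5-667a95d6b281.91c0db8e-e0c8-4b89-8ef8-52d3298ea32c.config ! \
    fakesink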