Description
I am unable to load a .trt engine that was converted from an ONNX model.
I am using the container nvcr.io/nvidia/tensorrt:21.06-py3. I downloaded a TensorFlow model, converted it to ONNX, and finally to a .trt engine.
Loading the engine with trtexec seems to work fine:
root@46d03ec1349c:/files/TensorRT/build/out# trtexec --loadEngine=model/model.trt --useCudaGraph --noDataTransfers --iterations=100 --avgRuns=100 --workspace=7000
&&&& RUNNING TensorRT.trtexec # trtexec --loadEngine=model/model.trt --useCudaGraph --noDataTransfers --iterations=100 --avgRuns=100 --workspace=7000
[07/06/2021-07:18:16] [I] === Model Options ===
[07/06/2021-07:18:16] [I] Format: *
[07/06/2021-07:18:16] [I] Model:
[07/06/2021-07:18:16] [I] Output:
[07/06/2021-07:18:16] [I] === Build Options ===
[07/06/2021-07:18:16] [I] Max batch: 1
[07/06/2021-07:18:16] [I] Workspace: 7000 MiB
[07/06/2021-07:18:16] [I] minTiming: 1
[07/06/2021-07:18:16] [I] avgTiming: 8
[07/06/2021-07:18:16] [I] Precision: FP32
[07/06/2021-07:18:16] [I] Calibration:
[07/06/2021-07:18:16] [I] Refit: Disabled
[07/06/2021-07:18:16] [I] Safe mode: Disabled
[07/06/2021-07:18:16] [I] Save engine:
[07/06/2021-07:18:16] [I] Load engine: model/model.trt
[07/06/2021-07:18:16] [I] Builder Cache: Enabled
[07/06/2021-07:18:16] [I] NVTX verbosity: 0
[07/06/2021-07:18:16] [I] Tactic sources: Using default tactic sources
[07/06/2021-07:18:16] [I] Input(s)s format: fp32:CHW
[07/06/2021-07:18:16] [I] Output(s)s format: fp32:CHW
[07/06/2021-07:18:16] [I] Input build shapes: model
[07/06/2021-07:18:16] [I] Input calibration shapes: model
[07/06/2021-07:18:16] [I] === System Options ===
[07/06/2021-07:18:16] [I] Device: 0
[07/06/2021-07:18:16] [I] DLACore:
[07/06/2021-07:18:16] [I] Plugins:
[07/06/2021-07:18:16] [I] === Inference Options ===
[07/06/2021-07:18:16] [I] Batch: 1
[07/06/2021-07:18:16] [I] Input inference shapes: model
[07/06/2021-07:18:16] [I] Iterations: 100
[07/06/2021-07:18:16] [I] Duration: 3s (+ 200ms warm up)
[07/06/2021-07:18:16] [I] Sleep time: 0ms
[07/06/2021-07:18:16] [I] Streams: 1
[07/06/2021-07:18:16] [I] ExposeDMA: Disabled
[07/06/2021-07:18:16] [I] Data transfers: Disabled
[07/06/2021-07:18:16] [I] Spin-wait: Disabled
[07/06/2021-07:18:16] [I] Multithreading: Disabled
[07/06/2021-07:18:16] [I] CUDA Graph: Enabled
[07/06/2021-07:18:16] [I] Separate profiling: Disabled
[07/06/2021-07:18:16] [I] Skip inference: Disabled
[07/06/2021-07:18:16] [I] Inputs:
[07/06/2021-07:18:16] [I] === Reporting Options ===
[07/06/2021-07:18:16] [I] Verbose: Disabled
[07/06/2021-07:18:16] [I] Averages: 100 inferences
[07/06/2021-07:18:16] [I] Percentile: 99
[07/06/2021-07:18:16] [I] Dump refittable layers:Disabled
[07/06/2021-07:18:16] [I] Dump output: Disabled
[07/06/2021-07:18:16] [I] Profile: Disabled
[07/06/2021-07:18:16] [I] Export timing to JSON file:
[07/06/2021-07:18:16] [I] Export output to JSON file:
[07/06/2021-07:18:16] [I] Export profile to JSON file:
[07/06/2021-07:18:16] [I]
[07/06/2021-07:18:16] [I] === Device Information ===
[07/06/2021-07:18:16] [I] Selected Device: GeForce GTX 1080
[07/06/2021-07:18:16] [I] Compute Capability: 6.1
[07/06/2021-07:18:16] [I] SMs: 20
[07/06/2021-07:18:16] [I] Compute Clock Rate: 1.7335 GHz
[07/06/2021-07:18:16] [I] Device Global Memory: 8118 MiB
[07/06/2021-07:18:16] [I] Shared Memory per SM: 96 KiB
[07/06/2021-07:18:16] [I] Memory Bus Width: 256 bits (ECC disabled)
[07/06/2021-07:18:16] [I] Memory Clock Rate: 5.005 GHz
[07/06/2021-07:18:16] [I]
[07/06/2021-07:18:27] [I] Engine loaded in 10.4438 sec.
[07/06/2021-07:18:27] [I] Starting inference
[07/06/2021-07:18:31] [I] Warmup completed 4 queries over 200 ms
[07/06/2021-07:18:31] [I] Timing trace has 100 queries over 4.05742 s
[07/06/2021-07:18:31] [I] Trace averages of 100 runs:
[07/06/2021-07:18:31] [I] Average on 100 runs - GPU latency: 40.5731 ms - Host latency: 40.5731 ms (end to end 40.5731 ms, enqueue 0.663882 ms)
[07/06/2021-07:18:31] [I] Host Latency
[07/06/2021-07:18:31] [I] min: 40.4674 ms (end to end 40.4674 ms)
[07/06/2021-07:18:31] [I] max: 42.1765 ms (end to end 42.1765 ms)
[07/06/2021-07:18:31] [I] mean: 40.5731 ms (end to end 40.5731 ms)
[07/06/2021-07:18:31] [I] median: 40.5258 ms (end to end 40.5258 ms)
[07/06/2021-07:18:31] [I] percentile: 42.1765 ms at 99% (end to end 42.1765 ms at 99%)
[07/06/2021-07:18:31] [I] throughput: 24.6462 qps
[07/06/2021-07:18:31] [I] walltime: 4.05742 s
[07/06/2021-07:18:31] [I] Enqueue Time
[07/06/2021-07:18:31] [I] min: 0.273438 ms
[07/06/2021-07:18:31] [I] max: 1.13965 ms
[07/06/2021-07:18:31] [I] median: 0.639282 ms
[07/06/2021-07:18:31] [I] GPU Compute
[07/06/2021-07:18:31] [I] min: 40.4674 ms
[07/06/2021-07:18:31] [I] max: 42.1765 ms
[07/06/2021-07:18:31] [I] mean: 40.5731 ms
[07/06/2021-07:18:31] [I] median: 40.5258 ms
[07/06/2021-07:18:31] [I] percentile: 42.1765 ms at 99%
[07/06/2021-07:18:31] [I] total compute time: 4.05731 s
&&&& PASSED TensorRT.trtexec # trtexec --loadEngine=model/model.trt --useCudaGraph --noDataTransfers --iterations=100 --avgRuns=100 --workspace=7000
However, if I try to load the same engine in Python like this:
#!/usr/bin/python3
import tensorrt as trt
print(trt.__version__)
ENGINE_PATH = "model/model.trt"
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_runtime = trt.Runtime(TRT_LOGGER)
with open(ENGINE_PATH, "rb") as f:
    engine_data = f.read()
engine = trt_runtime.deserialize_cuda_engine(engine_data)
I get an error:
7.2.3.4
[TensorRT] ERROR: INVALID_ARGUMENT: getPluginCreator could not find plugin BatchedNMS_TRT version 1
[TensorRT] ERROR: safeDeserializationUtils.cpp (322) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
[TensorRT] ERROR: INVALID_STATE: std::exception
[TensorRT] ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.
Why does this happen, and is there any way to fix it so I can load the model in Python and use it for object detection?
Thanks for any help!
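For what it's worth, my reading of the error is that BatchedNMS_TRT is one of TensorRT's built-in plugins: trtexec registers them automatically, but in Python they apparently have to be registered explicitly (via trt.init_libnvinfer_plugins) before the engine is deserialized. A minimal sketch of what I think the workaround would look like (the function name is mine):

```python
def load_engine_with_plugins(engine_path):
    """Deserialize a TensorRT engine, registering built-in plugins first."""
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    # Register TensorRT's built-in plugins (including BatchedNMS_TRT)
    # in the plugin registry before deserialization.
    trt.init_libnvinfer_plugins(logger, "")
    runtime = trt.Runtime(logger)
    with open(engine_path, "rb") as f:
        return runtime.deserialize_cuda_engine(f.read())
```

Can anyone confirm whether this is the right approach?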