Python: Unable to load .trt model, but loads fine using trtexec

Description

I am unable to load a .trt engine that was converted from ONNX format. I am using the container nvcr.io/nvidia/tensorrt:21.06-py3. I downloaded a TensorFlow model, converted it to ONNX, and finally built the .trt engine.
When loading the engine with trtexec, it seems to work fine:

root@46d03ec1349c:/files/TensorRT/build/out# trtexec --loadEngine=model/model.trt --useCudaGraph --noDataTransfers --iterations=100 --avgRuns=100 --workspace=7000
&&&& RUNNING TensorRT.trtexec # trtexec --loadEngine=model/model.trt --useCudaGraph --noDataTransfers --iterations=100 --avgRuns=100 --workspace=7000
[07/06/2021-07:18:16] [I] === Model Options ===
[07/06/2021-07:18:16] [I] Format: *
[07/06/2021-07:18:16] [I] Model: 
[07/06/2021-07:18:16] [I] Output:
[07/06/2021-07:18:16] [I] === Build Options ===
[07/06/2021-07:18:16] [I] Max batch: 1
[07/06/2021-07:18:16] [I] Workspace: 7000 MiB
[07/06/2021-07:18:16] [I] minTiming: 1
[07/06/2021-07:18:16] [I] avgTiming: 8
[07/06/2021-07:18:16] [I] Precision: FP32
[07/06/2021-07:18:16] [I] Calibration: 
[07/06/2021-07:18:16] [I] Refit: Disabled
[07/06/2021-07:18:16] [I] Safe mode: Disabled
[07/06/2021-07:18:16] [I] Save engine: 
[07/06/2021-07:18:16] [I] Load engine: model/model.trt
[07/06/2021-07:18:16] [I] Builder Cache: Enabled
[07/06/2021-07:18:16] [I] NVTX verbosity: 0
[07/06/2021-07:18:16] [I] Tactic sources: Using default tactic sources
[07/06/2021-07:18:16] [I] Input(s)s format: fp32:CHW
[07/06/2021-07:18:16] [I] Output(s)s format: fp32:CHW
[07/06/2021-07:18:16] [I] Input build shapes: model
[07/06/2021-07:18:16] [I] Input calibration shapes: model
[07/06/2021-07:18:16] [I] === System Options ===
[07/06/2021-07:18:16] [I] Device: 0
[07/06/2021-07:18:16] [I] DLACore: 
[07/06/2021-07:18:16] [I] Plugins:
[07/06/2021-07:18:16] [I] === Inference Options ===
[07/06/2021-07:18:16] [I] Batch: 1
[07/06/2021-07:18:16] [I] Input inference shapes: model
[07/06/2021-07:18:16] [I] Iterations: 100
[07/06/2021-07:18:16] [I] Duration: 3s (+ 200ms warm up)
[07/06/2021-07:18:16] [I] Sleep time: 0ms
[07/06/2021-07:18:16] [I] Streams: 1
[07/06/2021-07:18:16] [I] ExposeDMA: Disabled
[07/06/2021-07:18:16] [I] Data transfers: Disabled
[07/06/2021-07:18:16] [I] Spin-wait: Disabled
[07/06/2021-07:18:16] [I] Multithreading: Disabled
[07/06/2021-07:18:16] [I] CUDA Graph: Enabled
[07/06/2021-07:18:16] [I] Separate profiling: Disabled
[07/06/2021-07:18:16] [I] Skip inference: Disabled
[07/06/2021-07:18:16] [I] Inputs:
[07/06/2021-07:18:16] [I] === Reporting Options ===
[07/06/2021-07:18:16] [I] Verbose: Disabled
[07/06/2021-07:18:16] [I] Averages: 100 inferences
[07/06/2021-07:18:16] [I] Percentile: 99
[07/06/2021-07:18:16] [I] Dump refittable layers:Disabled
[07/06/2021-07:18:16] [I] Dump output: Disabled
[07/06/2021-07:18:16] [I] Profile: Disabled
[07/06/2021-07:18:16] [I] Export timing to JSON file: 
[07/06/2021-07:18:16] [I] Export output to JSON file: 
[07/06/2021-07:18:16] [I] Export profile to JSON file: 
[07/06/2021-07:18:16] [I] 
[07/06/2021-07:18:16] [I] === Device Information ===
[07/06/2021-07:18:16] [I] Selected Device: GeForce GTX 1080
[07/06/2021-07:18:16] [I] Compute Capability: 6.1
[07/06/2021-07:18:16] [I] SMs: 20
[07/06/2021-07:18:16] [I] Compute Clock Rate: 1.7335 GHz
[07/06/2021-07:18:16] [I] Device Global Memory: 8118 MiB
[07/06/2021-07:18:16] [I] Shared Memory per SM: 96 KiB
[07/06/2021-07:18:16] [I] Memory Bus Width: 256 bits (ECC disabled)
[07/06/2021-07:18:16] [I] Memory Clock Rate: 5.005 GHz
[07/06/2021-07:18:16] [I] 
[07/06/2021-07:18:27] [I] Engine loaded in 10.4438 sec.
[07/06/2021-07:18:27] [I] Starting inference
[07/06/2021-07:18:31] [I] Warmup completed 4 queries over 200 ms
[07/06/2021-07:18:31] [I] Timing trace has 100 queries over 4.05742 s
[07/06/2021-07:18:31] [I] Trace averages of 100 runs:
[07/06/2021-07:18:31] [I] Average on 100 runs - GPU latency: 40.5731 ms - Host latency: 40.5731 ms (end to end 40.5731 ms, enqueue 0.663882 ms)
[07/06/2021-07:18:31] [I] Host Latency
[07/06/2021-07:18:31] [I] min: 40.4674 ms (end to end 40.4674 ms)
[07/06/2021-07:18:31] [I] max: 42.1765 ms (end to end 42.1765 ms)
[07/06/2021-07:18:31] [I] mean: 40.5731 ms (end to end 40.5731 ms)
[07/06/2021-07:18:31] [I] median: 40.5258 ms (end to end 40.5258 ms)
[07/06/2021-07:18:31] [I] percentile: 42.1765 ms at 99% (end to end 42.1765 ms at 99%)
[07/06/2021-07:18:31] [I] throughput: 24.6462 qps
[07/06/2021-07:18:31] [I] walltime: 4.05742 s
[07/06/2021-07:18:31] [I] Enqueue Time
[07/06/2021-07:18:31] [I] min: 0.273438 ms
[07/06/2021-07:18:31] [I] max: 1.13965 ms
[07/06/2021-07:18:31] [I] median: 0.639282 ms
[07/06/2021-07:18:31] [I] GPU Compute
[07/06/2021-07:18:31] [I] min: 40.4674 ms
[07/06/2021-07:18:31] [I] max: 42.1765 ms
[07/06/2021-07:18:31] [I] mean: 40.5731 ms
[07/06/2021-07:18:31] [I] median: 40.5258 ms
[07/06/2021-07:18:31] [I] percentile: 42.1765 ms at 99%
[07/06/2021-07:18:31] [I] total compute time: 4.05731 s
&&&& PASSED TensorRT.trtexec # trtexec --loadEngine=model/model.trt --useCudaGraph --noDataTransfers --iterations=100 --avgRuns=100 --workspace=7000

However, if I try to load the same engine in Python like this:

#!/usr/bin/python3

import tensorrt as trt

print(trt.__version__)

ENGINE_PATH = "model/model.trt"

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_runtime = trt.Runtime(TRT_LOGGER)

# Read the serialized engine from disk and deserialize it.
with open(ENGINE_PATH, "rb") as f:
    engine_data = f.read()
engine = trt_runtime.deserialize_cuda_engine(engine_data)

I get an error:

7.2.3.4
[TensorRT] ERROR: INVALID_ARGUMENT: getPluginCreator could not find plugin BatchedNMS_TRT version 1
[TensorRT] ERROR: safeDeserializationUtils.cpp (322) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
[TensorRT] ERROR: INVALID_STATE: std::exception
[TensorRT] ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.

Why does this happen, and is there any way to fix it so I can load the model in Python and use it for object detection?

Thanks for any help!

Hi,
Could you share the ONNX model and the script, if not shared already, so that we can assist you better?
In the meantime, you can try a few things:

  1. Validate your model with the snippet below; a usage example follows this list.

check_model.py

import sys
import onnx

# Usage: python3 check_model.py <model.onnx>
# Loads the ONNX model and raises if the checker finds a problem.
filename = sys.argv[1]
model = onnx.load(filename)
onnx.checker.check_model(model)
  2. Try running your model with the trtexec command.
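
For the first suggestion, the checker can be run against the exported model like this (the filename is just a placeholder for wherever your ONNX file lives):

python3 check_model.py model.onnx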

If you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

Thank you for your reply.
It seems I got it working by adding trt.init_libnvinfer_plugins(TRT_LOGGER, namespace="").
Does this mean the plugins are not loaded automatically, so the application only finds them once I register them like that?
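
For anyone who finds this later, here is the snippet from above with just that one call added, which now loads for me:

#!/usr/bin/python3

import tensorrt as trt

ENGINE_PATH = "model/model.trt"

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Register TensorRT's built-in plugins (BatchedNMS_TRT among them) in the
# plugin registry before deserializing the engine.
trt.init_libnvinfer_plugins(TRT_LOGGER, namespace="")

trt_runtime = trt.Runtime(TRT_LOGGER)

with open(ENGINE_PATH, "rb") as f:
    engine_data = f.read()
engine = trt_runtime.deserialize_cuda_engine(engine_data)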

@RobertB,

Before accessing the Plugin Registry, we need to call trt.init_libnvinfer_plugins():
https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Plugin/IPluginRegistry.html#tensorrt.init_libnvinfer_plugins
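
If you want to verify that the creator is available after initialization, you can list the contents of the registry; a quick sketch:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# After this call the built-in plugin creators are registered.
trt.init_libnvinfer_plugins(TRT_LOGGER, namespace="")

# Print every registered plugin creator; BatchedNMS_TRT (version 1)
# should appear in the list.
registry = trt.get_plugin_registry()
for creator in registry.plugin_creator_list:
    print(creator.name, creator.plugin_version)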
