Unable to convert TensorFlow model (BiSeNetV2) to TensorRT engine file

I am trying to convert the BiSeNetV2 TensorFlow model to a TensorRT engine file on Google Colab. I was able to convert the trained model to an ONNX file, but I am unable to build a TensorRT engine from that ONNX file; the trtexec command I used and the full log are below.
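For reference, the ONNX file was produced with tf2onnx along these lines. The input and output tensor names shown here are placeholders, not the actual names from the frozen graph:

```shell
# Convert the frozen TensorFlow graph to ONNX with tf2onnx.
# NOTE: "input_tensor:0" and "final_output:0" are hypothetical names --
# inspect the .pb file (e.g. with Netron) for the real ones.
python -m tf2onnx.convert \
  --graphdef bisenetv2_cityscapes_frozen_850.pb \
  --output bisenetv2_cityscapes_frozen_850.onnx \
  --inputs input_tensor:0 \
  --outputs final_output:0 \
  --opset 11
```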

Environment

TensorRT Version: 8.2.1.8
GPU Type: Tesla K80
Nvidia Driver Version:
CUDA Version: 11.2
CUDNN Version:
Operating System + Version:
Python Version (if applicable): 3.7.13
TensorFlow Version (if applicable): 2.8.0
ONNX Version (if applicable): 1.11.0
tf2onnx Version (if applicable): 1.11.0
PyTorch Version (if applicable): 1.9.3/1190aa
Baremetal or Container (if container which image + tag):

Relevant Files

Bisenetv2 model link: bisenetv2_cityscapes_frozen_850.pb - Google Drive

TensorRT package used: TensorRT-8.2.1.8.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz - Google Drive

I get the following error message:

&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # /content/TensorRT-8.2.1.8/bin/trtexec --onnx=/content/bisenetv2_cityscapes_frozen_850.onnx --saveEngine=Bisenetv2_cityscapes_model_pb_fp16.trt --explicitBatch --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16
[04/14/2022-09:34:13] [W] --explicitBatch flag has been deprecated and has no effect!
[04/14/2022-09:34:13] [W] Explicit batch dim is automatically enabled if input model is ONNX or if dynamic shapes are provided when the engine is built.
[04/14/2022-09:34:13] [I] === Model Options ===
[04/14/2022-09:34:13] [I] Format: ONNX
[04/14/2022-09:34:13] [I] Model: /content/bisenetv2_cityscapes_frozen_850.onnx
[04/14/2022-09:34:13] [I] Output:
[04/14/2022-09:34:13] [I] === Build Options ===
[04/14/2022-09:34:13] [I] Max batch: explicit batch
[04/14/2022-09:34:13] [I] Workspace: 16 MiB
[04/14/2022-09:34:13] [I] minTiming: 1
[04/14/2022-09:34:13] [I] avgTiming: 8
[04/14/2022-09:34:13] [I] Precision: FP32+FP16
[04/14/2022-09:34:13] [I] Calibration:
[04/14/2022-09:34:13] [I] Refit: Disabled
[04/14/2022-09:34:13] [I] Sparsity: Disabled
[04/14/2022-09:34:13] [I] Safe mode: Disabled
[04/14/2022-09:34:13] [I] DirectIO mode: Disabled
[04/14/2022-09:34:13] [I] Restricted mode: Disabled
[04/14/2022-09:34:13] [I] Save engine: Bisenetv2_cityscapes_model_pb_fp16.trt
[04/14/2022-09:34:13] [I] Load engine:
[04/14/2022-09:34:13] [I] Profiling verbosity: 0
[04/14/2022-09:34:13] [I] Tactic sources: Using default tactic sources
[04/14/2022-09:34:13] [I] timingCacheMode: local
[04/14/2022-09:34:13] [I] timingCacheFile:
[04/14/2022-09:34:13] [I] Input(s): fp16:chw
[04/14/2022-09:34:13] [I] Output(s): fp16:chw
[04/14/2022-09:34:13] [I] Input build shapes: model
[04/14/2022-09:34:13] [I] Input calibration shapes: model
[04/14/2022-09:34:13] [I] === System Options ===
[04/14/2022-09:34:13] [I] Device: 0
[04/14/2022-09:34:13] [I] DLACore:
[04/14/2022-09:34:13] [I] Plugins:
[04/14/2022-09:34:13] [I] === Inference Options ===
[04/14/2022-09:34:13] [I] Batch: Explicit
[04/14/2022-09:34:13] [I] Input inference shapes: model
[04/14/2022-09:34:13] [I] Iterations: 10
[04/14/2022-09:34:13] [I] Duration: 3s (+ 200ms warm up)
[04/14/2022-09:34:13] [I] Sleep time: 0ms
[04/14/2022-09:34:13] [I] Idle time: 0ms
[04/14/2022-09:34:13] [I] Streams: 1
[04/14/2022-09:34:13] [I] ExposeDMA: Disabled
[04/14/2022-09:34:13] [I] Data transfers: Enabled
[04/14/2022-09:34:13] [I] Spin-wait: Disabled
[04/14/2022-09:34:13] [I] Multithreading: Disabled
[04/14/2022-09:34:13] [I] CUDA Graph: Disabled
[04/14/2022-09:34:13] [I] Separate profiling: Disabled
[04/14/2022-09:34:13] [I] Time Deserialize: Disabled
[04/14/2022-09:34:13] [I] Time Refit: Disabled
[04/14/2022-09:34:13] [I] Skip inference: Disabled
[04/14/2022-09:34:13] [I] Inputs:
[04/14/2022-09:34:13] [I] === Reporting Options ===
[04/14/2022-09:34:13] [I] Verbose: Disabled
[04/14/2022-09:34:13] [I] Averages: 10 inferences
[04/14/2022-09:34:13] [I] Percentile: 99
[04/14/2022-09:34:13] [I] Dump refittable layers:Disabled
[04/14/2022-09:34:13] [I] Dump output: Disabled
[04/14/2022-09:34:13] [I] Profile: Disabled
[04/14/2022-09:34:13] [I] Export timing to JSON file:
[04/14/2022-09:34:13] [I] Export output to JSON file:
[04/14/2022-09:34:13] [I] Export profile to JSON file:
[04/14/2022-09:34:13] [I]
[04/14/2022-09:34:13] [I] === Device Information ===
[04/14/2022-09:34:13] [I] Selected Device: Tesla K80
[04/14/2022-09:34:13] [I] Compute Capability: 3.7
[04/14/2022-09:34:13] [I] SMs: 13
[04/14/2022-09:34:13] [I] Compute Clock Rate: 0.8235 GHz
[04/14/2022-09:34:13] [I] Device Global Memory: 11441 MiB
[04/14/2022-09:34:13] [I] Shared Memory per SM: 112 KiB
[04/14/2022-09:34:13] [I] Memory Bus Width: 384 bits (ECC enabled)
[04/14/2022-09:34:13] [I] Memory Clock Rate: 2.505 GHz
[04/14/2022-09:34:13] [I]
[04/14/2022-09:34:13] [I] TensorRT version: 8.2.1
[04/14/2022-09:34:13] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 0, GPU 107 (MiB)
[04/14/2022-09:34:14] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 0 MiB, GPU 107 MiB
[04/14/2022-09:34:14] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 0 MiB, GPU 107 MiB
[04/14/2022-09:34:14] [I] Start parsing network model
[04/14/2022-09:34:14] [I] [TRT] ----------------------------------------------------------------
[04/14/2022-09:34:14] [I] [TRT] Input filename: /content/bisenetv2_cityscapes_frozen_850.onnx
[04/14/2022-09:34:14] [I] [TRT] ONNX IR version: 0.0.5
[04/14/2022-09:34:14] [I] [TRT] Opset version: 10
[04/14/2022-09:34:14] [I] [TRT] Producer name: tf2onnx
[04/14/2022-09:34:14] [I] [TRT] Producer version: 1.9.3
[04/14/2022-09:34:14] [I] [TRT] Domain:
[04/14/2022-09:34:14] [I] [TRT] Model version: 0
[04/14/2022-09:34:14] [I] [TRT] Doc string:
[04/14/2022-09:34:14] [I] [TRT] ----------------------------------------------------------------
[04/14/2022-09:34:14] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[04/14/2022-09:34:14] [I] Finish parsing network model
[04/14/2022-09:34:14] [W] [TRT] Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[04/14/2022-09:34:14] [W] [TRT] TensorRT was linked against cuBLAS/cuBLASLt 11.6.5 but loaded cuBLAS/cuBLASLt 11.3.0
[04/14/2022-09:34:14] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +48, now: CPU 0, GPU 155 (MiB)
[04/14/2022-09:34:14] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +45, now: CPU 0, GPU 200 (MiB)
[04/14/2022-09:34:14] [W] [TRT] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[04/14/2022-09:34:14] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[04/14/2022-09:34:14] [E] Error[2]: [optimizer.cpp::getFormatRequirements::3793] Error Code 2: Internal Error (Assertion !n->candidateRequirements.empty() failed. no supported formats)
[04/14/2022-09:34:14] [E] Error[2]: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
[04/14/2022-09:34:14] [E] Engine could not be created from network
[04/14/2022-09:34:14] [E] Building engine failed
[04/14/2022-09:34:14] [E] Failed to create engine from model.
[04/14/2022-09:34:14] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8201] # /content/TensorRT-8.2.1.8/bin/trtexec --onnx=/content/bisenetv2_cityscapes_frozen_850.onnx --saveEngine=Bisenetv2_cityscapes_model_pb_fp16.trt --explicitBatch --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16


Could you please try the latest TensorRT version, 8.4?
If you still face this issue, please share the ONNX model and the trtexec --verbose logs so we can debug further.

https://developer.nvidia.com/nvidia-tensorrt-8x-download
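To capture the verbose logs, the failing build can be rerun with --verbose and the output saved, for example:

```shell
# Rerun the failing build with verbose logging and save the full output
# (paths match those used in the log above; adjust to your setup).
/content/TensorRT-8.2.1.8/bin/trtexec \
  --onnx=/content/bisenetv2_cityscapes_frozen_850.onnx \
  --saveEngine=Bisenetv2_cityscapes_model_pb_fp16.trt \
  --fp16 --verbose 2>&1 | tee trtexec_verbose.log
```

Since the log above warns that the Tesla K80 (compute capability 3.7) has no native FP16 support, this sketch drops the --inputIOFormats/--outputIOFormats fp16 flags; it may be worth checking whether removing them also avoids the "no supported formats" assertion.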