Issue with TensorRT Optimization on a Custom Model

I am experiencing a bug while using TensorRT in our AI development services. The issue arises when optimizing a model for inference: the optimized model fails to load, and every inference request ends in a runtime error, so no predictions can be made.

Environment

  • TensorRT Version: 8.4.1

  • GPU Type: NVIDIA GeForce RTX 3080

  • Nvidia Driver Version: 510.39.01

  • CUDA Version: 11.4

  • CUDNN Version: 8.2.1

  • Operating System + Version: Ubuntu 20.04 LTS

  • Python Version: 3.8.10

  • TensorFlow Version: 2.6.2

  • PyTorch Version: 1.9.0

  • Baremetal or Container: Container (NVIDIA NGC PyTorch 21.09)

Relevant Files

Please find the necessary files to reproduce the issue in the following link: GitHub Repository. The repository includes the model files, data, and scripts required to replicate the bug.

Steps To Reproduce

  1. Exact steps/commands to build your repro:

     • Clone the repository:

     ```bash
     git clone https://github.com/username/repo-name.git
     cd repo-name
     ```

     • Install the required dependencies:

     ```bash
     pip install -r requirements.txt
     ```
  2. Exact steps/commands to run your repro:

     • Attempt to optimize the model using TensorRT (see the engine-building sketch after this list):

     ```bash
     python optimize_model.py --model_path=model.onnx
     ```

     • Run inference on the optimized model (see the engine-loading sketch after this list):

     ```bash
     python run_inference.py --model_path=optimized_model.trt --input_data=input_data.json
     ```
  3. Full traceback of errors encountered:

     ```
     RuntimeError: Unable to load the optimized model.
     Error Code: 0x0000001 - TensorRT failed to create the engine.
     Check the logs for more details.
     ```
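For context, here is a minimal sketch of what the engine-building step in optimize_model.py does, assuming it follows the standard ONNX-parser workflow of the TensorRT 8.4 Python API; the function and file names below are assumptions for illustration, not the actual contents of the script. Building with a verbose logger is how I capture the logs referred to in the error message.

```python
# Minimal sketch of the engine-building path, assuming optimize_model.py
# follows the standard ONNX-parser workflow (TensorRT 8.4 Python API).
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)  # verbose, to capture build details

def build_engine(onnx_path: str, engine_path: str) -> None:
    builder = trt.Builder(TRT_LOGGER)
    # ONNX models require an explicit-batch network definition.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse the ONNX model")

    config = builder.create_builder_config()
    # 1 GiB workspace; adjust to the memory available on the RTX 3080.
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)

    serialized_engine = builder.build_serialized_network(network, config)
    if serialized_engine is None:
        raise RuntimeError("TensorRT failed to create the engine")

    with open(engine_path, "wb") as f:
        f.write(serialized_engine)

if __name__ == "__main__":
    build_engine("model.onnx", "optimized_model.trt")
```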

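And this is roughly how run_inference.py loads the serialized engine; the point where deserialize_cuda_engine returns None is where the "Unable to load the optimized model" error above surfaces. Again, the names and structure here are assumptions about the script, and the full binding and inference code is in the repository.

```python
# Rough sketch of how run_inference.py presumably loads the engine
# (assumed structure; the real script may differ).
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

def load_engine(engine_path: str) -> trt.ICudaEngine:
    runtime = trt.Runtime(TRT_LOGGER)
    with open(engine_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    if engine is None:
        # This is where the "Unable to load the optimized model" error is raised.
        raise RuntimeError("Unable to load the optimized model")
    return engine

engine = load_engine("optimized_model.trt")
context = engine.create_execution_context()
# Input bindings and the actual inference call are omitted here; input_data.json
# is prepared and bound by the full script in the repository.
```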
I would appreciate any assistance in resolving this issue, as it is critical for the deployment of our AI solution. Thank you!