I am hitting a bug while using TensorRT in our AI development work. The issue arises when optimizing a model for inference: the optimized model fails to load correctly at inference time, and the resulting runtime error prevents any predictions from being made.
Environment
- TensorRT Version: 8.4.1
- GPU Type: NVIDIA GeForce RTX 3080
- Nvidia Driver Version: 510.39.01
- CUDA Version: 11.4
- CUDNN Version: 8.2.1
- Operating System + Version: Ubuntu 20.04 LTS
- Python Version: 3.8.10
- TensorFlow Version: 2.6.2
- PyTorch Version: 1.9.0
- Baremetal or Container: Container (NVIDIA NGC PyTorch 21.09)
Relevant Files
The files needed to reproduce the issue are available at the following link: GitHub Repository. The repository includes the model files, input data, and scripts required to replicate the bug.
Steps To Reproduce
- Exact steps/commands to build your repro:
- Clone the repository:

```bash
git clone https://github.com/username/repo-name.git
cd repo-name
```

- Install the required dependencies:

```bash
pip install -r requirements.txt
```
- Exact steps/commands to run your repro:
- Attempt to optimize the model using TensorRT (a sketch of the build step is included after the traceback below):

```bash
python optimize_model.py --model_path=model.onnx
```

- Run inference on the optimized model:

```bash
python run_inference.py --model_path=optimized_model.trt --input_data=input_data.json
```
- Full traceback of errors encountered:

```
RuntimeError: Unable to load the optimized model.
Error Code: 0x0000001 - TensorRT failed to create the engine.
Check the logs for more details.
```
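For context, the engine build inside optimize_model.py is roughly equivalent to the sketch below. This is a simplified paraphrase rather than the exact script: the function name, the 1 GiB workspace limit, and the hard-coded paths are my choices. It builds the engine through the ONNX parser with a verbose logger, so the build log should show where engine creation fails.

```python
# Minimal sketch of the ONNX -> TensorRT build step (simplified paraphrase of
# optimize_model.py; names, the workspace limit, and paths are illustrative).
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)  # verbose log records why a build fails

def build_engine(onnx_path: str, engine_path: str) -> None:
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, TRT_LOGGER)

    # Parse the ONNX model and report parser errors instead of failing silently.
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB

    # build_serialized_network returns None when engine creation fails.
    serialized_engine = builder.build_serialized_network(network, config)
    if serialized_engine is None:
        raise RuntimeError("TensorRT failed to create the engine")

    with open(engine_path, "wb") as f:
        f.write(serialized_engine)

if __name__ == "__main__":
    build_engine("model.onnx", "optimized_model.trt")
```

If the build step fails or writes an incomplete file, trt.Runtime.deserialize_cuda_engine returns None in run_inference.py, which would presumably explain the "Unable to load the optimized model" error above.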
I would appreciate any assistance in resolving this issue, as it is critical for the deployment of our AI solution. Thank you!