We tried converting the T5-XL model to TensorRT following the instructions at https://developer.nvidia.com/blog/optimizing-t5-and-gpt-2-for-real-time-inference-with-tensorrt/
I am not able to build TensorRT OSS and am running into the following error:
/urs/bin/aarch64-linux-gnu-g++ is not a full path to existing compiler tool.
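The error suggests CMake is trying to resolve an aarch64 cross-compiler rather than the host toolchain. For reference, this is roughly how we invoked the build; as a hedged workaround sketch (the compiler paths and `$TRT_LIBPATH` are assumptions for our environment, not from the blog post), forcing the host compilers explicitly looked like:

```shell
# Sketch of the TensorRT OSS build, forcing the host x86_64 toolchain
# instead of the aarch64 cross-compiler CMake was picking up.
# /usr/bin/gcc, /usr/bin/g++ and $TRT_LIBPATH are assumptions for our setup.
cd TensorRT/build
cmake .. \
  -DCMAKE_C_COMPILER=/usr/bin/gcc \
  -DCMAKE_CXX_COMPILER=/usr/bin/g++ \
  -DTRT_LIB_DIR=$TRT_LIBPATH \
  -DTRT_OUT_DIR=`pwd`/out
make -j$(nproc)
```

Even with the compilers overridden this way, we still could not complete the build.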
I also tried converting the encoder and decoder separately into TensorRT engines, but could not run the converted models in Triton Inference Server.
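For context, the separate conversion was done roughly as follows: export the encoder and decoder to ONNX, then build engines with `trtexec`. This is a sketch of the commands we used; the file names and shape values are assumptions specific to our export, not canonical:

```shell
# Sketch: build TensorRT engines from the exported ONNX graphs.
# encoder.onnx / decoder.onnx and the input names/shapes are assumptions
# from our own export; adjust to your model's actual input signature.
trtexec --onnx=encoder.onnx \
        --saveEngine=encoder.plan \
        --minShapes=input_ids:1x1 \
        --optShapes=input_ids:1x128 \
        --maxShapes=input_ids:1x512 \
        --fp16

trtexec --onnx=decoder.onnx \
        --saveEngine=decoder.plan \
        --fp16
```

The resulting `.plan` files were placed in a Triton model repository, but loading them in Triton fails.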
GPU Type: A100
Nvidia Driver Version: 530.30.02
CUDA Version: 12.1
Operating System + Version:
Python Version (if applicable): 3.9.2
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):