Optimizing T5 and GPT-2 for Real-Time Inference with NVIDIA TensorRT

Originally published at: https://developer.nvidia.com/blog/optimizing-t5-and-gpt-2-for-real-time-inference-with-tensorrt/

TensorRT 8.2 optimizes HuggingFace T5 and GPT-2 models. With TensorRT-accelerated GPT-2 and T5, you can generate excellent human-like text and build real-time translation, summarization, and other online NLP applications within strict latency requirements.

I am facing an issue while converting the T5-base model using the steps in the above blog. I was able to convert the T5-small model to TRT using the blog and the associated notebook.

Below is the issue I am facing when converting T5-base to TRT:
PolygraphyException                        Traceback (most recent call last)
<ipython-input> in <module>()
      1 t5_trt_encoder_engine = T5EncoderONNXFile(
      2     os.path.join(onnx_model_path, encoder_onnx_model_fpath), metadata
----> 3 ).as_trt_engine(os.path.join(tensorrt_model_path, encoder_onnx_model_fpath) + ".engine")
      5 t5_trt_decoder_engine = T5DecoderONNXFile(

8 frames

in func_impl(network, config, save_timing_cache)

in func_impl(serialized_engine)

/usr/local/lib/python3.7/dist-packages/polygraphy/logger/logger.py in critical(self, message)
    347         from polygraphy.exception import PolygraphyException
--> 349         raise PolygraphyException(message) from None
    351     def internal_error(self, message):

PolygraphyException: Invalid Engine. Please ensure the engine was built correctly

I have also tried increasing the precision to FP32, but I am still getting the same issue.
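Since T5-small converts but T5-base does not, it can help to rebuild the failing engine from the command line with verbose logging so the underlying TensorRT builder error is printed rather than reduced to "Invalid Engine". This is a minimal sketch, assuming the Polygraphy CLI (shipped with the TensorRT Python packages) is on your PATH; the ONNX file name is a hypothetical placeholder for the encoder model exported by the notebook:

```shell
# Rebuild the encoder engine via Polygraphy with increased log verbosity
# (-v) so the actual builder failure is visible in the output.
# "t5-base-encoder.onnx" is a placeholder path, not the notebook's exact name.
polygraphy convert t5-base-encoder.onnx -o t5-base-encoder.engine -v
```

If the verbose log shows the builder running out of memory, raising the builder workspace size (on Polygraphy versions that expose a workspace option) is a common fix when a larger variant fails where the small one succeeds.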

Could you please file an issue at Issues · NVIDIA/TensorRT (github.com) together with some information about your system, e.g. OS, GPU type, memory…
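To gather the system details requested above, something like the following sketch works; it is a generic helper (not part of the notebook), and it assumes a standard setup where nvidia-smi is on the PATH when a GPU is present:

```python
# Collect basic system details for a TensorRT GitHub issue report.
import platform
import subprocess

def system_report() -> str:
    """Return OS and Python info, plus GPU details when nvidia-smi is available."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {platform.python_version()}",
    ]
    try:
        # nvidia-smi can report the GPU name, driver version, and total memory.
        gpu = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        lines.append(f"GPU: {gpu}")
    except (FileNotFoundError, subprocess.CalledProcessError):
        lines.append("GPU: nvidia-smi not available")
    return "\n".join(lines)

print(system_report())
```

Pasting this output into the issue covers the OS, GPU type, and memory fields; the TensorRT version itself can be printed with `python3 -c "import tensorrt; print(tensorrt.__version__)"`.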

You can also try the script version, e.g. python3 run.py run T5 trt --variant T5-base

Have you solved the problem?