Optimizing T5 and GPT-2 for Real-Time Inference with NVIDIA TensorRT

Originally published at: https://developer.nvidia.com/blog/optimizing-t5-and-gpt-2-for-real-time-inference-with-tensorrt/

TensorRT 8.2 optimizes HuggingFace T5 and GPT-2 models. With TensorRT-accelerated GPT-2 and T5, you can generate excellent human-like text and build real-time translation, summarization, and other online NLP applications within strict latency requirements.

I am facing an issue while converting the T5-base model using the steps in the above blog. I was able to convert the T5-small model to TRT using the blog and the associated notebook.

Below is the issue I am facing when converting T5-base to TRT:
PolygraphyException                        Traceback (most recent call last)
<ipython-input> in <module>()
      1 t5_trt_encoder_engine = T5EncoderONNXFile(
      2     os.path.join(onnx_model_path, encoder_onnx_model_fpath), metadata
----> 3 ).as_trt_engine(os.path.join(tensorrt_model_path, encoder_onnx_model_fpath) + ".engine")
      5 t5_trt_decoder_engine = T5DecoderONNXFile(

8 frames

in func_impl(network, config, save_timing_cache)

in func_impl(serialized_engine)

/usr/local/lib/python3.7/dist-packages/polygraphy/logger/logger.py in critical(self, message)
    347         from polygraphy.exception import PolygraphyException
--> 349         raise PolygraphyException(message) from None
    351     def internal_error(self, message):

PolygraphyException: Invalid Engine. Please ensure the engine was built correctly

I have also tried increasing the precision to FP32, but I am still getting the same issue.
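Since T5-small converts but T5-base does not, it can help to rebuild the failing engine from the command line with verbose logging so the underlying TensorRT builder error is printed rather than reduced to "Invalid Engine". This is a minimal sketch, assuming the Polygraphy CLI (shipped with the TensorRT Python packages) is on your PATH; the ONNX file name is a hypothetical placeholder for the encoder model exported by the notebook:

```shell
# Rebuild the encoder engine via Polygraphy with increased log verbosity
# (-v) so the actual builder failure is visible in the output.
# "t5-base-encoder.onnx" is a placeholder path, not the notebook's exact name.
polygraphy convert t5-base-encoder.onnx -o t5-base-encoder.engine -v
```

If the verbose log shows the builder running out of memory, raising the builder workspace size (on Polygraphy versions that expose a workspace option) is a common fix when a larger variant fails where the small one succeeds.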

Could you please file an issue at Issues · NVIDIA/TensorRT (github.com) together with some information about your system, e.g. OS, GPU type, memory…
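To gather the system details requested above, something like the following sketch works; it is a generic helper (not part of the notebook), and it assumes a standard setup where nvidia-smi is on the PATH when a GPU is present:

```python
# Collect basic system details for a TensorRT GitHub issue report.
import platform
import subprocess

def system_report() -> str:
    """Return OS and Python info, plus GPU details when nvidia-smi is available."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {platform.python_version()}",
    ]
    try:
        # nvidia-smi can report the GPU name, driver version, and total memory.
        gpu = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        lines.append(f"GPU: {gpu}")
    except (FileNotFoundError, subprocess.CalledProcessError):
        lines.append("GPU: nvidia-smi not available")
    return "\n".join(lines)

print(system_report())
```

Pasting this output into the issue covers the OS, GPU type, and memory fields; the TensorRT version itself can be printed with `python3 -c "import tensorrt; print(tensorrt.__version__)"`.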

You can also try the script version, e.g. python3 run.py run T5 trt --variant T5-base

Have you solved the problem?