Optimizing T5 and GPT-2 for Real-Time Inference with NVIDIA TensorRT

Originally published at: https://developer.nvidia.com/blog/optimizing-t5-and-gpt-2-for-real-time-inference-with-tensorrt/

TensorRT 8.2 optimizes HuggingFace T5 and GPT-2 models. With TensorRT-accelerated GPT-2 and T5, you can generate excellent human-like text and build real-time translation, summarization, and other online NLP applications within strict latency requirements.

I am facing an issue while converting the T5-base model using the steps in the above blog. I was able to convert the T5-small model to TRT using the blog and the associated notebook.

Below is the issue I am facing when converting T5-base to TRT:
PolygraphyException                        Traceback (most recent call last)
<ipython-input> in <module>()
      1 t5_trt_encoder_engine = T5EncoderONNXFile(
      2     os.path.join(onnx_model_path, encoder_onnx_model_fpath), metadata
----> 3 ).as_trt_engine(os.path.join(tensorrt_model_path, encoder_onnx_model_fpath) + ".engine")
      5 t5_trt_decoder_engine = T5DecoderONNXFile(

8 frames

in func_impl(network, config, save_timing_cache)

in func_impl(serialized_engine)

/usr/local/lib/python3.7/dist-packages/polygraphy/logger/logger.py in critical(self, message)
    347         from polygraphy.exception import PolygraphyException
--> 349         raise PolygraphyException(message) from None
    351     def internal_error(self, message):

PolygraphyException: Invalid Engine. Please ensure the engine was built correctly

I have also tried increasing the precision to FP32, but I am still getting the same issue.
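Since T5-small converts but T5-base does not, it can help to rebuild the failing engine from the command line with verbose logging so the underlying TensorRT builder error is printed rather than reduced to "Invalid Engine". This is a minimal sketch, assuming the Polygraphy CLI (shipped with the TensorRT Python packages) is on your PATH; the ONNX file name is a hypothetical placeholder for the encoder model exported by the notebook:

```shell
# Rebuild the encoder engine via Polygraphy with increased log verbosity
# (-v) so the actual builder failure is visible in the output.
# "t5-base-encoder.onnx" is a placeholder path, not the notebook's exact name.
polygraphy convert t5-base-encoder.onnx -o t5-base-encoder.engine -v
```

If the verbose log shows the builder running out of memory, raising the builder workspace size (on Polygraphy versions that expose a workspace option) is a common fix when a larger variant fails where the small one succeeds.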

Could you please file an issue at Issues · NVIDIA/TensorRT (github.com) together with some information about your system, e.g. OS, GPU type, memory…
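To gather the system details requested above, something like the following sketch works; it is a generic helper (not part of the notebook), and it assumes a standard setup where nvidia-smi is on the PATH when a GPU is present:

```python
# Collect basic system details for a TensorRT GitHub issue report.
import platform
import subprocess

def system_report() -> str:
    """Return OS and Python info, plus GPU details when nvidia-smi is available."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {platform.python_version()}",
    ]
    try:
        # nvidia-smi can report the GPU name, driver version, and total memory.
        gpu = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        lines.append(f"GPU: {gpu}")
    except (FileNotFoundError, subprocess.CalledProcessError):
        lines.append("GPU: nvidia-smi not available")
    return "\n".join(lines)

print(system_report())
```

Pasting this output into the issue covers the OS, GPU type, and memory fields; the TensorRT version itself can be printed with `python3 -c "import tensorrt; print(tensorrt.__version__)"`.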

You can also try the script version, e.g. python3 run.py run T5 trt --variant T5-base

Have you solved the problem?