I am working on a TensorFlow 2.0 project that uses multiple models for inference.
Some of those models were optimized using TF-TRT.
I tried both regular offline conversion and offline conversion with engine serialization. With regular conversion, the TensorRT engine is rebuilt every time the model's execution context changes. With serialized engines, I am not able to load more than one TensorRT-optimized model.
My application uses a single Session at runtime.
I am using the nvcr.io/nvidia/tensorflow:19.12-tf2-py3 Docker container to optimize the models and run the application.
More about the issue in:
What is the correct approach to running multiple TensorRT-optimized models with pre-built engines simultaneously in TensorFlow?
Is it a valid solution to use a separate Session for each of those models?
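To illustrate what I mean by a separate Session per model, here is a minimal sketch using the `tf.compat.v1` Graph/Session API. The `ModelRunner` class and the two toy graphs are hypothetical stand-ins for the real TF-TRT-optimized SavedModels; the point is only that each model owns its own Graph and Session, so per-session resources (such as serialized TensorRT engines) would not collide:

```python
import tensorflow as tf

# Graph/Session mode is required for this sketch under TF 2.x.
tf.compat.v1.disable_eager_execution()


class ModelRunner:
    """Hypothetical wrapper: one Graph and one Session per model."""

    def __init__(self, build_fn):
        self.graph = tf.Graph()
        with self.graph.as_default():
            # Toy scalar input; a real model would be loaded here instead,
            # e.g. via tf.compat.v1.saved_model.load from the TF-TRT
            # conversion output directory.
            self.inp = tf.compat.v1.placeholder(tf.float32, shape=(), name="x")
            self.out = build_fn(self.inp)
        # Each runner gets its own Session bound to its own Graph.
        self.sess = tf.compat.v1.Session(graph=self.graph)

    def run(self, x):
        return self.sess.run(self.out, feed_dict={self.inp: x})


# Two stand-in "models" executing side by side in separate sessions.
model_a = ModelRunner(lambda x: x * 2.0)
model_b = ModelRunner(lambda x: x + 1.0)

print(model_a.run(3.0))
print(model_b.run(3.0))
```

This only demonstrates the session-isolation pattern itself; whether it actually avoids the serialized-engine conflict with TF-TRT models is exactly what I am asking about.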