I have a trained TensorFlow object detection model that I would like to run on a Jetson Nano, and I'm looking for the most efficient way to run it. The model contains operations not supported by TensorRT, so I'm looking at using TensorFlow-TensorRT (TF-TRT).
My model is dynamically sized, so I'm setting is_dynamic_op=True; however, startup on the Jetson Nano is quite slow, around 10 minutes. Ideally I'd like to power up the Nano and start processing input in under 1 minute.
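For context, this is roughly how I'm doing the conversion today (a sketch using the TF 1.x trt_convert API; the paths and parameter values are placeholders for my actual setup):

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert the trained SavedModel: supported subgraphs become TRTEngineOps,
# while the ops TensorRT can't handle stay in TensorFlow.
converter = trt.TrtGraphConverter(
    input_saved_model_dir='/path/to/saved_model',  # placeholder path
    precision_mode='FP16',        # FP16 to suit the Nano's GPU
    is_dynamic_op=True,           # engines are built at runtime, on first input
    maximum_cached_engines=1)     # in practice I only ever see one shape
converter.convert()
converter.save('/path/to/trt_saved_model')         # placeholder path
```

With this setup, the ~10 minutes appear to be spent building the TensorRT engines on first use, which is the work I'd like to move to convert/deploy time.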
In practice I will always know the input size up-front, and it will be the same for every run. I see from the TRTEngineOp docs (1) that the op caches the dynamically built engines. Is it possible to export that cache, so that I can deploy a pre-built engine and improve startup time?
Alternatively, is there a way to set is_dynamic_op=False when using TensorFlow-TensorRT by specifying the otherwise-unknown sizes at convert time?
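Something along these lines is what I have in mind (again just a sketch of what I'm hoping for; I'm assuming max_batch_size would be the only extra shape information needed, which may well be wrong):

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Hypothetical static conversion: build the engines once, at convert time,
# since the input size is fixed and known before deployment.
converter = trt.TrtGraphConverter(
    input_saved_model_dir='/path/to/saved_model',  # placeholder path
    precision_mode='FP16',
    is_dynamic_op=False,   # no runtime engine building
    max_batch_size=1)      # the one size every run will use
converter.convert()
converter.save('/path/to/trt_saved_model')         # placeholder path
```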
Any help would be appreciated!
(1) https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#cache-var