I have a trained TensorFlow object detection model that I would like to run on a Jetson Nano, and I'm looking for the most efficient way to run it. The model contains operations not supported by TensorRT, so I'm looking at using TensorFlow-TensorRT (TF-TRT).
My model is dynamically sized, so I'm setting is_dynamic_op=True; however, startup on the Jetson Nano is quite slow, around 10 minutes. Ideally I'd like to power up the Nano and start processing input in under 1 minute.
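For context, this is roughly how I'm doing the conversion today (a sketch using the TF 1.x trt_convert API; the paths and parameter values are placeholders for my actual setup):

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert the trained SavedModel: supported subgraphs become TRTEngineOps,
# while the ops TensorRT can't handle stay in TensorFlow.
converter = trt.TrtGraphConverter(
    input_saved_model_dir='/path/to/saved_model',  # placeholder path
    precision_mode='FP16',        # FP16 to suit the Nano's GPU
    is_dynamic_op=True,           # engines are built at runtime, on first input
    maximum_cached_engines=1)     # in practice I only ever see one shape
converter.convert()
converter.save('/path/to/trt_saved_model')         # placeholder path
```

With this setup, the ~10 minutes appear to be spent building the TensorRT engines on first use, which is the work I'd like to move to convert/deploy time.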
In practice I will always know the input size up-front, and it will be the same for every run. I see from the TRTEngineOp docs (1) that the op caches the dynamically built engines. Is it possible to export that cache, so that I can deploy a pre-built engine and improve startup time?
Alternatively, is there a way to set is_dynamic_op=False when using TensorFlow-TensorRT by specifying the otherwise-unknown sizes at convert time?
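Something along these lines is what I have in mind (again just a sketch of what I'm hoping for; I'm assuming max_batch_size would be the only extra shape information needed, which may well be wrong):

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Hypothetical static conversion: build the engines once, at convert time,
# since the input size is fixed and known before deployment.
converter = trt.TrtGraphConverter(
    input_saved_model_dir='/path/to/saved_model',  # placeholder path
    precision_mode='FP16',
    is_dynamic_op=False,   # no runtime engine building
    max_batch_size=1)      # the one size every run will use
converter.convert()
converter.save('/path/to/trt_saved_model')         # placeholder path
```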
Any help would be appreciated!
(1) https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#cache-var