Building new tensorrt engine every time input shape of tracker changed on xavier

Hi, I optimized my tracking model with TensorRT. The input shape is [[?,128,64,3]] and the output shape is [?,128], and I am running on a Jetson Xavier development kit. During inference, every time a new person is detected and added to the tracker, the display window freezes because a new TensorRT engine is being built, which hurts the frame rate. How can I resolve this problem?

Building a new TensorRT engine for net/TRTEngineOp_1 input shapes: [[1,128,64,3]]
Building a new TensorRT engine for net/TRTEngineOp_1 input shapes: [[2,128,64,3]]
Building a new TensorRT engine for net/TRTEngineOp_1 input shapes: [[3,128,64,3]]

As they say here:
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#static-dynamic-mode
Dynamic mode allows you to have unknown shapes in your model, despite the fact that TensorRT requires all shapes to be fully defined. In this mode, TF-TRT creates a new TensorRT engine for each unique input shape that is supplied to the model. For example, you may have an image classification network that works on images of any size where the input placeholder has the shape [?, ?, ?, 3]. If you were to first send a batch of images to the model with shape [8, 224, 224, 3], a new TensorRT engine will be created that is optimized for those dimensions. Since the engine will have to be built at this time, this first batch will take longer to execute than usual. If you later send more images with the same shape of [8, 224, 224, 3], the previously built engine will be used immediately with no additional overhead. If you instead send a batch with a different shape, a new engine would have to be created for that shape. The argument maximum_cached_engines can be used to control how many engines will be stored at a time, for each individual TRTEngineOp in the graph.

With batch_size=1 it works fine, but when I change the batch size I get the same problem, and I need to use batching!

I initialized different TRTEngineOps (with different shapes) and used the maximum_cached_engines argument when creating the TensorRT model.

Glad to know you resolved it yourself, thanks for the update.

How do you initialize different TRTEngineOps with different shapes? I didn't see an API to pass the input shape to TrtGraphConverter. Can you be more specific?


Sorry for the late reply; I hope this helps. Just pass maximum_cached_engines=100 (or another number) to trt.TrtGraphConverter.

Note: Setting maximum_cached_engines to a very large number like 100 doesn’t increase the memory usage until that many engines actually get built during runtime (maximum_cached_engines is just an upper bound on the number of engines in the cache).
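For reference, a minimal conversion sketch using the TF 1.x TrtGraphConverter API (the SavedModel paths and the FP16 precision mode are illustrative assumptions, not from the thread; this requires a TensorRT-enabled TensorFlow build, e.g. the JetPack builds for Xavier):

```python
# Sketch: convert a SavedModel with TF-TRT in dynamic mode, caching up to
# 100 engines per TRTEngineOp so repeated input shapes reuse an existing
# engine instead of triggering a rebuild.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverter(
    input_saved_model_dir="/path/to/saved_model",  # illustrative path
    is_dynamic_op=True,            # build engines at runtime, per input shape
    maximum_cached_engines=100,    # upper bound on cached engines per op
    precision_mode="FP16",         # assumption: FP16 suits Xavier well
)
converter.convert()
converter.save("/path/to/trt_saved_model")
```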

Explanation based on the Accelerating Inference In TF-TRT User Guide (NVIDIA Deep Learning Frameworks Documentation):

There are two modes in TF-TRT: static (default mode) and dynamic. The static mode is enabled when is_dynamic_op=False and otherwise dynamic mode is enabled. The main difference between these two modes is that the TensorRT engines are built offline (by TrtGraphConverter.convert) when you are in static mode, whereas in the dynamic mode, the TensorRT engines are built during runtime when the actual inference happens.
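For contrast, a static-mode conversion sketch (again with illustrative paths): here the engines are built offline inside convert(), so all input shapes must be fully defined and max_batch_size caps the batch dimension.

```python
# Sketch: static mode (is_dynamic_op=False). Engines are built now, during
# convert(), rather than at inference time, so no runtime build stalls occur.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverter(
    input_saved_model_dir="/path/to/saved_model",  # illustrative path
    is_dynamic_op=False,
    max_batch_size=8,   # engines are optimized for batches up to this size
)
converter.convert()
converter.save("/path/to/trt_saved_model_static")
```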

Use dynamic mode if you have a graph that has undefined shapes (dimensions that are None or -1). If you try to convert a model which has undefined shapes while in static mode, TF-TRT will issue the following warning:

Input shapes must be fully defined when in static mode. Please try is_dynamic_op=True.

Dynamic mode allows you to have unknown shapes in your model, despite the fact that TensorRT requires all shapes to be fully defined. In this mode, TF-TRT creates a new TensorRT engine for each unique input shape that is supplied to the model.

The argument maximum_cached_engines can be used to control how many engines will be stored at a time, for each individual TRTEngineOp in the graph.

TensorRT engines are cached in an LRU cache located in the TRTEngineOp op. The key to this cache are the shapes of the op inputs. For example, a new engine is created if the cache is empty or if an engine for a given input shape does not exist in the cache. You can control the number of engines cached with the argument maximum_cached_engines.
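The caching behavior can be sketched in plain Python (an illustrative model of the logic, not TF-TRT's actual implementation): the cache key is the tuple of input shapes, a miss triggers an engine build, and the least recently used engine is evicted once maximum_cached_engines is reached.

```python
from collections import OrderedDict

class EngineCache:
    """Toy LRU cache keyed by input shape, mimicking TRTEngineOp's cache."""

    def __init__(self, maximum_cached_engines):
        self.max_engines = maximum_cached_engines
        self.cache = OrderedDict()  # shape tuple -> "engine"
        self.builds = 0             # how many engines were built so far

    def get_engine(self, input_shape):
        key = tuple(input_shape)
        if key in self.cache:
            self.cache.move_to_end(key)       # mark as most recently used
            return self.cache[key]
        self.builds += 1                      # cache miss: build a new engine
        if len(self.cache) >= self.max_engines:
            self.cache.popitem(last=False)    # evict least recently used
        engine = f"engine{key}"
        self.cache[key] = engine
        return engine

cache = EngineCache(maximum_cached_engines=2)
cache.get_engine([1, 128, 64, 3])   # miss: build
cache.get_engine([2, 128, 64, 3])   # miss: build
cache.get_engine([1, 128, 64, 3])   # hit: reuse, no build
cache.get_engine([3, 128, 64, 3])   # miss: build, evicts [2,128,64,3]
```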
TensorRT uses the batch size of the inputs as one of the parameters to select the highest performing CUDA kernels. The batch size is provided as the first dimension of the inputs. The batch size is determined by input shapes during execution when is_dynamic_op is true (TF 2 default), and by the argument max_batch_size when is_dynamic_op is false (TF 1 default). An engine can be reused for a new input if:

the engine batch size is greater than or equal to the batch size of the new input, and
the non-batch dimensions match those of the new input.
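The reuse rule above can be sketched as a plain predicate (illustrative only, not TF-TRT's actual code):

```python
def engine_can_be_reused(engine_shape, input_shape):
    """An engine built for engine_shape can serve input_shape if the
    engine's batch dimension (dim 0) is >= the input's batch dimension
    and all non-batch dimensions match exactly."""
    return (engine_shape[0] >= input_shape[0]
            and engine_shape[1:] == input_shape[1:])

# An engine built for [4,128,64,3] can serve smaller batches...
assert engine_can_be_reused([4, 128, 64, 3], [2, 128, 64, 3])
# ...but not larger batches, nor inputs with different non-batch dims.
assert not engine_can_be_reused([4, 128, 64, 3], [8, 128, 64, 3])
assert not engine_can_be_reused([4, 128, 64, 3], [2, 256, 64, 3])
```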

If you want to have a conservative memory usage, set maximum_cached_engines to 1 to force any existing cache to be evicted each time a new engine is created. On the other hand, if your main goal is to reduce latency, then increase maximum_cached_engines to prevent the recreation of engines as much as possible. Caching more engines uses more resources on the machine, however, that is not a problem for typical models.
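One practical consequence for the original tracking problem: if you know the batch sizes you will see (e.g. up to the maximum number of tracked people), you can warm up the model at startup by running one dummy inference per expected batch size, so every engine is built and cached before the display loop starts. A hedged sketch, where run_inference is a stand-in for your actual session/model call and the 128x64x3 crop size comes from the thread:

```python
import numpy as np

def warm_up(run_inference, max_batch_size, height=128, width=64, channels=3):
    """Run one dummy inference per batch size so dynamic-mode TF-TRT
    builds (and caches) every engine before real tracking begins."""
    for batch in range(1, max_batch_size + 1):
        dummy = np.zeros((batch, height, width, channels), dtype=np.float32)
        run_inference(dummy)  # first call per shape triggers the engine build

# Usage with a stand-in inference function that records the shapes it saw:
seen_shapes = []
warm_up(lambda x: seen_shapes.append(x.shape), max_batch_size=3)
```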
