TensorRT and IOptimizationProfile

Description

I am trying to use TensorRT IOptimizationProfiles to build an engine from a TensorFlow model (exported via ONNX) so that it can be used with different image matrix sizes.
The conversion/compilation works, and inference using the different profiles works as well. What confuses me is the CUDA memory used by the IExecutionContext, which I track with cudaMemGetInfo. For building the ONNX model into a TensorRT engine I use custom code, as trtexec does not seem to allow specifying more than one optimization profile. For the optimization profiles I call setExtraMemoryTarget(0) to keep additional memory as low as possible (although, in my experiments, this setting did not make a noticeable difference anyway).
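Roughly, my build code looks like the following simplified sketch (the input tensor name "input", gLogger, the ONNX file name and the workspace size are placeholders, not my actual code):

```cpp
#include <NvInfer.h>
#include <NvOnnxParser.h>

// gLogger is assumed to be an nvinfer1::ILogger implementation defined elsewhere.
nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
const auto explicitBatch =
    1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
nvinfer1::INetworkDefinition* network = builder->createNetworkV2(explicitBatch);

nvonnxparser::IParser* parser = nvonnxparser::createParser(*network, gLogger);
parser->parseFromFile("model.onnx", static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
config->setMaxWorkspaceSize(1ULL << 30);  // 1 GiB workspace

// One profile per supported input shape; min == opt == max since each shape is fixed.
for (const auto& dims : {nvinfer1::Dims4{1, 1, 896, 896}, nvinfer1::Dims4{1, 1, 768, 768}})
{
    nvinfer1::IOptimizationProfile* profile = builder->createOptimizationProfile();
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMIN, dims);
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kOPT, dims);
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX, dims);
    profile->setExtraMemoryTarget(0.0f);  // keep extra memory for this profile minimal
    config->addOptimizationProfile(profile);
}

nvinfer1::ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
```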

For instance, for my custom model and a single optimization profile with a shape of (1,1,896,896), the context uses 264 MB of CUDA memory.

For the two shapes (1,1,896,896) and (1,1,768,768) (and thus two optimization profiles), the context consumes 633 MB of CUDA memory, which is MORE than double the memory used by the single (larger) profile.
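This is roughly how I measure the consumption (sketch; `engine` is the built engine from above):

```cpp
#include <cuda_runtime_api.h>
#include <iostream>

// Compare free CUDA memory before and after creating the execution context.
size_t freeBefore = 0, freeAfter = 0, total = 0;
cudaMemGetInfo(&freeBefore, &total);

nvinfer1::IExecutionContext* context = engine->createExecutionContext();

cudaMemGetInfo(&freeAfter, &total);
std::cout << "Context memory: "
          << (freeBefore - freeAfter) / (1024.0 * 1024.0) << " MB" << std::endl;
```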

What is going on here? Is this expected behavior? What is the point of optimization profiles if building and loading two separate engines (setting aside the memory taken up by the model weights themselves) gives me lower memory consumption than one engine with two optimization profiles?

Is there any further documentation on the usage of optimization profiles, other than the C++ API reference manual?

Environment

TensorRT Version: 7.2.3
GPU Type: Quadro P2000
Nvidia Driver Version: 27.21.14.5206
CUDA Version: 11.0
CUDNN Version: 8.1
Operating System + Version: Win10
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): 1.15.3

Hi @daum,

Sorry for the delayed response. Could you please share a minimal reproducible script and the ONNX model with us, so we can try it from our end?

Thank you.

Hi,

While trying to boil down my use case, I discovered that the behavior was due to an incorrect calculation of the shapes provided to the optimization profiles, i.e. user error. With the correct shapes, memory consumption in the above example is pretty much identical in both cases. Sorry, my bad…

However, I still have a related question:
From my experiments so far, it seems that a context is always allocated large enough that inference is possible with any of the engine's optimization profiles, not just the one currently set. Is this correct?

While this is good for quickly switching between optimization profiles (no additional runtime overhead, as in the sketch below), it takes away possibilities for memory management. Is there a way (or will there be a way) to make a context support only a specific optimization profile, so that a profile for a small shape actually reduces memory consumption instead of the context always allocating enough for the largest one?
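For reference, this is roughly how I switch profiles on the single context at the moment (sketch; the assumption that the input is binding 0 is a placeholder for my actual binding layout):

```cpp
// Sketch: switching to the (1,1,768,768) profile on an existing context (TRT 7.x API).
// With multiple profiles, the engine's binding indices are offset per profile.
const int profileIndex = 1;  // index returned by addOptimizationProfile for this shape
const int bindingsPerProfile =
    engine->getNbBindings() / engine->getNbOptimizationProfiles();
const int inputBinding = profileIndex * bindingsPerProfile;  // assumes input is binding 0

context->setOptimizationProfile(profileIndex);  // must happen before enqueue/execute
context->setBindingDimensions(inputBinding, nvinfer1::Dims4{1, 1, 768, 768});
```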

Thank you.

Hi @daum,

Glad to know your issue is resolved.
Regarding the context allocation: currently we do not have a way to reduce it; as of now, enough memory for the largest profile is always allocated. This may be optimized in future releases.

Thank you.