During integration of dynamic shape support for a detection algorithm, I’ve encountered an interesting behavior in TensorRT. It seems that device memory consumption depends on the maximum input size across all optimization profiles and, in particular, is not limited to the currently selected profile or the current input resolution. I’ve tested this by adding 3 profiles, selecting the 2nd profile, and tweaking the max size of the 3rd profile (the actual input shape and the 1st/2nd profiles are kept unchanged).
So, it looks like device memory is allocated according to the worst possible case (i.e., the upper bound on the input shape across all profiles). Is this understanding correct? If so, is it possible to work around the limitation (for example, by creating the engine without memory and plugging in a sufficiently sized workspace buffer as needed)?
P.S. The motivation is to support a wide range of input resolutions, but pay (in terms of memory) for large resolutions only when such a large resolution is actually encountered.
TensorRT Version: v126.96.36.199
GPU Type: RTX 2070
Nvidia Driver Version: 470.63.01
CUDA Version: v11.4.3
CUDNN Version: v188.8.131.52
Sorry for the delay in addressing this issue. We will get back to you in 1 or 2 days.
Thank you for replying.
I’ve looked into the suggested API. As I understand, I’ll need to use IExecutionContext::setDeviceMemory()  method to provide my own device memory buffer. The documentation of this method states that I would need at least ICudaEngine::getDeviceMemorySize()  bytes. But getDeviceMemorySize() does not takes any parameter that would allow me to select a particular optimization profile. And API for choosing optimization profiles is present in IExecutionContext class  and not ICudeEngine.
In short, it seems that I can use my own device buffer, but its size does not depend on the optimization profile (it follows the same worst-case logic). So I’m effectively in the same situation as without custom device buffers. Or did I miss something?
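For reference, the workflow I tried looks roughly like this (a sketch against the TensorRT C++ API; error handling omitted, and note the buffer size still comes from the engine-wide getDeviceMemorySize()):

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Sketch: create a context without TensorRT-managed activation memory and
// supply our own buffer. getDeviceMemorySize() is per-engine, not
// per-profile, so the buffer is still sized for the worst-case profile.
nvinfer1::IExecutionContext* makeContextWithOwnMemory(nvinfer1::ICudaEngine& engine,
                                                      void** deviceBuffer) {
    nvinfer1::IExecutionContext* ctx = engine.createExecutionContextWithoutDeviceMemory();
    const size_t bytes = engine.getDeviceMemorySize();  // worst case across all profiles
    cudaMalloc(deviceBuffer, bytes);
    ctx->setDeviceMemory(*deviceBuffer);  // buffer must stay valid while ctx is in use
    ctx->setOptimizationProfile(1);       // select the 2nd profile
    return ctx;
}
```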
Good question, and yes, you’re right. Always allocating for the worst case is a simplifying assumption in the executor: we assume the application needs to budget memory for its worst expected case. But this assumption isn’t always valid.
Thank you for the information. Can we expect any improvements on the subject in the foreseeable future?
Yes, this may be improved in future releases.
Hi! Are there any advancements on this topic? TensorRT significantly reduces memory consumption for my model, but allocating for the worst case negates all the improvements :(
I would like to be able to independently control the required memory.