Description
Hi, I have a plugin implementing IPluginV2DynamicExt. The output tensor shape of the algorithm run in enqueue is not fixed: the output has shape [N, C], where N is determined by the input *content* rather than the input shape. Even when the input shape is identical on every call, the output shape may be [N1, C], [N2, C], and so on. Can I simply call cudaMalloc in enqueue to allocate memory for the plugin outputs at runtime?
Right now I just reserve a MAX_N-sized GPU buffer for the outputs, which wastes GPU memory, and the computation is expensive.
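For context, the current workaround can be sketched as below. This is a minimal, self-contained illustration of the MAX_N padding scheme in plain C++, not actual TensorRT plugin code; `MAX_N`, `C`, and the `run_algorithm` stand-in are assumptions for the example. In the real plugin, getOutputDimensions would report [MAX_N, C] so the engine can size the output binding once:

```cpp
#include <vector>
#include <cstddef>

// Upper bound chosen at build time (assumption for this sketch); the plugin's
// getOutputDimensions() would report [MAX_N, C] so the output binding is
// sized once, even though only the first n rows are valid per call.
constexpr int MAX_N = 1000;
constexpr int C = 4;

// Stand-in for the data-dependent algorithm: writes n valid rows into the
// padded [MAX_N, C] output and returns the valid row count separately,
// since n depends on the input content, not its shape.
int run_algorithm(const std::vector<float>& input, std::vector<float>& output) {
    int n = 0;
    for (float v : input) {
        if (v > 0.5f) ++n;  // toy content-dependent count
    }
    if (n > MAX_N) n = MAX_N;
    for (int i = 0; i < n * C; ++i) {
        output[i] = 1.0f;   // fill the n valid rows
    }
    for (std::size_t i = static_cast<std::size_t>(n) * C; i < output.size(); ++i) {
        output[i] = 0.0f;   // pad the remaining MAX_N - n rows
    }
    return n;
}
```

The downside shown here is exactly the complaint above: the buffer is always MAX_N * C elements, however small n turns out to be, and the valid count has to be communicated out of band.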
Environment
TensorRT Version: 7.1.3.4
GPU Type: Xavier
Nvidia Driver Version: 440.33
CUDA Version: 10.2
CUDNN Version: 8.0
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.7