How to deploy models in which output size depends on the contents of the input image?

Hi,

There are some deep learning models in which the output size of the network depends on the contents/features of the input image rather than on its size. For example, the output size of some Object Detection models changes with the number of objects in the image.

Models whose output size depends on the input size/shape can be deployed with dynamic shapes, because before inference one can get the input image size/shape and deduce the output size from it. However, in my case the output size is not known before inference.
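To make the distinction concrete, here is a small illustration (all shapes, names, and numbers below are hypothetical, just to show the two cases):

```cpp
#include <cstddef>

// Case 1: output shape is a pure function of the input shape.
// E.g. a classifier with input [N, 3, H, W] and output [N, numClasses]:
// the output buffer can be sized as soon as N is known, before inference.
std::size_t classifierOutputElems(std::size_t batchSize, std::size_t numClasses)
{
    return batchSize * numClasses;
}

// Case 2: output shape depends on the image contents.
// E.g. a detector emitting one row of `valuesPerDetection` numbers per object:
// the row count is only known after inference, so this cannot be computed up front.
std::size_t detectorOutputElems(std::size_t numDetections /* unknown before inference */,
                                std::size_t valuesPerDetection = 7)
{
    return numDetections * valuesPerDetection;
}
```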

How can I deploy such models?

I think TRT supports dynamic output shapes as well. It should handle this case with the normal dynamic shapes workflow.

You can try the trtexec command-line tool to test, debug, and convert your model:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec#example-4-running-an-onnx-model-with-full-dimensions-and-dynamic-shapes
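For reference, the usual dynamic shapes setup looks roughly like the sketch below with the C++ builder API (trtexec exposes the same idea through its --minShapes/--optShapes/--maxShapes options). The input tensor name "input" and all dimensions are placeholders; adjust them to your model. Error handling and object lifetimes are omitted:

```cpp
#include <cstdint>
#include "NvInfer.h"
#include "NvOnnxParser.h"

// Sketch only: build an engine with a dynamic input using an optimization profile.
// Assumes `logger` is an existing nvinfer1::ILogger implementation and the ONNX
// model has a single dynamic input tensor named "input" ([-1, 3, -1, -1]).
nvinfer1::ICudaEngine* buildDynamicEngine(nvinfer1::ILogger& logger, const char* onnxPath)
{
    auto builder = nvinfer1::createInferBuilder(logger);
    auto network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
    auto parser = nvonnxparser::createParser(*network, logger);
    parser->parseFromFile(onnxPath, static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    auto config  = builder->createBuilderConfig();
    auto profile = builder->createOptimizationProfile();
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4{1, 3, 224, 224});
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4{1, 3, 640, 640});
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4{1, 3, 1280, 1280});
    config->addOptimizationProfile(profile);

    return builder->buildEngineWithConfig(*network, *config);
}
```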

Thanks

But I can’t know how many bytes to allocate for the output buffers before execution. Even if I allocate a large chunk of memory, I won’t know, after inference, how many bytes of the output buffer are actually valid and how many were allocated in excess.

Hi @SunilJB,

Do you have any suggestions regarding this issue?

Thanks.

Hi,

Can you try setting the output dimensions using the network, as suggested in the sample below:
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L182
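A hedged sketch of the idea, not the sample’s exact code: give a layer a fixed output shape inside the network so that downstream shapes, and therefore buffer sizes, are known before the engine is built. The target dimensions below are example values:

```cpp
#include "NvInfer.h"

// Sketch: force a fixed output shape inside the network definition.
// Assumes `network` is the parsed INetworkDefinition and `dynamicInput`
// is its input tensor with dynamic spatial dimensions.
void addFixedSizeResize(nvinfer1::INetworkDefinition& network, nvinfer1::ITensor& dynamicInput)
{
    auto* resize = network.addResize(dynamicInput);
    resize->setOutputDimensions(nvinfer1::Dims4{1, 3, 300, 300}); // fixed target shape (example)
    network.markOutput(*resize->getOutput(0));
}
```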

Thanks

Hi @SunilJB ,

There have been other parsing issues regarding the model I work on, so I can’t try your suggestion right now.

What I see is that the sample code gets the network dimensions before creating the engine, then uses those dimensions as the sizes of the host/device buffers before inference. But I don’t understand how this helps when the output size depends solely on the image contents (the number of detected objects/pixels in an Object Detection problem).

As far as I know, TensorRT’s dynamic shapes work such that it deduces the output size from the input sizes, or the user can set a dynamic output size before enqueue/execute. For example, using dynamic shapes, I can make the batch-size dimension of the output shape dynamic and size the output buffers according to the number of frames fed into my program; TensorRT allows this kind of operation. Or, for some networks, the length of the output array may depend on the input image’s width/height; again, in this case I can determine the size of the output buffer just before inference from the width/height of the input image.
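To be explicit about the workflow I mean, here is a rough sketch (the binding names "input"/"output" and the concrete dimensions are placeholders):

```cpp
#include <cstddef>
#include <cuda_runtime_api.h>
#include "NvInfer.h"

// Sketch of the input-dependent case that already works with dynamic shapes.
// Assumes an engine with one dynamic input "input" and one FP32 output "output".
void* prepareOutputBuffer(nvinfer1::ICudaEngine& engine, nvinfer1::IExecutionContext& context)
{
    const int inIdx  = engine.getBindingIndex("input");
    const int outIdx = engine.getBindingIndex("output");

    // Choose the actual input shape for this inference call (example values).
    const int batchSize = 1, height = 720, width = 1280;
    context.setBindingDimensions(inIdx, nvinfer1::Dims4{batchSize, 3, height, width});

    // With all input dimensions specified, the output dimensions can be queried
    // *before* inference and used to size the output buffer.
    nvinfer1::Dims outDims = context.getBindingDimensions(outIdx);
    std::size_t outCount = 1;
    for (int i = 0; i < outDims.nbDims; ++i)
        outCount *= static_cast<std::size_t>(outDims.d[i]);

    void* outBuffer = nullptr;
    cudaMalloc(&outBuffer, outCount * sizeof(float));
    // ...fill the bindings array and call context.enqueueV2(...) / executeV2(...)
    return outBuffer;
}
```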

However, in my case I don’t see a way to get the output dimensions, because they are only known after inference. In DL frameworks, output tensors are dynamic and can expand/shrink depending on the inference results. Does TensorRT provide this kind of operation?

Thanks.
