OpenVX/VisionWorks graph input from GPU memory buffer

Description

Hello,

I am facing a problem running SiamRPN++ inference with TensorRT. The problem is present for
many similar network architectures and is, from what I have learned, due to TensorRT not supporting cross-correlation between two dynamic inputs (TensorRT seems to require a static kernel for this kind of operation).

The solutions I have found so far are to cut the network architecture just before the cross-correlation operation and to perform that operation either manually or with another framework.

One functional solution was to couple TensorRT with a second inference engine (ONNX Runtime), which supports this operation. However, for unknown reasons, performance with ONNX Runtime was terrible: more than 500 ms for a single inference, compared with around 60 ms for the whole network on the PC-based version. The hardware difference explains part of the gap, but the gap is not consistent with what was observed on ResNet-50 for comparison.

Another solution would be to reimplement the operation in CUDA, which seems particularly time-consuming and not portable (a rough sketch of what such a kernel would involve is shown below).
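For reference, here is a minimal, naive sketch of the depthwise cross-correlation SiamRPN++ needs (template features correlated channel by channel over the search features). It is only illustrative: the contiguous [C, H, W] float layout, the names, and the launch configuration are assumptions, and a production kernel would need tiling/shared memory to be competitive.

```cpp
#include <cuda_runtime.h>

// Naive depthwise cross-correlation (illustrative sketch, not optimized).
// x  : search features   [C, Hx, Wx], contiguous float
// z  : template features [C, Hz, Wz], used as a per-channel kernel
// out: response maps     [C, Ho, Wo] with Ho = Hx - Hz + 1, Wo = Wx - Wz + 1
__global__ void depthwise_xcorr(const float* x, const float* z, float* out,
                                int C, int Hx, int Wx, int Hz, int Wz)
{
    const int Ho = Hx - Hz + 1;
    const int Wo = Wx - Wz + 1;

    const int ox = blockIdx.x * blockDim.x + threadIdx.x; // output column
    const int oy = blockIdx.y * blockDim.y + threadIdx.y; // output row
    const int c  = blockIdx.z;                            // channel

    if (ox >= Wo || oy >= Ho || c >= C) return;

    const float* xc = x + (size_t)c * Hx * Wx;
    const float* zc = z + (size_t)c * Hz * Wz;

    float acc = 0.f;
    for (int ky = 0; ky < Hz; ++ky)
        for (int kx = 0; kx < Wz; ++kx)
            acc += xc[(oy + ky) * Wx + (ox + kx)] * zc[ky * Wz + kx];

    out[((size_t)c * Ho + oy) * Wo + ox] = acc;
}

// Hypothetical launch, assuming d_x/d_z/d_out are device buffers
// (e.g. TensorRT output bindings) with the dimensions above:
//   dim3 block(16, 16, 1);
//   dim3 grid((Wo + 15) / 16, (Ho + 15) / 16, C);
//   depthwise_xcorr<<<grid, block>>>(d_x, d_z, d_out, C, Hx, Wx, Hz, Wz);
```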
The last solution, which I am currently exploring, is to use OpenVX/VisionWorks and the vxMatchTemplate node to implement the mentioned operation.

Within the scope of that last solution, I am trying to bind a VisionWorks image to an already allocated GPU buffer (the output of TensorRT), but I could not find how to do so.
It seems vxCreateImageFromHandle would be a good starting point, but the memory-type parameter only offers Host and None, which does not seem to correspond to GPU memory (even though the memory is physically shared on the Jetson, I don't think the pointers are interchangeable this way).
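What I would try, if the NVX extensions of the installed VisionWorks provide a CUDA memory-type enum (I believe NVX/nvx.h declares NVX_MEMORY_TYPE_CUDA, but please treat that as an assumption to verify against your headers), is something along these lines. The function name, format, and dimensions are placeholders; the image format must match the element type of the TensorRT output, and a float output would require the corresponding NVX float format extension if present.

```cpp
#include <VX/vx.h>
#include <NVX/nvx.h>   // NVX extensions (assumed to declare NVX_MEMORY_TYPE_CUDA)

// Hedged sketch: wrap an already allocated CUDA device buffer (e.g. a TensorRT
// output binding) into a vx_image without copying it back to the host.
// `d_ptr` is a device pointer, `pitch` is the row stride of that buffer in bytes.
vx_image wrapCudaBuffer(vx_context context, void* d_ptr,
                        vx_uint32 width, vx_uint32 height, vx_int32 pitch)
{
    vx_imagepatch_addressing_t addr;
    addr.dim_x    = width;
    addr.dim_y    = height;
    addr.stride_x = 1;                 // bytes per element; U8 used only for illustration
    addr.stride_y = pitch;             // row pitch of the device allocation
    addr.scale_x  = VX_SCALE_UNITY;
    addr.scale_y  = VX_SCALE_UNITY;
    addr.step_x   = 1;
    addr.step_y   = 1;

    void* ptrs[] = { d_ptr };          // one pointer per plane; single plane here

    // The key point: pass the CUDA memory type instead of VX_MEMORY_TYPE_HOST,
    // so VisionWorks treats the handle as device memory.
    return vxCreateImageFromHandle(context, VX_DF_IMAGE_U8, &addr, ptrs,
                                   NVX_MEMORY_TYPE_CUDA);
}
```

If that enum is not available in your release, the only fallback I can think of is copying the TensorRT output into a regular host-backed vx_image, which of course defeats the purpose of keeping the data on the GPU.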

So the question is: is there a correct way to do this? Also, any recommendation concerning the overall problem described above would be greatly appreciated.

Thank you,
Regards.


Environment

TensorRT Version: 7.1.3-1
GPU Type: Jetson TX2 GPU
Nvidia Driver Version:
CUDA Version: 10.2
CUDNN Version: 8.0.0.1810-1
Operating System + Version: L4T R32
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Hi,
This looks like a Jetson issue. We recommend you raise it on the respective platform via the link below

Thanks!

OK, moved here:

Though the TensorRT issue mentioned is not Jetson-specific (it was also observed on a Windows platform), and the same goes for the VisionWorks buffer issue (though VisionWorks is mainly oriented toward Jetson).