I am facing a problem with Siamrpn++ inferencing with tensorrt. This problem is present for
many similar network architecture and is, from what i learned, due to the non-support of the cross-correlation between two dynamic input by tensorrt (tensorrt seems to require a static kernel for this kind of operations).
To solve this issue the solutions found yet are to crop the network architecture before the cross-correlation operation and to do those either manually or using other framework.
One functionnal solution was to couple tensorrt with a second inference engine (onnxruntime) which support this operation. However for unknown reasons performances were terrible when working with onnxruntime (more than 500ms for a single inference while having around 60ms on the pc-based version for the whole network. the hardware difference justify a part of the gap still the gap is not coherent between what was seen on resnet-50 for comparison).
An other solution seem to reimplement the operation in Cuda which seems particularly time consuming and not portable.
The last solution which I am exploring is using openvx/visionworks and the vxMatchTemplate node to implement the mentioned operation.
IN the scope of the last solution I am trying to setup Visionworks image memory to an already allocated GPU buffer (output of tensorrt) but couldn’t find how to do so.
It seems vxCreateImageFromHandle would be a good starting point but the Memory type parameter dontains only Host and None type which doesn’t seems to correspond to GPU memory (thought memory is shared I don’t think pointer are interchangeable this way).
So the question is is there a way to do this in a correct way ? also if you have any recommandation concerning the whole problematic mentionned above, that would be greatly appreciated.
A clear and concise description of the bug or issue.
TensorRT Version : 7.1.3-1
GPU Type : jetson TX2 gpu
Nvidia Driver Version :
CUDA Version : 10.2
CUDNN Version : 188.8.131.520-1
Operating System + Version : l4t R32
Python Version (if applicable) :
TensorFlow Version (if applicable) :
PyTorch Version (if applicable) :
Baremetal or Container (if container which image + tag) :