Issue with multiplying one tensor by another, which leads to a dimension mismatch at a Concatenate op

Description

Hello Guys,

I am having an issue converting an ONNX model to TensorRT.
In the PyTorch script, I am multiplying a tensor of shape torch.Size([1, 3, 52, 52, 2]) with one of shape torch.Size([1, 3, 1, 1, 2]):

twofour = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]
y[..., 2:4] has the shape torch.Size([1, 3, 52, 52, 2])
self.anchor_grid[i] has the shape torch.Size([1, 3, 1, 1, 2])

The result in the PyTorch code is a tensor of shape torch.Size([1, 3, 52, 52, 2]) (the same as y[..., 2:4]).
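For reference, this multiplication relies on standard broadcasting: size-1 dimensions are stretched to match the other operand. A minimal sketch with NumPy (which follows the same broadcasting rules as PyTorch; the shapes are taken from the post above, the array contents are placeholders):

```python
import numpy as np

# Same shapes as y[..., 2:4] and self.anchor_grid[i] in the post
wh = np.random.rand(1, 3, 52, 52, 2).astype(np.float32)
anchors = np.random.rand(1, 3, 1, 1, 2).astype(np.float32)

# Element-wise multiply: the size-1 dims of `anchors` broadcast to 52,
# so the result keeps the shape of `wh`
out = (wh * 2) ** 2 * anchors
print(out.shape)  # (1, 3, 52, 52, 2)
```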

The conversion to ONNX goes well, but the conversion to TensorRT triggers the following error:
    [2021-04-30 17:30:48   ERROR] Concat_218: all concat input tensors must have the same number of dimensions, but mismatch at input 1. Input 0 shape: [-1,-1,52,52,2], Input 1 shape: [-1,-1,3,-1,-1,2]
    While parsing node number 219 [Concat -> "384"]:

I have attached an image of the ONNX model.

I don't understand how this mismatch happens. The multiplication is element-wise (with broadcasting), so the result should have the same shape as y[..., 2:4].

Here is a more complete snippet of the code in PyTorch:

onetwo = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
twofour = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]             # wh
foursix = y[..., 4:]

new_y = torch.cat([onetwo, twofour, foursix], dim=4)

But I don't understand why the element-wise multiplication is not handled correctly.

If I replace self.anchor_grid[i] with a float scalar, the conversion works.
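One possible workaround (an assumption, not verified on this exact model) is to materialize the broadcast explicitly before export, so the exported graph carries two inputs of matching rank and fully specified shape instead of relying on implicit broadcasting. In PyTorch this would be self.anchor_grid[i].expand_as(y[..., 2:4]); the idea is sketched here with NumPy, where np.broadcast_to plays the same role:

```python
import numpy as np

wh = np.random.rand(1, 3, 52, 52, 2).astype(np.float32)
anchors = np.random.rand(1, 3, 1, 1, 2).astype(np.float32)

# Expand the anchors to the full shape up front, instead of letting the
# multiply broadcast implicitly (in PyTorch: anchors.expand_as(wh))
anchors_full = np.broadcast_to(anchors, wh.shape)

out = (wh * 2) ** 2 * anchors_full
print(out.shape)  # (1, 3, 52, 52, 2)
```

With both operands the same rank and shape, the parser no longer has to infer the broadcast, which is where the 5-D/6-D mismatch in the error appears to originate.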

Environment

TensorRT Version: 7.0.0
GPU Type: NVIDIA V100
Nvidia Driver Version:
CUDA Version: 10.2
CUDNN Version:
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
PyTorch Version (if applicable): 1.8
ONNX Conversion done with: 1.8
ONNX IR version: 0.0.6
Opset version: 12
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:20.01-py3

Relevant Files

Steps To Reproduce

When converting with ./onnx2trt, the error above occurs.

Thanks in advance,

Regards

Hi,
The links below might be useful for you:
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#thread-safety
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#stream-priorities
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html
For multi-threading/streaming, we suggest using DeepStream or Triton.
For more details, we recommend raising the query on the DeepStream or Triton forum.

Thanks!

Hi @souquet.leo,

Sorry for the delayed response. Could you please try the latest TensorRT version, 8.0, and let us know if you still face this issue. We also noticed a dimension mismatch (the Add input is 5-D, but the right-hand input of the Mul is 6-D).

Thank you.