DLA allows only same dimensions inputs to Elementwise

Description

When I try to build an engine file for DLA using the command below:
trtexec --onnx=./test_mul.onnx --explicitBatch --workspace=1024 --saveEngine=./test_mul_fp16.trt --verbose --fp16 --useDLACore=0 --allowGPUFallback

the multiplication layer in the ONNX model falls back to the GPU instead of running on DLA, with this warning:
DLA allows only same dimensions inputs to Elementwise

How can we make the multiplication layer in this model run on DLA?

Environment

TensorRT Version: 7.1.3
GPU Type: xavier
Nvidia Driver Version: Package:nvidia-jetpack, Version: 4.4
CUDA Version: 10.2
CUDNN Version: 8.0
Operating System + Version: Ubuntu 18.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

onnx file test_mul.onnx - Google Drive

Steps To Reproduce

Run the command below:

trtexec --onnx=./test_mul.onnx --explicitBatch --workspace=1024 --saveEngine=./test_mul_fp16.trt --verbose --fp16 --useDLACore=0 --allowGPUFallback

Hi,
Please check the below links, as they might answer your concerns.

Thanks!

Thanks, but my problem is that the multiplication operator runs fine on the GPU, yet when I try to run it on DLA it automatically falls back to the GPU. The log says that pointwise multiplication with broadcast is not supported on DLA. Is this issue fixed in the latest version of TensorRT, or is it still a limitation of DLA?
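
For anyone reading later, the distinction the warning draws can be sketched in a few lines of Python. This is only an illustration of the usual NumPy/ONNX-style broadcasting rule, not TensorRT's actual check. The function name and the example shapes are hypothetical, since the real shapes in test_mul.onnx are not listed in this thread.

```python
from itertools import zip_longest


def classify_elementwise(shape_a, shape_b):
    """Classify two input shapes as 'same', 'broadcast', or 'incompatible'.

    'same' is the case the DLA warning asks for; 'broadcast' is the case
    that triggered the GPU fallback described above.
    """
    if tuple(shape_a) == tuple(shape_b):
        return "same"
    # Broadcasting aligns shapes from the trailing dimension; each pair of
    # dimensions must be equal, or one of them must be 1. Missing leading
    # dimensions are treated as 1.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a != b and a != 1 and b != 1:
            return "incompatible"
    return "broadcast"


# Hypothetical shapes, for illustration only.
print(classify_elementwise((1, 3, 224, 224), (1, 3, 224, 224)))
print(classify_elementwise((1, 3, 224, 224), (1, 3, 1, 1)))
```

With these made-up shapes, the first pair is the identical-dimension case and the second pair relies on broadcasting.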

Hi,

Moving this post to the Jetson Xavier forum so the Jetson team can take a look and provide better help.

Thank you.

Hi,

Unfortunately no.
We tested your model on TensorRT 8.2 and 8.4, and the same error occurs.
Please enable the --allowGPUFallback flag to use the GPU instead.

Thanks.

Okay, thanks for the reply. I have enabled that flag and the layer ran on the GPU. But the DLA documentation does not mention this broadcasting limitation anywhere, right?

Hi,

On the contrary, we document that the GPU supports the broadcast feature when one of the input tensors has a length equal to 1 in a given dimension.

Since DLA is a hardware-based inference engine, it is not as flexible as GPU.
It can only support the basic elementwise operators.
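
If the broadcast operand happens to be a constant, one thing that could be tried is to materialize it at the full shape ahead of time, so that both inputs to the multiplication have identical dimensions. The sketch below shows only the NumPy side of that idea. The helper name and the shapes are hypothetical, and whether DLA then accepts the layer is an assumption that has not been verified in this thread. The expanded constant also makes the model larger.

```python
import numpy as np


def expand_to_full_shape(constant, target_shape):
    """Return a writable, C-contiguous copy of `constant` broadcast to `target_shape`."""
    # np.broadcast_to yields a read-only view; .copy() turns it into a real array
    # that can be stored as a full-size constant.
    return np.broadcast_to(constant, target_shape).copy()


# Hypothetical per-channel scale of shape (1, 3, 1, 1); the real shapes in
# test_mul.onnx are not given here.
scale = np.array([0.5, 2.0, 4.0], dtype=np.float32).reshape(1, 3, 1, 1)
full_scale = expand_to_full_shape(scale, (1, 3, 2, 2))
print(full_scale.shape)
```

Multiplying an input of shape (1, 3, 2, 2) by full_scale gives the same result as multiplying it by the original scale, but without relying on broadcasting.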

Thanks.

Sorry for the delay and thanks for your reply
