TensorRT optimization for DLA

We are trying to use the tensorRT_optimization tool to optimize some fairly standard segmentation models (lots of convolution/deconvolution layers), which works fine when optimizing for GPU. However, when optimizing for DLA with the --useDLA flag, the tool fails with some odd errors, including
Error (conv2d/kernel not running on DLA)
and
Error (truediv/y not running on DLA)

What’s the situation with the DLA support in the tool?

The NVIDIA DRIVE 9.0 software release notes mention:
6.15 TensorRT Deep Learning Accelerator Performance Limitation
In this release of DRIVE Software, TensorRT does not include full Deep Learning
Accelerator (DLA) support and therefore it is not at full performance.

Is this what we are encountering when using the tool? Is there a workaround?

Hi,

Could you add --winograd 0 to see if it helps?

Thanks.

Unfortunately, this didn’t help; the errors are exactly the same.

We can’t seem to find any record of this --winograd option. Is there complete documentation of all the options somewhere?

Hi,

You can find some related information in our TensorRT documentation.
The TensorRT version of DRIVE Software 9.0 is v5.0.
Here is the support matrix for DLA in v5.0:
TensorRT Support Matrix :: Deep Learning SDK Documentation

The support matrix you linked states that 2D convolution layers are supported on DLA, so we’re not sure why we get the “conv2d/kernel not running on DLA” error. There are broadcasting limitations for convolution layers stated here, but we don’t seem to have an issue with those, since the limitations are not DLA specific (and our model does optimize for GPU). We have also found more details about DLA support in the Developer Guide :: NVIDIA Deep Learning TensorRT Documentation (this doesn’t seem to be specific to version 5.0.3). Again, we comply with these guides in our conv2D use.
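
To narrow this down on our side, we are considering querying per-layer DLA support directly through the TensorRT C++ API. A minimal sketch, assuming the TensorRT 5.x IBuilder interface (the function name is ours, not part of the API):

```cpp
// Minimal sketch, assuming the TensorRT 5.x C++ API: after parsing the
// network, ask the builder which layers DLA can actually execute.
#include <NvInfer.h>
#include <iostream>

void reportDlaSupport(nvinfer1::IBuilder& builder, nvinfer1::INetworkDefinition& network)
{
    builder.setFp16Mode(true);  // DLA requires reduced precision (FP16 or INT8)
    for (int i = 0; i < network.getNbLayers(); ++i)
    {
        nvinfer1::ILayer* layer = network.getLayer(i);
        std::cout << layer->getName() << ": "
                  << (builder.canRunOnDLA(layer) ? "OK on DLA" : "not supported on DLA")
                  << "\n";
    }
}
```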

Also, we can’t find any mention of the winograd option in the TensorRT documentation. We are not sure whether you are referring to the trtexec TensorRT tool or to the DRIVE AGX specific /usr/local/driveworks/tools/dnn/tensorRT_optimization tool. We are trying to use the latter, tensorRT_optimization, as described in the NVIDIA DRIVE documentation.

Dear tonci.antunovic,
What’s the situation with the DLA support in the tool?

If you are generating a TensorRT model for DLA using TensorRT_Optimization, all layers in your DNN need to be supported on DLA. With trtexec you can use the --allowGPUFallback flag to allow unsupported layers to run on the GPU; this is not supported by the TensorRT_Optimization tool.
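
If building through the TensorRT C++ API directly is an option for you, the same fallback behavior can be requested at build time. A minimal sketch, assuming the TensorRT 5.x IBuilder interface (the function name and workspace size are illustrative):

```cpp
// Minimal sketch, assuming the TensorRT 5.x C++ API: build an engine that
// targets DLA but lets unsupported layers fall back to the GPU, i.e. the
// equivalent of trtexec's --allowGPUFallback flag.
#include <NvInfer.h>

nvinfer1::ICudaEngine* buildDlaEngineWithFallback(nvinfer1::IBuilder& builder,
                                                  nvinfer1::INetworkDefinition& network)
{
    builder.setFp16Mode(true);                                 // DLA requires FP16 or INT8
    builder.setDefaultDeviceType(nvinfer1::DeviceType::kDLA);  // place layers on DLA by default
    builder.setDLACore(0);                                     // use DLA core 0
    builder.allowGPUFallback(true);                            // unsupported layers run on GPU
    builder.setMaxBatchSize(1);
    builder.setMaxWorkspaceSize(1 << 28);                      // 256 MiB scratch space
    return builder.buildCudaEngine(network);
}
```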

As you are trying to get a TensorRT model for DLA, could you double-check that all layers in your DNN are supported on DLA? Please share the layer details so we can understand the issue. It would be great if you could share the network architecture file so that we can reproduce the issue on our end.