DRIVE AGX: ONNX parsing error: unsupported operation

Please provide the following info (check/uncheck the boxes after clicking “+ Create Topic”):
Software Version
[*] DRIVE OS Linux 5.2.0
DRIVE OS Linux 5.2.0 and DriveWorks 3.5
NVIDIA DRIVE™ Software 10.0 (Linux)
NVIDIA DRIVE™ Software 9.0 (Linux)
other DRIVE OS version
other

Target Operating System
[*] Linux
QNX
other

Hardware Platform
[*] NVIDIA DRIVE™ AGX Xavier DevKit (E3550)
NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)
other

SDK Manager Version
[*] 1.5.0.7774
other

Host Machine Version
[*] native Ubuntu 18.04
other

Dear support team!

As part of my project, I’ve been developing with an ONNX YOLOv4 model using the TensorRT-7.2.3.4 package on the host machine.
No issues were observed while parsing the ONNX model or during the subsequent steps.
The C++ code was then compiled for aarch64 and launched on the target DRIVE AGX platform.
As a result, I got this error:

[06/01/2021-18:36:05] [W] [TRT] ModelImporter.cpp:140: No importer registered for op: Equal. Attempting to import as plugin.
[06/01/2021-18:36:05] [I] [TRT] builtin_op_importers.cpp:2191: Searching for plugin: Equal, plugin_version: 1, plugin_namespace:
While parsing node number 375 [Equal]:
ERROR: builtin_op_importers.cpp:2193 In function importFallbackPluginImporter:
[8] Assertion failed: creator && “Plugin not found”
FailEqualOpAgx.log (10.9 KB)
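
For context, my code follows the standard nvonnxparser flow. A minimal sketch (not my exact code; “yolov4.onnx” is a placeholder path):

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <iostream>

// Minimal logger required by the builder and parser APIs.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    auto* builder = nvinfer1::createInferBuilder(gLogger);
    auto* network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
    auto* parser = nvonnxparser::createParser(*network, gLogger);

    // Succeeds with the host's TRT 7.2.3.4 parser; on the target's TRT 6.3
    // parser it stops at node 375, since no importer or plugin named
    // "Equal" is available there.
    if (!parser->parseFromFile("yolov4.onnx",
                               static_cast<int>(nvinfer1::ILogger::Severity::kVERBOSE)))
    {
        std::cerr << "ONNX parse failed" << std::endl;
        return 1;
    }
    // ... engine build continues from here ...
    return 0;
}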

If I understand correctly, the TensorRT 6.3.1 package on the target platform doesn’t support the Equal op.
In other words, the ONNX parser library on the target platform can’t handle this operation:
/usr/lib/aarch64-linux-gnu/libnvonnxparser.so.6.3.1
On the other hand, libnvonnxparser.so.7.2.1 already has this support, which would explain the difference in results.
Please correct me if I’m wrong.
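
To double-check which TensorRT build each machine actually links and loads, a tiny probe (my own sketch) can help:

#include <NvInfer.h>
#include <cstdio>

int main()
{
    // Version the binary was compiled against (from NvInferVersion.h).
    std::printf("compiled against TensorRT %d.%d.%d\n",
                NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH);
    // Version of the libnvinfer loaded at run time, encoded as
    // MAJOR * 1000 + MINOR * 100 + PATCH (e.g. 6301 for 6.3.1).
    std::printf("loaded libnvinfer reports %d\n", getInferLibVersion());
    return 0;
}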

Could you please help me understand the fastest correct solution that doesn’t require writing a custom plugin:

  1. DRIVE OS Linux 5.2.0 ships TensorRT 6.3.1. Does NVIDIA have any plans to provide a newer TensorRT package in the future?
  2. Is it possible to compile libnvonnxparser.so.7.2.1 for aarch64 locally on the host, or directly on the target platform,
    and replace the installed one?

I appreciate your help in any case.

Dear @anton.nesterenko,
Yes, the next DRIVE release will have the next TensorRT release. As a workaround (WAR), you may try taking the TRT 7.x libs and the other needed libs from the Jetson platform and using them on DRIVE. But this is not officially supported, and you may notice issues.

On the DRIVE platform, with the TRT 6.3 libs, you need to write a custom plugin to get it to work.
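
For reference, the “Plugin not found” assertion in the first log corresponds to a failed lookup in the global plugin registry. Roughly (a sketch, not the parser’s literal code):

#include <NvInfer.h>
#include <iostream>

int main()
{
    // The ONNX parser's fallback importer searches the registry by op name
    // and version ("Searching for plugin: Equal, plugin_version: 1" in the
    // log). A nullptr result is what triggers "Plugin not found".
    auto* creator = getPluginRegistry()->getPluginCreator("Equal", "1");
    std::cout << (creator ? "Equal plugin is registered"
                          : "Equal plugin is NOT registered")
              << std::endl;
    return 0;
}

Registering a creator under the name “Equal”, version “1”, is what makes that lookup succeed.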

Dear @SivaRamaKrishnaNV ,

I’ve tried replacing the TRT libs on DRIVE with the Jetson versions, but ran into a lot of conflicts/dependencies…
So I’ve decided to follow the scenario below.

The main idea is to port the code for the missing operation from the latest branch to TRT 6.3, without writing a custom plugin:

  1. I found the code for the missing operation in the master branch (a condensed sketch of it follows after this list):
    GitHub - NVIDIA/TensorRT at master
    https://github.com/onnx/onnx-tensorrt/blob/a80015b29668722bc4d8e226f64c3fe8edf60179/builtin_op_importers.cpp#L1054
    https://github.com/onnx/onnx-tensorrt/blob/a80015b29668722bc4d8e226f64c3fe8edf60179/onnx2trt_utils.cpp#L814
  2. The AGX platform has TRT v6.3.
    I cloned the release/6.0 source code to the AGX:
    GitHub - NVIDIA/TensorRT at release/6.0
  3. Compiled it directly on the AGX for aarch64:
    make nvonnxparser
    Result: libnvonnxparser.so.6.0.1
    Copied it to /usr/lib/aarch64-linux-gnu/ and updated the symlink to point at this libnvonnxparser.
  4. Ran my project on the AGX and got a new error.
    Please see the attached log file:
    OnnxIRerror.txt (1.8 KB)

— End node —
ERROR: /home/nvidia/TensoRT_C++/REMOVE/TensorRT/parsers/onnx/builtin_op_importers.cpp:757 In function importConv:
[8] Assertion failed: (nbSpatialDims == 2 && kernel_weights.shape.nbDims == 4) || (nbSpatialDims == 3 && kernel_weights.shape.nbDims == 5)
nvidia@tegra-ubuntu:~/TensoRT_C++/ForAGX/Aarch64/bin$

Looks like the 6.0.1 code doesn’t support the current model.
5. The next step would be to add the missing code for this as well, but it doesn’t make sense to continue this way.
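
For reference, here is the condensed sketch promised in step 1. The master-branch Equal importer boils down to a single call into the core library, elementwiseHelper(ctx, node, inputs, nvinfer1::ElementWiseOperation::kEQUAL). A tiny compile check of my own (not from the repo) shows why porting only the parser can’t work here:

#include <NvInfer.h>
#include <iostream>

int main()
{
    // ElementWiseOperation::kEQUAL (and the BOOL tensors it produces) only
    // exists from the TRT 7.0 headers onward; this line does not even
    // compile against the TRT 6.3 headers installed on the AGX.
    auto op = nvinfer1::ElementWiseOperation::kEQUAL;
    std::cout << "kEQUAL = " << static_cast<int>(op) << std::endl;
    return 0;
}

So, if I read the headers correctly, the blocker is in libnvinfer itself, not just in libnvonnxparser.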

  1. Could you please review my actions? Is this a correct approach?
  2. Could you please provide a link to a repo for building the TRT 6.3 lib, the same version as is preinstalled on the AGX?
  3. If I have to write a custom plugin for the AGX board, could you please give me detailed directions on how to do it?

Thank you for support,

Dear @anton.nesterenko,
Could you please provide me link to repo for building TRT 6.3 lib the same version as already preinstalled on AGX?

The TensorRT package on DRIVE AGX comes along with DRIVE OS. There is no separate repo/package.

You need to implement custom plugins for the unsupported layers. Please check Developer Guide :: NVIDIA Deep Learning TensorRT Documentation for more details.

Dear @SivaRamaKrishnaNV ,

I understand your point and will develop a new custom plugin.

Could you please tell me (approximately) when the next DRIVE OS release will be, and can we expect that a future release will contain the TRT 7 libs?

Thanks for support.

Dear @anton.nesterenko,
The next DRIVE OS release is scheduled for this month, but it will not have the TRT 7 libs.

Dear @SivaRamaKrishnaNV ,

I’ve added a custom plugin for the node “Equal_375”.
Please see the attached file EqualNode.png.

To add this plugin, I followed some examples and overrode the necessary methods of IPluginV2DynamicExt.
JFYI, the plugin crashed when built on the IPluginV2Ext class instead.
I didn’t develop the full source code, only enough to get past the plugin error.
It looks like the plugin was found and applied; please see FailCast.log:
FailCast.log (300.0 KB)
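
A condensed sketch of the kind of skeleton I used (class names and the INT32 output choice are illustrative, not my exact code; enqueue() is stubbed):

#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <string>

using namespace nvinfer1;

class EqualPlugin : public IPluginV2DynamicExt
{
public:
    // IPluginV2DynamicExt
    IPluginV2DynamicExt* clone() const override { return new EqualPlugin(); }
    DimsExprs getOutputDimensions(int, const DimsExprs* inputs, int, IExprBuilder&) override
    {
        return inputs[0]; // elementwise: same shape as input 0 (no broadcasting handled)
    }
    bool supportsFormatCombination(int pos, const PluginTensorDesc* inOut, int, int) override
    {
        // TRT 6 has no BOOL tensors, so expose everything as INT32 here.
        return inOut[pos].format == TensorFormat::kLINEAR && inOut[pos].type == DataType::kINT32;
    }
    void configurePlugin(const DynamicPluginTensorDesc*, int, const DynamicPluginTensorDesc*, int) override {}
    size_t getWorkspaceSize(const PluginTensorDesc*, int, const PluginTensorDesc*, int) const override { return 0; }
    int enqueue(const PluginTensorDesc*, const PluginTensorDesc*, const void* const*, void* const*, void*, cudaStream_t) override
    {
        return 0; // TODO: launch the compare kernel; stubbed just to get past parsing
    }
    // IPluginV2Ext
    DataType getOutputDataType(int, const DataType*, int) const override { return DataType::kINT32; }
    // IPluginV2
    const char* getPluginType() const override { return "Equal"; } // must match the ONNX op name
    const char* getPluginVersion() const override { return "1"; }
    int getNbOutputs() const override { return 1; }
    int initialize() override { return 0; }
    void terminate() override {}
    size_t getSerializationSize() const override { return 0; }
    void serialize(void*) const override {}
    void destroy() override { delete this; }
    void setPluginNamespace(const char* ns) override { mNamespace = ns; }
    const char* getPluginNamespace() const override { return mNamespace.c_str(); }

private:
    std::string mNamespace;
};

class EqualPluginCreator : public IPluginCreator
{
public:
    // The parser's fallback importer matches on this name/version pair.
    const char* getPluginName() const override { return "Equal"; }
    const char* getPluginVersion() const override { return "1"; }
    const PluginFieldCollection* getFieldNames() override { return &mFC; }
    IPluginV2* createPlugin(const char*, const PluginFieldCollection*) override { return new EqualPlugin(); }
    IPluginV2* deserializePlugin(const char*, const void*, size_t) override { return new EqualPlugin(); }
    void setPluginNamespace(const char* ns) override { mNamespace = ns; }
    const char* getPluginNamespace() const override { return mNamespace.c_str(); }

private:
    PluginFieldCollection mFC{};
    std::string mNamespace;
};

// Makes the creator visible in the global registry at static-init time.
REGISTER_TENSORRT_PLUGIN(EqualPluginCreator);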

However, I got the following error:

[06/23/2021-15:50:40] [V] [TRT] ModelImporter.cpp:186: Equal_375 [Equal] outputs: [1111 → (6)],
[06/23/2021-15:50:40] [V] [TRT] ModelImporter.cpp:108: Parsing node: Cast_376 [Cast]
[06/23/2021-15:50:40] [V] [TRT] ModelImporter.cpp:124: Searching for input: 1111
[06/23/2021-15:50:40] [V] [TRT] ModelImporter.cpp:130: Cast_376 [Cast] inputs: [1111 → (6)],
Unsupported ONNX data type: BOOL (9)
While parsing node number 376 [Cast → “1112”]:
— Begin node —
input: “1111”
output: “1112”
name: “Cast_376”
op_type: “Cast”
attribute {
name: “to”
i: 9
type: INT
}
— End node —
ERROR: builtin_op_importers.cpp:308 In function importCast:
[4] Assertion failed: static_cast<int>(dtype) != -1
destroy()

This error is for the next node, “Cast_376”:
CastNode.png

I suppose the error comes from libnvonnxparser.so.

I’ve compared this with the execution log for the same model parsed with TensorRT 7.x:
FullVerboseTRT7.log (2.1 MB)

[06/13/2021-16:48:57] [V] [TRT] ModelImporter.cpp:179: Equal_375 [Equal] outputs: [1111 → (6)],
[06/13/2021-16:48:57] [V] [TRT] ModelImporter.cpp:103: Parsing node: Cast_376 [Cast]
[06/13/2021-16:48:57] [V] [TRT] ModelImporter.cpp:119: Searching for input: 1111
[06/13/2021-16:48:57] [V] [TRT] ModelImporter.cpp:125: Cast_376 [Cast] inputs: [1111 → (6)],
[06/13/2021-16:48:57] [V] [TRT] builtin_op_importers.cpp:320: Casting to type: bool
[06/13/2021-16:48:57] [V] [TRT] ImporterContext.hpp:154: Registering layer: Cast_376 for ONNX node: Cast_376
[06/13/2021-16:48:57] [V] [TRT] ImporterContext.hpp:120: Registering tensor: 1112 for ONNX tensor: 1112
[06/13/2021-16:48:57] [V] [TRT] ModelImporter.cpp:179: Cast_376 [Cast] outputs: [1112 → (6)],

It seems there was no issue with the casting operation there.

Could you please help me figure out the following:

  1. DRIVE AGX uses libnvonnxparser.so.6.3.1.
    Is it possible that the new issue comes from this older TRT 6 version (as opposed to TRT 7), and that I can’t fix it
    while keeping my current ONNX model?

  2. Could the new issue be caused by the plugin not being developed correctly?
    Maybe I need to somehow cast the output tensor to a supported data type?
    However, I think that won’t help, because the model already has a node expecting data type 9 (BOOL).

  3. Do I need to develop a new plugin, or rewrite the already registered one, for this case?

Thanks a lot for your support,

Dear @anton.nesterenko,
TRT 6.3 does not support the BOOL datatype.

Could you please share your schedule for the DRIVE OS release that will include the TensorRT 7.x libs?

Hi @kyehliu ,

Release schedules won’t be communicated on the forum. Please contact your NVIDIA rep to see if they have anything to share.