Terminate when parsing ONNX graph (nvinfer1::AssertionFailure)

Description

While parsing the ONNX graph, the parser terminates early with "terminate called after throwing an instance of 'nvinfer1::AssertionFailure'". I am trying to develop two custom operations, correlation and grid_sampler, but the failure does not seem to happen at either of them, at least not yet: the parser gets past the first correlation node correctly, then fails before it reaches the grid_sampler. The correlation node is implemented as an IPluginV2DynamicExt, since I previously got errors telling me I needed to use that interface. Here is an excerpt of the last few messages:

UNKNOWN: ModelImporter.cpp:103: Parsing node: Tile_3508 [Tile]
UNKNOWN: ModelImporter.cpp:119: Searching for input: 5468
UNKNOWN: ModelImporter.cpp:119: Searching for input: 5473
UNKNOWN: ModelImporter.cpp:125: Tile_3508 [Tile] inputs: [5468 -> (-1)], [5473 -> (3)], 
UNKNOWN: ImporterContext.hpp:141: Registering layer: Tile_3508 for ONNX node: Tile_3508
terminate called after throwing an instance of 'nvinfer1::AssertionFailure'
  what():  std::exception
Aborted (core dumped)

Here's an excerpt from the correlation layer, just in case.

UNKNOWN: ModelImporter.cpp:103: Parsing node: correlation_3454 [correlation]
UNKNOWN: ModelImporter.cpp:119: Searching for input: 3568
UNKNOWN: ModelImporter.cpp:119: Searching for input: 5379
UNKNOWN: ModelImporter.cpp:125: correlation_3454 [correlation] inputs: [3568 -> (-1, 384, 27, 8)], [5379 -> (-1, 384, 27, 8)], 
INFO: ModelImporter.cpp:135: No importer registered for op: correlation. Attempting to import as plugin.
INFO: builtin_op_importers.cpp:3659: Searching for plugin: correlation, plugin_version: 1, plugin_namespace: 
INFO: builtin_op_importers.cpp:3676: Successfully created plugin: correlation
UNKNOWN: ImporterContext.hpp:141: Registering layer: correlation_3454 for ONNX node: correlation_3454
UNKNOWN: ImporterContext.hpp:116: Registering tensor: 5406 for ONNX tensor: 5406
UNKNOWN: ModelImporter.cpp:179: correlation_3454 [correlation] outputs: [5406 -> (-1, 81, 27, 8)], 
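Since the log shows the parser falling back to the plugin registry for correlation ("No importer registered for op: correlation. Attempting to import as plugin."), one quick sanity check before parsing is to confirm that both custom plugins are actually registered. A minimal Python sketch, assuming the plugins live in a shared library at a hypothetical path:

```python
import ctypes
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

# Load the shared library that registers the custom plugins (path is an assumption).
ctypes.CDLL("./libcorrelation_plugin.so")

# Register TensorRT's built-in plugins plus anything the loaded library added.
trt.init_libnvinfer_plugins(TRT_LOGGER, "")

names = [c.name for c in trt.get_plugin_registry().plugin_creator_list]
print("correlation registered:", "correlation" in names)
print("grid_sampler registered:", "grid_sampler" in names)
```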

Environment

TensorRT Version: 7.1.3.4-1+cuda10.2
GPU Type: GTX 1070 Ti
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
CUDNN Version: 8.0.0.180-1+cuda10.2
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): N/A
PyTorch Version (if applicable): 1.6.0
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

Code can be found here.

The ONNX file can be found here; I've inspected the graph with Netron and it looks fine.
https://drive.google.com/file/d/1GTPWH5JXSbFXn2VmSHMQ-T-mEHPzseYl/view?usp=sharing

Steps To Reproduce

Compile and run the executable. You may need to change some path settings for the location of the ONNX graph and the class IDs file, which is just a text file; if it causes any issues, 18 rows of gibberish should be fine.
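If the C++ executable is awkward to set up, a roughly equivalent Python reproduction is sketched below; the model path is an assumption, and note that the crash reported above happens inside the parser itself, so the error-reporting loop may never be reached:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network(EXPLICIT_BATCH) as network, \
     trt.OnnxParser(network, TRT_LOGGER) as parser:
    with open("model.onnx", "rb") as f:  # path is an assumption
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
```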

Hi @frenzi,
Thanks for sharing the details. I will look into it and get back with an update.

Thanks

I just got my 3090 and (painfully) reinstalled and upgraded everything to the latest versions, i.e. TensorRT 7.2.1 + CUDA 11.1. Now when I build the graph, it gives me a more descriptive error:

UNKNOWN: ImporterContext.hpp:154: Registering layer: Tile_3508 for ONNX node: Tile_3508
INTERNAL_ERROR: Assertion failed: equalIfKnown(a, b)
../builder/Layers.cpp:121
Aborting...
terminate called after throwing an instance of 'nvinfer1::AssertionFailure'
  what():  std::exception
Aborted (core dumped)
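The assertion equalIfKnown(a, b) suggests the builder found two dimensions that should match but are not equal (or not both known) at build time. One way to narrow it down, sketched below, is to run ONNX shape inference and dump the failing Tile node together with whatever shapes can be inferred for its inputs; the node name is taken from the log and the model path is an assumption:

```python
import onnx
from onnx import shape_inference

model = onnx.load("model.onnx")  # path is an assumption
inferred = shape_inference.infer_shapes(model)

# Print the failing node and any inferred shapes for its inputs.
for node in inferred.graph.node:
    if node.name == "Tile_3508":
        print(node)
        for vi in inferred.graph.value_info:
            if vi.name in node.input:
                print(vi)
```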

Hi @frenzi,
This looks like an ONNX model issue.
Can you please raise it in the respective forum?

Thanks!

Hi @AakankshaS
Once PyTorch 1.7 was released a few days later, on Oct 28, I moved to it, since it has native CUDA 11 support (rather than recompiling master). It seems to export the ONNX graph in a slightly different way, and consequently everything works fine now.
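For reference, a torch.onnx.export call along these lines is the part whose behaviour changed between versions; the placeholder model, shapes, and opset below are assumptions rather than the project's actual settings:

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the real network (hypothetical).
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).eval()
dummy = torch.randn(1, 3, 448, 256)

torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    opset_version=11,               # opset is an assumption
    do_constant_folding=True,
    input_names=["input"],
    output_names=["flow"],
    dynamic_axes={"input": {0: "batch"}, "flow": {0: "batch"}},
)
```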
Cheers