Internal Error (Assertion isMultiple(tensor.start[vectorDim], spv) failed. ) when loading ONNX file

Description

An internal error occurs while building a TensorRT engine from an ONNX model.

I exported a model as an ONNX file, but TensorRT doesn’t support one of the layers. I’m writing a plugin and have implemented enough of the interfaces that the network parses successfully.

However, it encounters an internal error during buildSerializedNetwork.

As far as I can tell, this is not due to the plugin. None of the plugin’s not-yet-implemented (NYI) methods are called between the time the parser finishes and the time the error occurs. Also, the optimization step that fails is dealing with tensors whose dimensions are very different from those of the plugin’s layers.
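
For context, the flow that fails looks roughly like this (a minimal sketch rather than the actual code in the attached project; it assumes an nvinfer1::ILogger implementation and that the plugin creator is already registered with the plugin registry):

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <memory>

// Sketch of the parse-then-build flow that hits the internal error.
bool buildEngine(const char* onnxPath, nvinfer1::ILogger& logger)
{
    using namespace nvinfer1;

    auto builder = std::unique_ptr<IBuilder>(createInferBuilder(logger));
    const auto flags = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = std::unique_ptr<INetworkDefinition>(builder->createNetworkV2(flags));
    auto parser = std::unique_ptr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, logger));

    // Parsing succeeds: the unsupported op resolves to the custom plugin.
    if (!parser->parseFromFile(onnxPath, static_cast<int>(ILogger::Severity::kVERBOSE)))
        return false;

    auto config = std::unique_ptr<IBuilderConfig>(builder->createBuilderConfig());

    // This is the call that fails with "Assertion isMultiple(tensor.start[vectorDim], spv) failed."
    auto plan = std::unique_ptr<IHostMemory>(builder->buildSerializedNetwork(*network, *config));
    return plan != nullptr;
}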

Here is the tail of the logs where the failure occurs:

V:--------------- Timing Runner: Cast_437 (Cast)
V:Cast has no valid tactics for this config, skipping
V:--------------- Timing Runner: Cast_437 (Reformat)
V:Tactic: 0x00000000000003e8 Time: 0.0688853
V:Tactic: 0x00000000000003ea Time: 0.0320223
V:Tactic: 0x0000000000000000 Time: 0.0188113
V:Fastest Tactic: 0x0000000000000000 Time: 0.0188113
V:>>>>>>>>>>>>>>> Chose Runner Type: Reformat Tactic: 0x0000000000000000
V:*************** Autotuning format combination: Float(32768,4096:32,64,1) -> Float(1:4,4096,64,1) ***************
V:--------------- Timing Runner: Cast_437 (Cast)
V:Cast has no valid tactics for this config, skipping
V:--------------- Timing Runner: Cast_437 (Reformat)
V:Tactic: 0x00000000000003e8 Time: 0.254352
V:Tactic: 0x00000000000003ea Time: 0.0640907
V:Tactic: 0x0000000000000000 Time: 0.324917
V:Fastest Tactic: 0x00000000000003ea Time: 0.0640907
V:>>>>>>>>>>>>>>> Chose Runner Type: Reformat Tactic: 0x00000000000003ea
V:*************** Autotuning format combination: Float(1:4,4096,64,1) -> Float(1048576,4096,64,1) ***************
V:Deleting timing cache: 613 entries, served 16507 hits since creation.
E:2: [engineTacticSupplyHelpers.cpp::makeEngineTensor::55] Error Code 2: Internal Error (Assertion isMultiple(tensor.start[vectorDim], spv) failed. )
E:2: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

Environment

TensorRT Version: 8.4.1
GPU Type: Titan RTX
Nvidia Driver Version: 516.59 (WSL) / 515.48 (native)
CUDA Version: 11.7
CUDNN Version: 8
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): N/A
TensorFlow Version (if applicable): N/A
PyTorch Version (if applicable): ONNX model was exported with PyTorch 1.12
Baremetal or Container (if container which image + tag): Bare metal (both WSL and native)

Relevant Files

Example project including the broken model: trt_failure.zip

Steps To Reproduce

  1. Run the following:
$ unzip trt_failure.zip
$ cd trt_failure/
$ ./build.sh
$ ./hello_trt_plugin
  2. Observe the failure above
  3. Copy some other ONNX file to ./model.onnx (e.g. mnist.onnx)
  4. Observe the Success message

Hi,
Please refer to the links below for custom plugin implementation and a sample:

While the IPluginV2 and IPluginV2Ext interfaces are still supported for backward compatibility with TensorRT 5.1 and 6.0.x respectively, we recommend that you write new plugins or refactor existing ones to target the IPluginV2DynamicExt or IPluginV2IOExt interfaces instead.

Thanks!

I’m confused. This plugin uses IPluginV2DynamicExt.

In grid_sample_plugin.h:

class GridSamplePlugin : public IPluginV2DynamicExt {
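
(For reference, the IPluginV2DynamicExt-facing part of the declaration looks roughly like the sketch below, following the TensorRT 8.x signatures from NvInferRuntime.h; this is not a verbatim copy of grid_sample_plugin.h.)

#include <NvInfer.h>

// Sketch only: the methods IPluginV2DynamicExt requires, with TensorRT 8.x signatures.
class GridSamplePlugin : public nvinfer1::IPluginV2DynamicExt
{
public:
    nvinfer1::IPluginV2DynamicExt* clone() const noexcept override;
    nvinfer1::DimsExprs getOutputDimensions(int32_t outputIndex, const nvinfer1::DimsExprs* inputs,
        int32_t nbInputs, nvinfer1::IExprBuilder& exprBuilder) noexcept override;
    bool supportsFormatCombination(int32_t pos, const nvinfer1::PluginTensorDesc* inOut,
        int32_t nbInputs, int32_t nbOutputs) noexcept override;
    void configurePlugin(const nvinfer1::DynamicPluginTensorDesc* in, int32_t nbInputs,
        const nvinfer1::DynamicPluginTensorDesc* out, int32_t nbOutputs) noexcept override;
    size_t getWorkspaceSize(const nvinfer1::PluginTensorDesc* inputs, int32_t nbInputs,
        const nvinfer1::PluginTensorDesc* outputs, int32_t nbOutputs) const noexcept override;
    int32_t enqueue(const nvinfer1::PluginTensorDesc* inputDesc, const nvinfer1::PluginTensorDesc* outputDesc,
        const void* const* inputs, void* const* outputs, void* workspace,
        cudaStream_t stream) noexcept override;
    // ...plus the inherited IPluginV2Ext/IPluginV2 methods (getOutputDataType, getPluginType,
    // serialize, etc.); a few of those are the NYI stubs mentioned above.
};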

@NVES please have another look. I think you were misled by a comment indicating which interface the implementations were overriding. This code does not inherit from the old versions of the interface.

I also think this answer is misleading. If these old interfaces were included for backward compatibility, then they should still work. Again, this is beside the point because this example does inherit from the newest interface.

Update: if I downgrade to TensorRT 8.0.3, it no longer fails during this step.

Hi,

We were able to reproduce the error. Please allow us some time to work on this issue.

Thank you.


FWIW, I also tried TensorRT 8.2.* but got a completely different error:

V:--------------- Timing Runner: {ForeignNode[Sub_4504...Transpose_4478 + Reshape_4479 + Reshape_4487]} (Myelin)
W:Skipping tactic 0 due to Myelin error: myelinTargetSetPropertyMemorySize called with invalid memory size (0).
V:Fastest Tactic: -3360065831133338131 Time: inf
V:Deleting timing cache: 484 entries, 10723 hits
E:10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[Sub_4504...Transpose_4478 + Reshape_4479 + Reshape_4487]}.)
E:2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
Check failed: plan
Aborted

Hi,

That’s fine. We are not seeing the above error in the latest version (it may have been fixed), so we recommend trying the recent TensorRT 8.4.3 release. If you still face this issue there, you can expect it to be fixed in a future release.

Thank you.