Internal Error (Assertion isMultiple(tensor.start[vectorDim], spv) failed. ) when loading ONNX file

Description

An internal error occurs while building a TensorRT engine from an ONNX model.

I exported a model as an ONNX file, but TensorRT doesn’t support one of the layers. I’m writing a plugin and have implemented enough of the interfaces that the network parses successfully.

However, it encounters an internal error during buildSerializedNetwork.

As far as I can tell, this is not due to the plugin. None of the plugin’s not-yet-implemented (NYI) methods are called between the time the parser finishes and the time the error occurs. Also, the optimization step that fails is dealing with tensors whose dimensions are very different from those of the plugin’s layers.
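
For context, the flow that fails looks roughly like this (a minimal sketch rather than the actual code in the attached project; it assumes an nvinfer1::ILogger implementation and that the plugin creator is already registered with the plugin registry):

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <memory>

// Sketch of the parse-then-build flow that hits the internal error.
bool buildEngine(const char* onnxPath, nvinfer1::ILogger& logger)
{
    using namespace nvinfer1;

    auto builder = std::unique_ptr<IBuilder>(createInferBuilder(logger));
    const auto flags = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = std::unique_ptr<INetworkDefinition>(builder->createNetworkV2(flags));
    auto parser = std::unique_ptr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, logger));

    // Parsing succeeds: the unsupported op resolves to the custom plugin.
    if (!parser->parseFromFile(onnxPath, static_cast<int>(ILogger::Severity::kVERBOSE)))
        return false;

    auto config = std::unique_ptr<IBuilderConfig>(builder->createBuilderConfig());

    // This is the call that fails with "Assertion isMultiple(tensor.start[vectorDim], spv) failed."
    auto plan = std::unique_ptr<IHostMemory>(builder->buildSerializedNetwork(*network, *config));
    return plan != nullptr;
}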

Here is the tail of the logs where the failure occurs:

V:--------------- Timing Runner: Cast_437 (Cast)
V:Cast has no valid tactics for this config, skipping
V:--------------- Timing Runner: Cast_437 (Reformat)
V:Tactic: 0x00000000000003e8 Time: 0.0688853
V:Tactic: 0x00000000000003ea Time: 0.0320223
V:Tactic: 0x0000000000000000 Time: 0.0188113
V:Fastest Tactic: 0x0000000000000000 Time: 0.0188113
V:>>>>>>>>>>>>>>> Chose Runner Type: Reformat Tactic: 0x0000000000000000
V:*************** Autotuning format combination: Float(32768,4096:32,64,1) -> Float(1:4,4096,64,1) ***************
V:--------------- Timing Runner: Cast_437 (Cast)
V:Cast has no valid tactics for this config, skipping
V:--------------- Timing Runner: Cast_437 (Reformat)
V:Tactic: 0x00000000000003e8 Time: 0.254352
V:Tactic: 0x00000000000003ea Time: 0.0640907
V:Tactic: 0x0000000000000000 Time: 0.324917
V:Fastest Tactic: 0x00000000000003ea Time: 0.0640907
V:>>>>>>>>>>>>>>> Chose Runner Type: Reformat Tactic: 0x00000000000003ea
V:*************** Autotuning format combination: Float(1:4,4096,64,1) -> Float(1048576,4096,64,1) ***************
V:Deleting timing cache: 613 entries, served 16507 hits since creation.
E:2: [engineTacticSupplyHelpers.cpp::makeEngineTensor::55] Error Code 2: Internal Error (Assertion isMultiple(tensor.start[vectorDim], spv) failed. )
E:2: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

Environment

TensorRT Version: 8.4.1
GPU Type: Titan RTX
Nvidia Driver Version: 516.59 (WSL) / 515.48 (native)
CUDA Version: 11.7
CUDNN Version: 8
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): N/A
TensorFlow Version (if applicable): N/A
PyTorch Version (if applicable): ONNX model was exported with PyTorch 1.12
Baremetal or Container (if container which image + tag): Bare metal (both WSL and native)

Relevant Files

Example project including the broken model: trt_failure.zip

Steps To Reproduce

  1. Run the following:
$ unzip trt_failure.zip
$ cd trt_failure/
$ ./build.sh
$ ./hello_trt_plugin
  2. Observe the failure above
  3. Copy some other ONNX file to ./model.onnx (e.g. mnist.onnx)
  4. Observe the Success message

Hi,
Please refer to the links below for custom plugin implementation and a sample:

While the IPluginV2 and IPluginV2Ext interfaces are still supported for backward compatibility with TensorRT 5.1 and 6.0.x respectively, we recommend that you write new plugins or refactor existing ones to target the IPluginV2DynamicExt or IPluginV2IOExt interfaces instead.

Thanks!

I’m confused. This plugin uses IPluginV2DynamicExt.

In grid_sample_plugin.h:

class GridSamplePlugin : public IPluginV2DynamicExt {
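
(For reference, the IPluginV2DynamicExt-facing part of the declaration looks roughly like the sketch below, following the TensorRT 8.x signatures from NvInferRuntime.h; this is not a verbatim copy of grid_sample_plugin.h.)

#include <NvInfer.h>

// Sketch only: the methods IPluginV2DynamicExt requires, with TensorRT 8.x signatures.
class GridSamplePlugin : public nvinfer1::IPluginV2DynamicExt
{
public:
    nvinfer1::IPluginV2DynamicExt* clone() const noexcept override;
    nvinfer1::DimsExprs getOutputDimensions(int32_t outputIndex, const nvinfer1::DimsExprs* inputs,
        int32_t nbInputs, nvinfer1::IExprBuilder& exprBuilder) noexcept override;
    bool supportsFormatCombination(int32_t pos, const nvinfer1::PluginTensorDesc* inOut,
        int32_t nbInputs, int32_t nbOutputs) noexcept override;
    void configurePlugin(const nvinfer1::DynamicPluginTensorDesc* in, int32_t nbInputs,
        const nvinfer1::DynamicPluginTensorDesc* out, int32_t nbOutputs) noexcept override;
    size_t getWorkspaceSize(const nvinfer1::PluginTensorDesc* inputs, int32_t nbInputs,
        const nvinfer1::PluginTensorDesc* outputs, int32_t nbOutputs) const noexcept override;
    int32_t enqueue(const nvinfer1::PluginTensorDesc* inputDesc, const nvinfer1::PluginTensorDesc* outputDesc,
        const void* const* inputs, void* const* outputs, void* workspace,
        cudaStream_t stream) noexcept override;
    // ...plus the inherited IPluginV2Ext/IPluginV2 methods (getOutputDataType, getPluginType,
    // serialize, etc.); a few of those are the NYI stubs mentioned above.
};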

@NVES please have another look. I think you were misled by a comment indicating which interface the implementations were overriding. This code does not inherit from the old versions of the interface.

I also think this answer is misleading. If these old interfaces were included for backward compatibility, then they should still work. Again, this is beside the point because this example does inherit from the newest interface.

Update: if I downgrade to TensorRT 8.0.3, it no longer fails during this step.

Hi,

We were able to reproduce the error. Please allow us some time to work on this issue.

Thank you.


FWIW, I also tried TensorRT 8.2.* but got a completely different error:

V:--------------- Timing Runner: {ForeignNode[Sub_4504...Transpose_4478 + Reshape_4479 + Reshape_4487]} (Myelin)
W:Skipping tactic 0 due to Myelin error: myelinTargetSetPropertyMemorySize called with invalid memory size (0).
V:Fastest Tactic: -3360065831133338131 Time: inf
V:Deleting timing cache: 484 entries, 10723 hits
E:10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[Sub_4504...Transpose_4478 + Reshape_4479 + Reshape_4487]}.)
E:2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
Check failed: plan
Aborted

Hi,

That’s fine. We are not seeing the above error in the latest version (it may have been fixed), so we recommend trying the recent TensorRT 8.4.3 release. If you still face this issue there, you can expect it to be fixed in a future release.

Thank you.