Assertion Error in buildMemGraph: 0 (mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size)

thomas.boulay · June 9, 2020, 6:08pm

Description

Hi,
I ran
trtexec --onnx=model.onnx --explicitBatch

but it failed with the following error:

The ONNX file is parsed correctly; the error seems to occur during the engine build. I also tried manually defining optimization profiles, but I got the same Assertion Error again.

I used the tf2onnx.convert script to convert the frozen graph to an ONNX model. I tried opset=12, opset=11 and opset=10, but in all cases I still get the same error.

The current pipeline (frozen graph -> UFF -> TRT engine) works, but the new one (frozen graph -> ONNX -> TRT engine) fails.

How can I solve this issue?

Environment

TensorRT Version: 7.1.0.16
GPU Type: Jetson AGX
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version: JetPack 4.4
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Steps To Reproduce

trtexec --onnx=model.onnx --explicitBatch

SunilJB · June 10, 2020, 8:41am

Reshape node “Reshape__483” looks odd: it reshapes Float(1,5,640,30720) to Float(1,1,1,1).
Could you please check whether it's implemented correctly?
Also, could you try a few things:
Check the ONNX model with the checker function and see whether it passes:
import onnx

model = onnx.load("model.onnx")

onnx.checker.check_model(model)  # raises onnx.checker.ValidationError if the model is invalid
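A Reshape can change the shape of a tensor but never its number of elements. A minimal sketch of the ONNX Reshape shape rule, assuming the default allowzero=0 behavior (0 copies the input dimension, a single -1 is inferred), makes the point concrete: Float(1,5,640,30720) holds 98,304,000 elements, so a Reshape__483 that produces Float(1,1,1,1) cannot be valid. The function name is hypothetical.

```python
from math import prod


def reshape_output_shape(input_shape, target_shape):
    """Apply ONNX Reshape shape rules (allowzero=0) and verify the element count is preserved."""
    in_shape = list(input_shape)
    out = []
    infer_index = None
    for i, dim in enumerate(target_shape):
        if dim == 0:
            # 0 copies the corresponding input dimension
            out.append(in_shape[i])
        elif dim == -1:
            if infer_index is not None:
                raise ValueError("at most one -1 is allowed in the target shape")
            infer_index = i
            out.append(1)  # placeholder, filled in below
        else:
            out.append(dim)
    total = prod(in_shape)
    if infer_index is not None:
        known = prod(out)
        if known == 0 or total % known != 0:
            raise ValueError(f"cannot infer -1: {total} elements into {out}")
        out[infer_index] = total // known
    if prod(out) != total:
        raise ValueError(
            f"cannot reshape {tuple(in_shape)} ({total} elements) "
            f"to {tuple(out)} ({prod(out)} elements)"
        )
    return tuple(out)
```

For example, reshape_output_shape((1, 5, 640, 30720), (1, 5, -1)) returns (1, 5, 19660800), while a target of (1, 1, 1, 1) raises ValueError.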

Please share the .pb file as well so we can help better.

Thanks

thomas.boulay · June 10, 2020, 10:06am

Hi Sunil,

Yes, the checker passed without any issue.

I used the same .pb to convert to UFF via uff.from_tensorflow_frozen_model(), then used that UFF file to build the engine, and it works. So I don't think the .pb is the problem, but let's see.

How can I share the .pb with restricted access?

SunilJB · June 12, 2020, 8:40am

Can you upload the model and share the link via IM?

Thanks

SunilJB · June 15, 2020, 11:31am

I am able to successfully run the model using TRT 7 (NGC container: tensorrt:20.03-py3) on my system.
Could you please try below commands and let me know if you are still facing the issue?

python -m tf2onnx.convert --graphdef frozen_graph.pb --output model.onnx --inputs Infer-Network-Input-Y:0,Infer-Network-Input-UV:0 --outputs 'fcn_prob/Softmax:0' --opset=11
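If the conversion has to be scripted, one option is to call the stock tf2onnx.convert entry point used in the command above (which, per the fix found later in this thread, runs optimizer.optimize_graph) instead of a custom conversion script. A sketch that only builds the argument list for subprocess.run; the helper name is hypothetical, and the flags are exactly the ones in the command above.

```python
import sys


def tf2onnx_command(graphdef, output, inputs, outputs, opset=11):
    """Build the argv for `python -m tf2onnx.convert` on a frozen graph."""
    return [
        sys.executable, "-m", "tf2onnx.convert",
        "--graphdef", graphdef,
        "--output", output,
        "--inputs", ",".join(inputs),
        "--outputs", ",".join(outputs),
        "--opset", str(opset),
    ]
```

Usage, assuming tf2onnx is installed: subprocess.run(tf2onnx_command("frozen_graph.pb", "model.onnx", ["Infer-Network-Input-Y:0", "Infer-Network-Input-UV:0"], ["fcn_prob/Softmax:0"]), check=True). Passing a list avoids shell quoting issues with tensor names that contain ':' or '/'.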

trtexec --onnx=model.onnx --explicitBatch --verbose
[06/15/2020-11:24:03] [I] Host latency
[06/15/2020-11:24:03] [I] min: 7.04991 ms (end to end 7.18958 ms)
[06/15/2020-11:24:03] [I] max: 7.45667 ms (end to end 13.7762 ms)
[06/15/2020-11:24:03] [I] mean: 7.24213 ms (end to end 13.5211 ms)
[06/15/2020-11:24:03] [I] median: 7.25 ms (end to end 13.5364 ms)
[06/15/2020-11:24:03] [I] percentile: 7.35107 ms at 99% (end to end 13.7202 ms at 99%)
[06/15/2020-11:24:03] [I] throughput: 0 qps
[06/15/2020-11:24:03] [I] walltime: 3.02217 s
[06/15/2020-11:24:03] [I] GPU Compute
[06/15/2020-11:24:03] [I] min: 6.60896 ms
[06/15/2020-11:24:03] [I] max: 7.02258 ms
[06/15/2020-11:24:03] [I] mean: 6.81442 ms
[06/15/2020-11:24:03] [I] median: 6.81982 ms
[06/15/2020-11:24:03] [I] percentile: 6.91895 ms at 99%
[06/15/2020-11:24:03] [I] total compute time: 3.01198 s
&&&& PASSED TensorRT.trtexec # trtexec --onnx=model.onnx --explicitBatch --verbose

Thanks

thomas.boulay · June 15, 2020, 2:21pm

Many thanks, Sunil.

The problem came from the frozen graph to ONNX conversion.
Actually, I was using my own script to convert the TF graph to an ONNX model.
For anyone in the same situation: my script called the following function:

with tf_loader.tf_session(graph=tf_graph):
    g = process_tf_graph(tf_graph,
                         continue_on_error=args.continue_on_error,
                         target=args.target,
                         opset=args.opset,
                         custom_op_handlers=custom_ops,
                         extra_opset=extra_opset,
                         shape_override=args.shape_override,
                         input_names=inputs,
                         output_names=outputs,
                         inputs_as_nchw=args.inputs_as_nchw)

# graph exported without running the tf2onnx optimizer
model_proto = g.make_model("converted from {}".format(model_path))

To solve the issue, I only had to add “onnx_graph = optimizer.optimize_graph(g)” right after the process_tf_graph call:

with tf_loader.tf_session(graph=tf_graph):
    g = process_tf_graph(tf_graph,
                         continue_on_error=args.continue_on_error,
                         target=args.target,
                         opset=args.opset,
                         custom_op_handlers=custom_ops,
                         extra_opset=extra_opset,
                         shape_override=args.shape_override,
                         input_names=inputs,
                         output_names=outputs,
                         inputs_as_nchw=args.inputs_as_nchw)

onnx_graph = optimizer.optimize_graph(g)  # the added line: run the tf2onnx graph optimizer
model_proto = onnx_graph.make_model("converted from {}".format(model_path))

Best regards,

thomas.boulay · June 18, 2020, 1:53pm

Hi Sunil,

I'm back with the same network. I managed to generate the CUDA engine using:
trtexec --onnx=model.onnx --explicitBatch --verbose --saveEngine=model.engine

Here is part of the command log:

[06/18/2020-15:35:02] [I] === Model Options ===
[06/18/2020-15:35:02] [I] Format: ONNX
[06/18/2020-15:35:02] [I] Model: model.onnx
[06/18/2020-15:35:02] [I] Output:
[06/18/2020-15:35:02] [I] === Build Options ===
[06/18/2020-15:35:02] [I] Max batch: explicit
[06/18/2020-15:35:02] [I] Workspace: 16 MB
[06/18/2020-15:35:02] [I] minTiming: 1
[06/18/2020-15:35:02] [I] avgTiming: 8
[06/18/2020-15:35:02] [I] Precision: FP32
[06/18/2020-15:35:02] [I] Calibration:
[06/18/2020-15:35:02] [I] Safe mode: Disabled
[06/18/2020-15:35:02] [I] Save engine: test.engine
[06/18/2020-15:35:02] [I] Load engine:
[06/18/2020-15:35:02] [I] Builder Cache: Enabled
[06/18/2020-15:35:02] [I] NVTX verbosity: 0
[06/18/2020-15:35:02] [I] Inputs format: fp32:CHW
[06/18/2020-15:35:02] [I] Outputs format: fp32:CHW
[06/18/2020-15:35:02] [I] Input build shapes: model
[06/18/2020-15:35:02] [I] Input calibration shapes: model
[06/18/2020-15:35:02] [I] === System Options ===
[06/18/2020-15:35:02] [I] Device: 0
[06/18/2020-15:35:02] [I] DLACore:
[06/18/2020-15:35:02] [I] Plugins:
[06/18/2020-15:35:02] [I] === Inference Options ===
[06/18/2020-15:35:02] [I] Batch: Explicit
[06/18/2020-15:35:02] [I] Input inference shapes: model
[06/18/2020-15:35:02] [I] Iterations: 10
[06/18/2020-15:35:02] [I] Duration: 3s (+ 200ms warm up)
[06/18/2020-15:35:02] [I] Sleep time: 0ms
[06/18/2020-15:35:02] [I] Streams: 1
[06/18/2020-15:35:02] [I] ExposeDMA: Disabled
[06/18/2020-15:35:02] [I] Spin-wait: Disabled
[06/18/2020-15:35:02] [I] Multithreading: Disabled
[06/18/2020-15:35:02] [I] CUDA Graph: Disabled
[06/18/2020-15:35:02] [I] Skip inference: Disabled
[06/18/2020-15:35:02] [I] Inputs:
[06/18/2020-15:35:02] [I] === Reporting Options ===
[06/18/2020-15:35:02] [I] Verbose: Disabled
[06/18/2020-15:35:02] [I] Averages: 10 inferences
[06/18/2020-15:35:02] [I] Percentile: 99
[06/18/2020-15:35:02] [I] Dump output: Disabled
[06/18/2020-15:35:02] [I] Profile: Disabled
[06/18/2020-15:35:02] [I] Export timing to JSON file:
[06/18/2020-15:35:02] [I] Export output to JSON file:
[06/18/2020-15:35:02] [I] Export profile to JSON file:
[06/18/2020-15:35:02] [I]

Input filename: model.onnx
ONNX IR version: 0.0.6
Opset version: 11
Producer name: tf2onnx
Producer version: 1.7.0
Domain:
Model version: 0
Doc string:
[06/18/2020-15:35:03] [W] [TRT] onnx2trt_utils.cpp:217: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[06/18/2020-15:35:03] [W] Dynamic dimensions required for input: Infer-Network-Input-Y:0, but no shapes were provided. Automatically overriding shape to: 1x384x1024x1
[06/18/2020-15:35:03] [W] Dynamic dimensions required for input: Infer-Network-Input-UV:0, but no shapes were provided. Automatically overriding shape to: 1x192x512x2
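Instead of letting trtexec override the dynamic dimensions automatically, the shapes can be passed explicitly with --minShapes/--optShapes/--maxShapes. A sketch that formats those flags from a dict of (min, opt, max) shapes; the helper name is hypothetical. How trtexec handles names containing ':' (such as Infer-Network-Input-Y:0) depends on the version: newer releases accept names wrapped in single quotes, which is what quote_names does here, so check trtexec --help for your version before relying on it.

```python
def shape_flags(profiles, quote_names=True):
    """Format trtexec --minShapes/--optShapes/--maxShapes from {name: (min, opt, max)}."""
    flags = []
    for option, index in (("--minShapes", 0), ("--optShapes", 1), ("--maxShapes", 2)):
        specs = []
        for name, shapes in profiles.items():
            dims = "x".join(str(d) for d in shapes[index])
            # Names such as "Input:0" contain ':', so optionally wrap them in single quotes
            label = f"'{name}'" if quote_names else name
            specs.append(f"{label}:{dims}")
        flags.append(f"{option}={','.join(specs)}")
    return flags
```

With min = opt = max set to the shapes from the log (1x384x1024x1 and 1x192x512x2), this reproduces the automatic override explicitly. When typed in a shell, the single quotes must be escaped (\') or the argument double-quoted so they reach trtexec.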

Unfortunately, I have never managed to get correct output from model.engine: the TensorRT 7 inference results never match the TensorFlow reference output.

To run the TRT inference, I use the following code:

import numpy as np
import pycuda.autoinit  # creates and activates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

# trtEngine (engine file path) and input (dict of NumPy arrays) are defined elsewhere in the script.
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with open(trtEngine, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:

    engine = runtime.deserialize_cuda_engine(f.read())

    with engine.create_execution_context() as context:
        context.active_optimization_profile = 0
        # The inputs have dynamic dimensions, so fix them on the context (shapes from the trtexec log).
        context.set_binding_shape(0, (1, 384, 1024, 1))
        context.set_binding_shape(1, (1, 192, 512, 2))

        # Allocate page-locked host buffers sized from the now fully specified binding shapes.
        h_input_Y = cuda.pagelocked_empty(trt.volume(context.get_binding_shape(0)), dtype=np.float32)
        h_input_UV = cuda.pagelocked_empty(trt.volume(context.get_binding_shape(1)), dtype=np.float32)
        h_output_SEG = cuda.pagelocked_empty(trt.volume(context.get_binding_shape(2)), dtype=np.float32)
        h_output_BB = cuda.pagelocked_empty(trt.volume(context.get_binding_shape(3)), dtype=np.float32)
        # Allocate device memory matching each host buffer.
        d_input_Y = cuda.mem_alloc(h_input_Y.nbytes)
        d_input_UV = cuda.mem_alloc(h_input_UV.nbytes)
        d_output_SEG = cuda.mem_alloc(h_output_SEG.nbytes)
        d_output_BB = cuda.mem_alloc(h_output_BB.nbytes)

        # Create a stream in which to copy inputs/outputs and run inference.
        stream = cuda.Stream()

        np.copyto(h_input_Y, input['input_y'].ravel())
        np.copyto(h_input_UV, input['input_uv'].ravel())
        # Transfer input data to the GPU.
        cuda.memcpy_htod_async(d_input_Y, h_input_Y, stream)
        cuda.memcpy_htod_async(d_input_UV, h_input_UV, stream)
        # Run inference. Explicit-batch engines use execute_async_v2, which takes no batch_size.
        context.execute_async_v2(bindings=[int(d_input_Y), int(d_input_UV), int(d_output_SEG), int(d_output_BB)], stream_handle=stream.handle)
        # Transfer predictions back from the GPU.
        cuda.memcpy_dtoh_async(h_output_SEG, d_output_SEG, stream)
        cuda.memcpy_dtoh_async(h_output_BB, d_output_BB, stream)
        # Synchronize the stream
        stream.synchronize()
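The host and device buffers have to match the binding shapes exactly. A small sketch of the size arithmetic, assuming float32 bindings (the helper name is hypothetical): any -1 still present in a shape means the dynamic dimension has not been resolved yet (for example with set_binding_shape on the execution context), and a volume computed from it would be meaningless.

```python
from math import prod

FLOAT32_BYTES = 4


def binding_nbytes(shape, itemsize=FLOAT32_BYTES):
    """Bytes needed for one binding; refuses unresolved dynamic (-1) dimensions."""
    if any(d < 0 for d in shape):
        raise ValueError(f"shape {tuple(shape)} still has dynamic (-1) dimensions")
    return prod(shape) * itemsize
```

For the two inputs from the trtexec log, this gives 1,572,864 bytes for Infer-Network-Input-Y:0 (1x384x1024x1) and 786,432 bytes for Infer-Network-Input-UV:0 (1x192x512x2).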

Could you help me figure out the problem? I have tried a lot of things (updating JetPack to 4.4, updating TensorFlow to 1.15.2, testing with UFF, and testing opsets 10, 11 and 12), but I still get the same wrong output.

I have already shared the .pb file with you. Could you test on your side whether the TRT engine output matches the TF output?
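When comparing the TRT engine output against the TF reference, it helps to report the largest absolute difference rather than only pass/fail, and to make sure both buffers are in the same layout and element order first. A minimal sketch on flat Python lists, assuming both outputs are already flattened in the same order; the tolerances are illustrative defaults, not values from this thread.

```python
def compare_outputs(reference, candidate, atol=1e-4, rtol=1e-3):
    """Return (max_abs_diff, all_close) for two flat sequences of floats."""
    if len(reference) != len(candidate):
        raise ValueError(f"size mismatch: {len(reference)} vs {len(candidate)}")
    max_abs = 0.0
    all_close = True
    for ref, got in zip(reference, candidate):
        diff = abs(ref - got)
        max_abs = max(max_abs, diff)
        # Same criterion as numpy.allclose(candidate, reference): |a - b| <= atol + rtol * |b|
        if diff > atol + rtol * abs(ref):
            all_close = False
    return max_abs, all_close
```

A large max_abs with a size match often points to a layout or binding-order mix-up rather than numerical error, so it is worth checking that before suspecting the engine itself.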

Two things actually confuse me:

The warning “Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.”: I don't understand why I get it when creating the TRT engine from the ONNX model.
The log lines “Inputs format: fp32:CHW” and “Outputs format: fp32:CHW”: I don't understand why I get these. It should be HWC, since that is the format in the .pb file. Is there a way to define this format during ONNX creation or during engine creation?

Sorry for the very long message.

Regards,

SunilJB · June 19, 2020, 8:40am

TRT doesn't support INT64; it's just a warning that TRT is trying to cast those weights down to INT32.
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-713/api/python_api/infer/FoundationalTypes/DataType.html
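The cast only loses information if a value falls outside the INT32 range. Shape and index constants are usually small, but sentinel values such as INT64_MAX (used, for example, as an "until the end" bound in Slice) do not fit. A sketch that lists offending values, assuming the INT64 initializer values have already been extracted as Python ints (for example with onnx.numpy_helper.to_array(initializer).tolist()); the helper name is hypothetical.

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def out_of_int32_range(values):
    """Return the (index, value) pairs that cannot be represented as INT32."""
    return [(i, v) for i, v in enumerate(values) if not INT32_MIN <= v <= INT32_MAX]
```

An empty result means the INT32 cast is lossless for those values.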

TensorRT's format only specifies the memory layout, not the dimension order. The dimensions are always in NCHW order.
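To make the layout point concrete: the same logical element (n, c, h, w) sits at a different flat offset in an NHWC buffer than in an NCHW buffer. A sketch in plain Python that reorders a flat NHWC buffer into NCHW order (the function names are hypothetical).

```python
def nhwc_offset(n, h, w, c, H, W, C):
    # channel is the fastest-varying index
    return ((n * H + h) * W + w) * C + c


def nchw_offset(n, c, h, w, C, H, W):
    # width is the fastest-varying index
    return ((n * C + c) * H + h) * W + w


def nhwc_to_nchw(flat, N, H, W, C):
    """Reorder a flat NHWC buffer into NCHW order."""
    if len(flat) != N * H * W * C:
        raise ValueError("buffer size does not match N*H*W*C")
    out = [None] * len(flat)
    for n in range(N):
        for h in range(H):
            for w in range(W):
                for c in range(C):
                    out[nchw_offset(n, c, h, w, C, H, W)] = flat[nhwc_offset(n, h, w, c, H, W, C)]
    return out
```

For a single-channel input such as Infer-Network-Input-Y:0 (1x384x1024x1), both layouts give the same flat buffer, while for Infer-Network-Input-UV:0 (1x192x512x2) they differ.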
Since ONNX supports only one format, I don't think ONNX has a flag to indicate NHWC. But you can try the “--inputs-as-nchw” option in tf2onnx. Please refer to the link below:

github.com/onnx/tensorflow-onnx

Issue: --inputs-as-nchw: inputs in NCHW format and outputs in NHWC format (opened by Patrick-PhoenixAI on 28 May 2020, closed on 9 Sep 2020)

Describe the bug: the --inputs-as-nchw parameter lets me feed the model using NCHW format. However, the model outputs NHWC format by adding a transpose at the end of the model.
System information: OS Platform and Distribution: Ubuntu 18.04; TensorFlow version: 1.15; Python version: 3.6
Expected behavior: inputs and outputs in NCHW format.
Additional context: I am converting the TensorFlow model on a host PC without GPU, but I am using the ONNX model on an NVIDIA Jetson embedded platform.

The model has just 2 inputs and 1 output. Did you mark more outputs during model generation?

Could you please refer to the link below in case it helps:

Thanks

xuewei.li1993 · September 25, 2020, 10:00am

Hi,

I also ran into the same failure when parsing the ONNX model with the C++ API.
Here is my C++ code:

nvinfer1::INetworkDefinition* network = builder->createNetworkV2(1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));

nvonnxparser::IParser* parser = nvonnxparser::createParser(*network, gLogger.getTRTLogger());
// Parse with warning-level verbosity (same value as the original 2) and stop if parsing fails.
if (!parser->parseFromFile("/mnt/test/model1.onnx", static_cast<int>(nvinfer1::ILogger::Severity::kWARNING)))
{
    return nullptr;
}
auto preprocessorConfig = builder->createBuilderConfig();
// Create an optimization profile so that we can specify a range of input dimensions.
auto profile = builder->createOptimizationProfile();

// kMIN, kOPT and kMAX are identical below, so each input is fixed to a single shape.
profile->setDimensions("segment_ids_1:0", OptProfileSelector::kMIN, Dims2{ 766, 64 });
profile->setDimensions("segment_ids_1:0", OptProfileSelector::kOPT, Dims2{ 766, 64 });
profile->setDimensions("segment_ids_1:0", OptProfileSelector::kMAX, Dims2{ 766, 64 });

profile->setDimensions("input_mask_1:0", OptProfileSelector::kMIN, Dims2{ 778, 64 });
profile->setDimensions("input_mask_1:0", OptProfileSelector::kOPT, Dims2{ 778, 64 });
profile->setDimensions("input_mask_1:0", OptProfileSelector::kMAX, Dims2{ 778, 64 });

profile->setDimensions("input_ids_1:0", OptProfileSelector::kMIN, Dims2{ 779, 64 });
profile->setDimensions("input_ids_1:0", OptProfileSelector::kOPT, Dims2{ 779, 64 });
profile->setDimensions("input_ids_1:0", OptProfileSelector::kMAX, Dims2{ 779, 64 });

preprocessorConfig->addOptimizationProfile(profile);

// Enable FP16; equivalent to the original setFlags(1), since BuilderFlag::kFP16 is bit 0.
preprocessorConfig->setFlag(BuilderFlag::kFP16);

// Build the engine
auto engine = builder->buildEngineWithConfig(*network, *preprocessorConfig);

The error message is:

[W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[F] [TRT] Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size
…/builder/cudnnBuilderBlockChooser.cpp:127
Aborting…

[E] [TRT] …/builder/cudnnBuilderBlockChooser.cpp (127) - Assertion Error in buildMemGraph: 0 (mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size)
dlis_run: /mnt/xuewei/IBert/bertInference.cpp:270: nvinfer1::ICudaEngine* bert::BertInference::APIToModel(int, int, int): Assertion `engine != nullptr' failed.
Aborted (core dumped)
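One thing worth checking in the profile above: in a BERT-style model, input_ids, input_mask and segment_ids normally share the same shape, but the profile gives them three different first dimensions (779, 778 and 766). Whether this is what triggers the buildMemGraph assertion is not confirmed in this thread. A sketch of a pre-build sanity check, assuming the profile is written out as a Python dict of (min, opt, max) shapes; the helper is hypothetical, and the "same shape" grouping is an assumption about this model.

```python
def check_profile(profile, same_shape_groups=()):
    """Return a list of problems in {name: (min, opt, max)} profile dimensions."""
    problems = []
    for name, (lo, opt, hi) in profile.items():
        if not (len(lo) == len(opt) == len(hi)):
            problems.append(f"{name}: min/opt/max have different ranks")
            continue
        if any(not (a <= b <= c) for a, b, c in zip(lo, opt, hi)):
            problems.append(f"{name}: expected min <= opt <= max in every dimension")
    for group in same_shape_groups:
        # Inputs in a group are assumed to be consumed together, so their shapes should agree
        shapes = {name: profile[name] for name in group}
        if len(set(shapes.values())) > 1:
            problems.append(f"shapes differ within {sorted(group)}: {shapes}")
    return problems
```

Run on the three inputs above with one group containing all of them, this reports a single "shapes differ" problem, since 766, 778 and 779 disagree.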

I used:

tf2onnx.convert --saved-model

to convert the TensorFlow model to an ONNX model.

I ran the ONNX checker and the check passed.

What can I do next to fix this issue?

Thanks

Environment

TensorRT Version: 7.0.0.11
GPU Type: V100
Nvidia Driver Version:
CUDA Version: 10.2
CUDNN Version: 7.6
Operating System + Version: Ubuntu 18.04.1
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

xuewei.li1993 · September 25, 2020, 10:07am

Here is the onnx model info:

Input filename: /mnt/test/model1.onnx
ONNX IR version: 0.0.4
Opset version: 8
Producer name: tf2onnx
Producer version: 1.6.3
Domain:
Model version: 0
Doc string:
