Assertion Error in buildMemGraph: 0 (mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size)

Description

Hi,
I ran
trtexec --onnx=model.onnx --explicitBatch

but it failed with the following error:

Assertion Error in buildMemGraph: 0 (mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size)

The ONNX file is parsed correctly; the error seems to occur during the engine build. I also tried defining optimization profiles manually, but I got the same assertion error.

I used the tf2onnx.convert script to convert the frozen graph to an ONNX model. I tried opset 12, 11 and 10, but I get the same error in every case.

With the current pipeline (frozengraph->UFF->trtengine) it works, but (frozengraph->ONNX->trtengine) fails.

How can I solve this issue please?

Environment

TensorRT Version: 7.1.0.16
GPU Type: Jetson AGX
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version: JetPack 4.4
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Steps To Reproduce

trtexec --onnx=model.onnx --explicitBatch

The Reshape node "Reshape__483" looks odd: it converts to 1,1,1,1 (Float(1,5,640,30720) -> Float(1,1,1,1)).
Could you please check whether it is implemented correctly?
Also, could you try a few things:
Check the ONNX model with the checker function and see whether it passes:
import onnx

model = onnx.load("model.onnx")

onnx.checker.check_model(model)
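
To look at the Reshape nodes themselves, something along these lines may help. It is only a sketch: `find_nodes` is a hypothetical helper, and the stand-in graph (including the node "Conv__1" and the tensor names) is made up so the snippet runs without the onnx package. It relies only on the `name`, `op_type`, `input` and `output` fields that ONNX graph nodes expose.

```python
from types import SimpleNamespace


def find_nodes(graph, op_type):
    """Return (name, inputs, outputs) for every node of the given op type."""
    return [
        (node.name, list(node.input), list(node.output))
        for node in graph.node
        if node.op_type == op_type
    ]


# Stand-in graph (hypothetical contents) so the sketch runs without onnx.
# With onnx installed, pass onnx.load("model.onnx").graph instead.
demo_graph = SimpleNamespace(node=[
    SimpleNamespace(name="Conv__1", op_type="Conv",
                    input=["x", "w"], output=["y"]),
    SimpleNamespace(name="Reshape__483", op_type="Reshape",
                    input=["y", "shape"], output=["z"]),
])

for name, inputs, outputs in find_nodes(demo_graph, "Reshape"):
    print(name, inputs, "->", outputs)
```

The second input of a Reshape node is the target shape tensor, so that is the name to look up among the graph initializers when checking what the node reshapes to.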

Please share the .pb file as well so we can help better.

Thanks

Hi Sunil,

Yes the checker passed without any issue.

I converted the same .pb to UFF via uff.from_tensorflow_frozen_model(), then used this UFF to build the engine, and it works. I don’t think the .pb is the problem, but let’s see.

How can I share the .pb with restricted access, please?

Can you upload the model and share the link via IM?

Thanks

I am able to successfully run the model using TRT 7 (NGC container: tensorrt:20.03-py3) on my system.
Could you please try the commands below and let me know if you are still facing the issue?

python -m tf2onnx.convert --graphdef frozen_graph.pb --output model.onnx --inputs Infer-Network-Input-Y:0,Infer-Network-Input-UV:0 --outputs 'fcn_prob/Softmax:0' --opset=11

trtexec --onnx=model.onnx --explicitBatch --verbose
[06/15/2020-11:24:03] [I] Host latency
[06/15/2020-11:24:03] [I] min: 7.04991 ms (end to end 7.18958 ms)
[06/15/2020-11:24:03] [I] max: 7.45667 ms (end to end 13.7762 ms)
[06/15/2020-11:24:03] [I] mean: 7.24213 ms (end to end 13.5211 ms)
[06/15/2020-11:24:03] [I] median: 7.25 ms (end to end 13.5364 ms)
[06/15/2020-11:24:03] [I] percentile: 7.35107 ms at 99% (end to end 13.7202 ms at 99%)
[06/15/2020-11:24:03] [I] throughput: 0 qps
[06/15/2020-11:24:03] [I] walltime: 3.02217 s
[06/15/2020-11:24:03] [I] GPU Compute
[06/15/2020-11:24:03] [I] min: 6.60896 ms
[06/15/2020-11:24:03] [I] max: 7.02258 ms
[06/15/2020-11:24:03] [I] mean: 6.81442 ms
[06/15/2020-11:24:03] [I] median: 6.81982 ms
[06/15/2020-11:24:03] [I] percentile: 6.91895 ms at 99%
[06/15/2020-11:24:03] [I] total compute time: 3.01198 s
&&&& PASSED TensorRT.trtexec # trtexec --onnx=model.onnx --explicitBatch --verbose

Thanks

Well, many thanks Sunil.

The problem came from the frozen graph to ONNX conversion.
I was actually using my own script to convert my TF graph to the ONNX model.
For anyone in the same situation: in my script I called the following function:

with tf_loader.tf_session(graph=tf_graph):
    g = process_tf_graph(tf_graph,
                         continue_on_error=args.continue_on_error,
                         target=args.target,
                         opset=args.opset,
                         custom_op_handlers=custom_ops,
                         extra_opset=extra_opset,
                         shape_override=args.shape_override,
                         input_names=inputs,
                         output_names=outputs,
                         inputs_as_nchw=args.inputs_as_nchw)

model_proto = onnx_graph.make_model("converted from {}".format(model_path))

To solve the issue, I just had to add "onnx_graph = optimizer.optimize_graph(g)" right after the process_tf_graph call:

with tf_loader.tf_session(graph=tf_graph):
    g = process_tf_graph(tf_graph,
                         continue_on_error=args.continue_on_error,
                         target=args.target,
                         opset=args.opset,
                         custom_op_handlers=custom_ops,
                         extra_opset=extra_opset,
                         shape_override=args.shape_override,
                         input_names=inputs,
                         output_names=outputs,
                         inputs_as_nchw=args.inputs_as_nchw)

# Added line: run the tf2onnx graph optimizer before building the model.
onnx_graph = optimizer.optimize_graph(g)
model_proto = onnx_graph.make_model("converted from {}".format(model_path))

Best regards,

Hi Sunil,

I’m back with the same network. I managed to generate the CUDA engine using:
trtexec --onnx=model.onnx --explicitBatch --verbose --saveEngine=model.engine

Please find a part of the command log:

[06/18/2020-15:35:02] [I] === Model Options ===
[06/18/2020-15:35:02] [I] Format: ONNX
[06/18/2020-15:35:02] [I] Model: model.onnx
[06/18/2020-15:35:02] [I] Output:
[06/18/2020-15:35:02] [I] === Build Options ===
[06/18/2020-15:35:02] [I] Max batch: explicit
[06/18/2020-15:35:02] [I] Workspace: 16 MB
[06/18/2020-15:35:02] [I] minTiming: 1
[06/18/2020-15:35:02] [I] avgTiming: 8
[06/18/2020-15:35:02] [I] Precision: FP32
[06/18/2020-15:35:02] [I] Calibration:
[06/18/2020-15:35:02] [I] Safe mode: Disabled
[06/18/2020-15:35:02] [I] Save engine: test.engine
[06/18/2020-15:35:02] [I] Load engine:
[06/18/2020-15:35:02] [I] Builder Cache: Enabled
[06/18/2020-15:35:02] [I] NVTX verbosity: 0
[06/18/2020-15:35:02] [I] Inputs format: fp32:CHW
[06/18/2020-15:35:02] [I] Outputs format: fp32:CHW
[06/18/2020-15:35:02] [I] Input build shapes: model
[06/18/2020-15:35:02] [I] Input calibration shapes: model
[06/18/2020-15:35:02] [I] === System Options ===
[06/18/2020-15:35:02] [I] Device: 0
[06/18/2020-15:35:02] [I] DLACore:
[06/18/2020-15:35:02] [I] Plugins:
[06/18/2020-15:35:02] [I] === Inference Options ===
[06/18/2020-15:35:02] [I] Batch: Explicit
[06/18/2020-15:35:02] [I] Input inference shapes: model
[06/18/2020-15:35:02] [I] Iterations: 10
[06/18/2020-15:35:02] [I] Duration: 3s (+ 200ms warm up)
[06/18/2020-15:35:02] [I] Sleep time: 0ms
[06/18/2020-15:35:02] [I] Streams: 1
[06/18/2020-15:35:02] [I] ExposeDMA: Disabled
[06/18/2020-15:35:02] [I] Spin-wait: Disabled
[06/18/2020-15:35:02] [I] Multithreading: Disabled
[06/18/2020-15:35:02] [I] CUDA Graph: Disabled
[06/18/2020-15:35:02] [I] Skip inference: Disabled
[06/18/2020-15:35:02] [I] Inputs:
[06/18/2020-15:35:02] [I] === Reporting Options ===
[06/18/2020-15:35:02] [I] Verbose: Disabled
[06/18/2020-15:35:02] [I] Averages: 10 inferences
[06/18/2020-15:35:02] [I] Percentile: 99
[06/18/2020-15:35:02] [I] Dump output: Disabled
[06/18/2020-15:35:02] [I] Profile: Disabled
[06/18/2020-15:35:02] [I] Export timing to JSON file:
[06/18/2020-15:35:02] [I] Export output to JSON file:
[06/18/2020-15:35:02] [I] Export profile to JSON file:
[06/18/2020-15:35:02] [I]

Input filename: model.onnx
ONNX IR version: 0.0.6
Opset version: 11
Producer name: tf2onnx
Producer version: 1.7.0
Domain:
Model version: 0
Doc string:
[06/18/2020-15:35:03] [W] [TRT] onnx2trt_utils.cpp:217: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[06/18/2020-15:35:03] [W] Dynamic dimensions required for input: Infer-Network-Input-Y:0, but no shapes were provided. Automatically overriding shape to: 1x384x1024x1
[06/18/2020-15:35:03] [W] Dynamic dimensions required for input: Infer-Network-Input-UV:0, but no shapes were provided. Automatically overriding shape to: 1x192x512x2

Unfortunately, I have never managed to get correct output when running inference with model.engine on TensorRT 7, compared to the TensorFlow reference output.

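To run the TRT inference, I execute the following code:

# trtEngine, batchSize, input and output are defined elsewhere.
import numpy as np
import pycuda.autoinit  # creates the CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with open(trtEngine, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime: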

    engine = runtime.deserialize_cuda_engine(f.read())
  
    h_input_Y = cuda.pagelocked_empty(1*trt.volume(engine.get_binding_shape(0)), dtype=trt.nptype(trt.float32))
    h_input_UV = cuda.pagelocked_empty(1*trt.volume(engine.get_binding_shape(1)), dtype=trt.nptype(trt.float32))
    h_output_SEG = cuda.pagelocked_empty(1*trt.volume(engine.get_binding_shape(2)), dtype=trt.nptype(trt.float32))
    h_output_BB = cuda.pagelocked_empty(1*trt.volume(engine.get_binding_shape(3)), dtype=trt.nptype(trt.float32))
    # Allocate device memory
    d_input_Y = cuda.mem_alloc(h_input_Y.nbytes)
    d_input_UV = cuda.mem_alloc(h_input_UV.nbytes)
    d_output_SEG = cuda.mem_alloc(output[0].nbytes)
    d_output_BB = cuda.mem_alloc(output[1].nbytes)

    # Create a stream in which to copy inputs/outputs and run inference.
    stream = cuda.Stream()

    with engine.create_execution_context() as context:
        context.active_optimization_profile = 0
        np.copyto(h_input_Y, input['input_y'].ravel())
        np.copyto(h_input_UV, input['input_uv'].ravel())
        # Transfer input data to the GPU.
        cuda.memcpy_htod_async(d_input_Y, h_input_Y, stream)
        cuda.memcpy_htod_async(d_input_UV, h_input_UV, stream)
        # Run inference.
        context.execute_async(batch_size=batchSize, bindings=[int(d_input_Y), int(d_input_UV),  int(d_output_SEG),  int(d_output_BB)], stream_handle=stream.handle)
        # Transfer predictions back from the GPU.
        cuda.memcpy_dtoh_async(h_output_SEG, d_output_SEG, stream)
        cuda.memcpy_dtoh_async(h_output_BB, d_output_BB, stream)
        # Synchronize the stream
        stream.synchronize()

Could you help me figure out the problem? I have tried many things (updating JetPack to 4.4, updating TF to 15.2, testing with UFF, and trying opsets 10, 11 and 12), but I still get the same wrong output.

I have already shared the .pb file with you. Could you check on your side whether the TRT engine output matches the TF output?
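
A comparison like the one asked for above could be sketched as follows. This is only an illustration: `compare_outputs` is a hypothetical helper, the tolerances are arbitrary, and both outputs are assumed to be flattened in the same element order before comparing.

```python
def compare_outputs(trt_out, ref_out, atol=1e-3, rtol=1e-3):
    """Compare two flat sequences of floats element by element.

    Returns (max_abs_diff, n_mismatched), where an element counts as
    mismatched when |a - b| > atol + rtol * |b|.
    """
    if len(trt_out) != len(ref_out):
        raise ValueError("outputs differ in length: %d vs %d"
                         % (len(trt_out), len(ref_out)))
    max_abs_diff = 0.0
    n_mismatched = 0
    for a, b in zip(trt_out, ref_out):
        diff = abs(a - b)
        max_abs_diff = max(max_abs_diff, diff)
        if diff > atol + rtol * abs(b):
            n_mismatched += 1
    return max_abs_diff, n_mismatched


# Toy values only; real inputs would be the flattened TRT and TF outputs.
print(compare_outputs([1.0, 2.0, 3.0], [1.0, 2.5, 3.0]))
```

If nearly every element mismatches, the cause is more likely a layout or preprocessing difference than numerical precision.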

Two things are actually confusing me:

  1. “Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.” I don’t understand why I get this warning when creating the TRT engine from the ONNX model.

  2. “Inputs format: fp32:CHW” and “Outputs format: fp32:CHW”. I don’t understand why I get this message in the log. It should be HWC, since that is the format in the .pb file. Is there a way to define this format during the ONNX creation or during the engine creation?

Sorry for the very long message

Regards,

TRT doesn’t support INT64; this is just a warning that TRT is trying to cast the weights down to INT32.
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-713/api/python_api/infer/FoundationalTypes/DataType.html
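
Such a cast loses nothing as long as every INT64 value fits in the INT32 range. A minimal sketch of that check, using a hypothetical helper name and made-up sample values:

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def out_of_int32_range(values):
    """Return the values that cannot be represented as a 32-bit signed integer."""
    return [v for v in values if not INT32_MIN <= v <= INT32_MAX]


# Made-up weights: only the last one exceeds the INT32 maximum.
weights = [0, -1, 2**31 - 1, 2**31]
print(out_of_int32_range(weights))
```

An empty result means the values themselves are unchanged by the narrowing.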

TensorRT’s format specifies only the memory layout, not the dimension order. The dimensions are always in NCHW order.
ONNX supports only one format, so I don’t think ONNX has a flag to indicate NHWC. But you can try the “--inputs-as-nchw” option in tf2onnx. Please refer to the link below:
https://github.com/onnx/onnx/issues/369
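
To illustrate why the layout matters, here is a small sketch of where the same logical element lands in a flat buffer under each layout. The helper names are hypothetical, and the sizes assume the UV input from the log (1x192x512x2) is H=192, W=512, C=2.

```python
def offset_nchw(n, c, h, w, C, H, W):
    """Flat index of element (n, c, h, w) in a buffer stored as NCHW."""
    return ((n * C + c) * H + h) * W + w


def offset_nhwc(n, c, h, w, C, H, W):
    """Flat index of the same element in a buffer stored as NHWC."""
    return ((n * H + h) * W + w) * C + c


# Assumed sizes for the UV input: 2 channels, 192 rows, 512 columns.
C, H, W = 2, 192, 512

# Channel 1 of the top-left pixel sits at very different offsets.
print(offset_nchw(0, 1, 0, 0, C, H, W))
print(offset_nhwc(0, 1, 0, 0, C, H, W))
```

A host buffer filled in one layout and read in the other therefore yields scrambled values even though the element count matches.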

The model has just 2 inputs and 1 output. Did you mark more outputs during model generation?

Could you please refer to the link below in case it helps:

Thanks