Errors while reading ONNX file produced by TAO 5

Hi! I have trained a yolov4_tiny model with the TAO 5 framework. With the tao model yolo_v4_tiny export command I exported the .hdf5 file obtained during training to an .onnx file. When I run inference with the resulting .onnx file on the same machine that was used for training, I get errors. The system configuration and the commands I used are given below:

• Hardware: AWS g4dn.2xlarge (1 NVIDIA T4 GPU, 8 CPUs, 32 GiB RAM)
• Network type: Yolo_v4_tiny
• Training spec file:
yolov4_tiny.txt (1.8 KB)

• How to reproduce the issue?

Exporting the .hdf5 file to .onnx:

tao model yolo_v4_tiny export -m /tao-experiments/results/yolov4_tiny_cardbox/weights/yolov4_resnet18_epoch_036.hdf5 -o /tao-experiments/models/cardbox_detection_with_yolov4.onnx -e /tao-experiments/specs/yolov4_tiny.txt -k $KEY --gen_ds_config --verbose

Code for the ONNX model inference:

import onnx
import onnxruntime

# Create a CPU-only ONNX Runtime session from the exported model
model_path = "tao-experiments/models/cardbox_detection_with_yolov4_tiny.onnx"
ort_session = onnxruntime.InferenceSession(model_path, None, providers=['CPUExecutionProvider'])

Error received:

File "test_engine.py", line 12, in <module>
    ort_session = onnxruntime.InferenceSession(model_path, None, providers=['CPUExecutionProvider'])
  File "/home/d.HUMENIUK/.local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 383, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/d.HUMENIUK/.local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 424, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from tao-experiments/models_28_08_23/cardbox_detection_with_yolov4.onnx failed:This is an invalid model. In Node, ("BatchedNMS_N", BatchedNMSDynamic_TRT, "", -1) : ("box": tensor(float),"cls": tensor(float),) -> ("BatchedNMS": tensor(int32),"BatchedNMS_1": tensor(float),"BatchedNMS_2": tensor(float),"BatchedNMS_3": tensor(float),) , Error No Op registered for BatchedNMSDynamic_TRT with domain_version of 12

Code for ONNX model verification:

import onnx
import onnxruntime
model_path = "tao-experiments/models_29_08_23/cardbox_detection_with_FasterRCNN.onnx"
onnx_model = onnx.load(model_path)
try:
    onnx.checker.check_model(onnx_model)
except onnx.checker.ValidationError as e:
    print("The model is invalid: %s" % e)
else:
    print("The model is valid!")

Error received:

The model is invalid: Field 'shape' of 'type' is required but missing.

I would be grateful for your help!

P.S. I have also tried training the Yolov4 model as well as the FasterRCNN, and I get the same error for the produced ONNX files (I used specification files dedicated to these models, different from the one I attached).

Please check if Netron can open the onnx files.
Then, please use tao deploy to generate a TensorRT engine from the onnx file and run inference with that engine.
Reference:
https://github.com/NVIDIA/tao_tutorials/blob/main/notebooks/tao_launcher_starter_kit/yolo_v4_tiny/yolo_v4_tiny.ipynb
https://github.com/NVIDIA/tao_deploy/blob/main/nvidia_tao_deploy/cv/yolo_v4/scripts/gen_trt_engine.py
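For reference, the engine-generation step from that notebook looks roughly like the command below. The flag names are taken from the notebook and gen_trt_engine.py linked above; please verify them with tao deploy yolo_v4_tiny gen_trt_engine --help and adjust paths, key and precision to your setup:

tao deploy yolo_v4_tiny gen_trt_engine \
    -m /tao-experiments/models/cardbox_detection_with_yolov4.onnx \
    -e /tao-experiments/specs/yolov4_tiny.txt \
    -k $KEY \
    --data_type fp16 \
    --engine_file /tao-experiments/models/cardbox_detection_with_yolov4.engine

The notebook then runs evaluation and inference on the generated engine via tao deploy as well.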

Dear Morganh,
Thank you very much for your reply.
Yes, Netron can successfully open the model.
I am attaching the exported “.png” file with the model architecture.
In my application, it would be very useful to run the inference without using tao model yolo_v4_tiny inference, as I want to run the inference in an Isaac Sim standalone app, and Isaac Sim is running in a container on the AWS instance. Do you think that would be possible, or is running the inference with the TensorRT engine the only way?
Thank you!

It is not the only way, but it is suggested to run inference with the TensorRT engine. May I know the doc link for how you “run the inference in the Isaac Sim standalone app”? Also, could you please share the .onnx files? Thanks.

Dear Morganh,
Thank you for your reply.
I am attaching the .onnx file.
cardbox_detection_with_yolov4_tiny.onnx (22.6 MB)
In my application, I want to read images from a camera and pass them to the model to run the inference. Reading the images from the camera in a standalone app is well described in the documentation: camera example.
It was also shown that it is possible to run the inference in Isaac Sim: inference in Isaac Sim.
What I would like to do is run the inference on the image obtained from a camera in Isaac Sim with the object detection model that I produced with TAO 5.

It seems like the ONNX model contains an unsupported layer that performs NMS (non-maximum suppression). This is a proprietary TensorRT op (BatchedNMSDynamic_TRT) that is not supported by a standard ONNX runtime framework such as ONNX Runtime.

I just got the same error. This raises several questions for me:

  1. Can I export the model without the included NMS layer? I can perform NMS or Soft-NMS myself.
  2. If not, how do you propose we run the ONNX model? If it cannot be run with ONNX Runtime, is there another supported runtime that works on non-NVIDIA hardware?

Best regards,
Ben

Please trim the exported onnx file into a new onnx file, then perform NMS yourself.

Thank you!

I tried to extract a sub-model using the Python ONNX utils as documented here. But as usual it’s not going to be easy (error that occurs).
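For reference, the extraction call I was attempting looks roughly like this; the tensor names are assumptions (box and cls appear in the error message above) and have to match the actual graph, and this is the route that produced the linked error for me:

import onnx.utils

# Tensor names ("Input", "box", "cls") are assumptions; verify them in Netron first.
onnx.utils.extract_model(
    "cardbox_detection_with_yolov4_tiny.onnx",      # exported TAO model
    "cardbox_detection_with_yolov4_tiny_cut.onnx",  # sub-model stopping before the NMS node
    input_names=["Input"],
    output_names=["box", "cls"],
)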

I also found documentation on how to insert NMS for another type of network into the ONNX graph. But again this is very model specific.

Before spending a day on this, I would like to know if there is another way. My second question: how would you run an exported ONNX model? As far as I can tell there is no runtime that can support it, because it does not comply with ONNX opset 15. Also, the NMS layer that is used is not supported by any opset.

Please use the method below to trim the onnx. For example, to trim https://forums.developer.nvidia.com/uploads/short-url/tc4tmv15H16NiFPOqOa1QNswBnM.onnx:

import onnx_graphsurgeon as gs
import numpy as np
import onnx

model = onnx.load("cardbox_detection_with_yolov4_tiny.onnx")
graph = gs.import_onnx(model)

tensors = graph.tensors()

# Redefine the graph inputs/outputs so the BatchedNMS node is no longer reachable:
# keep the original "Input" tensor and stop at the raw "box" and "cls" tensors.
graph.inputs = [tensors["Input"].to_variable(dtype=np.float32, shape=("N", 3, 512, 768))]
graph.outputs = [tensors["box"].to_variable(dtype=np.float32), tensors["cls"].to_variable(dtype=np.float32)]

# Drop the now-dangling NMS node and any unused tensors, then save the trimmed model.
graph.cleanup()

onnx.save(gs.export_onnx(graph), "cardbox_detection_with_yolov4_tiny_cut.onnx")


Then, there is no issue when running it with:

import onnx
import onnxruntime
#model_path = "./cardbox_detection_with_yolov4_tiny.onnx"
model_path = "./cardbox_detection_with_yolov4_tiny_cut.onnx"
ort_session = onnxruntime.InferenceSession(model_path, None, providers=['CPUExecutionProvider'])

For the BatchedNMS implementation, you can refer to https://github.com/NVIDIA/TensorRT/tree/23.08/plugin/batchedNMSPlugin
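If you want to stay entirely in Python instead of reimplementing the plugin, a plain greedy NMS over the trimmed model’s box/cls outputs is enough for many cases. Below is a minimal sketch. The assumed output layouts (box roughly [batch, num_boxes, 1, 4] in corner format, cls [batch, num_boxes, num_classes]), the thresholds and the dummy input are assumptions; check the real shapes in Netron and apply the same preprocessing used for training before trusting the results:

import numpy as np
import onnxruntime

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy NMS over corner-format boxes [x1, y1, x2, y2]; returns indices of kept boxes.
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Intersection of the current best box with all remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = rest[iou <= iou_thresh]
    return keep

ort_session = onnxruntime.InferenceSession(
    "./cardbox_detection_with_yolov4_tiny_cut.onnx", None, providers=['CPUExecutionProvider'])

# Dummy input only to exercise the graph; a real image must be preprocessed
# (resized to 768x512, CHW layout, normalized) exactly as during training.
dummy = np.random.rand(1, 3, 512, 768).astype(np.float32)
box_out, cls_out = ort_session.run(["box", "cls"], {"Input": dummy})

boxes = box_out[0].reshape(-1, 4)   # assumed [num_boxes, 4]
class_scores = cls_out[0]           # assumed [num_boxes, num_classes]
scores = class_scores.max(axis=1)
labels = class_scores.argmax(axis=1)

mask = scores > 0.3                 # confidence threshold, tune as needed
keep = nms(boxes[mask], scores[mask], iou_thresh=0.5)  # indices into the filtered arrays
print("kept detections:", len(keep))

Note that this is class-agnostic NMS on the best class per box; the BatchedNMS plugin performs per-class NMS, so results can differ slightly.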

Dear Morganh and bke,
Thank you very much for your inputs.
The last script you provided @Morganh works well and I could successfully open the trimmed model and observe its outputs. I did it on the AWS g4dn.2xlarge instance. Thank you very much for providing this answer! If you don’t have further questions @bke, I think we can mark this issue as solved. Thank you again!

@Morganh, highly appreciated, this allows us to proceed. @d.humeniuk, you can mark the post as solved.
