Inference error while using a TensorRT engine on Jetson Nano

Hello,
I am getting the following error while running inference on a TensorRT engine.
The engine file is for the object detection model ‘ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8’ from the TensorFlow model zoo.

Error:
[TensorRT] ERROR: 2: [pluginV2DynamicExtRunner.cpp::execute::115] Error Code 2: Internal Error (Assertion status == kSTATUS_SUCCESS failed.)

Environment

TensorRT version: 8.0.1.6
ONNX version: 1.10.2
JetPack: 4.6

Relevant Files

I have uploaded the ONNX and TensorRT engine files for the model, along with the script I am using for inference.
I followed the code in this post to convert the ONNX file to a TensorRT engine: https://developer.nvidia.com/blog/speeding-up-deep-learning-inference-using-tensorflow-onnx-and-tensorrt/

inference.py (5.2 KB)
model.trt (12.4 MB)
model.onnx (10.4 MB)

Please have a look.
Thank you.

Hi,

Have you run this model with onnxruntime before?
It seems that there are some BatchMultiClassNonMaxSuppression nodes, which are not supported by TensorRT.
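
For a quick check, something like the following lists the op types present in the exported graph (a minimal sketch using the onnx Python package; the file name is only an example):

# Sketch: count op types in the exported ONNX graph to spot NMS/plugin nodes.
# "model.onnx" is a placeholder for your exported file.
from collections import Counter
import onnx

model = onnx.load("model.onnx")
print(Counter(node.op_type for node in model.graph.node))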

Thanks.

Hello,

Yes, I tried running the model with onnxruntime and it gives the following error:

onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from model_final.onnx failed:This is an invalid model. In Node, ("nms/non_maximum_suppression_first", EfficientNMS_TRT, "", -1) : ("StatefulPartitionedCall/WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalBoxHead/scale:0_1": tensor(float),"StatefulPartitionedCall/WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalClassHead/slicer:0_0": tensor(float),"nms/anchors:0": tensor(float),) -> ("num_detections": tensor(int32),"detection_boxes": tensor(float),"detection_scores": tensor(float),"detection_classes": tensor(int32),) , Error No Op registered for EfficientNMS_TRT with domain_version of 11

Can you give me any idea of how to move forward from here?

Thank you

Hi,

Did you convert the model into ONNX with tf2onnx?
Could you share the detailed steps with us?

Thanks.

Hello,
Yes, I used tf2onnx to convert the saved model to the ONNX format. The following command with opset 11 was used for the conversion:

python -m tf2onnx.convert --saved-model tensorflow-model-path --opset 11 --output model.onnx

And the following code was used to create the TensorRT engine from the ONNX file. It was taken from one of the NVIDIA Jetson Nano forum threads about converting to a TensorRT engine.

engine.py (1.0 KB)
create_engine.py (692 Bytes)
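
For reference, below is a minimal sketch of what such an ONNX-to-TensorRT conversion typically looks like with the TensorRT 8 Python API (this is an assumption about the approach, not the exact contents of the attached files; paths and the workspace size are illustrative):

# Sketch: build a TensorRT engine from an ONNX file (TensorRT 8 Python API).
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the ONNX model and report any parser errors.
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX file")

config = builder.create_builder_config()
config.max_workspace_size = 1 << 28  # 256 MiB; keep modest on the Nano
config.set_flag(trt.BuilderFlag.FP16)

# Serialize and save the engine.
serialized_engine = builder.build_serialized_network(network, config)
with open("model.trt", "wb") as f:
    f.write(serialized_engine)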

Hi,

The model from the TF2 Object Detection API requires some customization.
Could you follow the tutorial below to convert the model to TensorRT?

Thanks.

Hi,

We have confirmed that the ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8 model can be converted to TensorRT on Jetson.
Below are the detailed steps for your reference:

1. Environment

$ sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-tensorflow:r32.6.1-tf2.5-py3

2. Install Prerequisites

$ apt-get update
$ apt-get install cmake g++ git libprotobuf-dev protobuf-compiler
$ pip3 install onnx tf2onnx pillow
$ git clone https://github.com/NVIDIA/TensorRT.git
$ cd TensorRT/tools/onnx-graphsurgeon/
$ make install
$ python3 -m pip install --force-reinstall dist/*.whl
$ wget https://nvidia.box.com/shared/static/jy7nqva7l88mq9i8bw3g3sklzf4kccn2.whl -O onnxruntime_gpu-1.10.0-cp36-cp36m-linux_aarch64.whl
$ pip3 install onnxruntime_gpu-1.10.0-cp36-cp36m-linux_aarch64.whl

3. Prepare Source

$ cd ../../samples/python/tensorflow_object_detection_api/
$ git clone https://github.com/tensorflow/models.git
$ cp -r models/research/object_detection .
$ protoc object_detection/protos/*.proto --python_out=.

Apply the change below:

diff --git a/samples/python/tensorflow_object_detection_api/create_onnx.py b/samples/python/tensorflow_object_detection_api/create_onnx.py
index b6ac423..7292756 100644
--- a/samples/python/tensorflow_object_detection_api/create_onnx.py
+++ b/samples/python/tensorflow_object_detection_api/create_onnx.py
@@ -254,8 +254,8 @@ class TFODGraphSurgeon:
         concat_node.outputs = []
 
         # Disconnect the last node in second preprocessing branch with parent second TensorListStack node.
-        tile_node = self.graph.find_node_by_op("Tile")
-        tile_node.outputs = []
+        #tile_node = self.graph.find_node_by_op("Tile")
+        #tile_node.outputs = []
 
         # Reshape nodes tend to update the batch dimension to a fixed value of 1, they should use the batch size instead.
         for node in [node for node in self.graph.nodes if node.op == "Reshape"]:

4. Convert Model

$ wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz
$ tar -xvf ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz
$ python3 create_onnx.py --pipeline_config ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/pipeline.config --saved_model ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/saved_model --onnx model.onnx --input_format NCHW
$ python3 build_engine.py --onnx model.onnx --engine engine.trt --precision fp16

5. Test with TensorRT

$ /usr/src/tensorrt/bin/trtexec --loadEngine=engine.trt
...
[12/29/2021-08:40:07] [I] === Performance summary ===
[12/29/2021-08:40:07] [I] Throughput: 146.935 qps
[12/29/2021-08:40:07] [I] Latency: min = 6.73804 ms, max = 8.90845 ms, mean = 6.79419 ms, median = 6.77783 ms, percentile(99%) = 7.34839 ms
[12/29/2021-08:40:07] [I] End-to-End Host Latency: min = 6.7522 ms, max = 8.92114 ms, mean = 6.80563 ms, median = 6.78809 ms, percentile(99%) = 7.36255 ms
[12/29/2021-08:40:07] [I] Enqueue Time: min = 4.5791 ms, max = 8.78735 ms, mean = 5.32989 ms, median = 5.07129 ms, percentile(99%) = 6.9624 ms
[12/29/2021-08:40:07] [I] H2D Latency: min = 0.0742188 ms, max = 0.0761719 ms, mean = 0.0748875 ms, median = 0.0749512 ms, percentile(99%) = 0.0759277 ms
[12/29/2021-08:40:07] [I] GPU Compute Time: min = 6.65723 ms, max = 8.8291 ms, mean = 6.71306 ms, median = 6.69604 ms, percentile(99%) = 7.26758 ms
[12/29/2021-08:40:07] [I] D2H Latency: min = 0.00512695 ms, max = 0.00732422 ms, mean = 0.00624302 ms, median = 0.00634766 ms, percentile(99%) = 0.00708008 ms
[12/29/2021-08:40:07] [I] Total Host Walltime: 1.09572 s
[12/29/2021-08:40:07] [I] Total GPU Compute Time: 1.0808 s
[12/29/2021-08:40:07] [I] Explanations of the performance metrics are printed in the verbose logs.
[12/29/2021-08:40:07] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --loadEngine=engine.trt
[12/29/2021-08:40:07] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1377, GPU 16207 (MiB)

Thanks.


Hello,

I tried the solution you mentioned above and tried to debug the error, but it is still the same. The ONNX file was created successfully, but there is an error during engine generation. I have attached the log file for your reference. Please have a look.

log_file.txt (2.2 MB)

Thank you.

Hi,

Are you using the ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8 model from the TensorFlow model zoo, or a customized version?

We have tested the model, and it works correctly with the above instructions.
If you are using the default version, would you mind trying again with some swap memory?
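
In case it is useful, swap can be added on the Nano roughly like this (a sketch; the 4 GB size and file path are only examples):

$ sudo fallocate -l 4G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile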

Thanks.

Hello,
I am using the ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8 model fine-tuned on my own dataset with 3 classes.

As for swap memory, I am already using 8 GB.

While debugging, I found that other people are having the same error:
https://github.com/pskiran1/TensorRT-support-for-Tensorflow-2-Object-Detection-Models/issues/6

Can you please have a look?

Thank you.

Hi,

Would you mind sharing your model with us so we can give it a check?
You may need some updates if a customized model is used.

Thanks.

ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.zip (19.1 MB)

The file contains model as well as config file.

Thank you.

Hi,

Thanks for sharing the model.

We tried to convert your model with the create_onnx.py script,
but we hit the following error:

This ORT build has ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], ...)
INFO:ModelHelper:Found Concat node 'StatefulPartitionedCall/concat_1' as the tip of WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalClassHead
INFO:ModelHelper:Found Concat node 'StatefulPartitionedCall/concat' as the tip of WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalBoxHead
Traceback (most recent call last):
  File "create_onnx.py", line 671, in <module>
    main(args)
  File "create_onnx.py", line 647, in main
    effdet_gs.process_graph(args.first_nms_threshold, args.second_nms_threshold)
  File "create_onnx.py", line 620, in process_graph
    self.graph.outputs = first_nms(-1, True, first_nms_threshold)
  File "create_onnx.py", line 485, in first_nms
    anchors_tensor = self.extract_anchors_tensor(box_net_split)
  File "create_onnx.py", line 311, in extract_anchors_tensor
    anchors_y = get_anchor(0, "Add")
  File "create_onnx.py", line 300, in get_anchor
    if (node.inputs[1].values).size == 1:
AttributeError: 'Variable' object has no attribute 'values'

It seems that you can convert the ONNX model successfully.
Have you applied any customization to your model?

Thanks.

Hello,

For me, create_onnx.py works out of the box without any customizations and runs without any error. Maybe the problem is with the versions of some libraries. Here are the versions I am using:

onnx_graphsurgeon: 0.3.10
tensorflow: 2.6.2
onnx: 1.8.1
tf2onnx: 1.8.1
Python: 3.8.10

Maybe it is just a version problem. Please do have a look.

Thank you

Hi,

Just want to confirm first.

Are you running the sample on JetPack 4.6? Its default Python version is 3.6 rather than 3.8.
If you are using Python 3.8, have you built the TensorRT Python package from the bindings source?

Thanks.

Hello,

I am running the create_onnx.py script on my Ubuntu machine, not on the Jetson Nano. Only build_engine.py is run on the Jetson Nano.

The Ubuntu machine has Python 3.8, and the Jetson Nano has JetPack 4.6 with the default Python 3.6.

Thank you.

Hi,

Thanks for the information.
We are still checking this internally. Will share more information later.

Thanks.


Hello,

Are there any updates on the engine conversion?

Hi,

Sorry that we are still checking this internally.
Will get back to you later.

Thanks for your patience.

Hi,

Thanks for your patience.

We recently published a new release for the Jetson platform.
Would you mind trying JetPack 4.6.1 + TensorRT 8.2 to see whether it works?

Thanks.
