How to run an ONNX-format EfficientDet-D0 model exported from TAO 6.0

• Hardware Platform (Jetson / GPU): dGPU
• DeepStream Version: 8.0
• TensorRT Version: as shipped with DS 8.0

I used the TAO nvcr.io/nvidia/tao/tao-toolkit:6.0.0-tf2 container to train, prune, and retrain an EfficientDet-D0 model, which was successful. I used the notebook tao-experiments/efficientdet_tf2/efficientdet.ipynb.

DONE (t=0.56s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.613
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.960
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.695
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.277
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.645
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.575
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.635
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.741
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.750
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.386
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.759
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.734
None

I was also able to successfully export the model to ONNX format.

Loaded saved model from /tmp/tmp7c44ew6u
Using tensorflow=2.17.0, onnx=1.17.0, tf2onnx=1.16.1/3dd772
Using opset <onnx, 13>
Computed 0 values for constant folding
Optimizing ONNX model
After optimization: BatchNormalization -54 (108->54), Cast -38 (48->10), Const -536 (1136->600), GlobalAveragePool +16 (0->16), Greater -1 (1->0), Identity -2 (2->0), Mul -2 (192->190), ReduceMean -16 (16->0), ReduceSum -1 (1->0), Reshape -70 (114->44), Shape -1 (7->6), Squeeze +13 (22->35), Transpose -765 (780->15), Unsqueeze -8 (28->20)
TF2ONNX graph created successfully
[ONNXRuntimeError] : 1 : FAIL : Type Error: Type parameter (T) of Optype (Mul) bound to different types (tensor(int32) and tensor(float) in node (StatefulPartitionedCall/mul).
Found Conv node 'StatefulPartitionedCall/efficientdet-d0/stem_conv/Conv2D' as stem entry
Updating Resize node Resize__870 to [1. 1. 2. 2.]
Found Concat node 'StatefulPartitionedCall/concat' as the tip of class-predict
Found Concat node 'StatefulPartitionedCall/concat_1' as the tip of box-predict
Created NMS plugin 'EfficientNMS_TRT' with attributes: {'plugin_version': '1', 'background_class': -1, 'max_output_boxes': 100, 'score_threshold': 0.4, 'iou_threshold': 0.5, 'score_activation': True, 'box_coding': 1}
The exported model is saved at: /workspace/tao-experiments/efficientdet_tf2/export/efficientdet-d0.onnx
Export finished successfully.

When I check the model's input and output dimensions, I get this:

input: input shape: ['dim?', 512, 512, 3]
outputs: ['num_detections', 'detection_boxes', 'detection_scores', 'detection_classes']

which looks normal.

I ran evaluate and inference as well which were successful.

Trt_inference results will be saved at: /workspace/tao-experiments/efficientdet_tf2/export
Starting efficientdet_tf2 trt_inference.

Producing predictions: 100%|██████████| 314/314 [00:09<00:00, 34.18it/s]
Finished inference.
Trt_inference finished successfully.

All the bounding boxes were correctly produced.

My real problem starts when I proceed to deploy to DeepStream 8.0. DS expects the model input in NCHW format, while the ONNX model's input is NHWC, which keeps causing problems.
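To illustrate the layout mismatch: the same frame data is just indexed in a different axis order. A minimal numpy sketch (illustration only, not from the original post):

```python
import numpy as np

# NHWC frame as bound by the TAO EfficientDet export: (batch, H, W, C)
nhwc = np.zeros((1, 512, 512, 3), dtype=np.float32)

# The same data in NCHW order, which DeepStream assumes by default
nchw = np.transpose(nhwc, (0, 3, 1, 2))

print(nhwc.shape)  # (1, 512, 512, 3)
print(nchw.shape)  # (1, 3, 512, 512)
```

No manual transpose is actually needed in the pipeline; gst-nvinfer can preprocess for NHWC models directly when configured for that input order.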

I also tried trtexec to build the engine file, which was successful.

[10/01/2025-03:55:54] [I] Input binding for input with dimensions 1x512x512x3 is created.
[10/01/2025-03:55:54] [I] Output binding for num_detections with dimensions 1x1 is created.
[10/01/2025-03:55:54] [I] Output binding for detection_boxes with dimensions 1x100x4 is created.
[10/01/2025-03:55:54] [I] Output binding for detection_scores with dimensions 1x100 is created.
[10/01/2025-03:55:54] [I] Output binding for detection_classes with dimensions 1x100 is created.

&&&& PASSED TensorRT.trtexec [TensorRT v100900] [b34] # trtexec --onnx=assets/efficientdet-d0.onnx --saveEngine=assets/efficientdet-d0_b1_fp16.engine --fp16 --minShapes=input:1x512x512x3 --optShapes=input:1x512x512x3 --maxShapes=input:1x512x512x3

But this engine runs into the same issue.

Opening in BLOCKING MODE
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
[NvMultiObjectTracker] Initialized
0:00:00.685705062 268 0x56292d4d6fc0 WARN nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1260> [UID = 1]: Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5
0:00:00.764122940 268 0x56292d4d6fc0 INFO nvinfer gstnvinfer.cpp:685:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2109> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-8.0/sources/apps/sample_apps/deepstream-nvdsanalytics-test/assets/efficientdet-d0_b1_fp16.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:363 [Implicit Engine Info]: layers num: 0

0:00:00.764154641 268 0x56292d4d6fc0 INFO nvinfer gstnvinfer.cpp:685:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2212> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-8.0/sources/apps/sample_apps/deepstream-nvdsanalytics-test/assets/efficientdet-d0_b1_fp16.engine
0:00:00.764168807 268 0x56292d4d6fc0 ERROR nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::preparePreprocess() <nvdsinfer_context_impl.cpp:1051> [UID = 1]: RGB/BGR input format specified but network input channels is not 3
ERROR: nvdsinfer_context_impl.cpp:1377 Infer Context prepare preprocessing resource failed., nvinfer error:NVDSINFER_CONFIG_FAILED
0:00:00.773772980 268 0x56292d4d6fc0 WARN nvinfer gstnvinfer.cpp:918:gst_nvinfer_start:<primary_gie> error: Failed to create NvDsInferContext instance
0:00:00.773790994 268 0x56292d4d6fc0 WARN nvinfer gstnvinfer.cpp:918:gst_nvinfer_start:<primary_gie> error: Config file path: /opt/nvidia/deepstream/deepstream-8.0/sources/apps/sample_apps/deepstream-nvdsanalytics-test/assets/pgie_config.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
** ERROR: main:1539: Failed to set pipeline to PAUSED
Quitting
ERROR from primary_gie: Failed to create NvDsInferContext instance
Debug info: gstnvinfer.cpp(918): gst_nvinfer_start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie:
Config file path: /opt/nvidia/deepstream/deepstream-8.0/sources/apps/sample_apps/deepstream-nvdsanalytics-test/assets/pgie_config.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
[NvMultiObjectTracker] De-initialized
App run failed

Here is my pgie config and the ONNX file. Can you guide me through the steps?

efficientdet-d0.onnx.txt (14.8 MB)

pgie_config.txt (716 Bytes)

Can someone please take a look?

It seems your model's input layer is NHWC.

But you configured “network-input-order=0” in pgie_config.txt, which is wrong for this model. All gst-nvinfer parameters are explained in the Gst-nvinfer section of the DeepStream documentation; please configure them according to your model.
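For an NHWC model like this one, the relevant key is network-input-order. A minimal illustration (only the changed key shown; the rest of pgie_config.txt stays as posted):

```ini
[property]
# 0 = NCHW (default), 1 = NHWC; this model's input binding is 1x512x512x3
network-input-order=1
```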

I wrote a small parser and used an updated pgie file. The pipeline runs successfully but does not produce any detections.

deepstream-app -c ./assets/pipeline_config.txt
** WARN: <parse_source:725>: Unknown key ‘alert-interval’ for group [source0]
** WARN: <parse_source:725>: Unknown key ‘calc-interval’ for group [source0]
** WARN: <parse_source:725>: Unknown key ‘latitude’ for group [source0]
** WARN: <parse_source:725>: Unknown key ‘longitude’ for group [source0]
** WARN: <parse_source:725>: Unknown key ‘ppm’ for group [source0]
** WARN: <parse_source:725>: Unknown key ‘threshold’ for group [source0]
** WARN: <parse_tracker:1756>: Unknown key ‘enable-batch-process’ for group [tracker]
** WARN: <parse_tracker:1756>: Unknown key ‘enable-past-frame’ for group [tracker]
Unknown or legacy key specified ‘confidence-threshold’ for group [property]
Unknown or legacy key specified ‘nms-iou-threshold’ for group [property]
Warn: ‘threshold’ parameter has been deprecated. Use ‘pre-cluster-threshold’ instead.
Authorization required, but no authorization protocol specified

Opening in BLOCKING MODE
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
[NvMultiObjectTracker] Initialized
0:00:00.330060054 371 0x58fac3d21190 INFO nvinfer gstnvinfer.cpp:685:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2109> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-8.0/sources/apps/sample_apps/js-nvdsanalytics/assets/efficientdet-d0_b1_fp16.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:363 [Implicit Engine Info]: layers num: 0

0:00:00.330099374 371 0x58fac3d21190 INFO nvinfer gstnvinfer.cpp:685:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2212> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-8.0/sources/apps/sample_apps/js-nvdsanalytics/assets/efficientdet-d0_b1_fp16.engine
0:00:00.333843926 371 0x58fac3d21190 INFO nvinfer gstnvinfer_impl.cpp:343:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-8.0/sources/apps/sample_apps/js-nvdsanalytics/assets/pgie_config.txt sucessfully

Runtime commands:
h: Print this help
q: Quit

p: Pause
r: Resume

** INFO: <bus_callback:291>: Pipeline ready

Opening in BLOCKING MODE
** INFO: <bus_callback:277>: Pipeline running

nvstreammux: Successfully handled EOS for source_id=0
** INFO: <bus_callback:334>: Received EOS. Exiting …

Quitting
[NvMultiObjectTracker] De-initialized
App run successful

nvdsparse_efficientdet.txt (1.4 KB)

pgie_config.txt (1.8 KB)
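For reference, the usual parsing logic for the four EfficientNMS_TRT outputs can be sketched standalone (this does not reproduce the attached parser; the buffer contents below are fabricated, and the corner-format box assumption should be verified against your export):

```python
import numpy as np

def parse_efficient_nms(num_dets, boxes, scores, classes, score_threshold=0.4):
    # num_dets: (1,) int; boxes: (100, 4); scores: (100,); classes: (100,)
    dets = []
    for i in range(int(num_dets[0])):
        if scores[i] < score_threshold:
            continue
        # Assumed [x1, y1, x2, y2] in network-input pixels; verify against your export
        x1, y1, x2, y2 = boxes[i]
        dets.append({"left": float(x1), "top": float(y1),
                     "width": float(x2 - x1), "height": float(y2 - y1),
                     "class_id": int(classes[i]), "confidence": float(scores[i])})
    return dets

# Fabricated example: two filled slots out of the max 100
num = np.array([2])
boxes = np.zeros((100, 4), dtype=np.float32)
boxes[0] = [10, 20, 110, 220]
boxes[1] = [5, 5, 50, 50]
scores = np.zeros(100, dtype=np.float32)
scores[:2] = [0.9, 0.3]
classes = np.zeros(100, dtype=np.int32)

dets = parse_efficient_nms(num, boxes, scores, classes)
print(len(dets))  # 1 (second detection falls below the 0.4 threshold)
```

In the real pipeline the same loop runs inside the custom C++ bbox-parsing function, with the output rectangles scaled back to frame coordinates by DeepStream.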

Why did you customize the bbox parsing function? Is there any issue with “NvDsInferParseCustomEfficientDetTAO” in deepstream_tao_apps/configs/nvinfer/efficientdet_tao/pgie_d0_512_tao_config.txt at release/tao5.1_ds6.4ga · NVIDIA-AI-IOT/deepstream_tao_apps?

Thanks. This works. The documentation is very convoluted. :-)

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.