RT-DETR model on DeepStream with FP16 precision produces no bounding boxes

Hello, I have encountered a problem.

Training platform:

• Hardware: RTX 4090
• OS: Ubuntu 22.04
• NVIDIA driver: 535.183.01
• TAO Toolkit: 6.25.9
• Network Type: rtdetr

Deployment platform:

• NVIDIA Jetson Orin Nano (8 GB RAM)

I exported the ONNX model and converted it to an engine with DeepStream for inference. At FP32 precision there is no problem, but at FP16 precision no detection boxes appear. The following warnings are printed during conversion:

WARNING: [TRT]: Detected layernorm nodes in FP16: /model/stages.1/stages.1.1/norm/ReduceMean_1, /model/stages.2/stages.2.6/norm/ReduceMean_1, /model/stages.1/stages.1.0/norm/ReduceMean_1, /model/stages.2/stages.2.0/norm/ReduceMean_1, /model/stages.2/stages.2.3/norm/ReduceMean_1, /model/downsample_layers.3/downsample_layers.3.0/ReduceMean_1, /model/downsample_layers.2/downsample_layers.2.0/ReduceMean_1, /model/stages.1/stages.1.0/norm/Sqrt, /model/stages.2/stages.2.7/norm/ReduceMean_1, /model/stages.3/stages.3.0/norm/ReduceMean_1, /model/downsample_layers.0/downsample_layers.0.1/Sqrt, /model/stages.0/stages.0.1/norm/Sqrt, /model/stages.0/stages.0.0/norm/Sqrt, /model/stages.3/stages.3.1/norm/ReduceMean_1, /model/stages.2/stages.2.1/norm/ReduceMean_1, /model/stages.2/stages.2.4/norm/ReduceMean_1, /model/downsample_layers.1/downsample_layers.1.0/Sqrt, /model/downsample_layers.0/downsample_layers.0.1/ReduceMean_1, /model/stages.0/stages.0.0/norm/ReduceMean_1, /model/downsample_layers.0/downsample_layers.0.1/Sub, /model/downsample_layers.0/downsample_layers.0.1/Pow, /model/downsample_layers.0/downsample_layers.0.1/Add, /model/downsample_layers.0/downsample_layers.0.1/Div, /model/downsample_layers.0/downsample_layers.0.1/Mul, /model/downsample_layers.0/downsample_layers.0.1/Add_1, /model/stages.0/stages.0.0/norm/Sub, /model/stages.0/stages.0.0/norm/Pow, /model/stages.0/stages.0.0/norm/Add, /model/stages.0/stages.0.0/norm/Div, /model/stages.0/stages.0.0/norm/Mul, /model/stages.0/stages.0.0/norm/Add_1, /model/stages.0/stages.0.1/norm/Sub, /model/stages.0/stages.0.1/norm/Pow, /model/stages.0/stages.0.1/norm/Add, /model/stages.0/stages.0.1/norm/Div, /model/stages.0/stages.0.1/norm/Mul, /model/stages.0/stages.0.1/norm/Add_1, /model/downsample_layers.1/downsample_layers.1.0/Sub, /model/downsample_layers.1/downsample_layers.1.0/Pow, /model/downsample_layers.1/downsample_layers.1.0/Add, /model/downsample_layers.1/downsample_layers.1.0/Div, 
/model/downsample_layers.1/downsample_layers.1.0/Mul, /model/downsample_layers.1/downsample_layers.1.0/Add_1, /model/stages.1/stages.1.0/norm/Sub, /model/stages.1/stages.1.0/norm/Pow, /model/stages.1/stages.1.0/norm/Add, /model/stages.1/stages.1.0/norm/Div, /model/stages.1/stages.1.0/norm/Mul, /model/stages.1/stages.1.0/norm/Add_1, /model/stages.1/stages.1.1/norm/Sub, /model/stages.1/stages.1.1/norm/Pow, /model/stages.1/stages.1.1/norm/Add, /model/stages.1/stages.1.1/norm/Sqrt, /model/stages.1/stages.1.1/norm/Div, /model/stages.1/stages.1.1/norm/Mul, /model/stages.1/stages.1.1/norm/Add_1, /model/downsample_layers.2/downsample_layers.2.0/Sub, /model/downsample_layers.2/downsample_layers.2.0/Pow, /model/downsample_layers.2/downsample_layers.2.0/Add, /model/downsample_layers.2/downsample_layers.2.0/Sqrt, /model/downsample_layers.2/downsample_layers.2.0/Div, /model/downsample_layers.2/downsample_layers.2.0/Mul, /model/downsample_layers.2/downsample_layers.2.0/Add_1, /model/stages.2/stages.2.0/norm/Sub, /model/stages.2/stages.2.0/norm/Pow, /model/stages.2/stages.2.0/norm/Add, /model/stages.2/stages.2.0/norm/Sqrt, /model/stages.2/stages.2.0/norm/Div, /model/stages.2/stages.2.0/norm/Mul, /model/stages.2/stages.2.0/norm/Add_1, /model/stages.2/stages.2.1/norm/Sub, /model/stages.2/stages.2.1/norm/Pow, /model/stages.2/stages.2.1/norm/Add, /model/stages.2/stages.2.1/norm/Sqrt, /model/stages.2/stages.2.1/norm/Div, /model/stages.2/stages.2.1/norm/Mul, /model/stages.2/stages.2.1/norm/Add_1, /model/stages.2/stages.2.2/norm/Sub, /model/stages.2/stages.2.2/norm/Pow, /model/stages.2/stages.2.2/norm/Add, /model/stages.2/stages.2.2/norm/Sqrt, /model/stages.2/stages.2.2/norm/Div, /model/stages.2/stages.2.2/norm/Mul, /model/stages.2/stages.2.2/norm/Add_1, /model/stages.2/stages.2.3/norm/Sub, /model/stages.2/stages.2.3/norm/Pow, /model/stages.2/stages.2.3/norm/Add, /model/stages.2/stages.2.3/norm/Sqrt, /model/stages.2/stages.2.3/norm/Div, /model/stages.2/stages.2.3/norm/Mul, 
/model/stages.2/stages.2.3/norm/Add_1, /model/stages.2/stages.2.4/norm/Sub, /model/stages.2/stages.2.4/norm/Pow, /model/stages.2/stages.2.4/norm/Add, /model/s
WARNING: [TRT]: Running layernorm after self-attention in FP16 may cause overflow. Exporting the model to the latest available ONNX opset (later than opset 17) to use the INormalizationLayer, or forcing layernorm layers to run in FP32 precision can help with preserving accuracy.
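As an aside, when the engine is built by DeepStream itself, the warning's advice about keeping layernorm in FP32 can (on recent DeepStream releases) be expressed with the Gst-nvinfer `layer-device-precisions` property. The fragment below is only a sketch and has not been verified on this setup; the two layer names are copied from the warning, and in practice every layernorm op listed there would need an entry:

```ini
# [property] section of the nvinfer config (sketch, unverified on this setup)
# network-mode=2 selects FP16; layer-device-precisions pins the named
# layers to FP32 on the GPU while the rest of the engine stays FP16.
network-mode=2
layer-device-precisions=/model/stages.0/stages.0.0/norm/ReduceMean_1:fp32:gpu;/model/stages.0/stages.0.0/norm/Sqrt:fp32:gpu
```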

I set the opset to 18 to address this, but the engine conversion then failed with an error:

ERROR: [TRT]: ModelImporter.cpp:768: While parsing node number 772 [TopK -> "/model/decoder/TopK_output_0"]:
ERROR: [TRT]: ModelImporter.cpp:769: --- Begin node ---
ERROR: [TRT]: ModelImporter.cpp:770: input: "/model/decoder/ReduceMax_output_0"
input: "/model/decoder/Reshape_9_output_0"
output: "/model/decoder/TopK_output_0"
output: "/model/decoder/TopK_output_1"
name: "/model/decoder/TopK"
op_type: "TopK"
attribute {
  name: "axis"
  i: 1
  type: INT
}
attribute {
  name: "largest"
  i: 1
  type: INT
}
attribute {
  name: "sorted"
  i: 1
  type: INT
}

ERROR: [TRT]: ModelImporter.cpp:771: --- End node ---
ERROR: [TRT]: ModelImporter.cpp:773: ERROR: onnx2trt_utils.cpp:342 In function convertAxis:
[8] Assertion failed: (axis >= 0 && axis <= nbDims) && "Axis must be in the range [0, nbDims]."
ERROR: Failed to parse onnx file
ERROR: failed to build network since parsing model errors.
ERROR: failed to build network.
0:00:13.074210041 3167867 0xaaaaf299f130 ERROR                nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2129> [UID = 1]: build engine file failed
0:00:13.473886176 3167867 0xaaaaf299f130 ERROR                nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2215> [UID = 1]: build backend context failed
0:00:13.473961026 3167867 0xaaaaf299f130 ERROR                nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1352> [UID = 1]: generate backend failed, check config file settings

@Morganh

Hello, I'm sorry to trouble you again. Thanks.

To narrow this down, please try running some experiments.

  1. Please try to export an ONNX file with a newer opset version, for example version 18. You can refer to my previous spec.yaml.
  2. Please try to generate the engine inside the tao-deploy docker instead of DeepStream.
    The tao-deploy docker can be found in GPU-optimized AI, Machine Learning, & HPC Software | NVIDIA NGC: nvcr.io/nvidia/tao/tao-toolkit:6.25.9-deploy
    Then, inside the tao-deploy docker, run trtexec to generate the TensorRT engine. Refer to TRTEXEC with RT-DETR — TAO Toolkit.
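As a sketch of step 2 (the paths are placeholders, not taken from a verified run), a trtexec invocation that builds an FP16 engine while forcing the layernorm ops from the earlier TRT warning back to FP32 could look like this:

```shell
# Inside the tao-deploy container; model paths are placeholders.
# --precisionConstraints=obey is required for --layerPrecisions to take effect.
# Only two layer names from the warning are shown here; in practice every
# layernorm op listed in the warning would need an entry.
trtexec \
  --onnx=/workspace/model.onnx \
  --saveEngine=/workspace/model_fp16.engine \
  --minShapes=inputs:1x3x640x640 \
  --optShapes=inputs:8x3x640x640 \
  --maxShapes=inputs:8x3x640x640 \
  --fp16 \
  --precisionConstraints=obey \
  --layerPrecisions=/model/stages.0/stages.0.0/norm/ReduceMean_1:fp32,/model/stages.0/stages.0.0/norm/Sqrt:fp32
```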

I tried setting the opset to 18 in the TAO docker to export the ONNX, then generated an engine at FP16 precision in the docker and tested it in the container, but ran into an error: no bounding box coordinates were available.

(I did not run this engine on the DeepStream platform because the container environment differs from the deployment environment.)

Starting rtdetr trt_inference.

Error drawing bbox for prediction [ 2. nan nan nan nan nan]
Error drawing bbox for prediction [ 2. nan nan nan nan nan]
Error drawing bbox for prediction [ 2. nan nan nan nan nan]
Error drawing bbox for prediction [ 2. nan nan nan nan nan]
Error drawing bbox for prediction [ 2. nan nan nan nan nan]
Error drawing bbox for prediction [ 2. nan nan nan nan nan]
Error drawing bbox for prediction [ 2. nan nan nan nan nan]
Error drawing bbox for prediction [ 2. nan nan nan nan nan]
Error drawing bbox for prediction [ 2. nan nan nan nan nan]
Error drawing bbox for prediction [ 2. nan nan nan nan nan]
Error drawing bbox for prediction [ 2. nan nan nan nan nan]
Error drawing bbox for prediction [ 2. nan nan nan nan nan]
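The failure mode in these logs (class id intact, all remaining fields NaN) is characteristic of an FP16 overflow upstream in the network rather than a drawing bug. A small guard like the following, a hypothetical helper assuming each row is [class_id, score, x1, y1, x2, y2], would at least skip the bad rows instead of erroring per prediction, and makes the symptom easy to detect programmatically:

```python
import math

def drawable_predictions(preds):
    """Keep only predictions whose score and box fields are all finite.

    Assumes each prediction is [class_id, score, x1, y1, x2, y2],
    matching the '[ 2. nan nan nan nan nan]' rows in the log above.
    """
    return [p for p in preds if all(math.isfinite(v) for v in p[1:])]

preds = [
    [2.0] + [float("nan")] * 5,           # overflowed FP16 prediction
    [0.0, 0.91, 10.0, 20.0, 110.0, 220.0],  # normal prediction
]
kept = drawable_predictions(preds)
print(len(kept))  # 1
```

If every row is filtered out, the engine (not the drawing code) is the problem, which matches what the FP32/FP16 comparison below shows.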

Any change compared with the previous log?

The warning above appears when converting the engine on the DeepStream platform (opset <= 17). With opset 18, DeepStream reports an error instead:

(Same TopK parsing error log as posted above.)

Please share the full log when you use trtexec to generate the FP16 engine inside the nvcr.io/nvidia/tao/tao-toolkit:6.25.9-deploy docker. Thanks.

Okay, this is the complete log:

2025-10-14 15:13:03,966 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2025-10-14 15:13:04,080 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:6.25.9-deploy
2025-10-14 15:13:04,160 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 308: Printing tty value True
sys:1: UserWarning: 
'gen_trt_engine.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.12/dist-packages/nvidia_tao_deploy/cv/common/hydra/hydra_runner.py:99: UserWarning: 
'gen_trt_engine.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
  _run_hydra(
/usr/local/lib/python3.12/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Gen_trt_engine results will be saved at: /results/run/person-act/act4/gen_trt_engine/gen_trt_engine
Log file already exists at /results/run/person-act/act4/gen_trt_engine/gen_trt_engine/status.json
Starting rtdetr gen_trt_engine.
[10/14/2025-07:13:10] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +2755, GPU +446, now: CPU 3011, GPU 1224 (MiB)
Setting up QAT mode: False
[10/14/2025-07:13:10] [TRT] [I] Successfully created plugin: MultiscaleDeformableAttnPlugin_TRT
Parsing ONNX model
List inputs:
Input 0 -> inputs.
(3, 640, 640).
-1.
Network Description
Input 'inputs' with shape (-1, 3, 640, 640) and dtype DataType.FLOAT
Output 'pred_logits' with shape (-1, 100, 4) and dtype DataType.FLOAT
Output 'pred_boxes' with shape (-1, 100, 4) and dtype DataType.FLOAT
TensorRT engine build configurations:
  OptimizationProfile: 
    "inputs": (1, 3, 640, 640), (8, 3, 640, 640), (8, 3, 640, 640)
 
  BuilderFlag.FP16
  BuilderFlag.TF32
 
  Note: max representable value is 2,147,483,648 bytes or 2GB.
  MemoryPoolType.WORKSPACE = 2147483648 bytes
  MemoryPoolType.DLA_MANAGED_SRAM = 0 bytes
  MemoryPoolType.DLA_LOCAL_DRAM = 1073741824 bytes
  MemoryPoolType.DLA_GLOBAL_DRAM = 536870912 bytes
  MemoryPoolType.TACTIC_DRAM = 25393692672 bytes
  MemoryPoolType.TACTIC_SHARED_MEMORY = 1073741824 bytes
 
  Tactic Sources = 24
[10/14/2025-07:16:29] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 38 MiB, GPU 813 MiB
[...]quiring 439971328 bytes.
[...] FP32 precision, or exporting the model to use INormalizationLayer (available with ONNX opset >= 17) can help preserving accuracy.
Engine build finished successfully.
Gen_trt_engine finished successfully.
2025-10-14 07:16:31,277 - nvidia_tao_deploy.cv.common.entrypoint.entrypoint_hydra - WARNING - Telemetry data couldn't be sent, but the command ran successfully.
2025-10-14 07:16:31,278 - nvidia_tao_deploy.cv.common.entrypoint.entrypoint_hydra - WARNING - 'str' object has no attribute 'decode'
2025-10-14 07:16:31,278 - nvidia_tao_deploy.cv.common.entrypoint.entrypoint_hydra - INFO - Execution status: PASS
2025-10-14 15:16:32,074 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 371: Stopping container.

Then, still inside the tao-deploy docker, please try to run inference with this FP16 engine.
Please refer to the command and config in RT-DETR with TAO Deploy — TAO Toolkit.

Please note that when you run commands inside the tao-deploy docker, you do not need to prefix them with tao deploy, i.e., $ rtdetr inference xxx

I created a new container from the tao-toolkit:6.25.9-deploy image and ran inference with the FP16 engine. The log is as follows:

root@293ce86a9252:/usr/local/lib/python3.12/dist-packages/nvidia_tao_deploy/cv/rtdetr/scripts# python3 inference.py     --config-path /workspace/tao-experiments/rtdetr/specs     --config-name infer
sys:1: UserWarning: 
'infer' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.12/dist-packages/nvidia_tao_deploy/cv/common/hydra/hydra_runner.py:99: UserWarning: 
'infer' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
  _run_hydra(
/usr/local/lib/python3.12/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Trt_inference results will be saved at: /workspace/tao-experiments/rtdetr/run/person-act/act4/infer
Log file already exists at /workspace/tao-experiments/rtdetr/run/person-act/act4/infer/status.json
Starting rtdetr trt_inference.
Producing predictions:   0%|                                                                                                                                      | 0/271 [00:00<?, ?it/s]
Error drawing bbox for prediction [ 2. nan nan nan nan nan]
Error drawing bbox for prediction [ 0. nan nan nan nan nan]
Error drawing bbox for prediction [ 1. nan nan nan nan nan]
Error drawing bbox for prediction [ 2. nan nan nan nan nan]
Error drawing bbox for prediction [ 0. nan nan nan nan nan]
Error drawing bbox for prediction [ 1. nan nan nan nan nan]
Error drawing bbox for prediction [ 2. nan nan nan nan nan]

In the same way, inside the tao-deploy docker, please try to run inference with the FP32 engine. The result is as expected, right?

Yes, this is normal:

Trt_inference results will be saved at: /workspace/tao-experiments/rtdetr/run/person-act/act4/infer
Log file already exists at /workspace/tao-experiments/rtdetr/run/person-act/act4/infer/status.json
Starting rtdetr trt_inference.
Producing predictions:   4%|████▌                                                                                                                        | 10/271 [00:13<05:44,  1.32s/it]

OK.

For FP16, please generate a new ONNX with opset 17 and retest.
In nvcr.io/nvidia/tao/tao-toolkit:6.25.9-deploy, the TensorRT version is 10.8.0.43. Refer to Release Notes — NVIDIA TensorRT Documentation.

I suggest testing with opset 17 as well.

Same result:

Trt_inference results will be saved at: /workspace/tao-experiments/rtdetr/run/person-act/act4/infer
Log file already exists at /workspace/tao-experiments/rtdetr/run/person-act/act4/infer/status.json
Starting rtdetr trt_inference.
Producing predictions:   0%|                                            | 0/271 [00:00<?, ?it/s]
Error drawing bbox for prediction [ 2. nan nan nan nan nan]
Error drawing bbox for prediction [ 0. nan nan nan nan nan]
Error drawing bbox for prediction [ 1. nan nan nan nan nan]
Error drawing bbox for prediction [ 2. nan nan nan nan nan]

Hello, I have worked around the issue for now. I found the ResNet-50 backbone (which you had posted) in other posts on the forum and trained RT-DETR at 544 x 960. I exported at FP16 precision in the container, and the test produced bounding boxes. Bounding boxes also appear on DeepStream, and the earlier warnings are gone.

Thanks for the info. Glad to know it is working now.
According to The issue of width and height when exporting onnx from rtdetr - #16 by 2295098451, you were using the convnextv2_nano backbone.

May I know your latest spec yaml file? Thanks.

Sure, this is a YAML file. Since I can only upload files in .txt format, I changed the extension.

train-act.txt (3.3 KB)

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.