Hello, I have encountered a problem.
Training platform:
• Hardware: RTX 4090, Ubuntu 22.04, NVIDIA driver 535.183.01, TAO Toolkit 6.25.9
• Network Type: RT-DETR
Deployment platform:
• NVIDIA Jetson Orin Nano (8 GB RAM)
When I exported the model to ONNX and converted it to a TensorRT engine in DeepStream, inference worked fine at FP32 precision, but at FP16 precision no detection boxes appeared. The following warnings were printed during conversion:
WARNING: [TRT]: Detected layernorm nodes in FP16: /model/stages.1/stages.1.1/norm/ReduceMean_1, /model/stages.2/stages.2.6/norm/ReduceMean_1, /model/stages.1/stages.1.0/norm/ReduceMean_1, /model/stages.2/stages.2.0/norm/ReduceMean_1, /model/stages.2/stages.2.3/norm/ReduceMean_1, /model/downsample_layers.3/downsample_layers.3.0/ReduceMean_1, /model/downsample_layers.2/downsample_layers.2.0/ReduceMean_1, /model/stages.1/stages.1.0/norm/Sqrt, /model/stages.2/stages.2.7/norm/ReduceMean_1, /model/stages.3/stages.3.0/norm/ReduceMean_1, /model/downsample_layers.0/downsample_layers.0.1/Sqrt, /model/stages.0/stages.0.1/norm/Sqrt, /model/stages.0/stages.0.0/norm/Sqrt, /model/stages.3/stages.3.1/norm/ReduceMean_1, /model/stages.2/stages.2.1/norm/ReduceMean_1, /model/stages.2/stages.2.4/norm/ReduceMean_1, /model/downsample_layers.1/downsample_layers.1.0/Sqrt, /model/downsample_layers.0/downsample_layers.0.1/ReduceMean_1, /model/stages.0/stages.0.0/norm/ReduceMean_1, /model/downsample_layers.0/downsample_layers.0.1/Sub, /model/downsample_layers.0/downsample_layers.0.1/Pow, /model/downsample_layers.0/downsample_layers.0.1/Add, /model/downsample_layers.0/downsample_layers.0.1/Div, /model/downsample_layers.0/downsample_layers.0.1/Mul, /model/downsample_layers.0/downsample_layers.0.1/Add_1, /model/stages.0/stages.0.0/norm/Sub, /model/stages.0/stages.0.0/norm/Pow, /model/stages.0/stages.0.0/norm/Add, /model/stages.0/stages.0.0/norm/Div, /model/stages.0/stages.0.0/norm/Mul, /model/stages.0/stages.0.0/norm/Add_1, /model/stages.0/stages.0.1/norm/Sub, /model/stages.0/stages.0.1/norm/Pow, /model/stages.0/stages.0.1/norm/Add, /model/stages.0/stages.0.1/norm/Div, /model/stages.0/stages.0.1/norm/Mul, /model/stages.0/stages.0.1/norm/Add_1, /model/downsample_layers.1/downsample_layers.1.0/Sub, /model/downsample_layers.1/downsample_layers.1.0/Pow, /model/downsample_layers.1/downsample_layers.1.0/Add, /model/downsample_layers.1/downsample_layers.1.0/Div, 
/model/downsample_layers.1/downsample_layers.1.0/Mul, /model/downsample_layers.1/downsample_layers.1.0/Add_1, /model/stages.1/stages.1.0/norm/Sub, /model/stages.1/stages.1.0/norm/Pow, /model/stages.1/stages.1.0/norm/Add, /model/stages.1/stages.1.0/norm/Div, /model/stages.1/stages.1.0/norm/Mul, /model/stages.1/stages.1.0/norm/Add_1, /model/stages.1/stages.1.1/norm/Sub, /model/stages.1/stages.1.1/norm/Pow, /model/stages.1/stages.1.1/norm/Add, /model/stages.1/stages.1.1/norm/Sqrt, /model/stages.1/stages.1.1/norm/Div, /model/stages.1/stages.1.1/norm/Mul, /model/stages.1/stages.1.1/norm/Add_1, /model/downsample_layers.2/downsample_layers.2.0/Sub, /model/downsample_layers.2/downsample_layers.2.0/Pow, /model/downsample_layers.2/downsample_layers.2.0/Add, /model/downsample_layers.2/downsample_layers.2.0/Sqrt, /model/downsample_layers.2/downsample_layers.2.0/Div, /model/downsample_layers.2/downsample_layers.2.0/Mul, /model/downsample_layers.2/downsample_layers.2.0/Add_1, /model/stages.2/stages.2.0/norm/Sub, /model/stages.2/stages.2.0/norm/Pow, /model/stages.2/stages.2.0/norm/Add, /model/stages.2/stages.2.0/norm/Sqrt, /model/stages.2/stages.2.0/norm/Div, /model/stages.2/stages.2.0/norm/Mul, /model/stages.2/stages.2.0/norm/Add_1, /model/stages.2/stages.2.1/norm/Sub, /model/stages.2/stages.2.1/norm/Pow, /model/stages.2/stages.2.1/norm/Add, /model/stages.2/stages.2.1/norm/Sqrt, /model/stages.2/stages.2.1/norm/Div, /model/stages.2/stages.2.1/norm/Mul, /model/stages.2/stages.2.1/norm/Add_1, /model/stages.2/stages.2.2/norm/Sub, /model/stages.2/stages.2.2/norm/Pow, /model/stages.2/stages.2.2/norm/Add, /model/stages.2/stages.2.2/norm/Sqrt, /model/stages.2/stages.2.2/norm/Div, /model/stages.2/stages.2.2/norm/Mul, /model/stages.2/stages.2.2/norm/Add_1, /model/stages.2/stages.2.3/norm/Sub, /model/stages.2/stages.2.3/norm/Pow, /model/stages.2/stages.2.3/norm/Add, /model/stages.2/stages.2.3/norm/Sqrt, /model/stages.2/stages.2.3/norm/Div, /model/stages.2/stages.2.3/norm/Mul, 
/model/stages.2/stages.2.3/norm/Add_1, /model/stages.2/stages.2.4/norm/Sub, /model/stages.2/stages.2.4/norm/Pow, /model/stages.2/stages.2.4/norm/Add, /model/s
WARNING: [TRT]: Running layernorm after self-attention in FP16 may cause overflow. Exporting the model to the latest available ONNX opset (later than opset 17) to use the INormalizationLayer, or forcing layernorm layers to run in FP32 precision can help with preserving accuracy.
Following the warning's suggestion, I re-exported with opset 18, but the engine conversion then failed with this error:
ERROR: [TRT]: ModelImporter.cpp:768: While parsing node number 772 [TopK -> "/model/decoder/TopK_output_0"]:
ERROR: [TRT]: ModelImporter.cpp:769: --- Begin node ---
ERROR: [TRT]: ModelImporter.cpp:770: input: "/model/decoder/ReduceMax_output_0"
input: "/model/decoder/Reshape_9_output_0"
output: "/model/decoder/TopK_output_0"
output: "/model/decoder/TopK_output_1"
name: "/model/decoder/TopK"
op_type: "TopK"
attribute {
name: "axis"
i: 1
type: INT
}
attribute {
name: "largest"
i: 1
type: INT
}
attribute {
name: "sorted"
i: 1
type: INT
}
ERROR: [TRT]: ModelImporter.cpp:771: --- End node ---
ERROR: [TRT]: ModelImporter.cpp:773: ERROR: onnx2trt_utils.cpp:342 In function convertAxis:
[8] Assertion failed: (axis >= 0 && axis <= nbDims) && "Axis must be in the range [0, nbDims]."
ERROR: Failed to parse onnx file
ERROR: failed to build network since parsing model errors.
ERROR: failed to build network.
0:00:13.074210041 3167867 0xaaaaf299f130 ERROR nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2129> [UID = 1]: build engine file failed
0:00:13.473886176 3167867 0xaaaaf299f130 ERROR nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2215> [UID = 1]: build backend context failed
0:00:13.473961026 3167867 0xaaaaf299f130 ERROR nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1352> [UID = 1]: generate backend failed, check config file settings
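As an alternative to raising the opset, I am considering the warning's other suggestion: pinning the decomposed layernorm nodes to FP32 while keeping the rest of the network in FP16, via the TensorRT Python API. Below is a rough, untested sketch — the name filter is my own guess based on the node names in the warning output, and I am not sure the TensorRT layer names after ONNX parsing match the ONNX node names exactly:

```python
# Sketch (untested): select the decomposed LayerNorm nodes named in the TRT
# warning so they can be forced to FP32. The name patterns are my own guess
# from the warning output (".../norm/..." and "downsample_layers" scopes).

LN_OPS = ("ReduceMean", "Sub", "Pow", "Add", "Sqrt", "Div", "Mul")

def is_layernorm_node(name: str) -> bool:
    """True if a node name looks like part of a decomposed LayerNorm."""
    last = name.rsplit("/", 1)[-1]      # e.g. "ReduceMean_1" or "Sqrt"
    op = last.split("_", 1)[0]          # strip trailing "_1"-style suffixes
    return ("/norm/" in name or "downsample_layers" in name) and op in LN_OPS

# With a parsed network (import tensorrt as trt), the idea would be roughly:
#   for i in range(network.num_layers):
#       layer = network.get_layer(i)
#       if is_layernorm_node(layer.name):
#           layer.precision = trt.float32
#           layer.set_output_type(0, trt.float32)
#   config.set_flag(trt.BuilderFlag.FP16)
#   config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
```

Is something like this the recommended way to keep FP16 for the rest of the model, or does DeepStream's nvinfer expose a supported config option for per-layer precision that I should use instead?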


