I have an ONNX model, converted to a TensorRT engine, that works fine on a Jetson Xavier NX and a Jetson TX2 NX device. When I try to run the same model on a Jetson Orin Nano 8GB device, it emits the following malloc error:
WARNING: [TRT]: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
0:00:28.188148599 116580 0xaaaaf09f4b30 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<person-detector-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2092> [UID = 1]: deserialized trt engine from :../models/person_detection/person_detection_orin_fp16.engine
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: [Implicit Engine Info]: layers num: 3
0 INPUT kFLOAT input_1:0 3x544x960
1 OUTPUT kFLOAT output_cov/Sigmoid:0 1x34x60
2 OUTPUT kFLOAT output_bbox/BiasAdd:0 4x34x60
ERROR: [TRT]: 3: Cannot find binding of given name: conv2d_bbox
0:00:28.601311011 116580 0xaaaaf09f4b30 WARN nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<person-detector-engine> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:2059> [UID = 1]: Could not find output layer 'conv2d_bbox' in engine
ERROR: [TRT]: 3: Cannot find binding of given name: conv2d_cov/Sigmoid.
0:00:28.601361028 116580 0xaaaaf09f4b30 WARN nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<person-detector-engine> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:2059> [UID = 1]: Could not find output layer 'conv2d_cov/Sigmoid.' in engine
0:00:28.601379557 116580 0xaaaaf09f4b30 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<person-detector-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2195> [UID = 1]: Use deserialized engine model: ../models/person_detection/person_detection_orin_fp16.engine
malloc_consolidate(): unaligned fastbin chunk detected
I generated the engine plan using the docker image nvcr.io/nvidia/l4t-tensorrt:r8.6.2-runtime.
I flashed the Jetson Orin Nano device using jetson_linux_r36.2.0_aarch64.tbz2 and tegra_linux_sample-root-filesystem_r36.2.0_aarch64.tbz2.
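For context, the FP16 engine plan is generated from the ONNX model inside that container. The following is only a minimal sketch of the equivalent build using the TensorRT 8.6 Python API (the build_fp16_engine function and the paths are illustrative, and it assumes a plain, unencrypted ONNX export rather than the TAO-encoded file referenced in the config below):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_fp16_engine(onnx_path, engine_path):
    # Explicit-batch network definition, consistent with the kEXPLICIT_BATCH
    # warning in the log above.
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse the ONNX model")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  # matches network-mode=2 (FP16) below

    plan = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(plan)

build_fp16_engine("person_detection.onnx", "person_detection_orin_fp16.engine")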
This is the Python code that builds the inference detector:
# PROCESSOR_TYPE resolves to "orin" on this device, so the engine file passed
# below is the person_detection_orin_fp16.engine named in the log above.
person_detector = make_elm_or_print_err("nvinfer", "person-detector-engine", "Person Detector")
person_detector.set_property("config-file-path", "../model_configs/person_detection/person_detection.txt")
person_detector.set_property("unique-id", inference_common.PERSON_DETECTOR_UID)
person_detector.set_property("model-engine-file", f"../models/person_detection/person_detection_{PROCESSOR_TYPE}_fp16.engine")
These are the contents of the config file that it loads:
[property]
# The following is generated by the Transfer Learning Toolkit (TAO) and should be replaced after updating the model
net-scale-factor=0.00392156862745098
offsets=0.0;0.0;0.0
infer-dims=3;544;960
tlt-model-key=********
network-type=0
num-detected-classes=1
model-color-format=0
maintain-aspect-ratio=0
output-tensor-meta=0
# Device ID of GPU to use for pre-processing/inference (dGPU only)
gpu-id=0
# Pixel normalization factor (ignored if input-tensor-meta enabled)
# It is unclear exactly how this should be set, but the TAO-generated value above (1/255) works.
#net-scale-factor=<this is set by TAO - see above>
# Pathname of the serialized model engine file
# In our application, this is set in code to get the proper processor type
# model-engine-file=../../models/person_detection/person_detection_xaviernx_fp16.engine
# Pathname of a text file containing the labels for the model
labelfile-path=../../models/person_detection/labels.txt
# Pathname of the TAO toolkit encoded model.
tlt-encoded-model=../../models/person_detection/person_detection.onnx
# Key for the TAO toolkit encoded model.
#tlt-model-key=<this is configured by TAO - see above>
# Pathname of the INT8 calibration file for dynamic range adjustment with an FP32 model.
#int8-calib-file=../../../../samples/models/Primary_Detector/cal_trt.bin
# When a network supports both implicit batch dimension and full dimension, force the implicit batch dimension mode.
force-implicit-batch-dim=1
# Number of frames or objects to be inferred together in a batch.
batch-size=1
# Data format to be used by inference. Integer 0: FP32 1: INT8 2: FP16.
network-mode=2
# Number of classes detected by the network
#num-detected-classes=<this is configured by TAO - see above>
# Specifies the number of consecutive batches to be skipped for inference.
interval=0
# Unique ID to be assigned to the GIE to enable the application and other elements to identify detected bounding boxes and labels.
gie-unique-id=1
# Filter out detected objects belonging to specified class-ids. Semicolon delimited integer array.
# 1;2 are bags and heads
#filter-out-class-ids=1;2
# Array of output layer names. Semicolon delimited string array.
output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid.
#scaling-filter=0
#scaling-compute-hw=0
# Clustering algorithm to use. Refer to the next table for configuring the algorithm specific parameters.
# Integer 0: OpenCV groupRectangles() 1: DBSCAN 2: Non Maximum Suppression 3: DBSCAN + NMS Hybrid 4: No clustering.
cluster-mode=2
# Detection threshold
#threshold=0.01
[class-attrs-all]
# Detection threshold to be applied prior to clustering operation
pre-cluster-threshold=0.01
# Keep only the top K objects with the highest detection scores. Per the DeepStream 6.2 docs: specify the top k detection results to keep after NMS, where 0 means keep all.
topk=20
# Maximum IOU score between two proposals after which the proposal with the lower confidence will be rejected.
nms-iou-threshold=0.2
All of the above works fine on the Jetson Xavier NX and the Jetson TX2 NX. To sanity-check the binding names that the errors above complain about, I have included a small engine-inspection sketch below.
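To double-check the "Cannot find binding of given name" errors, here is a small sketch that deserializes the Orin engine with the TensorRT Python API and prints the I/O tensor names it actually exposes, so they can be compared against output-blob-names in the config:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
engine_path = "../models/person_detection/person_detection_orin_fp16.engine"

runtime = trt.Runtime(TRT_LOGGER)
with open(engine_path, "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# Print every I/O tensor the engine exposes so the names can be compared with
# the output-blob-names entry in the nvinfer config.
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    mode = engine.get_tensor_mode(name)  # TensorIOMode.INPUT or TensorIOMode.OUTPUT
    print(mode, name, engine.get_tensor_shape(name))

Based on the [Implicit Engine Info] lines in the log, I would expect this to print input_1:0, output_cov/Sigmoid:0 and output_bbox/BiasAdd:0, which do not match the conv2d_bbox and conv2d_cov/Sigmoid names requested in output-blob-names.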