Loss of precision when converting ONNX to engine with DeepStream 6.3

When converting the YOLOv8 ONNX model to a TensorRT engine with DeepStream, there is a loss of accuracy, especially for larger objects. However, when the model is converted from ONNX to engine outside of DeepStream, accuracy is not affected. The same problem happens with a YOLOv4 model trained in TAO.

Warning logs during the conversion:

Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in the CUDA C Programming Guide.
WARNING: [TRT]: onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: Tensor DataType is determined at build time for tensors not marked as input or output.

Building the TensorRT Engine

ERROR: [TRT]: 2: [virtualMemoryBuffer.cpp::resizePhysical::160] Error Code 2: OutOfMemory (no further information)
ERROR: [TRT]: 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
WARNING: [TRT]: Requested amount of GPU memory (17179869184 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
WARNING: [TRT]: Skipping tactic 3 due to insufficient memory on requested size of 17179869184 detected for tactic 0x0000000000000004.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
ERROR: [TRT]: 2: [virtualMemoryBuffer.cpp::resizePhysical::160] Error Code 2: OutOfMemory (no further information)
ERROR: [TRT]: 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
WARNING: [TRT]: Requested amount of GPU memory (17179869184 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
WARNING: [TRT]: Skipping tactic 8 due to insufficient memory on requested size of 17179869184 detected for tactic 0x000000000000003c.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
(the OutOfMemory errors and skipped-tactic warnings above repeat)
Building complete
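
The OutOfMemory warnings above point at IBuilderConfig::setMemoryPoolLimit(). For a standalone ONNX-to-engine build outside DeepStream, that maps to the TensorRT Python API roughly as follows (a minimal sketch, assuming TensorRT 8.5; the file names and the 2 GiB cap are illustrative, not taken from this thread):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# ONNX parsing requires an explicit-batch network
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("yolo_best.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
# Cap the workspace so oversized tactics (like the 16 GiB requests above) are skipped cleanly
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2 << 30)  # 2 GiB

plan = builder.build_serialized_network(network, config)
with open("yolo_best.engine", "wb") as f:
    f.write(plan)

Within DeepStream, the same limit is exposed by the workspace-size key (in MiB) that is commented out in the configuration below.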

YOLOv8 model inference configuration file:

[property]
gpu-id=0

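# net-scale-factor below is 1/255 in float32 precision, scaling 8-bit pixel values to [0,1]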
net-scale-factor=0.0039215697906911373
num-detected-classes=1
model-color-format=0
infer-dims=3;640;640
process-mode=2

onnx-file=/app/models/yolov8/yolov8m_best.onnx
model-engine-file=/app/models/yolov8/yolo_best.onnx_b1_gpu0_fp32.engine
labelfile-path=/app/models/yolov8/labels.txt
#int8-calib-file=calib.table

batch-size=1
network-mode=0
num-detected-classes=15
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1

#workspace-size=2000
engine-create-func-name=NvDsInferYoloCudaEngineGet
output-blob-names=BatcheYolo
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=/app/config/yolov8/libnvdsinfer_custom_impl_Yolo.so

[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300
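
For context, cluster-mode=2 in the configuration selects DeepStream's built-in NMS clustering, which the three [class-attrs-all] keys drive. A rough Python sketch of that behavior (illustrative only, not DeepStream's actual implementation):

def iou(a, b):
    # Boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(dets, score_thr=0.25, iou_thr=0.45, topk=300):
    # pre-cluster-threshold: drop low-confidence boxes before clustering
    dets = [d for d in dets if d["score"] >= score_thr]
    dets.sort(key=lambda d: d["score"], reverse=True)
    kept = []
    for d in dets:
        # nms-iou-threshold: suppress boxes that overlap a higher-scoring box
        if all(iou(d["box"], k["box"]) < iou_thr for k in kept):
            kept.append(d)
        if len(kept) == topk:  # topk: cap detections per frame
            break
    return kept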

Hardware Platform (Jetson / GPU): GPU
DeepStream Version: 6.3
TensorRT Version: 8.5.1
NVIDIA GPU Driver Version (valid for GPU only): 510.73.08
Issue Type: Bug

Could you attach the ONNX model and the video source you are using? Please also attach a comparison of the results and the commands you used to convert the model outside of DeepStream. Thanks

The problem is the loss of accuracy in the detected bounding boxes themselves. Below is an example.
Example of detections from the model converted to an engine outside of DeepStream:

Example of detections from the model converted to an engine by DeepStream:

Example video:
204.2x54.7.mp4.zip (6.9 MB)

Model Onnx:
yolo_best.onnx.zip (80.7 MB)

Command used to convert the model to .engine outside of DeepStream (provided by the Ultralytics YOLOv8 tooling itself):
!yolo export model=yolo_best.pt format=engine imgsz=640 device=0
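
Equivalently, the export can be run from Python via the Ultralytics API (a minimal sketch, assuming the ultralytics package is installed and the same yolo_best.pt as above):

from ultralytics import YOLO

model = YOLO("yolo_best.pt")
# Writes yolo_best.engine next to the .pt file (FP32 by default, static 640x640 input)
model.export(format="engine", imgsz=640, device=0)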

So are the two demo runs exactly the same, with only the way the engine is generated being different?

Could you attach the command used to run the demo and the source code of libnvdsinfer_custom_impl_Yolo.so?

Is there a detailed reference for how to use this command?

I want to try that on my side so we can analyze it faster.

I carried out some more tests and I believe the problem is with the conversion to .engine performed inside DeepStream, as the same problem happens whether I use a YOLOv8 .onnx or a YOLOv4 .etlt. The YOLOv8 model was trained with the Ultralytics package and generated as a .pt, then converted to ONNX. The YOLOv4 model was trained with NVIDIA's TAO and exported to .etlt within TAO.

When I use the .pt model, the generated bounding boxes are in the correct positions, and the same happens when I run the .etlt model within TAO: the boxes are also correct. However, when either model is given to DeepStream to generate the .engine, the detections show displaced bounding boxes, i.e. a loss of quality.

Below are attachments for reproducing the issue:

Could you please post the parameters you used in the two scenarios above?
You can also refer to our FAQ for tuning some parameters.

In this case, do you want the parameters used when YOLOv4 and YOLOv8 inference was done outside of DeepStream?

Yes. Please also briefly describe the inference steps outside of DeepStream.

For inference with the YOLOv8 .pt model, the Ultralytics package was used.
The command was:
!yolo task=detect mode=predict model=best.pt source=pintura2_204.2x54.7.mp4 imgsz=640 name=yolov8n_v8_50 hide_labels=False
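
The same prediction can also be run from Python (a sketch, assuming the file names from the command above):

from ultralytics import YOLO

model = YOLO("best.pt")
# Saves the annotated video under runs/detect/yolov8n_v8_50
model.predict(source="pintura2_204.2x54.7.mp4", imgsz=640, save=True, name="yolov8n_v8_50")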

To predict the outputs in TAO, the following command was used:

yolo_v4 inference -i /images/ -o /detections/ -e /train_yolov4.txt -m /yolov4_resnet18_epoch_060.tlt -k MnIwOHMzbml2cjB2ZTUzZmhzYnUyNHUwbGg6MDM5OGIwM2UtNzE4ZC00ZWM1LWFmNjgtZjMyNjkxNDJhZmJh

The TAO.zip file, obtained from the drive link below, contains the input images folder, the train_yolov4.txt configuration file, the yolov4_resnet18_epoch_060.tlt model, the output folder with the generated detections, and an etlt_model folder with the files generated after exporting the .tlt to .etlt.

OK. Could you file a topic on the TAO forum to learn how to use tao-deploy to do the inference?

tao-deploy can run inference directly on the engine file, which can further narrow down the scope. Thanks