Loss of precision when converting ONNX to an engine with DeepStream 6.3

When converting the YOLOv8 ONNX model to a DeepStream engine, there is a loss of accuracy, especially for larger objects. However, when the model is converted from ONNX to an engine outside of DeepStream, accuracy is not affected. The same problem happens with a YOLOv4 model trained in TAO.

Warning logs produced during the conversion:

Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in 1. Introduction — CUDA C Programming Guide
WARNING: [TRT]: onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: Tensor DataType is determined at build time for tensors not marked as input or output.

Building the TensorRT Engine

ERROR: [TRT]: 2: [virtualMemoryBuffer.cpp::resizePhysical::160] Error Code 2: OutOfMemory (no further information)
ERROR: [TRT]: 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
WARNING: [TRT]: Requested amount of GPU memory (17179869184 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
WARNING: [TRT]: Skipping tactic 3 due to insufficient memory on requested size of 17179869184 detected for tactic 0x0000000000000004.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
ERROR: [TRT]: 2: [virtualMemoryBuffer.cpp::resizePhysical::160] Error Code 2: OutOfMemory (no further information)
ERROR: [TRT]: 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
WARNING: [TRT]: Requested amount of GPU memory (17179869184 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
WARNING: [TRT]: Skipping tactic 8 due to insufficient memory on requested size of 17179869184 detected for tactic 0x000000000000003c.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
ERROR: [TRT]: 2: [virtualMemoryBuffer.cpp::resizePhysical::160] Error Code 2: OutOfMemory (no further information)
ERROR: [TRT]: 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
WARNING: [TRT]: Requested amount of GPU memory (17179869184 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
WARNING: [TRT]: Skipping tactic 3 due to insufficient memory on requested size of 17179869184 detected for tactic 0x0000000000000004.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
ERROR: [TRT]: 2: [virtualMemoryBuffer.cpp::resizePhysical::160] Error Code 2: OutOfMemory (no further information)
ERROR: [TRT]: 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
WARNING: [TRT]: Requested amount of GPU memory (17179869184 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
WARNING: [TRT]: Skipping tactic 8 due to insufficient memory on requested size of 17179869184 detected for tactic 0x000000000000003c.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
Building complete
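
(The OutOfMemory errors and "Skipping tactic" warnings above are the builder skipping individual tactics that requested 17179869184 bytes of workspace, i.e. 16 GiB; the build itself still completes, as the "Building complete" line shows. A quick sanity check of that figure, plain arithmetic only:

python3 -c "print(17179869184 / 2**30)"   # prints 16.0, i.e. 16 GiB

If you want to silence these warnings, the workspace-size key that is commented out in the config below should be the knob that TensorRT's "decrease the workspace size" hint maps to in nvinfer.)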

YOLOv8 model inference configuration file:

[property]
gpu-id=0

net-scale-factor=0.0039215697906911373
num-detected-classes=1
model-color-format=0
infer-dims=3;640;640
process-mode=2

onnx-file=/app/models/yolov8/yolov8m_best.onnx
model-engine-file=/app/models/yolov8/yolo_best.onnx_b1_gpu0_fp32.engine
labelfile-path=/app/models/yolov8/labels.txt
#int8-calib-file=calib.table

batch-size=1
network-mode=0
num-detected-classes=15
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1

#workspace-size=2000
engine-create-func-name=NvDsInferYoloCudaEngineGet
output-blob-names=BatcheYolo
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=/app/config/yolov8/libnvdsinfer_custom_impl_Yolo.so

[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300
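
(Side note on the config above: net-scale-factor is just 1/255 stored at float32 precision, which matches the 0-1 normalization Ultralytics applies, so the scaling itself should not be the source of the accuracy gap. A quick check of the literal, plain Python arithmetic:

python3 -c "import struct; print(struct.unpack('f', struct.pack('f', 1/255))[0])"
# 0.0039215697906911373
)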

Hardware Platform (Jetson / GPU)? GPU
DeepStream Version? 6.3
TensorRT Version? 8.5.1
NVIDIA GPU Driver Version (valid for GPU only)? 510.73.08
Issue Type? Bug

Could you attach the ONNX model and the video source you are using? It would also help to attach a comparison of the results and the commands you used to convert the model outside of DeepStream. Thanks

The problem is the loss of accuracy in the detected bounding boxes themselves. Below is an example:
Example of detections from the model converted to a TensorRT engine outside DeepStream:

Example of detections from the model converted to a TensorRT engine by DeepStream:

Video of example:
204.2x54.7.mp4.zip (6.9 MB)

ONNX model:
yolo_best.onnx.zip (80.7 MB)

Command used to convert the model to .engine outside of DeepStream (provided by the Ultralytics YOLOv8 tooling itself):
!yolo export model=yolo_best.pt format=engine imgsz=640 device=0

So is the way you run the demo exactly the same in both cases, with only the way of generating the engine being different?

Could you attach the command to run the demo and the source code of the libnvdsinfer_custom_impl_Yolo.so?

Is there a detailed reference for how to use this command?

I want to try that on my side so we can analyze it faster.

I carried out some more tests and I believe the problem is with the conversion to .engine carried out within DeepStream, as the same problem happens whether I use a YOLOv8 .onnx or a YOLOv4 .etlt. The YOLOv8 model was trained with the Ultralytics package, generated as .pt, and then converted to ONNX. The YOLOv4 model was trained using NVIDIA's TAO and exported to .etlt within TAO.

When I use the .pt model, the generated bounding boxes are in the correct positions, and the same is true when I use the .etlt model within TAO. However, when either model is given to DeepStream to generate the .engine, the detections have displaced bounding boxes and a clear loss of quality.

Below are attachments for attempted reproduction:

Could you please post the parameters you used in the above 2 scenarios?
You can also refer to our FAQ to tune some parameters.

In this case, do you want the parameters used when inference with YOLOv4 and YOLOv8 was done outside of DeepStream?

Yes. It would also help if you briefly described the inference steps outside of DeepStream.

For inference with YOLOv8 using the .pt model, the Ultralytics package was used.
The command was:
!yolo task=detect mode=predict model=best.pt source=pintura2_204.2x54.7.mp4 imgsz=640 name=yolov8n_v8_50 hide_labels=False

To predict the outputs in TAO, the following command was used:

yolo_v4 inference -i /images/ -o /detections/ -e /train_yolov4.txt -m /yolov4_resnet18_epoch_060.tlt -k MnIwOHMzbml2cjB2ZTUzZmhzYnUyNHUwbGg6MDM5OGIwM2UtNzE4ZC00ZWM1LWFmNjgtZjMyNjkxNDJhZmJh

The TAO.zip file obtained from the drive link below contains the input images folder, the train_yolov4.txt configuration file, the yolov4_resnet18_epoch_060.tlt model, the output folder with the detections generated, and a folder etlt_model with the files generated after exporting the .tlt to .etlt.

OK. Could you file a topic on the TAO forum to learn how to use tao-deploy to do the inference?

It can run inference directly on the engine file, which can further narrow down the scope. Thanks

I’m doing the tests together with FranJose.

The first step was to train a yolov4 on TAO 4.0.4, the docker image used was nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5. The GPU is a Tesla V100-PCIE-16GB.

Training command:

yolo_v4 train -e train_yolov4.txt -r results/ -k my_key

Then I exported the .tlt to .etlt:

yolo_v4 export -m my_model.tlt -o my_model.etlt -e train_yolov4.txt -k my_key --data_type fp16 --gen_ds_config

After that, I installed tao-deploy and converted the etlt model to engine using the following command:

yolo_v4 gen_trt_engine -m my_model.etlt -e train_yolov4.txt -r results/ --data_type fp16 -k my_key --engine_file my_model.engine

Then, I did the inference using the generated .engine:

yolo_v4 inference -e train_yolov4.txt -m my_model.engine -r output/ -i my_images/

The inference worked and the images had the correct bounding boxes, which shows that the generated .etlt and .engine are correct. However, when I take the .engine generated by tao-deploy to DeepStream, an error occurs when reading the model. So I switched to using the .etlt and it worked: DeepStream generated a new .engine from the .etlt and can run inference with it. However, the inferences come out with low precision and displaced bounding boxes, just like the images FranJose showed at the top of this post.

These tests indicate that the problem occurs when the .engine is generated through DeepStream, or that there is a problem in the DeepStream configuration files.
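
One more way to narrow this down (a hedged suggestion, reusing the tao-deploy inference command above): if the TensorRT versions match, tao-deploy should also be able to run inference on the engine that DeepStream generated, which would show whether the engine itself or DeepStream's pre/post-processing is at fault. The engine path below is only a placeholder for wherever DeepStream wrote its generated engine:

yolo_v4 inference -e train_yolov4.txt -m /path/to/deepstream_generated.engine -r output_ds_engine/ -i my_images/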

After that, I ran other tests to see if I could solve the problem using a YOLOv8 trained with the Ultralytics package. The trained model is in .pt format, and the hardware I am using now for testing is a Jetson Orin NX with JetPack 5.1.2 (L4T 35.4.1).

As a first step, I followed this tutorial and did all the installations, then I used the following commands to generate the .engine and run inference:

Export a YOLOv8n PyTorch model to TensorRT format

yolo export model=yolov8n.pt format=engine # creates 'yolov8n.engine'

Run inference with the exported model

yolo predict model=yolov8n.engine source='/path/image.png'

Again, the images generated by this inference have the correct bounding boxes, exactly the same as those generated using the .pt. The problem is when I use the .engine generated by yolo export in DeepStream. The following error occurs:

gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
gstnvtracker: Batch processing is ON
gstnvtracker: Past frame output is OFF
[NvMultiObjectTracker] Initialized
ERROR: [TRT]: 1: [stdArchiveReader.cpp::StdArchiveReader::32] Error Code 1: Serialization (Serialization assertion magicTagRead == kMAGIC_TAG failed.Magic tag does not match)
ERROR: [TRT]: 4: [runtime.cpp::deserializeCudaEngine::65] Error Code 4: Internal Error (Engine deserialization failed.)
ERROR: Deserialize engine failed from file: /app/models/yolov8/yolo_best_generated_ultralicts.engine
0:00:07.596045417 4474 0x68a7210 WARN nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<plate_detector> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1897> [UID = 1]: deserialize engine from file :/app/models/yolov8/yolo_best_generated_ultralicts.engine failed
0:00:07.883237850 4474 0x68a7210 WARN nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<plate_detector> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: deserialize backend context from engine from file :/app/models/yolov8/yolo_best_generated_ultralicts.engine failed, try rebuild
0:00:07.883418177 4474 0x68a7210 INFO nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<plate_detector> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1923> [UID = 1]: Trying to create engine from model files
File does not exist:
Darknet weights file does not exist
ERROR: Failed to create network using custom network creation function
ERROR: Failed to get cuda engine from custom library API
0:00:11.115029669 4474 0x68a7210 ERROR nvinfer gstnvinfer.cpp:674:gst_nvinfer_logger:<plate_detector> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1943> [UID = 1]: build engine file failed
0:00:11.386976905 4474 0x68a7210 ERROR nvinfer gstnvinfer.cpp:674:gst_nvinfer_logger:<plate_detector> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2029> [UID = 1]: build backend context failed
0:00:11.388740843 4474 0x68a7210 ERROR nvinfer gstnvinfer.cpp:674:gst_nvinfer_logger:<plate_detector> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cp

When I generate the .onnx from the .pt and use it in DeepStream, the .engine is generated; however, the predictions again come out with displaced bounding boxes. My suspicion is that there is an error in the generation of the .engine by DeepStream, or that a configuration file or plugin is misconfigured.
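
One intermediate check that might help isolate the exporter from DeepStream (a sketch, using the file names from the drive folder below and the test video from earlier in the thread): Ultralytics can run an ONNX it exported directly through ONNX Runtime, so if the .onnx already gives correct boxes outside DeepStream, the problem is confined to the engine build or the pre/post-processing inside DeepStream:

yolo predict model=yolo_best_vm.onnx source=pintura2_204.2x54.7.mp4 imgsz=640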

Below is the link to a drive folder with the yolov8 files and models generated in the tests:

Drive folder

Description of files:

  • config_infer_primary_yoloV8.txt: YOLOv8 configuration file for DeepStream;

  • libnvdsinfer_custom_impl_Yolo.so: custom library used by DeepStream (custom-lib-path in the config);

  • yolo_best.pt: .pt model trained using the Ultralytics package;

  • yolo_best_vm.onnx: ONNX generated using the Ultralytics package;

  • yolo_best_generated_ultralicts.engine: .engine generated with the Ultralytics package outside DeepStream;

  • model_b1_gpu0_fp32_from_vm.engine: .engine generated inside DeepStream from yolo_best_vm.onnx.

This should be caused by an inconsistency between the TensorRT version you used to build the engine and the TensorRT version used by DeepStream. Since you didn't provide the label file, I just used labels.txt here.
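
A quick way to confirm whether the two TensorRT versions match (a minimal check; it assumes TensorRT was installed from the standard packages and that the Python bindings are present):

dpkg -l | grep -i tensorrt
python3 -c "import tensorrt; print(tensorrt.__version__)"

Run the same two commands both where the engine was built and inside the DeepStream container; if the versions differ, the serialized engine will not deserialize, which typically shows up exactly as the "Magic tag does not match" error earlier in this thread.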

I tried your model and config file on my Jetson Orin with DeepStream 7.0. It looks like the accuracy is okay.

What model was used in this test? Was it the .engine or the .onnx?

I converted from .onnx to .engine within the DeepStream container, so it was supposed to be using the same version of TensorRT.
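
If the versions do match, another way to take DeepStream's builder out of the loop (a sketch; the output file name is just an example, the ONNX path is the one from the config above) is to build the engine with trtexec inside the same DeepStream container (usually at /usr/src/tensorrt/bin/trtexec if it is not on PATH) and point model-engine-file at the result; nvinfer can deserialize an engine built this way as long as the batch size and precision match the config:

trtexec --onnx=/app/models/yolov8/yolov8m_best.onnx --saveEngine=/app/models/yolov8/yolov8m_best_trtexec_fp32.engine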

I'm using version 6.2 of DeepStream. I tried to install DeepStream 7.0 but had errors with some plugins.
What version of Ubuntu are you using?

Could you provide me with the DeepStream 7.0 Dockerfile and the config file of the model that you used?

The model and config files are all from the Google Drive link you attached before.

You need to flash your board with JetPack 6.0 if you want to install DeepStream 7.0.

The only change I made was to configure the width and height in nvstreammux to match the width and height of the video. I mentioned this before; I don't know if you changed that.

Yes, I changed the width and height in nvstreammux. I saw the tips in the SDK FAQ link.

Could you run the same model that you used on the previous video, but with this new video?
pintura1.mp4.zip (2.3 MB)

I just checked the same video you ran through the application, and the bounding boxes were correct here in my application too, but in this other one the bounding boxes are displaced. Using the .pt model they come out correct.

I'm thinking it has something to do with the FPS of the video. The first video, where the bounding boxes are correct, is at 30 FPS, but this new video is at 5 FPS.

I did other tests, and apparently the problem really is the FPS of the input video. When I use 30 FPS the bounding boxes are in the correct locations, and when I lower it to 5 FPS the bounding boxes are displaced.
I searched for a parameter in the application that represents the FPS of the input video but didn't find one.
Do you know how I can solve this problem?
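
If it helps, the frame rate actually recorded in a file can be checked directly with ffprobe (standard invocation; the file name is just the clip from this thread):

ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate,avg_frame_rate -of default=noprint_wrappers=1 pintura1.mp4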

The fps of the new video you attached is 30 fps too. You can try updating your DeepStream to 7.0 first and check the result.
Please refer to our compatibility guide to flash your Jetson board.

At the moment I can't update my Jetsons. I will have to continue using DeepStream 6.2.

The second video was obtained from a camera at 5 FPS; it reports 30 FPS because it is configured that way, but if you look closely you can see that the speed of the plates is much higher compared to the first video, and the distance the same plate moves from one frame to the next is greater.

I did some tests using the camera directly and confirmed that the problem is the FPS of the video: when the camera is at 30 FPS the bounding boxes come out correctly, but when I change to 5 FPS they become displaced. I ran many tests with the same plates, changing only the FPS of the camera. Whenever the camera was at 30 FPS the bounding boxes were correct, but when it changed to 5 FPS they were displaced.
I also tested intermediate FPS values, and each time I increased the FPS, the bounding boxes got closer to the correct locations. The lower the FPS, the more displaced they were.

I can't leave the camera at 30 FPS due to the computational cost of the application. Also, I didn't find any parameter in DeepStream that relates the inference to the FPS of the video. The only thing that came to mind was something related to the tracking, because in the videos obtained at 30 FPS the variation in a plate's position between frames is small, while in the videos obtained at 5 FPS the distance the plate moves between frames is larger.

Is there any parameter that makes the inference take the tracks of the detected objects into account?

No. It's weird: I used an ffmpeg command to change the fps of your video to 5 fps, and it works well too. This is my app config file.
app_config.txt (1.2 KB)
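
For reference, re-timing a clip for this kind of test can be done with a plain ffmpeg filter (a sketch; the input name is the video from this thread):

ffmpeg -i pintura1.mp4 -filter:v fps=5 pintura1_5fps.mp4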

But how can I make the inferences correct from the camera?