INT8 Calibration with DS 6.3 worse than with DS 6.0

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Jetson Orin NX (8GB)

• DeepStream Version 6.3

• JetPack Version (valid for Jetson only) 5.1.2

• TensorRT Version JP 5.1.2 standard (8.5?)

• NVIDIA GPU Driver Version (valid for GPU only) JP 5.1.2 standard

• Issue Type( questions, new requirements, bugs) Question

• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

Trying to create an INT8-optimized TRT engine from my YOLOv4 ONNX model for use with DeepStream-Yolo / nvinfer. I am using the same model file and the same calibration images as with DeepStream 6.0.0. I am running the INT8 calibration inside deepstream-app as described here: DeepStream-Yolo/docs/YOLOv5.md at master · marcoslucianops/DeepStream-Yolo · GitHub (without the Ultralytics part at the beginning).
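Roughly, my calibration image list is prepared like this. This is only a sketch: I am assuming the DeepStream-Yolo convention of a plain text file with one image path per line, and the directory name is a placeholder.

import glob
import random

# Collect calibration images (placeholder directory) and write one path per line,
# which is the list the calibration step then reads.
images = sorted(glob.glob("calibration_images/*.jpg"))
random.seed(0)
random.shuffle(images)

with open("calibration.txt", "w") as f:
    f.write("\n".join(images[:1000]))  # a representative subset is usually enough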

With DS 6.0 on a Jetson NX (not Orin), this resulted in a model with 94% recall over my test dataset. With DS 6.3 and the Orin NX I only get 87% (tested many, many calibration runs with different parameters).

• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

config_nvinfer_primary.txt

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
custom-network-config=yolov4.cfg
model-file=yolov4.weights
model-engine-file=model_b1_gpu0_int8.engine
int8-calib-file=calib.table
labelfile-path=dc_vehicles.training.names
batch-size=1
network-mode=1
num-detected-classes=14
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=0
symmetric-padding=1
force-implicit-batch-dim=0
workspace-size=6000
#parse-bbox-func-name=NvDsInferParseYolo
parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet
# DLA does not make sense for us with JP 5.1.2
# because our model requires a lot of GPU fallbacks, reducing speed by 40%.
#enable-dla=1
#use-dla-core=0
#gpu-fallback=1

[class-attrs-all]
nms-iou-threshold=0.3
pre-cluster-threshold=0.3
topk=300

Is this something others also experience? This reduction in detection rate makes it a problem for us to upgrade to JetPack 5.

DeepStream 6.0.1 is based on TensorRT 8.0.1; DeepStream 6.3 is based on TensorRT 8.5.2.2.
A calibration file generated with TensorRT 8.0.1 cannot be used with TensorRT 8.5.2.

Please make sure your calibration file is generated with the same TensorRT version you are using. For how to generate a calibration file, you may refer to NVIDIA-AI-IOT/yolo_deepstream: yolo model qat and deploy with deepstream&tensorrt.
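For reference, the cache is produced by an INT8 calibrator while the engine is built. Below is a minimal sketch with the TensorRT Python API; it is not the exact code from either repo, and the batch source, preprocessing and file names are placeholders. The file it writes is what nvinfer consumes as int8-calib-file.

import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class YoloCalibrator(trt.IInt8EntropyCalibrator2):
    # Feeds preprocessed calibration batches to TensorRT and caches the result.
    def __init__(self, image_batches, cache_file="calib.table"):
        super().__init__()
        self.batches = iter(image_batches)      # iterable of NCHW float32 arrays
        self.cache_file = cache_file
        self.batch_size = image_batches[0].shape[0]
        self.device_input = cuda.mem_alloc(image_batches[0].nbytes)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = np.ascontiguousarray(next(self.batches), dtype=np.float32)
        except StopIteration:
            return None                          # no more data: calibration is done
        cuda.memcpy_htod(self.device_input, batch)
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()                  # reuse an existing cache if present
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)                       # this file becomes int8-calib-file

The important point is that this cache is tied to the TensorRT version that produced it, which is why the DS 6.0 file cannot simply be carried over.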

This is exactly what I am doing: I am using my model to generate a new INT8 calibration for the new TensorRT version on the new device. However, I am using GitHub - marcoslucianops/DeepStream-Yolo: NVIDIA DeepStream SDK 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models. Still, the resulting model is not as good as the INT8-calibrated model on the old version.

Can you try the QAT method in yolo_deepstream/yolov7_qat at main · NVIDIA-AI-IOT/yolo_deepstream?

I tried GitHub - NVIDIA-AI-IOT/yolo_deepstream: yolo model qat and deploy with deepstream&tensorrt, but I get an error when trying to use the ONNX version of my model to generate a TRT engine:
WARNING: [TRT]: onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: INT8 calibration file not specified. Trying FP16 mode.

The yolo_deepstream/yolov7_qat at main · NVIDIA-AI-IOT/yolo_deepstream · GitHub works with YOLOv7 .pt files. I am using YOLOv5/darknet. Is it possible to use .cfg and .weights files with this, too?

Which TensorRT version are you using? And please show me the convert command.

Please use this repo as your code base: GitHub - ultralytics/yolov5: YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite, which is based on PyTorch. Darknet is not friendly for QAT training.

TensorRT version is 8.5.2.2-1+cuda11.4 (JP 5.1.2).
I tried the instructions given in the DeepStream-Yolo repository (there it works, but the result is worse than with JP 4.6.2/NX before) and the repo at GitHub - NVIDIA-AI-IOT/yolo_deepstream: yolo model qat and deploy with deepstream&tensorrt. With the latter I could not do an INT8 conversion, as described in the earlier post; it falls back to FP16 instead. I will now try PyTorch and GitHub - ultralytics/yolov5: YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite. As far as I understand this, the engine needs to be prepared on the device it is used on, right? This would be a Jetson Orin NX in my case.

So I converted my YOLOv4 model to .pt format and tried loading and exporting it like this:

from ultralytics import YOLO

# Load the converted YOLOv4 PyTorch model
model = YOLO("../yolov4.pt")

# Export the model to TensorRT
model.export(format="engine")  # creates 'yolov4.engine'

# Load the exported TensorRT model
trt_model = YOLO("yolov4.engine")

# Run inference
results = trt_model("https://ultralytics.com/images/bus.jpg")

It runs for a moment, then bails out with:

Traceback (most recent call last):
  File "convert.py", line 4, in <module>
    model = YOLO("../yolov4.pt")
  File "/home/nvidia/.local/lib/python3.8/site-packages/ultralytics/models/yolo/model.py", line 23, in __init__
    super().__init__(model=model, task=task, verbose=verbose)
  File "/home/nvidia/.local/lib/python3.8/site-packages/ultralytics/engine/model.py", line 145, in __init__
    self._load(model, task=task)
  File "/home/nvidia/.local/lib/python3.8/site-packages/ultralytics/engine/model.py", line 285, in _load
    self.model, self.ckpt = attempt_load_one_weight(weights)
  File "/home/nvidia/.local/lib/python3.8/site-packages/ultralytics/nn/tasks.py", line 912, in attempt_load_one_weight
    model = (ckpt.get("ema") or ckpt["model"]).to(device).float()  # FP32 model
KeyError: 'model'

Hi, as far as I know, you cannot directly load your yolov4.pt via ultralytics.YOLO, because the official YOLOv4 is trained via darknet and ultralytics.YOLO does not support it. You can find some help on the GitHub - ultralytics/yolov5: YOLOv5 issues page.

Back to your original question: you are seeing an accuracy regression with your YOLOv4 ONNX model when deploying it to DeepStream, between DS 6.0 and DS 6.3.
My questions are:

  1. Did you get the ONNX model?
  2. Can you show the command you used to get model_b1_gpu0_int8.engine? (Did you get it via trtexec?)

As I said in my original post, I was using this repo: DeepStream-Yolo/docs/YOLOv5.md at master · marcoslucianops/DeepStream-Yolo · GitHub

We have been using this plugin in DS 6.0 on Jetson NX and now we use the same in DS 6.3 on Jetson Orin NX.

It contains a wrapper for the nvinfer plugin from TensorRT. When you start the pipeline with deepstream-app and there is no INT8 engine yet, it will create one. It has the option to load an ONNX model or to load a YOLOv4 darknet model directly.

When it creates the INT8 model, it uses our validation images from training to do the INT8 calibration.

We do exactly the same steps in DS 6.0 and DS 6.3. But in DS 6.0 on Jetson NX and JP 4.6.2, if I start it about 3 times and use the best engine, I get a much better result than on DS 6.3 on the Orin and JP 5.1.2.

I used Ultralytics only because you told me to. For that, I first converted the model from darknet to .pt and then tried to load it with Ultralytics to do the conversion, but this did not work.

Sorry, I am not familiar with DeepStream-Yolo/docs/YOLOv5.md at master · marcoslucianops/DeepStream-Yolo · GitHub, as it is not an NVIDIA-owned repo. So, can you get the ONNX and the calibration file? If you can, I can show you a TensorRT command to help you get almost the same accuracy across multiple DS versions.

ONNX export was done on Colab with:


from ultralytics import YOLO

# Load the trained model weights
model = YOLO('/content/runs/detect/train/weights/best.pt')

# Export the model to ONNX format with INT8 quantization
#exported_model_path = model.export(format='onnx', dynamic=True, simplify=True, opset=17, int8=True, data='/content/data.yaml')
exported_model_path = model.export(
    format='onnx',
    imgsz=(640, 640),  # Set your desired input size here, e.g., (640, 640)
    simplify=True,
    opset=17,
    data='/content/data.yaml',
    half=True
)

INT8 calibration was done by starting the DS pipeline from DeepStream-Yolo with the calibration config above. This uses TensorRT to perform the INT8 calibration.
The output showed that all images were used. It produces a calibration table and the INT8 engine.

You can use:

trtexec --onnx=yolov4.onnx --int8 --fp16 --calib=calibrationfilegotfromds60.calib --saveEngine=yolov4.engine

Then use the engine-file to start DeepStream. The same calibration file and the same ONNX should give the same accuracy.
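If trtexec is awkward to script, a rough Python-API equivalent would look like the sketch below. This assumes TensorRT 8.5's Python bindings; the file names are placeholders, and the calibrator only replays the existing cache instead of running a new calibration.

import tensorrt as trt

class CacheOnlyCalibrator(trt.IInt8EntropyCalibrator2):
    # Replays an existing calibration cache instead of running a new calibration.
    def __init__(self, cache_path):
        super().__init__()
        self.cache_path = cache_path

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        return None  # no calibration data: force TensorRT to use the cache

    def read_calibration_cache(self):
        with open(self.cache_path, "rb") as f:
            return f.read()

    def write_calibration_cache(self, cache):
        pass

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("yolov4.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(str(parser.get_error(0)))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.FP16)  # mirrors the --int8 --fp16 combination
config.int8_calibrator = CacheOnlyCalibrator("calibrationfilegotfromds60.calib")

# Serialized engine, referenced as model-engine-file in the nvinfer config
serialized = builder.build_serialized_network(network, config)
with open("yolov4.engine", "wb") as f:
    f.write(serialized)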

Where would I get this from?

Ah, you mean the calib.table generated by the DS 6.0 calibration?
Interesting idea.

Seems like the calibration table from DS 6.0 cannot be used in DS 6.3:

Building the TensorRT Engine

ERROR: [TRT]: 4: [standardEngineBuilder.cpp::initCalibrationParams::1460] Error Code 4: Internal Error (Calibration failure occurred with no scaling factors detected. This could be due to no int8 calibrator or insufficient custom scales for network layers. Please see int8 sample to setup calibration correctly.)
Building engine failed

Can you upload your ONNX model and calibration file which reproduce the error?
I will check on my side.

There has been no update from you for a while, so we assume this is no longer an issue. Hence we are closing this topic. If you need further support, please open a new one. Thanks.