INT8 Yolo model conversion led to accuracy drop in deepstream

Hi there,
As stated here, I was able to calibrate and generate an INT8 engine in the YOLO example. However, the performance (mAP) of the INT8 model dropped by about 7-15% compared with the FP32 model. Is this normal? How can I improve it?

My setup is the following:
Jetson Xavier
DeepStream 5.0
JetPack 4.4
TensorRT 7.1.3
CUDA 10.2


Some accuracy drop is possible when inferencing with INT8.
The amount depends on the calibration and the model’s properties.

But 7-15% seems too much.
Would you mind sharing the original model file (e.g. .onnx, .pb, or .caffemodel) with us,
as well as the data and source you used for generating the calibration cache?
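For context, INT8 inference maps each tensor to 8-bit integers through a per-tensor scale chosen during calibration, and a poorly chosen scale is exactly what costs accuracy. Below is a minimal pure-Python sketch of symmetric INT8 quantization to illustrate the idea; it is not the actual TensorRT implementation, and the activation values are made up.

```python
# Minimal sketch of symmetric INT8 quantization (illustrative only,
# not TensorRT's actual code path).

def quantize(values, scale):
    """Map float values to int8 range [-127, 127] using a per-tensor scale."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(qvalues, scale):
    """Map int8 values back to floats."""
    return [q * scale for q in qvalues]

# Calibration's job is choosing the scale. The simplest choice is max-abs
# ("min-max"); entropy calibration instead minimizes KL divergence over an
# activation histogram, which usually preserves accuracy better.
activations = [0.01, -0.5, 0.73, 1.2, -1.19, 0.002]
scale = max(abs(v) for v in activations) / 127.0

restored = dequantize(quantize(activations, scale), scale)
max_err = max(abs(a - r) for a, r in zip(activations, restored))
print(max_err)  # rounding error, bounded by ~scale/2 for unclipped values
```

A scale distorted by outliers (or computed for the wrong tensor, as discussed later in this thread) makes this rounding error much larger, which is where the mAP loss comes from.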



Thanks for the swift response. The original model files I used were a Darknet weights file and a cfg file. As for calibration, I first selected 200 random images from the training set as the calibration dataset, and then tried using the entire training set instead. The latter option gave better performance.
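The random-subset selection step can be sketched as follows; the paths and helper name here are hypothetical, not from the actual pipeline.

```python
import random

def pick_calib_images(image_paths, n=200, seed=0):
    """Pick a reproducible random subset of training images for calibration.

    A fixed seed keeps the calibration cache reproducible across runs.
    """
    rng = random.Random(seed)
    return rng.sample(image_paths, min(n, len(image_paths)))

# Hypothetical usage: in practice the paths would come from the real
# training set on disk.
training_set = [f"train/img_{i:05d}.jpg" for i in range(1000)]
calib_set = pick_calib_images(training_set, n=200)
print(len(calib_set))  # 200
```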

I uploaded model files and a small subset of the training set.

Thanks in advance.


Thanks for sharing the data with us.

Have you tried the model with the TensorRT API directly?
If not, would you mind giving it a try?

This will help us determine whether the issue comes from DeepStream or from TensorRT.


I tried. The two methods give very similar results, with a mAP difference of less than 0.5%.


Thanks for sharing the subset with us.
Could you also share the source you used for generating the calibration file with us?


Here is the tensorrt API source, and the Deepstream source.


We are checking this issue internally.
We will share an update with you later.


Thanks. Looking forward to your update.


We checked the calibration cache shared in this comment.

In general, TensorRT merges several layers together for acceleration (e.g. conv + scale + activation).
However, the layers in your cache file are calibrated without merging.

We are not sure whether this causes the unexpected accuracy drop.
Would you mind trying the calibration tool shared in the GitHub repository below again?

We have verified that the cache files in that repository produce correct detections.
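For reference, a TensorRT calibration cache is a plain-text file: a version header on the first line, then one `tensor_name: hex` entry per line, where the hex field is the IEEE-754 bit pattern of the float32 scale. A small sketch to decode the entries (the header string and sample entry follow the format seen in this thread; treat them as examples):

```python
import struct

def parse_calib_cache(text):
    """Parse a TensorRT calibration cache into {tensor_name: float_scale}.

    The first line is a version header (e.g. 'TRT-7103-EntropyCalibration2');
    each following line is 'name: hex', hex being the raw float32 bits.
    """
    scales = {}
    for line in text.splitlines()[1:]:
        if ":" not in line:
            continue
        name, hexval = line.rsplit(":", 1)
        # "!f" unpacks the 4 bytes as a big-endian IEEE-754 float32.
        scales[name.strip()] = struct.unpack("!f", bytes.fromhex(hexval.strip()))[0]
    return scales

# Example using the entry quoted later in this thread.
cache = "TRT-7103-EntropyCalibration2\ndata: 3c010a14\n"
print(parse_calib_cache(cache))  # -> {'data': ~0.00788}
```

Inspecting the decoded names makes it easy to see whether a cache was produced against fused or unfused layers: the entry names must match the tensors of the network the engine builder actually sees.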


Thanks for your feedback!

I’ll give it a try.

After using the cache file generated from the recommended repo in the DS yolo app, the inference speed dropped significantly, to about 8 fps.

It seems that TensorRT did not know how to perform INT8 quantization based on the given calibration cache, so it ended up building an FP32 or FP16 engine.

I might have misunderstood some of your statements. So when you say

did you test it out in the DeepStream yolo-app?

Thanks again for your help.


Could you share your detailed procedure with us?

INT8 is a mode configured by the user.
The model will run inference in INT8 mode if the configuration and calibration cache are provided correctly.


## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
int8-calib-file=[the cache file generated above]


I followed demo #5 to create an ONNX file, and demo #6 to calibrate and generate a calibration cache. Then I used that cache as int8-calib-file in the DeepStream yolo-app.


Could you share the .cfg, .weights, .onnx, and the corresponding cache file with us?

Additionally, we tested the default YOLOv3 Tiny model cache
and got the expected output result.

Please check whether this also works on your side.

Here are the files.

When I used the caches generated by tensorrt-demo within that repo, they all worked fine. But when I moved the cache into DeepStream, I got the following:

ERROR: [TRT]: Calibration failure occurred with no scaling factors detected. This could be due to no int8 calibrator or insufficient custom scales for network layers. Please see int8 sample to setup calibration correctly.
ERROR: [TRT]: Builder failed while configuring INT8 mode.
Building engine failed!

I also tried the yolov3-tiny cache as you suggested, and the same thing happened: it only works within the given repo and cannot be transferred to DeepStream. The error is the same as the one above.
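The "no scaling factors detected" error usually means the tensor names in the cache do not match the tensors of the network that DeepStream builds. Assuming the plain `name: hexscale` cache format, one quick way to spot such a mismatch is to diff the name sets of a working and a failing cache; the cache contents below are hypothetical examples.

```python
def cache_tensor_names(text):
    """Collect the tensor names from a TensorRT calibration cache
    ('name: hexscale' per line, version header on the first line)."""
    names = set()
    for line in text.splitlines()[1:]:
        if ":" in line:
            names.add(line.rsplit(":", 1)[0].strip())
    return names

# Hypothetical caches: one as produced by the tensorrt-demo repo, one as a
# DeepStream-built engine would expect (input tensor named 'data').
repo_cache = ("TRT-7103-EntropyCalibration2\n"
              "000_net: 3c010a14\n"
              "001_convolutional: 3d8d6666\n")
ds_cache = ("TRT-7103-EntropyCalibration2\n"
            "data: 3c010a14\n"
            "001_convolutional: 3d8d6666\n")

only_repo = cache_tensor_names(repo_cache) - cache_tensor_names(ds_cache)
print(only_repo)  # {'000_net'}
```

Any name present in one set but not the other is a tensor for which the builder finds no scale, which matches the error message above.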


Thanks for sharing the model and cache.

We can reproduce this issue internally and are checking it.
We will get back to you later.


We changed the layer name 000_net to data in calib_yolov3-int8-608.bin:

data: 3c010a14

DeepStream can then run the model with the cache successfully.
Could you also give it a try?
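The rename above can be scripted; this is a small sketch (the file name comes from this thread, the helper name is made up), assuming the plain-text `name: hexscale` cache format.

```python
def rename_cache_entry(path, old_name, new_name):
    """Rename one tensor entry in a TensorRT calibration cache, in place.

    Only the name before the ':' is changed; the hex scale is kept as-is.
    """
    with open(path) as f:
        lines = f.read().splitlines()
    out = []
    for line in lines:
        if ":" in line and line.rsplit(":", 1)[0].strip() == old_name:
            line = new_name + ":" + line.rsplit(":", 1)[1]
        out.append(line)
    with open(path, "w") as f:
        f.write("\n".join(out) + "\n")

# Example usage:
# rename_cache_entry("calib_yolov3-int8-608.bin", "000_net", "data")
```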


Of course. I’ll work on it as soon as I can.


It did work. However, the accuracy only improved by about 1.2 percentage points, which means the INT8 quantization still causes about a 6-point accuracy drop. Is there any other way to improve this?